I am currently preparing some educational material for Arria on building NLG systems (video and written). I can’t give details about this here (as usual for my Arria work), but one thing I have noticed is that I am spending a lot of time talking about handling edge cases. That is, trying to ensure that NLG systems can robustly deal with unusual, boundary, and special cases.
Of course, this is hardly unusual in software. Applied to software, the 80-20 (Pareto) principle says that 80% of cases are handled by 20% of the code. The corollary is that the remaining 80% of the code deals with the other 20% of cases. In other words, most of the engineering effort in building a software system goes into dealing with unusual cases, and this is as true of NLG as of any other software artefact.
This assumes, of course, that we want the system to be robust. If we only want our system to work in 80% of cases, then we may not need to worry much about edge cases. If we want it to work in 99% of cases, however, we will have to put a lot of effort into ensuring that it handles edge cases correctly. And if we want it to work correctly in 99.99% of cases, then almost all of our effort will go into dealing with edge cases.
The target reliability (80%, 99%, 99.99%) of course depends on the application (eg, news summarisation vs medical decision-support). It also depends on whether a person can be expected to detect and fix problems in our system’s output (eg, is our NLG system producing final texts which get sent to users, or drafts which are edited by a human author?). Last but not least, reliability targets for reusable components are usually higher than for systems. For example, if we want to build a 99% reliable system from 10 reusable components, and a failure in any one component breaks the output, then each of these components will need to be about 99.9% reliable (assuming independent failures, 0.999^10 ≈ 0.990).
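The component arithmetic can be checked with a few lines of Python. This is a minimal sketch that assumes component failures are independent and that any single component failure makes the system’s output wrong; real NLG pipelines may satisfy neither assumption, and the function names are made up for illustration.

```python
def system_reliability(component_reliability: float, n_components: int) -> float:
    """Chance that all components work, assuming independent failures and
    that any single component failure makes the system's output wrong."""
    return component_reliability ** n_components


def required_component_reliability(target: float, n_components: int) -> float:
    """Per-component reliability needed to reach `target`, under the same assumptions."""
    return target ** (1.0 / n_components)


if __name__ == "__main__":
    # 10 components, 99% system target
    print(f"{required_component_reliability(0.99, 10):.4f}")  # 0.9990
    # 10 components at 99.9% each
    print(f"{system_reliability(0.999, 10):.4f}")  # 0.9900
```

The same calculation shows how quickly this compounds: 20 components at 99.9% each give a system that is only about 98% reliable.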
Building NLG Systems that Handle Edge Cases
Anyway, whatever the target reliability is, how do we ensure that our system handles enough edge cases to reach it? I think it’s easiest to think about this in terms of Requirements, Design, Implementation, Testing, and Support (ie, the classic stages of software development).
Requirements: Ideally, at the requirements stage we will identify the edge cases we need to deal with in order to reach our reliability target, and (equally important) the edge cases we do not need to support. If we have a large representative corpus of target text outputs and/or data inputs, we can do this empirically. If we don’t have such a corpus, or if we have a corpus but suspect it may not be representative of real usage, then we will need to make an informed guess as to which edge cases we need to cover (and be prepared to change this spec if our guess turns out to be optimistic).
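When a corpus is available, one simple way of doing this empirically is to label each corpus item with the kind of case it represents, then pick the most frequent case types until they cover the reliability target. The sketch below assumes such labels already exist (the labels and counts are hypothetical); the case types left over form the explicit list of edge cases we have decided not to support.

```python
from collections import Counter
from typing import Iterable, List


def cases_to_support(case_labels: Iterable[str], target_coverage: float) -> List[str]:
    """Greedily choose the most frequent case types until together they
    cover at least `target_coverage` of the corpus."""
    counts = Counter(case_labels)
    total = sum(counts.values())
    chosen: List[str] = []
    covered = 0
    for label, count in counts.most_common():
        if covered / total >= target_coverage:
            break  # target reached; remaining case types are out of scope
        chosen.append(label)
        covered += count
    return chosen


if __name__ == "__main__":
    # Hypothetical labels for a 100-item corpus of data inputs.
    corpus = ["normal"] * 90 + ["no_data"] * 6 + ["single_item"] * 3 + ["all_equal"]
    print(cases_to_support(corpus, 0.95))   # ['normal', 'no_data']
    print(cases_to_support(corpus, 0.999))  # ['normal', 'no_data', 'single_item', 'all_equal']
```

This only works as well as the corpus is representative; if it is not, the frequencies above are exactly the informed guess the requirements will need to revisit.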
Design: The system should be designed to cope with edge cases naturally wherever possible, instead of needing special coding. NLG technology can help with this. To take a simple example, the statement “Sales increased in N countries” has two obvious edge cases (N=0 and N=1), and may also have an edge case for small numbers which should be spelled out (eg, “two” instead of “2”). Good NLG technology should handle this kind of thing automatically, without needing special coding.
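As an illustration of the kind of logic that NLG technology should be handling for us, here is a hand-written Python sketch of the “Sales increased in N countries” example. It is not how Arria (or any particular NLG toolkit) does this; the wording for N=0 and the rule of spelling out numbers from one to ten are assumptions chosen for the example.

```python
# Assumed style rule for this sketch: spell out whole numbers from one to ten.
NUMBER_WORDS = {
    1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
    6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten",
}


def number_phrase(n: int) -> str:
    """Spell out small numbers, use digits otherwise."""
    return NUMBER_WORDS.get(n, str(n))


def sales_increase_sentence(n_countries: int) -> str:
    """Realise 'Sales increased in N countries', handling N=0, N=1, and small N."""
    if n_countries < 0:
        raise ValueError("number of countries cannot be negative")
    if n_countries == 0:
        # Edge case: "in 0 countries" reads badly, so rephrase entirely (assumed wording).
        return "Sales did not increase in any country."
    noun = "country" if n_countries == 1 else "countries"  # singular vs plural
    return f"Sales increased in {number_phrase(n_countries)} {noun}."


if __name__ == "__main__":
    for n in (0, 1, 2, 12):
        print(sales_increase_sentence(n))
```

This prints “Sales did not increase in any country.”, “Sales increased in one country.”, “Sales increased in two countries.”, and “Sales increased in 12 countries.” Even this tiny sentence needs three special cases, which is why it is better for the NLG framework to provide them than for every developer to hand-code them.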
Implementation: It’s hard to make generic comments about implementing code for edge cases, because it depends on your NLG implementation framework.
Testing: We need to test the NLG system to make sure that it does indeed handle the edge cases we identified in requirements! There is nothing magic here; we basically need good testing procedures and tools (ideally NLG-specific testing tools) which verify that our system produces correct output for the edge cases (and indeed in general).
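A simple starting point is a table of edge-case inputs paired with the output we expect, run automatically against the generator. The sketch below is a generic, hypothetical harness (not an NLG-specific testing tool), applied to a deliberately naive generator to show how such a table catches mishandled edge cases.

```python
from typing import Callable, List, Tuple

# (input, expected output) pairs; each row pins down one edge case.
EDGE_CASES: List[Tuple[int, str]] = [
    (0, "Sales did not increase in any country."),
    (1, "Sales increased in one country."),
    (2, "Sales increased in two countries."),
    (25, "Sales increased in 25 countries."),
]


def run_edge_case_tests(
    generate: Callable[[int], str], cases: List[Tuple[int, str]]
) -> List[Tuple[int, str, str]]:
    """Return (input, expected, actual) for every case where the output is wrong."""
    failures = []
    for inp, expected in cases:
        actual = generate(inp)
        if actual != expected:
            failures.append((inp, expected, actual))
    return failures


def naive_sales_sentence(n: int) -> str:
    """Deliberately naive generator that ignores all edge cases."""
    return f"Sales increased in {n} countries."


if __name__ == "__main__":
    for inp, expected, actual in run_edge_case_tests(naive_sales_sentence, EDGE_CASES):
        print(f"N={inp}: expected {expected!r}, got {actual!r}")
```

Here the naive generator passes the “normal” case (N=25) but fails N=0, N=1 and N=2, which is typical: edge-case bugs are invisible if we only test with typical inputs.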
Support: Once our system is live, we will probably get bug/defect reports if people are seriously using it, and many of these will be caused by improper handling of edge cases. Of course, in real-world software engineering, we don’t fix all of the bugs; we prioritise and fix the most important ones. From this perspective, we may decide not to fix rare edge cases unless there is a clear commercial reason (eg, a major client is seriously unhappy about the way a particular edge case is handled, even if it is rare).
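One way of making this prioritisation explicit is to score each report. The sketch below uses a hypothetical scheme (frequency times severity, with client escalations always ranked first and always fixed); the fields, the severity scale, and the `min_score` threshold are illustrative assumptions, not a recommended standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BugReport:
    edge_case: str                   # short description of the mishandled case
    frequency: float                 # estimated fraction of outputs affected
    severity: int                    # 1 (cosmetic) to 5 (misleading text)
    client_escalation: bool = False  # e.g. a major client is seriously unhappy

    @property
    def score(self) -> float:
        return self.frequency * self.severity


def triage(reports: List[BugReport]) -> List[BugReport]:
    """Order reports: client escalations first, then by descending score."""
    return sorted(reports, key=lambda r: (r.client_escalation, r.score), reverse=True)


def should_fix(report: BugReport, min_score: float = 0.01) -> bool:
    """Fix escalated reports regardless of rarity; otherwise only if the score is high enough."""
    return report.client_escalation or report.score >= min_score


if __name__ == "__main__":
    reports = [
        BugReport("odd date format", 0.0005, 2),
        BugReport("empty region list", 0.0001, 2, client_escalation=True),
        BugReport("'1 countries'", 0.02, 2),
        BugReport("flat data described as a rise", 0.005, 5),
    ]
    for r in triage(reports):
        print(f"{r.edge_case:32} score={r.score:.4f} fix={should_fix(r)}")
```

With these made-up numbers the rare “odd date format” report is left unfixed, while the equally rare “empty region list” is fixed first because a client has escalated it.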
Building AI Systems that Handle Edge Cases
Most of what I said above applies to all AI systems, not just NLG systems. From this perspective, I must say that I am surprised to see so little discussion of handling edge cases in the AI literature and community. Note that machine learning is not a magic bullet that solves this problem. ML can be used for implementation if we have appropriate training data, but requirements, design, testing, and support still need to be done, and the above issues will still need to be dealt with.