Texts produced by NLG systems can be evaluated in terms of accuracy (content is correct), fluency (text is readable), and utility (text is useful). I discuss these three “dimensions” of NLG evaluation.
When we try to use ML in commercial NLG contexts, one of the challenges is that NLG developers want to be able to customise, configure, and control their systems. So we need ML approaches which do not stop devs from configuring things they are likely to want to change.
Most research software does not enter everyday operational use, in part because research projects usually do not worry about issues such as maintainability, regulatory approval, and change management, which are essential to the long-term success of commercial software.
Farewell to Richard Kittredge, who died in early April 2019. Richard was a pioneer in applied NLG, and also an inspiration to me personally.
An important difference between approaches to building NLG systems is the skills each one requires. Machine learning requires the most skills, smart templating the least, and simplenlg-type programmatic approaches are in the middle.
Perhaps the most common reason for bad NLG output texts is low-quality input data. I.e., Garbage In, Garbage Out holds regardless of our technology.
I am now chair of ACL SIGGEN. I hope SIGGEN can help the NLG community by encouraging high-quality scientific research, strengthening interaction with the non-NLP world, and providing trusted unbiased information about NLG.
From a commercial perspective, I think NLG is currently most successful in financial reporting, although of course there are many great NLG applications in other sectors!
Some thoughts on how to vary words in NLG text. This is aimed at practitioners who are building NLG systems, not researchers.
Some musings on principled and theoretically sound techniques for automatically evaluating NLG systems.