Another academic (not an NLG expert) recently emailed me and asked “how do I build a simple NLG system” (in her case, to summarise the results of simulations). Good question, and one I have been asked many times. Let’s start by looking at tools.
Commercial NLG Tools
The best NLG toolkit on the planet is Arria’s Articulator Pro. Of course I may be biased, because I helped to build Articulator Pro… It covers the complete data-to-text pipeline (including analytics as well as NLG); has a plugin architecture so it’s easy to add in new algorithms and modules; is supported by extensive documentation, tutorials, and training material; has been used, refined, and debugged for years within Arria before being sold as a toolkit; and includes an Eclipse plugin for authoring. Enough, you probably get the idea! I am not an unbiased observer, but I do absolutely think that Articulator Pro is a great toolkit and the best way to build an NLG system in 2017.
Unfortunately for academics and students, Articulator Pro is currently only available on a commercial license at commercial pricing. Perhaps this will change in the future; if so, I will let people know.
A number of Arria’s competitors also offer NLG toolkits, but I don’t have first-hand experience with these. My impression is that Automated Insights’ Wordsmith is more a souped-up version of a mail-merge tool than what I would consider to be an NLG toolkit. I *think* Yseop Compose has a bit more NLG functionality, and Ax Semantics seems to have some linguistics in its offering. But this is all speculation, since I haven’t actually tried any of these tools. If any of the above-mentioned companies make their tool available to me, I promise to honestly review it in this blog. As far as I know, all of these tools (like Articulator Pro) are currently only available on a commercial license.
Open-Source NLG Tools
Of course academics, students, and indeed many companies prefer open-source tools for building systems. And there are quite a few good open-source toolkits available for building NLP systems, including NLTK and GATE.
If we turn to NLG, there are a number of open-source surface realisers available, including simplenlg, openccg, and KPML. Simplenlg is probably the most widely used open-source realiser, especially by system-builders (as opposed to NLG researchers); it has the least functionality of the three, but it is also the easiest to use and the best documented. I should again warn people that I may be biased about simplenlg (as well as Articulator Pro), since I created the original versions of simplenlg. But I’m still pretty sure that simplenlg is the most widely used open-source realiser.
There are also several “lightweight NLG” packages which do things like morphological generation, such as inflect and inflection, which can be invoked from scripting languages such as Python or Ruby.
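To give a feel for what “morphological generation” means in practice, here is a minimal sketch in plain Python. It does not use the API of inflect or inflection; the function names and the tiny irregular-forms table are made up for illustration, and real packages cover far more rules and exceptions.

```python
import re

# Hypothetical, deliberately tiny table of irregular plurals.
# Real morphology packages ship much larger lexicons.
IRREGULAR_PLURALS = {"child": "children", "person": "people", "mouse": "mice"}


def pluralise(noun: str) -> str:
    """Return a plural form of an English noun using a few common rules."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    # box -> boxes, crash -> crashes, class -> classes
    if re.search(r"(s|x|z|ch|sh)$", noun):
        return noun + "es"
    # city -> cities, but day -> days (vowel before the y)
    if re.search(r"[^aeiou]y$", noun):
        return noun[:-1] + "ies"
    return noun + "s"


def count_phrase(n: int, noun: str) -> str:
    """Render a count with a noun that agrees in number."""
    return f"{n} {noun if n == 1 else pluralise(noun)}"


if __name__ == "__main__":
    print(count_phrase(1, "simulation"))
    print(count_phrase(3, "crash"))
```

Even something this small removes the “1 simulations” style of error that plain string templates tend to produce, which is often all a simple data-to-text script needs.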
Of course what a lot of people want is a complete NLG system/framework which they can configure to their needs, without doing much programming. The closest I have seen to this in the open-source world is NaturalOWL, which can be used to generate descriptions of OWL classes and individuals.
There is a lot of interest in the research community in statistical NLG, but I am not aware of any open-source statistical NLG toolkit which has been widely used outside the group which created it. If I am wrong about this, let me know and I will update this blog! The one partial exception is openccg, which supports statistical language models.
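For readers wondering what a statistical language model contributes to realisation, the usual idea is to score several candidate sentences and keep the most fluent one. The sketch below illustrates that idea with a toy bigram model and add-one smoothing. It is not openccg’s API or its actual model; all function names and the example corpus are invented for illustration.

```python
import math
from collections import Counter


def train_bigram_counts(corpus):
    """Count unigrams (as bigram contexts) and bigrams over a list of sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens[:-1])           # every token that starts a bigram
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams, vocab_size):
    """Log-probability of a sentence under an add-one-smoothed bigram model."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def best_realisation(candidates, corpus):
    """Pick the candidate sentence the toy language model scores highest."""
    unigrams, bigrams = train_bigram_counts(corpus)
    vocab = {tok for s in corpus for tok in s.lower().split()} | {"</s>"}
    return max(candidates, key=lambda c: log_prob(c, unigrams, bigrams, len(vocab)))


if __name__ == "__main__":
    # Hypothetical training text and candidate word orders.
    corpus = ["the run finished early", "the run failed", "the model converged"]
    candidates = ["early finished run the", "the run finished early"]
    print(best_realisation(candidates, corpus))
```

A real system would train on far more text and use a stronger model, but the division of labour is the same: the grammar proposes candidates and the language model ranks them.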
We are starting to see good commercial tools for building NLG systems, such as Arria’s Articulator Pro, but open-source NLG tools are limited.