One of the organisers of the INLG2017 conference recently asked me if I would be willing to participate in a possible panel or discussion session on open-source NLG software (presumably because of my experience with simplenlg). He also asked me to suggest other possible panelists who had developed successful open-source NLG software (which I took to mean software which was used beyond the developer’s own research group). I thought about this, and it was disheartening how few such people and projects came to mind, especially if I discounted systems which seemed to be “dead” in the sense of no longer being maintained, such as FUF/SURGE and TG/2. It was also striking that most of the open-source systems I could think of were realisers. For example, despite decades of research on referring expression generation in NLG, I couldn’t think of a single open-source referring-expression generator which met the above criteria. Lots of code has been written to generate referring expressions, but none of it (to my knowledge) has become successful open-source software. Why not?
Too much effort?
Creating open-source software is a lot of work. As I learnt with simplenlg, turning something we used internally at Aberdeen into a tool which others could use required a lot of extra effort to make the tool more flexible and robust, and of course better-documented. It also required a fair amount of continuing work to support users who had problems or questions, and who wanted bugs to be fixed. So why should a developer do this for something which he or she gives away for free? Good question, but this hasn’t stopped people from writing effective open-source software in other areas.
More commercial NLG developers needed?
Over most of its history NLG has been dominated by academics, with very little commercial work. I suspect commercial developers are more likely to develop good open-source software, because they have a better background in software engineering, and also because of CV/career benefits. If a developer is applying for a job, it’s really helpful to list open-source software he or she has worked on; I’ve never seen this mentioned in an academic job interview. Anyway, there is a lot more commercial interest in NLG these days, so let’s see if it leads to more open-source NLG software. A related point is that some companies effectively pay their staff to work on open-source software which the company uses; again, hopefully commercial interest in NLG will encourage this kind of thing.
More awareness of open-source needed?
There are in fact a number of NLG-related projects on GitHub and other open-source platforms (including for referring expression generation), but few of them are widely used. Of course, this is probably true of most GitHub projects! But I wonder if more of these projects would be used if people were more aware of them. The NLG community is pretty bad at letting its members know about software resources; the closest thing I’ve seen to a community resource listing is the ACL NLG Resources Portal, which is years out of date. Lack of awareness is something the community can do something about, by publicising resources better (maybe I should do this in a future blog entry?).
The optimistic view is that we’ll see more open-source NLG software in the future, as the field grows and becomes more commercial, especially if the community supports this by doing a better job of publicising open-source. Hopefully this will happen!
Appendix: Some open-source NLG systems which I like
- KPML (http://www.fb10.uni-bremen.de/anglistik/langpro/kpml/README.html)
- OpenCCG (https://github.com/OpenCCG/openccg)
- Simplenlg (https://github.com/simplenlg/simplenlg)
- NaturalOwl (http://nlp.cs.aueb.gr/software.html)