How do I Learn about NLG?

People occasionally ask me how best to learn about NLG, so I thought I’d write some general advice about this.

Basic NLG concepts

I like to think that the Wikipedia pages on NLG are a good starting point.  I wrote most of the top-level Wikipedia NLG page, and almost all of the second-level pages (realisation, lexical choice, etc.).   Over the years I have generally gotten quite good feedback from people who read these pages.

Many people still read my book Building Natural Language Generation Systems, despite the fact that it is dated (published in 2000, so almost 20 years old).   I would not recommend reading it in detail since so much has changed, but some people have told me that they found it useful as an introduction to high-level NLG concepts.

Things to read

There are a number of recent surveys of academic research in NLG, some of which are pretty bad.  By far the best such survey is the one by Gatt and Krahmer.  It is accurate and comprehensive (unlike some other recent surveys of NLG I have seen), and written by very well-established and respected NLG researchers.   However, it is 74 pages long (excluding references) and not always easy to read, especially for newcomers to the field.

I wish I could point people to an up-to-date summary and introduction to NLG which was longer and more comprehensive than the Wikipedia NLG pages, but shorter and more accessible than the Gatt and Krahmer survey.  Unfortunately, there is nothing along these lines which I can wholeheartedly recommend.   If anyone has a suggestion for such a resource, please let me know.

There are a growing number of business-oriented articles about NLG (some of which even quote or interview me), but I am not aware of any such article which gives a comprehensive and unbiased survey of commercial NLG.  However, there are some reasonable surveys of NLG in specific application areas such as automated journalism.

Things to play with

Of course, developers and researchers like to play with code and systems, as well as read about algorithms and ideas!  Unfortunately there is not a lot of open-source NLG software.  Probably the most widely used open-source NLG software is the simplenlg realiser.  This comes with a tutorial, and seems reasonably accessible to newcomers to the field.  However, it only covers surface realisation.

Many NLG vendors provide free access to some of their tools, for a limited trial period and/or usage amount.  For example, Arria provides free access for a one-month trial period to its Studio tool for NLG developers, and also to its NLG realisation microservices.

The latest thinking

Probably the best source for the latest research on NLG is the proceedings of the International NLG conferences, which are available in the ACL Anthology.  There are also some NLG papers published in the main ACL conferences (ACL, EACL, NAACL), but they tend to focus on machine learning, and some of these papers are really bad.

So, how should I learn about NLG?

If you’re an academic researcher, I suggest that you

  1. read through the NLG Wikipedia pages
  2. download simplenlg and work through its tutorial, to get some hands-on experience
  3. read the Gatt and Krahmer review
  4. have a look at the topics covered in recent INLG conferences
  5. if possible, discuss your interests with someone who is knowledgeable about NLG.  Alternatively, attend an INLG conference and discuss your interests there

If you are interested in NLG from a commercial perspective, I suggest that you

  1. read through the NLG Wikipedia pages
  2. read a survey of NLG in your field, if you can find one
  3. try out tools from Arria or other NLG vendors

