Unfortunately, I see many students (and indeed other people) make basic mistakes when evaluating machine learning systems, both classifiers and NLG systems.
15 years ago, I said a grand challenge for CS/AI/NLG was to help the general public effectively understand and use data. Progress on this has been less than I hoped, but it remains a worthwhile and important challenge!
Farewell to Richard Kittredge, who died in early April 2019. Richard was a pioneer in applied NLG, and also an inspiration to me personally.
An important difference between approaches to building NLG systems is the skill set each requires: machine learning demands the most expertise, smart templating the least, and simplenlg-style programmatic approaches fall in between.
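To make the contrast concrete, here is a small illustrative sketch (in Python, and deliberately *not* the real SimpleNLG API, which is a Java library): a smart template fills slots with light conditional logic, while a programmatic approach builds sentences from grammatical parts, letting code handle things like tense. All function names here are hypothetical.

```python
def smart_template(city: str, temp_c: int) -> str:
    """Smart templating: fill slots in a fixed string, with light conditional logic."""
    trend = "warm" if temp_c >= 20 else "cool"
    return f"It will be {trend} in {city}, around {temp_c}C."

def programmatic(subject: str, verb: str, complement: str, future: bool = True) -> str:
    """Programmatic (simplenlg-style) sketch: assemble a clause from parts.
    A real realiser also handles agreement, morphology, negation, etc."""
    verb_phrase = f"will {verb}" if future else verb
    sentence = f"{subject} {verb_phrase} {complement}."
    return sentence[0].upper() + sentence[1:]

print(smart_template("Aberdeen", 12))            # It will be cool in Aberdeen, around 12C.
print(programmatic("it", "be", "cool in Aberdeen"))  # It will be cool in Aberdeen.
```

The template is easier to write but harder to vary; the programmatic version needs more programming skill but gives finer linguistic control, which is roughly the trade-off described above.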
Perhaps the most common reason for bad NLG output texts is low-quality input data. In other words, Garbage In, Garbage Out holds regardless of the technology we use.
Someone recently asked me for details of an experiment I did 12 years ago, and it was not easy to retrieve this information because I had not properly archived it. Lesson: properly archive detailed information about experimental design, materials, results, etc.
Someone recently asked me whether it was possible to easily determine if an NLP system is good enough for a specific use case. Currently this is very hard; making it easy could be a "grand challenge" for evaluation!