I’m just back from INLG 2019 in Tokyo, where I was very happy to see an increased emphasis on evaluation (and other methodological issues), including several papers on improving human evaluations.
Someone recently asked me for details of an experiment I did 12 years ago, and it was not easy to retrieve this information because I had not properly archived it. Lesson: properly archive detailed information about experimental design, materials, results, etc.
An approach I often take to NLG, and indeed to AI, is to try to understand the underlying linguistic, NLG, and AI issues, and then look for simple solutions to them.
In response to a previous blog post, many people expressed concerns to me about the quality of many papers they saw on ML in NLP. I summarise some of these concerns, which are worrying.
I was recently asked if machine learning requires evaluation metrics. The answer is no, and the fact that people are asking such questions suggests that some newcomers to the field may have a limited perspective on NLP research methodology.
Good software engineering is critical when building NLG systems, including requirements analysis, design, testing, and support.