I really liked Grishman’s recent paper on 25 years of research in information extraction. Here I summarise a few of its key insights: the relative progress in different areas of NLP, the reluctance of researchers to use complex evaluation techniques, and corpus creation vs rule-writing.
When we try to use ML in commercial NLG contexts, one of the challenges is that NLG developers want to be able to customise, configure, and control their systems. So we need ML approaches which do not stop devs from configuring things they are likely to want to change.
In response to a previous blog post, many people told me they were concerned about the quality of many papers they saw on ML in NLP. I summarise some of these concerns, which I find worrying.
People who use corpora to build NLG systems need to understand what is in the corpora. The widely used Weathergov corpus, for example, probably contains computer-generated texts rather than human-written texts. So learning from it is essentially reverse-engineering a rule-based NLG system.