In response to a previous blog post, many people told me they were concerned about the quality of many of the papers they see on ML in NLP. Below I summarise some of these concerns, which I find worrying.
People who use corpora to build NLG systems need to understand what is in those corpora. The widely used Weathergov corpus, for example, probably contains computer-generated texts rather than human-written ones. If so, learning from it essentially amounts to reverse-engineering a rule-based NLG system.
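As a rough illustration of what "understanding what is in the corpus" might involve, here is a minimal Python sketch of one possible sanity check: mask the numbers in each text and see how much of the corpus is covered by a handful of identical templates. The heuristic, the function names `template_of` and `template_coverage`, and the toy sentences are all assumptions made for illustration; they are not taken from Weathergov or from any published analysis of it. A high score only suggests that texts may be templated; it does not prove they were computer-generated.

```python
import re
from collections import Counter


def template_of(text):
    """Lowercase the text and mask digit runs, so that texts differing
    only in their numbers map to the same template string."""
    return re.sub(r"\d+", "<NUM>", text.lower()).strip()


def template_coverage(texts, top_k=10):
    """Return the fraction of texts accounted for by the top_k most
    frequent templates (0.0 for an empty corpus)."""
    if not texts:
        return 0.0
    counts = Counter(template_of(t) for t in texts)
    covered = sum(n for _, n in counts.most_common(top_k))
    return covered / len(texts)


# Hypothetical toy data, invented for this example (not real corpus text).
toy_texts = [
    "Sunny, with a high near 75.",
    "Sunny, with a high near 68.",
    "Sunny, with a high near 81.",
    "Rain likely, mainly after 2pm.",
]

# Three of the four toy texts share one template once numbers are masked.
print(template_coverage(toy_texts, top_k=1))  # 0.75
```

On a real corpus, one would probably want to mask more than numbers (dates, place names, units) and to read the most frequent templates by eye, since a single number like this is only a starting point.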