A colleague asked me if it was true that building neural NLG systems is faster than building rule-based NLG systems. The answer is that we don't know, because we don't have good data on this question. However, the weak evidence we do have suggests that building rule-based NLG is no slower, and may be faster, than building neural NLG, at least for data-to-text systems.
I’d love to see more people using machine learning to provide insights into NLG problems and related linguistic issues. I personally think this is much more useful than tweaking models to show a 1% improvement over the state of the art in a very artificial context.
NLP in 2020 is dominated by papers which report small improvements over the state of the art. I suspect that a lot of these improvements are due to overfitting on test data, not to genuine scientific advances.
There is a military saying that “amateurs discuss tactics, professionals discuss logistics”. Similarly, I think AI professionals should focus on data more than models. I suggest four simple initial questions to ask about your data if you want to build an ML system.
I really liked Grishman’s recent paper on 25 years of research in information extraction, and I summarise a few of its key insights here: relative progress in different areas of NLP, researchers’ reluctance to use complex evaluation techniques, and corpus creation versus rule-writing.
When we try to use ML in commercial NLG contexts, one of the challenges is that NLG developers want to be able to customise, configure, and control their systems. So we need ML approaches that do not stop developers from configuring the things they are likely to want to change.
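As a purely illustrative sketch (mine, not from the post): one pattern that preserves developer control is to let hand-written override rules take precedence over a learned model’s choices. All names here are hypothetical.

```python
# Hypothetical pattern: a learned model proposes output, but
# developer-authored rules always win, so the system stays configurable.

def choose_phrase(record, model, override_rules):
    """Pick a phrasing for `record`, letting hand-written rules take precedence."""
    for condition, phrase in override_rules:   # (condition, phrase) pairs
        if condition(record):
            return phrase                      # developer-controlled output
    return model.predict(record)               # fall back to the learned model

class StubModel:
    """Stand-in for a trained model (illustrative only)."""
    def predict(self, record):
        return "Temperatures will be around average."

# A developer forces specific wording for freezing temperatures.
override_rules = [
    (lambda r: r["temperature"] <= 0, "Temperatures will drop below freezing."),
]

print(choose_phrase({"temperature": -2}, StubModel(), override_rules))
# -> "Temperatures will drop below freezing." (the rule wins over the model)
```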
Unfortunately, I see many students (and indeed other people) make basic mistakes when evaluating machine learning systems, for classifiers as well as for NLG.
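To make one such mistake concrete (my example, not from the post): a classic error is to fit preprocessing on data that includes the test set, which leaks test-set statistics into training and inflates the measured score. A minimal sketch with scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.rand(200, 5)                    # toy features
y = np.random.randint(0, 2, 200)              # toy binary labels

# WRONG: the scaler sees the test data, so the test score is optimistic.
X_leaky = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X_leaky, y, random_state=0)

# RIGHT: split first, then fit the scaler on the training data only.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_tr)
clf = LogisticRegression().fit(scaler.transform(X_tr), y_tr)
print("held-out accuracy:", clf.score(scaler.transform(X_te), y_te))
```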
An important difference between approaches to building NLG systems is the skills needed to use them. Machine learning requires the most skill, smart templating the least, and simplenlg-type programmatic approaches sit in the middle.
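To illustrate the skill gap (my sketch, written in Python for comparability; SimpleNLG itself is a Java library): a smart template needs only string-level skills, while a programmatic approach requires the developer to think about grammatical structure, with the realiser handling details such as agreement.

```python
# Smart templating: fill slots in a string, plus simple conditional logic.
def template_realise(name, count):
    unit = "message" if count == 1 else "messages"
    return f"{name} has {count} new {unit}."

# simplenlg-style programmatic realisation (toy stand-in for SimpleNLG's
# createClause API): build a grammatical structure and let the realiser
# handle agreement and inflection. Morphology here is deliberately crude.
def clause_realise(subject, verb, obj, plural_object=False):
    inflected_verb = verb + "s"                        # 3rd-person singular
    inflected_obj = obj + ("s" if plural_object else "")
    return f"{subject} {inflected_verb} {inflected_obj}.".capitalize()

print(template_realise("Alice", 3))   # Alice has 3 new messages.
print(clause_realise("alice", "send", "new message", plural_object=True))
# -> Alice sends new messages.
```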
In both NLG and MT contexts, deep learning approaches can produce texts which are fluent and readable but also incorrect and misleading. This is problematic when accuracy matters more than readability, as it does in most NLG contexts.
Many neural NLG systems “hallucinate” non-existent or incorrect content. This is a major problem, since such hallucination is unacceptable in many (most?) NLG use cases. Also, BLEU and related metrics do not detect hallucination well, so researchers who rely on such metrics may be misled about the quality of their systems.
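A toy illustration of the metric problem (my example; exact scores depend on the BLEU implementation): swapping a single word can reverse the meaning of a text while leaving most n-grams, and hence the BLEU score, largely intact.

```python
# BLEU rewards n-gram overlap, so one hallucinated word barely moves the
# score even though it reverses the meaning of the output.
from nltk.translate.bleu_score import sentence_bleu

reference    = "the temperature fell to five degrees overnight".split()
accurate     = "the temperature fell to five degrees overnight".split()
hallucinated = "the temperature rose to five degrees overnight".split()  # wrong fact

print(sentence_bleu([reference], accurate))      # 1.0
print(sentence_bleu([reference], hallucinated))  # ~0.8, despite the reversed meaning
```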