I’d love to see more people using machine learning to provide insights into NLG problems and related linguistic issues. I personally think this is much more useful than tweaking models to show a 1% improvement over the state of the art in a very artificial context.
NLP technology has changed and advanced over the past two decades, but it often seems that NLG evaluation has not. Why is the 18-year-old BLEU metric still so dominant?
We’re thinking of organising a shared task on evaluating the accuracy of texts produced by NLG systems. Comments are welcome; please also let me know if you might participate.
NLP in 2020 is dominated by papers which report small improvements to the state of the art. I suspect that many of these improvements are due to overfitting to test data rather than to genuine scientific advances.
If we want to deploy AI in the real world, we need to think about “change management” issues. For example, if users think that AI threatens their jobs or adds extra hassle, then uptake will be slow. This has been a problem for AI and statistical algorithms since the 1950s.
There is a military saying that “amateurs discuss tactics, professionals discuss logistics”. Similarly, I think AI professionals should focus on data more than on models. I suggest four simple initial questions to ask about your data if you want to build an ML system.
I really liked Grishman’s recent paper on 25 years of research in information extraction, and summarise a few of its key insights here: relative progress in different areas of NLP, the reluctance of researchers to use complex evaluation techniques, and corpus creation versus rule-writing.