Do people “cheat” by overfitting test data?
NLP in 2020 is dominated by papers that report small improvements over the state of the art. I suspect that many of these improvements are due to overfitting on test data, not to genuine scientific advances.
If we want to deploy AI in the real world, we need to think about “change management” issues. For example, if users think that AI threatens their jobs or adds extra hassle, then uptake will be slow. This has been a problem for AI and statistical algorithms since the 1950s.
There is a military saying that “amateurs discuss tactics, professionals discuss logistics”. Similarly I think AI professionals should focus on data more than models. I suggest four simple initial questions to ask about your data if you want to build an ML system.
I really liked Grishman’s recent paper on 25 years of research in information extraction, and summarise a few of its key insights here: relative progress in different areas of NLP, researchers’ reluctance to use complex evaluation techniques, and corpus creation vs rule-writing.
The BBC used Arria NLG to generate stories about the recent UK election. In this application, texts communicated a meaning, there was no corpus, accuracy was paramount, and domain experts wanted to control the system. Most applied NLG systems I have worked on have had similar constraints.
I’ve spent much of the past few weeks marking, but nonetheless was unable to give my students detailed feedback and critiques. My apologies!
I’m looking for a PhD student to work on explaining Bayesian Reasoning, as part of the NL4XAI project. Should be a great project!
There are lots of opportunities in Aberdeen for people interested in NLG, including faculty positions and PhD studentships at the university, and commercial software development jobs at Arria. Come join me and my colleagues!
I’m just back from INLG 2019 in Tokyo, where I was very happy to see an increased emphasis on evaluation (and other methodological issues), including several papers on improving human evaluations.
It can be very exciting to apply powerful analytics and ML techniques to analyse data sets, but we need to be careful, otherwise we will make mistakes.