Reviewing for big NLP conferences has changed drastically since 1990, when 11 senior researchers reviewed all ACL submissions. Perhaps our expectations about conference papers also need to change, and become more similar to expectations in other scientific fields.
Many people have asked me if OpenAI’s GPT-3 will have a big impact on NLG. I suspect its overall impact will be limited (outside of a few niches), but of course time will tell.
I was very impressed by a paper we recently read in our reading group, which showed that small differences in BLEU scores for MT usually don't mean anything. Since lots of academic papers justify a new model on the basis of such small differences, this is a real problem for NLP.
NLG texts need to communicate good content as well as be accurate. Rule-based NLG systems are very good at accuracy, but sometimes struggle to reliably choose appropriate content in a wide variety of circumstances.
Most reviewing is a chore, but reviewing for TACL is fun. I learn things and feel I “add value”, which is much rarer in conference reviewing. Plus I can focus on one paper at a time, since TACL reviewing is spread out across the year.
If an NLG system produces inferior texts once in a while, should we ask a human writer to “post-edit” NLG texts? I review some of the literature and give some advice.
The Tibco Covid dashboard is a nice example of how NLG narratives can “add value” to complex visualisations. Hopefully we’ll see more dashboards like this!
A colleague asked me if it was true that building neural NLG systems was faster than building rule-based NLG systems. The answer is that we don't know, because we don't have good data on this question. However, the weak evidence we do have suggests that building rule-based NLG is no slower and may be faster than building neural NLG, at least for data-to-text systems.
Accuracy errors in NLG texts go far beyond simple factual mistakes; for example, they also include misleading use of words and incorrect context/discourse inferences. All of these types of errors are unacceptable in most data-to-text NLG use cases.
The Covid crisis gives us a chance to rethink and change our conference culture. I would like to see fewer large international conferences, and I would like those that remain to focus on discussion and interaction rather than on oral presentations.