I am concerned that some people seem to ignore quality issues in training data.
What are the ethical issues when academics do A/B testing?
The first phase of my systematic review of BLEU shows that BLEU-human correlations vary widely, and that none of the studies in my review correlated BLEU with real-world utility or user satisfaction.
Some observations on how people react to NLG systems (which is a very different issue from scientific evaluation).
What happens if we think of evaluations as a way of helping users choose the best NLP tech for their needs?
A summary of students who have gotten PhDs under my supervision.
A visitor recently asked me which NLG research topics I found most interesting and exciting. Great question; I've written here an expanded version of what I told him.