My father died recently, and I spoke at his funeral about a trip we took together to Baja California (Mexico) when I was 11 years old.
I am concerned that some people seem to ignore quality issues in training data.
What are the ethical issues when academics do A/B testing?
The first phase of my systematic review of BLEU shows that BLEU-human correlations are all over the place, and that none of the studies in my review have correlated BLEU with real-world utility or user satisfaction.
Some observations on how people react to NLG systems (which is a very different issue than scientific evaluation).
What happens if we think of evaluations as a way of helping users choose the best NLP tech for their needs?
A summary of students who have gotten PhDs under my supervision.