Testing Multiple Hypotheses
The NLP/AI community needs to do a better job of dealing with multiple hypotheses, otherwise a lot of our results will be garbage.
I think we should use rules to make simple high-value decisions, and learning to make complex low-value decisions, within an architecture where ML decision makers are embedded in a rules-based framework.
Information graphics work well for domain experts, but they are not nearly as useful for junior professionals. And the “man in the street” may struggle to understand anything more complex than a simple bar chart.
My father died recently, and I spoke at his funeral about a trip we took together to Baja California (Mexico) when I was 11 years old.
I am concerned that some people seem to ignore quality issues in training data.
What are the ethical issues when academics do A/B testing?
The first phase of my systematic review of BLEU shows that BLEU-human correlations are all over the place, and that none of the studies in my review have correlated BLEU with real-world utility or user satisfaction.
Some observations on how people react to NLG systems (which is a very different issue from scientific evaluation).
What happens if we think of evaluations as a way of helping users choose the best NLP tech for their needs?
A summary of the students who have completed PhDs under my supervision.