Some thoughts on language grounding, especially choosing words to express data, and how this depends on context.
Unfortunately I suspect many researchers make their results look better by using poor baselines. I give some thoughts on this, based on a recent discussion with a PhD student.
Some thoughts about when I feel comfortable being a coauthor on a paper, expressed as a letter to someone who put me on a paper as a coauthor without asking me first.
Some musings on principled and theoretically sound techniques for automatically evaluating NLG systems.
My advice on how to perform a high-quality validation study, which assesses whether a metric (such as BLEU) correlates well with human evaluations.
BLEU works much better for MT systems than for NLG systems. In this blog I present some speculations as to why this is the case.
My structured survey of BLEU suggests that BLEU-human correlations are worse in German than in many other languages. But there are many caveats, so we need to be cautious in interpreting this result.