The correlation between BLEU and human evaluations of MT systems seems to be increasing over time. Since BLEU has not changed, how is this possible, and what does it mean?
In response to a previous blog post, many people told me they were concerned about the quality of many of the papers they saw on ML in NLP. I summarise some of these concerns, which are worrying.
I was recently asked if machine learning requires evaluation metrics. The answer is no, and the fact that people are asking such questions suggests that some newcomers to the field may have a limited perspective on NLP research methodology.
Some comments on how different components in the NLG pipeline can “add value” by contributing to the ultimate goal of generating texts that are easy for people to read and understand.
I think surface realisation becomes especially challenging when syntax depends on semantics or pragmatics. From an engineering perspective, handling phenomena that only occur in a few languages can be painful.
Many students get stressed about their PhD viva (oral exam) even though they are very unlikely to fail. I present some rules and a flowchart to suggest when there is real cause for concern, and when there is not.
A few comments on how I review papers (what I actually do, not what I am supposed to do), and associated advice for authors.