BLEU in Different Languages: Don't use it for German

Jun 20, 2018 | Aug 7, 2018 | ehudreiter | 1 Comment

My structured survey of BLEU suggests that BLEU-human correlations are worse in German than in many other languages. But there are many caveats, so we need to be cautious in interpreting this result.

BLEU-Human Correlation is Increasing: What does this Mean?

Jun 14, 2018 | Aug 7, 2018 | ehudreiter | 6 Comments

The correlation between BLEU and human evaluations of MT systems seems to be increasing over time. Since BLEU has not changed, how is this possible, and what does it mean?

Is BLEU valid? First observations and concerns

Aug 8, 2017 | ehudreiter | 2 Comments

The first phase of my systematic review of BLEU shows that BLEU-human correlations are all over the place, and that none of the studies in my review have correlated BLEU with real-world utility or user satisfaction.

Study Design for Systematic Review of BLEU Validity: Comments Welcome!

Jun 13, 2017 | Jun 13, 2018 | ehudreiter | 5 Comments

I’m planning to do a systematic review of the validity of BLEU, and am very keen to get comments and suggestions on study design from others!

Ehud Reiter's Blog

Ehud's thoughts about Natural Language Generation. Also see my book on NLG.

Tag: systematic review
