Ehud Reiter's Blog

Ehud's thoughts and observations about Natural Language Generation

Tag: evaluation

Evaluation Grand Challenge: Is NLP System Good Enough for a Use Case?

Feb 21, 2019 | ehudreiter

Someone recently asked me whether it is possible to easily determine if an NLP system is good enough for a specific use case. Currently this is very hard. Making it easy could be a “grand challenge” for evaluation!

Does Deep Learning Prefer Readability over Accuracy?

Jan 8, 2019 | Feb 1, 2019 | ehudreiter | 1 Comment

In both NLG and MT contexts, deep learning approaches can produce texts which are fluent and readable but also incorrect and misleading. This is problematic if accuracy is more important than readability, as is the case in most NLG contexts.

Hallucination in Neural NLG

Nov 12, 2018 | ehudreiter | 13 Comments

Many neural NLG systems “hallucinate” non-existent or incorrect content. This is a major problem, since such hallucination is unacceptable in many (most?) NLG use cases. Also, BLEU and related metrics do not detect hallucination well, so researchers who rely on such metrics may be misled about the quality of their systems.

Use Proper Baselines!

Aug 30, 2018 | ehudreiter | 1 Comment

Unfortunately I suspect many researchers make their results look better by using poor baselines. I give some thoughts on this, based on a recent discussion with a PhD student.

How Would I Automatically Evaluate NLG Systems?

Jul 25, 2018 | Aug 7, 2018 | ehudreiter

Some musings on principled and theoretically sound techniques for automatically evaluating NLG systems.

How to Validate Metrics

Jul 10, 2018 | Aug 7, 2018 | ehudreiter | 3 Comments

My advice on how to perform a high-quality validation study, which assesses whether a metric (such as BLEU) correlates well with human evaluations.
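
As a rough illustration of what "correlates well with human evaluations" means in practice, here is a minimal sketch that computes Pearson and Spearman correlations between metric scores and human ratings. It is not taken from the post itself; the function names and the example numbers are hypothetical, and a real validation study involves much more than this calculation (for example, careful choice of systems, texts and human evaluation protocol).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


if __name__ == "__main__":
    # Hypothetical system-level scores, invented purely for illustration.
    metric_scores = [0.21, 0.25, 0.30, 0.32, 0.40]
    human_ratings = [3.1, 2.9, 3.8, 3.6, 4.2]
    print("Pearson: ", round(pearson(metric_scores, human_ratings), 3))
    print("Spearman:", round(spearman(metric_scores, human_ratings), 3))
```

Reporting both coefficients is a design choice in this sketch: Pearson measures linear agreement, while Spearman only looks at whether the metric ranks systems in the same order as the human evaluators.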

Why doesn't BLEU work for NLG?

Jul 2, 2018 | Aug 7, 2018 | ehudreiter | 5 Comments

BLEU works much better for MT systems than for NLG systems. In this blog I present some speculations as to why this is the case.

BLEU in Different Languages: Don't use it for German

Jun 20, 2018 | Aug 7, 2018 | ehudreiter | 1 Comment

My structured survey of BLEU suggests that BLEU-human correlations are worse in German than in many other languages. But there are many caveats, so we need to be cautious in interpreting this result.

BLEU-Human Correlation is Increasing: What does this Mean?

Jun 14, 2018 | Aug 7, 2018 | ehudreiter | 6 Comments

The correlation between BLEU and human evaluations of MT systems seems to be increasing over time. Since BLEU has not changed, how is this possible, and what does it mean?

Many Papers on Machine Learning in NLP are Scientifically Dubious

Jun 6, 2018 | ehudreiter | 1 Comment

In response to a previous blog, many people expressed concerns to me about the quality of many papers they saw on ML in NLP. I summarise some of these concerns, which are worrying.
