Ehud Reiter's Blog

Ehud's thoughts about Natural Language Generation. Also see my book on NLG.

Category: evaluation

Do LLM coding benchmarks measure real-world utility?

Jan 13, 2025 | Jan 22, 2025 | ehudreiter | 6 Comments

LLM benchmarks for coding are closer to real-world use than other LLM benchmarks, but they still do not measure real-world utility. I explain this by contrasting what is measured by SWE-bench with what is measured by a recent study of real-world utility in software development.

We need better LLM benchmarks

Jan 3, 2025 | Jan 31, 2025 | ehudreiter | 9 Comments

Current benchmarks and benchmark suites for evaluating LLMs are disappointing. I describe the properties that I think good benchmarks and benchmark suites should have, but often do not, such as being correct, challenging, diverse, and real-world.

Do LLM benchmarks ignore NLG?

Dec 26, 2024 | Dec 27, 2024 | ehudreiter | 2 Comments

I was very disappointed to realise that the evaluation suite for Amazon Nova (and I assume for other LLMs) has poor coverage of NLG tasks. This is surprising, since LLMs are largely used to generate texts; shouldn't they be evaluated, at least in part, on their ability to do this well?

MQM shows the power of a gold-standard evaluation

Dec 2, 2024 | ehudreiter | 2 Comments

I am very happy to see that the MT community is adopting the annotation-based MQM protocol as a gold-standard evaluation technique. Having such a gold standard both strengthens evaluation and supports exciting new research in evaluation.

Qualitative evaluation

Oct 7, 2024 | Oct 7, 2024 | ehudreiter | 1 Comment

In NLG we focus on quantitative evaluation, but qualitative techniques can also be used. Quantitative hypothesis testing is essential, but it's also really useful to ask people what they think of an NLG system in an open-ended way.

One-day class on NLG evaluation

Sep 9, 2024 | Sep 9, 2024 | ehudreiter | 3 Comments

In early September I ran a one-day class on evaluation. I summarise what I did in this class and give links to my presentations, in case this is useful to other people.

Challenges in Evaluating LLMs

Jul 10, 2024 | Jul 19, 2024 | ehudreiter | 2 Comments

I list five challenges in evaluating LLMs, which unfortunately seem to be ignored by many researchers. This means that many published LLM evaluations cannot be trusted. This blog is based on a recent workshop talk.

Can LLM-based eval replace human evaluation?

Jun 11, 2024 | ehudreiter | 3 Comments

I suspect we may be reaching the point where the most common type of human evaluation in NLG (ratings/rankings by crowdworkers or students) is less meaningful than evaluations using LLMs. But better forms of human evaluation, based on annotation or impact, are still very useful and give insights which we cannot get from LLMs.

Human eval: Subjects must understand the task

May 28, 2024 | May 28, 2024 | ehudreiter | 2 Comments

In human evaluation, it is absolutely essential that subjects understand what they are supposed to do; otherwise evaluations will not be meaningful or replicable. This may sound obvious, but it was repeatedly raised as a concern in the replication shared task in the 2024 Human Evaluation workshop.

Ten tips on doing a good evaluation

Apr 8, 2024 | ehudreiter | 2 Comments

I present some suggestions for doing good evaluations, based on previous blogs I have written.

News: Come to my retirement symposium on NLG evaluation! https://retroeval.github.io/
