academics

I hate pay-to-publish

Aug 19, 2025 · ehudreiter · 5 Comments

The academic world has changed in many ways since I got my PhD in 1990. One of the worst changes is that researchers in 2025 usually need to pay thousands of pounds to publish their work. This is unfair to researchers with limited funding, and not good for science.

evaluation

More on evaluating impact

Aug 5, 2025 · ehudreiter · 3 Comments

I recently published a paper and gave a talk about evaluating real-world impact. I got some great feedback on both, and here I summarise some of the suggested papers (including more examples of impact eval) and insightful comments (eg, about the eval “ecosystem”) that I received.

personal

Cycling in Netherlands

Jul 13, 2025 (updated Jul 14, 2025) · ehudreiter · 1 Comment

This is a personal blog, about a recent bike trip I did which was mostly in the Netherlands.

AI in Healthcare

Patients want to know what information an AI model considers

Jun 25, 2025 · ehudreiter · 2 Comments

My student Adarsa Sivaprasad is looking into what questions users of an AI prediction model actually have, and how these should be answered. Amongst other things, users seem to have more questions about what information a model considers than about how a model works.

academics

The Aberdeen NLP Research Group

Jun 5, 2025 · ehudreiter

We have a really nice NLP research group at the University of Aberdeen, with a dozen researchers who work on topics such as evaluation, interpretability, health applications, cognitive aspects, and cross-temporal research. We regularly publish and win awards in top venues. It's exciting!

Uncategorized

Key messages from my NLG book

May 14, 2025 · ehudreiter

It's been around 6 months since my new NLG book was released. I summarise what I now think are its key messages, for rule-based NLG, ML and neural NLG, requirements, evaluation, safety/testing/maintainability, and applications.

evaluation

Even good leaderboards may not be useful, because they are gamed

May 5, 2025 · ehudreiter · 3 Comments

Most LLM benchmarks and leaderboards are garbage. Unfortunately, it now seems that even the few “good” benchmarks (such as SWEBench and Chatbot Arena) are compromised because they are being gamed by the big LLM vendors, who tweak the benchmarks and rules so that their systems do better.

evaluation

Examples of evaluating real-world impact

Apr 8, 2025 (updated Aug 3, 2025) · ehudreiter · 4 Comments

I describe several papers which measure real-world impact of NLP systems, using different methodologies (A/B test, before/after eval, clinical trial, observational study). I hope these examples inspire and encourage more people to consider evaluating real-world impact!

evaluation

Benchmarks distract us from what matters

Mar 26, 2025 · ehudreiter · 7 Comments

I suspect that our fixation with LLM benchmarks may be driving us to optimise LLMs for capabilities that are easier to benchmark (such as math problems) even if they are not of much interest to users; and also to ignore capabilities (such as emotional appropriateness) which are important to real users but hard to assess with benchmarks.

other

People do not understand how LLMs can/cannot help them

Mar 13, 2025 · ehudreiter · 1 Comment

People will make much better use of LLMs if they understand what the technology can and cannot do. Unfortunately, many people have little understanding of this; I make a few suggestions which could perhaps help a bit.

Ehud Reiter's Blog

Ehud's thoughts about Natural Language Generation. Also see my book on NLG.

Author: ehudreiter
