One-day class on NLG evaluation
In early Sept I ran a one-day class on evaluation. I summarise what I did in this class and give links to my presentations, in case this is useful to other people.
Sometimes the latest technology is *not* appropriate for an NLG task. I saw this very strongly in the late 2010s with LSTMs (which do not work well for data-to-text), and continue to see this in 2024 (GPT4 is not always the best approach). Both researchers and developers need to be open-minded about alternative approaches.
I’m now part-way towards retirement, working fewer hours (for less money) while having more time for trips and other personal activities. A few people have asked about this, so I thought I’d explain in a blog.
AI has many promising applications in healthcare, but adoption of AI in healthcare is very slow. One message from a recent workshop I attended is that it would help if AI researchers had a better understanding of the requirements of the health sector, including evaluation, challenges, and business cases.
I list five challenges to evaluating LLMs, which unfortunately seem to be ignored by many researchers; this means that many published LLM evaluations cannot be trusted. This blog is based on a recent workshop talk.
This is a personal blog about a recent bike trip I did from Perth (Scotland) to Preston (England).
I suspect we may be reaching the point where the most common type of human evaluation in NLG (ratings/rankings by crowdworkers or students) is less meaningful than evaluations using LLMs. But better forms of human evaluation, based on annotation or impact, are still very useful and give insights which we cannot get from LLMs.
My student Barkavi Sundararajan has shown that LLMs do a better job at data-to-text if the input data is well structured. She will present a paper about this at NAACL.
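A minimal sketch of the kind of contrast involved, assuming a simple weather record (the fields, data, and JSON format below are my own illustration, not the representation used in the paper):

```python
import json

# Illustrative weather record (hypothetical data, not from the paper).
record = {
    "location": "Aberdeen",
    "date": "Tuesday",
    "temperature_c": 12,
    "conditions": "rain",
    "wind_speed_kmh": 30,
}

# Loosely structured input: the model must guess which number means what.
flat_prompt = (
    "Write a one-sentence weather report from this data:\n"
    "Aberdeen 12 rain 30 Tuesday"
)

# Well-structured input: explicit, labelled fields reduce ambiguity.
structured_prompt = (
    "Write a one-sentence weather report from this data:\n"
    + json.dumps(record, indent=2)
)

print(flat_prompt)
print(structured_prompt)
```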
In human evaluation, it is absolutely essential that subjects understand what they are supposed to do; otherwise evaluations will not be meaningful or replicable. This may sound obvious, but it was repeatedly raised as a concern in the replication shared task in the 2024 Human Evaluation workshop.
People working in AI in Medicine (and indeed AI more generally) should be aware of the long history of previous work in this area. Our technology is much better in 2024, but real-world success is still challenging, as has been the case for the past 70 years (the first claims that models could be better than doctors were made in 1954).