personal

Heading towards Retirement

Aug 9, 2024 · ehudreiter · 1 Comment

I’m now partway towards retirement: I’m working fewer hours (for less money) and have more time for trips and other personal activities. A few people have asked about this, so I thought I’d explain in a blog post.

other

Why is adoption of AI in healthcare so slow?

Jul 23, 2024 · ehudreiter · 4 Comments

AI has many promising applications in healthcare, but its adoption in healthcare has been very slow. One message from a recent workshop I attended is that it would help if AI researchers had a better understanding of the requirements of the health sector, including evaluation, challenges, and business cases.

evaluation

Challenges in Evaluating LLMs

Jul 10, 2024 (updated Jul 19, 2024) · ehudreiter · 2 Comments

I list five challenges in evaluating LLMs which unfortunately seem to be ignored by many researchers. This means that many published LLM evaluations cannot be trusted. This blog post is based on a recent workshop talk.

personal

Cycling from Perth to Preston

Jul 1, 2024 · ehudreiter · 1 Comment

This is a personal blog post about a recent bike trip I did from Perth (Scotland) to Preston (England).

evaluation

Can LLM-based eval replace human evaluation?

Jun 11, 2024 · ehudreiter · 3 Comments

I suspect we may be reaching the point where the most common type of human evaluation in NLG (ratings or rankings by crowdworkers or students) is less meaningful than evaluation using LLMs. However, better forms of human evaluation, based on annotation or impact, are still very useful and give insights which we cannot get from LLMs.

building NLG systems

Well structured input data helps LLMs

Jun 3, 2024 · ehudreiter · 1 Comment

My student Barkavi Sundararajan has shown that LLMs do a better job at data-to-text if the input data is well structured. She will present a paper about this at NAACL.

evaluation

Human eval: Subjects must understand the task

May 28, 2024 · ehudreiter · 2 Comments

In human evaluation, it is absolutely essential that subjects understand what they are supposed to do; otherwise evaluations will not be meaningful or replicable. This may sound obvious, but it was repeatedly raised as a concern in the replication shared task in the 2024 Human Evaluation workshop.

other

We can learn from the past in AI/Medicine

May 6, 2024 · ehudreiter · 5 Comments

People working in AI in Medicine (and indeed AI more generally) should be aware of the long history of previous work in this area. Our technology is much better in 2024, but real-world success is still challenging, as has been the case for the past 70 years (the first claims that models could be better than doctors were made in 1954).

other

Real-world usage of LLMs in Journalism

Apr 23, 2024 · ehudreiter · 1 Comment

I really liked a recent survey of gen AI in journalism, which looks at issues such as how journalists use and interact with LLMs, and what impact this has on journalists. It has some findings I did not expect; for example, the most common ethical concern is that news organisations will use LLMs without human supervision.

evaluation

Ten tips on doing a good evaluation

Apr 8, 2024 · ehudreiter · 2 Comments

I present some suggestions for doing good evaluations, based on previous blog posts I have written.

Ehud Reiter's Blog

Ehud's thoughts about Natural Language Generation. Also see my book on NLG.