Could some NLP research be fraudulent?
Is fraud (e.g. fabricating or falsifying data) a problem in NLP? It certainly is a problem in other scientific areas, and it wouldn't surprise me if it affected NLP as well.
Since commercial researchers dominate the “hot” area of large language models, I’ve seen a number of people ask what academic researchers should focus on. There are of course huge numbers of exciting and valuable scientific research questions which are not of much commercial interest, including long-term work which won’t pay off commercially for 10+ years, high-quality evaluation, socially useful but low-profit applications, and using NLP to research fundamental cognitive science questions.
A reader asked me how accurate chatGPT texts need to be. The answer is that this depends on context, including use case, workflow, and error type.
The CSL journal has just published a paper, “Evaluating factual accuracy in complex data-to-text”, which summarises our work in this area. I strongly recommend it to anyone interested in evaluating the accuracy of texts produced by neural NLG systems.
Last week I played around with using chatGPT for data-to-text, and to be honest overall I was disappointed. A few people have asked me about this, so I’ve written up some of my notes here.
An example from MedPaLM highlighted to me that generated texts can contain information which is factually accurate but still not appropriate, because (in this case) of its negative psychological impact. There are other such cases, and we should ensure that our evaluation criteria are sensitive to them.
I get asked a lot about chatGPT, so I thought I’d write a blog explaining my views, which focus on its impact on data-to-text NLG. Basically I think chatGPT is really exciting science which shows major progress on many of the challenges in neural NLG. However, commercial potential is unclear, and the media hype is annoying…
I thought I’d end 2022 with a summary of the papers written by my students and me in 2022. All of them are about requirements, resources, and/or evaluation of NLG.
I was very impressed by a recent paper that compared prompting-based MT to MT based on trained models. The results are very interesting: prompting-based MT generates fluent texts which nonetheless have accuracy problems. The paper itself is also an excellent example of a high-quality NLP evaluation, and I recommend it to anyone who wants to do good NLP evaluations.
I don’t like academic leaderboards. Poor scientific techniques, poor data, and poor evaluation mean that leaderboard results may not be worth much. I also suspect that the community’s fixation on leaderboards means less research on important topics that do not fit the leaderboard model, such as understanding user requirements.