academics

ACL vs TACL Reviewing

Jun 6, 2023 ehudreiter 2 Comments

This year I was both a TACL Action Editor and an ACL Senior Area Chair. This experience has reinforced my belief that the journal review process is better!

evaluation

Future of NLG evaluation: LLMs and high quality human eval?

May 22, 2023 ehudreiter 3 Comments

We may see a big change in NLG evaluation over the next few years, with LLM-based evaluation replacing metrics such as BLEU and BLEURT, and a renewed emphasis on high-quality human evaluation to assess semantic and pragmatic correctness. This would be a step forward if it happens!

academics

Limits of pre-publication reviewing

May 9, 2023 ehudreiter 4 Comments

Many problems in NLP papers can *not* be detected by reviewers who are checking submissions to conferences and journals. In medicine and many other fields of science, people can raise concerns about papers *after* they are published, and authors are expected to take these concerns seriously. This is not the practice in NLP, which is a shame.

academics

Unresponsive Authors and Experimental Flaws

May 3, 2023 ehudreiter 8 Comments

In our ReproHum project, we have found that many NLP experiments are flawed, and many authors do not respond to requests for more information about their work. This is depressing and hinders scientific progress in NLP.

Uncategorized

chatGPT in Health: Exciting if we ignore the hype

Apr 9, 2023 / Apr 11, 2023 ehudreiter

I think there is a lot of potential in using chatGPT in healthcare, provided that we focus on real use cases instead of trying to debate whether chatGPT is somehow better than a doctor.

evaluation

Evaluating chatGPT

Apr 4, 2023 / Apr 27, 2023 ehudreiter 11 Comments

I love getting questions about how to evaluate chatGPT; they are much more constructive than speculations about whether it is a threat to humanity. We need to understand what LLM technology can and cannot do, and rigorous experiments are the best way to do this. I give some advice and caveats about evaluating chatGPT in this blog, and am happy to answer questions from people who want to do high-quality evaluations.

academics

Does chatGPT make leaderboards less meaningful?

Mar 27, 2023 ehudreiter 1 Comment

I don't like leaderboards, which encourage academics to write papers about small improvements on established tasks and datasets. I suspect (and hope) that chatGPT and similar systems will encourage people to move away from leaderboards. If so, this would be great!

academics

Could some NLP research be fraudulent?

Mar 7, 2023 ehudreiter

Is fraud (e.g., fabricating or falsifying data) a problem in NLP? It certainly is a problem in other scientific areas, and it wouldn't surprise me if it affected NLP as well.

academics

What Should Academic NLP Researchers Focus on?

Feb 28, 2023 ehudreiter

Since commercial researchers dominate the “hot” area of large language models, I’ve seen a number of people ask “what should academic researchers focus on?” There are of course huge numbers of exciting and valuable scientific research questions that are not of much commercial interest, including long-term work which won’t pay off commercially for 10+ years, high-quality evaluation, socially useful but low-profit applications, and using NLP to research fundamental cognitive science questions.

other

How accurate do chatGPT texts need to be?

Feb 14, 2023 ehudreiter

A reader asked me how accurate chatGPT texts need to be. The answer is that this depends on context, including use case, workflow, and error type.

Ehud Reiter's Blog

Ehud's thoughts about Natural Language Generation. Also see my book on NLG.

Author: ehudreiter
