academics

Limits of pre-publication reviewing

Many problems in NLP papers *cannot* be detected by reviewers who are checking submissions to conferences and journals. In medicine and many other fields of science, people can raise concerns about papers *after* they are published, and authors are expected to take these concerns seriously. This is not the practice in NLP, which is a shame.

evaluation

Evaluating chatGPT

I love getting questions about how to evaluate chatGPT; they are much more constructive than speculation about whether it is a threat to humanity. We need to understand what LLM technology can and cannot do, and rigorous experiments are the best way to do this. I give some advice and caveats about evaluating chatGPT in this blog, and I am happy to answer questions from people who want to do high-quality evaluations.