evaluation

Evaluating chatGPT

I love getting questions about how to evaluate chatGPT; they are much more constructive than speculation about whether it is a threat to humanity. We need to understand what LLM technology can and cannot do, and rigorous experiments are the best way to do this. I give some advice and caveats about evaluating chatGPT in this blog, and am happy to answer questions from people who want to do high-quality evaluations.

academics

What Should Academic NLP Researchers Focus on?

Since commercial researchers dominate the “hot” area of large language models, I’ve seen a number of people ask “what should academic researchers focus on?” There are of course huge numbers of exciting and valuable scientific research questions which are not of much commercial interest, including long-term work which won’t pay off commercially for 10+ years, high-quality evaluation, socially useful but low-profit applications, and using NLP to research fundamental cognitive science questions.