chatGPT in Health: Exciting if we ignore the hype
I think there is a lot of potential in using chatGPT in healthcare, provided that we focus on real use cases instead of trying to debate whether chatGPT is somehow better than a doctor.
I love getting questions about how to evaluate chatGPT; they are much more constructive than speculation about whether it is a threat to humanity. We need to understand what LLM technology can and cannot do, and rigorous experiments are the best way to do this. I give some advice and caveats about evaluating chatGPT in this blog, and am happy to answer questions from people who want to do high-quality evaluations.
I don't like leaderboards, which encourage academics to write papers about small improvements on established tasks and datasets. I suspect (and hope) that chatGPT and similar systems will encourage people to move away from leaderboards. If so, this would be great!
Is fraud (e.g. fabricating or falsifying data) a problem in NLP? It certainly is a problem in other scientific areas, and it wouldn't surprise me if it affected NLP as well.
Since commercial researchers dominate the “hot” area of large language models, I’ve seen a number of people ask “what should academic researchers focus on?” There are of course huge numbers of exciting and valuable scientific research questions which are not of much commercial interest, including long-term work which won't pay off commercially for 10+ years, high-quality evaluation, socially useful but low-profit applications, and using NLP to research fundamental cognitive science questions.
A reader asked me how accurate chatGPT texts need to be. The answer is that this depends on context, including use case, workflow, and error type.
The CSL journal has just published our paper “Evaluating factual accuracy in complex data-to-text”, which summarises our work in this area. I strongly recommend the paper to anyone who is interested in evaluating the accuracy of texts produced by neural NLG systems.
Last week I played around with using chatGPT for data-to-text, and to be honest, overall I was disappointed. A few people have asked me about this, so I’ve written up some of my notes here.
An example from MedPaLM highlighted to me that generated texts can contain information which is factually accurate but still not appropriate, because of (in this case) its negative psychological impact. There are other such cases, and we should ensure that our evaluation criteria are sensitive to them.
I get asked a lot about chatGPT, so I thought I’d write a blog explaining my views, which focus on its impact on data-to-text NLG. Basically I think chatGPT is really exciting science which shows major progress on many of the challenges in neural NLG. However, commercial potential is unclear, and the media hype is annoying…