chatGPT in Health: Exciting if we ignore the hype

I’ve always been interested in using NLG in healthcare contexts; I’d like to think that NL technology can help both clinicians and patients understand what is going on and take appropriate actions! I think there are a lot of potentially useful applications of chatGPT in healthcare, although there are real challenges in getting the technology to work acceptably. But these use cases are much more limited and focused than the “chatGPT is better than doctors” hype that seems to dominate the media. Unfortunately, the hype seems to distract people from thinking about how to actually use the technology in “boring” but realistic ways.

Useful: Explain medical information to patients

One of the ideas that is worth exploring is using GPT technology to explain medical data to patients. Back in 2010, Saad Mahamood (then a PhD student with me) developed an NLG system which summarised data about sick babies in neonatal intensive care for their parents (Mahamood and Reiter 2011); this system was deployed and used in the hospital for a few years, and parents in general were very appreciative. Wendy Moncur (also a PhD student with me at the time) explored generating summaries for grandparents and other relatives of the same data (Moncur et al 2013).

I think this kind of thing is useful and indeed needed, especially in a world where patients are expected to make informed decisions and take responsibility for managing their health. I also think that chatGPT technology could do a better job at this task, in part because (A) it allows dialogue and questions, (B) it can summarise free-text medical notes as well as structured data, and (C) it’s much easier to adapt to a new health domain. We’ve done a few small experiments, and I think there is definitely potential here! Some clinicians are already using chatGPT to help create material for patients, but the impact would be greater if we could use chatGPT without a “human in the loop”.

There are challenges. Hallucination and getting information wrong is of course an issue, although we need to bear in mind that patients in 2023 are often pretty uninformed and use dubious information sources. Another issue is emotional sensitivity. Both Saad and Wendy found that they needed to be careful about what they said in order to avoid causing unnecessary stress and depression for users, especially in cases (eg, communicating with an elderly grandmother with heart problems) where bad news could have a major negative impact on the user. Unfortunately, what we have seen so far is that LLM chatbots have little sensitivity to this issue (blog); this needs to be addressed before we can use chatbots in this context.

Security is also an issue. We don’t want confidential patient information to leak out and be seen by other people; again this is currently a problem with chatGPT but hopefully will be fixed. Another problem, which I suspect is harder to fix, is that malicious agents could put material on the internet which infiltrates the GPT outputs. For example, we don’t want desperate parents of very sick babies to be told “Dr Quack’s patented Miracle Cure has helped thousands of babies like yours!”

So there are some real challenges to address before we can use chatGPT in this way, without human supervision. But if these issues can be addressed, the technology will provide real benefits to patients!

Crazy: Better than doctors

So there are some promising-but-challenging use cases for chatGPT in healthcare. Above is just one example; another is using the technology to summarise doctor-patient consultations (similar to Knoll et al 2022; see also Wall Street Journal article). And many more! But no one talks about these; instead, what people seem to focus on is whether chatGPT is somehow better than doctors and perhaps can replace them. Which is NOT the case!

Typically this starts with a claim that chatGPT can pass medical exams as well as (or better than) most humans. Most of the studies I’ve seen are dubious because they ignore the fact that a lot of questions on medical exams are available (perhaps in a different format) on the internet, so testing chatGPT on these questions amounts to testing it on its training data (Kung et al claim this isn’t an issue because their data is from 2022, but I suspect chatGPT’s training material includes some internet content from 2022). Some of the studies I’ve seen (including Kung et al) also only tested chatGPT on a subset of the questions in the medical exam. Perhaps there are better studies which I have not seen, although in general most of the evaluations I’ve seen of chatGPT are pretty dubious.

Also, we’ve had “super-human” diagnostic algorithms since before I was born. Meehl 1954 showed that simple regression models could do better than the average doctor on some diagnosis tasks; Kahneman argued that this made sense because we know that humans (including domain experts such as doctors) are very bad at some types of probabilistic reasoning. The whole point of evidence-based medicine is to use the latest scientific findings (including data science results) in medicine, instead of over-relying on doctors’ intuition.

Perhaps the most important point is that diagnosis is only part of what a doctor does. Dr. Josh Tamayo-Sarver has a great blog on the problems he has seen in using chatGPT to diagnose patients in real-world contexts. The key insight is that it works well if it has perfect information, but this is unrealistic; other doctors have also told me that getting the necessary information (especially if patients are confused, elderly, sick, etc) is the hardest part of diagnosing a patient.

Other key tasks in dealing with a patient include managing emotion/stress issues (as above), and also deciding on appropriate care. The last is a big challenge in the UK at the moment because the health system is struggling; a doctor told me of cases where he wanted a patient to receive specialist care but knew there was a one-year waiting list for this care, so had to look for alternatives.

To summarise, it’s possible that chatGPT can do well at diagnosis if it has perfect information and doesn’t care about emotional/stress or practical issues such as care availability; however I am still waiting for solid experimental confirmation of this. But even if this were true, this is hardly news (Meehl showed this could be done in 1954) and is only a small part of what a doctor does.

I’ve also heard people say that we should use the technology in poor countries where doctors are scarce. Again this is not a new idea; the e-health company Babylon has offered e-health services for years in Rwanda on this justification. I am under NDA with Babylon, so I won’t say more about this here, but I encourage interested readers to learn about Babylon’s experiences.

Final Thoughts

There is a lot of potential in using LLM and chatGPT technology in healthcare, but we need to focus on real use cases and their associated challenges. Fixating on whether chatGPT is better than doctors is ridiculous, and gets in the way of real progress in this space.
