
ChatGPT error or human error?

My PhD student Mengxuan “Summer” Sun is exploring whether ChatGPT can explain complex medical notes to patients. As a first step, she obtained some medical notes from doctors (for fictional patients, but inspired by real patients), asked ChatGPT to explain them, and then asked doctors (including the authors of the notes) to evaluate ChatGPT’s responses for appropriateness and accuracy. This is ongoing work, but there are already some interesting findings.

As usual with such endeavours, the summaries look very impressive at first, but problems appear when domain experts (doctors) check them in detail. Also as usual with ChatGPT, the problems are mostly in the content. Some content would be fine in the USA but is not really appropriate in the UK health system (not surprising, considering that there is a lot more US medical content than UK medical content in ChatGPT’s training data); there is also content which may not be appropriate for the patient. Some of the content is very generic, and it would be better if it were more personalised to the patient. Also, generic content arguably should come from a patient information leaflet (which is carefully crafted to be appropriate for the UK and indeed Aberdeen, to minimise emotional upset, etc) rather than from ChatGPT. Perhaps some of these problems can be fixed with better prompting, model fine-tuning, etc; Summer is going to look into this.
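To give a concrete (purely illustrative) sense of what “better prompting” might look like, here is a minimal sketch using the OpenAI Python client. The model name, prompt wording, and note text are my own assumptions for illustration, not the setup Summer is actually using.

```python
# Rough sketch only: asking a chat model to explain a medical note, with
# instructions aimed at the problems discussed above (US vs UK content,
# generic advice, ambiguous acronyms). Model choice and wording are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

note = "..."  # a (fictional) medical note would go here

system_prompt = (
    "You are explaining a medical note to a patient in Aberdeen, UK. "
    "Use UK (NHS) terminology and care pathways, not US ones. "
    "Do not guess the meaning of acronyms; if an acronym is ambiguous, say so. "
    "Keep the explanation specific to this patient rather than giving generic advice."
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative choice of model
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Please explain this note to the patient:\n\n{note}"},
    ],
)
print(response.choices[0].message.content)
```

Even with instructions like these, of course, the model may still get things wrong, which is why expert evaluation by doctors matters.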

ChatGPT also misunderstood some acronyms (medical notes are full of these). For example, ChatGPT thought that “CNP” referred to “Clinical Nurse Practitioner” when in fact it referred to a doctor whose initials were CNP. Again this is not surprising, since I suspect that in ChatGPT’s training data, CNP means “Clinical Nurse Practitioner” in 99% of cases.

Human error?

So far, what Summer found was more or less in line with my expectations. However, when we discussed the “CNP” issue with two other doctors (in addition to the doctor who wrote the note), they told us that they too would have interpreted CNP to mean “Clinical Nurse Practitioner”. In other words, using CNP to refer to a doctor with these initials was confusing to other doctors as well as to ChatGPT. Since medical notes should be understandable to other doctors (eg, in case the patient moves to a new city), arguably the problem lies more with the note than with ChatGPT; and indeed the doctor who wrote the note agreed that it would be better to make the usage of CNP clearer.

There was another interesting case where the note contained a phrase which we did not understand, and which ChatGPT simply copied verbatim. We assumed this was because the phrase was medical jargon, but when we asked the author, he told us that it was a nonsense phrase included by accident; human error again.

Another potential issue (this is from me, not the doctors) is what would happen if there were mistakes (wrong content) in the note. Another PhD student, Francesco Moramarco, found mistakes in human-written summaries of doctor-patient consultations (earlier blog); these are very different from the notes that Summer is looking at, but unfortunately mistakes seem possible here as well. We cannot ask the doctors about this because Summer’s notes are for fictional patients, so it is unclear what would count as a mistake.

Thoughts

We started this exercise in order to understand what kinds of mistakes ChatGPT makes in interpreting and explaining complex medical notes. We’ve certainly learned something about this, but we’ve also seen that a big challenge in explaining medical notes to patients is that the notes themselves may be confusing, contain nonsense phrases, and indeed be wrong. And this problem will not go away no matter how good our LLMs become.

Indeed, some of the doctors suggested that perhaps a good way of using ChatGPT in this context is to help doctors find problems in notes. It can of course be hard for authors to see problems in texts they have written themselves, so perhaps reading ChatGPT explanations of medical notes could help doctors find and fix mistakes in their notes?
