I’ve had several conversations about using LLMs over the past few weeks where the people I talked to had little idea of what LLMs could and could not do, and how LLMs could and could not help them. This is worrying, because if we want AI to actually help people, then the people being helped need to understand what to use AI for! A few examples:
- A student was trying to use an obscure software library and could not find information about it. I suggested he try GPT; he had not realised that GPT could do this.
- A researcher (not Aberdeen) in digital humanities was trying to analyse texts in Aramaic. I suggested that LLMs could do part of the task; he had not realised that (larger) LLMs could deal with Aramaic.
- A lawyer oscillated (when talking to me) between “LLMs can do almost anything in law” and “LLMs cannot be trusted to do anything in law”.
- A random person (friend of a friend) said AI could replace doctors, since he had read that LLMs produce better diagnoses than human doctors. I tried to explain that good classifier performance on a test set may not mean much (blog), but I’m not sure he understood me.
So there seems to be massive confusion about LLMs and AI; people don’t realise when the tech can actually help them, and also may think that the tech can do things that it cannot.
I suspect this is largely due to the way media, gurus, and tech companies talk about LLMs and AI. Lots of talk about AGI (not relevant to users today), glowing articles about medical AI which ignore the fact that test-set performance may not mean much (see above), etc. Even the better venues (like the Economist) highlight LLM benchmarks which have little relevance to how people actually use LLMs (blog). I guess the focus of media/etc is on attracting eyeballs instead of educating people…
Anyways, below are a few suggestions for how people can assess whether LLMs might help them.
Approach 1: What do other people use LLMs for?
The simplest approach is to look at how other people are using LLMs. There is a fascinating recent paper from Anthropic which analyses how people are actually using Claude (Arxiv) (summary). Key insights:
- Most common use (by far) was for software development. The most common task (5% of conversations) was “modify existing software to correct errors, to adapt it to new hardware, or to upgrade interfaces and improve performance.”
- Second most common use was writing and editing documents
- Third most common use was education, including preparing material and tutoring
- Limited use in law or medicine; eg, 0.04% of conversations were about “prepare legal documents”
Incidentally, the above is a very good match to my own observations of how people I know use LLMs.
Anyways, this kind of thing is very broad-brush, but it does tell us that software developers, people who write or edit documents, and educators should absolutely investigate how LLMs can help them. However, doctors and lawyers should be more cautious and not expect too much.
Approach 2: Small-scale experiments
A more hands-on approach is to do small-scale experiments. For example, just ask an LLM for help with a software library, or ask it to analyse an example text in Aramaic, using obvious and simple prompts and without worrying about prompt engineering.
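To make this concrete, below is a minimal sketch (in Python, using the OpenAI SDK) of the kind of throwaway experiment I mean; the model name and the example question are placeholders, not recommendations.

```python
# Minimal sketch of a quick "can an LLM help me?" experiment.
# Assumes the openai package is installed and OPENAI_API_KEY is set;
# the model name and the question below are placeholders.
from openai import OpenAI

client = OpenAI()

# A plain, unengineered prompt -- the point is to see how far a
# naive question gets, not to optimise the prompt.
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any capable chat model
    messages=[
        {
            "role": "user",
            "content": "How do I read a GRIB2 weather file in Python "
                       "using the cfgrib library? Show a short example.",
        }
    ],
)
print(response.choices[0].message.content)
```

If the answer to a question like this is obviously wrong or useless, that is itself informative.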
I wouldn’t do this for anything safety-critical (eg, medicine or law), but in other contexts I have found this a useful way to quickly get an understanding of how much an LLM can help me. Obviously this kind of thing doesn’t tell you about hallucinations, robustness, etc! But it can give a sense of whether an LLM can help.
I find that some people I talk to are already doing this, but others need encouragement, perhaps because the media stories about AI (AGI, extinction risk, etc) make the tech seem pretty daunting.
Approach 3: Proper requirements and evaluation
The ideal approach is to do a proper analysis of requirements and evaluate the effectiveness of different technical solutions; this is something I discuss in detail in my book. The basic approach is to:
- Understand requirements: What tasks do you want to do, what quality criteria do you care about, and will the LLM be used on its own or as part of a human+AI workflow?
- Identify possible technical approaches: out-of-the-box LLMs, fine-tuned models, rule-based approaches (which still make sense in many cases)
- Evaluate how effective each approach is at meeting your quality criteria, also considering (if appropriate) safety and practical issues such as user acceptance (a minimal sketch of this step follows below).
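Here is a minimal sketch, in Python, of what the evaluation step might look like; everything in it is a made-up placeholder (the test cases, the run_system() stub, and the toy keyword check), since a real evaluation would use expert human judgement or an automated metric validated against it.

```python
# Minimal sketch of an evaluation harness. The test cases, the
# run_system() stub, and the keyword-based quality check are all
# placeholders standing in for a proper evaluation design.
from statistics import mean

test_cases = [
    {"input": "blood pressure 150/95", "must_mention": "hypertension"},
    {"input": "blood pressure 118/76", "must_mention": "normal"},
]

def run_system(text: str) -> str:
    # Placeholder: call whichever approach is under test here
    # (out-of-the-box LLM, fine-tuned model, or rule-based system).
    return "Reading suggests hypertension; advise follow-up."

def meets_criterion(output: str, must_mention: str) -> float:
    # Toy quality check: does the output mention the key finding?
    # A real study would use expert judgement, not keyword matching.
    return 1.0 if must_mention.lower() in output.lower() else 0.0

scores = [meets_criterion(run_system(c["input"]), c["must_mention"])
          for c in test_cases]
print(f"Criterion met in {mean(scores):.0%} of {len(scores)} test cases")
```

Even a toy harness like this forces you to write down concrete test cases and say what a good output looks like, which in my experience is half the battle.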
This full process is overkill for tasks such as finding info about obscure software libraries! But it makes sense for understanding how LLMs can help in law and medicine, and more generally I think it’s a useful conceptual framework for assessing utility.
Final thoughts
I want AI to help people, so I find it frustrating that the technology helps less than it could because people don’t understand what it can and cannot do. And unfortunately tech companies, gurus, and media seem to have little interest in educating people.
So perhaps educators and researchers, ie people like me, need to take more responsibility for helping people from all walks of life understand which uses of LLM and AI tech are sensible?