A few people have recently asked me if my long-term vision for AI is close to being achieved. This is a very general question, but I thought I would respond in the context of my vision of AI Personal Health Assistants. I first proposed this 25 years ago when I was asked to suggest a “grand challenge” for Computer Science. I think it remains quite a challenge, although definitely closer to reality and more tangible now than 25 years ago.
The basic vision is that people have personal health assistants, which give advice, make suggestions, interact with the broader health system, and perhaps intervene directly (eg, to release medication into the bloodstream). They have access to medical records and sensor data from the patient. They are also available to *everyone*, including poor people living in deprived areas of the UK and rural farmers in low-income African nations, as well as well-educated UK academics.
So what are the challenges in 2026 to realising this vision?
Requirements: What do people actually want?
One thing I have learnt over the years is that it is essential to find out what people actually want an AI health assistant to do, and not assume that we know what people want (blog)!
At a high level, we know from many studies, including ones at Aberdeen, that AI health assistants
- Should be accurate (no hallucinations), not omit important information, express information clearly, and give unbiased advice (ie, not copy marketing material from a health provider website).
- Should not cause unnecessary emotional stress, and should be safe and secure
- Should adapt to the user, including context (culture, country and health care system, family circumstances, etc), health needs and concerns, knowledge and literacy level, personality, etc
- Should have access to health sensor data from the user, and possibly effectors (eg insulin pump)
- Should integrate with the wider health care system, including electronic patient records
- Should be affordable and deployable by people around the world
The above are very high level; of course we need more detailed requirements in order to build apps. One very promising source of requirements is usage logs from current bots used in health contexts. This is very commercially and personally sensitive information, but I was excited to see a new paper from Microsoft which gave data about the kind of health queries received by Microsoft Copilot (I wish the paper had been more detailed, but it is a start).
Another possibility is to directly elicit requirements, using surveys, focus groups, prototypes, etc. Ideally this would be based on actual usage of apps (possibly at prototype or mockup level, or using Wizard-of-Oz techniques), but I have not seen such studies. There are some general surveys (paper) of patient attitudes to AI, but these are not based on actual usage of AI health apps.
WAY FORWARD: We need better data about how people currently use AI in health contexts, and we need studies of what different kinds of people want from AI health. There is no technical barrier to doing this; the challenges are getting access to usage data and getting resources to do the studies.
Technology
If we look at the high level technical requirements, how close are modern LLM health chatbots to satisfying them? Obviously this is changing as models evolve, but my impression is that
- Accurate, no important omissions, clear, unbiased: I think good progress is being made (I was impressed by Brodeur et al 2026), but problems remain. These include quality of information extracted from the internet, and whether it is up-to-date (eg, ChatGPT recently gave me accurate but misleading information about abortion laws in England, because it did not say that they will shortly change). Another shortcoming is the ability to effectively work with patients who may be confused or misguided (Bean et al 2026).
- No unnecessary emotional stress, safe, secure: These have certainly been major problems in the past (blog); I don't have a good feel for the current status. But I suspect that letting an Internet-based LLM control an insulin pump (for example) would not be acceptable in 2026 from a safety and security perspective.
- Adapting to users: I suspect current models do not do this well. Even at the simplest level, in the past ChatGPT (etc) would regularly advise me to talk to my insurance provider, which makes sense in a US context but not a UK context. Perhaps this has been fixed, but I suspect I would still see Americanisms if I pushed. I also suspect the situation is a lot worse in Nigeria and India, especially since models have limited support for many local languages.
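To make the adaptation point concrete, here is a toy sketch of one simple approach: composing the assistant's system prompt from an explicit user-context profile, so the model is steered towards locally appropriate advice (eg, NHS rather than insurance). All names and fields here are invented for illustration; real adaptation would go far deeper than prompt templating.

```python
# Hypothetical sketch: adapt a health assistant's system prompt to the
# user's context (country, health system, language, literacy level).
# All class and field names are invented for illustration.

from dataclasses import dataclass


@dataclass
class UserContext:
    country: str        # eg "UK", "US", "Nigeria"
    health_system: str  # eg "NHS", "private insurance"
    language: str       # eg "en", "yo"
    literacy: str       # eg "general public", "health professional"


def build_system_prompt(ctx: UserContext) -> str:
    """Compose a system prompt so the model gives locally relevant advice."""
    lines = [
        "You are a personal health assistant.",
        f"The user lives in {ctx.country} and uses the "
        f"{ctx.health_system} health system.",
        f"Give advice appropriate to {ctx.country}; do not assume "
        "US-style health insurance unless the user says they have it.",
        f"Respond in language '{ctx.language}' at a "
        f"'{ctx.literacy}' reading level.",
    ]
    return "\n".join(lines)


uk_user = UserContext(country="UK", health_system="NHS",
                      language="en", literacy="general public")
print(build_system_prompt(uk_user))
```

Of course, a profile like this only helps if the underlying model actually knows about the NHS, Nigerian healthcare, or the relevant local language; prompting cannot compensate for missing knowledge.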
So the technology has certainly made huge advances, but further advances are needed!
WAY FORWARD: I think the above problems can be addressed with sufficient R&D. It won’t happen tomorrow, but perhaps it could happen by 2030 if effective AI health assistants became an R&D priority.
Integration
Personal AI assistants are much more effective if they integrate with the wider health care system.
For example, 10 years ago Babylon launched its Babyl health app in Rwanda, which was used by millions of people for many years. Although Babyl included some AI advice (the vision was to use AI to provide medical expertise in remote areas), over the years the focus of Babyl shifted to remote access to health professionals. Such access was more important to rural Rwandans than AI advice, perhaps because health professionals could provide practical help such as medication, tests, nursing and midwife support, etc. So Babyl would have been much less useful if it had only offered AI advice.
A UK example is our ASICA project, which is developing an app for people with melanoma. The app would be much more useful if it was integrated into National Health Service (NHS) patient records and workflows, so information could be sent from the app to doctors, but achieving this is much harder than technically building the app.
Integrating apps into healthcare systems is challenging (at least in the UK), in part because workflows and IT systems differ. There are differences even within Scotland; eg, skin cancer NHS workflows are a bit different in the city of Dundee, 100km from Aberdeen. There are much bigger differences between Scotland and USA, and enormous differences between Scotland and Rwanda!
Regulatory, privacy, and safety concerns are also very important. For example, 15 years ago parents really liked our Babytalk system, which gave them updates about the status of a sick baby, and wanted the reports to be on the web so they could read them at home. However the hospital refused on IT security grounds. More recently, the NHS IT security people in Aberdeen tried to stop GPs from using the popular HeidiHealth system, even though it was widely used by other health systems around the world.
WAY FORWARD: Integration will take time, and it would really help to have robust RCTs which showed strong health benefits; this should be a research priority. But ultimately this may be more of a policy and organisational issue than a technical one. The people running health organisations need to decide that integration with personal AI health apps is important, and make achieving this a priority, otherwise it will not happen.
Adoption
Last but not least, an AI personal health assistant must actually be used in order to provide benefits. This means people have to trust it, and many people do not trust health AI. It also must be cheap enough to be affordable, which may go against commercial interests. If we want it to be deployed worldwide, then it needs to be able to run on cheap smartphones which do not have reliable Internet connections.
WAY FORWARD: From a technical perspective, a key challenge is to run the bot on cheap phones with intermittent Internet. It is possible to run small models on phones (indeed we do this in some of our projects), but running a large Llama 4 model on a phone would be an enormous challenge! Trust issues and business models also need to be addressed.
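One common pattern for the intermittent-Internet problem is an "offline-first" design: a small on-device model answers when it can, and a large cloud model is consulted only when a connection exists and the local answer is not confident enough. The sketch below illustrates the routing logic only; both model calls are stubs, and the confidence heuristic is invented for illustration.

```python
# Hypothetical sketch of offline-first routing for a health assistant on a
# cheap phone: prefer a small local model, fall back to a large cloud model
# only when online and the local answer is low-confidence. Model calls are
# stubbed; a real system would run actual models here.

def small_on_device_model(query: str) -> tuple[str, float]:
    """Stub for a small local model; returns (answer, confidence in [0, 1])."""
    # Invented heuristic: pretend short queries are answered confidently.
    confidence = 0.9 if len(query.split()) <= 8 else 0.3
    return (f"[local answer to: {query}]", confidence)


def large_cloud_model(query: str) -> str:
    """Stub for a large remote model, reachable only when online."""
    return f"[detailed cloud answer to: {query}]"


def answer(query: str, online: bool, threshold: float = 0.7) -> str:
    """Route the query: local if confident enough (or offline), else cloud."""
    local_answer, confidence = small_on_device_model(query)
    if confidence >= threshold or not online:
        return local_answer
    return large_cloud_model(query)
```

The design choice here is that the phone always has *some* answer available, and connectivity only improves quality; for safety-critical queries a real deployment would also need rules about when the local model must refuse rather than guess.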
Summary
Achieving my vision and building global personal health assistants requires:
- Thorough understanding of what people want from such apps, perhaps based on analysis of usage logs supplemented with explicit studies.
- LLMs which are safe, use high-quality up-to-date knowledge, can work with confused patients, and (perhaps most challenging) can adapt to individual patients (context, circumstances, culture, health needs, skills, etc).
- Robust RCTs which show strong health benefits, leading to organisational commitment to use the technology.
- Health LLM models which can run on cheap smartphones with limited Internet.
None of the above is easy, but the exciting thing to me is that all of these are possible, with sufficient time and effort!