Retirement Plans: Travel and some academic work
I hope to retire soon, and many people are asking about my plans. Basically I want to do lots of travel, stay involved in academia, and perhaps do some writing.
I strongly recommend that researchers do “sanity checks” on data, model outputs, and evaluation results, looking for anomalies. This can help detect data errors, model cheating, software bugs, and other flaws which distort experiments.
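To make this concrete, here is a minimal sketch (in Python) of what automated sanity checks on model outputs might look like. The record fields and thresholds are hypothetical assumptions; real checks should be adapted to the specific data and task, and complemented by manual inspection.

```python
# A minimal sketch of automated sanity checks on a dataset of model outputs.
# The "input"/"output" fields and the 5x-length threshold are illustrative
# assumptions, not a fixed recipe.
from collections import Counter

def sanity_check(records):
    """Flag common anomalies: duplicate inputs, empty outputs, extreme lengths."""
    issues = []
    dup_counts = Counter(r["input"] for r in records)
    for text, n in dup_counts.items():
        if n > 1:
            issues.append(f"duplicate input ({n}x): {text[:40]!r}")
    mean_len = sum(len(r["output"]) for r in records) / len(records)
    for i, r in enumerate(records):
        if not r["output"].strip():
            issues.append(f"record {i}: empty output")
        elif len(r["output"]) > 5 * mean_len:
            issues.append(f"record {i}: output much longer than average")
    return issues

if __name__ == "__main__":
    data = [
        {"input": "Summarise the report.", "output": "The report shows a rise in..."},
        {"input": "Summarise the report.", "output": ""},
    ]
    for issue in sanity_check(data):
        print(issue)
```

Checks like these are cheap to run and often surface problems (duplicated records, empty generations) which would otherwise silently distort results.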
LLMs often “cheat” on benchmarks via data contamination and reward hacking. Unfortunately, this problem seems to be getting worse, perhaps because of perverse incentives. If we want to genuinely and meaningfully evaluate LLMs, we need to move beyond benchmarks and start measuring real-world impact.
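As one illustration of the contamination side of this problem, the sketch below measures n-gram overlap between benchmark test items and a training corpus; a high overlap rate is a warning sign. The 8-gram size and whitespace tokenisation are illustrative assumptions, and serious contamination audits are considerably more sophisticated.

```python
# A minimal sketch of an n-gram overlap contamination check. The 8-gram
# size and whitespace tokenisation are illustrative assumptions; real
# audits use more careful normalisation and matching.
def ngrams(text, n=8):
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def contamination_rate(test_items, training_texts, n=8):
    """Fraction of test items sharing at least one n-gram with training data."""
    train_grams = set()
    for t in training_texts:
        train_grams |= ngrams(t, n)
    hits = sum(1 for item in test_items if ngrams(item, n) & train_grams)
    return hits / len(test_items) if test_items else 0.0
```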
Research culture is very important but also very hard to change. I suspect this is one reason why it is so difficult to get people to do more rigorous and meaningful experiments.
When building an NLG system, it really helps to understand what users want; this came up several times at the recent INLG conference. I discuss some of our work in this space, and give a few suggestions.
I review some data on usage of AI in healthcare, and conclude that the most common uses in 2025 are probably (A) giving personalised health information to patients and (B) helping clinicians write documents. We’ve worked on these topics at Aberdeen, but most researchers focus on AI for decision support, which is not widely used.
I've seen a number of diagrams recently which are too complicated and difficult to understand. I explain some of the problems I see and give advice.
I am often asked about my experience of blogging, sometimes by people who are considering writing their own blog. In this “meta” post, I summarise my thoughts and experiences as a blogger.
Most academic work assumes that hallucination is binary: either something is a hallucination or it is not. But this is too simplistic. In real-world contexts we see many subtleties, eg some hallucinations are much more damaging than others, statements which are literally true can still mislead readers because of context, and there are many borderline cases.
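One way to operationalise this is to annotate hallucinations with a graded severity and an explicit borderline flag, rather than a single binary label. The sketch below shows a hypothetical annotation scheme; the category names and fields are illustrative, not a proposed standard.

```python
# A minimal sketch of a non-binary hallucination annotation scheme.
# Category names and fields are hypothetical illustrations.
from dataclasses import dataclass
from enum import Enum

class Severity(Enum):
    TRIVIAL = 1     # wrong but inconsequential detail
    MISLEADING = 2  # may be literally true, but misleads in context
    HARMFUL = 3     # factually wrong and likely to cause real damage

@dataclass
class HallucinationLabel:
    statement: str
    severity: Severity
    borderline: bool = False  # annotators disagree, or the case is ambiguous

label = HallucinationLabel(
    statement="The patient's blood pressure was normal.",
    severity=Severity.MISLEADING,
    borderline=True,
)
print(label)
```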
I am very excited by recent positive evaluations of NLG apps developed by my students to encourage safer driving in the UK and Nigeria. We see statistically significant reductions in unsafe driving incidents in both countries. This has real potential to help address a major worldwide problem!