Reflections on blogging
I am often asked about my experience blogging, sometimes by people who are considering writing their own blog. In this “meta” blog, I summarise my thoughts and experiences about my blog.
Most academic work assumes that hallucination is a binary feature: either something is a hallucination or it is not. But this is too simplistic. In real-world contexts we see many subtleties, e.g., some hallucinations are much more damaging than others, statements which are literally true can still mislead readers because of context, and there are many borderline cases.
I am very excited by recent positive evaluations of NLG apps developed by my students to encourage safer driving in the UK and Nigeria. We see statistically significant reductions in unsafe driving incidents in both countries. This has real potential to help address a major worldwide problem!
The academic world has changed in many ways since I got my PhD in 1990. One of the worst changes is that researchers in 2025 usually need to pay thousands of pounds to publish their work. This is unfair to researchers with limited funding, and not good for science.
I recently published a paper and gave a talk about evaluating real-world impact. I got some great feedback on both, and here I summarise some of the suggested papers (including more examples of impact eval) and insightful comments (e.g., about the eval “ecosystem”) I received.
This is a personal blog about a recent bike trip I did, which was mostly in the Netherlands.
My student Adarsa Sivaprasad is looking into what questions users of an AI prediction model actually have, and how these should be answered. Amongst other things, users seem to have more questions about what information a model considers than about how a model works.
We have a really nice NLP research group at the University of Aberdeen, with a dozen researchers who work on topics such as evaluation, interpretability, health applications, cognitive aspects, and cross-temporal research. We regularly publish and win awards in top venues. It's exciting!
It's been around 6 months since my new NLG book was released. I summarise what I now think are its key messages, for rule-based NLG, ML and neural NLG, requirements, evaluation, safety/testing/maintainability, and applications.
Most LLM benchmarks and leaderboards are garbage. Unfortunately, it now seems that even the few “good” benchmarks (such as SWEBench and Chatbot Arena) are compromised because they are being gamed by the big LLM vendors, who tweak the benchmarks and rules so that their systems do better.