Hard to Change Poor Research Culture
Research culture is very important but also very hard to change. I suspect this is one reason why it is so difficult to get people to do more rigorous and meaningful experiments.
I've seen a number of diagrams recently that are overly complicated and difficult to understand. I explain some of the problems I see and give advice.
The academic world has changed in many ways since I got my PhD in 1990. One of the worst changes is that researchers in 2025 usually need to pay thousands of pounds to publish their work. This is unfair to researchers with limited funding, and not good for science.
We have a really nice NLP research group at the University of Aberdeen, with a dozen researchers who work on topics such as evaluation, interpretability, health applications, cognitive aspects, and cross-temporal research. We regularly publish and win awards in top venues. It's exciting!
There were lots of interesting papers in 2024. I describe a few of them, and also list others I have mentioned in previous blogs; all are about evaluation, experimental rigour, real-world utility, and/or healthcare applications.
A lot of experiments in NLP are neither rigorous nor replicable, and some NLP researchers don't seem to care. Which is depressing… But I do see interest in incrementally improving experimental rigour and replicability, and I strongly encourage people to try to do a bit better, even if they can't fix everything.
The UK government wants to reform the UK health system through digitisation, shifting care to communities, and focusing on prevention. I think there is a lot of potential for AI to help with this, if AI/Medicine researchers become less fixated on using AI to improve diagnoses in hospitals.
Systematic literature reviews are a powerful and useful methodology for investigating many research questions. I give a high-level overview for NLP researchers who are not familiar with this technique.
Our latest paper from the ReproHum project discusses experimental flaws we have encountered while reproducing earlier experiments, including code bugs, UI problems, inappropriate exclusion of data, reporting errors, and ethical lapses. Pretty depressing. These types of errors are not detected by usual NLP reviewing practices, so I suspect they may be pretty common…
One very positive aspect of 2023 for me was that I saw lots of really interesting research papers, many more than in previous years. Perhaps because the emergence of LLMs has encouraged some people to move away from scientifically dubious leaderboard chasing and towards more interesting research on scientific fundamentals? I describe a few of these papers here.