
I am worried by NLP research culture

I was asked to give a closing talk at Retroeval last week, and decided to talk about research culture. In most respects, NLG and NLP have made huge improvements since my PhD in 1990. However, research culture in 2026 is worse than in 1990. This is largely because in 1990 we cared about scientific findings, while in 2026 many people focus on getting N papers in xACL in order to meet a career goal (start PhD, finish PhD, get promoted, etc). Poor research culture really concerns me.

My talk led to a lot of discussion, so I thought I would explain my concerns in a blog. See also my blog on Hard to Change Poor Research Culture.

What is Research Culture?

The Royal Society states that

Research culture encompasses the behaviours, values, expectations, attitudes and norms of our research communities. It influences researchers’ career paths and determines the way that research is conducted and communicated.

Research culture is a specific type of organisational culture. Organisational culture is very important for all sorts of organisations, including businesses and sports teams as well as research labs. At Arria I worked for a while with Barbara Kendall, an Olympic gold medalist who consults for companies about organisational culture. Barbara was passionate about the importance of culture, and had numerous examples of it from both her sporting career and the companies she worked with.

Keep in mind that culture is a property of organisations or communities, not individuals! Research culture can specify how community members typically behave, but individuals in the community can choose to behave differently.

Problem: NLP Research Culture in 2026 does not prioritise scientific rigour

I do not think that NLP research culture in 2026 prioritises scientific rigour. As I have written in papers and previous blogs, I see many problems in scientific rigour in current NLP research, including

  • Usage of poor datasets, even when better ones are available (blog).
  • Use of benchmarks that are not very meaningful (blog).
  • Continued use of obsolete evaluation metrics such as ROUGE because they are “standards” and used in leaderboards (blog).
  • Lack of interest or support for reproducibility (blog and paper).
  • Poorly executed experiments (blog and paper).
  • Major data contamination issues (blog).
  • Minimal interest in real-world impact (paper).
  • Fixation on beating leaderboards (blog).

I could easily expand this list! From a culture perspective, the problem is that NLP culture allows and in some cases even encourages the above behaviours; it does not encourage good scientific “behaviours, values, expectations, attitudes and norms” (from the Royal Society definition). A research culture that focused on scientific rigour would discourage use of poor datasets, encourage meaningful benchmarks, insist that research be reproducible, etc. But NLP culture does not do this. It also places huge pressure on many younger researchers to publish large numbers of papers in xACL conferences, which encourages scientific “corner cutting”. Unfortunately this can then become internalised as normal behaviour for the researchers concerned.

Of course many researchers still try to do careful science! Again culture is about the community, not individuals.

Problem: Cheating and Fraud

One of the most depressing things about the NLP community in 2026 is rapidly increasing cheating and fraud. I wrote a blog 3 years ago where I expressed concern that NLP was very susceptible to fraudulent research. At the time this was theoretical; fraud was still very rare. Unfortunately, in 2026 it is happening. The clearest sign of fraud is hallucinated citations in LLM-written papers, and we see a lot of this; indeed, ACL in 2026 desk-rejected 100 submissions because they contained hallucinated citations. A much more serious problem is hallucinated data or analyses. This is much harder to detect, especially when authors resist attempts to replicate experiments (see above), but unfortunately I suspect there is a lot of this as well.

From a cultural perspective, the above shows that our research culture does *not* strongly discourage cheating and fraud. In medical research, fraud is taken very seriously and researchers who commit fraud suffer serious career penalties. In NLP, if reviewers detect that a paper is fraudulent then it will be rejected, but no other action is taken, and the authors are free to submit the paper elsewhere and indeed to continue hallucinating fraudulent papers. This encourages a culture of “fraud and cheating are OK as long as you don't get caught”. Having said this, I was happy to see a recent announcement by arXiv that “authors” of dubious LLM-written papers would be banned from submitting to arXiv for a year; perhaps this is a sign that things are changing.

We need a research culture which strongly discourages fraud and cheating as unethical even if you do not get caught. I think we had this in the 1990s, not least because fraud makes no sense if you are motivated by science. But in 2026, many people are mostly motivated by the need to publish large numbers of papers (as above), and this can encourage fraud.

Problem: Openness

Last but not least, NLP research culture is not open to new ideas and new people. ACL has always suffered from a fixation on whatever is trendy (tree-adjoining grammars in the early 1990s, LLMs in 2026), but this has gotten worse recently, to the point where people who do not jump on the LLM “bandwagon” struggle to get published. I have seen reviews from senior and respected members of the community which recommend rejecting a paper purely because it does not use neural models (and hence cannot be interesting or significant regardless of results); I have even seen reviewers flag non-neural papers as “out-of-scope” for ARR. I have also been told by PhD students that they have abandoned promising non-neural approaches because it was very hard to get them published. This is atrocious research culture; academics should be “scouts” who are exploring crazy new ideas and approaches! A related issue is that the use of AI tools to support research can reduce the scope of scientific research (Hao et al 2026).

From a people perspective, the NLP community’s fixation on mega conferences such as xACL makes life difficult for researchers who

  • do not have funding to attend such conferences
  • find travel difficult because of caring responsibilities (which impacted me for most of my career) or visa problems
  • have disabilities (blind, mute, etc) which make conference presentation difficult

We also make life difficult for researchers from other scientific fields by insisting that papers are written using LaTeX, which few people outside of CS use (in the past xACL supported submissions in MS Word, but this is no longer the case).

The above types of people can all make valuable contributions to NLP research, so excluding them makes no sense if our research culture genuinely values science!

Final thoughts

The research culture of the ACL community is deeply flawed, which is very depressing. However, I do think that the research culture of the “INLG” community is much better. I hope this can be preserved and indeed will spread!

Also, culture is about communities, not about individuals. I encourage my readers to prioritise science, do rigorous experiments, avoid cheating/fraud, and be open to new ideas and people; and to encourage colleagues and students to do likewise!
