I recently read a shocking article in the Economist on fraud in (bio)medical research. No one knows for sure, but some people think 2% of published papers in biomedical research are fraudulent. Also, one survey in the UK showed that almost 20% of academics admit to fabricating or falsifying data. Pretty scary, especially since people have probably died because of incorrect medical care driven by fraudulent research.
The driving factors for fraud are (A) intense pressure to publish large numbers of papers and (B) reviewing/oversight processes which do not detect fraud. And of course this applies to NLP as well as medicine!! Indeed, I suspect the situation may be worse in NLP.
Problem: Conference publications
(Bio)medical researchers publish their findings in journals, while most NLP results are published in conferences. This makes it harder to detect fraud (since conference reviewing is less thorough), and it also makes it harder to act on suspected fraud.
One problem is lack of time. Many years ago I was slightly involved in a case where there were concerns that an xACL submission was fraudulent. Since fraud is a serious charge which could potentially lead to someone losing his job, it must be thoroughly investigated, and the author must be able to respond to the concerns. We ended up deciding that because conference reviewing is on a very tight fixed timeline (in contrast to journals, which can take more time to process submissions in such cases), it was impossible to do a thorough and fair investigation, and hence we had to accept the paper because of the presumption of innocence. I believe that on a few occasions it has been possible to properly investigate concerns of fraud in xACL submissions, but I suspect the more common outcome is that investigations are dropped because of lack of time.
Another problem is unclear responsibility. If fraud is suspected in a journal paper after it has been published, then the journal editors are responsible for investigating it, even if they were not in charge of the journal at the time the paper was published. For ACL conferences, however, this is not true. Let's suppose fraud is suspected in a paper published in ACL 2021. Investigating this is neither the responsibility of the ACL 2023 PC (they are just responsible for 2023 submissions), nor of the ACL 2021 PC (which has disbanded). My understanding is that the ACL Professional Conduct Committee would be responsible for investigating such cases, although the description of the committee on the ACL Wiki makes no mention of this responsibility.
Problem: Authors do not respond to questions
When I was a young researcher, I was always excited when I got an email from another researcher who was interested in one of my papers and had some questions, wanted details, or was looking for data. I always responded, and saw this as a way of interacting with and learning from other researchers, even when they pointed out problems in my work.
Unfortunately, this is not true in 2023. In my experience, most authors do not respond when asked for more information, details, or data about their work. This applies to PhD students as well as senior researchers; I remember emailing a student working on human evaluation with some questions, and he acknowledged my email but refused to answer questions (and did not respond to a followup email). In all honesty, I suspect a lot of NLP researchers see papers as “CV enhancers” more than scientific contributions. If you see a paper as science, you will respond to questions; if you see it primarily in CV terms, you will find it strange that anyone who is not thinking of hiring you would ask questions about your paper.
This matters for fraud because the way to detect and investigate fraud is to ask authors for details and data; this doesn't work in NLP since most authors ignore such requests. Also, in NLP we cannot use failure to respond as an indicator of potential guilt.
Does it matter?
Does fraud matter? After all, unlike medicine, fraud in NLP cannot kill people. However, other kinds of harm are possible, such as a Theranos-style situation where fraud misleads investors and causes them to lose a lot of money.
Arguably, a much bigger problem in NLP than outright fraud is the low scientific quality of many papers, due to poor experimental design, experimental execution, or data analysis; this affects many more papers than outright fraud. Of course there is a grey zone between these. For example, if a researcher runs an experiment 1000 times and only reports the best result (which of course makes the results scientifically meaningless), is this dubious science or outright fraud? I guess my view is that it's dubious science if the researcher explicitly acknowledges doing this in his paper, and fraud if he hides this fact; but others might disagree.
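To see why reporting only the best of many runs is meaningless, here is a minimal sketch in Python. Everything in it is a made-up assumption for illustration: a system whose true score is 0.70, run-to-run noise that is Gaussian with standard deviation 0.01, and a hypothetical helper called simulate_runs. None of these numbers come from a real NLP system.

```python
import random
import statistics


def simulate_runs(n_runs, true_score=0.70, noise_sd=0.01, seed=0):
    """Simulate n_runs evaluations of the SAME system.

    Each run returns the (assumed) true score plus Gaussian noise,
    standing in for effects like random seeds or data shuffling.
    """
    rng = random.Random(seed)
    return [rng.gauss(true_score, noise_sd) for _ in range(n_runs)]


scores = simulate_runs(1000)
mean = statistics.mean(scores)   # what an honest report would be close to
best = max(scores)               # what a cherry-picked report would show

print(f"mean over 1000 runs: {mean:.3f}")
print(f"best single run:     {best:.3f}")
print(f"apparent gain from cherry-picking: {best - mean:.3f}")
```

Under these assumptions the system never changes, yet the best run sits several standard deviations above the mean, so the "improvement" is purely an artefact of selection. Reporting the mean and spread over all runs avoids this.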
But anyways, I don't see this as an either/or situation; we want to reduce both dubious science and outright fraud! And the factors I mentioned above (conference publication, authors not responding to questions) encourage dubious science as well as fraud.
I never thought much about fraud in NLP until I read the above-mentioned article, but I suspect we are very susceptible to it, for the above reasons. I’m not sure what to do about this, but it does worry me!