ACL vs TACL Reviewing
This year I was both a TACL Action Editor and an ACL Senior Area Chair. This experience has reinforced my belief that the journal review process is better!
Many problems in NLP papers can *not* be detected by reviewers who are checking submissions to conferences and journals. In medicine and many other fields of science, people can raise concerns about papers *after* they are published, and authors are expected to take this seriously. This is not the practice in NLP, which is a shame.
In our ReproHum project, we have found that many NLP experiments are flawed, and many authors do not respond to requests for more information about their work. This is depressing and hinders scientific progress in NLP.
I don't like leaderboards, which encourage academics to write papers about small improvements on established tasks and datasets. I suspect (and hope) that ChatGPT and similar systems will encourage people to move away from leaderboards. If so, this would be great!
Is fraud (e.g., fabricating or falsifying data) a problem in NLP? It certainly is a problem in other scientific areas, and it wouldn't surprise me if it affected NLP as well.
Since commercial researchers dominate the “hot” area of large language models, I’ve seen a number of people ask “what should academic researchers focus on?”. There are of course huge numbers of exciting and valuable scientific research questions which are not of much commercial interest, including long-term work which won’t pay off commercially for 10+ years, high-quality evaluation, socially useful but low-profit applications, and using NLP to investigate fundamental cognitive science questions.
I thought I’d end 2022 with a summary of the papers written by my students and me in 2022. All of them are about requirements, resources, and/or evaluation of NLG.
I don't like academic leaderboards. Poor scientific techniques, poor data, and poor evaluation mean that leaderboard results may not be worth much. I also suspect that the community’s fixation on leaderboards means less research on important topics that do not fit the leaderboard model, such as understanding user requirements.
Quality assurance processes for academic research, notably peer review by unpaid volunteers, are very lightweight and miss many problems. Better quality assurance would require more resources and effort, but would result in more trustworthy papers.
I was very happy to win an INLG Test of Time award for my paper “An Architecture for Data-to-Text Systems”, so I thought I’d write a few comments on it.