On several occasions over the past few weeks, I’ve struggled to read a paper because either (A) the author made mistakes in numbers, tables, references or formulas; or (B) the author used words or graphical symbols to mean different things in different parts of the paper.
For example, on one occasion I struggled because the paper referred to Fig 4 in Section 3, and I could not understand why Fig 4 was relevant to the point being made in Section 3. Eventually I realised that the author meant Fig 5.
On another occasion I struggled to understand a set of diagrams in a paper. I eventually realised that although the diagrams looked similar, arrows meant different things in different diagrams.
These kinds of mistakes are really easy to make as an author. You may not even notice if you use words or arrows to mean different things in different places, because it’s obvious to you what is meant when you read the draft. Such mistakes can also be hard to detect, for proof-readers as well as authors (since proof-readers need domain knowledge to check such mistakes). But they are also really distracting and confusing for a reader who is trying to understand what you have done. It’s easy for me as a reader to detect and resolve spelling and grammar mistakes; it’s much harder for me as a reader to realise that a table or formula is wrong, and it may be impossible for me to resolve such mistakes even if I do detect them.
There is no easy solution to this problem, but I do recommend that people writing papers do the following.
Check boring details: Check for incorrect references to figures, tables and so on (eg, Figure 4 when the right figure is Figure 5), incorrect citations (eg, Jones 2017 when the right citation is Jones 2018), and incorrect values in tables (eg, 0.27 where the correct value is 0.72). This kind of thing is pretty boring, but should not be difficult. And it is important; otherwise your reader may spend ages staring at Figure 4 trying to figure out why it is relevant to Section 3, or spend a lot of time reading Jones 2017 when this paper has nothing to do with the reader’s interests.
If you’ve written your paper in LaTeX, check the final typeset version; don’t just check the LaTeX source.
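For LaTeX papers, a small part of this check can be automated before you read the final version. The Python sketch below is only an illustration: the function name check_refs and the sample text are made up for this example, and the regular expressions assume plain \label, \ref, \autoref, \cref, \Cref and \eqref commands (they ignore commented-out lines and custom macros). It lists references to labels that do not exist, and labels that are never referenced. It cannot tell you that a reference points at the wrong figure, although a figure that is never referenced is sometimes a hint that some other reference is pointing at the wrong label.

```python
import re

# Assumed patterns: standard \label{...} and a few common reference commands.
LABEL_RE = re.compile(r"\\label\{([^}]+)\}")
REF_RE = re.compile(r"\\(?:ref|autoref|cref|Cref|eqref)\{([^}]+)\}")


def check_refs(tex_source: str) -> dict:
    """Return labels that are referenced but undefined, and defined but unreferenced."""
    labels = set(LABEL_RE.findall(tex_source))
    refs = set()
    for group in REF_RE.findall(tex_source):
        # \cref{a,b} may hold several comma-separated labels.
        refs.update(name.strip() for name in group.split(","))
    return {
        "undefined": sorted(refs - labels),
        "unreferenced": sorted(labels - refs),
    }


# Hypothetical paper fragment, used only to demonstrate the check.
sample = r"""
\section{Results}\label{sec:results}
Figure~\ref{fig:accuracy} shows accuracy; see also Table~\ref{tab:scores}.
\begin{figure}\caption{Accuracy}\label{fig:accuracy}\end{figure}
\begin{figure}\caption{Latency}\label{fig:latency}\end{figure}
"""

report = check_refs(sample)
print(report)
```

A script like this is a supplement, not a replacement: you still need to read the final version and confirm that each reference points at the figure or table you actually meant.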
Check formulas: Check that mathematical formulas are correct. This is harder than checking references and citations, but it still needs to be done. Otherwise your readers may get seriously confused (this has happened to me). If your formula is implemented in code, you should check what is written in your paper against what is in your code.
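One way to check a formula against code is to transcribe the formula from the paper into a separate function, without looking at the implementation, and compare the two on many inputs. The Python sketch below assumes a hypothetical paper that defines F1 as 2PR/(P+R) and hypothetical experiment code that computes it from counts of true positives, false positives and false negatives; all function names here are invented for illustration.

```python
import math
import random


def f1_as_in_paper(precision: float, recall: float) -> float:
    """Transcribed from the (hypothetical) paper: F1 = 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_as_in_code(tp: int, fp: int, fn: int) -> float:
    """Stand-in for the (hypothetical) implementation used in the experiments."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def formulas_agree(trials: int = 1000, seed: int = 0) -> bool:
    """Compare the two versions on random counts; False means they disagree."""
    rng = random.Random(seed)
    for _ in range(trials):
        tp, fp, fn = (rng.randint(0, 50) for _ in range(3))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        paper = f1_as_in_paper(precision, recall)
        code = f1_as_in_code(tp, fp, fn)
        if not math.isclose(paper, code, abs_tol=1e-12):
            return False
    return True


print(formulas_agree())
```

If the two versions disagree, either the paper or the code is wrong, and it is much better for you to find out which than for a reader to find out later.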
Check consistency: Check that you use words and graphical constructs (eg, arrows) consistently. This again is hard, because language is inherently ambiguous, and we’re very good at seeing the intended meaning of a word, which makes it hard to detect inconsistency and ambiguity! This is an area where it can be useful to get someone else to check your paper, provided this person has enough knowledge of the domain and terminology to detect consistency problems.
Get your co-authors to check your paper: Last but not least, get your co-authors to check the paper. Most papers in NLP and CS have multiple authors, and (at least in principle) all authors of a paper are responsible for its correctness and quality. I realise that some students have supervisors who insist on putting their name on the student’s paper without doing much to help write the paper. This is wrong, and feel free to refer such supervisors to my blog on this topic if you think this will help.
I realise that checking “boring details” of papers is, well, boring, and not nearly as much fun as doing experiments or writing up the amazing discoveries you have made. But it is important, if you want people to understand and build on your work!