Accuracy errors in NLG texts go far beyond simple factual mistakes, for example they also include misleading use of words and incorrect context/discourse inferences. All of these types of errors are unacceptable in most data-to-text NLG use cases.
We’re thinking of organising a shared task on evaluating the accuracy of texts produced by NLG systems. Comments welcome, also let me know if you might participate.
I’ve been shocked by the fact that many neural NLG researchers dont seem to care that their systems produce texts which contain many factual mistakes and hallucinations. NLG users expect accurate texts, and will not use systems which produce inaccurate texts, not matter how well the texts are written,