Texts produced by NLG systems can be evaluated in terms of accuracy (content is correct), fluency (text is readable), and utility (text is useful). I discuss these three “dimensions” of NLG evaluation.
I’ve been shocked by the fact that many neural NLG researchers don’t seem to care that their systems produce texts containing many factual mistakes and hallucinations. NLG users expect accurate texts, and will not use systems which produce inaccurate texts, no matter how well those texts are written.
I am getting so fed up with UK politics that I will break my “no politics” rule in this blog and express my frustration with the Brexit mess and the way politicians have handled it.
I’m impressed by the diversity of NLG researchers at Cape Town, from a gender, race, and disability perspective. An inspiration to the rest of us!
When we try to use ML in commercial NLG contexts, one of the challenges is that NLG developers want to be able to customise, configure, and control their systems. So we need ML approaches which do not stop developers from configuring the things they are likely to want to change.
I’m beginning to think that in some ways the NLP community *encourages* researchers to use poor-quality or otherwise inappropriate data sets. Which is a truly depressing thought…
Some thoughts on key NLG challenges in explainable AI: evaluation, conceptual alignment, narrative. Comments are welcome!