Linguists make a distinction between semantics and pragmatics. In an NLG context, semantics is the information explicitly communicated by a text, while pragmatics is information which users infer from a text and its context. For example, consider the text below, which is based on the results of a basketball game.
The Memphis Grizzlies defeated the Phoenix Suns. Mike Conley led the Grizzlies with 18 points, and Isaiah Thomas scored 15 points.
“Mike Conley led the Grizzlies with 18 points” is semantically incorrect, because in this game Conley in fact scored 24 points. “Isaiah Thomas scored 15 points” is pragmatically incorrect, because while Thomas did indeed score 15 points (so this statement is semantically correct), the context of this phrase implies that Thomas played for the Grizzlies, when in fact he played for the Suns.
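To make the distinction concrete, here is a toy sketch (the data structure and function name are invented for illustration, not from any real system) of a fact-checker that verifies claimed point totals against structured game data. It catches the semantic error (“18 points” vs the actual 24), but it happily accepts the claim about Thomas, because his point total is literally correct; the checker never looks at the surrounding discourse, so the pragmatic error passes.

```python
# Hypothetical structured data for the game (invented for illustration).
game_data = {
    "Mike Conley": {"team": "Grizzlies", "points": 24},
    "Isaiah Thomas": {"team": "Suns", "points": 15},
}

def check_points_claim(player: str, claimed_points: int) -> bool:
    """Semantic check only: does the claimed total match the source data?"""
    return game_data[player]["points"] == claimed_points

# "Mike Conley led the Grizzlies with 18 points" -> semantically wrong.
assert check_points_claim("Mike Conley", 18) is False

# "Isaiah Thomas scored 15 points" -> semantically correct, so the check
# passes, even though the context wrongly implies Thomas is a Grizzly.
assert check_points_claim("Isaiah Thomas", 15) is True
```

Detecting the pragmatic problem would require reasoning about what the sentence's position in the text implies, which this kind of lookup cannot do.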
To take another example, when we generated clinical summaries for doctors in the Babytalk project years ago, it was clear that doctors often interpreted statements such as “The nurse gave the baby morphine. The baby’s core temperature increased” as causal, ie that the morphine caused the temperature increase. If there was no causal link, then such a statement was pragmatically misleading even if it was semantically correct (ie, the nurse did give the baby morphine, and the baby’s temperature did go up), and doctors were concerned about such incorrect inferences.
In short, when people read a sentence, they assume that it must fit into the narrative being told by the document as a whole, and in particular probably has a relation to the previous sentence. This leads to pragmatic inferences such as the ones mentioned above.
It is of course essential that NLG texts be accurate, and this means pragmatic correctness as well as semantic accuracy. A text which is literally/semantically true but nonetheless misleads the reader because of pragmatic problems is not acceptable.
I mention this because I am seeing a growing amount of work on improving accuracy in NLG, and also on evaluating accuracy, but most of this focuses on simple semantic accuracy (“24 points” instead of “18 points”). Of course semantic accuracy is very important, but we need to enhance pragmatic correctness as well. I suspect that pragmatic correctness is going to be more challenging than semantic accuracy!
My focus here is on data-to-text, but I have seen similar problems in text-to-text. In extractive summarisation, for example, summaries are always semantically accurate (if the source was accurate), but they can still be pragmatically misleading.
Evaluating pragmatic accuracy is hard
Pragmatic correctness can be evaluated using careful human evaluation. But from a metric perspective, while there are some metrics that can evaluate semantic accuracy to some degree, current metrics are not able to evaluate pragmatic correctness. Craig Thomson and I ran a shared task on evaluating accuracy, and the best performing metric (Kasner et al 2021) spotted 75% of simple semantic errors (numbers and named entities) and 50% of more complex semantic errors (inappropriate use of a word), but was not able to detect any pragmatic “context” errors. Indeed, none of the metrics submitted to the shared task could detect context errors. The metrics also struggled to detect “incorrect word” errors which included a contextual component, such as usage of “only other” (eg, “The only other Net to reach double figures in points was Ben McLemore”); note that “only other” has a contextual element because it can only be used when another player with this characteristic has already been mentioned.
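To illustrate why “only other” errors require discourse context, here is a toy check (entirely invented for illustration; it is not one of the shared-task metrics): “only other” in a sentence is only licensed if an earlier sentence already mentioned a player with the same property. The toy version just looks for an earlier double-figures mention; a real checker would need to resolve which players actually share the property.

```python
import re

def only_other_is_licensed(sentences: list[str], index: int) -> bool:
    """Return True if 'only other' in sentence `index` is supported by
    the preceding discourse (ie, an earlier sentence mentions another
    double-figure scorer). Toy heuristic, for illustration only."""
    if "only other" not in sentences[index].lower():
        return True  # nothing to license
    return any(
        "double figures" in s.lower() or re.search(r"\b\d{2} points\b", s)
        for s in sentences[:index]
    )

summary = [
    "The only other Net to reach double figures in points was Ben McLemore.",
]
# No earlier sentence mentions a double-figure scorer, so "only other"
# is contextually unsupported here.
assert only_other_is_licensed(summary, 0) is False
```

The point is that the correctness of the phrase depends not on the sentence itself but on what has already been said; a metric that scores sentences in isolation cannot detect this class of error.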
Because pragmatic correctness is difficult and expensive to measure, very few researchers do this, which is concerning: if we want to build NLG systems which are useful in the real world, these systems cannot mislead people through inappropriate pragmatics!
We don't know how to generate pragmatically accurate texts
One aspect of pragmatics that has been investigated in depth in NLG is the generation of referring expressions such as “it” or “the big black dog”. We have had decades of research in NLG on referring expressions, including rule-based, neural, and psycholinguistically motivated approaches. Indeed, my most cited journal paper is about generating referring expressions.
However, referring expressions are just one aspect of pragmatics. There has been a bit of work on other types of pragmatics in rule-based NLG, but as far as I can tell pragmatic correctness has been completely ignored by the neural NLG community (except for generation of referring expressions).
I raise this issue because I’d love to see more researchers work on pragmatics, not because I have answers. The human evaluation protocol which Craig Thomson and I developed at least makes it possible to identify pragmatic problems, which I hope will give other researchers a starting point. But we need to create techniques for generating pragmatically accurate texts, as well as cheaper ways of finding pragmatic problems!