When I talk to people who use NLG systems, what they usually care about most is content: they want texts to communicate interesting, useful, and accurate information. But I see little work on the content side of NLG in the research community, which is a shame, because distilling useful insights out of complex data is absolutely essential to achieving my vision of using NLG to humanise data and AI.
Content is more important (and harder) than language
In order for an NLG text to be useful to a reader, it needs to communicate useful information in an understandable way. Texts which do not communicate information are not useful, and neither are texts which are impossible to understand.
Once we have reached the minimum thresholds for useful content and understandable language, in my experience NLG users usually place more importance on better content than on better language. In other words, let’s say users have a choice between:
1. A text which is awkwardly written but understandable, and communicates a lot of useful information
2. A text which is very well written but communicates a minimum of useful information
In my experience, at least with data-to-text systems, (1) is preferable to (2) in almost all of the use cases I have worked in. Of course, ideally we want both well-written language and insightful content! The point I’m making is just that, once we have passed the minimum requirement for understandability, users want better content more than they want better language.
A perhaps related point: in the vast majority of the applied NLG projects I have worked on, we spent more time building and tuning algorithms for content generation than we did on algorithms for producing high-quality language. This was partially, of course, because content was the part of the NLG system which our users cared about the most!
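To make the distinction concrete, below is a minimal sketch of the kind of content-determination step I mean. All of the function names, thresholds, and data are invented for illustration; this is not code from any real system.

```python
# Toy sketch: content determination separated from surface realisation.
# Names, thresholds, and data are invented for illustration.

def determine_content(readings, spike_threshold=10.0):
    """Scan (hour, value) readings for noteworthy events (the hard part)."""
    messages = []
    for (_, v1), (h2, v2) in zip(readings, readings[1:]):
        if abs(v2 - v1) >= spike_threshold:
            direction = "rose" if v2 > v1 else "fell"
            messages.append(("spike", h2, direction, abs(v2 - v1)))
    return messages

def realise(message):
    """Word a message with a simple template (often the easier part)."""
    _kind, hour, direction, size = message
    return f"The value {direction} sharply (by {size:.1f}) at hour {hour}."

readings = [(0, 20.0), (1, 21.0), (2, 35.0), (3, 34.0)]
for m in determine_content(readings):
    print(realise(m))
```

In real projects, of course, deciding what counts as "noteworthy" (and tuning thresholds like the one above) is where most of the effort goes.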
I know that current NLG shared tasks (such as GEM) focus on tasks where content generation and selection are usually very straightforward, such as E2E and WebNLG. This may be OK for research purposes, but people should keep in mind that such tasks are not representative of real-world NLG (at least in my experience). Even for research purposes, I for one would like to see shared tasks which include significant content generation, so that people become more aware of the challenges of content determination.
Articulate analytics
It’s also worth noting that we need content which is easily expressible in language, and perhaps content which is better expressed in words than in graphs. So we cannot just plug a random data-analysis module into an NLG system and expect good content; we need to have some idea of what content works well in NLG texts.
This is sometimes called “articulate analytics”. Yaji Sripada and I wrote a paper about this almost 20 years ago, for KDD 2003, in which we tried to use the Gricean maxims to help define which analytics are articulate in the above sense. I’ve seen little such work recently, which is a shame, because an understanding of articulate analytics is hugely important in data-to-text. Perhaps I’m being too cynical, but I wonder if part of the problem is that understanding which analytics work best in NLG texts is fundamentally an HCI (human-computer interaction) question as much as (or more than?) a computational-linguistics one, and NLP researchers seem less interested in, and aware of, HCI in 2021 than they were in 2003.
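As a toy illustration of the idea, and emphatically not the scheme from the KDD 2003 paper, the snippet below scores candidate insights by two Gricean-style criteria: relevance to the reader’s task (maxim of relation) and how concisely the insight can be stated in words (maxims of quantity and manner). The candidate insights, relevance scores, and weighting are all invented.

```python
# Toy "articulateness" filter, loosely inspired by the Gricean maxims.
# The candidates, relevance scores, and scoring formula are invented.

CANDIDATES = [
    # (insight, relevance to reader's task 0-1, words needed to state it)
    ("Rainfall was double the seasonal average", 0.9, 6),
    ("The spectrum of the signal peaks at 0.23 Hz", 0.3, 9),
    ("Pressure dropped steadily overnight", 0.8, 4),
]

def articulateness(relevance, n_words, max_words=12):
    """Prefer relevant insights (relation) which are concise in words
    (quantity/manner); long-winded insights score close to zero."""
    brevity = max(0.0, 1.0 - n_words / max_words)
    return relevance * brevity

for text, rel, n in sorted(CANDIDATES,
                           key=lambda c: articulateness(c[1], c[2]),
                           reverse=True):
    print(f"{articulateness(rel, n):.2f}  {text}")
```

Note how the spectral insight, which is arguably better shown in a graph than said in words, is pushed to the bottom by this kind of filter.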
We need better insights!
I guess the “bottom line” is that I am disappointed to see so little research on “articulate analytics”, or indeed on content creation more generally, in NLG. As mentioned above, from a pragmatic perspective this is very important, and absolutely essential to achieving my vision of using NLG to humanise data and AI. From a scientific perspective, there are lots of very interesting research questions here, many of which sit on the border between NLP and HCI. So let’s see more research in this area!
Good points, Ehud! I think the key question is what counts as “useful information”, and how an NLG system can make a decision about that.
I would posit that information is useful if and only if it leads to correct decisions when plugged into a decision-making function.
Plugging the entire dataset into our decision-making brains isn’t useful, because we miss things. There are also a lot of true statements you can make about a dataset which yield incorrect decisions (e.g. correctly observing that the AC is running full blast in hot rooms but turned off in cold rooms might lead to the incorrect decision to turn the AC off in order to cool a room down).
The challenge is that there are a lot of scenarios where the decision-making function isn’t well defined, and it is therefore unclear what information will trigger correct decisions.
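To make this concrete, here is a toy formalisation of the point (the reader model and messages are invented for illustration): a true statement counts as useful only if the reader’s decision function, given that statement, returns the correct action.

```python
# Toy formalisation: information is useful iff it triggers the correct
# decision. The reader model and messages are invented for illustration.

def naive_reader_decision(message):
    """A simplistic reader who mistakes correlation for causation."""
    if message == "the AC is off in cold rooms":
        return "turn AC off"   # wrong inference from a true statement
    if message == "the AC cools rooms down":
        return "turn AC on"
    return "do nothing"

CORRECT_ACTION = "turn AC on"  # suppose the room is hot

for msg in ["the AC is off in cold rooms", "the AC cools rooms down"]:
    verdict = "useful" if naive_reader_decision(msg) == CORRECT_ACTION else "not useful"
    print(f"{msg!r}: {verdict}")
```

Both statements are true of the data, but under this definition only the second is useful, because only it leads the reader to the correct decision.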
I personally don’t think usefulness is the only criterion by which NLG can be judged; we could instead demand entertainment, for example. If we took a videogame and recorded the game state, we might want an NLG system to generate a maximally entertaining account of the game events.