Last week someone asked me for advice about a setup where an NLG system produced a draft text, which would then be edited by a human domain expert before being released; this is what the MT community calls post-editing. They also wanted to know whether the NLG system could automatically learn from the human post-edits.
This kind of thing is often suggested for contexts where an NLG system produces texts which are mostly OK, but occasionally need fixing before being released to users (for example, because handling of edge cases is limited).
I’ll first review some of the literature on the topic, and then conclude with some practical advice.
Post-editing weather forecasts
In the early 2000s, a meteorological company used our SumTime weather-forecast generator to generate draft forecasts, which were post-edited by human meteorologists before they were sent to clients. We analysed the post-edits, and reported the results in a paper (https://www.aclweb.org/anthology/W05-1615/ ). Perhaps the most important finding was that the post-edits were very variable. Some forecasters just fixed mistakes in edge cases, but others rewrote large numbers of forecasts to be in the style which they personally preferred. Some of the post-edits probably *decreased* text quality for users, which may reflect the fact that the computer system often had a better understanding of appropriate word choice than some of the human forecasters.
In short, the post-edits were very “noisy”. Some were valuable, especially for appropriate handling of edge cases. Others seemed less useful (eg, idiosyncratic stylistic edits), and a few probably made the texts worse. This meant that
- Manually analysing the post-edits gave us useful insights into how to improve the NLG system.
- The post-editing process was very wasteful, since I believe that most of the edits did not significantly improve the texts.
- Automatically learning from the post-edits would be challenging because of the noise.
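To give a flavour of what "manually analysing the post-edits" involves, here is a minimal sketch (my own illustration, not the analysis pipeline we actually used) that quantifies how heavily each forecast was edited, using word-level similarity from Python's standard `difflib`. The forecast strings are made-up examples; a low ratio flags a wholesale rewrite, a high ratio a small fix.

```python
import difflib

def edit_ratio(draft, post_edited):
    """Word-level similarity between draft and post-edited text (1.0 = unchanged)."""
    matcher = difflib.SequenceMatcher(None, draft.split(), post_edited.split())
    return matcher.ratio()

# Hypothetical draft/post-edit pairs: a small edge-case fix vs a stylistic rewrite.
pairs = [
    ("SSE 10-15 increasing 20-25 by evening",
     "SSE 10-15 increasing 20-25 by early evening"),
    ("SSE 10-15 increasing 20-25 by evening",
     "S-SSE winds of 10 to 15 knots, rising later"),
]
for draft, edited in pairs:
    print(round(edit_ratio(draft, edited), 2))
```

Sorting edits by a score like this is one way to separate the rare substantive corrections from the bulk of stylistic churn before reading them by hand.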
I did wonder at the time whether a better approach would be for the NLG system itself to select texts for post-editing (which I believe happens in some MT systems). I.e., the NLG system would assign a confidence score to each of its texts, and only texts with low confidence (which typically means unusual edge cases) would be sent to human domain experts for post-editing. We didn't try this, though.
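The routing idea is simple enough to sketch. The code below is purely illustrative (we never built this, and the 0.8 threshold is an arbitrary assumption that would need tuning against real confidence scores): high-confidence texts are released directly, low-confidence ones are queued for a human post-editor.

```python
CONFIDENCE_THRESHOLD = 0.8  # assumed cut-off; would need tuning on real data

def route(texts_with_confidence, threshold=CONFIDENCE_THRESHOLD):
    """Split (text, confidence) pairs into released texts and texts for human review."""
    release, review = [], []
    for text, confidence in texts_with_confidence:
        if confidence >= threshold:
            release.append(text)
        else:
            review.append(text)  # likely an unusual edge case
    return release, review

# Hypothetical generated forecasts with system confidence scores.
drafts = [("Forecast A: SSE 10-15", 0.95), ("Forecast B: variable 3 or less", 0.42)]
released, for_review = route(drafts)
```

The attraction is that post-editing effort is concentrated on the edge cases where it actually adds value, instead of being spread across texts that were already fine.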
I’m not aware of much other published research on post-editing NLG texts. A PhD student I am helping, Allmin Susaiyah, is looking at using feedback from users to improve NLG texts, which is related to post-editing. Allmin is focusing on content selection; i.e., he asks users to indicate which content is useful, and uses this feedback to improve content-selection models. This is work in progress, so not much has been written up so far, but there is a bit in http://ceur-ws.org/Vol-2596/paper9.pdf.
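As a toy illustration of this kind of feedback loop (my own sketch, not Allmin's actual models), one could keep a per-content-type usefulness rate and prefer the types users mark as useful. The class names and the Laplace smoothing are my assumptions.

```python
from collections import defaultdict

class ContentSelector:
    """Toy content-selection model updated from binary user feedback."""

    def __init__(self):
        self.useful = defaultdict(int)  # times users marked this content type useful
        self.shown = defaultdict(int)   # times this content type was shown

    def feedback(self, content_type, was_useful):
        self.shown[content_type] += 1
        if was_useful:
            self.useful[content_type] += 1

    def score(self, content_type):
        # Laplace-smoothed usefulness rate, so unseen types aren't ruled out.
        return (self.useful[content_type] + 1) / (self.shown[content_type] + 2)

    def select(self, candidates, k=2):
        """Pick the k candidate content types with the best usefulness record."""
        return sorted(candidates, key=self.score, reverse=True)[:k]
```

Even in this crude form, the model shares the key property of learning from feedback rather than from post-edits: the signal is a direct judgement of usefulness, so it avoids the noise problem described above.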
Another student (Craig Thomson) and I are looking at the amount of effort required for a person to check an NLG text for accuracy. What we are finding is that this takes a considerable amount of effort, far more than fixing grammatical and other linguistic mistakes.
The Machine Translation community has looked at human post-editing of MT texts for decades. The Wikipedia article is a bit dated but describes the basic concepts. Basically, post-editing makes sense in some cases, but in others it's faster for the human domain expert to translate from scratch than to post-edit the MT output.
The MT community is also looking at learning from post-edits, see for example https://www.aclweb.org/anthology/W18-6452/ . This work seems to suggest that learning from post-edits is more useful for improving the output of rule-based and phrase-based MT systems, and less useful for improving neural MT systems.
As I have mentioned in other blogs, many years ago the Scottish commercial AI pioneer Rob Milne told me that the best way to use AI commercially was to fully automate “boring” tasks that people preferred not to do. From this perspective (and also bearing in mind that fact-checking NLG texts is very time-consuming), I suspect that if we want humans and NLG systems to jointly create a text, the best approach may be for the NLG system to create the “straightforward” content in a manner that is guaranteed to be readable and accurate (so the human doesn’t need to fact-check or revise wording), and for the person to add the more complex content.