Language is diverse!

I am working on a new NLG component for Arria, and recently I talked to one of Arria’s product groups about using this component. They gave me some scenarios and data sets, and I realised that my component did not process these correctly, because the language in their domain was different from the language/corpora I had used to develop and train my system. I can fix this, but the episode reminded me of the importance of taking domain/genre diversity into account when developing NLG components.

I think this kind of thing happens a lot. We all realise that language is diverse, and that different words, symbols and syntactic constructs are used in different domains. But it’s all too easy, especially for researchers who primarily work on a particular data set or shared task (as many do), to focus on and optimise for this data set, and not really think about whether the techniques would work in a different genre.

So in this spirit, I thought I would show some texts from different genres (NLG outputs, not corpus texts) I have worked on over the years, to emphasise the point that these are very different!


Media stories

Below is an extract from an NLG-generated election text published by the BBC.

Florence Eshalomi has been elected MP for Vauxhall, meaning that the Labour Party holds the seat with a decreased majority.

The new MP beat Liberal Democrat Sarah Lewis by 19,612 votes. This was fewer than Kate Hoey’s 20,250-vote majority in the 2017 general election.

Sarah Bool of the Conservative Party came third and the Green Party’s Jacqueline Bond came fourth.

We can see that such texts include well-written sentences with relatively simple vocabulary, structured into topical paragraphs.

Sports stories

Below is an extract from a sports story produced by a neural NLG system. It has many factual errors, but for my purposes here let’s ignore this and just look at the language.

The only other Raptor to reach double figures in points was Dwyane Dragic, who came off the bench for 22 points (9-17 FG, 3-7 3Pt, 3-3 FT), six rebounds and five assists.

As with the media stories, we have well-written sentences structured into paragraphs. The vocabulary is more domain-specific, eg general readers may not understand what “off the bench” or “assist” mean. The texts also present numbers in a domain-specific manner, eg “(9-17 FG, 3-7 3Pt, 3-3 FT)”.

Weather forecasts

Below is an extract from a marine weather forecast produced by the SumTime system.

WIND(KTS) 10 M W 8–13 backing SW by mid afternoon and S 10–15 by midnight.
WIND(KTS) 50 M W 10–15 backing SW by mid afternoon and S 13–18 by midnight.
WAVES(M) SIG HT 0.5–1.0 mainly SW swell.
WAVES(M) MAX HT 1.0–1.5 mainly SW swell falling 1.0 or less mainly SSW swell by afternoon,
then rising 1.0–1.5 by midnight

We can see that these texts are not conventional sentences; they follow their own “weatherese” grammar. They are also structured as table entries, not as paragraphs.

Medical decision support

Below is an extract from a text produced by the Babytalk BT45 system, to help a doctor decide how to treat a baby in a neonatal intensive care unit.

By 14:27 there had been 2 successive desaturations down to 56. As a result, Fraction of Inspired Oxygen (FIO2) was set to 45%. Over the next 20 minutes T2 decreased to 32.9. A heel prick was taken. Previously the spo2 sensor had been re-sited.
At 14:31 FIO2 was lowered to 25%. Previously TcPO2 had decreased to 8.4. Over the next 20 minutes HR decreased to 153

Here the sentences follow conventional English syntax, but the language is very technical, with specialised medical vocabulary. This text is also organised as a timeline.

Summary of doctor-patient consultation

Below is an extract from a summary of a doctor-patient consultation.

3/7 hx of diarrhea, mainly watery.

No blood in stool. Opening bowels x6/day.

Associated LLQ pain – crampy, intermittent, nil radiation.

In this example, we see both highly specialised vocabulary and non-standard syntax.


I could go on and give examples from chatbots, patient information systems, educational feedback reports, business intelligence texts, etc, but hopefully the above make the point that the language in different NLG applications can be very different! In particular:

  • Syntax: can be normal English, but can also be genre-specific (eg, weather reports) or highly abbreviated (consultation summaries).
  • Vocabulary: can be generic (media), involve some domain-specific terminology (sports), or be very technical and specialised (clinical).
  • Numbers: many domains have specialised ways of presenting numbers, including sports, weather, and consultation summaries.
  • Document structure: possibilities include topical paragraphs (news stories), tables (marine weather), and timelines (clinical).

So going back to my original point, if we develop an NLG component based on one of the above domains, it may not work well on the others even if it does brilliantly on our training data. For example, a component developed from newswire data may not work well for weather forecasts or consultation summaries.
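To make this concrete, here is a small Python sketch of the kind of failure I mean. It is a toy, not the component I am building for Arria: `extract_numbers` and its regular expression are hypothetical, and simply assume that numbers look the way they do in news text (integers with optional thousands separators). The sample strings are taken from the extracts above.

```python
import re

# Hypothetical toy pattern "developed" on news text: integers with
# optional thousands separators (e.g. 19,612). Not a real component.
NEWS_NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+|\d+")


def extract_numbers(text):
    """Return every substring that looks like a news-style number."""
    return NEWS_NUMBER.findall(text)


# Short samples copied from the extracts in this post.
SAMPLES = {
    "news": "The new MP beat Liberal Democrat Sarah Lewis by 19,612 votes. "
            "This was fewer than Kate Hoey's 20,250-vote majority "
            "in the 2017 general election.",
    "sports": "came off the bench for 22 points (9-17 FG, 3-7 3Pt, 3-3 FT)",
    "consultation": "3/7 hx of diarrhea, mainly watery. Opening bowels x6/day.",
}

if __name__ == "__main__":
    # Report results per genre instead of a single aggregate figure.
    for genre, text in SAMPLES.items():
        print(genre, extract_numbers(text))
```

On the news sample this picks out 19,612, 20,250 and 2017, which is what we want. On the sports sample it flattens the shooting figures into a list of unrelated integers (and picks up the 3 in “3Pt”), and on the consultation sample it splits “3/7”, which in UK clinical shorthand typically means three days, into a 3 and a 7. Nothing crashes; the output is just quietly wrong outside the genre the pattern was built for.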

So if our goal is to develop generic NLG technology, we need to keep the above diversity in mind, and ensure that our systems are developed and tested on many different domains and genres!
