
What are the Problems with Rule-Based NLG?

I have read many academic papers which start off by criticising rule-based NLG, often in a way which shows that the authors have no understanding of what is actually involved in building rule-based NLG systems. I’ve often complained about this in the past, so I thought I would try to be more constructive and explain what is involved (at a very high level) in building a rule-based NLG system, and what some of the challenges and problems are.

NLG Pipeline

Rule-based NLG systems usually break up the NLG task into different stages. For data-to-text NLG, these stages can include (Reiter 2007):

  • Signal analysis: Looking for patterns in the data, usually using standard signal analysis and pattern detection algorithms.
  • Data interpretation: Producing insights (messages) about the data; this can include causal and diagnostic reasoning.
  • Document planning: Selecting which insights to include in the generated text, and ordering them.
  • Microplanning: Deciding how insights should be expressed, eg which words to use.
  • Surface realisation: Generating grammatically correct sentences. This is usually done with an existing software package, such as simplenlg.
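The pipeline above can be sketched as a chain of functions. This is a toy illustration only: all function bodies, thresholds, and names below are invented placeholders, not part of any real system (a real realiser would be a package such as simplenlg, not a one-liner).

```python
# Invented sketch of the data-to-text pipeline stages above.

def signal_analysis(readings):
    """Detect a slow rise: temperature drifts up by a small margin."""
    rise = readings[-1] - readings[0]
    return {"slow_rise": 0 < rise <= 2.0}

def data_interpretation(patterns):
    """Turn detected patterns into insights (messages)."""
    insights = []
    if patterns.get("slow_rise"):
        insights.append({"type": "possible_infection", "importance": 8})
    return insights

def document_planning(insights):
    """Select important insights and order them, most important first."""
    kept = [i for i in insights if i["importance"] >= 5]
    return sorted(kept, key=lambda i: -i["importance"])

def microplanning(plan):
    """Choose wording for each insight."""
    wording = {"possible_infection": "temperature increase may indicate infection"}
    return [wording[i["type"]] for i in plan]

def surface_realisation(phrases):
    """Produce grammatical sentences (a real system would use e.g. simplenlg)."""
    return " ".join(p[0].upper() + p[1:] + "." for p in phrases)

text = surface_realisation(microplanning(document_planning(
    data_interpretation(signal_analysis([36.8, 37.1, 37.4, 37.9])))))
print(text)  # Temperature increase may indicate infection.
```

The point of the sketch is the stage boundaries, not the bodies: each stage consumes the previous stage's output, which is what lets rule authors work on (say) microplanning without touching data analysis.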

For example, if the NLG system is summarising medical data, processing might include the following:

  • Signal analysis: identify a slow rise in body temperature.
  • Data interpretation: infer insight that a slow rise in temperature could be a sign of infection.
  • Document planning: decide that the “possible infection” insight is important and needs to be included in the text.
  • Microplanning: decide to use the word “increase” to describe the rise in temperature.
  • Surface realisation: produce grammatical text such as “an increase” (instead of “a increase”).
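The “an increase” vs “a increase” decision in the last step is the kind of low-level grammar rule a realiser handles for you. A toy version of the determiner choice might look like the following; this is a rough heuristic for illustration, not simplenlg’s actual algorithm, which relies on a lexicon and exception lists (“an hour”, “a unit”).

```python
def indefinite_article(noun: str) -> str:
    """Toy heuristic: 'an' before a written vowel, 'a' otherwise.
    Real realisers use lexicons plus exception lists instead."""
    return "an" if noun[0].lower() in "aeiou" else "a"

print(indefinite_article("increase"), "increase")  # an increase
print(indefinite_article("rise"), "rise")          # a rise
```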

Of course there are variants. For example, signal analysis may not be necessary if the input dataset is simple and small, and data interpretation and document planning can be combined if we know exactly what insights users are interested in.

Anyways, when building a rule-based NLG system based on this pipeline, we usually need:

  • Rules for creating insights (data interpretation).
  • Rules for selecting and ordering insights (document planning).
  • Rules for expressing insights (microplanning).

We usually don’t need rules for signal analysis, although we may tune or train pattern detectors. Likewise we usually don’t need rules for surface realisation, although we may need to add lexicons (dictionaries) to cover domain terminology.
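In practice the three kinds of rules above are often authored declaratively (as data that domain experts can review) rather than hardcoded. A minimal sketch of insight-creation rules in this style, with every name and threshold invented for illustration:

```python
# Invented sketch: insight-creation rules as (condition, insight) records,
# so they can be inspected and extended without touching pipeline code.
RULES = [
    {"name": "possible_infection",
     "condition": lambda d: d["temp_trend"] == "slow_rise",
     "importance": 8},
    {"name": "fever",
     "condition": lambda d: d["temp_max"] >= 38.0,
     "importance": 9},
]

def create_insights(data):
    """Fire every rule whose condition holds for this patient's data."""
    return [r["name"] for r in RULES if r["condition"](data)]

print(create_insights({"temp_trend": "slow_rise", "temp_max": 37.9}))
# ['possible_infection']
```

Keeping rules as reviewable records like this is also what makes the expert-validation step described below practical.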

Rule-building process

Oversimplifying, there are three sources of rules: corpora, domain experts, and users.

Corpora: I usually recommend that people building rule-based NLG systems start by (manually) analysing a corpus of human-written texts (and underlying input data). At a high level, this process could include

  • Creating insights: Look for insights that appear in the corpus texts, and create rules for them.
  • Selecting/ordering insights: Analyse texts to understand when specific insights appear, and in which order.
  • Expressing insights: Create rules based on how insights are expressed in the corpus texts.

Domain experts: After an initial rule set (and ideally a prototype) has been created from a corpus, the next step is to ask domain experts whether the rules/prototype are correct, and to fill in gaps. There will inevitably be edge cases, and unusual situations more generally, which are not present in the corpus, and domain experts can advise on how to deal with these.

Users: The result of working with domain experts should be a refined prototype. The next step is to show the prototype to actual users and get their feedback. This is important because sometimes corpora contain texts which users dislike, and domain experts may have a limited understanding of what users actually want to see in an NLG text.

Of course the above process can iterate, and indeed be agile in a software-engineering sense; it does not need to be a “waterfall” process.

Problems: Corpus

So what are the problems in building a rule-based NLG system using the above process? The first set of issues arises with corpora.

Small or non-existent corpus: In most applied NLG projects I have worked on, it was impossible to get a good existing corpus of human-written texts. Often we ended up with a small set of examples (fewer than 10) which were written by clients or colleagues at our request, to help us build the NLG system.

Corpus doesn’t match data: Human-written texts often include extra information which is not available in the input data and hence should not be generated by the NLG system. For example, a human-written sports story may include information about player injuries which is known to the human sports-writer but not present in the NLG system’s input data.

Corpus texts aren’t very good: Sometimes texts in a corpus are not very good, ie they are not perceived as useful and easy to read by end users. This is more likely if corpus texts are written under time pressure by people who are not skilled writers.

Corpus too large: In a few cases (this is less common), the corpus is too large to manually analyse. The kind of manual analysis described above takes time, and becomes infeasible for a corpus containing thousands or millions of texts.

Machine learning perspective: From an ML perspective, a small corpus is also a problem. Techniques such as fine-tuning reduce the need for a large corpus, but it is still the case that manual analysis can usually extract more “value” from a very small corpus than ML. On the other hand, a large corpus is not a problem; in fact it’s what we want for ML!

Corpus quality issues also impact ML. Not matching the data is a big problem, because when a neural NLG system is trained on a corpus which includes insights that go beyond the data, it usually tries to generate these insights regardless, which leads to hallucination. Poor corpus texts are also a problem, since an ML system will try to replicate what it sees in the corpus even if users dislike this.

More generally, the rule-based approach can compensate for corpus problems by relying more on domain experts and users; this is not possible with neural approaches.

Problem: Complex narratives

Writing a rule-based NLG system becomes painful when the texts need to communicate many diverse insights, and handle edge cases well. Authors basically need to write rules for each insight and edge case, and then (the real challenge) try to ensure that these rules fit together, in the sense that a text which communicates several insights comes across as a well-written and coherent narrative no matter which insights are selected and which edge cases apply. It’s easy to get a combinatorial explosion and a huge number of scenarios which authors need to consider.

For example, if an NLG system is describing a sports event, it can communicate insights about individual events, players, teams, and the league. Each category includes a diverse range of insights; for example at the team level we can have insights about winning streaks, season record, results of previous matches with the opposing team, etc. Expressing these insights individually may be straightforward, although handling edge cases can be a pain. But robustly tying a sizable set of diverse insights together into a good story can be hard to do in a rule-based system, especially if different stories communicate different insights.
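A back-of-envelope count shows how quickly the scenarios multiply. The numbers below are invented for illustration, not taken from any particular system:

```python
from math import comb

n_insights = 20      # candidate insight types (assumed)
per_text = 5         # insights selected for one story (assumed)
edge_variants = 3    # edge-case variants per insight (assumed)

# Distinct insight subsets a story might communicate:
selections = comb(n_insights, per_text)              # 15504
# Multiply by edge-case variants of each selected insight:
scenarios = selections * edge_variants ** per_text   # 3767472

print(selections, scenarios)
```

Even with these modest assumed numbers, millions of scenario combinations exist, which is why rule authors cannot enumerate them and must instead write rules that compose gracefully.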

Machine learning perspective: It would be great if neural NLG systems could reliably generate high-quality narratives which integrate a substantial set of diverse insights into a good story. Unfortunately most neural data-to-text work seems to focus on generating short texts (1 or 2 sentences) which communicate a small number of insights.

Problem: Maintenance

When people start using an NLG system, they inevitably want changes made to the generated texts, in all of the above aspects: additional insights, different prioritisation and selection of insights, different expression of insights. Like any software system, NLG systems evolve over time as the world changes and as more people start to use the system.

Maintaining a complex rule-based NLG system can be hard, in similar ways to maintaining software systems more generally. It can be difficult to figure out which rules need to change and (even worse) changing a rule may unexpectedly break another rule. Having a well-structured and organised set of rules makes a big difference.

Machine learning perspective: It’s not clear to me how neural systems are maintained, even in principle; I don’t think I’ve ever seen this discussed in the literature.


I started this blog post by complaining about uninformed criticism of rule-based NLG in academic papers. I will finish with the following rules for paper authors:

IF your paper addresses working with inadequate corpora, generating complex narratives that tie together a diverse set of insights, maintaining NLG systems, or other real challenges of rule-based NLG

THEN you are welcome to say that these are major problems for rule-based NLG.

ELSE do not criticise rule-based NLG or claim that what you are doing is not possible in rule-based NLG.
