Embedding Machine Learning in a Rules-Based NLG System

At Aberdeen University, we have an NLP reading group which meets 2-3 times a month, usually to discuss a research paper (sometimes we have presentations by visitors, or general discussions not linked to a paper).  It's a nice group; we usually get some people from Arria as well as from the university.  It's open to all, so contact me if you're in the Aberdeen area and want to attend.  I have recently started keeping a record of papers discussed on this website, at Aberdeen NLG/NLP Reading Group: Papers Read

Anyway, last week we discussed A Statistical, Grammar-Based Approach to Microplanning by Gardent and Perez-Beltrachini.  Putting aside the details of this paper, one thing I really liked was the high-level approach: a grammatical/rules framework defines a space of sensible alternatives (which is not huge), and ML is then used essentially to choose among these alternatives.  One of the people in the reading group, who has a background in KPML, commented that it felt like using ML to build choosers in a systemic functional grammar.

Combining Rules and Learning

I am a strong believer in “hybrid” architectures that combine rules and learning.  If we think of NLG as a decision-making process, there will always be some simple high-value decisions, which should be made by rules.  A learning approach to such decisions would be risky, and would seriously inflate quality assurance and testing costs (and for many NLG systems, QA/test is more expensive than writing the code).  On the other hand, we also usually have complex low-value decisions, which individually are not very important but collectively make a difference.  It's not cost-effective to write rules for all of these, so learning definitely makes sense here.

To give a concrete example, consider verb choice in weather forecasts.  A colleague once showed me an ML-based forecast generator which produced “winds 8-12 decreasing to 16-20”.  This is completely unacceptable: we cannot use the verb “decreasing” when the wind speed is going up!  Using a verb which correctly communicates the direction of change is an example of a simple high-value decision (“high-value” because getting this wrong will destroy readers' confidence in the forecasts), and it should be explicitly specified in a rule.
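As a minimal sketch of what such a rule might look like (the function name and signature are my own illustration, not from any real forecast generator):

```python
def change_verb(old_speed, new_speed):
    """Rule: choose a verb that correctly communicates the direction of change.

    This is the simple high-value decision made explicitly by a rule,
    so the system can never say "decreasing" for a rising wind.
    """
    if new_speed > old_speed:
        return "increasing"
    elif new_speed < old_speed:
        return "decreasing"
    else:
        return "steady"

# change_verb(10, 18) -> "increasing", never "decreasing"
```

A handful of lines like this are cheap to write and trivially easy to test, which is exactly why a rule beats learning for this kind of decision.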

So we need to use a verb which communicates that the direction of change is up; but should we say “increasing” or “rising”?  This is an example of a complex low-value decision.  It's complex because it depends on context, especially the previous phrase.  For example, consider the statements below:

  1. “wind speed 12-16 decreasing 8-12 then increasing 16-20”
  2. “wind speed 12-16 decreasing 8-12 then rising 16-20”
  3. “temperature increasing to 20 and wind speed increasing to 16-20”
  4. “temperature increasing to 20 and wind speed rising to 16-20”

(1) is a bit better than (2), because increase/decrease are a pair, so this expresses the contrast a bit better.  However, (4) is a bit better than (3), because it's good to vary the verb used to express an upward trend.  But anyway, although some of these choices are a bit better than others, none of them are catastrophic; so this is a “low-value” rather than a “high-value” choice, and learning may be an attractive option.
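The division of labour above can be sketched as follows.  The rule fixes the set of legal options (the direction must be up), and a learned model chooses among them from context features.  This is only an illustration: the feature names are invented, and the scoring function is a hand-written stand-in where a trained classifier would actually sit.

```python
# Rule: only verbs that communicate an upward change are legal options.
UPWARD_VERBS = ["increasing", "rising"]

def score(verb, context):
    """Stand-in for a learned scoring model over context features.

    In a real system these weights would be learned from corpus data;
    here they just encode the two preferences discussed in the text.
    """
    s = 0.0
    # Prefer the increase/decrease pair when contrasting with "decreasing".
    if context.get("previous_verb") == "decreasing" and verb == "increasing":
        s += 1.0
    # Prefer varying the verb if "increasing" was already used nearby.
    if context.get("verb_already_used") == "increasing" and verb == "rising":
        s += 1.0
    return s

def choose_upward_verb(context):
    """Chooser: pick the best legal option for this context."""
    return max(UPWARD_VERBS, key=lambda v: score(v, context))
```

Note that even if the learned scorer is wrong, the output stays within the rule-defined space, so the worst case is a slightly awkward verb, never an incorrect one.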

Hybrid Architectures

Hybrid NLG architectures that combine rules and learning have been around for decades.  One of the earliest approaches was Langkilde's overgenerate-and-select model, where a rule-based system proposed several possible texts, and a statistical language model chose one of these texts as the actual output.  But although this architecture is intellectually very appealing, it does not seem to have been very successful, and certainly I hear less about it now than 10 years ago.  I personally have only once seriously tried this approach (to enforce a length constraint), and even here we concluded it was not the best approach.
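In outline, overgenerate-and-select looks something like the sketch below.  This is not Langkilde's actual system; the generator and the toy scoring function (standing in for a real statistical language model) are invented for illustration, and real wind speeds would be ranges rather than single numbers.

```python
def overgenerate(old_speed, new_speed):
    """Rule-based step: propose several candidate texts."""
    if new_speed > old_speed:
        verbs = ["increasing", "rising"]
    else:
        verbs = ["decreasing", "easing"]
    return [f"winds {old_speed} {v} to {new_speed}" for v in verbs]

def lm_score(text):
    """Toy stand-in for a language-model score (higher = more fluent)."""
    word_weights = {"increasing": 2.0, "decreasing": 2.0,
                    "rising": 1.0, "easing": 0.5}
    return sum(word_weights.get(w, 0.0) for w in text.split())

def generate(old_speed, new_speed):
    """Select step: the statistical model picks the actual output."""
    return max(overgenerate(old_speed, new_speed), key=lm_score)
```

One practical weakness of this architecture is visible even in the sketch: the candidate set grows combinatorially as choices multiply, and the selector has no guarantee of respecting high-value constraints unless the generator encodes them all.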

The approach I favour, which is the one used in the Gardent and Perez-Beltrachini paper we read, is where the overall architecture is rule-based, with ML modules invoked to make specific choices.  In other words, to paraphrase my colleague, ML “choosers” embedded in a rule-based framework.  I have used this approach many times over the years, and been very happy with it.

I guess another architecture, at least in theory, would be to add rules to a learning-based architecture.  But I don't think I've ever seen this done well, and it's not obvious to me how one would even do this with a neural network or deep learning approach, which is all the rage in 2017.

