My new NLG book was released around 6 months ago. I’ve been reflecting on what I see as key messages in the book, based on comments from other people and my own thoughts. Below is a summary. If it looks interesting, please do look at the book; there is a draft on arXiv if you don’t want to buy it.
General: One guiding principle is that the book tries to focus on things which will still be relevant in 5 or 10 years. So the focus is on core concepts, not the latest developments in LLM technology!
Introduction (chapter 1): This chapter basically summarises what is in the rest of the book. It also includes a short history of NLG, which some people have found to be interesting.
Rule-based NLG (chapter 2): Not very sexy, but I think everyone who is serious about NLG should have some understanding of rule-based NLG. Partly this is because a lot of real-world systems still use it, especially in mission-critical or safety-critical contexts, sometimes in combination with LLMs. For example, rule-based NLG is used to generate core content which must be correct (and whose correctness is testable and verifiable), and LLMs are used to improve this and/or add additional (less critical) information. But I also think that understanding rule-based NLG gives people a conceptual grounding in the sorts of things that NLG systems need to do, which is useful even if they use other techniques.
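To make the hybrid pattern concrete, here is a minimal sketch. It is not taken from the book: the function names, the temperature bands, and the "every number must survive" check are all illustrative assumptions. Fixed rules produce the core sentence, and a rewrite (which in practice might come from an LLM, but here is just a string) is accepted only if it preserves the numbers in the core text.

```python
import re


def generate_core(reading):
    """Rule-based step: fixed, testable rules map data to a sentence.

    The temperature bands are hypothetical, chosen only for illustration.
    """
    temp = reading["temp_c"]
    if temp < 0:
        band = "freezing"
    elif temp < 15:
        band = "cool"
    else:
        band = "warm"
    return f"It is {band} in {reading['city']}, at {temp} degrees Celsius."


def numbers_in(text):
    """Return every number mentioned in the text, as strings."""
    return re.findall(r"-?\d+(?:\.\d+)?", text)


def accept_rewrite(core, rewrite):
    """Keep a rewrite only if it mentions exactly the same numbers as the core.

    This is a deliberately crude check (an assumption of this sketch);
    a real system would verify much more than numbers.
    """
    if sorted(numbers_in(core)) == sorted(numbers_in(rewrite)):
        return rewrite
    return core  # fall back to the verified rule-based text


core = generate_core({"city": "Aberdeen", "temp_c": 7})
good = accept_rewrite(core, "A cool 7 degrees Celsius in Aberdeen today.")
bad = accept_rewrite(core, "A cool 9 degrees Celsius in Aberdeen today.")
```

Here `good` is the fluent rewrite, while `bad` falls back to the rule-based sentence because the rewrite changed the temperature. The point of the design is that the critical content is always produced and checked by code that can be unit-tested.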
Machine learning and neural NLG (chapter 3): This chapter is largely based on my frustrations in using ML/neural techniques over the past 25 years. They can do great things, but they have limitations which proponents (and the media) often ignore. So, after a brief introduction to ML/neural techniques, I talk about fundamental limitations due to data issues, domain shift, auditability, etc. All of these are inherent to ML/neural approaches; they will not go away tomorrow when the next version of GPT is released.
Requirements (chapter 4): If we are serious about building (and evaluating) useful NLG systems, we need to understand what users want! This is absolutely fundamental, but is often ignored by academics. Of course there is a large generic literature on software requirements; I focus on key issues which are more NLG-specific, including quality criteria, workflow, and combining text and graphics. I also talk about techniques for acquiring requirements for NLG systems.
Evaluation (chapter 5): I find that a lot of people have a very narrow view of evaluation. My goal here is to describe the big picture, and also to give practical advice. So I talk about automatic, human, impact, and commercial evaluation. I describe different techniques, experimental design issues, and what can go wrong. I also talk about key generic issues such as statistical significance, replicability, test data, ecological validity, and stakeholder perspectives. I think a lot of the evaluations I see (both academic and commercial) would benefit from a better understanding of the big picture.
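As one small illustration of the statistical significance point, here is a sketch of a paired permutation (sign-flip) test for comparing two systems scored on the same test items. This is a standard technique that I am using as an example, not a recommendation from the book; the function name, defaults, and scores are made up for illustration.

```python
import random


def paired_permutation_test(scores_a, scores_b, n_resamples=10000, seed=0):
    """Two-sided p-value for the difference between two paired score lists.

    Under the null hypothesis the systems are interchangeable, so the sign
    of each per-item difference is arbitrary. We randomly flip signs and
    count how often the total difference is at least as extreme as observed.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same items")
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    hits = 0
    for _ in range(n_resamples):
        total = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(total) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimated p-value strictly above zero.
    return (hits + 1) / (n_resamples + 1)


# Made-up per-item scores, for illustration only.
system_a = [0.8] * 12
system_b = [0.6] * 12
p_value = paired_permutation_test(system_a, system_b)
```

A small p-value only says that the difference is unlikely to be due to chance on this test set; it says nothing about whether the test set is representative or whether the metric measures what users care about, which is why the other issues listed above matter as well.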
Safety, testing, and maintenance (chapter 6): This chapter covers some other important topics which NLG practitioners should be aware of. My focus on safety is product safety, i.e., reducing the chance that the NLG system will harm users or third parties (I don’t look at society-level issues such as stopping terrorists). Testing is very important in real-world apps, but is largely ignored by academics. Maintenance is also almost completely ignored by academics, but is very important in the real world: how do we ensure an NLG system still does the right thing when the world changes?
Applications (chapter 7): I’ve been involved on and off in building real-world applied NLG systems for over 30 years. While the technology has radically changed over this period, the criteria for successful applications are still largely the same. I discuss key criteria such as scalability, data availability, adaptability, acceptability, and trust. I then look at some types of NLG applications which have been around for a while (automatic journalism, business intelligence, summarisation, medical), and discuss what we have learnt and how the above criteria apply in these domains.