I have been working on natural language generation (NLG), that is, using artificial intelligence techniques to produce texts in English and other human languages, since I received my PhD in this area in 1990. In late 2022, NLG became much more prominent because of the impressive capabilities of large language models such as ChatGPT, which was exciting. However, discussions about NLG in both academic and commercial circles have become focused on the latest developments in language models, with little attention paid to what had been learned about NLG before ChatGPT.
My goal in this book is to present a broad overview of NLG which talks about language models, but also looks at alternative approaches to NLG, requirements (what users want NLG systems to do), evaluation, safety and testing, and sensible applications of NLG. I hope that this broad perspective, which builds on decades of work on NLG, will be helpful to both researchers and developers who work in this area.
I co-authored a book about NLG in 2000, and I saw that while the technology content of my 2000 book quickly became out of date, people kept on using it in 2010 and even 2020 because the high-level conceptual and methodological material was still useful. With this experience in mind, I have focused this book on high-level concepts and methodologies; my hope is that this material will still be useful in 2030 and perhaps even 2040. I do not attempt to describe the latest technologies, because this information quickly becomes dated (indeed, anything I write in June 2024 will probably be out-of-date by the time the book is published); readers interested in the latest developments in language models should look elsewhere.
In many places I show outputs from ChatGPT and other large language models. Most of these were produced in 2023 (I deliberately do not include version numbers or specific dates), and readers should bear in mind that models in 2025, let alone 2030, may produce different outputs. However, the high-level points I am making should still be valid.
I also focus in this book on my own experiences. Where possible I use examples from systems which I have worked on or otherwise been involved with, even if they are not the best-known systems in their area; for this reason, for instance, I discuss the (somewhat obscure) BLOOM language model as well as better-known ones such as GPT. More generally, the book focuses on data-to-text NLG (systems which use NLG to summarise and explain non-linguistic data), because this is my personal interest. I also include personal notes throughout the book. I hope this personal focus makes the book more interesting to readers.
Specifically, the book has the following chapters:
- Introduction to NLG: I present some example NLG systems, summarise the content of the rest of the book, and also give a short history of NLG.
- Rule-Based NLG: I describe how AI systems can generate texts using algorithms and rules which explicitly make decisions about the content and language of generated texts. Rule-based NLG has been overshadowed by neural NLG in recent years, but it is still the best way to build some NLG applications. Rule-based NLG also shows the types of decisions which need to be made in text generation, and I think a good understanding of this helps anyone working in NLG, even if they use other approaches.
- Machine Learning and Neural NLG: I give an overview of machine learning and neural approaches to NLG, including language models. This area is changing very rapidly, and models which were state-of-the-art a few years ago are now obsolete and forgotten. Because of this, I just give a high-level overview of the basic concepts behind models, and then discuss data and other issues which are important regardless of the model used.
- Requirements: As with any type of software, knowing what users and stakeholders are looking for is essential in building a successful NLG application. I look at some of the different quality criteria that people may care about, workflows for using NLG (including ‘human-in-the-loop’), textual vs graphical presentation of information, and methodologies for understanding (acquiring) requirements.
- Evaluation: This is the longest chapter of the book, which reflects my interest in the topic as well as its importance. From both a scientific and practical perspective, it is essential to evaluate how well NLG algorithms, models, and techniques work, using experiments which are rigorous and replicable. I discuss basic evaluation concepts, and then describe techniques for human and automatic (metric) evaluation. I also look at evaluating the real-world impact of NLG systems, as well as commercial evaluation.
- Safety, Testing, and Maintenance: Society expects that AI systems used in the real world will be safe (not harm users or third parties); systems which are not safe will not be allowed by governments and regulators. I examine safety concerns and techniques in NLG, and also look at software testing, which is used to identify bugs and other problems which could lead to unacceptable behaviour. I conclude with a section on maintaining NLG systems, which is very important (most of the lifecycle costs of software systems are in maintenance) but poorly understood.
- Applications: NLG is not just an academic discipline, it is also a technology which can be used to build useful applications which help people. In this chapter I discuss some fundamental issues (such as scalability), and then look in more detail at four areas in which NLG has been used commercially for many years: journalism, business intelligence, summarisation, and medicine. Lessons from these long-standing NLG use cases can be applied to newer applications of NLG.