In a recent blog, I said that I thought NLG was currently most commercially successful in financial reporting. However, at the INLG conference in early November 2018, the application that was talked about the most was product descriptions. This was partially because of the recent End-to-End (E2E) challenge, which attracted a lot of interest and focused on describing restaurants. But there were also plenty of papers about product description that had nothing to do with E2E, including papers from companies and from my PhD students.
I’m using product description in a broad sense here, to include hotels and restaurants as well as cameras and phones.
Anyways, some of the INLG product-description papers which made an impression on me personally were:
[Related to End-to-End Challenge]
- Presentation about the E2E challenge: interesting high-level insights about topics such as evaluation and hallucination (I liked the presentation more than the formal paper in the proceedings).
- Can Neural Generators for Dialogue Learn Sentence Planning and Discourse Structuring? showed that a neural network could be trained to use aggregation to generate texts with a target number of sentences.
- E2E NLG Challenge: Neural Models vs. Template: built both a deep-learning and a template-based E2E system, and discovered that it took a lot *longer* to build the deep-learning system.
- Generating E-Commerce Product Titles and Predicting their Quality: A team from eBay describes a technique for automatically generating titles for items sold on eBay.
- Multi-Language Surface Realisation as REST API based NLG Microservice: A demo from Ax Semantics, which does a lot of product description work for e-commerce.
- Comprehension Driven Document Planning in Natural Language Generation Systems: Document planning based on psycholinguistic theory of language comprehension, applied to product descriptions.
- Generating Summaries of Sets of Consumer Products: Learning from Experiments: Generating descriptions of sets of products (instead of individual products).
There were many other interesting papers about product descriptions at INLG; check the proceedings to learn more.
Product Descriptions vs Financial Reporting
Financial reporting and product descriptions are very different applications of NLG, both commercially and technically. Differences include:
- Readers: Financial reports are usually read by people with domain expertise, whereas product descriptions are usually read by the general public.
- Length: Product descriptions are usually (with some exceptions) just a few sentences long, while financial reports can be tens of pages.
- Multimodality: Financial reports usually include graphs and tables as well as words, and are often embedded in interactive business intelligence (BI) tools. This is less common in product descriptions.
- Analytics: Financial reporting is a data-to-text task, where data analytics need to be integrated with NLG. Product description is more of a pure NLG task.
- Language: Most financial reporting systems just produce reports in English. Supporting multiple languages is more important in product descriptions.
- Accuracy: Financial reports have to be accurate; hallucination (for example) is completely unacceptable. At least in some contexts, it may be acceptable for a small number of product descriptions to be inaccurate.
- Corpora: It is easier to create large corpora of product descriptions (since zillions are available on the web) than of financial reports.
So financial reporting and product descriptions are situated in very different places in the “space” of NLG applications.
Does this mean that different NLG solutions are appropriate for these different applications? An interesting case is neural end-to-end approaches, which are used for product descriptions (although we don't know whether they are more effective than other approaches), but which I doubt could be used for financial reporting (except perhaps for very short pieces of financial news?). This is because of the lack of corpora, the intolerance of inaccuracy and hallucination, and the need to generate multimodal documents of non-trivial length (instead of 50 words of text). However, I think the NLG “pipeline” would work for both, since in both product descriptions and financial reporting there is a need for surface realisation, lexical choice, reference, etc.
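To make the pipeline point a bit more concrete, here is a minimal Python sketch of how the same three stages (lexical choice, reference, surface realisation) might be shared across both applications. Everything in it is an assumption made for illustration: the `Message` class, the attribute names, the thresholds and the example data are all hypothetical, and a real system would be far richer (for instance, the pronoun logic ignores number agreement).

```python
from dataclasses import dataclass


@dataclass
class Message:
    """One fact to express (hypothetical structure, for illustration only)."""
    subject: str     # entity being described
    attribute: str   # which property of the entity
    value: float     # raw data value


def lexical_choice(msg: Message) -> str:
    """Map a raw value to a verb phrase. All thresholds are made up."""
    if msg.attribute == "rating":
        if msg.value >= 4.5:
            return "has excellent reviews"
        return "has good reviews" if msg.value >= 3.5 else "has mixed reviews"
    if msg.attribute == "weight_g":
        return "is lightweight" if msg.value < 400 else "is fairly heavy"
    if msg.attribute == "change_pct":
        if msg.value == 0:
            return "held steady"
        verb = "rose" if msg.value > 0 else "fell"
        return f"{verb} by {abs(msg.value):.1f}%"
    raise ValueError(f"no lexicalisation for attribute {msg.attribute!r}")


def refer(subject: str, mentioned: set) -> str:
    """Reference: full name on first mention, a pronoun afterwards.

    Simplification: always uses "it", so plural subjects would need more work.
    """
    if subject in mentioned:
        return "it"
    mentioned.add(subject)
    return subject


def generate(messages: list) -> str:
    """Surface realisation: assemble capitalised, punctuated sentences."""
    mentioned = set()
    sentences = []
    for msg in messages:
        sentence = f"{refer(msg.subject, mentioned)} {lexical_choice(msg)}."
        sentences.append(sentence[0].upper() + sentence[1:])
    return " ".join(sentences)


# Hypothetical product data
product_text = generate([
    Message("The Lumo X camera", "rating", 4.7),
    Message("The Lumo X camera", "weight_g", 350),
])

# Hypothetical financial data
finance_text = generate([
    Message("Quarterly revenue", "change_pct", 3.2),
    Message("Operating costs", "change_pct", -1.5),
])

print(product_text)
print(finance_text)
```

The only application-specific part in this sketch is the lexicon inside `lexical_choice`; the reference and realisation steps are shared. The data analytics and multimodal output that financial reporting needs are deliberately left out.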
I personally think it's great to see people working on two very different applications of NLG, from a perspective of both technology development and commercial productisation. This should give us better insights into NLG than if the whole community focused on just one application!