Salesforce has announced that it is buying the NLG company Narrative Science, which will become part of Tableau, the Salesforce team that provides business intelligence (BI) tools. This highlights that NLG is being taken very seriously in the BI world, and indeed BI looks like it could be a “killer app” for NLG.
The real-world usefulness of NLG systems depends on many different factors, not just the accuracy and fluency of generated texts. We should evaluate the real-world utility of our systems, and check how well existing evaluation techniques (metrics and Turker-based human evaluation) correlate with real-world utility.
The fundamental challenges of building useful data-to-text NLG systems are the same regardless of whether we build systems with rules or transformers. We need to understand where NLG is useful, choose good content to communicate, robustly deal with edge cases, allow users to configure and control the system, and evaluate properly. I’d like to see more research on these fundamental issues, regardless of technology used.
(Personal blog) I’m 61, so I’m starting to think about retirement. I’m not planning to retire until 2025 at the earliest, but that’s close enough that it’s starting to have an impact on my commitments and activities.
When I asked participants what they most liked at the recent INLG conference, people highlighted events and sessions that focused on discussion and interaction, not technical research papers. Perhaps there is a lesson here: conferences should focus more on interaction and community, and not simply be regarded as venues for presenting research papers.
One of the highlights of INLG for me was the panel on “What users want from real world NLG”. I summarise a *few* of the really interesting points made about trust, authoring, configurability, human-in-the-loop, and other key issues for real-world NLG users.
Anya Belz and I are looking for a research fellow to work on a new project on reproducibility of human evaluations of NLP systems. This is a great opportunity for a researcher who wants to improve the scientific quality of human evaluations in NLP!
We’ve just completed a shared task on evaluating accuracy of NLG texts. This was really interesting, and amongst other things showed that current neural data-to-text systems struggle to learn how to use some words which have clear but relatively complex definitions.
I encourage students to do “exercises” where they critically read an academic paper, looking for problems in its evaluation. This will help them develop skills for writing papers as well as reading them. So give it a go!
In 2016, I was shocked by the poor scientific quality of research in neural NLG. Fortunately, the situation is better in 2021! However, progress has been slower than I had hoped, in part, I think, because the “leaderboard” culture does not encourage good science.