Understanding what users want from NLG
When building an NLG system, it really helps to understand what users want; this came up several times at the recent INLG conference. I discuss some of our work in this space, and give a few suggestions.
Deployed software systems need to be maintained as bugs emerge and the domain and user needs evolve; this is perhaps especially challenging for systems based on LLMs. Unfortunately, little is known about maintaining NLG systems.
Sometimes the latest technology is *not* appropriate for an NLG task. I saw this very strongly in the late 2010s with LSTMs (which do not work well for data-to-text), and I continue to see it in 2024 (GPT-4 is not always the best approach). Both researchers and developers need to be open-minded about alternative approaches.
My student Barkavi Sundararajan has shown that LLMs do a better job at data-to-text if the input data is well structured. She will present a paper about this at NAACL.
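To make this concrete, here is a minimal sketch (my own toy illustration, not taken from the paper) of the same facts passed to an LLM as flat text versus as structured JSON; the field names and prompt wording are assumptions:

```python
# Toy illustration: the same match facts as a flat string vs. a
# structured JSON record. In my experience, structured input like the
# second prompt tends to produce fewer content errors.
import json

game = {
    "home_team": "Aberdeen", "away_team": "Celtic",
    "home_score": 1, "away_score": 1,
}

flat_prompt = "Write a short match report: Aberdeen Celtic 1 1 home away"
structured_prompt = (
    "Write a short match report from this JSON record:\n"
    + json.dumps(game, indent=2)
)

# Send either string to your LLM of choice; here we just print them.
print(flat_prompt)
print(structured_prompt)
```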
I see lots of big-picture talk about what LLMs can do, but at a practical level there are real challenges in using them in commercial applications. These include cost, stability, and the need for a human in the loop, as well as use-case-specific challenges.
At the moment, chatGPT and other LLMs seem to be much better at the “language” side of data-to-text than the “content” side. Even on the language side, there are important caveats about real-world usage. Of course, the above may change as the technology improves.
Last week I played around with using chatGPT for data-to-text, and to be honest I was disappointed overall. A few people have asked me about this, so I’ve written up some of my notes here.
I get asked a lot about chatGPT, so I thought I’d write a blog explaining my views, focusing on its impact on data-to-text NLG. Basically I think chatGPT is really exciting science which shows major progress on many of the challenges in neural NLG. However, its commercial potential is unclear, and the media hype is annoying…
I was very impressed by a recent talk about the power of simple white-box models in tasks such as medical diagnosis. I’d love to see more work on simple models in NLP and NLG!
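As a concrete (and entirely toy) illustration of what I mean by a white-box model, the sketch below fits a tiny decision tree with scikit-learn and prints its rules as plain if/else text that a domain expert could audit; the dataset and tree depth are my own choices, not from the talk:

```python
# Toy illustration: a small white-box model whose reasoning can be read
# directly, in contrast to a black-box neural net.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()

# Depth-2 tree: deliberately tiny so every decision rule is inspectable.
model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(data.data, data.target)

# Print the learned rules as plain if/else text a clinician could audit.
print(export_text(model, feature_names=list(data.feature_names)))
```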
The most popular datasets used in summarisation (CNN/DailyMail and XSum) do not actually contain summaries. I find this worrying. Surely the best way to make progress on summarisation is to use actual summarisation datasets, even if these are less convenient from a “leaderboard” perspective.