Can ChatGPT do Data-to-Text?
Last week I played around with using chatGPT for data-to-text, and to be honest overall I was disappointed. A few people have asked me about this, so I’ve written up some of my notes here.
An example from MedPaLM highlighted to me that generated texts can contain information which is factually accurate but still not appropriate, because (in this case) of its negative psychological impact. There are other such cases, and we should ensure that our evaluation criteria are sensitive to them.
I get asked a lot about chatGPT, so I thought I’d write a blog explaining my views, which focus on its impact on data-to-text NLG. Basically I think chatGPT is really exciting science which shows major progress on many of the challenges in neural NLG. However, its commercial potential is unclear, and the media hype is annoying…
I thought I’d end 2022 with a summary of the papers written by my students and me in 2022. All of them are about requirements, resources, and/or evaluation of NLG.
I was very impressed by a recent paper that compared prompting-based MT to MT based on trained models. The results are very interesting: prompting-based MT generates fluent texts which nonetheless have accuracy problems. The paper itself is also an excellent example of a high-quality NLP evaluation, and I recommend it to anyone who wants to do good NLP evaluations.
I don’t like academic leaderboards. Poor scientific techniques, poor data, and poor evaluation mean that leaderboard results may not be worth much. I also suspect that the community’s fixation on leaderboards means less research on important topics that do not fit the leaderboard model, such as understanding user requirements.
Quality assurance processes for academic research, notably peer review by unpaid volunteers, are very lightweight and miss many problems. Better quality assurance processes would require more resources and effort, but would result in more trustworthy papers.
I was very impressed by a recent talk about the power of simple white-box models in tasks such as medical diagnosis. I’d love to see more work on simple models in NLP and NLG!
The most popular datasets used in summarisation (CNN/DailyMail and XSum) do not actually contain summaries. I find this worrying. Surely the best way to make progress on summarisation is to use actual summarisation datasets, even if these are less convenient from a “leaderboard” perspective.
I’m considering writing a book on NLG (a mere 22 years after my last one), and would welcome feedback from the community on this project.