Quite a few people have asked me what I think about chatGPT, so I thought I’d write down some thoughts about what chatGPT and earlier GPT technologies mean for the kind of data-to-text applications I am interested in. Of course, tons has been written about chatGPT and other GPT technologies, and I don’t think I have any startling new insights, but at least I can point people to this blog if they ask me about chatGPT.
Basically I think chatGPT (A) is very exciting science, (B) has unclear commercial potential, and (C) has attracted ridiculous amounts of hype. I expand a bit on these below (I could easily write much more on these topics!).
I must admit that I have been disappointed in the effectiveness of end-to-end neural techniques based on trained or fine-tuned models for the types of data-to-text NLG I am interested in. Huge numbers of papers have appeared over the past five years, but the problems below still stand.
- Poor quality texts: With a few exceptions, the quality of texts generated by neural data-to-text NLG remains poor from an accuracy and content perspective, except for very simple texts (which can easily be generated using templates). It’s also frustrating that most researchers in this space refuse to properly evaluate accuracy, despite the fact that it is usually the biggest problem with their systems!
- Too much training data needed: In most of the NLG projects I’ve been involved with, we have had fewer than 10 input-output examples, not the thousands needed to train or fine-tune models.
- Cannot do complex texts: I’m most interested in generating complex multi-paragraph texts which communicate insights about complex data sets, and neural NLG systems do badly on complex data-to-text tasks.
- Not robust: Over the years I’ve downloaded and experimented with several models I read about in papers, and I’ve always been disappointed with their robustness. If I make even small adaptations for my use case and dataset, text quality collapses.
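As an aside on the template point above: for very simple texts, a data-to-text template is little more than string substitution over a data record. A minimal sketch in Python (the weather record, field names, and wording are all invented for illustration; real template systems add rules for aggregation, variation, and so on):

```python
# Minimal template-based data-to-text sketch.
# The data fields and wording are made up for illustration only.

def weather_summary(record):
    """Fill a fixed sentence template from a simple data record."""
    trend = "warmer" if record["temp_today"] > record["temp_yesterday"] else "cooler"
    return (
        f"Today in {record['city']} it is {record['temp_today']} degrees, "
        f"{trend} than yesterday."
    )

print(weather_summary({"city": "Aberdeen", "temp_today": 12, "temp_yesterday": 9}))
```

The point is that for texts this simple, the template is trivial to write, trivial to verify for accuracy, and needs no training data at all, which is why neural systems need to beat this baseline on harder texts to be worthwhile.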
From this perspective, I think the GPT technologies are really exciting because they show major progress on 3 of the above 4 issues.
- Training data: the few-shot prompting technique developed around GPT3 shows that it is possible to condition a large language model with just a handful of examples and (at least in some cases) produce reasonable quality output texts.
- Text complexity: ChatGPT seems to be able to generate multi-paragraph texts which communicate complex information.
- Robustness: People throw all sorts of things at chatGPT and it seems pretty good at dealing with a huge range of tasks, far better than any other neural NLG system I’ve seen.
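For readers unfamiliar with it, few-shot prompting simply means putting a handful of input-output examples into the prompt itself, instead of fine-tuning the model on thousands of examples. A minimal sketch of building such a prompt for a data-to-text task (the example pairs and the data format are invented; no actual model call is shown):

```python
# Build a few-shot prompt for a data-to-text task.
# The example pairs are invented for illustration; a real application
# would send the resulting prompt string to a large language model.

examples = [
    ("city: Aberdeen | temp: 12C | rain: yes",
     "It is a wet day in Aberdeen, with temperatures around 12C."),
    ("city: Madrid | temp: 31C | rain: no",
     "Madrid is dry and hot today, reaching 31C."),
]

new_input = "city: Oslo | temp: -3C | rain: no"

prompt = "Convert the data into a short weather report.\n\n"
for data, text in examples:
    prompt += f"Data: {data}\nReport: {text}\n\n"
prompt += f"Data: {new_input}\nReport:"

print(prompt)
```

This is exactly the regime I described above where my projects live: fewer than 10 input-output examples, which is far too few to fine-tune a model but enough to build a prompt like this.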
Accuracy remains a huge unsolved problem with chatGPT! But it’s great to see progress on the other issues.
Also, if chatGPT does emerge as the best neural tech for NLG, maybe we’ll see less of a focus on SOTA-chasing leaderboards in academic research, and more interest in novel use cases, user requirements, better evaluation, improving safety, etc. Which would also be great!
Perhaps because of my involvement in Arria, I’m also sometimes asked whether I think chatGPT is going to have a big commercial impact within the next few years. I won’t speculate about the long term, but to be honest I don’t see a huge impact within the next few years.
Ignoring dubious use cases such as generating fake news and helping students cheat on essays (which have limited profit potential as well as being ethically dubious), I think the most likely successes over the next few years are in coding and writing assistants. Coding is especially promising (as other people have also said); I’ve already seen an example where chatGPT genuinely helped someone I know with a coding task.
Having said this, there is a lot of controversy over the effectiveness of coding assistants based on GPT3 technology such as Github Copilot, with some people claiming productivity improvements, others claiming no effect on productivity, and still others saying that code developed with Copilot has quality flaws (paper). There are also legal issues and even lawsuits from developers who are angry that their GPL-licensed code (whose license requires derived works to remain open source) was used without their permission to develop a commercial tool (Copilot). My best guess is that the technology will be used, but uptake will be limited until we figure out where it’s most effective and also resolve the legal issues. In short, as with most new technologies, large-scale real-world usage is going to take time!
Search of course is a bigger market (Google is a trillion dollar company), and there is a lot of talk about using chatGPT to find information instead of Google search. I am skeptical, because of provenance, control, and safety issues. ChatGPT essentially creates its content by stitching together material from the Internet, and there is a *lot* of bad and indeed malicious content on the Internet. Google devotes a huge amount of effort to screening search results in order to block harmful content. Perhaps chatGPT could do the same, but how would malicious content be removed once detected? Google can simply block a malicious web site, but I don’t think it is possible to extract a malicious website from a language model once it is trained.
I hate the hype
Lastly, there is an insane amount of hype around chatGPT, which really annoys me. Of course OpenAI is really good at generating hype, as shown when they got a lot of media attention for GPT2 by claiming it was very good at generating fake news. Which is not a use case I value, and also not one of much commercial importance! ChatGPT is interesting and exciting, but it’s not going to radically change the world tomorrow, despite what much of the media coverage suggests.
I know I shouldn’t let annoying media coverage distract me from genuinely exciting science, but it is annoying… Especially when the media pay so little attention to the genuine ways in which language technology is already changing the world. Machine translation is used by a billion people and is changing society by reducing language barriers; speech technology likewise is used by a billion people and is changing the way we work with computers. But presumably the media find large-scale real-world impact to be boring, so they ignore it and talk about chatGPT instead.
To finish on a positive note, I think chatGPT is a great scientific achievement and probably a better approach to neural data-to-text NLG than fine-tuning models. If its accuracy problems can be solved (which may or may not be possible; it’s a huge challenge), then it will revolutionise data-to-text NLG!
4 thoughts on “chatGPT: Great science, unclear commercials, hate the hype”
Good points Ehud. One ugly scenario is that they build a business model based on inserting ads and product placements into the generated texts. This may also become the model for DALL-E and other creative AI engines released to public use.
Recently I was impressed by a viral TikTok video in which a rheumatologist described using ChatGPT to write letters to insurance companies justifying the need for exams or treatments.
The letter looked great. However, the scientific articles referenced therein were completely made up by ChatGPT. This tendency for confabulations (or “hallucinations”) also seems to haunt ChatGPT.
In the meantime, the physician is much less convinced:
This kind of hallucination is a type of what I call accuracy problems, which (as I said in my blog) is the area where research and progress are needed from a science perspective. It also means that commercial success will be limited, at least in the short term.