Last week I played around a bit with chatGPT to see if it could do data-to-text tasks. A few people have asked me about this, so I thought I’d write up some of what I did. Please keep in mind that this was an informal exercise, not a proper experiment! I mostly worked in the weather domain (generating textual weather forecasts from weather data), which of course is a classic data-to-text task. I assume chatGPT’s training data includes plenty of weather forecasts.
All of this work was done in the week of 16 Jan. I mostly used a very simple prompt structure, of the form:
[weather data in JSON]
Describe the weather.
I tried some variations of the above, and also tried adding an example, but it didn't make much difference.
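As a concrete illustration of this prompt structure, it could be assembled as below. The field names and values are my own invention for illustration, not the actual data I used:

```python
import json

# Hypothetical weather data; field names and values are illustrative only.
weather_data = {
    "location": "Aberdeen",
    "date": "2023-01-20",
    "readings": [
        {"time": "00:00", "wind_mph": 9, "precip_mm": 0.0, "temp_c": 6},
        {"time": "03:00", "wind_mph": 10, "precip_mm": 0.1, "temp_c": 7},
        {"time": "06:00", "wind_mph": 12, "precip_mm": 0.0, "temp_c": 9},
    ],
}

def build_prompt(data: dict) -> str:
    """Serialise the data as JSON, then append the instruction line."""
    return json.dumps(data, indent=2) + "\nDescribe the weather."

prompt = build_prompt(weather_data)
```

The variations I tried amounted to changing the instruction line or prepending an example input/output pair to the prompt.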
Very simple data-to-text
I started off by giving chatGPT some super-simple made-up weather data: wind speed, precipitation, and temperature at 3-hour intervals for one location, over a calendar day. It produced mostly reasonable texts from this, such as
This weather data shows a mostly stable weather pattern with little precipitation and temperatures ranging from 6 to 15 degrees Celsius. The wind speed fluctuates between 9-12 mph. The temperature increases during the day and decreases at night. It appears to be a sunny day.
The statement “It appears to be a sunny day” is hallucinated, since I did not include data about cloud cover or overall weather state. But otherwise this is not bad. I also tried chatGPT on very simple sports data, and got reasonable results there as well.
Moderately simple data-to-text
I then got some real weather data from the UK Met Office, predicting Aberdeen weather on 20 Jan 2023 (data files). You can see the kind of data visually at https://www.metoffice.gov.uk/weather/forecast/gfnt07u1s .
I gave this data to chatGPT; below are two example texts (the second was produced by my student Simone Balloccu, who is using chatGPT in his research, with a different prompting strategy).
This weather data is for a 24 hour period, with the chance of precipitation, temperature, “feels like” temperature, and wind direction provided for each hour. The chance of precipitation ranges from 10% to 30% at the beginning of the period, then drops to less than 5% for the remainder of the period. The temperature ranges from 3 to 4 degrees Celsius, with the “feels like” temperature ranging from -2 to 1 degrees Celsius. The wind direction is primarily NWAL and NWAL WEST, but later on, it changes to SOUTH WEST.
This weather data includes information on the chance of precipitation, temperature, “feels like” temperature, and wind direction for the next 24 hours, with data points given for every hour. The chance of precipitation ranges from 10% at midnight to less than 5% for the rest of the forecast period. Temperature ranges from 3 degrees at midnight to 1 degree at 11pm. Feels like temperature ranges from -2 degrees at midnight to -2 degrees at 11pm. Wind direction starts off as NWAL NORTHWEST and changes to WEST NORTHWEST and WEST.
These texts are disappointing. Problems include:
- Language: NWAL makes no sense. My best guess is that this is a reference to an airline; I'll let others speculate on why chatGPT added this.
- Incorrect content: Numerous examples, including “The temperature ranges from 3 to 4 degrees Celsius” in the first text and “Feels like temperature ranges from -2 degrees at midnight to -2 degrees at 11pm” in the second. The latter is also a bizarre sentence; if chatGPT thinks the feels-like temperature has not changed, it should say so explicitly: “-2 throughout” is better than “ranges from -2 to -2”.
- Very basic insights: No insights that bring together different data channels (e.g. wind speed, direction, and gust; or temperature and feels-like temperature); no insights that distinguish daytime from night-time.
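On the “ranges from -2 to -2” problem: a rule-based realiser would typically collapse a degenerate range before verbalising it. A minimal sketch (my own illustration, not anything chatGPT does):

```python
def describe_range(name: str, low: float, high: float,
                   unit: str = "degrees") -> str:
    """Describe a min-max range, collapsing degenerate ranges (low == high)
    into 'X throughout' rather than the bizarre 'ranges from X to X'."""
    if low == high:
        return f"{name} is {low:g} {unit} throughout"
    return f"{name} ranges from {low:g} to {high:g} {unit}"

print(describe_range("Feels-like temperature", -2, -2))
# Feels-like temperature is -2 degrees throughout
print(describe_range("Temperature", 3, 4))
# Temperature ranges from 3 to 4 degrees
```

This kind of explicit check is trivial in a rule-based system but evidently not guaranteed by a purely neural generator.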
Moderately complex data-to-text
I then tried to get chatGPT to generate summaries of basketball box score data. Generating sports summaries is something of interest both academically and commercially (see previous blog for more info). I teach an MSc class on NLG, and ask students to work on sports story generation in labs and also in one assessment, using data from SportSett. So I gave SportSett data from one game to chatGPT. Unfortunately, it refused to process the data, saying that it was too large.
From a commercial NLG perspective, SportSett data for one game is not a very large data set; a typical game is perhaps 30KB in JSON format. I have worked on data-to-text systems where the input data set was a thousand times larger than this.
The above is just me messing around to get an idea of the big picture; it's not a careful controlled experiment. And I'm not an expert on chatGPT; perhaps there is a prompting strategy which would give better results. But anyway, taken at face value, it does suggest that chatGPT on its own is not suitable for data-to-text (at least at this point in time). It hallucinated (“sunny day”) even in the very simple case; it gave very disappointing output in the moderately simple case; and it refused to process moderately complex data.
Of course, it may well be that it's a mistake to use chatGPT on its own for data-to-text. Facebook deployed a neural NLG weather report generator (blog), in an architecture where
- the neural NLG system just handled language; data analysis and content selection were done elsewhere; and
- a checking module evaluated the output of the neural system, and if any potential problems were detected, the neural output was discarded and a backup template-based generator was used instead.
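A minimal sketch of that checker-plus-fallback pattern is below. The function names and the toy checking rule are my own, much simplified from what Facebook actually deployed:

```python
import re

def template_generator(data: dict) -> str:
    # Safe backup: fill a fixed template directly from the data.
    return f"Temperature around {data['temp_c']}C, wind {data['wind_mph']} mph."

def check(text: str, data: dict) -> bool:
    # Toy check: every number mentioned in the text must appear in the
    # input data; a real checker would be far more sophisticated.
    numbers = {float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", text)}
    return numbers <= {float(v) for v in data.values()}

def generate(data: dict, neural_generator) -> str:
    candidate = neural_generator(data)
    if check(candidate, data):
        return candidate
    # Potential problem detected: discard neural output, use the backup.
    return template_generator(data)

data = {"temp_c": 6, "wind_mph": 9}
hallucinating = lambda d: "Sunny, temperature 15C."  # 15 is not in the data
print(generate(data, hallucinating))  # falls back to the template text
```

The point of the design is that the neural system never has the final word: its output only reaches the user if it passes an independent check against the input data.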
Perhaps it would make more sense to use chatGPT in this type of architecture, instead of as an “end-to-end” data-to-text system? Of course there are other possibilities as well, such as generating a text initially using rules or templates, and getting chatGPT to improve text quality.
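The second possibility, templates first and chatGPT only for polishing, could look something like the sketch below. Again the field names and prompt wording are hypothetical illustrations, not a tested recipe:

```python
def template_text(data: dict) -> str:
    # Rule/template stage: guaranteed faithful to the data, but stilted.
    return (f"Temperature {data['temp_c']} degrees. "
            f"Wind {data['wind_mph']} mph. "
            f"Precipitation chance {data['precip_pct']} percent.")

def polish_prompt(draft: str) -> str:
    # The LLM is used only for fluency, never for content decisions.
    return ("Rewrite the following weather report to be more fluent. "
            "Do not add, remove, or change any facts:\n" + draft)

data = {"temp_c": 4, "wind_mph": 12, "precip_pct": 10}
prompt = polish_prompt(template_text(data))
```

Even here the polished output would still need checking, since an instruction not to change facts is no guarantee that the model will comply.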
Just to conclude, I absolutely think that chatGPT is exciting science! But as scientists and system builders we need to have a clear understanding of where it works and where it doesn't, and I hope the above sheds some light on this for data-to-text.