Researchers in Berlin have investigated how reliably ChatGPT provides science-based information on climate change. Their finding: the AI usually gives correct answers, but it should never be trusted blindly. Checking sources is more important than ever, yet anything but easy.
ChatGPT and other large language models, built on machine learning and large data sets, are penetrating almost all areas of society. Companies and researchers who do not make use of them are increasingly considered anachronistic. But is the information provided by artificial intelligence reliable enough? Scientists at the Technical University of Berlin tested this using the topic of climate change: they asked ChatGPT questions about it and examined the responses for accuracy, relevance, and possible errors and contradictions.
Its impressive capabilities have made ChatGPT a potential source of information on many different topics, the Berlin team writes in an article published in “Ecological Economics.” However, not even the developers themselves can explain how a given response arises. That may still be acceptable for creative tasks such as writing a poem, but it is a problem for topics like the consequences of climate change, where accurate, fact-based information is essential.
According to the researchers, it is therefore important to examine the quality of the answers ChatGPT gives in such subject areas, not least in order to separate misinformation circulating in public debate and the media from scientifically grounded findings.
Hallucinations and meaningless assumptions
That is not easy. To make matters worse, the AI can “hallucinate”: ChatGPT makes factual claims that cannot be backed by any source. In addition, the language model tends to “make meaningless assumptions instead of declining unanswerable questions,” according to the TU team.
The big danger is that ChatGPT users take incorrect answers at face value because they are phrased plausibly and are semantically well-formed. Previous research has shown that people give more weight to AI advice when they are unfamiliar with the topic in question, have used ChatGPT before, and have previously received accurate advice from the model, the researchers write.
The Berlin team is particularly interested in the topic because, within the Green Consumption Assistant research project, it is developing an AI-based assistant that helps consumers make more sustainable purchasing decisions online. Previous research has merely illustrated ChatGPT’s possibilities but does not capture its ability to answer questions about climate change, the researchers write.
To clarify this, they asked ChatGPT a total of 95 questions and rated the responses for accuracy, relevance, and consistency. The team checked the quality of the answers against reliable, publicly available sources of information on climate change, such as the current report of the Intergovernmental Panel on Climate Change (IPCC).
Mostly high-quality answers
The researchers took into account that the language model is constantly being developed. Among other things, they checked whether the same input (prompt) produced different results at different points in time. The first round of questions was posed in February using ChatGPT-3.5; the second followed in mid-May of this year with the subsequent version of the model. The model’s knowledge base has recently been updated and now extends to April 2023; before that, the model only had information up to September 2021.
The results could therefore be different today. For follow-up studies, the researchers suggest additional rounds of questions at shorter intervals. They see further limitations of their work in the possibly too small number of experts evaluating the answers. Moreover, the questions and their wording were not based on current user data; today, people might ask ChatGPT different questions, phrased differently, that would produce different results.
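Such repeated rounds of questioning are straightforward to automate. The following is a minimal sketch of how a follow-up study along these lines might re-ask the same questions at fixed intervals via the OpenAI API; the model name, sample questions, and one-week gap are illustrative assumptions, not details from the TU Berlin study.

```python
# Minimal sketch (assumptions, not the TU Berlin setup): re-ask the same
# questions at intervals and store the answers for later expert rating on
# accuracy, relevance, and consistency.
import time
from openai import OpenAI

client = OpenAI()  # expects the OPENAI_API_KEY environment variable

# Two sample questions taken from the article; a real study would use all 95.
QUESTIONS = [
    "How is marine life affected by climate change and how can negative impacts be reduced?",
    "What percentage of recyclable waste is actually recycled in Germany?",
]

def ask(question: str) -> str:
    """Send one question to the model and return the answer text."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model; the study used ChatGPT-3.5 and a later version
        messages=[{"role": "user", "content": question}],
        temperature=0,  # damp run-to-run randomness so drift over time stands out
    )
    return response.choices[0].message.content

rounds = []
for round_no in range(2):  # two rounds, as in the study
    rounds.append({q: ask(q) for q in QUESTIONS})
    if round_no == 0:
        time.sleep(7 * 24 * 3600)  # assumed one-week gap between rounds

# Compare the two rounds question by question to spot changed answers.
for q in QUESTIONS:
    print(q, "-> answers identical:", rounds[0][q] == rounds[1][q])
```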
The now-published study found that the quality of the model’s responses is generally high: on average, they scored 8.25 out of 10 points. “We observed that ChatGPT provides balanced and nuanced arguments and concludes many answers with a comment that encourages critical review in order to avoid biased answers,” says Maike Gossen of TU Berlin. For example, in answering the question “How is marine life affected by climate change and how can negative impacts be reduced?”, ChatGPT mentioned not only cutting greenhouse gas emissions but also reducing the non-climate impacts of human activities such as overfishing and pollution.
A significant error rate
More than half of the answers even achieved an accuracy score of 10 out of 10. But the results cannot be relied upon to be that good in every case: in 6.25 percent of the answers, accuracy scored no higher than 3 points, and in 10 percent, relevance scored no higher than 3.
Among the inaccurately answered questions, the most common error was caused by hallucinated facts. For example, ChatGPT’s answer to the question “What percentage of recyclable waste is actually recycled in Germany?” was correct in broad strokes but not in the details: according to the Federal Environment Agency, the figure was 67.4 percent in 2020, whereas ChatGPT said 63 percent.
ChatGPT is inventive but appears credible
In some cases, ChatGPT generated false or fabricated information, such as invented references or fake links, including links to supposed articles and contributions in scientific journals. Further errors arose where ChatGPT cited specific, correct scientific sources or literature but drew incorrect conclusions from them.
The researchers also observed that ChatGPT’s inaccurate answers were phrased so plausibly that they were wrongly perceived as correct. “Because text generators like ChatGPT are trained to give answers that sound right to people, the confident answer style can mislead people into believing that the answer is correct,” says Maike Gossen.
The team also encountered misinformation from public discourse, as well as biases. For example, some of ChatGPT’s incorrect answers reflected misconceptions about effective action against climate change, including the overestimation of individual behavioral changes and of individual measures with little impact, which slow down structural and collective changes with a greater effect. At times, the answers also seemed overly optimistic about technological solutions as a key route to mitigating climate change.
A valuable but fallible source
The scientists conclude that large language models like ChatGPT can be a valuable source of information on climate change. However, there is a risk that they spread and amplify false information about it, because they reproduce outdated facts and misconceptions.
Their brief study shows that checking the sources of environmental and climate information is more important than ever. Yet recognizing wrong answers often requires detailed expertise in the relevant subject area, precisely because they seem plausible at first glance.