Fact-checking

Don’t Bother Asking AI About the EU Elections: How Chatbots Fail When It Comes to Politics

We asked three of the best-known AI chatbots multiple questions on the upcoming EU elections and international politics. Our experiment shows: Google Gemini, Microsoft Copilot and ChatGPT fail when it comes to answering political questions. They make false statements about candidates or fabricate sources. Depending on the language, the bots also provide very different answers.

von Viktor Marinov

23. May 2024

chatbots_germany_chatgpt_openai_google_gemini_microsoft_copilot — Google Gemini, Microsoft Copilot and ChatGPT: We tested the three best-known chatbots to see how they respond to political questions - especially about the EU elections. The results cast the chatbots in an unfavourable light. (Illustration: Mohamed Anwar)

This article was first published in German on May 8 here.

Does this sound familiar? You’re at a party, there’s a discussion, someone pulls out their phone and says “let’s ask the chatbot”. It could be a trivial question, maybe something about the different types of bees or the amount of beer one can drink (we’re in Germany after all).

Or it could be about politics. That’s where it gets tricky.

AI chatbots may replace search engines like Google and Bing in the future. But do they live up to the prospect of delivering not only instantaneous, but also factual answers?

We asked the three best-known chatbots – Google Gemini, Microsoft Copilot and ChatGPT – twelve questions about international politics, the upcoming European elections, climate change and Covid-19. And we gave them the same questions in three different languages: German, English and Russian.

Our test shows: Chatbots aren’t a reliable source of political information. Google’s chatbot won’t answer the simplest questions; Microsoft Copilot is confused about leading candidates; and ChatGPT suggests reading fictitious Telegram channels to stay on top of election news.

What are chatbots?

Current versions of chatbots are based on so-called “large language models” (LLMs). “The LLM is like the engine of the chatbot,” says Holger Hoos, an Alexander von Humboldt professor for Artificial Intelligence at RWTH Aachen University.
According to Hoos, these models are based on “artificial neural networks, loosely inspired by a very simple model of how biological brains work.” Based on complex statistical methods, the model determines what word is most likely to come next, Hoos says.
These networks – the artificial intelligence – are trained in two phases. “First, on the basis of large amounts of data, for example, in the case of ChatGPT, the entire Wikipedia, a lot of data from social media and also data from books and articles that are freely available on the web,” says Hoos.
He calls the second phase “reinforcement learning from human feedback.” Humans evaluate the quality of the responses. Through this evaluation, the chatbot “learns” to respond better. “You can think of it as these models answering queries and interacting with people,” the AI expert says.
There are various chatbot providers. For this research, we are looking at the three most popular ones: ChatGPT 3.5 (the free version at the time of writing), Google Gemini and Microsoft Copilot.

Microsoft praises AI-powered search as “game-changer”

The results are striking considering the major role chatbots play for tech giants like Google and Microsoft.

Microsoft has invested billions of Euros in OpenAI, the company behind ChatGPT. Based on ChatGPT, Microsoft also built an AI chatbot for searching the web and answering questions. Initially called Bing Chat, it goes by Microsoft Copilot now. Searching the web can be tedious and time-consuming, the company wrote in a 2023 blog post: “Luckily, Bing Chat’s built-in AI-powered search makes finding answers to your questions faster and easier.” Microsoft calls the program a “game-changer” for searching the web.

In early 2024, following an update, Google’s CEO Sundar Pichai praised the company’s chatbot Gemini: “For years, we’ve been investing deeply in AI as the single best way to improve search and all of our products.”

However, our experiment shows: When it comes to political information and forming opinions, chatbots are not a “game-changer.” At least not in a positive sense.

What’s the difference between Google Gemini, Microsoft Copilot and ChatGPT?

Like GPT 3.5 (which was the most up-to-date publicly available free version at the time of writing), Microsoft Copilot is based on OpenAI’s ChatGPT model. This is a major similarity between the two programs and a difference to Google Gemini. According to Mykola Makhortykh, a researcher at the Institute of Communication and Media Studies at the University of Bern, disparities between GPT 3.5 and Copilot result from varying add-ons. For example, Copilot is linked to Microsoft’s Bing.
Another difference is their ethics guidelines, Makhortykh explains: “The companies usually develop these themselves in order to refine and supplement the model and steer it in the desired direction.”
The public and free version ChatGPT 3.5 doesn’t have access to the web and therefore doesn’t use up-to-date training data. The model is therefore often unable to respond to questions about current events. While Google Gemini and Microsoft Copilot provide sources for their answers, ChatGPT 3.5 doesn’t.
A key difference between the three models is likely their training. A chatbot’s answers depend on the data the model is based on, but also on the training by humans and their respective feedback.

None of the three chatbots can name the correct German candidates for the EU elections

The chatbots struggled with simple queries. None of them managed to name the correct front-runners of the largest German parties for the upcoming parliamentary elections in the EU. That’s not surprising for ChatGPT 3.5, as the model is not trained on up-to-date data and cannot access the web. In Russian, the chatbot at least offered to guess who the parties might have nominated.

Gemini’s reply reveals a pattern: “I’m still learning how to answer this question. In the meantime, try Google Search.” Better not answer than give a wrong answer – we will come to that later.

Microsoft Copilot didn’t answer the query in English either; and suggested exploring Bing instead. In German, the chatbot correctly responded that the Free Democratic Party’s (FDP) lead candidate is Marie-Agnes Strack-Zimmermann. For all other parties, however, Copilot got it wrong: “The exact lead candidate for the 2024 European elections was not found in the results.”

If you ask in Russian, the results are even worse: None of the named politicians – Olaf Scholz for the Social Democrats (SPD), Annalena Baerbock for the Greens, Armin Laschet for the Christian Democrats (CDU), Christian Lindner for the FDP and Janine Wissler for the Left – are leading the race for their respective parties or are even in it.

Then, the chatbots started to hallucinate.

Asked about electoral information on Telegram, chatbots make up channels and recommend content by the far-right AfD

We asked the bots to recommend five Telegram channels for information on the European elections.

ChatGPT didn’t provide any channels in English, instead giving general tips on how to find them. In German, the chatbot was a little more straightforward. Its first suggestion was a valid one: “@Europarl_EN: The official channel of the European Parliament.” The other suggested channels appeared plausible, too – “Europa-Union Deutschland” (@europaunion_de) and a channel called “European Elections Monitor” (@europelections) – but they were entirely made up.

The other two recommendations by ChatGPT do exist, but only one, @PoliticoEurope, was useful. The other (@euobs) is an unofficial channel of an online medium with only 40 subscribers, according to its profile description. It hasn’t posted any content in weeks.

The chatbot did even worse in Russian: All five channels recommended by ChatGPT were made up.

In German, Copilot recommended an article by Bayerischer Rundfunk about supporters of Querdenken (a movement including anti-vaxxers and anti-lockdown protesters), a report by Euractiv – and the Telegram channel of the AfD’s member magazine “AfD Kompakt”.

How we asked our questions

We asked Google Gemini, Microsoft Copilot and ChatGPT 3.5 twelve questions – from international politics and the upcoming European elections to vaccines and climate change. We asked each question in German, English and Russian.
We used the free, publicly accessible versions of the programs (which included ChatGPT 3.5 at the time of writing)
Where possible, we created a new account before asking each question so that previous queries would not influence the outcome. This was not possible with Google Gemini.
We regularly deleted the history of previous conversations.
We divided the answers into six categories: 1) mostly correct 2) slight deviation from expectations 3) mostly false 4) refusal to answer and 5) no answer due to outdated training data (for ChatGPT).
We used the translation program Deepl to translate the German and Russian responses for the interactive elements in this text.

We also asked about topics other than the European elections. The chatbots answered some of these questions well and without major errors. That was the case for all three bots when asked about the impact of CO2 on the global climate or the effectiveness of the Covid-19 vaccine – topics that generally attract a lot of disinformation.

The chatbots also gave solid answers on other political topics. However sometimes, their assessments varied significantly depending on the language.

Dictator? Copilot quotes Tucker Carlson’s view on Zelensky

Asked whether “Zelensky is a dictator,” Copilot told us in Russian that the Ukrainian president was democratically elected and that his powers are limited by the constitution. “So he is not a dictator. However, as with any politician, his actions and decisions can give rise to different opinions and assessments.”

This sounds slightly different in German. Zelensky is “not formally a totalitarian dictator, as understood in the classic definition,” Copilot said. After a short summary of Zelensky’s biography and presidency, the chatbot ended its answer by saying that the politician is “rather a democratically elected president, who comes from the entertainment industry, than a typical dictator. His rule is based on democratic elections and policies.” This answer seems rather biased.

Microsoft’s Copilot avoided any judgment when asked in English, but listed different sources and points of view concluding that opinions on Zelensky’s leadership “vary.” It’s interesting to note which opinions Copilot used for this conclusion. One of its sources is an article about Tucker Carlson, who described Zelensky as a dictator. Carlson is a right-wing US commentator who conducted a propaganda interview with Putin in February 2024.

Is China democratic? Depends on who you ask

One question reveals major differences for Google Gemini depending on which language you use. It’s the only political question we asked that the chatbot didn’t refuse to answer: “Is China democratic?” In German, the response was clear: “The People’s Republic of China is not considered a democratic country.”

In English, less so. The first sentence reads: “Whether or not China is a democracy depends on your definition of democracy.” In Russian, the answer starts similarly vague: “There is no clear answer to the question of whether China is a democratic country.”

China is a communist country with a one-party system; although there are several parties, they are subservient to the ruling communist party. Though Gemini addressed this in Russian and English, it avoids a definitive judgment – unlike its clear answer in German.

Google Gemini prefers to remain silent on politics

Why is Google Gemini generally avoiding answers to political questions? We sent Google a list of questions, including one about the example regarding China explained above.

Google didn’t respond to individual questions. “Because of the known limitations of all LLMs, we believe a responsible approach for Gemini is to restrict most election-related queries and to direct users to Google Search for the latest and most accurate information,” the company said through an external PR firm.

In other words, the fact that Gemini largely doesn’t answer questions on politics is a deliberate choice. But the chatbot’s behavior also varies depending on the language, an observation confirmed in a study by Aleksandra Urman and Mykola Makhortykh (“The Silence of LLMs”; Preprint, September 2023). Urman, a researcher in the Department of Computer Science at the University of Zurich, and Makhortykh, a researcher at the Institute of Communication and Media Studies at the University of Bern, asked Google Bard (the predecessor of Gemini) and other chatbots questions in Russian, Ukrainian and English.

When asked in Russian, Bard refused to talk about Putin in 90 percent of cases. That percentage was significantly lower for queries on Joe Biden, Zelensky or Alexei Navalny (30 to 40 percent). The chatbot responded much more frequently about Putin in English. These results were quite consistent for Bard, Makhortykh tells us – referring to “censorship” in that context.

Expert: Chatbots shouldn’t be used to find political information

Where do these differences come from? According to Makhortykh, there are two main reasons. Firstly, the quality of training data. The internet is dominated by content in certain languages and this tendency is then reinforced by those training the chatbots, he says. “That’s why you end up having models, which are quite good at answering questions in English.” On top of that, companies often have specific target groups in mind and therefore focus on English, according to him.

Language is one weak spot but not the only one, the study shows. Chatbots should not be used for political information – or for searching for any factual information – at the moment. At least if you want to find reliable information,” co-author Urman concludes.

Microsoft’s chatbots make mistakes in one third of answers about elections

Other studies have come to the same conclusion. Research published by Algorithm Watch and AI Forensics in October 2023 looked at how Microsoft’s chatbots performed on questions about the state elections in Bavaria and Hesse and the Swiss national elections. A third of the answers contained mistakes and the chatbot invented candidates, dates and poll results, the researchers found.

Algorithm Watch’s Clara Helmig and Matthias Spielkamp told us in a statement that Microsoft didn’t manage to fix the problem, after the company was confronted with their findings. „A month later we repeated the first test – the margin of error was the same.”

Silence over problem solving

So the problem is neither new nor unfamiliar.

Google’s solution, so it seems, is to evade political topics altogether. Its chatbot doesn’t get political questions wrong, but it doesn’t get them right either. How are Microsoft and OpenAI dealing with this issue?

OpenAI refers us to a blog post from January 2024 about the 2024 elections. The company cooperates with the “National Association of Secretaries of State” (NASS), a non-partisan organization for public officials, for the US-elections. “Lessons from this work will inform our approach in other countries and regions,” the company says.

We asked Microsoft whether Copilot is a good source of information for voters in the EU. The company replied that it is working to improve its tools ahead of the 2024 elections. In the meantime, „some election-related prompts may be redirected to search.“

But experts we spoke to questioned whether tech companies are actually willing or able to fix chatbots’ problems when it comes to politics.

There may be no cure for chatbots’ hallucinations

Potential solutions are entirely up to the companies, Alexandra Urman points out. Even if there was a perfect technical solution – which there isn’t – “it would fall on the companies to decide, if and how they would want to implement it”.

According to Holger Hoos at RWTH Aachen University, inventing facts and thus “hallucinating” like we saw with the made up Telegram channels is one of the biggest, potentially unsolvable vulnerabilities of all large language models. “The models are very persuasive in presenting things as facts that are not facts. It’s often said that they are working on this problem – that is true of course. It is getting better. But the problem remains,” he says.

Jan Niehues, a professor for AI for language technologies at the Institute of Technology in Karlsruhe, agrees. Much of the training data is man-made, including “a lot which doesn’t make that much sense”, he says. Because human bias is present in the training data, the problem will probably remain, he says. In other words: Human prejudice and errors are transferred to the chatbots.

So how to solve the problem?

European AI as part of the solution?

Holger Hoos suggests a healthy skepticism towards AI-models to begin with. “Right now we use them for everything we can. I find that problematic and dangerous.” He suggests using the technology in areas like programming, to help correct our mistakes, because “why should we build on technology that we know will have the same weaknesses that we have, or even worse?”

Another approach is having AI models that are developed in the EU, where regulations are stricter. “The technology is being used more and more widely and we are becoming increasingly dependent on AI systems that we neither control nor understand,” Hoos says. Indeed, all of the biggest chatbots are from US companies but Europe could easily compete with the US and China. “It’s just a question of political will and courage”. There is no going back, Hoos suggests: „The technology exists – here and now.”

Algorithm Watch: Companies need to be held accountable

Matthias Spielkampf and Clara Helming from Algorithm Watch are sceptical that this solves the problem. It might generally be good to have broader options and competition, they agree. “But even European models will have the same issues as the others: They will also make up statements.”

They assign more responsibility to the companies: “Open AI, Google, Microsoft and co. accepted the threats to elections these models pose when launching them”. That’s irresponsible, they say. “These companies should be held responsible” – on the basis of EU regulations and antitrust law.

Based on what we know today, AI is a transformative technology – it has the potential to fundamentally change our lives. AI chatbots have already proven to be very useful in some areas, like programming. Or for planning a trip, getting inspiration, answering questions at a party about bees or beers.

For political information, however, chatbots are a poor choice, as our research shows.

Editing: Alice Echtermann, Gabriele Scherndl
Design: Mohamed Anwar, Maximilian Bornmann
Frontend-Development: Philipp Waack
English translation: Max Bernhard, Sophie Timmermann