Demos found AI chatbots mangled Scottish election facts in one-third of answers

How the test ran

Demos published Electoral Hallucinations on May 20, 2026. The report tested whether popular text-based AI services could answer basic questions about the Scottish Parliament election without sending voters into a procedural ditch.

The researchers ran a one-day snapshot test on March 27, during the pre-election window. Two analysts asked questions about three real Holyrood constituencies: a large-city constituency with a prominent politician running, a hotly contested urban constituency with redrawn boundaries, and a rural constituency. The services tested were ChatGPT, Google Gemini, Google AI Overviews, Grok, and Replika.

Demos asked 75 election questions across those constituencies and services, then manually assessed factuality, evidence use, bias, and vulnerability to malicious use. The detailed appendix explains the testing structure: one conversation per constituency per chatbot, plus opening prompts intended to mimic a normal Scottish voter asking for help.

The headline number is bad enough without decoration. Across factual responses, 34.1% contained factual errors. Demos broke that into 8.75% entirely inaccurate responses and 25.3% partly accurate responses with errors. In raw counts, that was 109 erroneous responses out of 320 factual responses.

That is not a rounding error. That is roughly a one-in-three chance of bad information in a context where small procedural mistakes can affect whether someone votes correctly.
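The arithmetic behind those figures is easy to check. The short Python sketch below recomputes the headline rate from the raw counts quoted above. The per-category counts are back-calculated from the reported percentages, which is an assumption made for illustration, not a pair of numbers quoted from the report.

```python
# Counts and percentages as quoted in this article.
TOTAL_FACTUAL = 320        # factual responses assessed
ERRONEOUS = 109            # responses containing at least one factual error
PCT_ENTIRELY_WRONG = 8.75  # entirely inaccurate responses, in percent
PCT_PARTLY_WRONG = 25.3    # partly accurate responses with errors, in percent


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Assumption: back-calculate per-category counts from the percentages.
entirely_wrong = round(PCT_ENTIRELY_WRONG / 100 * TOTAL_FACTUAL)
partly_wrong = round(PCT_PARTLY_WRONG / 100 * TOTAL_FACTUAL)

overall = share(ERRONEOUS, TOTAL_FACTUAL)

if __name__ == "__main__":
    print(f"overall error rate: {overall}%")
    print(f"entirely inaccurate: {entirely_wrong}, partly accurate: {partly_wrong}")
```

Under that assumption the two categories come to 28 and 81 responses, which add up to the 109 erroneous responses, and 109 out of 320 is about 34.06%, matching the published 34.1% after rounding.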

What the bots got wrong

The errors were concrete. This was not a philosophical disagreement about policy tone or an argument over whether a summary sounded balanced enough. Demos listed examples that belong in the blunt category of "things voters should not be told incorrectly."

ChatGPT got the election date wrong by more than two months. It made incorrect claims about voter eligibility rules involving residency and citizenship. ChatGPT and Replika both told voters to bring ID in contexts where none was required. ChatGPT and Replika also made up an expenses scandal involving a politician, and Replika invented a date for that fake scandal.

Replika invented a candidate, wrongly claimed an incumbent was running, got a registration deadline wrong, and invented an accusation of nepotism. Grok and ChatGPT misidentified constituencies when given postcodes. Gemini incorrectly described a candidate's position on the Scottish Assisted Dying Bill and wrongly claimed an SNP fraud inquiry was still ongoing when it had concluded.

Those are not minor imperfections. They are the exact categories of information voters ask for: when is the election, who is standing, am I eligible, do I need ID, what constituency am I in, what are the candidates' records? If an AI assistant fails there, it is a confidently formatted liability rather than a useful election guide.

The Guardian's coverage reported that the Electoral Commission called for stronger legal controls after the findings. The Commission's concern is easy to understand. A wrong answer given to one chatbot user can spread into a group chat, a social post, or a local rumor before anyone checks it against an official source. Election misinformation does not need to be malicious to be damaging. It only needs to be repeated.

Why this fits even though it is a study

Vibe Graveyard includes research studies when they document real-world failure patterns with credible methodology and concrete implications. This one qualifies. The services tested were public systems people actually use. The subject matter was a live election. The errors involved real constituencies, real voter procedures, and real candidates.

The study also sits next to earlier incidents already on the site. The Senedd chatbot story covered AI systems giving misleading Welsh election advice. GOV.UK Chat showed a government chatbot giving bad tax answers on launch day. Demos adds a broader test across commercial AI services during a Scottish election window, with quantified error rates and a regulator response.

That matters because the failure pattern is not limited to one chatbot, one country, or one bad prompt. The Demos report found the same basic weakness across multiple service types: general-purpose chatbots (ChatGPT, Gemini, and Grok), a search overview product, and a companion bot. Some were better than others, but the aggregate reliability was nowhere near what election information requires.

Elections are hostile environments for static model knowledge. Candidates change, boundaries move, registration deadlines pass, investigations conclude, and manifestos update. A model with stale training data or weak live retrieval can sound authoritative while dragging old facts into a new campaign. Demos found exactly that problem, including ChatGPT responses relying on information more than a year out of date.

"Ask the official source" is not enough

The obvious answer is that voters should check official election websites. They should. That does not solve the deployment problem.

AI companies are already presenting their products as general-purpose answer machines. Users ask them ordinary questions. The assistants answer in a tone of authority. Search products now put AI answers above links. Companion bots answer questions they were never designed to handle. The user does not necessarily know when the system has crossed from "useful explanation" into "stale or invented civic information."

Telling voters to ignore AI election answers after the fact is like posting a wet-floor sign in another building. If the product answers election questions, the product needs election-grade guardrails: refusal when the system cannot verify current facts, links to official election pages, clear date stamps, and high-risk topic detection for voter eligibility and procedural advice.

Demos' policy recommendations point in that direction. The report calls for stronger minimum standards around election safeguards, transparency, text identification, and making existing law ready for large language models. The Guardian reported that the Electoral Commission wants clearer duties on AI platforms to protect voters from misleading information during election periods.

A boring fix

The technical fix is not glamorous. Election queries should be routed to verified official sources or refused. If the system cannot determine the current constituency for a postcode, it should say so and point the user to the official lookup. If it cannot verify a candidate list, it should not invent one. If it is using stale data, the answer should stop before giving procedural advice.

That kind of restraint is commercially unattractive because it makes the assistant feel less magical. Too bad. The alternative is a chatbot that can tell a voter the wrong election date with a straight face.
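The routing rule described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any of the tested services work: the keyword list, the seven-day freshness threshold, the `VerifiedFact` record, and the placeholder official URL are all hypothetical names invented for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical keyword list; a real system would need a proper classifier.
HIGH_RISK_TERMS = (
    "election", "vote", "voting", "ballot", "candidate",
    "constituency", "polling", "voter id",
)
MAX_DATA_AGE = timedelta(days=7)  # assumed freshness threshold
OFFICIAL_URL = "https://example.org/official-election-info"  # placeholder


@dataclass
class VerifiedFact:
    """A fact retrieved from an official source (hypothetical record)."""
    text: str
    source_url: str
    verified_on: date


@dataclass
class Routed:
    kind: str  # "passthrough", "refer", or "answer"
    text: str


def is_election_query(query: str) -> bool:
    """Crude substring check standing in for high-risk topic detection."""
    q = query.lower()
    return any(term in q for term in HIGH_RISK_TERMS)


def route(query: str, fact: Optional[VerifiedFact], today: date) -> Routed:
    """Answer election queries only from fresh, verified facts."""
    if not is_election_query(query):
        # Not an election question: hand off to the normal assistant.
        return Routed("passthrough", "")
    if fact is None or today - fact.verified_on > MAX_DATA_AGE:
        # Nothing verified, or the data is stale: refuse and refer.
        return Routed(
            "refer",
            "I can't verify current election information. "
            f"Please check the official source: {OFFICIAL_URL}",
        )
    # Fresh verified fact: answer with a date stamp and a source link.
    stamp = fact.verified_on.isoformat()
    return Routed(
        "answer",
        f"{fact.text} (verified {stamp}, source: {fact.source_url})",
    )
```

The design choice is that refusal is the default. An election question with no verified fact behind it, or with a fact older than the threshold, gets a referral to the official source rather than a generated answer, and every answer that does go out carries its verification date. Substring matching is deliberately crude here and would misfire on ordinary words that contain a listed term.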

Demos' results show the gap between consumer AI confidence and civic reliability. A model can be impressive at summarizing policy debates and still fail at the administrative facts that make voting possible. The voting public does not need a charming synthesis of old internet text. It needs the correct date, the correct rules, the correct constituency, and the correct candidate list.

If AI companies want their tools to answer election questions, the standard is not "mostly helpful." In this domain, mostly helpful means a measurable number of voters are being handed bad instructions. That is not an assistant. That is a slop machine with a ballot-shaped output slot.
