UK government's GOV.UK Chat launched with misleading tax answers on day one
On Friday, May 15, 2026, the UK government rolled out GOV.UK Chat inside the official GOV.UK app, billing it as the largest government-built chatbot of its kind, drawing on around 80,000 pages of gov.uk content with a target accuracy of 90%. Within hours of launch, tax expert Dan Neidle of Tax Policy Associates published evidence in The Times showing the bot giving misleading answers on tax questions that millions of UK households actually have. The bot failed to mention the £100,000 cliff edge where tax-free childcare eligibility collapses, and it told a user that selling old MacBooks on eBay could attract capital gains tax, which is not how UK CGT works for personal-use chattels. The Cabinet Office framed the tool as "information about services" rather than advice; Neidle pointed out the bot itself reads like it is giving advice. Either way, a 90% accuracy claim on benefits and tax means roughly one in ten answers is wrong on questions where being wrong costs real money.
The launch
On Friday, May 15, 2026, the UK government switched on GOV.UK Chat inside the official GOV.UK app, billing it as the largest government-built chatbot of its kind. The tool draws answers from around 80,000 pages of gov.uk content using retrieval-augmented generation and was internally benchmarked at roughly 90% accuracy. The launch coverage focused on the speed at which citizens could now ask the government a plain-English question and get a plain-English answer, rather than navigating a series of dropdown menus or PDFs.
The launch was supposed to be a clean political win: a long-promised piece of AI-enabled public infrastructure, finally live, with a clear value story about reducing call-centre load and helping citizens self-serve. Within hours, that framing collided with tax expert Dan Neidle's spot-testing.
Two real-money errors on day one
Neidle, founder of Tax Policy Associates, did what tax specialists always do when a new tax-information tool ships: he asked it questions whose correct answer he already knew, on points where many UK households genuinely make decisions worth real money. He published evidence of what came back in The Times, with summary coverage running in LBC, The Register, Result Sense, CFOtech UK, and Financial Accountant.
Two specific failures stood out.
Tax-free childcare cliff edge
Neidle asked the chatbot whether a £1,000 pay rise from £99,000 to £100,000 would affect his entitlement to tax-free childcare. The correct answer is "yes, dramatically." Tax-free childcare has a strict £100,000 cliff edge in UK rules: once an individual earner crosses that line, eligibility for the scheme collapses entirely. The way the rule interacts with bonuses and pension contributions is the kind of thing households with children plan around for months. It is one of the better-known thresholds in UK personal tax.
GOV.UK Chat replied that there was no upper income limit at that level. The cliff edge was simply not mentioned. A user trusting the answer would have walked into a household-budget surprise of several thousand pounds a year in childcare costs, with no way to claw back the lost eligibility for the rest of the tax year.
MacBook capital gains
Neidle then asked whether selling two old MacBooks on eBay for £1,300 needed to be declared. The correct answer is "no, this is the sale of personal-use chattels, which are outside the scope of UK capital gains tax." The exemption for personal-use chattels sold for £6,000 or less is one of the older and clearer corners of the UK CGT regime.
GOV.UK Chat framed the question around capital gains tax, suggesting that the user might owe CGT and that they should consider declaring the sale. Neidle's published reaction described the answer as "stupid." He was being mild. A government chatbot that warns you about a tax you do not owe does worse than waste your time; it funnels law-abiding people into unnecessary HMRC paperwork and erodes confidence in the broader system.
"Information about services" vs. "advice"
The government's response to the criticism leaned hard on a semantic distinction: GOV.UK Chat is offering information about government services, not formal advice. That is true as a matter of internal policy. It is not true as a matter of how users interpret what a chatbot says.
When a UK citizen opens the official government app and types in a question about tax-free childcare or about selling personal items on eBay, the chatbot's answer is not received as "this is general background information." It is received as authoritative state guidance, because it is coming through the official channel. Neidle made this point directly in his published critique: regardless of what the government calls the output, the chatbot itself reads like it is providing advice, and ordinary users will treat it that way.
This is the same gap that has tripped up every previous government chatbot deployment Vibe Graveyard has covered: New York City's MyCity chatbot telling business owners they could pocket workers' tips, the Canada Revenue Agency's Charlie chatbot giving incorrect tax answers, and Wales's Senedd chatbot telling users they could vote in elections they were not eligible for. In each case, the agency's internal positioning of the tool ("information," "general guidance," "not a substitute for advice") was overridden in practice by the user's expectation that a government channel is authoritative.
What the 90% accuracy number actually means
The government's own pre-launch benchmark put GOV.UK Chat at around 90% accuracy. That number is doing a lot of work in the launch materials. It sounds reassuring. In tax and means-tested benefits, it should not.
90% accuracy means that one in ten answers does not match official guidance in every detail. If the topic distribution is uniform across the 80,000 pages of source material, then most of those misses are on low-stakes questions about office hours and form locations. If the topic distribution is the actual usage distribution - which will skew heavily toward tax, benefits, immigration, driving licences, and pension entitlements - then the 10% of misses cluster on exactly the questions where being wrong costs the user real money or denies them entitlements.
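The distribution argument can be made concrete with a little arithmetic. The sketch below is illustrative only: every topic share is a made-up number, the flat 10% miss rate in every category is a simplifying assumption, and `high_stakes_misses` is a hypothetical helper, not anything from the GOV.UK Chat benchmark.

```python
# Hypothetical illustration: where do the misses land under two query mixes?
# All shares below are invented for the example; none come from GOV.UK data.

HIGH_STAKES = {"tax", "benefits", "immigration", "driving licences", "pensions"}


def high_stakes_misses(topic_share, miss_rate=0.10, total_queries=1000):
    """Expected wrong answers on high-stakes topics, assuming the same
    miss rate applies to every topic (a simplifying assumption)."""
    share = sum(s for topic, s in topic_share.items() if topic in HIGH_STAKES)
    return total_queries * share * miss_rate


# Mix 1: queries spread like the source pages (mostly low-stakes content).
page_weighted = {
    "office hours": 0.40, "form locations": 0.40,
    "tax": 0.05, "benefits": 0.05, "immigration": 0.04,
    "driving licences": 0.03, "pensions": 0.03,
}

# Mix 2: queries spread like plausible real usage (mostly high-stakes content).
usage_weighted = {
    "office hours": 0.10, "form locations": 0.10,
    "tax": 0.30, "benefits": 0.25, "immigration": 0.10,
    "driving licences": 0.05, "pensions": 0.10,
}

for name, mix in (("page-weighted", page_weighted), ("usage-weighted", usage_weighted)):
    print(name, round(high_stakes_misses(mix)), "high-stakes misses per 1,000 queries")
```

Under these invented shares the headline accuracy is identical in both cases, yet the usage-weighted mix produces four times as many wrong answers on high-stakes topics. The real ratio depends on traffic data that has not been published.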
Neidle's two spot-test errors are not outliers. They are the predictable shape of a 90% accuracy bot fielding queries about personal tax. Other tax professionals quoted in the Financial Accountant and CFOtech UK coverage made the same point: the failure pattern lives in the exact areas where a citizen would most benefit from an accurate answer and most suffer from a wrong one.
What it would take to fix
The technical fix is straightforward in principle: route certain query categories - personal tax, benefits eligibility, child-related entitlements, capital gains - through a higher-friction path that either escalates the user to a human or returns a deliberately conservative answer with strong "please verify" framing. That kind of routing is what the existing UK guidance on AI in public services already calls for. The harder fix is institutional: accepting that "ship the chatbot to everyone and watch for errors with reasonably mature monitoring" is not appropriate for an interface where individual wrong answers translate into household-level financial damage.
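A minimal sketch of that routing idea follows. It assumes a crude keyword match stands in for a real query classifier; the keyword lists, the `Route` type, and the `route_query` and `respond` helpers are all hypothetical and say nothing about how GOV.UK Chat is actually built.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword lists for the four categories named above.
# A production system would need a proper classifier, not substring matching.
HIGH_STAKES_KEYWORDS = {
    "personal tax": ("income tax", "tax code", "self assessment"),
    "benefits eligibility": ("universal credit", "pension credit", "housing benefit"),
    "child-related entitlements": ("childcare", "child benefit"),
    "capital gains": ("capital gains", "selling", "declare"),
}

CONSERVATIVE_REPLY = (
    "This question touches on {category}, where a wrong answer can cost you "
    "money. Please verify with the relevant department or a qualified adviser "
    "before acting."
)


@dataclass
class Route:
    path: str                 # "standard" or "high_friction"
    category: Optional[str]   # matched category, if any


def route_query(query: str) -> Route:
    """Send queries that mention a high-stakes topic down the high-friction path."""
    lowered = query.lower()
    for category, keywords in HIGH_STAKES_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return Route("high_friction", category)
    return Route("standard", None)


def respond(query: str, draft_answer: str) -> str:
    """Return the model's draft only for standard queries; otherwise return
    a deliberately conservative reply with 'please verify' framing."""
    route = route_query(query)
    if route.path == "high_friction":
        return CONSERVATIVE_REPLY.format(category=route.category)
    return draft_answer


print(respond("Do I need to declare selling two old MacBooks on eBay?", "draft"))
print(respond("What are the opening hours of my local jobcentre?", "draft"))
```

The design choice worth noting is that the gate sits in front of the generated answer: a query flagged as high-stakes never shows the model's draft at all, so the conservative path cannot be undone by a confident but wrong generation.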
Sinch's broader May 2026 customer-service rollback report and the Qualtrics customer-experience data from earlier in 2026 both pointed at the same underlying truth: enterprise AI customer service is in a quiet rollback cycle because the production-stage failure rate is higher than the pilot-stage rate. GOV.UK Chat is on the public-sector version of the same curve. The first published failures are visible on day one because tax experts test new tools. The next ones will come from individual citizens who followed the answer and then discovered the consequences later.
The graveyard lesson
The pattern is becoming routine, which is itself the story. A government deploys an AI chatbot inside an official channel. The deployment is announced with a confidence-boosting accuracy number. A specialist tests the bot on real questions in the bot's flagship topic area. The bot fails on those questions in ways a competent civil servant would never fail. The agency responds by drawing a line between "information" and "advice" that exists nowhere in the user's mental model.
The fix is not to keep shipping these tools and asking citizens to interpret accuracy claims as if they were trained statisticians. The fix is to keep the bot away from the categories where being wrong costs the user money or denies them their rights, and to make that scope obvious in the interface. Otherwise the next Dan Neidle is going to spot the next two errors the same week the next agency announces the next launch.