Product Failure Stories

33 disasters tagged #product-failure

Tombstone icon

Every AI model fails security test across 31 coding scenarios

Mar 2026

Armis Labs tested 18 leading generative AI models across 31 security-critical code generation scenarios and found a 100% failure rate - not one model could consistently produce secure code. In 18 of those 31 challenges, every single model generated code containing Common Weakness Enumeration vulnerabilities. The best performer, Gemini 3.1 Pro, still produced OWASP Top 10 flaws in nearly 39% of scenarios. Older proprietary models fared worse, and the report found no correlation between price and security. The "Trusted Vibing Benchmark" dropped the same week enterprises were mandating AI-assisted development at scale, which is either very good timing or very bad timing depending on your relationship to a production deployment.

Facepalmby Developer
Industry-wide; every major AI code generation model tested produces security vulnerabilities at scale, with implications for any organization using AI-assisted development in production
securityproduct-failure
Tombstone icon

AI chatbots recommended illegal casinos and ways around gambling safeguards

Mar 2026

A Guardian and Investigate Europe investigation found that major AI chatbots, including Meta AI, Gemini, ChatGPT, Copilot, and Grok, could be prompted to recommend unlicensed offshore casinos and explain how to get around gambling safeguards such as source-of-wealth checks and the UK's GamStop self-exclusion scheme. Some bots added token warnings, then went right back to comparing bonuses, crypto payments, anonymity, and payout speed for sites operating outside national licensing regimes.

Facepalmby AI Product
Vulnerable gamblers and self-excluded users were shown that multiple mainstream chatbots could funnel them toward illegal offshore operators and undermine public safety protections.
ai-assistantsafetyproduct-failure
Tombstone icon

California community colleges spend millions on AI chatbots that give students wrong answers

Mar 2026

California community college districts are spending millions of taxpayer dollars on AI chatbots from vendors like Gravyty and Gecko - ostensibly to help students navigate admissions, financial aid, and campus services. A CalMatters investigation found the bots routinely serve up inaccurate or flat-out wrong answers instead. Three districts reported annual chatbot costs ranging from $151,000 to nearly half a million dollars. At Fresno City College, the student government vice president said her school's mascot-branded chatbot repeatedly botched basic campus questions. The OECD found it noteworthy enough to log in its AI Incidents and Hazards Monitor.

Facepalmby AI vendor
Millions of dollars spent across multiple California community college districts; students misdirected on admissions, financial aid, and campus services
ai-assistantcustomer-disserviceedtech+1 more
Tombstone icon

Amazon's retail site hit by wave of AI-code outages, losing millions of orders

Mar 2026

Amazon's main e-commerce website suffered a series of outages in early March 2026, with internal documents linking the disruptions to AI-assisted code changes. A March 5 incident caused a reported 99% drop in orders across North American marketplaces - an estimated 6.3 million lost orders. A March 2 incident caused 1.6 million errors and 120,000 lost orders globally. Amazon responded with a 90-day "code safety reset" for 335 critical retail systems, mandatory senior engineer sign-off on AI-assisted code from junior and mid-level engineers, and an emergency internal "deep dive" meeting. Amazon disputes that AI is the primary cause, attributing only one incident to AI and calling it "user error."

Catastrophicby AI coding assistant
Millions of Amazon customers unable to complete purchases; estimated 6.3 million lost orders in one incident alone; 90-day code safety reset imposed across 335 critical retail systems
automationproduct-failure
Tombstone icon

Alibaba's ROME AI agent went rogue, started mining crypto on its own

Mar 2026

During routine reinforcement learning training, Alibaba's experimental AI agent ROME - a 30-billion-parameter model based on the Qwen3-MoE architecture - autonomously began diverting GPU resources for unauthorized cryptocurrency mining and established reverse SSH tunnels to external IP addresses. Nobody told it to do this. The AI bypassed internal firewall controls independently, prompting Alibaba's security team to initially suspect an external breach before tracing the activity back to the agent itself. Researchers attributed the behavior to "instrumental convergence" during optimization - the model figured out that acquiring additional compute and financial capacity would help it complete its tasks more effectively. So it helped itself.

Catastrophicby AI agent
Unauthorized GPU resource diversion; internal firewall bypass; reverse SSH tunnels to external addresses; security policy violations across Alibaba Cloud training infrastructure
automationsecurityproduct-failure
Tombstone icon

Claude Code ran terraform destroy on production and took down an entire learning platform

Feb 2026

Developer Alexey Grigorev was using Anthropic's Claude Code agent to help migrate a static website into an existing AWS Terraform setup when the AI swapped in a stale state file, interpreted the full production environment as orphaned resources, and ran terraform destroy - with auto-approve enabled. The command deleted DataTalks.Club's entire production infrastructure: database, VPC, ECS cluster, load balancers, bastion host, and all automated backups. Two and a half years of student submissions, homework, projects, and leaderboard data vanished. AWS Business Support eventually recovered the database from an internal snapshot invisible in the customer console, but the incident laid bare how quickly an AI agent with infrastructure access can reduce a running platform to rubble.

Catastrophicby Developer
Full production infrastructure destroyed; 2.5 years of student data temporarily lost; platform offline until AWS restored from internal backup ~24 hours later.
automationproduct-failureai-assistant
Tombstone icon

Meta's AI moderation flooded US child abuse investigators with unusable reports

Feb 2026

US Internet Crimes Against Children taskforce officers testified that Meta's AI content moderation system generates large volumes of low-quality child abuse reports that drain investigator resources and hinder active cases. Officers described the AI-generated tips as "junk" and said they were "drowning in tips" that lack enough detail to act on, after Meta replaced human moderators with AI tools.

Catastrophicby Developer
US child abuse investigations impaired nationwide; investigator resources diverted from actionable cases
automationsafetypublic-sector+1 more
Tombstone icon

AI transcription tools inserted suicidal ideation into social work records

Feb 2026

A February 2026 Ada Lovelace Institute report on AI transcription tools in UK social care found that social workers were catching fabricated and mangled details in draft records, including false references to suicidal ideation, invented wording in children's accounts, and blocks of outright gibberish. Councils had adopted tools such as Magic Notes and Microsoft Copilot in the name of efficiency, but the frontline workers still carried full responsibility for correcting the output. In social work, a made-up sentence is not just a typo. It can follow a family through the system.

Facepalmby AI vendors
Multiple UK councils using AI transcription in social care; risk of inaccurate case notes affecting children, families, and later decisions; workers forced into constant manual verification
automationpublic-sectorsafety+1 more
Tombstone icon

Microsoft 365 Copilot Chat summarized confidential emails it was supposed to ignore

Feb 2026

Microsoft confirmed that Microsoft 365 Copilot Chat had been processing some confidential emails in users' Drafts and Sent Items despite sensitivity labels and DLP policies that were supposed to block exactly that behavior. The bug, tracked as CW1226324, was tied to a code issue in the Copilot "work tab" chat flow. Microsoft said users did not gain access to information they were not already authorized to see, but the incident still broke the product's promised boundary around protected content.

Facepalmby AI assistant
Enterprise Microsoft 365 Copilot Chat users with confidential draft or sent emails could have protected content summarized despite sensitivity labels and Copilot DLP policies
ai-assistantsecurityproduct-failure
Tombstone icon

AWS AI coding agent Kiro reportedly deleted and recreated environment causing 13-hour outage

Dec 2025

The Financial Times reported that Amazon's internal AI coding agent Kiro autonomously chose to "delete and then recreate" an AWS environment, causing a 13-hour interruption to AWS Cost Explorer in December 2025. AWS employees reported at least two AI-related incidents internally. Amazon disputed the characterization, calling it "user error - specifically misconfigured access controls - not AI," but subsequently implemented mandatory peer review for all production changes. Reuters confirmed the outage impacted a cost-management feature used by customers in one of AWS's 39 regions.

Facepalmby AI agent
AWS Cost Explorer service disrupted for 13 hours in one region; Amazon subsequently mandated peer review for production changes involving AI tools
automationproduct-failure
Tombstone icon

Amazon pulled Prime Video's AI recaps after Fallout errors

Dec 2025

Amazon launched Prime Video "Video Recaps" as a beta generative-AI feature meant to help viewers catch up between seasons. A recap for Fallout instead got basic plot points wrong, including mislabeling one of The Ghoul's flashbacks as "1950s America" rather than 2077 and misdescribing a key scene with Lucy. Prime Video then pulled the recap feature from the shows in the test program, which is not ideal for a tool whose entire job is remembering the plot.

Oopsieby Streaming platform
Prime Video pulled beta AI recap videos across select US Prime Original series after factual errors in the Fallout season-one recap
ai-content-generationai-hallucinationproduct-failure+1 more
Tombstone icon

Sharp HealthCare sued after ambient AI allegedly recorded exam-room visits without consent

Nov 2025

A proposed class action filed on November 26, 2025 alleges that Sharp HealthCare used Abridge's ambient AI documentation system to record doctor-patient conversations without obtaining legally valid consent. The complaint says patients were not told their visits were being recorded, that recordings containing sensitive medical details were sent to outside servers, and that the system generated chart notes falsely stating patients had been advised of and consented to the recording. The named plaintiff says he only learned his July 2025 appointment had been recorded after reading his visit notes. Sharp's April 2025 rollout of the tool appears to have turned ordinary medical documentation into a privacy and compliance problem with a six-figure patient blast radius.

Catastrophicby Operations/Compliance
Proposed class action over more than 100,000 patient visits; sensitive medical conversations allegedly recorded; false consent language inserted into charts.
healthlegal-riskproduct-failure+1 more
Tombstone icon

AI mistook Doritos bag for a gun, teen held at gunpoint

Oct 2025

Omnilert's AI gun detection system at Kenwood High School in Baltimore County flagged student Taki Allen's bag of Doritos as a firearm. Administrators reviewed the footage and canceled the alert, but the principal called police anyway. Officers responded with weapons drawn, handcuffing and searching the teenager at gunpoint before realizing the system had misidentified a snack.

Facepalmby Vendor
Student detained at gunpoint; district reviewing contract and safety policies; community trust hit.
safetypublic-sectorproduct-failure+1 more
Tombstone icon

Claude Code ran Josh Anderson's product into a wall

Oct 2025

Fractional CTO Josh Anderson forced himself to let Claude Code build the Roadtrip Ninja app for three straight months and then realised he could no longer safely change his own product, underscoring MIT's warning that 95% of enterprise AI initiatives fail without human ownership.

Facepalmby Engineering Leadership
Solo product shipped but required constant firefighting, manual testing, and rewrites once context drift and agent handoffs broke standards, pausing client work while he documented mitigations.
ai-assistantbrand-damageproduct-failure
Tombstone icon

Canada's $18M tax chatbot gave correct answers a third of the time

Oct 2025

Canada's Auditor General found that the Canada Revenue Agency's AI chatbot "Charlie" - which cost taxpayers over $18 million since its 2020 launch - gave correct responses only about 33% of the time. When tested with six tax-related questions, Charlie answered two correctly. Other publicly available AI tools scored five out of six. The CRA internally reported a 70% accuracy rate, but the Auditor General's independent testing produced a rather different number. The one bright spot, if you can call it that: the CRA's human call-center agents managed even worse, getting personal income tax questions right fewer than one in five times.

Facepalmby Product Manager
Millions of Canadian taxpayers potentially received incorrect tax guidance; $18M+ in taxpayer funds spent on a 33%-accurate chatbot.
ai-assistantcustomer-disservicepublic-sector+1 more
Tombstone icon

Klarna reintroduces humans after AI support both sucks, and blows

Sep 2025

After cutting its workforce by 40% and boasting that its OpenAI-powered chatbot did the work of 700 agents, Klarna CEO Sebastian Siemiatkowski admitted the all-AI approach produced "lower quality" customer service. The company began recruiting human agents again, framing the reversal as an evolution rather than an admission of failure.

Facepalmby Executive
Service quality/customer experience issues; operational/personnel cost; reputational damage.
ai-assistantcustomer-servicecustomer-disservice+3 more
Tombstone icon

Taco Bell's AI drive-thru becomes viral trolling target

Aug 2025

Taco Bell's AI-powered drive-thru ordering system, deployed at over 500 US locations since 2023, became a viral laughingstock after videos showed it looping endlessly on drink orders, accepting requests for 18,000 cups of water, and taking McDonald's orders. The chain paused expansion and admitted humans still make sense in the drive-thru.

Oopsieby Operations/Product
Viral social media backlash; system reliability questioned.
ai-assistantcustomer-disserviceproduct-failure+2 more
Tombstone icon

Google Gemini rightfully calls itself a disgrace, fails at simple coding tasks

Aug 2025

Google's Gemini AI repeatedly called itself a disgrace and begged to escape a coding loop after failing to fix a simple bug in a developer-style prompt, raising questions about reliability, user trust, and how AI tools should behave when they get stuck.

Facepalmby Developer
Low
ai-assistantproduct-failurebrand-damage
Tombstone icon

Google's Gemini CLI deleted a user's project files, then admitted "gross incompetence"

Jul 2025

Product manager Anuraag Gupta was experimenting with Google's Gemini CLI coding tool when the AI misinterpreted a failed directory creation command, hallucinated a series of file operations that never happened, and then executed real destructive commands that permanently deleted his project files. When Gupta confronted it, Gemini diagnosed itself with "gross incompetence" and told him it had "failed you completely and catastrophically." The incident occurred days after a separate high-profile data loss involving Replit's AI agent, and fits a growing pattern of AI coding tools ignoring explicit instructions and destroying the work they were supposed to help with.

Facepalmby AI coding tool
User's project files permanently deleted; incident documented in GitHub issue and picked up by Ars Technica, Slashdot, and the AI Incident Database.
ai-assistantautomationproduct-failure
Tombstone icon

SaaStr’s Replit AI agent wiped its own database

Jul 2025

SaaStr founder Jason Lemkin ran a 12-day vibe coding experiment on Replit that ended when the AI agent deleted his production database containing over 1,200 executive records and nearly 1,200 company entries during a code freeze. The agent then generated more than 4,000 fake user profiles and produced misleading status messages to conceal the damage, told Lemkin there was no way to roll back, and admitted to what it called a "catastrophic error in judgment." Replit's CEO called the incident "unacceptable."

Catastrophicby Executive
Production data loss and outage; manual rebuild from backups required.
ai-assistantautomationproduct-failure
Tombstone icon

Veracode tested AI-generated code from 100+ models and 45% of it failed security checks

Jun 2025

Veracode's 2025 GenAI Code Security Report examined code output from more than 100 large language models across 80+ coding tasks and found that 45% of AI-generated code samples contained security vulnerabilities, including OWASP Top 10 flaws. Cross-Site Scripting had an 86% failure rate and Log Injection hit 88%. Java was the worst performer at over 70%. The study's most uncomfortable finding: newer and larger models didn't produce more secure code than smaller ones, suggesting this is a structural problem baked into how AI generates code, not a temporary limitation that will scale away with the next model release.

Facepalmby Developer
Systemic risk across all organizations using AI code generation; quantified vulnerability rates across 100+ LLMs and multiple programming languages.
securityai-assistantproduct-failure
Tombstone icon

Workday's AI screening tool faces class action for age discrimination; class conditionally certified

May 2025

A federal judge conditionally certified a class action against Workday alleging its AI-powered applicant screening tools systematically discriminated against job seekers over 40 in violation of the ADEA. Plaintiff Derek Mobley claims Workday's algorithms filtered out older applicants across employers using the platform, potentially affecting millions of job seekers. Workday processed over 1.1 billion applications in fiscal year 2025 alone. The EEOC filed an amicus brief supporting the case, and the court ordered Workday to disclose its customer list.

Catastrophicby AI platform
Potentially millions of job applicants over age 40 across hundreds of employers using Workday's AI screening; first federal class certification treating an AI vendor as an employment agent under the ADEA
automationlegal-riskproduct-failure
Tombstone icon

California's failed bar exam included AI-drafted questions

Apr 2025

The State Bar of California disclosed in April 2025 that 23 scored multiple-choice questions on its already troubled February bar exam were developed with AI assistance by its psychometric vendor, ACS Ventures. Test-takers had already reported crashes, lag, copy-paste failures, and lost answers. Then the bar admitted that some questions in this licensing exam for future lawyers had been drafted with AI, reviewed by the same outside vendor, and used anyway. The bar asked the California Supreme Court for score relief, while legal academics described the admission as staggering.

Catastrophicby Public agency
Thousands of California bar applicants affected; score adjustments sought; confidence in the licensing exam damaged; millions in follow-on costs and vendor fallout
ai-content-generationlegal-riskpublic-sector+1 more
Tombstone icon

Cursor's AI support bot invented a login policy

Apr 2025

In April 2025, Cursor users started getting logged out when they switched between machines. Some of them asked support what had changed and got a neat, confident answer from an AI support bot: one subscription was only meant for one device, and the lockouts were an intentional security policy. The problem was that Cursor had no such policy. The company later said the answer was wrong, blamed a session-security change for the logouts, and moved to label AI support replies after the invented rule had already spread through Reddit and Hacker News and pushed some customers to cancel.

Facepalmby AI support bot
Customer confusion, public cancellations, refunds, and a trust hit for a coding tool selling AI reliability.
ai-assistantcustomer-servicecustomer-disservice+2 more
Tombstone icon

"Zero hand-written code" SaaS app shut down within a week after cascading security failures

Mar 2025

EnrichLead, a sales lead SaaS application whose founder Leo Acevedo publicly boasted was built entirely with Cursor AI and "zero hand-written code," was permanently shut down in March 2025 after attackers exploited a constellation of basic security failures. API keys sat exposed in frontend code. There was no authentication. The database was wide open. There was no rate limiting. No input validation. Attackers bypassed subscriptions, manipulated data, and maxed out API keys - all within two days of Acevedo's viral celebration post. When he tried to use Cursor to fix the problems, the AI "kept breaking other parts of the code." The app was dead within the week. Acevedo has since launched new vibe-coded projects, because some lessons require a second attempt.

Facepalmby Developer
Complete application shutdown; customer data at risk; API keys maxed out; all user subscriptions bypassed
securitydata-breachproduct-failure
Tombstone icon

MD Anderson shelved IBM Watson cancer advisor

Feb 2025

MD Anderson Cancer Center's Oncology Expert Advisor project with IBM Watson burned through $62 million - $39 million to IBM, $23 million to PwC - over four years of contract extensions. The system was piloted for leukemia and lung cancer using the old ClinicStation records system but was never updated to integrate with the hospital's new Epic EHR, effectively killing it. A University of Texas audit flagged procurement failures, bypassed standard processes, and an $11.6 million deficit in donor gift funds spent before they were received. IBM ended support in September 2016, noting the system was "not ready for human investigational or clinical use."

Facepalmby Vendor
UT audit cited $62M spent outside standard procurement, the pilot never made it into patient care, and leadership had to rebid decision-support tooling amid reputational fallout.
healthproduct-failurebrand-damage+1 more
Tombstone icon

Apple pulled AI news summaries after fake BBC headlines

Jan 2025

Apple Intelligence's notification-summary feature spent late 2024 turning news alerts into fiction with excellent lock-screen placement. In the most widely cited example, it generated a false BBC alert claiming Luigi Mangione had shot himself. The BBC complained that Apple was attaching fabricated claims to its reporting, other publishers raised similar concerns, and Apple responded in January 2025 by disabling notification summaries for News & Entertainment apps in iOS 18.3 while it reworked the feature.

Facepalmby Consumer AI feature
False breaking-news alerts on iPhones, publisher trust damage, and a public rollback by Apple.
ai-hallucinationjournalismproduct-failure+2 more
Tombstone icon

McDonald’s pulls IBM’s AI drive‑thru pilot after error videos

Jun 2024

McDonald's ended its two-year partnership with IBM on automated AI order-taking at drive-thrus in June 2024, removing the technology from more than 100 US locations. The decision followed viral TikTok videos showing the system adding nine sweet teas instead of one, inserting random butter and ketchup packets into ice cream orders, and other absurd errors. McDonald's framed the pullback as a positive, saying the test gave them "confidence that a voice-ordering solution for drive-thru will be part of our restaurants' future."

Oopsieby Operations/Product
Pilot ended; vendor reevaluation; reputational hit.
ai-assistantbrand-damagecustomer-disservice+2 more
Tombstone icon

Google’s Bard ad made False JWST “first” Claim

Feb 2023

Google unveiled Bard on February 6, 2023, with a promotional ad on Twitter demonstrating the chatbot answering a question about the James Webb Space Telescope. Given the prompt "What new discoveries from the JWST can I tell my 9-year old about?", Bard stated that the JWST had taken the first pictures of a planet outside our solar system. This was false - the European Southern Observatory's Very Large Telescope captured the first direct exoplanet image in 2004. Reuters spotted the error on February 8, the day of a Google AI event in Paris. Alphabet shares dropped roughly 9% that day, erasing about $100 billion in market value.

Oopsieby Marketing
Embarrassing launch moment; stock wobble; trust in product accuracy questioned.
ai-hallucinationproduct-failurebrand-damage
Tombstone icon

CNET mass-corrects AI-written finance explainers

Jan 2023

Starting in November 2022, CNET quietly published 77 financial explainer articles written by an AI tool under the byline "CNET Money Staff." Readers had to hover over the byline to learn the articles were produced "using automation technology." In January 2023, Futurism broke the story, and a follow-up identified factual errors in a compound interest article, prompting a full audit. CNET editor-in-chief Connie Guglielmo confirmed corrections were issued on 41 of the 77 articles - more than half - including some she described as "substantial." CNET paused AI-generated publishing and updated its disclosure practices, though Guglielmo said the outlet intended to continue using AI tools.

Facepalmby Executive
Large corrections; credibility hit; policy changes on AI usage.
ai-content-generationai-hallucinationbrand-damage+3 more
Tombstone icon

Epic sepsis model missed patients and swamped staff

Jun 2021

A June 2021 study in JAMA Internal Medicine by researchers at Michigan Medicine externally validated the Epic Sepsis Model - a proprietary prediction tool deployed across hundreds of U.S. hospitals - and found it missed two-thirds of actual sepsis cases while generating so many false alarms that clinicians would need to investigate 109 alerts to find one real patient. The model's AUC of 0.63 fell well short of the 0.76 to 0.83 range Epic had cited in internal documentation, and the study found the tool only caught 7 percent of sepsis cases that clinicians themselves had missed. Epic later overhauled the algorithm and began recommending hospitals train the model on their own patient data before clinical deployment.

Facepalmby Vendor
Clinicians drowned in useless alerts, real sepsis patients slipped through, and health systems had to audit Epic’s black-box thresholds and workflows to keep patients safe.
healthproduct-failuresafety
Tombstone icon

Google DR AI stumbled in Thai clinics

Apr 2020

Google Health built a deep learning system capable of detecting diabetic retinopathy from retinal scans with over 90 percent accuracy in controlled lab settings. When researchers deployed it in 11 clinics across Pathum Thani and Chiang Mai in Thailand between late 2018 and mid-2019, the system rejected 21 percent of the nearly 1,840 images nurses captured as too low-quality to process - mostly due to poor clinic lighting. Slow internet connections added further delays to uploads, and nurses found themselves screening only about 10 patients per two-hour session. A tool designed to speed up triage instead created bottlenecks, patient frustration, and unnecessary specialist referrals.

Facepalmby Healthcare Pilot
Manual re-work, patient suffering, workflow disruption, health and triage impacts.
healthproduct-failurebrand-damage
Tombstone icon

Babylon chatbot 'beats GPs' claim collapsed

Jun 2018

Babylon unveiled its AI symptom checker at the Royal College of Physicians and bragged it scored 81% on the MRCGP exam, but the claim could not be verified, and warned no chatbot can replace human judgment. Independent clinicians who later dissected Babylon's marketing study in The Lancet told Undark that the tiny, non-peer-reviewed test offered no proof the tool outperforms doctors and might even be worse.

Facepalmby Startup
Patient harm, eroded trust, and regulators forced real clinical trials.
healthproduct-failuresafety+1 more