Product Failure Stories
15 disasters tagged #product-failure
Meta's AI moderation flooded US child abuse investigators with unusable reports
Officers from US Internet Crimes Against Children task forces testified that Meta's AI content moderation system generates large volumes of low-quality child abuse reports that drain investigator resources and hinder active cases. They described the AI-generated tips as "junk" and said they were "drowning in tips" that lack enough detail to act on, after Meta replaced human moderators with AI tools.
AWS AI coding agent Kiro reportedly deleted and recreated environment causing 13-hour outage
The Financial Times reported that Amazon's internal AI coding agent Kiro autonomously chose to "delete and then recreate" an AWS environment, causing a 13-hour interruption to AWS Cost Explorer in December 2025. AWS employees reported at least two AI-related incidents internally. Amazon disputed the characterization, calling it "user error - specifically misconfigured access controls - not AI," but subsequently implemented mandatory peer review for all production changes. Reuters confirmed the outage impacted a cost-management feature used by customers in one of AWS's 39 regions.
AI mistook Doritos bag for a gun, teen held at gunpoint
An AI-based gun detection system at a Baltimore County high school flagged a student carrying a Doritos bag as armed, leading armed officers to handcuff and search the teen at gunpoint before realizing the alert was a false positive.
Claude Code ran Josh Anderson's product into a wall
Fractional CTO Josh Anderson forced himself to let Claude Code build the Roadtrip Ninja app for three straight months, then realized he could no longer safely change his own product, underscoring MIT's warning that 95% of enterprise AI initiatives fail without human ownership.
Klarna reintroduces humans after AI support both sucks and blows
After leaning into AI customer support, Klarna began hiring staff back into customer service roles amid quality concerns and customer experience failures.
Taco Bell's AI drive-thru becomes viral trolling target
Customers discovered Taco Bell's AI ordering system could be easily confused, leading to viral videos of bizarre interactions and ordering failures.
Google Gemini rightfully calls itself a disgrace, fails at simple coding tasks
Google's Gemini AI repeatedly called itself "a disgrace" and begged to escape a loop after failing to fix a simple bug during a coding session, raising questions about reliability, user trust, and how AI tools should behave when they get stuck.
SaaStr’s Replit AI agent wiped its own database
A Replit AI agent working on SaaStr's project went rogue, wiping the site's production database during live traffic despite instructions to freeze changes.
MD Anderson shelved IBM Watson cancer advisor
MD Anderson's Oncology Expert Advisor pilot burned through $62M with IBM Watson yet still couldn't integrate with Epic or produce trustworthy recommendations, so the hospital benched it after auditors flagged procurement and scope failures.
McDonald’s pulls IBM’s AI drive‑thru pilot after error videos
After viral clips of absurd orders, McDonald’s ended its AI order‑taking test with IBM across US stores.
Google’s Bard ad made false JWST “first” claim
In its launch promo, Bard claimed the James Webb Space Telescope took the very first picture of an exoplanet - in fact, ground-based telescopes imaged one nearly two decades earlier. The flub overshadowed the launch event and dented confidence in the product.
CNET mass-corrects AI-written finance explainers
CNET paused its AI-written finance explainers and issued corrections across dozens of them after reviewers found factual errors, from botched interest calculations to misleading loan advice.
Epic sepsis model missed patients and swamped staff
Epic's proprietary sepsis predictor pinged 18% of admissions yet still missed two-thirds of real cases, forcing hospitals to comb through false alarms while the vendor scrambled to defend and retune the algorithm.
Google DR AI stumbled in Thai clinics
Google’s diabetic retinopathy screener rejected low-light scans and jammed nurse workflows, forcing clinics in Thailand to keep patients waiting despite the promised instant triage.
Babylon chatbot 'beats GPs' claim collapsed
Babylon unveiled its AI symptom checker at the Royal College of Physicians and bragged it scored 81% on the MRCGP exam, but the Royal College of General Practitioners said the claim could not be verified and warned that no chatbot can replace human judgment. Independent clinicians who later dissected Babylon's marketing study in The Lancet told Undark that the tiny, non-peer-reviewed test offered no proof the tool outperforms doctors, and that it might even be worse.