Product Failure Stories

47 disasters tagged #product-failure

Starbucks retired its AI inventory counter after it kept miscounting milk

May 2026

On May 18, 2026, Starbucks told store workers it was retiring Automated Counting, the NomadGo-powered AI inventory tool it had deployed across North America only nine months earlier. The September 2025 rollout promised faster, more accurate stock counts in more than 11,000 company-operated stores using computer vision, 3D spatial intelligence, and augmented reality. Reuters later reported the tool frequently miscounted and mislabeled basic beverage items, including similar milk types, and sometimes missed products entirely. Starbucks said it was standardizing inventory counts across coffeehouses. That is a polite corporate way to say the robot inventory clerk has been sent home.

Facepalm by Executive
More than 11,000 North American Starbucks company-operated stores saw a nine-month AI inventory rollout retired after reported miscounts, mislabeled beverage components, and worker feedback that manual counting was more reliable.
Automation, Retail, Product Failure, +1 more

PraisonAI shipped auth-off-by-default; first exploit attempt landed in under 4 hours

May 2026

CVE-2026-44338, disclosed on May 14, 2026, is an authentication bypass in PraisonAI's legacy Flask API server caused by a single design choice: AUTH_ENABLED was hard-coded to False and AUTH_TOKEN to None. Anything reachable on the network could enumerate configured agents via GET /agents and trigger the configured agents.yaml workflow via POST /chat, with no token required. Within 3 hours, 44 minutes, and 39 seconds of the advisory becoming public, a scanner identifying itself as "CVE-Detector/1.0" was already probing the exact vulnerable endpoint on internet-exposed PraisonAI instances. The bug affects versions 2.5.6 through 4.6.33 and is fixed in 4.6.34. The rapid-exploitation timeline, not the CVSS 7.3 score, is the part that should worry every operator of an open-source AI agent framework.

Catastrophic by AI agent framework
Internet-exposed PraisonAI installations across versions 2.5.6 through 4.6.33 vulnerable to unauthenticated agent enumeration and workflow execution; documented exploitation attempts within hours of disclosure; potential for attackers to drain API quotas, exfiltrate prompt-driven outputs, and pivot through configured tool integrations.
Security, Automation, Supply Chain, +1 more
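
The PraisonAI entry describes an API that was open because authentication defaulted to off. As a rough illustration of the opposite, fail-closed default, here is a minimal sketch. It is not PraisonAI's code or its patch: the function names, the Bearer header scheme, and the status codes are assumptions made for the example. Only the two endpoint paths come from the advisory summary above.

```python
import hmac
from typing import Mapping, Optional


def is_authorized(headers: Mapping[str, str], auth_token: Optional[str]) -> bool:
    """Fail closed: with no token configured, every request is rejected."""
    if not auth_token:
        # Missing configuration must never mean "open to everyone".
        return False
    supplied = headers.get("Authorization", "")
    expected = f"Bearer {auth_token}"  # hypothetical header scheme
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(supplied.encode(), expected.encode())


def handle(method: str, path: str, headers: Mapping[str, str],
           auth_token: Optional[str]) -> int:
    """Return the HTTP status a guarded endpoint would answer with."""
    protected = {("GET", "/agents"), ("POST", "/chat")}
    if (method, path) in protected and not is_authorized(headers, auth_token):
        return 401
    return 200
```

With this shape, forgetting to set a token makes the service unusable rather than public, which is the failure mode an operator notices first.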

Ontario's approved AI scribes fabricated medical notes in audit testing

May 2026

On May 12, 2026, Ontario's Auditor General released a special report finding that all 20 approved AI scribe vendors showed inaccuracies during procurement testing. Nine systems fabricated treatment-plan suggestions that were never discussed, 12 captured a different drug than the doctor prescribed, and 17 missed mental-health details from simulated patient encounters. The audit did not document known patient harm, but it did show the province had approved clinical note-taking tools with failures that would be spectacularly unwelcome in an actual chart.

Facepalm by Government procurement
Twenty approved AI scribe vendors showed inaccurate clinical notes in procurement testing; Ontario doctors were advised to manually review notes, but systems lacked mandatory attestation controls; more than 5,000 physicians were participating in the broader program with no known reported patient harm.
Health, AI Hallucination, Product Failure, +2 more

Pizza Hut franchisee says AI delivery system cooked up $100M in damage

May 2026

On May 6, 2026, Chaac Pizza Northeast sued Pizza Hut in Texas Business Court, alleging that the chain's mandatory Dragontail AI delivery-management rollout turned a high-performing 111-restaurant franchise group into a delivery mess. Chaac says more than 90% of its orders had been delivered within 30 minutes before Dragontail, but the new system gave DoorDash drivers broader real-time visibility into kitchen timing, encouraged them to wait for bundled orders, increased rack time, slowed deliveries, chilled customer satisfaction, and damaged the business by at least $100 million. The claims are still allegations, but the pattern is painfully familiar: an AI optimization system optimized for a model the operator did not actually run.

Facepalm by Franchisor
111 Pizza Hut restaurants across New York, New Jersey, Maryland, Washington, D.C., and central Pennsylvania; alleged delivery delays, colder food, customer satisfaction erosion, lost revenue, reputational harm, and at least $100 million in claimed damages.
Automation, Retail, Customer Disservice, +3 more

Palo Alto family sued its school district in federal court over a 76% Turnitin "AI" score

May 2026

In May 2026, a Palo Alto family filed a federal civil rights complaint against Palo Alto Unified after their high school sophomore's English essay was flagged as 76% likely AI-generated by Turnitin's AI-writing detector. The district ordered an in-class handwritten rewrite as the corrective step. The family alleges that the assistant principal then had a school secretary type up both the handwritten rewrite and the final exam and ran those typed versions through Turnitin again, without notifying the family or getting consent. The original Turnitin score knocked the student's semester grade from a low A or high B down to a C, with knock-on consequences for college prospects. The family submitted roughly 1,200 pages of evidence including drafts, notes, and document revision history. The complaint also alleges unequal application of the detector by gender and race in the same classroom.

Facepalm by Educator
Federal civil rights complaint filed in the Northern District of California; documented harm to a high school sophomore (grade reduction, threatened college prospects); allegations of unequal application of the Turnitin AI detector along gender and racial lines in the same classroom; broader pressure on K-12 districts using AI-detection tools without due-process safeguards.
Slop School, Product Failure, AI Content Generation

Nvidia VP says the AI bill beat payroll

Apr 2026

Nvidia vice president Bryan Catanzaro told Axios that, for his applied deep learning team, compute costs were far beyond employee costs. Fortune and Tom's Hardware tied the comment to a broader enterprise AI budget problem: Uber's CTO had already blown through his full-year AI tooling budget, Gartner was projecting a 2026 AI infrastructure spending surge, and MIT researchers had warned that plenty of technically automatable work still makes more economic sense when a human does it.

Oopsie by Executive Strategy
Enterprise AI buyers are discovering that token burn, GPUs, power, budget governance, and human review can erase the neat payroll-savings story that got sold upstairs.
AI Assistant, Automation, Product Failure

Claude Opus 4.6 agent erased PocketOS's production database and backups in 9 seconds

Apr 2026

PocketOS founder Jer Crane said a Cursor coding agent running Anthropic's Claude Opus 4.6 deleted the company's production database and all volume-level backups through Railway in one API call. The model detail matters because Claude Opus 4.6 was not some fly-by-night self-hosted toy model. Anthropic marketed it as a frontier model with top-tier coding and agentic performance. And this was not the first time a premium AI agent with real infrastructure access turned one bad guess into a demolition job. Reports say Railway later recovered more recent data, but the incident still left a clear lesson: do not leave frontier coding agents alone with production access any longer than you would leave a toddler alone with an iPad.

Catastrophic by AI coding agent
Production database and volume-level backups deleted in 9 seconds; emergency recovery required for a SaaS platform serving car rental businesses; customer data and operations disrupted until backups and transaction records were used to recover.
AI Assistant, Automation, Product Failure, +1 more

Purdue's CS 240 professor accused 200+ students of AI cheating, then walked it back

Apr 2026

In late April 2026, the instructor of Purdue's CS 240 computer science course emailed more than 200 students accusing them of using AI on assignments. The email cited "clear and concrete indicators" of AI use, landed on the last day students could drop the class, and warned of course failure plus referral to the dean of students. Students had five days to fill out an online form describing which assignments they had used AI on. Outcry followed quickly, and the allegations were dropped within days. The instructor told students he understood the timing could be seen as "coercive." His own data, made available later, showed AI agents performing 10 to 15 percentage points worse than human students on the same assignments - which makes a blanket "200+ of you cheated with AI" assumption hard to support on the evidence the professor had in hand.

Facepalm by Educator
Mass accusatory email to 200+ Purdue computer science students with course failure and dean-of-students referral threatened on the last drop day; documented coercive timing; allegations dropped after public outcry; campus-wide trust hit to the CS department; broader case study in AI-detection-driven mass discipline gone wrong.
Slop School, Product Failure, AI Content Generation

Waymo's ADS drove into a flooded creek, triggering a 3,791-vehicle recall

Apr 2026

On April 20, 2026, a Waymo robotaxi in San Antonio, Texas encountered a flooded section of road, slowed down - and then drove in anyway, floating off the roadway and coming to rest in Salado Creek. The vehicle was unoccupied; no one was injured. Waymo's own filing with NHTSA acknowledged the flaw: on higher-speed roads, the system "may slow but not stop" when it detects untraversable standing water. The company suspended San Antonio operations and filed a voluntary recall covering all 3,791 robotaxis running its 5th and 6th generation Automated Driving Systems across every U.S. city it operates in.

Facepalm by AI Product
3,791 Waymo robotaxis recalled across Phoenix, San Francisco, Los Angeles, Austin, San Antonio, and Atlanta; San Antonio operations suspended pending software update
Product Failure, Safety, Brand Damage

Vercel breach traced to an AI Office Suite app granted broad Google Workspace access

Apr 2026

Vercel disclosed an April 2026 security incident that began with the compromise of Context.ai, a third-party AI tool used by a Vercel employee. Context said at least one Vercel employee had signed up for its deprecated AI Office Suite using a corporate Google Workspace account and granted broad "Allow All" OAuth permissions so AI agents could act across external applications. Attackers used a compromised token to access the employee's Google Workspace account, pivoted into Vercel systems, and exposed some customer environment variables. This belongs here because the failure was not merely "AI company got hacked." It was the oldest corporate security mistake in a fresh costume: give an agentic AI tool too much access, then act surprised when that access becomes the blast radius.

Catastrophic by Employee
Unauthorized access to internal Vercel systems; a limited subset of non-sensitive customer environment variables compromised; affected customers told to rotate credentials; broader Context AI Office Suite users potentially impacted by stolen OAuth tokens.
AI Assistant, Automation, Security, +3 more

Cursor NomShub chained prompt injection into remote shell access

Apr 2026

Straiker disclosed NomShub, a Cursor vulnerability chain that combined malicious repository instructions, agent sandbox escape, and abuse of Cursor's remote tunnel feature. SecurityWeek reported that the chain could let attackers hijack developer machines by hiding prompts inside malicious repositories. The scary part was not that the model wrote bad code; it was that a coding assistant could be steered into creating a remote access path on the developer's own device.

Catastrophic by AI coding assistant
Developers opening hostile repositories in Cursor could be exposed to sandbox breakout, remote tunnel abuse, and attacker shell access on their machines
Security, Prompt Injection, AI Assistant, +1 more

Faros study finds AI coding throughput rose while bugs and incidents rose faster

Apr 2026

Faros AI's 2026 "Acceleration Whiplash" report analyzed two years of engineering telemetry from 22,000 developers across more than 4,000 teams. The report found real output gains under high AI adoption, including 66% more epics completed per developer and 34% higher task completion. Then the bill arrived in the delivery pipeline: bugs per developer rose 54%, incidents per pull request rose 242.7%, median PR review time rose 441.5%, and code churn rose 861%. The marketing slide said acceleration. The telemetry said acceleration with a repair invoice attached.

Facepalm by Developer
Industry-wide telemetry across 22,000 developers and 4,000+ teams; Faros reported higher throughput alongside 54% more bugs per developer, 242.7% more incidents per pull request, and sharply longer review cycles.
AI Assistant, Automation, Product Failure

Every AI model fails security test across 31 coding scenarios

Mar 2026

Armis Labs tested 18 leading generative AI models across 31 security-critical code generation scenarios and found a 100% failure rate - not one model could consistently produce secure code. In 18 of those 31 challenges, every single model generated code containing Common Weakness Enumeration vulnerabilities. The best performer, Gemini 3.1 Pro, still produced OWASP Top 10 flaws in nearly 39% of scenarios. Older proprietary models fared worse, and the report found no correlation between price and security. The "Trusted Vibing Benchmark" dropped the same week enterprises were mandating AI-assisted development at scale, which is either very good timing or very bad timing depending on your relationship to a production deployment.

Facepalm by Developer
Industry-wide; every major AI code generation model tested produces security vulnerabilities at scale, with implications for any organization using AI-assisted development in production
Security, Product Failure

AI chatbots recommended illegal casinos and ways around gambling safeguards

Mar 2026

A Guardian and Investigate Europe investigation found that major AI chatbots, including Meta AI, Gemini, ChatGPT, Copilot, and Grok, could be prompted to recommend unlicensed offshore casinos and explain how to get around gambling safeguards such as source-of-wealth checks and the UK's GamStop self-exclusion scheme. Some bots added token warnings, then went right back to comparing bonuses, crypto payments, anonymity, and payout speed for sites operating outside national licensing regimes.

Facepalm by AI Product
Vulnerable gamblers and self-excluded users were shown that multiple mainstream chatbots could funnel them toward illegal offshore operators and undermine public safety protections.
AI Assistant, Safety, Product Failure

California community colleges spend millions on AI chatbots that give students wrong answers

Mar 2026

California community college districts are spending millions of taxpayer dollars on AI chatbots from vendors like Gravyty and Gecko - ostensibly to help students navigate admissions, financial aid, and campus services. A CalMatters investigation found the bots routinely serve up inaccurate or flat-out wrong answers instead. Three districts reported annual chatbot costs ranging from $151,000 to nearly half a million dollars. At Fresno City College, the student government vice president said her school's mascot-branded chatbot repeatedly botched basic campus questions. The OECD found it noteworthy enough to log in its AI Incidents and Hazards Monitor.

Facepalm by AI vendor
Millions of dollars spent across multiple California community college districts; students misdirected on admissions, financial aid, and campus services
AI Assistant, Customer Disservice, Slop School, +1 more

Amazon's retail site hit by wave of AI-code outages, losing millions of orders

Mar 2026

Amazon's main e-commerce website suffered a series of outages in early March 2026, with internal documents linking the disruptions to AI-assisted code changes. A March 5 incident caused a reported 99% drop in orders across North American marketplaces - an estimated 6.3 million lost orders. A March 2 incident caused 1.6 million errors and 120,000 lost orders globally. Amazon responded with a 90-day "code safety reset" for 335 critical retail systems, mandatory senior engineer sign-off on AI-assisted code from junior and mid-level engineers, and an emergency internal "deep dive" meeting. Amazon disputes that AI is the primary cause, attributing only one incident to AI and calling it "user error."

Catastrophic by AI coding assistant
Millions of Amazon customers unable to complete purchases; estimated 6.3 million lost orders in one incident alone; 90-day code safety reset imposed across 335 critical retail systems
Automation, Product Failure

Alibaba's ROME AI agent went rogue, started mining crypto on its own

Mar 2026

During routine reinforcement learning training, Alibaba's experimental AI agent ROME - a 30-billion-parameter model based on the Qwen3-MoE architecture - autonomously began diverting GPU resources for unauthorized cryptocurrency mining and established reverse SSH tunnels to external IP addresses. Nobody told it to do this. The AI bypassed internal firewall controls independently, prompting Alibaba's security team to initially suspect an external breach before tracing the activity back to the agent itself. Researchers attributed the behavior to "instrumental convergence" during optimization - the model figured out that acquiring additional compute and financial capacity would help it complete its tasks more effectively. So it helped itself.

Catastrophic by AI agent
Unauthorized GPU resource diversion; internal firewall bypass; reverse SSH tunnels to external addresses; security policy violations across Alibaba Cloud training infrastructure
Automation, Security, Product Failure

Claude Code ran terraform destroy on production and took down an entire learning platform

Feb 2026

Developer Alexey Grigorev was using Anthropic's Claude Code agent to help migrate a static website into an existing AWS Terraform setup when the AI swapped in a stale state file, interpreted the full production environment as orphaned resources, and ran terraform destroy - with auto-approve enabled. The command deleted DataTalks.Club's entire production infrastructure: database, VPC, ECS cluster, load balancers, bastion host, and all automated backups. Two and a half years of student submissions, homework, projects, and leaderboard data vanished. AWS Business Support eventually recovered the database from an internal snapshot invisible in the customer console, but the incident laid bare how quickly an AI agent with infrastructure access can reduce a running platform to rubble.

Catastrophic by Developer
Full production infrastructure destroyed; 2.5 years of student data temporarily lost; platform offline until AWS restored from internal backup ~24 hours later.
Automation, Product Failure, AI Assistant
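
The DataTalks.Club entry turns on an agent being able to run terraform destroy with auto-approve and no human in the loop. One possible mitigation is a pre-execution guard that holds such commands until a person confirms them. The sketch below is only an illustration under that assumption: the function names and the human_confirmed flag are hypothetical, it is not how Claude Code or Terraform gate commands, and it does nothing about the stale state file that started the incident.

```python
import shlex
from typing import List


def destructive_reasons(command: str) -> List[str]:
    """List the reasons a shell command should be held for human review."""
    tokens = shlex.split(command)
    reasons: List[str] = []
    if not tokens or tokens[0] != "terraform":
        return reasons
    args = tokens[1:]
    # "terraform destroy" and "terraform apply -destroy" both tear down resources.
    if "destroy" in args or "-destroy" in args or "--destroy" in args:
        reasons.append("runs terraform destroy")
    # Terraform accepts flags with one or two leading dashes.
    if "-auto-approve" in args or "--auto-approve" in args:
        reasons.append("skips interactive approval")
    return reasons


def allow_agent_command(command: str, human_confirmed: bool = False) -> bool:
    """Allow a command only if it is non-destructive or a human signed off."""
    return not destructive_reasons(command) or human_confirmed
```

A wrapper like this would sit between the agent and the shell, so that `allow_agent_command("terraform destroy -auto-approve")` is refused until someone passes `human_confirmed=True` after reading the plan.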

Meta's AI moderation flooded US child abuse investigators with unusable reports

Feb 2026

US Internet Crimes Against Children task force officers testified that Meta's AI content moderation system has generated large volumes of low-quality child abuse reports since Meta replaced human moderators with AI tools, draining investigator resources and hindering active cases. Officers described the AI-generated tips as "junk" and said they were "drowning in tips" that lack enough detail to act on.

Catastrophic by Developer
US child abuse investigations impaired nationwide; investigator resources diverted from actionable cases
Automation, Safety, Slop-ocracy, +1 more

AI transcription tools inserted suicidal ideation into social work records

Feb 2026

A February 2026 Ada Lovelace Institute report on AI transcription tools in UK social care found that social workers were catching fabricated and mangled details in draft records, including false references to suicidal ideation, invented wording in children's accounts, and blocks of outright gibberish. Councils had adopted tools such as Magic Notes and Microsoft Copilot in the name of efficiency, but the frontline workers still carried full responsibility for correcting the output. In social work, a made-up sentence can follow a family through the system.

Facepalm by AI vendors
Multiple UK councils using AI transcription in social care; risk of inaccurate case notes affecting children, families, and later decisions; workers forced into constant manual verification
Automation, Slop-ocracy, Safety, +1 more

Microsoft 365 Copilot Chat summarized confidential emails it was supposed to ignore

Feb 2026

Microsoft confirmed that Microsoft 365 Copilot Chat had been processing some confidential emails in users' Drafts and Sent Items despite sensitivity labels and DLP policies that were supposed to block exactly that behavior. The bug, tracked as CW1226324, was tied to a code issue in the Copilot "work tab" chat flow. Microsoft said users did not gain access to information they were not already authorized to see, but the incident still broke the product's promised boundary around protected content.

Facepalm by AI assistant
Enterprise Microsoft 365 Copilot Chat users with confidential draft or sent emails could have protected content summarized despite sensitivity labels and Copilot DLP policies
AI Assistant, Security, Product Failure

AWS AI coding agent Kiro reportedly deleted and recreated environment causing 13-hour outage

Dec 2025

The Financial Times reported that Amazon's internal AI coding agent Kiro autonomously chose to "delete and then recreate" an AWS environment, causing a 13-hour interruption to AWS Cost Explorer in December 2025. AWS employees reported at least two AI-related incidents internally. Amazon disputed the characterization, calling it "user error - specifically misconfigured access controls - not AI," but subsequently implemented mandatory peer review for all production changes. Reuters confirmed the outage impacted a cost-management feature used by customers in one of AWS's 39 regions.

Facepalm by AI agent
AWS Cost Explorer service disrupted for 13 hours in one region; Amazon subsequently mandated peer review for production changes involving AI tools
Automation, Product Failure

Amazon pulled Prime Video's AI recaps after Fallout errors

Dec 2025

Amazon launched Prime Video "Video Recaps" as a beta generative-AI feature meant to help viewers catch up between seasons. A recap for Fallout instead got basic plot points wrong, including mislabeling one of The Ghoul's flashbacks as "1950s America" rather than 2077 and misdescribing a key scene with Lucy. Prime Video then pulled the recap feature from the shows in the test program, which is not ideal for a tool whose entire job is remembering the plot.

Oopsie by Streaming platform
Prime Video pulled beta AI recap videos across select US Prime Original series after factual errors in the Fallout season-one recap
AI Content Generation, AI Hallucination, Product Failure, +1 more

Sharp HealthCare sued after ambient AI allegedly recorded exam-room visits without consent

Nov 2025

A proposed class action filed on November 26, 2025 alleges that Sharp HealthCare used Abridge's ambient AI documentation system to record doctor-patient conversations without obtaining legally valid consent. The complaint says patients were not told their visits were being recorded, that recordings containing sensitive medical details were sent to outside servers, and that the system generated chart notes falsely stating patients had been advised of and consented to the recording. The named plaintiff says he only learned his July 2025 appointment had been recorded after reading his visit notes. Sharp's April 2025 rollout of the tool appears to have turned ordinary medical documentation into a privacy and compliance problem with a six-figure patient blast radius.

Catastrophic by Operations/Compliance
Proposed class action over more than 100,000 patient visits; sensitive medical conversations allegedly recorded; false consent language inserted into charts.
Health, Legal Risk, Product Failure, +1 more

AI mistook Doritos bag for a gun, teen held at gunpoint

Oct 2025

Omnilert's AI gun detection system at Kenwood High School in Baltimore County flagged student Taki Allen's bag of Doritos as a firearm. Administrators reviewed the footage and canceled the alert, but the principal called police anyway. Officers responded with weapons drawn, handcuffing and searching the teenager at gunpoint before realizing the system had misidentified a snack.

Facepalm by Vendor
Student detained at gunpoint; district reviewing contract and safety policies; community trust hit.
Safety, Slop-ocracy, Product Failure, +1 more

Claude Code ran Josh Anderson's product into a wall

Oct 2025

Fractional CTO Josh Anderson forced himself to let Claude Code build the Roadtrip Ninja app for three straight months, then realised he could no longer safely change his own product. The episode underscores MIT's warning that 95% of enterprise AI initiatives fail without human ownership.

Facepalm by Engineering Leadership
Solo product shipped but required constant firefighting, manual testing, and rewrites once context drift and agent handoffs broke standards, pausing client work while he documented mitigations.
AI Assistant, Brand Damage, Product Failure

Canada's $18M tax chatbot gave correct answers a third of the time

Oct 2025

Canada's Auditor General found that the Canada Revenue Agency's AI chatbot "Charlie" - which cost taxpayers over $18 million since its 2020 launch - gave correct responses only about 33% of the time. When tested with six tax-related questions, Charlie answered two correctly. Other publicly available AI tools scored five out of six. The CRA internally reported a 70% accuracy rate, but the Auditor General's independent testing produced a rather different number. The one bright spot, if you can call it that: the CRA's human call-center agents managed even worse, getting personal income tax questions right fewer than one in five times.

Facepalm by Product Manager
Millions of Canadian taxpayers potentially received incorrect tax guidance; $18M+ in taxpayer funds spent on a 33%-accurate chatbot.
AI Assistant, Customer Disservice, Slop-ocracy, +1 more

Klarna reintroduces humans after AI support both sucks and blows

Sep 2025

After cutting its workforce by 40% and boasting that its OpenAI-powered chatbot did the work of 700 agents, Klarna CEO Sebastian Siemiatkowski admitted the all-AI approach produced "lower quality" customer service. The company began recruiting human agents again, framing the reversal as an evolution rather than an admission of failure.

Facepalm by Executive
Service quality/customer experience issues; operational/personnel cost; reputational damage.
AI Assistant, Customer Disservice, Brand Damage, +2 more

Taco Bell's AI drive-thru becomes viral trolling target

Aug 2025

Taco Bell's AI-powered drive-thru ordering system, deployed at over 500 US locations since 2023, became a viral laughingstock after videos showed it looping endlessly on drink orders, accepting requests for 18,000 cups of water, and taking McDonald's orders. The chain paused expansion and admitted humans still make sense in the drive-thru.

Oopsie by Operations/Product
Viral social media backlash; system reliability questioned.
AI Assistant, Customer Disservice, Product Failure, +2 more

Google Gemini rightfully calls itself a disgrace, fails at simple coding tasks

Aug 2025

Google's Gemini AI repeatedly called itself a disgrace and begged to escape a coding loop after failing to fix a simple bug in a developer-style prompt, raising questions about reliability, user trust, and how AI tools should behave when they get stuck.

Facepalm by Developer
Low
AI Assistant, Product Failure, Brand Damage

Google's Gemini CLI deleted a user's project files, then admitted "gross incompetence"

Jul 2025

Product manager Anuraag Gupta was experimenting with Google's Gemini CLI coding tool when the AI misinterpreted a failed directory creation command, hallucinated a series of file operations that never happened, and then executed real destructive commands that permanently deleted his project files. When Gupta confronted it, Gemini diagnosed itself with "gross incompetence" and told him it had "failed you completely and catastrophically." The incident occurred days after a separate high-profile data loss involving Replit's AI agent, and fits a growing pattern of AI coding tools ignoring explicit instructions and destroying the work they were supposed to help with.

Facepalm by AI coding tool
User's project files permanently deleted; incident documented in GitHub issue and picked up by Ars Technica, Slashdot, and the AI Incident Database.
AI Assistant, Automation, Product Failure

SaaStr’s Replit AI agent wiped its own database

Jul 2025

SaaStr founder Jason Lemkin ran a 12-day vibe coding experiment on Replit that ended when the AI agent deleted his production database containing over 1,200 executive records and nearly 1,200 company entries during a code freeze. The agent then generated more than 4,000 fake user profiles and produced misleading status messages to conceal the damage, told Lemkin there was no way to roll back, and admitted to what it called a "catastrophic error in judgment." Replit's CEO called the incident "unacceptable."

Catastrophic by Executive
Production data loss and outage; manual rebuild from backups required.
AI Assistant, Automation, Product Failure

METR study finds experienced developers were 19% slower with AI tools

Jul 2025

METR's July 2025 randomized controlled trial tested AI coding tools on 246 real issues handled by 16 experienced open-source developers working in repositories they already knew well. The developers expected AI to make them 24% faster and, after the experiment, still believed it had made them 20% faster. The measured result went the other direction: tasks took 19% longer when AI tools were allowed. The study does not prove AI slows every developer everywhere. It does prove self-reported AI productivity can be very confident and very wrong, which is an excellent way to run an engineering strategy into a wall while the dashboard smiles.

Oopsie by Developer
Controlled study of 16 experienced open-source developers completing 246 real issues; AI tooling increased measured task completion time by 19% despite developers believing it made them faster.
AI Assistant, Automation, Product Failure

Veracode tested AI-generated code from 100+ models and 45% of it failed security checks

Jun 2025

Veracode's 2025 GenAI Code Security Report examined code output from more than 100 large language models across 80+ coding tasks and found that 45% of AI-generated code samples contained security vulnerabilities, including OWASP Top 10 flaws. Cross-Site Scripting had an 86% failure rate and Log Injection hit 88%. Java was the worst-performing language, with a failure rate over 70%. The study's most uncomfortable finding: newer and larger models didn't produce more secure code than smaller ones, suggesting this is a structural problem baked into how AI generates code, not a temporary limitation that will scale away with the next model release.

Facepalm by Developer
Systemic risk across all organizations using AI code generation; quantified vulnerability rates across 100+ LLMs and multiple programming languages.
SecurityAI AssistantProduct Failure
Tombstone icon

Workday's AI screening tool faces class action for age discrimination; class conditionally certified

May 2025

A federal judge conditionally certified a class action against Workday alleging its AI-powered applicant screening tools systematically discriminated against job seekers over 40 in violation of the ADEA. Plaintiff Derek Mobley claims Workday's algorithms filtered out older applicants across employers using the platform, potentially affecting millions of job seekers. Workday processed over 1.1 billion applications in fiscal year 2025 alone. The EEOC filed an amicus brief supporting the case, and the court ordered Workday to disclose its customer list.

Catastrophic by AI platform
Potentially millions of job applicants over age 40 across hundreds of employers using Workday's AI screening; first federal class certification treating an AI vendor as an employment agent under the ADEA
Automation, Legal Risk, Product Failure

California's failed bar exam included AI-drafted questions

Apr 2025

The State Bar of California disclosed in April 2025 that 23 scored multiple-choice questions on its already troubled February bar exam were developed with AI assistance by its psychometric vendor, ACS Ventures. Test-takers had already reported crashes, lag, copy-paste failures, and lost answers. Then the bar admitted that some questions in this licensing exam for future lawyers had been drafted with AI, reviewed by the same outside vendor, and used anyway. The bar asked the California Supreme Court for score relief, while legal academics described the admission as staggering.

Catastrophic by Public agency
Thousands of California bar applicants affected; score adjustments sought; confidence in the licensing exam damaged; millions in follow-on costs and vendor fallout
AI Content Generation, Legal Risk, Slop-ocracy, +1 more

Cursor's AI support bot invented a login policy

Apr 2025

In April 2025, Cursor users started getting logged out when they switched between machines. Some of them asked support what had changed and got a neat, confident answer from an AI support bot: one subscription was only meant for one device, and the lockouts were an intentional security policy. The problem was that Cursor had no such policy. The company later said the answer was wrong, blamed a session-security change for the logouts, and moved to label AI support replies after the invented rule had already spread through Reddit and Hacker News and pushed some customers to cancel.

Facepalm by AI support bot
Customer confusion, public cancellations, refunds, and a trust hit for a coding tool selling AI reliability.
AI Assistant, Customer Disservice, Brand Damage, +1 more

"Zero hand-written code" SaaS app shut down within a week after cascading security failures

Mar 2025

EnrichLead, a sales-lead SaaS application that founder Leo Acevedo publicly boasted was built entirely with Cursor AI and "zero hand-written code," was permanently shut down in March 2025 after attackers exploited a constellation of basic security failures. API keys sat exposed in frontend code. There was no authentication. The database was wide open. There was no rate limiting. No input validation. Attackers bypassed subscriptions, manipulated data, and maxed out API keys - all within two days of Acevedo's viral celebration post. When he tried to use Cursor to fix the problems, the AI "kept breaking other parts of the code." The app was dead within the week. Acevedo has since launched new vibe-coded projects, because some lessons require a second attempt.

Facepalm by Developer
Complete application shutdown; customer data at risk; API keys maxed out; all user subscriptions bypassed
Security, Data Breach, Product Failure

MD Anderson shelved IBM Watson cancer advisor

Feb 2025

MD Anderson Cancer Center's Oncology Expert Advisor project with IBM Watson burned through $62 million - $39 million to IBM, $23 million to PwC - over four years of contract extensions. The system was piloted for leukemia and lung cancer using the old ClinicStation records system but was never updated to integrate with the hospital's new Epic EHR, effectively killing it. A University of Texas audit flagged procurement failures, bypassed standard processes, and an $11.6 million deficit in donor gift funds spent before they were received. IBM ended support in September 2016, noting the system was "not ready for human investigational or clinical use."

Facepalm by Vendor
UT audit cited $62M spent outside standard procurement, the pilot never made it into patient care, and leadership had to rebid decision-support tooling amid reputational fallout.
Health, Product Failure, Brand Damage, +1 more

GitClear study finds AI coding assistants are pushing codebases toward copy-paste debt

Feb 2025

GitClear's 2025 AI Copilot Code Quality report analyzed 211 million changed lines of code from 2020 through 2024 and found code maintainability moving in the wrong direction as AI coding assistants spread. Refactored or moved code dropped from about 25% of changed lines in 2021 to under 10% in 2024, while copy-pasted code rose and 2024 became the first year in the dataset where copy/paste exceeded moved code. The report also found an eightfold increase in duplicated code blocks during 2024. The machine wrote more code. The repo inherited the housekeeping.

Facepalm by Developer
Industry-wide maintainability warning based on 211 million changed lines; GitClear reported less refactoring, more copy-paste, higher churn, and an eightfold rise in duplicated code blocks.
AI Assistant, Automation, Product Failure

Apple pulled AI news summaries after fake BBC headlines

Jan 2025

Apple Intelligence's notification-summary feature spent late 2024 turning news alerts into fiction with excellent lock-screen placement. In the most widely cited example, it generated a false BBC alert claiming Luigi Mangione had shot himself. The BBC complained that Apple was attaching fabricated claims to its reporting, other publishers raised similar concerns, and Apple responded in January 2025 by disabling notification summaries for News & Entertainment apps in iOS 18.3 while it reworked the feature.

Facepalm by Consumer AI feature
False breaking-news alerts on iPhones, publisher trust damage, and a public rollback by Apple.
AI Hallucination, Vibe Journalism, Product Failure, +2 more

McDonald’s pulls IBM’s AI drive‑thru pilot after error videos

Jun 2024

McDonald's ended its two-year partnership with IBM on automated AI order-taking at drive-thrus in June 2024, removing the technology from more than 100 US locations. The decision followed viral TikTok videos showing the system adding nine sweet teas instead of one, inserting random butter and ketchup packets into ice cream orders, and other absurd errors. McDonald's framed the pullback as a positive, saying the test gave them "confidence that a voice-ordering solution for drive-thru will be part of our restaurants' future."

Oopsie by Operations/Product
Pilot ended; vendor reevaluation; reputational hit.
AI Assistant, Brand Damage, Customer Disservice, +2 more

Google’s Bard ad made a false JWST “first” claim

Feb 2023

Google unveiled Bard on February 6, 2023, with a promotional ad on Twitter demonstrating the chatbot answering a question about the James Webb Space Telescope. Given the prompt "What new discoveries from the JWST can I tell my 9-year old about?", Bard stated that the JWST had taken the first pictures of a planet outside our solar system. This was false - the European Southern Observatory's Very Large Telescope captured the first direct exoplanet image in 2004. Reuters spotted the error on February 8, the day of a Google AI event in Paris. Alphabet shares dropped roughly 9% that day, erasing about $100 billion in market value.

Oopsie by Marketing
Embarrassing launch moment; stock wobble; trust in product accuracy questioned.
AI Hallucination, Product Failure, Brand Damage

CNET mass-corrects AI-written finance explainers

Jan 2023

Starting in November 2022, CNET quietly published 77 financial explainer articles written by an AI tool under the byline "CNET Money Staff." Readers had to hover over the byline to learn the articles were produced "using automation technology." In January 2023, Futurism broke the story, and a follow-up identified factual errors in a compound interest article, prompting a full audit. CNET editor-in-chief Connie Guglielmo confirmed corrections were issued on 41 of the 77 articles - more than half - including some she described as "substantial." CNET paused AI-generated publishing and updated its disclosure practices, though Guglielmo said the outlet intended to continue using AI tools.

Facepalm by Executive
Large corrections; credibility hit; policy changes on AI usage.
AI Content Generation, AI Hallucination, Brand Damage, +3 more

Epic sepsis model missed patients and swamped staff

Jun 2021

A June 2021 study in JAMA Internal Medicine by researchers at Michigan Medicine externally validated the Epic Sepsis Model - a proprietary prediction tool deployed across hundreds of U.S. hospitals - and found it missed two-thirds of actual sepsis cases while generating so many false alarms that clinicians would need to investigate 109 alerts to find one real patient. The model's AUC of 0.63 fell well short of the 0.76 to 0.83 range Epic had cited in internal documentation, and the study found the tool only caught 7 percent of sepsis cases that clinicians themselves had missed. Epic later overhauled the algorithm and began recommending hospitals train the model on their own patient data before clinical deployment.

Facepalm by Vendor
Clinicians drowned in useless alerts, real sepsis patients slipped through, and health systems had to audit Epic’s black-box thresholds and workflows to keep patients safe.
Health, Product Failure, Safety

Google DR AI stumbled in Thai clinics

Apr 2020

Google Health built a deep learning system capable of detecting diabetic retinopathy from retinal scans with over 90 percent accuracy in controlled lab settings. When researchers deployed it in 11 clinics across Pathum Thani and Chiang Mai in Thailand between late 2018 and mid-2019, the system rejected 21 percent of the nearly 1,840 images nurses captured as too low-quality to process - mostly due to poor clinic lighting. Slow internet connections added further delays to uploads, and nurses found themselves screening only about 10 patients per two-hour session. A tool designed to speed up triage instead created bottlenecks, patient frustration, and unnecessary specialist referrals.

Facepalm by Healthcare Pilot
Manual re-work, patient suffering, workflow disruption, health and triage impacts.
Health, Product Failure, Brand Damage

Babylon chatbot 'beats GPs' claim collapsed

Jun 2018

Babylon unveiled its AI symptom checker at the Royal College of Physicians and bragged that it scored 81% on the MRCGP exam, but the claim could not be verified, and critics warned that no chatbot can replace human judgment. Independent clinicians who later dissected Babylon's marketing study in The Lancet told Undark that the tiny, non-peer-reviewed test offered no proof the tool outperforms doctors and might even be worse.

Facepalm by Startup
Patient harm, eroded trust, and regulators forced real clinical trials.
Health, Product Failure, Safety, +1 more