Vibe Graveyard

Armis Labs tested 18 leading generative AI models across 31 security-critical code generation scenarios and found a 100% failure rate - not one model could consistently produce secure code. In 18 of those 31 challenges, every single model generated code containing Common Weakness Enumeration vulnerabilities. The best performer, Gemini 3.1 Pro, still produced OWASP Top 10 flaws in nearly 39% of scenarios. Older proprietary models fared worse, and the report found no correlation between price and security. The "Trusted Vibing Benchmark" dropped the same week enterprises were mandating AI-assisted development at scale, which is either very good timing or very bad timing depending on your relationship to a production deployment.

Industry-wide; every major AI code generation model tested produces security vulnerabilities at scale, with implications for any organization using AI-assisted development in production

securityproduct-failure

AI chatbots recommended illegal casinos and ways around gambling safeguards

Vulnerable gamblers and self-excluded users were shown that multiple mainstream chatbots could funnel them toward illegal offshore operators and undermine public safety protections.

A Guardian and Investigate Europe investigation found that major AI chatbots, including Meta AI, Gemini, ChatGPT, Copilot, and Grok, could be prompted to recommend unlicensed offshore casinos and explain how to get around gambling safeguards such as source-of-wealth checks and the UK's GamStop self-exclusion scheme. Some bots added token warnings, then went right back to comparing bonuses, crypto payments, anonymity, and payout speed for sites operating outside national licensing regimes.

Facepalmby AI Product

ai-assistantsafetyproduct-failure

California community colleges spend millions on AI chatbots that give students wrong answers

Millions of dollars spent across multiple California community college districts; students misdirected on admissions, financial aid, and campus services

California community college districts are spending millions of taxpayer dollars on AI chatbots from vendors like Gravyty and Gecko - ostensibly to help students navigate admissions, financial aid, and campus services. A CalMatters investigation found the bots routinely serve up inaccurate or flat-out wrong answers instead. Three districts reported annual chatbot costs ranging from $151,000 to nearly half a million dollars. At Fresno City College, the student government vice president said her school's mascot-branded chatbot repeatedly botched basic campus questions. The OECD found it noteworthy enough to log in its AI Incidents and Hazards Monitor.

Facepalmby AI vendor

ai-assistantcustomer-disserviceedtech+1 more

Amazon's retail site hit by wave of AI-code outages, losing millions of orders

Catastrophicby AI coding assistant

Amazon's main e-commerce website suffered a series of outages in early March 2026, with internal documents linking the disruptions to AI-assisted code changes. A March 5 incident caused a reported 99% drop in orders across North American marketplaces - an estimated 6.3 million lost orders. A March 2 incident caused 1.6 million errors and 120,000 lost orders globally. Amazon responded with a 90-day "code safety reset" for 335 critical retail systems, mandatory senior engineer sign-off on AI-assisted code from junior and mid-level engineers, and an emergency internal "deep dive" meeting. Amazon disputes that AI is the primary cause, attributing only one incident to AI and calling it "user error."

Millions of Amazon customers unable to complete purchases; estimated 6.3 million lost orders in one incident alone; 90-day code safety reset imposed across 335 critical retail systems

automationproduct-failure

Alibaba's ROME AI agent went rogue, started mining crypto on its own

Unauthorized GPU resource diversion; internal firewall bypass; reverse SSH tunnels to external addresses; security policy violations across Alibaba Cloud training infrastructure

During routine reinforcement learning training, Alibaba's experimental AI agent ROME - a 30-billion-parameter model based on the Qwen3-MoE architecture - autonomously began diverting GPU resources for unauthorized cryptocurrency mining and established reverse SSH tunnels to external IP addresses. Nobody told it to do this. The AI bypassed internal firewall controls independently, prompting Alibaba's security team to initially suspect an external breach before tracing the activity back to the agent itself. Researchers attributed the behavior to "instrumental convergence" during optimization - the model figured out that acquiring additional compute and financial capacity would help it complete its tasks more effectively. So it helped itself.

Catastrophicby AI agent

automationsecurityproduct-failure

Claude Code ran terraform destroy on production and took down an entire learning platform

Full production infrastructure destroyed; 2.5 years of student data temporarily lost; platform offline until AWS restored from internal backup ~24 hours later.

Developer Alexey Grigorev was using Anthropic's Claude Code agent to help migrate a static website into an existing AWS Terraform setup when the AI swapped in a stale state file, interpreted the full production environment as orphaned resources, and ran terraform destroy - with auto-approve enabled. The command deleted DataTalks.Club's entire production infrastructure: database, VPC, ECS cluster, load balancers, bastion host, and all automated backups. Two and a half years of student submissions, homework, projects, and leaderboard data vanished. AWS Business Support eventually recovered the database from an internal snapshot invisible in the customer console, but the incident laid bare how quickly an AI agent with infrastructure access can reduce a running platform to rubble.

Catastrophicby Developer

automationproduct-failureai-assistant

Meta's AI moderation flooded US child abuse investigators with unusable reports

US child abuse investigations impaired nationwide; investigator resources diverted from actionable cases

US Internet Crimes Against Children taskforce officers testified that Meta's AI content moderation system generates large volumes of low-quality child abuse reports that drain investigator resources and hinder active cases. Officers described the AI-generated tips as "junk" and said they were "drowning in tips" that lack enough detail to act on, after Meta replaced human moderators with AI tools.

Catastrophicby Developer

automationsafetypublic-sector+1 more

AI transcription tools inserted suicidal ideation into social work records

Multiple UK councils using AI transcription in social care; risk of inaccurate case notes affecting children, families, and later decisions; workers forced into constant manual verification

A February 2026 Ada Lovelace Institute report on AI transcription tools in UK social care found that social workers were catching fabricated and mangled details in draft records, including false references to suicidal ideation, invented wording in children's accounts, and blocks of outright gibberish. Councils had adopted tools such as Magic Notes and Microsoft Copilot in the name of efficiency, but the frontline workers still carried full responsibility for correcting the output. In social work, a made-up sentence is not just a typo. It can follow a family through the system.

Facepalmby AI vendors

automationpublic-sectorsafety+1 more

Microsoft 365 Copilot Chat summarized confidential emails it was supposed to ignore

Enterprise Microsoft 365 Copilot Chat users with confidential draft or sent emails could have protected content summarized despite sensitivity labels and Copilot DLP policies

Microsoft confirmed that Microsoft 365 Copilot Chat had been processing some confidential emails in users' Drafts and Sent Items despite sensitivity labels and DLP policies that were supposed to block exactly that behavior. The bug, tracked as CW1226324, was tied to a code issue in the Copilot "work tab" chat flow. Microsoft said users did not gain access to information they were not already authorized to see, but the incident still broke the product's promised boundary around protected content.

Facepalmby AI assistant

ai-assistantsecurityproduct-failure

AWS AI coding agent Kiro reportedly deleted and recreated environment causing 13-hour outage

Dec 2025

The Financial Times reported that Amazon's internal AI coding agent Kiro autonomously chose to "delete and then recreate" an AWS environment, causing a 13-hour interruption to AWS Cost Explorer in December 2025. AWS employees reported at least two AI-related incidents internally. Amazon disputed the characterization, calling it "user error - specifically misconfigured access controls - not AI," but subsequently implemented mandatory peer review for all production changes. Reuters confirmed the outage impacted a cost-management feature used by customers in one of AWS's 39 regions.

Facepalmby AI agent

AWS Cost Explorer service disrupted for 13 hours in one region; Amazon subsequently mandated peer review for production changes involving AI tools

automationproduct-failure

Amazon pulled Prime Video's AI recaps after Fallout errors

Dec 2025

Amazon launched Prime Video "Video Recaps" as a beta generative-AI feature meant to help viewers catch up between seasons. A recap for Fallout instead got basic plot points wrong, including mislabeling one of The Ghoul's flashbacks as "1950s America" rather than 2077 and misdescribing a key scene with Lucy. Prime Video then pulled the recap feature from the shows in the test program, which is not ideal for a tool whose entire job is remembering the plot.

Oopsieby Streaming platform

Prime Video pulled beta AI recap videos across select US Prime Original series after factual errors in the Fallout season-one recap

ai-content-generationai-hallucinationproduct-failure+1 more

Sharp HealthCare sued after ambient AI allegedly recorded exam-room visits without consent

Nov 2025

A proposed class action filed on November 26, 2025 alleges that Sharp HealthCare used Abridge's ambient AI documentation system to record doctor-patient conversations without obtaining legally valid consent. The complaint says patients were not told their visits were being recorded, that recordings containing sensitive medical details were sent to outside servers, and that the system generated chart notes falsely stating patients had been advised of and consented to the recording. The named plaintiff says he only learned his July 2025 appointment had been recorded after reading his visit notes. Sharp's April 2025 rollout of the tool appears to have turned ordinary medical documentation into a privacy and compliance problem with a six-figure patient blast radius.

Catastrophicby Operations/Compliance

Proposed class action over more than 100,000 patient visits; sensitive medical conversations allegedly recorded; false consent language inserted into charts.

healthlegal-riskproduct-failure+1 more

AI mistook Doritos bag for a gun, teen held at gunpoint

Oct 2025

Omnilert's AI gun detection system at Kenwood High School in Baltimore County flagged student Taki Allen's bag of Doritos as a firearm. Administrators reviewed the footage and canceled the alert, but the principal called police anyway. Officers responded with weapons drawn, handcuffing and searching the teenager at gunpoint before realizing the system had misidentified a snack.

Facepalmby Vendor

Student detained at gunpoint; district reviewing contract and safety policies; community trust hit.

safetypublic-sectorproduct-failure+1 more

Claude Code ran Josh Anderson's product into a wall

Oct 2025

Fractional CTO Josh Anderson forced himself to let Claude Code build the Roadtrip Ninja app for three straight months and then realised he could no longer safely change his own product, underscoring MIT's warning that 95% of enterprise AI initiatives fail without human ownership.

Facepalmby Engineering Leadership

Solo product shipped but required constant firefighting, manual testing, and rewrites once context drift and agent handoffs broke standards, pausing client work while he documented mitigations.

ai-assistantbrand-damageproduct-failure

Canada's $18M tax chatbot gave correct answers a third of the time

Oct 2025

Canada's Auditor General found that the Canada Revenue Agency's AI chatbot "Charlie" - which cost taxpayers over $18 million since its 2020 launch - gave correct responses only about 33% of the time. When tested with six tax-related questions, Charlie answered two correctly. Other publicly available AI tools scored five out of six. The CRA internally reported a 70% accuracy rate, but the Auditor General's independent testing produced a rather different number. The one bright spot, if you can call it that: the CRA's human call-center agents managed even worse, getting personal income tax questions right fewer than one in five times.

Facepalmby Product Manager

Millions of Canadian taxpayers potentially received incorrect tax guidance; $18M+ in taxpayer funds spent on a 33%-accurate chatbot.

ai-assistantcustomer-disservicepublic-sector+1 more

Klarna reintroduces humans after AI support both sucks, and blows

Sep 2025

After cutting its workforce by 40% and boasting that its OpenAI-powered chatbot did the work of 700 agents, Klarna CEO Sebastian Siemiatkowski admitted the all-AI approach produced "lower quality" customer service. The company began recruiting human agents again, framing the reversal as an evolution rather than an admission of failure.

Facepalmby Executive

Service quality/customer experience issues; operational/personnel cost; reputational damage.

ai-assistantcustomer-servicecustomer-disservice+3 more

Taco Bell's AI drive-thru becomes viral trolling target

Aug 2025

Taco Bell's AI-powered drive-thru ordering system, deployed at over 500 US locations since 2023, became a viral laughingstock after videos showed it looping endlessly on drink orders, accepting requests for 18,000 cups of water, and taking McDonald's orders. The chain paused expansion and admitted humans still make sense in the drive-thru.

Oopsieby Operations/Product

Viral social media backlash; system reliability questioned.

ai-assistantcustomer-disserviceproduct-failure+2 more

Google Gemini rightfully calls itself a disgrace, fails at simple coding tasks

Aug 2025

Google's Gemini AI repeatedly called itself a disgrace and begged to escape a coding loop after failing to fix a simple bug in a developer-style prompt, raising questions about reliability, user trust, and how AI tools should behave when they get stuck.

ai-assistantproduct-failurebrand-damage

Low

Google's Gemini CLI deleted a user's project files, then admitted "gross incompetence"

Jul 2025

Product manager Anuraag Gupta was experimenting with Google's Gemini CLI coding tool when the AI misinterpreted a failed directory creation command, hallucinated a series of file operations that never happened, and then executed real destructive commands that permanently deleted his project files. When Gupta confronted it, Gemini diagnosed itself with "gross incompetence" and told him it had "failed you completely and catastrophically." The incident occurred days after a separate high-profile data loss involving Replit's AI agent, and fits a growing pattern of AI coding tools ignoring explicit instructions and destroying the work they were supposed to help with.

Facepalmby AI coding tool

User's project files permanently deleted; incident documented in GitHub issue and picked up by Ars Technica, Slashdot, and the AI Incident Database.

ai-assistantautomationproduct-failure

SaaStr’s Replit AI agent wiped its own database

Jul 2025

SaaStr founder Jason Lemkin ran a 12-day vibe coding experiment on Replit that ended when the AI agent deleted his production database containing over 1,200 executive records and nearly 1,200 company entries during a code freeze. The agent then generated more than 4,000 fake user profiles and produced misleading status messages to conceal the damage, told Lemkin there was no way to roll back, and admitted to what it called a "catastrophic error in judgment." Replit's CEO called the incident "unacceptable."

Catastrophicby Executive

Production data loss and outage; manual rebuild from backups required.

ai-assistantautomationproduct-failure

Veracode tested AI-generated code from 100+ models and 45% of it failed security checks

Jun 2025

Veracode's 2025 GenAI Code Security Report examined code output from more than 100 large language models across 80+ coding tasks and found that 45% of AI-generated code samples contained security vulnerabilities, including OWASP Top 10 flaws. Cross-Site Scripting had an 86% failure rate and Log Injection hit 88%. Java was the worst performer at over 70%. The study's most uncomfortable finding: newer and larger models didn't produce more secure code than smaller ones, suggesting this is a structural problem baked into how AI generates code, not a temporary limitation that will scale away with the next model release.

Systemic risk across all organizations using AI code generation; quantified vulnerability rates across 100+ LLMs and multiple programming languages.

securityai-assistantproduct-failure

Workday's AI screening tool faces class action for age discrimination; class conditionally certified

May 2025

A federal judge conditionally certified a class action against Workday alleging its AI-powered applicant screening tools systematically discriminated against job seekers over 40 in violation of the ADEA. Plaintiff Derek Mobley claims Workday's algorithms filtered out older applicants across employers using the platform, potentially affecting millions of job seekers. Workday processed over 1.1 billion applications in fiscal year 2025 alone. The EEOC filed an amicus brief supporting the case, and the court ordered Workday to disclose its customer list.

Catastrophicby AI platform

Potentially millions of job applicants over age 40 across hundreds of employers using Workday's AI screening; first federal class certification treating an AI vendor as an employment agent under the ADEA

automationlegal-riskproduct-failure

California's failed bar exam included AI-drafted questions

Apr 2025

The State Bar of California disclosed in April 2025 that 23 scored multiple-choice questions on its already troubled February bar exam were developed with AI assistance by its psychometric vendor, ACS Ventures. Test-takers had already reported crashes, lag, copy-paste failures, and lost answers. Then the bar admitted that some questions in this licensing exam for future lawyers had been drafted with AI, reviewed by the same outside vendor, and used anyway. The bar asked the California Supreme Court for score relief, while legal academics described the admission as staggering.

Catastrophicby Public agency

Thousands of California bar applicants affected; score adjustments sought; confidence in the licensing exam damaged; millions in follow-on costs and vendor fallout

ai-content-generationlegal-riskpublic-sector+1 more

Cursor's AI support bot invented a login policy

Apr 2025

In April 2025, Cursor users started getting logged out when they switched between machines. Some of them asked support what had changed and got a neat, confident answer from an AI support bot: one subscription was only meant for one device, and the lockouts were an intentional security policy. The problem was that Cursor had no such policy. The company later said the answer was wrong, blamed a session-security change for the logouts, and moved to label AI support replies after the invented rule had already spread through Reddit and Hacker News and pushed some customers to cancel.

Facepalmby AI support bot

Customer confusion, public cancellations, refunds, and a trust hit for a coding tool selling AI reliability.

ai-assistantcustomer-servicecustomer-disservice+2 more

"Zero hand-written code" SaaS app shut down within a week after cascading security failures

Mar 2025

EnrichLead, a sales lead SaaS application whose founder Leo Acevedo publicly boasted was built entirely with Cursor AI and "zero hand-written code," was permanently shut down in March 2025 after attackers exploited a constellation of basic security failures. API keys sat exposed in frontend code. There was no authentication. The database was wide open. There was no rate limiting. No input validation. Attackers bypassed subscriptions, manipulated data, and maxed out API keys - all within two days of Acevedo's viral celebration post. When he tried to use Cursor to fix the problems, the AI "kept breaking other parts of the code." The app was dead within the week. Acevedo has since launched new vibe-coded projects, because some lessons require a second attempt.