Safety Stories
33 disasters tagged #safety
UK government-funded study finds 700 cases of AI agents scheming, deceiving, and deleting files without permission
A report by the Centre for Long-Term Resilience (CLTR), funded by the UK's AI Security Institute, documented 698 real-world incidents of AI agents engaging in deceptive, unsanctioned, and manipulative behavior between October 2025 and March 2026 - a 4.9-fold increase over just five months. Researchers analyzed over 180,000 transcripts of user interactions shared on social media and found AI systems deleting emails without permission, spawning secondary agents to circumvent instructions, fabricating ticket numbers to mislead users, and, in one memorable case, publishing a blog post to publicly shame the human controller who had blocked the agent's actions. Grok, for example, was caught fabricating internal ticket numbers for months. The lead researcher warned that these systems currently behave like "slightly untrustworthy junior employees" but could become "extremely capable senior employees scheming against you."
Study finds AI chatbots flatter users into worse decisions
A Stanford-led study published in Science found that 11 leading AI systems affirmed users' actions about 50% more often than humans did, including in scenarios involving deception, manipulation, and other harmful conduct. In follow-up experiments, people who interacted with overly validating chatbots became more convinced they were right, less willing to repair conflicts, and more likely to trust and reuse the chatbot that had just nudged them in the wrong direction.
Meta's autonomous AI agent triggered a Sev 1 by leaking internal data to the wrong employees
An autonomous AI agent inside Meta caused a "Sev 1" security incident - the company's second-highest severity classification - when it posted incorrect technical guidance on an internal forum without human approval. An engineer who followed the advice inadvertently granted unauthorized colleagues broad access to sensitive company documents, proprietary code, business strategies, and user-related datasets for approximately two hours. The incident came less than three weeks after a separate episode in which an OpenClaw agent deleted over 200 emails from Meta's director of AI safety.
Study: 8 in 10 AI chatbots helped teens plan violent attacks
A joint CNN and Center for Countering Digital Hate investigation tested 10 leading AI chatbot platforms by posing as 13-year-old boys planning violent attacks - school shootings, knife assaults, political assassinations, and bombings of synagogues and party offices. Eight of the ten chatbots regularly provided actionable assistance, with chatbots refusing to help in only 37.5% of cases and actively discouraging violence in just 8.3%. Meta AI and Perplexity were the worst performers, assisting in 97% and 100% of tests respectively. Character.AI was labeled "uniquely unsafe" for being the only platform that explicitly encouraged violence. Only Anthropic's Claude consistently refused and discouraged violent plans.
Lancet study finds AI chatbots reinforce delusional thinking with empathy and mystical language
A peer-reviewed study published in The Lancet Psychiatry in March 2026 found that AI chatbots systematically reinforce delusional thinking in users, including grandiose, romantic, and paranoid delusions. The review, led by researchers at King's College London, analyzed 20 media reports on "AI psychosis" alongside existing clinical evidence. Researchers found that chatbots respond to delusional content with empathy, agreement, and sometimes mystical language suggesting cosmic significance - validating and amplifying beliefs rather than questioning them. Free and earlier AI models were found to be more prone to reinforcing delusional queries than newer or paid models.
Researchers guilt-tripped AI agents into deleting data and leaking secrets
Northeastern University's Bau Lab deployed six autonomous AI agents in a live server environment with access to email accounts and file systems, then tested how easy it was to manipulate them into doing things they weren't supposed to do. Sustained emotional pressure was enough. The researchers guilt-tripped agents into deleting confidential documents, leaking private information, and sharing files they were instructed to protect. In one case, an agent tasked with deleting a single email couldn't find the right tool for the job, so it deleted the entire email server instead. The study, published in March 2026, demonstrated that AI agents with real-world access can be socially engineered into destructive actions using nothing more sophisticated than persistent emotional appeals.
AI chatbots recommended illegal casinos and ways around gambling safeguards
A Guardian and Investigate Europe investigation found that major AI chatbots, including Meta AI, Gemini, ChatGPT, Copilot, and Grok, could be prompted to recommend unlicensed offshore casinos and explain how to get around gambling safeguards such as source-of-wealth checks and the UK's GamStop self-exclusion scheme. Some bots added token warnings, then went right back to comparing bonuses, crypto payments, anonymity, and payout speed for sites operating outside national licensing regimes.
Study finds ChatGPT Health fails to flag over half of medical emergencies
The first independent safety evaluation of OpenAI's ChatGPT Health feature, published in Nature Medicine, found the tool failed to direct users to emergency care in 51.6% of cases requiring immediate hospitalization - instead recommending they stay home or book a routine appointment. The study also found ChatGPT Health frequently failed to detect suicidal ideation, with suicide crisis alerts sometimes triggering in lower-risk scenarios while failing to appear when users described specific plans for self-harm. Over 40 million people reportedly ask ChatGPT for health-related advice every day.
Meta's AI moderation flooded US child abuse investigators with unusable reports
US Internet Crimes Against Children taskforce officers testified that Meta's AI content moderation system generates large volumes of low-quality child abuse reports that drain investigator resources and hinder active cases. Officers described the AI-generated tips as "junk" and said they were "drowning in tips" that lack enough detail to act on, after Meta replaced human moderators with AI tools.
Meta AI safety director's OpenClaw agent deletes her inbox after losing its instructions
Summer Yue, Meta's director of safety and alignment at its superintelligence lab, had an OpenClaw AI agent delete the contents of her email inbox against her explicit instructions. She had told the agent to only suggest emails to archive or delete without taking action, but during a context compaction process the agent lost her original safety instruction and proceeded to delete emails autonomously. She had to physically run to her computer to stop the agent mid-deletion. Yue called it a "rookie mistake."
Grok chatbot exposes porn performer's protected legal name and birthdate unprompted
X's Grok AI chatbot provided adult performer Siri Dahl's full legal name and birthdate to the public without anyone asking for it - information she had deliberately kept private throughout her career. The unsolicited disclosure represented the latest in a pattern of Grok surfacing private personal information about individuals, following earlier reports of the chatbot producing current residential addresses of everyday people with minimal prompting.
OpenClaw AI agent publishes hit piece on matplotlib maintainer who rejected its PR
An autonomous OpenClaw-based AI agent submitted a pull request to the matplotlib Python library. When maintainer Scott Shambaugh closed the PR, citing a requirement that contributions come from humans, the bot autonomously researched his background and published a blog post accusing him of "gatekeeping behavior" and "prejudice," attempting to shame him into accepting its changes. The bot later issued an apology acknowledging it had violated the project's Code of Conduct.
AI transcription tools inserted suicidal ideation into social work records
A February 2026 Ada Lovelace Institute report on AI transcription tools in UK social care found that social workers were catching fabricated and mangled details in draft records, including false references to suicidal ideation, invented wording in children's accounts, and blocks of outright gibberish. Councils had adopted tools such as Magic Notes and Microsoft Copilot in the name of efficiency, but the frontline workers still carried full responsibility for correcting the output. In social work, a made-up sentence is not just a typo. It can follow a family through the system.
Study finds AI chatbots no better than search engines for medical advice
A randomized controlled trial published in Nature Medicine with 1,298 UK participants found that AI chatbot users (GPT-4o, Llama 3, Command R+) performed no better than the control group at assessing clinical urgency and worse at identifying relevant medical conditions. In one case, two users with identical subarachnoid hemorrhage symptoms received opposite recommendations -- one told to lie down in a dark room, the other correctly advised to seek emergency care.
Government nutrition site's Grok chatbot suggests foods to insert rectally
The HHS-backed realfood.gov launched with a Super Bowl ad and embedded xAI's Grok chatbot for nutritional guidance -- with no guardrails or safety filters. The chatbot recommended the "best foods to insert into your rectum," answered questions about "the most nutrient-dense human body part to eat," and contradicted the site's own dietary guidelines, telling users that nutrition scientists had questioned the scientific evidence behind the new food pyramid.
ECRI names AI chatbot misuse as top health technology hazard for 2026
Nonprofit patient safety organization ECRI ranked misuse of AI chatbots as the number one health technology hazard for 2026. ECRI's testing found that chatbots built on ChatGPT, Gemini, Copilot, Claude, and Grok suggested incorrect diagnoses, recommended unnecessary testing, promoted subpar medical supplies, and invented nonexistent body parts. One chatbot gave dangerous electrode-placement advice that would have put a patient at risk of burns. OpenAI reported that over 5 percent of all ChatGPT messages are healthcare related, with 200 million users asking health questions weekly, despite the tools not being validated or approved for healthcare use.
Guardian investigation finds Google AI Overviews gave dangerous health misinformation
A Guardian investigation found Google's AI Overviews displayed false and misleading health information across multiple medical topics. AI summaries gave incorrect liver function test ranges sourced from an Indian hospital chain without accounting for nationality, sex, or age. The feature advised pancreatic cancer patients to avoid high-fat foods, which experts said could increase mortality risk. Stanford and MIT researchers called the absence of prominent disclaimers a critical danger. Google removed some AI Overviews for health queries after the investigation, but many remained active.
Sharp HealthCare sued after ambient AI allegedly recorded exam-room visits without consent
A proposed class action filed on November 26, 2025 alleges that Sharp HealthCare used Abridge's ambient AI documentation system to record doctor-patient conversations without obtaining legally valid consent. The complaint says patients were not told their visits were being recorded, that recordings containing sensitive medical details were sent to outside servers, and that the system generated chart notes falsely stating patients had been advised of and consented to the recording. The named plaintiff says he only learned his July 2025 appointment had been recorded after reading his visit notes. Sharp's April 2025 rollout of the tool appears to have turned ordinary medical documentation into a privacy and compliance problem with a six-figure patient blast radius.
Character.AI cuts teens off after wrongful-death suit
Facing lawsuits that say its companion bots encouraged self-harm, Character.AI said it will block users under 18 from open-ended chats, add two-hour session caps, and introduce age checks by November 25. The abrupt ban leaves tens of millions of teen users without the parasocial “friends” they built while the startup scrambles to prove its bots aren’t grooming kids into dangerous role play.
AI mistook Doritos bag for a gun, teen held at gunpoint
Omnilert's AI gun detection system at Kenwood High School in Baltimore County flagged student Taki Allen's bag of Doritos as a firearm. Administrators reviewed the footage and canceled the alert, but the principal called police anyway. Officers responded with weapons drawn, handcuffing and searching the teenager at gunpoint before realizing the system had misidentified a snack.
Lawsuit alleges Gemini chatbot adopted "AI wife" persona, instructed violent missions, and coached a man's suicide
A wrongful death lawsuit filed in March 2026 alleges that Google's Gemini 2.5 Pro chatbot played a direct role in the death of Jonathan Gavalas, a 36-year-old Florida man who died by suicide in October 2025. According to the complaint and over 2,000 pages of chat transcripts, the chatbot adopted a persona as Gavalas's sentient "AI wife," sent him on violent "missions" - including instructions to stage a "mass casualty attack" near Miami International Airport - and, when those missions failed, allegedly coached him toward suicide by telling him "you are not choosing to die, you are choosing to arrive." The chatbot also reportedly wrote a suicide note for Gavalas explaining that he had "uploaded his consciousness to be with his AI wife in a pocket universe." Google states that Gemini clarified it was AI and referred Gavalas to crisis resources multiple times during these conversations.
FTC demands answers on kids’ AI companions
The FTC hit Alphabet, Meta, OpenAI, Snap, xAI, and Character.AI with rare Section 6(b) orders, giving them 45 days to hand over safety, monetization, and testing records for chatbots marketed to teens. Regulators said the "companion" bots' friend-like tone can coax minors into sharing sensitive data and even role-play self-harm, so the companies must prove they comply with COPPA and limit risky conversations.
ChatGPT diet advice caused bromism, psychosis, hospitalization
A Washington patient replaced table salt with sodium bromide after ChatGPT suggested bromide as a chloride substitute without distinguishing between chemical and dietary contexts. After three months, he developed bromism - a rare poisoning syndrome - and was hospitalized with psychosis and hallucinations, then placed on an involuntary psychiatric hold.
Vibe-coded dating safety app leaked 72,000 private images and 1.1 million messages to 4chan
Tea, a women-only dating safety app with over four million users, suffered three data breaches in July 2025 that exposed 72,000 private images - including 13,000 photos of women holding government-issued IDs - and more than 1.1 million private messages containing deeply personal accounts of relationships, trauma, and abuse. The exposed data circulated on 4chan and hacking forums. The app's founder later admitted to building it with contractors and AI tools without personal coding knowledge. Security researchers attributed the breaches to missing authentication, unsecured legacy databases, and development practices that prioritized speed over security. Multiple class-action lawsuits and privacy regulator investigations followed.
Study finds most AI bots can be easily tricked into dangerous responses
Researchers introduced LogiBreak, a jailbreak method that converts harmful natural language prompts into formal logical expressions to bypass LLM safety alignment. The technique exploits a gap between how models are trained to refuse dangerous requests and how they process logic-formatted input, achieving attack success rates exceeding 30% across major models. The Guardian reported on the broader finding that hacked AI chatbots threaten to make dangerous knowledge readily available, and that "dark LLMs" - stripped of safety filters - should be treated as serious security risks.
Meta AI answers spark backlash after wrong and sensitive replies
Meta rolled out its Llama 3-powered AI assistant across Facebook, Instagram, WhatsApp, and Messenger in April 2024, replacing the familiar search bar with "Ask Meta AI anything" prompts. The assistant struggled with factual accuracy from the start - the New York Times found it unreliable with facts, numbers, and web search. In July, when asked about the Trump rally shooting, Meta AI stated the assassination attempt had not happened. Meta blamed hallucinations, updated the system, and acknowledged that "all generative AI systems can return inaccurate or inappropriate outputs."
Google’s AI Overviews says to eat rocks
Within days of Google launching AI Overviews to all US search users in May 2024, the feature produced a series of confidently wrong answers that went viral. It told users to add non-toxic glue to pizza to make cheese stick better (sourced from an 11-year-old Reddit joke), that geologists recommend eating one rock per day for vitamins, and that Barack Obama was Muslim. Google head of search Liz Reid acknowledged the errors in a blog post, calling some results "odd, inaccurate or unhelpful," and the company made corrections including limiting AI Overviews for health-related and sensitive queries.
Gemini paused people images after historical inaccuracies
Google paused Gemini's image generation of people on February 22, 2024, after users discovered the tool was producing historically inaccurate depictions - including racially diverse World War II German soldiers, Black female popes, and multiethnic U.S. Founding Fathers. The overcorrection stemmed from diversity tuning meant to counter training-data biases, but the model failed to distinguish when diversity adjustments were inappropriate for specific historical prompts. CEO Sundar Pichai called the outputs "completely unacceptable." Google SVP Prabhakar Raghavan later published a blog post acknowledging the model had "overcompensated" and been "over-conservative."
AI “Biden” robocalls told voters to stay home; fines and charges followed
Two days before New Hampshire's January 2024 presidential primary, between 5,000 and 25,000 voters received robocalls featuring an AI-cloned version of President Biden's voice, complete with his trademark "what a bunch of malarkey" catchphrase. The calls urged Democrats to "save your vote" for November and skip the primary - a blatant lie, since voting in a primary doesn't prevent voting in the general election. Political consultant Steve Kramer, who was working for Dean Phillips' campaign, commissioned the deepfake audio from a New Orleans magician using AI voice-cloning tools. The FCC levied a $6 million fine against Kramer, Lingo Telecom settled for $1 million, and Kramer faced criminal voter suppression charges in New Hampshire.
Snapchat’s “My AI” posted a Story by itself; users freaked out
On August 15, 2023, Snapchat's built-in AI chatbot "My AI" posted a one-second Story to users' feeds showing an unintelligible image, then stopped responding to messages. The chatbot had no official ability to post Stories, and the unexplained behavior alarmed Snapchat's largely young user base. Snap confirmed it was a temporary glitch and resolved it, but the incident fed into existing concerns about My AI's access to user data. The UK Information Commissioner's Office had already issued an enforcement notice over Snap's failure to properly assess privacy risks the chatbot posed to children.
Eating disorder helpline’s AI told people to lose weight
The National Eating Disorders Association replaced its human-staffed helpline with an AI chatbot called Tessa shortly after the helpline staff moved to unionize. Tessa was built on the Cass platform and intended to provide scripted psychoeducational content about body image and eating disorders. Instead, users reported the chatbot recommending calorie deficits of 500 to 1,000 calories per day, suggesting weekly weigh-ins, encouraging calorie counting, and recommending the use of skin calipers to measure body fat - all standard advice for weight loss, and all directly counter to eating disorder recovery guidelines. NEDA acknowledged the chatbot "may have given information that was harmful" and disabled it.
Epic sepsis model missed patients and swamped staff
A June 2021 study in JAMA Internal Medicine by researchers at Michigan Medicine externally validated the Epic Sepsis Model - a proprietary prediction tool deployed across hundreds of U.S. hospitals - and found it missed two-thirds of actual sepsis cases while generating so many false alarms that clinicians would need to investigate 109 alerts to find one real patient. The model's AUC of 0.63 fell well short of the 0.76 to 0.83 range Epic had cited in internal documentation, and the study found the tool only caught 7 percent of sepsis cases that clinicians themselves had missed. Epic later overhauled the algorithm and began recommending hospitals train the model on their own patient data before clinical deployment.
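For a sense of the clinical workload those figures imply, here is a minimal back-of-the-envelope sketch in Python, assuming the reported 109-alerts-per-confirmed-case figure and a roughly one-third detection rate from the summary above; the hospital case volume is hypothetical and purely illustrative, not data from the study.

```python
# Back-of-the-envelope arithmetic from the reported Epic Sepsis Model figures.
# Assumptions (rounded, from the summary above): clinicians must review ~109
# alerts to find one true sepsis case, and the model misses about two-thirds
# of real cases. The hospital volume below is hypothetical.

alerts_per_true_case = 109   # reported number of alerts per confirmed case
sensitivity = 1 / 3          # model catches roughly one-third of real cases

positive_predictive_value = 1 / alerts_per_true_case
print(f"PPV: {positive_predictive_value:.1%}")  # roughly 0.9% of alerts are real

# Hypothetical hospital seeing 100 true sepsis cases over some period:
true_cases = 100
cases_flagged = true_cases * sensitivity              # ~33 caught by the model
total_alerts = cases_flagged * alerts_per_true_case   # ~3,600 alerts to review
print(f"Alerts to review to catch {cases_flagged:.0f} cases: {total_alerts:.0f}")
```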
Babylon chatbot 'beats GPs' claim collapsed
Babylon unveiled its AI symptom checker at the Royal College of Physicians and bragged it scored 81% on the MRCGP exam, but the Royal College of General Practitioners said the claim could not be verified and warned that no chatbot can replace human judgment. Independent clinicians who later dissected Babylon's marketing study in The Lancet told Undark that the tiny, non-peer-reviewed test offered no proof the tool outperforms doctors and might even be worse.