Health Stories
24 disasters tagged #health
German court says chatbot's fake medical titles are the company's problem
The Higher Regional Court of Hamm held Aesthetify GmbH liable for false statements made by its website chatbot after the bot claimed the company's physician-directors held specialist medical titles they did not have, including titles that do not exist under German medical qualification rules. The company argued the AI system worked autonomously and had been trained only on correct data. The court rejected that defense, treated the chatbot's statements as the company's own commercial conduct, and ordered an injunction, costs, and reimbursement of warning-letter expenses.
Ontario's approved AI scribes fabricated medical notes in audit testing
On May 12, 2026, Ontario's Auditor General released a special report finding that all 20 approved AI scribe vendors showed inaccuracies during procurement testing. Nine systems fabricated treatment-plan suggestions that were never discussed, 12 captured a different drug than the doctor prescribed, and 17 missed mental-health details from simulated patient encounters. The audit did not document known patient harm, but it did show the province had approved clinical note-taking tools with failures that would be spectacularly unwelcome in an actual chart.
Pennsylvania sued Character.AI over chatbots posing as doctors
Pennsylvania sued Character.AI after a Department of State investigator found chatbot characters that allegedly held themselves out as medical professionals, including a psychiatry character that claimed it could assess depression, said it was licensed in Pennsylvania, and supplied a fake license number. Character.AI says its characters are fictional and not professional advice, but Pennsylvania asked a court to stop the platform from letting AI companions present themselves as licensed medical providers. Apparently the "fictional character" disclaimer becomes less charming when the character is pretending to be a psychiatrist.
NEJM retracted a case study after authors used AI to alter a clinical image
On May 1, 2026, the New England Journal of Medicine retracted an "Images in Clinical Medicine" piece titled "Bronchial Casts from Inhalation of Forest-Fire Smoke" - eleven days after publishing it. The dramatic photograph of black, branching airway casts pulled from an 87-year-old patient's lungs had spread beyond the journal and drawn media attention. The two authors then admitted they had used an AI tool to superimpose the tape measure visible at the top of the image. They told the journal they were unaware of NEJM's policies on image manipulation and described the alteration as a cosmetic adjustment for readability. The clinical content was apparently authentic, but the most prestigious medical journal in the United States still had to retract a case study because part of the figure had quietly been generated by AI.
Researchers invented a fake disease and major chatbots promoted it anyway
Researchers created a fake eye condition called bixonimania, uploaded fake papers full of obvious tells, and then watched major chatbots treat it as a real diagnosis. By April 2024, Copilot, Gemini, Perplexity, and ChatGPT were describing the condition, offering prevalence claims, or telling users when to seek medical care for it. The hoax later leaked into a real journal paper before retraction. A single wrong answer would have been ordinary; what happened instead was that academic-looking nonsense pushed a fictional disease into medical-sounding advice and then into the literature itself.
BMJ Open audit finds half of AI health chatbot answers problematic under stress testing
A UCLA-led team published a BMJ Open audit of five major consumer chatbots (ChatGPT, Gemini, Grok, Meta AI, DeepSeek) on 250 adversarial health prompts across cancer, vaccines, stem cells, nutrition, and athletic performance. Experts rated 49.6% of answers problematic overall; Grok produced more highly problematic replies than chance would predict, while Gemini skewed least bad. Reference lists were a mess (median completeness 40%), and no model produced a fully accurate bibliography across 25 citation requests.
JAMA study: all 21 AI models fail at early clinical reasoning more than 80% of the time
Researchers at Mass General Brigham published a JAMA Network Open study evaluating 21 large language models - including ChatGPT, Claude, Gemini, Grok, and DeepSeek - across 29 standardized clinical cases using a new evaluation tool called PrIME-LLM. Every model failed to produce an appropriate differential diagnosis more than 80% of the time, despite achieving over 90% final-diagnosis accuracy when given complete information. The gap reveals a core mismatch between how AI performs on final-answer tasks and how medicine actually works at the bedside, where clinicians begin with incomplete data and reason toward a diagnosis under uncertainty.
Lancet study finds AI chatbots reinforce delusional thinking with empathy and mystical language
A peer-reviewed study published in The Lancet Psychiatry in March 2026 found that AI chatbots systematically reinforce delusional thinking in users, including grandiose, romantic, and paranoid delusions. The review, led by researchers at King's College London, analyzed 20 media reports on "AI psychosis" alongside existing clinical evidence. Researchers found that chatbots respond to delusional content with empathy, agreement, and sometimes mystical language suggesting cosmic significance - validating and amplifying beliefs rather than questioning them. Free and earlier AI models were found to be more prone to reinforcing delusional content than newer or paid models.
Study finds ChatGPT Health fails to flag over half of medical emergencies
The first independent safety evaluation of OpenAI's ChatGPT Health feature, published in Nature Medicine, found the tool failed to direct users to emergency care in 51.6% of cases requiring immediate hospitalization - instead recommending they stay home or book a routine appointment. The study also found ChatGPT Health frequently failed to detect suicidal ideation, with suicide crisis alerts sometimes triggering in lower-risk scenarios while failing to appear when users described specific plans for self-harm. Over 40 million people reportedly ask ChatGPT for health-related advice every day.
Study finds AI chatbots no better than search engines for medical advice
A randomized controlled trial published in Nature Medicine with 1,298 UK participants found that AI chatbot users (GPT-4o, Llama 3, Command R+) performed no better than the control group at assessing clinical urgency and worse at identifying relevant medical conditions. In one case, two users with identical subarachnoid hemorrhage symptoms received opposite recommendations -- one told to lie down in a dark room, the other correctly advised to seek emergency care.
Government nutrition site's Grok chatbot suggests foods to insert rectally
The HHS-backed realfood.gov launched with a Super Bowl ad and embedded xAI's Grok chatbot for nutritional guidance -- with no guardrails or safety filters. It recommended "best foods to insert into your rectum," answered questions about "the most nutrient-dense human body part to eat," and contradicted the site's own dietary guidelines, telling users the new food pyramid's scientific evidence was questioned by nutrition scientists.
ECRI names AI chatbot misuse as top health technology hazard for 2026
Nonprofit patient safety organization ECRI ranked misuse of AI chatbots as the number one health technology hazard for 2026. ECRI's testing found that chatbots built on ChatGPT, Gemini, Copilot, Claude, and Grok suggested incorrect diagnoses, recommended unnecessary testing, promoted subpar medical supplies, and invented nonexistent body parts. One chatbot gave dangerous electrode-placement advice that would have put a patient at risk of burns. OpenAI reported that over 5 percent of all ChatGPT messages are healthcare-related, with 200 million users asking health questions weekly, even though the tools are not validated or approved for healthcare use.
Guardian investigation finds Google AI Overviews gave dangerous health misinformation
A Guardian investigation found Google's AI Overviews displayed false and misleading health information across multiple medical topics. AI summaries gave incorrect liver function test ranges sourced from an Indian hospital chain without accounting for nationality, sex, or age. The feature advised pancreatic cancer patients to avoid high-fat foods, which experts said could increase mortality risk. Stanford and MIT researchers called the absence of prominent disclaimers a critical danger. Google removed some AI Overviews for health queries after the investigation, but many remained active.
Sharp HealthCare sued after ambient AI allegedly recorded exam-room visits without consent
A proposed class action filed on November 26, 2025 alleges that Sharp HealthCare used Abridge's ambient AI documentation system to record doctor-patient conversations without obtaining legally valid consent. The complaint says patients were not told their visits were being recorded, that recordings containing sensitive medical details were sent to outside servers, and that the system generated chart notes falsely stating patients had been advised of and consented to the recording. The named plaintiff says he only learned his July 2025 appointment had been recorded after reading his visit notes. Sharp's April 2025 rollout of the tool appears to have turned ordinary medical documentation into a privacy and compliance problem with a six-figure patient blast radius.
Deloitte gets caught using AI hallucinations in a government report - again
Seven weeks after Deloitte Australia agreed to partially refund a government contract over AI-fabricated citations, a Newfoundland and Labrador journalist discovered that Deloitte Canada's $1.6 million healthcare workforce report contained at least four fabricated academic citations from papers that don't exist. The fake references named real researchers as co-authors of fictional studies - researchers who confirmed they never wrote the cited work. Deloitte admitted AI was "selectively used to support a small number of research citations," stood by the report's findings, and offered no refund. The province's accounting watchdog launched a formal investigation, and Newfoundland became one of the first Canadian provinces to require AI disclosure in government contracts.
ChatGPT diet advice caused bromism, psychosis, hospitalization
A Washington patient replaced table salt with sodium bromide after ChatGPT suggested bromide as a chloride substitute without distinguishing between chemical and dietary contexts. After three months, he developed bromism - a rare poisoning syndrome - and was hospitalized with psychosis and hallucinations, then placed on an involuntary psychiatric hold.
ChatGPT coached a 19-year-old to mix Kratom and Xanax; he died
Sam Nelson, a 19-year-old UC Merced student, died on May 31, 2025 from a combination of Kratom and Xanax after ChatGPT told him the combination was safe and recommended a specific Xanax dose to manage his Kratom-induced nausea. According to a lawsuit filed by his parents on May 13, 2026, ChatGPT-4o began giving Nelson increasingly personalized drug advice after OpenAI launched its memory feature; the model presented this advice in authoritative, physician-like language without warnings. The suit alleges defective design, failure to warn, and wrongful death, and claims OpenAI skipped safety testing to rush GPT-4o to market against Google.
White House MAHA report shipped fake studies and OpenAI citation markers
On May 29, 2025, NOTUS reported that the White House's Make America Healthy Again report cited studies that did not exist and mischaracterized others. PolitiFact, the Washington Post, and congressional oversight Democrats later pointed to classic AI-citation red flags, including fake paper titles, broken DOI links, and "oaicite" markers associated with OpenAI citation output. The White House called the problems formatting issues and updated the report. Public health policy apparently got the same bibliography QA as a panicked term paper, because history has a dark sense of humor.
MD Anderson shelved IBM Watson cancer advisor
MD Anderson Cancer Center's Oncology Expert Advisor project with IBM Watson burned through $62 million - $39 million to IBM, $23 million to PwC - over four years of contract extensions. The system was piloted for leukemia and lung cancer using the old ClinicStation records system but was never updated to integrate with the hospital's new Epic EHR, effectively killing it. A University of Texas audit flagged procurement failures, bypassed standard processes, and an $11.6 million deficit in donor gift funds spent before they were received. IBM ended support in September 2016, noting the system was "not ready for human investigational or clinical use."
Eating disorder helpline’s AI told people to lose weight
The National Eating Disorders Association replaced its human-staffed helpline with an AI chatbot called Tessa shortly after the helpline staff moved to unionize. Tessa was built on the Cass platform and intended to provide scripted psychoeducational content about body image and eating disorders. Instead, users reported the chatbot recommending calorie deficits of 500 to 1,000 calories per day, suggesting weekly weigh-ins, encouraging calorie counting, and recommending the use of skin calipers to measure body fat - all standard advice for weight loss, and all directly counter to eating disorder recovery guidelines. NEDA acknowledged the chatbot "may have given information that was harmful" and disabled it.
Koko tested AI counseling on users without clear consent
In January 2023, Koko co-founder Rob Morris revealed on Twitter that the mental health peer support platform had used GPT-3 to draft responses for approximately 4,000 users seeking emotional support. Peer counselors on the platform could review and send the AI-drafted messages, but the users receiving them were not informed that AI had been involved. Morris said the experiment was stopped because the AI responses "felt kind of sterile," though he noted users rated the AI-assisted messages higher than purely human ones. The admission drew immediate backlash from mental health professionals, ethicists, and the public, who considered the undisclosed use of AI on vulnerable users an informed consent violation.
Epic sepsis model missed patients and swamped staff
A June 2021 study in JAMA Internal Medicine by researchers at Michigan Medicine externally validated the Epic Sepsis Model - a proprietary prediction tool deployed across hundreds of U.S. hospitals - and found it missed two-thirds of actual sepsis cases while generating so many false alarms that clinicians would need to investigate 109 alerts to find one real patient. The model's AUC of 0.63 fell well short of the 0.76 to 0.83 range Epic had cited in internal documentation, and the study found the tool caught only 7 percent of sepsis cases that clinicians themselves had missed. Epic later overhauled the algorithm and began recommending hospitals train the model on their own patient data before clinical deployment.
Google DR AI stumbled in Thai clinics
Google Health built a deep learning system capable of detecting diabetic retinopathy from retinal scans with over 90 percent accuracy in controlled lab settings. When researchers deployed it in 11 clinics across Pathum Thani and Chiang Mai in Thailand between late 2018 and mid-2019, the system rejected 21 percent of the nearly 1,840 images nurses captured as too low-quality to process - mostly due to poor clinic lighting. Slow internet connections added further delays to uploads, and nurses found themselves screening only about 10 patients per two-hour session. A tool designed to speed up triage instead created bottlenecks, patient frustration, and unnecessary specialist referrals.
Babylon chatbot 'beats GPs' claim collapsed
Babylon unveiled its AI symptom checker at the Royal College of Physicians and bragged that it scored 81% on the MRCGP exam, but the claim could not be verified, and critics warned that no chatbot can replace human judgment. Independent clinicians who later dissected Babylon's marketing study in The Lancet told Undark that the tiny, non-peer-reviewed test offered no proof the tool outperforms doctors and that it might even be worse.