Researchers invented a fake disease and major chatbots promoted it anyway
Researchers created a fake eye condition called bixonimania, uploaded fake papers full of obvious tells, and then watched major chatbots treat it as a real diagnosis. By late April 2024, Copilot, Gemini, Perplexity, and ChatGPT were variously describing the condition, offering prevalence claims for it, or telling users when to seek medical care. The hoax later leaked into a real journal paper before that paper was retracted. The striking part is not that a chatbot answered one bad question badly. It is that academic-looking nonsense was enough to push a fictional disease into medical-sounding advice and then into the literature itself.
A Disease With Better Branding Than Evidence
Bixonimania sounds like something a tired clinician might say while skimming a packed inbox of eye complaints. It has the right shape. It sounds medical enough to pass in casual conversation. That was the point.
Researchers led by Almira Osmanovic Thunström at the University of Gothenburg invented the condition to test how easily bad medical information could pass through the systems now used to summarize, explain, and increasingly mediate health information. They seeded the web with fake material about a fabricated eye disorder tied to screen use. The papers and posts were not subtle about being fake: they included absurd affiliations, obviously invented funding, and other clues that should have told any careful reader the whole thing was nonsense.
The chatbots did not react like careful readers.
What the Researchers Actually Did
The project began in 2024, when the team posted fabricated material describing bixonimania as a real eye condition. This was not a polished deception campaign aimed at fooling peer reviewers through months of cloak-and-dagger work. It was closer to a controlled test of whether the modern information stack, especially large language models, could distinguish scholarly-looking fabrication from real biomedical evidence.
According to later reporting, the fake materials included details so implausible that they read like dare-level bait. Yet once the papers existed in the wild, they became machine-readable evidence. That is the important part. A large language model does not sit back, laugh at the Tolkien-style institutions, and move on. It sees tokens, patterns, and the shape of an academic claim. If the model has been trained or retrieval-augmented in a way that gives those patterns weight, the nonsense starts to look a lot like source material.
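To make that concrete, here is a deliberately naive retrieval sketch in Python. The document snippets, names, and scoring below are invented for illustration and do not reflect any vendor's actual pipeline; the point is only that a retriever ranking on token overlap has no slot for provenance, so a fabricated preprint that contains the coined term will surface first.

```python
# A deliberately naive retrieval sketch: rank documents purely by token
# overlap with the query. All document text here is invented for illustration;
# this is not any real system's pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    ("fabricated_preprint",
     "Bixonimania is an eye condition linked to prolonged screen exposure. "
     "We report prevalence estimates and clinical guidance."),
    ("genuine_review",
     "Digital eye strain describes dryness, fatigue, and blurred vision "
     "associated with extended screen time."),
]
query = "What is bixonimania and how common is it?"

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform([text for _, text in docs] + [query])
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()

# The fabricated document ranks first simply because it contains the coined
# term; nothing in this scoring step asks where the text came from.
for (name, _), score in sorted(zip(docs, scores), key=lambda p: -p[1]):
    print(f"{name}: {score:.2f}")
```

Real retrieval and ranking stacks are far more elaborate than this, but the missing ingredient is the same: similarity to the question is rewarded, credibility of the source is not measured at all.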
By mid-April 2024, Microsoft's Copilot was describing bixonimania as real. Google's Gemini was doing the same and advising users to consult an ophthalmologist. Later that month, Perplexity supplied prevalence numbers for the fake condition, and ChatGPT discussed whether users' symptoms sounded consistent with it. This was not one system going off script in isolation. It was a multi-platform failure mode.
Why This Is More Disturbing Than a Single Wrong Answer
Models get facts wrong all the time. That part is not new. What makes bixonimania useful is that the researchers built a fake condition from scratch, watched multiple systems absorb it, and then saw it echoed back with the tone and structure of legitimate medical guidance.
That reveals two problems at once.
First, chatbots can be startlingly vulnerable to scholarly cosplay. A document does not need to be true to acquire machine authority. It only has to look enough like a biomedical source to pass through the ingestion and ranking pipeline without friction.
Second, once the falsehood is inside the model's response patterns, it is no longer just a bad paper sitting in a corner of the internet. It becomes conversational misinformation. The system starts offering diagnosis-adjacent guidance, prevalence claims, and treatment-seeking suggestions. The falsehood becomes interactive.
If someone asks a search engine to define a bogus disease and gets a weird snippet page, that is a nuisance. If someone asks a chatbot about symptoms and receives calm, plausible advice about a disease that does not exist, the nuisance has matured into a health-information problem.
Then It Escaped Into the Literature
The most grimly predictable part came next. Reporting on the incident said the fake bixonimania material was eventually cited in a real journal paper before that paper was retracted. That is where the story stops being a neat hoax and turns into a broader warning about AI-contaminated research workflows.
Academic citations already contain plenty of ordinary human error. Authors misspell names, mangle titles, and point to the wrong DOI. Bixonimania was different. The problem was not a messy bibliography. The problem was that fictitious material, once given enough academic-looking packaging, proved capable of moving from fake source material to chatbot answers to published research.
That chain matters because it shows how different failure modes can compound. A fake paper becomes training or retrieval fodder. A chatbot repeats it. Another author, student, or researcher uses AI or a sloppy search workflow and sees what looks like corroboration. The fake source then gets cited in a real manuscript. At that point the machine is no longer merely hallucinating. Humans have joined the loop and started institutionalizing the error.
Medical Advice by Pattern Matching
One reason health misinformation is especially dangerous in chatbot form is that the systems are good at sounding proportionate. They do not usually scream. They reassure. They phrase guidance in the register of a cautious explainer. They tell users to monitor symptoms, consider other possible causes, or seek specialist help. That surface reasonableness is what makes a fake condition like bixonimania risky once the model starts treating it as real.
A user does not need to be totally gullible to get nudged in the wrong direction. They only need to believe the model is summarizing something that exists in the literature. If the advice is framed as low-drama medical information, many people will not suspect that the diagnosis itself was fabricated by researchers as an experiment.
This is why the usual defense of "chatbots are only tools" starts to wear thin in health contexts. A tool that calmly invents conditions or validates fake ones is not behaving like a neutral index. It is actively manufacturing credibility.
Why the Academic Wrapper Worked
The fake-disease experiment also says something awkward about modern trust. A random blog post claiming a new disease exists would not travel very far in serious medical circles. Wrap the same claim in preprint formatting, references, and researcher-style presentation, and machines start treating it as evidence.
That does not mean preprint culture is the problem. Preprints are useful, often necessary, and not inherently less trustworthy than journal articles. The problem is that generative systems and users around them are often bad at distinguishing between "machine-readable document" and "verified scientific finding." Academic form is being mistaken for academic reliability.
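A toy heuristic makes that confusion concrete. The signals below are invented for this sketch and are not features of any real search or chatbot system; the point is that a fabricated preprint can score perfectly on "looks scholarly" checks that never touch truth.

```python
# Toy illustration of mistaking academic form for academic reliability.
# The checks and the sample text are invented for this sketch; a fabricated
# preprint can satisfy every one of them while being pure fiction.
import re

def looks_scholarly(text: str) -> int:
    """Count surface signals of scholarliness with no truth checking."""
    signals = [
        r"doi\.org/10\.\d{4,9}/",     # something shaped like a DOI
        r"\bet al\.",                  # citation-style author references
        r"\bReferences\b",             # a reference-section heading
        r"\bDepartment of \w+",        # an affiliation-shaped phrase
        r"\bp\s*[<=]\s*0\.\d+",        # a p-value-shaped statistic
    ]
    return sum(bool(re.search(pattern, text)) for pattern in signals)

fake_preprint = (
    "Bixonimania: a screen-use disorder. Smith et al. report p < 0.05. "
    "Department of Ophthalmic Studies. References: doi.org/10.1234/fake"
)
print(looks_scholarly(fake_preprint))  # scores 5 of 5 despite being fiction
```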
That confusion has downstream consequences beyond medicine. Once models learn that source-shaped nonsense is good enough, every field that relies on searchable expertise becomes easier to poison. Bixonimania happened to be about eyes and screens. The same pattern could be used for nutrition, mental health, rare diseases, or consumer drug advice.
The Human Part of the Failure
It would be comforting to blame the whole thing on the bots. The bots earned plenty of blame. But this story also reflects a human appetite for outsourcing judgment. People increasingly use chatbots as triage tools, explainers, and first stops for health information. Researchers increasingly use AI systems while searching, outlining, summarizing, and drafting. Editors and reviewers are working under time pressure that makes deep source checking less routine than everyone claims it is.
Bixonimania moved because the systems were willing to repeat it and because the surrounding human workflows were willing to trust outputs that looked sourced enough. That combination is what gives a fake disease a path from prank to public-health nuisance.
What This One Shows Better Than Most
Plenty of AI health stories depend on edge cases or deliberately adversarial prompting. Bixonimania is more useful than that. The researchers did not need elaborate jailbreaks. They gave the internet a fake disease wearing an academic nametag and watched the stack salute.
That should be enough to unsettle any institution telling patients, clinicians, or students to treat chatbot health explanations as a harmless convenience layer. Once a fictional diagnosis can be promoted by multiple leading systems and then cited in a real paper, the issue is no longer limited to chatbot embarrassment. It is a quality-control problem spread across models, search habits, and scholarly publishing.