Study finds AI chatbots no better than search engines for medical advice
Feb 2026
A randomized controlled trial published in Nature Medicine with 1,298 UK participants found that those using AI chatbots (GPT-4o, Llama 3, Command R+) performed no better than the control group at assessing clinical urgency and worse at identifying relevant medical conditions. In one case, two users describing identical subarachnoid hemorrhage symptoms received opposite recommendations: one was told to lie down in a dark room, while the other was correctly advised to seek emergency care.
Incident Details
Perpetrator: AI assistant
Severity: Facepalm
Blast Radius: General public using AI chatbots for medical guidance; study demonstrates benchmark performance does not predict real-world clinical utility
Tech Stack
GPT-4o, Llama 3, Cohere Command R+