Google’s AI Overviews says to eat rocks
Within days of Google launching AI Overviews to all US search users in May 2024, the feature produced a series of confidently wrong answers that went viral. It told users to add non-toxic glue to pizza to make cheese stick better (sourced from an 11-year-old Reddit joke), claimed that geologists recommend eating one rock per day for vitamins, and asserted that Barack Obama was Muslim. Google's head of search, Liz Reid, acknowledged the errors in a blog post, calling some results "odd, inaccurate or unhelpful," and the company made changes, including limiting AI Overviews for health-related and sensitive queries.
On May 14, 2024, at its annual I/O developer conference, Google announced it was rolling out AI Overviews to all US search users. The feature places an AI-generated summary at the top of search results, above the traditional list of links. Instead of scanning through websites to find an answer, users would see a synthesized response generated by Google's AI, drawing from multiple sources across the web. Google positioned it as the biggest change to search in years - its competitive answer to Microsoft's Copilot and OpenAI's ChatGPT.
Within days, the feature became the most widely mocked AI product launch in recent memory.
The greatest hits
Users began sharing screenshots of AI Overviews responses that ranged from wrong to dangerous to genuinely bizarre. The examples spread across social media and were covered by the BBC, the New York Times, Wired, The Verge, CNET, Popular Science, and essentially every technology publication in existence.
The most famous example: when users searched for how to get cheese to stick to pizza better, AI Overviews told them to add "non-toxic glue" to the sauce. The source was traced back to a comment on a Reddit thread from 11 years earlier where a user had jokingly suggested adding Elmer's glue. The AI treated this as a sincere culinary recommendation and presented it as authoritative advice at the top of Google Search.
Another AI Overview stated that geologists recommend humans eat at least one small rock per day for the minerals and vitamins they contain. The sourcing for this appears to have been a satirical article from The Onion, which the AI system treated as factual information. Google's AI had looked at a comedy publication, decided its content was a legitimate scientific recommendation, and presented it to users searching for nutrition information.
Other documented errors included AI Overviews telling users that Barack Obama was Muslim, providing incorrect historical dates, giving dangerous medical advice, and producing summaries that directly contradicted the sources they cited. An X account called "Goog Enough" was created specifically to collect and share examples of bad AI Overviews, and it quickly amassed a sizable catalog of them.
CNET documented particularly concerning examples in which AI Overviews provided answers that "seemed to come from a different reality." Nor were the errors limited to obscure queries: some of the most viral examples came from common, everyday searches - the kind Google handles billions of times per day.
Google's response
Google initially defended the feature through a spokesperson, who told The Verge and other outlets that examples like the pizza glue suggestion only appeared for "generally very uncommon queries, and aren't representative of most people's experiences." Popular Science noted the absurdity of this defense: Googling pizza recipe tips is not an uncommon query.
On May 30, 2024, Liz Reid, Google's head of search, published a blog post acknowledging the errors directly. She described some results as "odd, inaccurate or unhelpful AI Overviews" and outlined changes Google was making. Reid confirmed that the glue-on-pizza answer was real and that the AI had sourced it from forum content. She acknowledged that the system had difficulty distinguishing between sincere advice and satirical or sarcastic posts.
Google made several technical adjustments: limiting AI Overviews for health-related and sensitive queries, reducing the feature's tendency to surface forum content as authoritative, and adding better detection for satirical sources. The company also noted that some of the screenshots circulating online were fabricated - people creating fake AI Overview screenshots for engagement - but acknowledged that many of the documented errors were genuine.
Why it happened
AI Overviews uses a retrieval-augmented generation (RAG) approach: the system retrieves relevant web content and then uses a language model to synthesize it into a summary. The quality of the output depends on both the quality of the retrieved sources and the model's ability to correctly interpret and synthesize them.
The pizza glue example illustrates the source quality problem. Google's index includes Reddit, forums, satire sites, joke posts, and the full spectrum of internet content. A search engine's job is to help users find information; it is agnostic about whether that information is true. When a human reads Google search results, they can evaluate the credibility of each source. When an AI model reads the same results and synthesizes them into a single answer, it does not have the same judgment. An 11-year-old Reddit joke carries the same weight as a professional cooking site if the retrieval system rates them similarly for relevance.
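To make that failure mode concrete, here is a minimal sketch of the pattern described above: toy documents ranked purely by word-overlap relevance, then handed to a synthesis step. Everything here - the corpus, the URLs, the scoring function, and the stubbed `synthesize` - is an invented illustration, not Google's actual pipeline. The point is that nothing in the loop ever asks whether a source is serious.

```python
# Toy RAG pipeline (illustrative only - not Google's implementation).
# Documents are ranked by relevance alone; nothing distinguishes a
# Reddit joke from a professional cooking site.

from dataclasses import dataclass

@dataclass
class Document:
    url: str
    text: str

CORPUS = [
    Document("reddit.example/r/pizza",
             "add some non-toxic glue to the sauce so the cheese will stick"),
    Document("cooking.example/pizza-tips",
             "use low-moisture mozzarella so the cheese will stick and melt evenly"),
]

def relevance(query: str, doc: Document) -> float:
    """Toy relevance: fraction of query words that appear in the document."""
    q = set(query.lower().split())
    return len(q & set(doc.text.lower().split())) / len(q)

def retrieve(query: str, k: int = 2) -> list[Document]:
    return sorted(CORPUS, key=lambda d: relevance(query, d), reverse=True)[:k]

def synthesize(query: str, docs: list[Document]) -> str:
    """Stand-in for the LLM call: a real system would put the retrieved
    text into a prompt and have the model write one fluent answer."""
    context = "\n".join(f"- {d.text} ({d.url})" for d in docs)
    return f"Answer for {query!r}, synthesized from:\n{context}"

# The joke and the sincere tip score almost identically on relevance,
# so both flow into the summary with equal authority.
print(synthesize("how to make the cheese stick to pizza"))
```

In this toy scorer, the glue joke and the genuine cooking tip land within one query word of each other, which is exactly the equal-weight problem: relevance is the only axis the pipeline measures.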
The rocks example illustrates the satire problem. The Onion publishes content that is structurally identical to real news articles - same format, same tone, same level of detail - but the content is invented. For a human reader, The Onion's reputation as a satire publication is obvious context. For a language model, the text itself looks indistinguishable from a real news article. Without a reliable way to flag satirical sources, the model will treat The Onion the same as any other publication.
These are not edge cases that arose from unusual interactions with the system. They are predictable consequences of the architecture. Any system that synthesizes arbitrary web content into authoritative-sounding answers without reliably evaluating the credibility and intent of its sources will eventually present jokes as facts, satire as science, and forum trolling as professional advice.
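The mitigations Reid described - down-weighting forum content, detecting satirical sources, restricting sensitive queries - amount to inserting a credibility gate between retrieval and synthesis. Here is one hedged sketch of that idea; the domain priors, the satire list, and the threshold are invented placeholders, not Google's actual signals.

```python
# Sketch of a credibility gate between retrieval and synthesis.
# All scores, domains, and thresholds below are made-up placeholders.

SATIRE_DOMAINS = {"theonion.com"}   # known satire publishers

SOURCE_PRIOR = {                    # hypothetical prior trust, 0..1
    "reddit.example": 0.2,          # user-generated: jokes and trolling mixed in
    "health.example": 0.9,          # hypothetical vetted publisher
}

def credibility(url: str) -> float:
    domain = url.split("/")[0]
    if domain in SATIRE_DOMAINS:
        return 0.0                  # never synthesize satire as fact
    return SOURCE_PRIOR.get(domain, 0.5)

def gate(results: list[tuple[str, float]], threshold: float = 0.4):
    """results holds (url, relevance) pairs from retrieval. Keep sources
    whose relevance x credibility clears the bar; if none survive,
    return None so the page falls back to ordinary links - no AI answer."""
    kept = [(url, r) for url, r in results
            if r * credibility(url) >= threshold]
    return kept or None

# The glue joke is highly relevant but not credible, so it is dropped:
print(gate([("reddit.example/r/pizza", 0.9), ("health.example/cheese", 0.7)]))
# -> [('health.example/cheese', 0.7)]
```

The fallback to ordinary links is the important design choice: when the system cannot vouch for its sources, showing nothing is safer than synthesizing an answer in Google's voice.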
The search trust problem
Google Search has a unique relationship with trust. For over two decades, it has been the default starting point for finding information online. Search results are links to other sites - users click through, evaluate the source, and form their own judgments about credibility. Google's role is curation, not authorship.
AI Overviews changes that relationship. The AI-generated summary at the top of the page looks like Google's own answer. It is presented in Google's voice, on Google's page, with Google's branding. Users who have spent their lives trusting Google Search results now see a box that appears to be Google telling them to eat rocks. The implicit authority of the Google brand is transferred to an AI-generated summary that has no understanding of what it is saying.
This is different from a chatbot hallucinating during a conversation. ChatGPT users understand they are talking to an AI and (theoretically) evaluate responses accordingly. Google Search users are looking for reliable information and expect the results page to help them find it. Placing an AI-generated answer at the most prominent position on the results page - above the organic links, above the ads, above everything - signals that this is the answer. And sometimes the answer was "eat glue."
The competitive pressure
The timing of the rollout was driven by competition. Microsoft had integrated Copilot into Bing. OpenAI's ChatGPT was handling millions of search-like queries. Google, which had built its dominance on being the best at organizing information, was watching users go elsewhere for answers. The pressure to ship an AI-powered search experience was intense.
Google had already stumbled once. In February 2023, when it announced Bard (its ChatGPT competitor), the demo included a factual error about the James Webb Space Telescope; the mistake was spotted immediately, and Alphabet's market capitalization dropped by roughly $100 billion in the aftermath.
AI Overviews was Google's second attempt at showing it could compete in AI. The launch to all US users - not a limited beta, not an opt-in experiment, but a default feature for everyone - reflected the urgency of the competitive moment. The errors that followed suggested the feature was not ready for that scale.
Aftermath
Google did not remove AI Overviews. It adjusted the feature, tightened its sourcing, and reduced its appearance for categories of queries where errors were most dangerous. There was no setting to turn AI Overviews off entirely; users who wanted to avoid the feature had to rely on workarounds, such as filtering results to the "Web" tab.
The incident entered the cultural lexicon. "Google told me to eat rocks" became shorthand for the gap between AI hype and AI reliability. For a company that had spent decades positioning itself as the most trustworthy way to find information, the reputational cost was real. The errors were absurd enough to be funny, which made them memorable, shareable, and impossible to spin away. No amount of corporate messaging about "odd, inaccurate or unhelpful" results could compete with a screenshot telling someone to eat a rock a day for good health.