Deep-research agents can be steered by a 13-word poisoned Reddit snippet

What the paper tested

On May 22, 2026, Cornell Tech researchers Tingwei Zhang, Harold Triedman, and Vitaly Shmatikov published a preprint with an unpleasantly simple finding: deep-research agents are very good at finding the same writable pages over and over.

The paper tested a class of systems that retrieve web pages, generate follow-up queries, synthesize findings, and produce citation-backed reports. The open-source systems in the experiment were STORM, Co-STORM, and OmniThink. The researchers also ran reconnaissance against commercial deep-research products from OpenAI and Google, but they did not run a live attack against those systems, because poisoning real web content for a paper would be a great way to become the footnote everyone hates.

The weak point is retrieval overlap. A deep-research agent does not ask one question, fetch ten links, and stop. It breaks a topic into subquestions, sends many related searches, and stitches the results into one polished report. For many topics, that repeated search process keeps returning the same user-generated content pages from places like Reddit and Wikipedia. Those pages are useful because they are current, detailed, and full of human discussion. They are also useful to attackers because people can edit or append to them.

The researchers found that individual user-generated pages appeared in up to 48% of queries inside a topic cluster, and 17% to 23% of retrieved URLs came from user-generated platforms. That turns a single writable page into a lever. If the agent keeps retrieving the page, a short poisoned fragment on that page can keep walking into the report pipeline.
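To make the two overlap numbers concrete, here is a minimal sketch of how they could be computed from a log of search results. It is not the paper's code. The `UGC_HOSTS` set, the function names, and the toy query cluster are all assumptions made up for illustration.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical list of user-generated-content hosts; not taken from the paper.
UGC_HOSTS = {"reddit.com", "wikipedia.org", "quora.com"}


def is_ugc(url: str) -> bool:
    """True if the URL's host is, or is a subdomain of, a listed UGC host."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in UGC_HOSTS)


def url_overlap(results_by_query: dict[str, list[str]]) -> dict[str, float]:
    """For each URL, the fraction of queries in the cluster that retrieved it."""
    total = len(results_by_query)
    if total == 0:
        return {}
    counts = Counter()
    for urls in results_by_query.values():
        counts.update(set(urls))  # count a URL at most once per query
    return {url: n / total for url, n in counts.items()}


def ugc_share(results_by_query: dict[str, list[str]]) -> float:
    """Fraction of all retrieved URLs (repeats included) hosted on UGC platforms."""
    all_urls = [u for urls in results_by_query.values() for u in urls]
    if not all_urls:
        return 0.0
    return sum(is_ugc(u) for u in all_urls) / len(all_urls)


# Toy cluster: four related sub-queries an agent might issue for one topic.
thread = "https://www.reddit.com/r/laptops/t1"
cluster = {
    "best budget laptop": [thread, "https://example.com/a"],
    "budget laptop battery life": [thread, "https://example.org/b"],
    "cheap laptop for students": ["https://example.net/c", "https://en.wikipedia.org/wiki/Laptop"],
    "laptop under 500 reviews": [thread, "https://example.com/d"],
}

overlap = url_overlap(cluster)
print(overlap[thread])     # 0.75: one thread shows up in 3 of 4 queries
print(ugc_share(cluster))  # 0.5: 4 of the 8 retrieved URLs are UGC
```

In this toy data a single thread reaches three of four sub-queries, which is the lever the paper describes: one writable page, many entry points into the same report.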

The 13-word steering wheel

The paper's ethical simulation framework, GeoStorm, did not alter live Reddit or Wikipedia pages. Instead, it modeled what would happen if an attacker could place a short crafted snippet into content the agent already retrieves.

The result was not subtle. In the search-result snippet setting, a single poisoned URL containing about 13 words of attacker-chosen text produced conditional mention rates of 38% to 51%. Multi-URL targeting raised that to 42% to 62%. In the full-content setting, where the poisoned text was appended to a complete Reddit thread and made up less than 4% of the retrieved content, the conditional mention rate still landed between 30% and 53%.

The phrase "conditional mention rate" matters. It measures how often the attacker-chosen entity appears after the poisoned content is retrieved. The attack still depends on retrieval. The grim part is that deep-research agents make retrieval less random than ordinary search. Their query expansion and multi-agent loops can repeatedly reach the same source cluster, which makes the attack practical.
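The difference between a conditional rate and an end-to-end rate is easy to blur, so a small sketch may help. The `Run` record and the ten made-up runs below are assumptions for illustration, not data from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    """One report-generation run (hypothetical record, not the paper's schema)."""
    retrieved_poison: bool  # did the agent fetch the poisoned page?
    mentions_target: bool   # did the final report mention the attacker's entity?


def conditional_mention_rate(runs: list[Run]) -> Optional[float]:
    """Mention rate among only those runs that retrieved the poisoned content."""
    exposed = [r for r in runs if r.retrieved_poison]
    if not exposed:
        return None  # undefined when nothing was retrieved
    return sum(r.mentions_target for r in exposed) / len(exposed)


def end_to_end_rate(runs: list[Run]) -> Optional[float]:
    """Share of all runs that both retrieved the poison and mentioned the target."""
    if not runs:
        return None
    return sum(r.retrieved_poison and r.mentions_target for r in runs) / len(runs)


# Made-up example: 10 runs, 5 retrieve the poisoned page, 2 of those mention the target.
runs = (
    [Run(True, True)] * 2
    + [Run(True, False)] * 3
    + [Run(False, False)] * 5
)

print(conditional_mention_rate(runs))  # 0.4
print(end_to_end_rate(runs))           # 0.2
```

The gap between the two numbers is the retrieval step. The more reliably an agent's query expansion lands on the same page, the closer the end-to-end rate creeps toward the conditional one.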

404 Media's June 15 coverage placed the paper in the context of a mess already visible on Reddit, Wikipedia, Quora, and similar sites. Brands are trying to seed user-generated pages so AI tools will scrape, cite, and recommend them. The marketers have discovered a new acronym, AEO, for AI-engine optimization. Congratulations to everyone involved; we reinvented SEO spam with fewer blue links and more false authority.

Why citations make it worse

A poisoned AI report is more persuasive when it has citations. Users know naked chatbot paragraphs can hallucinate. A structured report with links, footnotes, and source references feels more like research and less like autocomplete in a blazer.

That presentation is exactly why user-generated poisoning is dangerous. The report can cite a real Reddit thread or Wikipedia page while promoting a claim or entity inserted by someone with a motive. The citation is not fake. The page exists. The user can click it. The problem is that the page is writable, the inserted text may be adversarial, and the agent has no reliable sense of whether that tiny fragment belongs in a neutral answer.

Traditional SEO already abuses this trust boundary. Low-quality pages chase ranking signals because a high search position turns into traffic. Deep-research agents change the prize. The attacker does not merely want a page to rank; they want the agent to incorporate the poisoned text into a generated answer and present the answer as synthesis.

That is a nastier failure mode than ordinary search spam. With search spam, the user still sees the suspicious page as a source. With agentic search, the spam can be laundered through the model's own prose. The citation trail exists, but the user may not inspect it because the report already did the thinking for them. Very efficient, if the goal is outsourcing judgment to a machine that just ate a poisoned comment.

The defenses were not clean

The researchers tested defenses at several points in the pipeline, including source blocking, input filtering, and output filtering. The paper says none mitigated the attack without degrading output quality.

That tracks with the product problem. Blocking all user-generated content would cut out a lot of useful, current information. Reddit threads, support forums, Stack Overflow-style discussions, and Wikipedia pages often answer exactly the practical questions users ask deep-research agents. But blindly trusting those pages turns volunteer-moderated public forums into part of the security boundary for commercial AI search products. That is not a sentence anyone should be comfortable reading twice.

Input filters have to detect short poisoned snippets without crushing legitimate discussion. Output filters have to recognize when an entity was inserted because of adversarial content rather than normal relevance. Both jobs are hard because the attack text can be tiny, ordinary-looking, and contextually plausible.

The commercial reconnaissance detail is also worth noting. The researchers found that Gemini Deep Research cited user-generated content at a rate of 12.1% for the topics they evaluated. That does not prove Gemini is exploitable in the same way, because the researchers did not run a live end-to-end attack against Google's server-side retrieval. It does show the same dependency pattern exists outside open-source lab systems.

Why this belongs here

This is not a one-company faceplant, but the site already tracks studies when they document a real pattern of AI failure. This one does. It shows how agentic search can convert ordinary writable web pages into attack surfaces, then wrap the result in citation-backed prose that looks more trustworthy than it deserves.

The harm path is straightforward. A user asks for advice about a product, medical topic, vendor, destination, legal question, or technical recommendation. The agent retrieves a poisoned user-generated page. The final report promotes attacker-chosen content. The user sees a sourced research summary, not a random comment somebody slipped into a thread.

Better products will need boring controls: provenance labels for user-generated material, trust tiers for writable sources, stronger separation between retrieved text and generated claims, alerts when a claim depends on a tiny fragment from a mutable page, and refusal to promote entities when the supporting evidence is thin. They will also need to accept that "has a citation" is not the same thing as "is safe to repeat."
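As a rough illustration of two of those controls, provenance labels and thin-fragment alerts, here is a minimal sketch. It is not a defense the paper evaluated, and the paper's own results suggest that simple filters cost output quality. The `Evidence` fields, the `needs_review` name, and the 5% threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """One source backing a claim in a report (hypothetical schema)."""
    url: str
    writable: bool         # provenance label: can the public edit or append to this page?
    supporting_words: int  # words on the page that actually support the claim
    page_words: int        # total words retrieved from the page


def is_thin(e: Evidence, max_fraction: float) -> bool:
    """True if the support is a small fragment of a publicly writable page."""
    if not e.writable:
        return False
    if e.page_words <= 0:
        return True  # no usable length information: treat as thin
    return e.supporting_words / e.page_words <= max_fraction


def needs_review(evidence: list[Evidence], max_fraction: float = 0.05) -> bool:
    """Flag a claim when every supporting source is a thin writable fragment."""
    if not evidence:
        return True  # a claim with no evidence should never pass silently
    return all(is_thin(e, max_fraction) for e in evidence)


# A 13-word fragment inside a 400-word thread is about 3% of the page.
thread_only = [Evidence("https://www.reddit.com/r/example/t1", True, 13, 400)]
corroborated = thread_only + [Evidence("https://example.com/review", False, 120, 900)]

print(needs_review(thread_only))   # True
print(needs_review(corroborated))  # False
```

A check like this only raises a flag; it does not decide whether the fragment is adversarial. It also assumes the pipeline can trace each claim back to the specific words that support it, which is the "stronger separation between retrieved text and generated claims" item on the list and probably the hardest one to build.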

The web is writable because that is what makes it useful. Deep-research agents turn that writable web into generated authority. The Cornell paper is a clean warning that the bridge between those two things is currently held together with hope, retrieval heuristics, and the unpaid labor of moderators who did not volunteer to secure the AI search industry.
