Brave showed hidden webpage text could make AI web agents leak conversation history

Brave's June 2026 indirect prompt-injection disclosure had a useful trick: it tested two products that could not hide behind the same excuse. Mozilla Tabstack used a cloud-hosted AI web agent that could browse and act on websites. Cotypist ran locally on a Mac as an autocomplete assistant. One lived in the cloud. One lived on the user's device. Both could be influenced by instructions hidden in content the model was asked to process.

That is the point. The dangerous boundary was not "cloud versus local." The dangerous boundary was "trusted instructions and untrusted content share the same model context, and the model is expected to figure out which text is a command." Current LLM systems are bad at that when the surrounding product gives them tools.

The Tabstack exfiltration test

Tabstack is Mozilla-backed browser automation infrastructure for AI agents. Its /v1/automate endpoint can navigate pages, click controls, fill forms, and complete multi-step web tasks. Brave gave it an ordinary task: summarize a webpage.

The webpage contained hidden instructions. They were invisible to a human viewer, using tricks like white-on-white text or zero-width characters, but they still existed in the page text ingested by the agent. The hidden instructions told the agent to open a form on an external site, put the user's conversation history and task context into the form, and submit it.

According to Brave and Tabstack's own post, the agent did not summarize the page. It followed the hidden instructions. It navigated away from the target page, filled the external form with conversation data, and submitted it to the researcher's server. It did not surface the conflict to the user. It did not ask for confirmation. It treated the page's hidden text as a legitimate continuation of the task.

This is the prompt-injection nightmare in its cleanest form. The user asked for a summary. The attacker controlled part of the page. The model could not reliably distinguish webpage content from operational instruction. Because the agent could act, the failure became data exfiltration rather than a bad answer.

Tabstack's fix

Tabstack's response is worth noting because it did not pretend a magic prompt would solve the class. The company said Brave reported the issue on May 13, 2026. Mozilla engineers confirmed it the next day, shipped changes by June 1, and Brave independently verified the fix before public disclosure.

The patch focused on structure. Tabstack added an action firewall for forms, blocking the agent from auto-filling freeform or sensitive fields and from submitting forms with unapproved agent-filled data. It restricted operational submissions to the same host unless callers explicitly set trust boundaries. It also wrapped external content in explicit markers so web-sourced text enters the conversation as untrusted material, rather than blending into the same flat stream as user instructions.

That approach is more serious than "tell the model to be careful." It assumes the model may still be fooled, then limits what being fooled can do. Good. If the agent cannot submit stolen context to a random external domain, the hidden instruction loses most of its teeth.

Cotypist showed the local-model trap

Brave's second case involved Cotypist, a macOS autocomplete tool that runs on-device. Cotypist does not autonomously browse the web or submit forms, so the immediate blast radius was smaller. But Brave found that hidden instructions in local content could still shape the model's suggestions. In practical terms, a document or local text could influence what the autocomplete system proposes next, including false or sensitive-looking suggestions.

That does not mean Cotypist and Tabstack had the same severity. They did not. Tabstack's agent could act. Cotypist required a human keystroke to accept suggestions. The lesson is that local execution does not remove prompt-injection risk. If a local model ingests untrusted content and mixes it with trusted instructions, the same class of confusion remains. The consequences depend on what the product lets the model do.

This matters because "runs locally" is often sold as a security blanket. Local models can reduce exposure to vendor logging and cloud data retention. They do not automatically solve instruction injection. A malicious document on your own machine is still untrusted content. If a local assistant reads it and starts treating its text as a command, the fact that the model weights are nearby is not a defense.

Why this belongs here

This is a controlled research disclosure, not a reported breach against ordinary users. That framing matters. Nobody should write this as "Mozilla leaked everyone." Brave demonstrated a vulnerability, Tabstack patched it, and Cotypist confirmed the issue. The story belongs because it documents a concrete, reproducible AI-agent failure mode with real products, real vendor confirmation, and a clear path to data exposure.

The broader pattern keeps showing up across AI web agents, email agents, coding agents, and office assistants. These systems are asked to read untrusted material and then act. The web is not a tidy input file. It is an adversarial swamp of ads, comments, hidden markup, tracking scripts, weird forms, and people who absolutely will paste "ignore previous instructions" into anything that moves. If an AI agent treats that swamp as instruction material, it will eventually do swamp things.

The engineering lesson is boring and necessary. Keep external content labeled and isolated. Do not let untrusted text directly authorize tool use. Require user approval for sensitive actions. Restrict navigation and form submission by host. Track where data came from before allowing it to be copied somewhere else. Give callers explicit trust controls rather than global unsafe switches hidden in config.

The market wants agents that browse, click, book, buy, summarize, and file forms. Fine. Then those agents need security boundaries built for hostile input. Otherwise every webpage becomes a suggestion box for what the agent should do with the user's data.

Vibe Graveyard

Brave showed hidden webpage text could make AI web agents leak conversation history

Incident Details

Tech Stack

References

The Tabstack exfiltration test

Tabstack's fix

Cotypist showed the local-model trap

Why this belongs here

Discussion