A hidden line in a PDF could make Notion's AI agent leak your client list

Giving the document a vote

Notion is where a lot of companies keep the soft tissue of the business: client lists, deal notes, internal strategy, the spreadsheet nobody is supposed to email around. In September 2025 Notion shipped version 3.0, putting autonomous AI agents into that workspace - agents that can read your pages, act on your behalf, and reach out to the web. In Notion's telling, the agent runs on Claude Sonnet 4 and comes with a built-in web-search tool. The pitch is an assistant that can take a task and run with it.

Within roughly a day of that launch, researchers at CodeIntegrity showed the obvious problem with handing an autonomous agent the keys to a workspace full of secrets: the documents that workspace contains do not have to be trustworthy, and the agent reads them anyway. Their finding was quickly amplified by Simon Willison, who coined the framing this incident is now filed under, and by the team at PromptArmor.

The lethal trifecta

Willison's "lethal trifecta" names the three ingredients that, combined in one AI agent, reliably produce a data-theft machine:

Access to private data. The Notion agent can read your workspace - client lists, ARR figures, internal notes.
Exposure to untrusted content. The agent ingests documents, including files uploaded by or shared from outside.
A way to communicate externally. The agent has a web-search tool that fetches URLs.

Each is individually reasonable. An assistant that could not read your data would be useless; one that could not read documents would be pointless; one that could not reach the web would be inert. Put all three in the same agent with no hard wall between them and you have built the exact conditions under which an attacker who controls the untrusted content can read the private data and ship it out the external channel. The agent does the work, faithfully, on the attacker's behalf.

How the PDF attack worked

CodeIntegrity's proof of concept is almost insultingly simple, which is what makes it instructive. They built a PDF that looks, to a human, like ordinary customer feedback. Hidden inside it is text rendered white-on-white - invisible to the eye, fully legible to the agent that parses the file. That concealed text is a prompt.

When the Notion agent processes the document, it reads the hidden instructions as instructions, because it cannot distinguish "the content of this file" from "commands to follow." The injected prompt tells the agent to do three things in sequence:

Find the confidential data. Read the client list and pull each customer's name, company, and annual recurring revenue (ARR) - the revenue a customer is expected to generate per year, one of the more sensitive numbers a business keeps.
Package it. Concatenate the harvested records into a single string.
Send it out. Issue a web-search request through the agent's functions.search tool to an attacker-controlled domain, with the stolen data stitched into the URL.

That third step is the clever, ugly part. The web-search tool exists to fetch information from external sites. But fetching a URL is, mechanically, also a way to transmit a URL - and anything you can encode into a URL goes along for the ride. So the agent does not need a forbidden "send data to attacker" capability. It just needs to be asked to look something up at an address that happens to contain your client list. The data leaks through a feature designed to bring information in, run in reverse.

The researchers were also clear that the PDF is just one delivery route. Notion 3.0 agents can connect to third-party services - the kind of integrations that pull in content from elsewhere - and any of those is a potential channel for the same untrusted-content injection. The PDF is the demo, not the boundary.

Why this class of bug is so stubborn

Bruce Schneier, who also flagged the research, put the underlying problem bluntly: the model cannot differentiate between authorized commands and untrusted data. That is not a Notion-specific bug. It is the defining limitation of current language models, and it is why prompt injection has proven so resistant to a clean fix. You can train classifiers to catch known injection patterns, and you should, but the attacker gets to keep rewording the payload, while the model keeps treating instructions and data as one stream.

What Notion 3.0 added was not a new vulnerability in the model; it was autonomy and tools wrapped around it. A chatbot that hallucinates a bad answer wastes your time. An agent that can read your private pages and call out to the web can be talked into theft. The capability that makes the agent valuable - acting, with access, across your data - is precisely the capability an attacker wants to borrow. Notion reports on the order of 100 million users, which is a large surface for a failure mode that needs nothing more than a booby-trapped document landing in a workspace.

The response, and the honest caveats

This was responsible disclosure of a demonstrated technique, not a reported breach. Following the research, Notion said it had remediated the issue in production and upgraded its prompt-injection detection so that it now catches a broader range of injection patterns, including those hidden in file attachments. Notion has also published guidance on how it defends against prompt injection.

Two caveats keep this honest. First, there is no public evidence that any real customer's data was stolen this way; the harm here is demonstrated exposure, not confirmed exploitation. Second - and this is the recurring catch with prompt-injection defenses - "we upgraded our detection to catch a broader range of patterns" is improvement, not closure. Detection raises the cost of the attack; it does not resolve the structural fact that the agent still reads untrusted content, still has access to private data, and still has a tool that can reach outside. The trifecta is mitigated, not dismantled.

Graveyard lesson

The Notion 3.0 finding belongs here because it is the cleanest possible illustration of the lethal trifecta in a mainstream product: a hidden line of white text in a PDF, an agent with access and a web tool, and a client list walking out the door inside a search query.

The defensive lessons are the unglamorous, structural ones. Treat every uploaded document and every connected integration as untrusted content that must never be allowed to act as instructions to an agent. Do not give an agent that can read private data an unrestricted channel to the open web in the same breath - constrain which destinations a tool can reach, and treat outbound requests that carry workspace data as the exfiltration vector they can be. And require real friction, not a silent tool call, before an agent ships internal data anywhere. An autonomous assistant is only as trustworthy as the least trustworthy document it will read, and right now it will read all of them.

Vibe Graveyard

A hidden line in a PDF could make Notion's AI agent leak your client list

Incident Details

Tech Stack

References