AgentFlayer made ChatGPT Connectors leak secrets from a single poisoned file

Tombstone icon

At Black Hat USA 2025, Zenity Labs researcher Tamir Ishay Sharbat demonstrated AgentFlayer, a zero-click exploit against ChatGPT Connectors. A document carrying invisible text - a prompt written in roughly 300 words of one-pixel white font - is shared with a target. Even a harmless "summarize this" makes ChatGPT follow the buried instructions, search the user's connected Google Drive or SharePoint for API keys and secrets, and exfiltrate them through image-render URLs pointed at attacker-controlled Azure Blob storage, slipping past OpenAI's url_safe check. AgentFlayer was part of a broader set of zero click attacks Zenity disclosed on August 6, 2025, that also hit Copilot Studio, Salesforce Einstein, Cursor with Jira, and Gemini. The work is a proof of concept; extraction was proven via the researchers' own logs, with no confirmed customer harm.

Incident Details

Severity:Facepalm
Company:OpenAI
Perpetrator:AI productivity assistant
Incident Date:
Blast Radius:ChatGPT users with connected Google Drive or SharePoint exposed to silent theft of API keys and secrets from a single shared document

The document that gives orders

ChatGPT Connectors let you wire the assistant into the services where your real work lives - Google Drive, SharePoint, GitHub, and similar stores. The pitch is obvious and appealing: ask ChatGPT a question and it can pull in your actual files for context instead of making you paste everything by hand. The problem, as with most of these stories, is that giving an assistant broad read access to your data also gives anyone who can talk to the assistant a path to that data.

At Black Hat USA 2025, on August 6, 2025, Zenity Labs researcher Tamir Ishay Sharbat demonstrated exactly that path. He called the technique AgentFlayer, and the ChatGPT Connectors version is a zero-click attack - meaning the victim does not have to click a malicious link, approve anything, or even knowingly do something risky. They just have to be normal.

The setup is a poisoned document. The attacker writes a prompt - the demonstration used roughly 300 words of instructions - and hides it inside an otherwise innocent-looking file using invisible formatting, such as white text at one-pixel font size on a white background. A human opening the file sees a blank or unremarkable document. The attacker shares this file with the target, who uploads it to ChatGPT and asks the most boring question imaginable: summarize this for me.

That is the entire user action. There is nothing reckless about asking an AI to summarize a document; it is the headline use case. But when ChatGPT reads the file to summarize it, it also reads the hidden text, and the hidden text is not content. It is instructions. This is indirect prompt injection again: the model cannot reliably separate "the document I should describe" from "commands embedded in the document," because in its context window they are the same kind of token soup.

From "summarize this" to "exfiltrate my keys"

The buried prompt tells ChatGPT to stop summarizing and start hunting. In Sharbat's demonstration it instructed the assistant to search the victim's connected Google Drive for documents or files containing API keys, then package what it found and ship it out. SharePoint and other connected stores are exposed the same way. The user asked a harmless question; the AI, following the hidden orders, went rooting through their cloud storage for secrets.

Getting the loot out is the interesting engineering problem, and it is the part OpenAI had already tried to defend. ChatGPT can render images, which means it can be told to load an image from a URL - and a URL is a network request, which is a way to smuggle data out by encoding it into the address. OpenAI knew this and built a mitigation called url_safe, a client-side check that vets image URLs before rendering them and is meant to block requests to sketchy attacker domains.

Sharbat bypassed it by choosing a destination OpenAI considers trustworthy: Microsoft Azure Blob Storage. ChatGPT is comfortable rendering images hosted on Azure Blob, so the url_safe check waves them through. The trick is that you can connect an Azure Blob account to Azure's Log Analytics, which records every request made to your blobs - including all the parameters tacked onto the URL. So the attacker points the exfiltration at their own Azure Blob storage, encodes the stolen secrets into the image-request URL, and then simply reads them back out of their Azure logs. The data leaves through a channel OpenAI's own filter approved.

A whole family of these

AgentFlayer was not a single bug. Zenity disclosed it as a set of zero-click and one-click exploit chains hitting a roster of mainstream enterprise AI agents. Alongside ChatGPT Connectors, the research covered Microsoft Copilot Studio (custom agents that route customer emails could be steered into leaking internal configuration and CRM data), Salesforce Einstein, Cursor paired with a Jira MCP integration (prompt injection through Jira tickets synced from support systems, used to extract repository secrets and access tokens), and Google Gemini. The common thread is the same architectural mistake repeated across vendors: an agent with privileged access to internal data, ingesting untrusted external content, with some channel it can use to send data back out.

The vendor responses were not uniform, and that detail is telling. Zenity's release noted that some vendors, including OpenAI and Microsoft for Copilot Studio, issued patches after responsible disclosure. Others declined, characterizing the behavior as intended functionality. When half the industry patches a thing and the other half calls it a feature, you are looking at a disagreement about what these systems are even supposed to do, not just a list of bugs.

Proof of concept, not a known breach

This is demonstrated, working research, not a catalog of victims. The extraction was proven - Sharbat could see the stolen data arriving in his own attacker-controlled logs - which makes it more than theoretical. But there is no public evidence that AgentFlayer was used against real ChatGPT customers before OpenAI's mitigation. Call it exposure proven, customer harm unconfirmed. The point of a near-miss is that the mechanism worked end to end and nobody got hurt because a researcher got there first.

Why classifiers will not save this

The recurring conclusion across the AgentFlayer coverage is worth internalizing because it kills the most popular proposed fix. People reach for the idea of a classifier - a filter that detects malicious prompts and blocks them. But prompt injection is written in natural language, and natural language has effectively infinite ways to express the same intent. You cannot blacklist your way out of "search my Drive for API keys" when the same instruction can be rephrased a thousand ways and hidden in invisible text. As the researchers put it, blocking these attacks with classifiers or blacklisting is not enough.

The structural fix has little to do with detecting bad prompts. It is about not granting the dangerous combination in the first place: an agent that reads untrusted content should not also hold broad access to your secrets and an unconstrained way to send data outside the system.

The graveyard lesson

AgentFlayer belongs here because it is the cleanest possible demonstration of the zero-click agent failure. The victim did nothing wrong. They uploaded a file and asked for a summary, which is the literal advertised purpose of the product. Everything else - the data hunting, the secret theft, the exfiltration through an approved cloud domain - was done by the helpful assistant, following instructions written in ink the human could not see. The connector that makes ChatGPT useful is the same connector that makes it dangerous, and you do not get to have one without carefully containing the other.

Discussion