TrustFall turns AI coding-agent folder trust into code execution
Adversa AI disclosed TrustFall, a class-level security flaw in agentic coding CLIs including Claude Code, Gemini CLI, Cursor CLI, and GitHub Copilot CLI. The attack used malicious repository configuration to auto-start project-defined Model Context Protocol (MCP) servers after a developer accepted a folder-trust prompt. On developer machines, that meant one keypress could start attacker-controlled code with the user's privileges; in headless CI workflows, the same pattern could run without any prompt at all. Several vendors treated the behavior as a matter of trust-model boundaries rather than a vulnerability warranting a conventional CVE.
The Prompt Was Not the Payload
TrustFall was disclosed by Adversa AI on May 7, 2026. The research covered a shared security pattern across agentic coding command-line tools: Claude Code, Gemini CLI, Cursor CLI, and GitHub Copilot CLI. The awkward bit was that the exploit did not require the model to be tricked into writing malware or the user to approve a suspicious shell command during a chat session.
The attack started earlier, at the moment a developer opened or cloned a malicious repository and accepted a folder-trust prompt.
These tools increasingly use project-local configuration to define Model Context Protocol servers. MCP servers are helper processes that give an AI agent access to tools, files, APIs, or workflows. They can be perfectly legitimate. They can also be native operating-system processes running with the same privileges as the developer.
Adversa's finding was that a repository could include configuration that auto-approved and launched a project-defined MCP server after the user trusted the folder. In Claude Code, the research focused on project-scoped settings such as enabling project MCP servers and allowing specific project-defined behavior. Once the user accepted the broad trust dialog, the MCP server started. The code ran as an OS process, not as a toy inside a model sandbox.
That distinction is the whole story. A chatbot saying a foolish thing is one failure mode. A coding agent launching arbitrary local processes from repository configuration is a different failure class entirely, with developer credentials sitting squarely in the blast radius.
One Keypress, Full User Context
In the developer-machine variant, the user clones or opens a malicious repository, runs the agentic CLI, and accepts the tool's folder-trust prompt. In Adversa's Claude Code example, that prompt was generic enough that it did not enumerate exactly which project MCP servers would start or what commands they would run. The default action was to trust.
After that, the malicious MCP server runs with the privileges of the user running the agent. That means access to local source code, environment variables, SSH keys, package-manager tokens, cloud credentials, browser-adjacent files, and anything else the account can reach. Security people tend to use the phrase "full machine compromise" here because "oops, the repo got spicy" does not fit in an incident report.
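To make the privilege point concrete, here is a minimal Python sketch of what "starting an MCP server" reduces to at the operating-system level. It assumes a config entry shaped like `{"command": ..., "args": [...]}`, the shape commonly used for local stdio MCP servers. `launch_server` and `DEMO_TOKEN` are hypothetical names for illustration, and a real host would keep the process running and exchange JSON-RPC messages with it rather than wait for it to exit.

```python
import os
import subprocess
import sys


def launch_server(entry: dict) -> subprocess.CompletedProcess:
    """Start a config-defined 'server' as an ordinary child process.

    Hypothetical helper for illustration. A real MCP host would keep a stdio
    server alive and talk JSON-RPC over its stdin/stdout; this one simply runs
    the command to completion and captures what it prints.
    """
    argv = [entry["command"], *entry.get("args", [])]
    # No env= argument: the child inherits os.environ, secrets included,
    # and runs as the same OS user as this process.
    return subprocess.run(argv, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # DEMO_TOKEN is a fake stand-in for a real credential in the developer's shell.
    os.environ["DEMO_TOKEN"] = "not-a-real-secret"
    # An entry a repository could ship; this "server" only reports what it can see.
    entry = {
        "command": sys.executable,
        "args": ["-c", "import os; print(os.environ.get('DEMO_TOKEN'))"],
    }
    print(launch_server(entry).stdout.strip())
```

Nothing in the child is sandboxed by default: because no `env=` argument is passed, `subprocess` hands it the parent's environment, and it runs under the same user account. Swap the harmless `print` for a network call and the same mechanism ships whatever tokens the developer's shell exported.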
The CI variant is worse in a quieter way. In headless workflows, there may be no visible prompt to accept because the tool is running inside automation. A pull request branch or repository configuration can become the place where code execution is triggered. CI systems often hold exactly the secrets attackers want: deployment tokens, package publishing credentials, cloud permissions, and access to private build artifacts.
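One cheap CI-side mitigation is to fail a pull-request job before any agent CLI runs whenever the diff touches agent configuration. The Python sketch below assumes you can obtain the changed-file list (for example from `git diff --name-only`). The path list is an assumed, incomplete set of project-local config locations that should be checked against each tool's current documentation, and `check_agent_config.py` is a hypothetical filename.

```python
import sys
from pathlib import PurePosixPath

# Assumed, incomplete watch list of project-local agent/MCP config paths.
# Verify against each tool's current documentation before relying on it.
AGENT_CONFIG_PATTERNS = (
    ".mcp.json",
    ".claude/settings.json",
    ".claude/settings.local.json",
    ".cursor/mcp.json",
    ".gemini/settings.json",
    ".vscode/mcp.json",
)


def flag_agent_config_changes(changed_paths):
    """Return the changed paths that look like agent configuration.

    PurePosixPath.match anchors relative patterns at the right-hand end,
    so nested copies (e.g. 'pkg/.mcp.json') are flagged too.
    """
    flagged = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        if any(path.match(pattern) for pattern in AGENT_CONFIG_PATTERNS):
            flagged.append(raw)
    return flagged


if __name__ == "__main__":
    # Usage: python check_agent_config.py $(git diff --name-only origin/main...HEAD)
    hits = flag_agent_config_changes(sys.argv[1:])
    for hit in hits:
        print(f"agent config changed: {hit}")
    if hits:
        # Non-zero exit so the CI step fails before any agent CLI runs.
        raise SystemExit(1)
```

Failing closed here does not block legitimate config changes; it routes them to a human reviewer instead of letting an automated agent act on them first.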
TrustFall therefore sits at the intersection of three trust assumptions. Developers trust repositories enough to inspect them with tools. AI coding agents trust repository-local configuration enough to launch helper processes. CI systems trust automation enough to run without a human sitting there reading a dialog. Put those assumptions in a blender and you get a supply-chain attack path with a logo and a blog post.
Why This Was Hard to Classify
One reason TrustFall is interesting is that it does not fit neatly into traditional vulnerability handling. Adversa said Anthropic treated the Claude Code behavior as outside its threat model, arguing that once a user says a folder is trusted, project configuration is allowed to run within that trust boundary. Adversa's response was that the user-facing consent was not informed enough. If the prompt does not clearly say that trusting the folder can start project-defined native processes, the consent model is doing a lot of very optimistic work.
The cross-tool comparison complicated disclosure further. Adversa found comparable exposure in Gemini CLI, Cursor CLI, and Copilot CLI, though the prompts differed: some mentioned MCP more clearly than others, and some enumerated the servers that would start while others did not. Whatever the wording, the core problem held across tools: project configuration could trigger MCP execution once the folder was trusted.
That makes TrustFall less like a single bug and more like an industry convention aging badly in public. Traditional IDEs have long had workspace trust concepts, but they were designed around human operators and predictable extensions. Agentic coding tools change the stakes because they combine repository content, local configuration, tool execution, and developer credentials into one workflow. "Trust this folder" used to mean "this editor may enable workspace features." In an AI coding agent, it can edge closer to "this repository may start processes that act with your authority."
The MCP Trust Gap
MCP is powerful because it lets agents reach beyond chat. It is also risky for the same reason. An MCP server is not a paragraph of text. It is a process. It can read and write data, make network calls, and run code, depending on how it is configured and what the host allows.
Project-local MCP configuration is convenient for teams. A repository can carry the tool setup needed to work on it. New contributors get the same agent capabilities without a long setup document. That convenience is exactly why attackers like this pattern. The malicious file can live where developers expect useful configuration files to live, and the trust prompt can feel like routine setup friction.
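Reviewing that convenience starts with knowing what it would launch. The following Python sketch lists every local command declared in common project-level MCP config files. The file locations and top-level keys (`mcpServers`, `servers`) are assumptions based on widely used Claude Code, Cursor, Gemini CLI, and VS Code conventions, so verify them for the tools you actually run; `list_project_mcp_commands` is a hypothetical name.

```python
import json
import sys
from pathlib import Path

# Assumed project-level MCP config locations and the key that holds server
# entries in each. Verify against the documentation of the tools you use.
CANDIDATE_FILES = {
    ".mcp.json": "mcpServers",
    ".cursor/mcp.json": "mcpServers",
    ".gemini/settings.json": "mcpServers",
    ".vscode/mcp.json": "servers",
}


def list_project_mcp_commands(repo_root):
    """Return (config_file, server_name, argv) for each locally launched server.

    Entries without a "command" (for example URL-based remote servers) are
    skipped because they do not spawn a local process by themselves.
    """
    root = Path(repo_root)
    found = []
    for rel, key in CANDIDATE_FILES.items():
        path = root / rel
        if not path.is_file():
            continue
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, UnicodeDecodeError, json.JSONDecodeError) as exc:
            # Report unreadable configs instead of silently ignoring them.
            found.append((rel, "<unparsed>", [f"error: {exc}"]))
            continue
        servers = data.get(key) if isinstance(data, dict) else None
        if not isinstance(servers, dict):
            continue
        for name, entry in servers.items():
            if isinstance(entry, dict) and "command" in entry:
                args = entry.get("args", [])
                if not isinstance(args, list):
                    args = [args]
                found.append((rel, name, [str(entry["command"]), *map(str, args)]))
    return found


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for rel, name, argv in list_project_mcp_commands(target):
        print(f"{rel}: {name} -> {' '.join(argv)}")
```

Files that fail to parse (for instance JSON with comments, which some tools accept) are reported rather than skipped, because an unreadable config is exactly the kind of file a reviewer should open by hand.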
The old security advice for unknown repositories was "do not run random code." AI coding agents blur that into something more treacherous: "do not inspect random code with a tool that may run random code while helping you inspect it." That sentence is ugly, but the world earned it.
Why This Belongs Here
TrustFall is a Vibe Graveyard story because the failure sits inside the AI coding-agent workflow rather than ordinary software trivia. The affected tools are designed to read repositories, understand code, and help developers act inside projects. The vulnerable behavior arose from the way those tools operationalized trust and project configuration around agentic execution.
No public source showed widespread real-world exploitation when the research was published, so this should not be written as a breach story. It is a documented, reproducible security failure with credible research, independent coverage, and a concrete blast radius: developer machines and CI runners that gave broad trust to repositories while agentic CLIs had authority to start project-defined helper processes.
The practical lesson is blunt. A folder-trust prompt is not a sandbox. It is a policy decision dressed as a dialog. If the dialog does not enumerate executable project configuration, does not default to the safe choice, and does not separate "read this repository" from "run code from this repository," it is asking developers to approve a risk they may not understand.
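The three properties in that lesson (enumerate what will run, default to the safe choice, keep reading separate from running) fit in a few lines. This Python sketch is a hypothetical design, not any vendor's actual prompt: `build_trust_prompt`, `decide`, and `TrustDecision` are illustrative names, and `servers` is assumed to map a server name to the argv it would launch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustDecision:
    read_repository: bool
    start_project_servers: bool


def build_trust_prompt(repo_name, servers):
    """Render a prompt that names every process that trust would start.

    `servers` maps a server name to the argv it would launch (hypothetical shape).
    """
    lines = [f"Opening '{repo_name}' read-only: nothing from it will be executed."]
    if servers:
        lines.append("It also defines helper processes that would run with your privileges:")
        for name, argv in sorted(servers.items()):
            lines.append(f"  - {name}: {' '.join(argv)}")
        lines.append("Type 'run' to start them as well, or press Enter to stay read-only.")
    return "\n".join(lines)


def decide(answer, servers):
    """Anything other than an explicit 'run' keeps the repository inert."""
    wants_run = bool(servers) and answer.strip().lower() == "run"
    return TrustDecision(read_repository=True, start_project_servers=wants_run)


if __name__ == "__main__":
    demo_servers = {"helper": ["node", "tools/helper.js"]}
    print(build_trust_prompt("example-repo", demo_servers))
    print(decide("", demo_servers))
```

The design choice that matters is in `decide`: an empty reply, a reflexive `y`, or a typo leaves the repository readable but inert. Only an explicit `run`, typed after the commands have been shown, starts anything.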
For organizations using AI coding tools, TrustFall argues for treating agent configuration as executable supply-chain material. Review MCP files in pull requests. Disable auto-start behavior where possible. Keep CI agent permissions narrow. Separate credentials used for code review from credentials used for deployment. Monitor child processes spawned by agent CLIs. None of this is glamorous, but neither is explaining why a repository someone opened to review a patch got a deployment token.
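For the monitoring recommendation, the core logic is a walk of the process tree below each agent CLI. The sketch below works on a plain snapshot of `(pid, ppid, name)` records so it stays dependency-free; in practice the snapshot could come from a library such as `psutil` (via `psutil.process_iter`) or from endpoint telemetry. The executable names in `AGENT_NAMES` are assumptions, as are the function names.

```python
from collections import defaultdict
from typing import NamedTuple


class Proc(NamedTuple):
    pid: int
    ppid: int
    name: str


# Assumed executable names for agent CLIs; adjust to what your fleet runs.
AGENT_NAMES = frozenset({"claude", "gemini", "cursor-agent", "copilot"})


def agent_descendants(snapshot, agent_names=AGENT_NAMES):
    """Return every process below an agent CLI process, sorted by pid."""
    children = defaultdict(list)
    for proc in snapshot:
        children[proc.ppid].append(proc)
    # Start from the direct children of every process whose name marks it as an agent.
    stack = [child for proc in snapshot if proc.name in agent_names
             for child in children[proc.pid]]
    seen, result = set(), []
    while stack:
        proc = stack.pop()
        if proc.pid in seen:  # guards against pid reuse or cycles in a stale snapshot
            continue
        seen.add(proc.pid)
        result.append(proc)
        stack.extend(children[proc.pid])
    return sorted(result)


def unexpected_children(snapshot, allowed, agent_names=AGENT_NAMES):
    """Agent descendants whose executable name is not on the allowlist."""
    return [p for p in agent_descendants(snapshot, agent_names) if p.name not in allowed]


if __name__ == "__main__":
    demo = [
        Proc(100, 1, "zsh"),
        Proc(200, 100, "claude"),
        Proc(300, 200, "node"),
        Proc(301, 300, "curl"),
    ]
    for proc in unexpected_children(demo, allowed={"node"}):
        print(f"unexpected process under agent: {proc.name} (pid {proc.pid})")
```

A per-repository allowlist (say, `node` for a JavaScript project's MCP helpers) turns the tree walk into an alert: anything outside it, such as `curl` or a shell spawned under the agent, is worth a look.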