Researchers checked 444 AI iPhone apps and nearly two-thirds were leaking the keys to their own AI
In June 2026, Wake Forest University researchers published a study built on a tool they call LLMKeyLens, which inspects the network traffic of iOS apps that use large language models. Of 444 working AI-powered App Store apps they tested, 282 (about 64%) leaked exploitable credentials or backend access. Some shipped plaintext API keys, and dozens of those also exposed the app's private system prompt; others stood up unauthenticated backend proxies that would accept requests from anyone who found the URL. The exposure lets an attacker run up the developer's AI bill, abuse the model, or steal the proprietary prompt. On a 90-day re-test, most of the vulnerable apps were still exploitable.
Every AI feature you tap in a phone app has to talk to a model somewhere, and that conversation needs a credential. The clean way to do this is to keep the secret key on a server you control, let the app talk to your server, and let your server talk to the model. The lazy way is to bury the key inside the app itself, or to stand up a backend that answers to anyone who knows the address, and hope nobody looks. A study out of Wake Forest University looked. It did not go well for the apps.
What they actually did
The researchers built a tool they call LLMKeyLens. In plain terms, it sits in the middle of an app's network traffic and watches what the app sends when it uses an AI feature, the same way a curious person with the right software can watch what their own phone is saying to the internet. They pointed it at 444 iOS App Store apps that had working large language model features and recorded what leaked.
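To make "watching what the app sends" concrete, here is a minimal sketch of one check a traffic inspector could run on a captured request: look for strings shaped like API credentials. This is not LLMKeyLens or the study's method. The function name `find_key_like_strings` and both patterns are assumptions made for illustration, and real provider key formats vary.

```python
import re

# Illustrative patterns only (assumptions, not any provider's official format).
KEY_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_\-]{20,}"),                # assumed "sk-" style secret key
    re.compile(r"(?i)bearer\s+([A-Za-z0-9_\-\.]{20,})"),  # long bearer token in a header
]


def find_key_like_strings(headers, body):
    """Return substrings of one captured request that look like credentials.

    `headers` is a dict of header name -> value; `body` is the request body text.
    """
    haystacks = list(headers.values()) + [body]
    hits = []
    for text in haystacks:
        for pattern in KEY_PATTERNS:
            for match in pattern.finditer(text):
                # Use the captured group when the pattern defines one.
                value = match.group(1) if pattern.groups else match.group(0)
                if value not in hits:
                    hits.append(value)
    return hits


if __name__ == "__main__":
    captured_headers = {
        "Authorization": "Bearer sk-" + "a" * 24,  # fake key for the demo
        "Content-Type": "application/json",
    }
    print(find_key_like_strings(captured_headers, '{"prompt": "hi"}'))
```

A hit only says that something key-shaped left the phone. Deciding whether it is a live, billable credential is a separate step that this sketch does not attempt.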
The headline number is the one that should make developers wince: 282 of the 444 apps, roughly 64%, leaked exploitable credentials or backend access. That is not a handful of sloppy hobby projects. That is most of the field.
The three flavors of leak
The exposure came in three distinct shapes: plaintext API keys, exposed system prompts, and unauthenticated backend proxies. Each one is bad in its own way.
Some apps transmitted plaintext API keys, the literal secret that bills to the developer's account with the model provider. Anyone capturing that key can use it to make their own requests against the developer's account, on the developer's dime, until somebody notices the bill. Of the apps that leaked keys this way, dozens also exposed the app's system prompt, the hidden instruction set that defines how the product behaves. That prompt is often the actual product. A company can spend months tuning the instructions that turn a generic model into "a calm, supportive sleep coach" or "a strict, accurate tax assistant," and here it was riding along in the clear for anyone watching the traffic.
Other apps did not ship a key at all, which sounds better until you read the next part. They routed requests through an unauthenticated backend proxy, a server the developer set up to talk to the model, with no check on who was allowed to use it. The developer probably thought of this as the secure option, because the key stays on the server. But a proxy that accepts requests from anyone who knows its address is just a key with extra steps. An attacker does not need to steal the secret; they can simply send their own prompts to your proxy and let you pay for the model to answer them.
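What separates a real proxy from "a key with extra steps" is a check on the caller before any prompt is forwarded. The sketch below is one way to do that, not something the study prescribes: the server hands a signed, short-lived token to a user it has already authenticated, then verifies that token on every request. The names `issue_token` and `verify_token` and the token layout are hypothetical. The signing secret is assumed to live only on the server, never in the app.

```python
import hashlib
import hmac
import time
from typing import Optional


def issue_token(user_id: str, secret: bytes, ttl_seconds: int = 300,
                now: Optional[float] = None) -> str:
    """Create a short-lived token shaped like 'user_id.expiry.signature'."""
    issued_at = time.time() if now is None else now
    expiry = int(issued_at) + ttl_seconds
    payload = f"{user_id}.{expiry}"
    signature = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{signature}"


def verify_token(token: str, secret: bytes,
                 now: Optional[float] = None) -> Optional[str]:
    """Return the user id if the token is authentic and unexpired, else None."""
    parts = token.rsplit(".", 2)
    if len(parts) != 3:
        return None
    user_id, expiry_text, signature = parts
    expected = hmac.new(secret, f"{user_id}.{expiry_text}".encode(),
                        hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    if not expiry_text.isdigit():
        return None
    current = time.time() if now is None else now
    if current >= int(expiry_text):
        return None
    return user_id
```

In this design the proxy would call `verify_token` first and refuse the request when it returns `None`, so knowing the proxy's URL is no longer enough to spend the developer's budget.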
Add it up and you get a population of apps where nearly two in three can be drained, abused, or reverse-engineered by anyone with ordinary traffic-inspection tools and a free afternoon.
Why this counts as an AI failure and not just sloppy mobile dev
Hardcoding a secret in a mobile app is an old mistake. What makes this an AI-era story is the specific pattern that produced it at this scale. The promise of modern AI tooling is that you can wire an app directly to a powerful model in an afternoon, and a great deal of that wiring is now generated or scaffolded by AI coding assistants that happily produce a working demo. A working demo and a safe production app are not the same thing, and the gap between them is exactly the boring security plumbing (authentication, key management, rate limiting) that a "make it work" mindset skips.
The leaked system prompts make the AI angle even sharper. The thing being exposed is not just a generic credential; it is the model-shaped intellectual property at the center of the product, plus the means to impersonate the product and spend its budget. That is a class of harm that did not exist before apps started carrying a brain they rent by the token.
The part that should worry everyone
Studies like this usually end with a responsible-disclosure note and a hopeful line about vendors patching. Here is where this one gets grim. On a 90-day re-test, the researchers found that the large majority of the vulnerable apps, on the order of seven in ten, were still exploitable. The window for quietly fixing this came and went, and most developers either never got the message or never prioritized it.
Some of the affected apps were not obscure. The dataset included apps with very large user bases, which means the exposed credentials and prompts belong to products real people use every day, sitting behind features those people assume are professionally built.
The takeaway, stated without drama
Nobody here got their database wiped or their customers doxxed, at least not in a way the study could confirm. This is a hazard study, a measurement of how much of the AI app ecosystem is standing on a trap door. The blast radius is the developers who will eventually find a surprise invoice from their model provider, the companies whose tuned prompts can be lifted by a competitor, and the users relying on apps that treated the most sensitive part of the build as an afterthought.
The fix is the same advice it has always been, now with higher stakes because the secret bills by the token: keep credentials on a server, make the server check who is calling, and never trust the client with anything you cannot afford to hand a stranger. Remember, too, that an AI assistant generating your integration code is optimizing for "this runs," not "this is safe to point at the public." Nearly two-thirds of the apps tested learned that distinction the hard way, and most of them had not finished learning it three months later.
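Checking who is calling pairs naturally with limiting how much each caller can spend. As a rough sketch under stated assumptions, here is a per-caller token bucket a backend could consult before forwarding a prompt to the model. The class name `PerCallerRateLimiter`, its parameters, and the in-memory storage are all illustrative. A real service would likely keep this state somewhere shared and persistent.

```python
import time
from typing import Optional


class PerCallerRateLimiter:
    """Token bucket per caller.

    Each caller may burst up to `capacity` requests, then regains
    `refill_per_second` requests of allowance per second.
    """

    def __init__(self, capacity: float, refill_per_second: float) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._buckets = {}  # caller id -> (tokens remaining, time last seen)

    def allow(self, caller_id: str, now: Optional[float] = None) -> bool:
        """Return True and spend one token if the caller is within budget."""
        current = time.monotonic() if now is None else now
        tokens, last_seen = self._buckets.get(caller_id, (self.capacity, current))
        # Refill in proportion to elapsed time, never exceeding capacity.
        elapsed = current - last_seen
        tokens = min(self.capacity, tokens + elapsed * self.refill_per_second)
        if tokens >= 1:
            self._buckets[caller_id] = (tokens - 1, current)
            return True
        self._buckets[caller_id] = (tokens, current)
        return False
```

Keyed on an authenticated user id rather than an IP address, a limiter like this caps what any single account can cost the developer, even when that account's credentials are misused.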