Georgia Tech tracker confirms dozens of real-world CVEs introduced by AI-generated code - and says the true number is 5-10x higher

Georgia Tech's Systems Software & Security Lab launched the Vibe Security Radar in May 2025 to do something no one else had systematically attempted: track real-world CVEs that were directly introduced by AI-generated code. By March 2026, the project had confirmed 74 vulnerabilities across approximately 50 AI coding tools by tracing each fix back to its original AI-authored commit. The trend is accelerating - 6 CVEs in January, 15 in February, 35 in March. Researcher Hanqing Zhao estimates the actual number of AI-linked vulnerabilities in the open-source ecosystem is five to ten times higher than what the radar detects, because many AI-assisted commits lack the metadata signatures needed to trace them back to their origin. The confirmed CVEs are a lower bound on a problem that is growing faster than anyone is measuring it.

Incident Details

Severity: Facepalm
Company: Multiple (AI coding tool ecosystem)
Perpetrator: AI coding assistants
Incident Date:
Blast Radius: 74 confirmed CVEs across 50+ AI coding tools; exponential month-over-month growth; estimated 5-10x undercount across the open-source ecosystem

Moving Past Benchmarks

The security industry has spent the better part of two years arguing about whether AI-generated code is more or less secure than human-written code. Most of that argument has been conducted via benchmarks - controlled experiments where researchers feed prompts to coding assistants and scan the output for vulnerabilities. The results have varied wildly depending on the benchmark design, the models tested, and who was funding the research.

Georgia Tech's Systems Software & Security Lab (SSLab) took a different approach. Instead of testing what AI tools might produce in a lab, researcher Hanqing Zhao launched the Vibe Security Radar in May 2025 to track what AI tools actually produced in the real world - specifically, to find and confirm real Common Vulnerabilities and Exposures (CVEs) that were directly introduced by AI-generated code in production software.

The distinction matters. Benchmarks tell you what a model is capable of doing wrong. CVE tracking tells you what actually went wrong, in real software, affecting real users.

How the Radar Works

The Vibe Security Radar monitors public vulnerability databases - CVE.org, the National Vulnerability Database (NVD), the GitHub Advisory Database (GHSA), Open Source Vulnerabilities (OSV), and RustSec. When a vulnerability is fixed, the team traces the fix back through the repository's commit history to identify the commit that originally introduced the flaw.

This is where the detective work begins. The team uses git blame and related analysis to identify the original commit, then examines it for metadata signatures associated with AI-assisted coding tools. These signatures come in various forms: co-author trailers (like "Co-authored-by: GitHub Copilot"), bot email addresses, commit message markers, or other traces that roughly 50 different AI coding tools leave behind. Tools in scope include Claude Code, GitHub Copilot, Cursor, Devin, Windsurf, Aider, Amazon Q, Google Jules, and dozens more.
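As a rough illustration of the metadata-signature step (this is a minimal sketch, not the SSLab's actual tooling - the trailer patterns and the helper names below are illustrative assumptions based on the traces described above):

```python
import re
import subprocess

# Illustrative patterns only - a few of the commit-metadata traces that
# AI coding assistants are known to leave behind. The real radar covers
# roughly 50 tools with many more signature forms.
AI_SIGNATURES = [
    re.compile(r"Co-authored-by:.*(Copilot|Claude|Cursor|Devin|Aider)", re.I),
    re.compile(r"Generated with .*(Claude Code|Copilot)", re.I),
    re.compile(r"\[bot\]@users\.noreply\.github\.com", re.I),
]

def match_ai_signature(commit_message: str):
    """Return the first AI-tool trace found in a commit message, or None."""
    for pattern in AI_SIGNATURES:
        m = pattern.search(commit_message)
        if m:
            return m.group(0)
    return None

def find_ai_linked_commits(repo_path="."):
    """Scan a repository's full history for commits carrying AI-tool traces."""
    # %H = commit hash, %B = raw body; NUL/SOH bytes keep entries separable.
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%H%x00%B%x01"],
        capture_output=True, text=True, check=True,
    ).stdout
    hits = []
    for entry in log.split("\x01"):
        if "\x00" not in entry:
            continue
        sha, body = entry.split("\x00", 1)
        trace = match_ai_signature(body)
        if trace:
            hits.append((sha.strip(), trace))
    return hits
```

In the radar's workflow, a scan like this would run only on the flaw-introducing commit that git blame surfaced, not on the whole history; the sketch above shows the signature-matching idea in isolation.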

Once a potential AI-linked commit is identified, the team applies an AI-assisted analysis step to verify whether the AI-generated code was a significant contributor to the vulnerability's root cause, filtering out cases where the AI commit was incidental to the flaw.

The Numbers

By late March 2026, the Vibe Security Radar had confirmed 74 CVEs directly introduced by AI-generated code. That number alone would be notable - 74 confirmed vulnerabilities with a documented causal chain from AI-generated commit to real-world CVE. But the trajectory is what makes the data alarming.

In January 2026, the radar identified 6 new AI-linked CVEs. In February, 15. In March, 35. That is not linear growth - it is an acceleration curve, and it maps to the broader adoption of AI coding tools in production workflows. As more code is AI-generated, more vulnerabilities in that code are being discovered. The discovery rate is climbing faster than the adoption rate, which suggests that early AI-generated code is now reaching the age where its flaws surface through routine use, security audits, and adversarial testing.

The Undercount Problem

If 74 confirmed CVEs were the whole story, the situation would be concerning but perhaps manageable. Hanqing Zhao has been clear that it is not the whole story.

The Vibe Security Radar can only identify AI-linked vulnerabilities when the original commit carries identifiable metadata. Not all AI-assisted commits do. Developers can strip co-author tags. Some AI tools don't add metadata by default. Some developers rewrite commit histories, squash commits, or use workflows that obscure the origin of individual code contributions. When the metadata isn't there, the radar can't trace the vulnerability back to an AI tool, even if the code was entirely AI-generated.

Zhao estimates that the true number of AI-linked vulnerabilities in the open-source ecosystem is five to ten times higher than the radar's confirmed count. If that estimate is even roughly accurate, the actual number of CVEs introduced by AI-generated code through March 2026 is somewhere in the range of 370 to 740 - and climbing exponentially.

The team is working on detection methods that don't rely on explicit metadata. These would identify AI-generated code based on recognizable patterns - coding styles, structural signatures, and characteristics that distinguish AI-authored commits from human-authored ones - without needing a co-author tag or bot email to trace. If successful, this would substantially expand the radar's detection capability and provide a more accurate picture of AI-generated code's security footprint.

What the CVEs Look Like

The SSLab research doesn't just count vulnerabilities - it categorizes them. The types of flaws being introduced by AI-generated code map closely to the OWASP Top 10 and other well-known vulnerability taxonomies: injection attacks, broken authentication, improper input validation, missing access controls, hardcoded credentials, and insecure default configurations.

These are not exotic, novel vulnerability classes. They are the same categories of bugs that the security industry has been documenting, teaching, and building tools to detect for decades. The AI coding tools have apparently not absorbed these lessons. They generate code that looks functional and well-structured but reproduces security anti-patterns that human developers have been trained to avoid - or at least to recognize - since the early 2000s.

This is a particular problem with "vibe coding" workflows, where developers generate entire applications or large code sections from natural language prompts and deploy them with minimal review. The code works. It passes basic functional tests. It does what the prompt asked for. It also ships with the kind of security flaws that a human code reviewer would flag in the first pass - but vibe coding's entire value proposition is speed, and speed often means skipping the review.
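To make the anti-pattern concrete, here is a hedged sketch (not drawn from any specific tracked CVE) of one of the flaw classes named above - SQL injection via string interpolation - next to the parameterized form a reviewer would expect:

```python
import sqlite3

def find_user_unsafe(conn, username):
    # Anti-pattern common in generated code: interpolating user input
    # directly into SQL. A payload like "' OR '1'='1" rewrites the query.
    return conn.execute(
        f"SELECT id FROM users WHERE name = '{username}'"
    ).fetchall()

def find_user_safe(conn, username):
    # Parameterized query: the driver treats the value as data, not SQL.
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (username,)
    ).fetchall()
```

Both versions "work" in the vibe-coding sense - they pass a functional test with a normal username - which is exactly why the flaw ships: only an adversarial input exposes the difference.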

The Ecosystem Implication

The Vibe Security Radar is tracking vulnerabilities in open-source software. This matters because open-source libraries and frameworks are the foundation of modern software development. A vulnerability introduced by AI-generated code in a popular open-source library doesn't just affect the developer who used the AI tool - it affects every application that depends on that library.

This creates a supply chain problem. When an AI coding tool introduces a vulnerability in an upstream library, every downstream consumer inherits that vulnerability. The developer who originally used the AI tool may not even know the code was insecure. The downstream consumers almost certainly have no visibility into whether the library code they depend on was human-written or AI-generated. The supply chain doesn't distinguish.

Previous Vibe Graveyard entries have documented individual incidents - a vibe-coded app with exposed API keys, a platform shipping without authentication, a dating app leaking user data. The Georgia Tech research places these individual failures in a systemic context. The 74 confirmed CVEs are not isolated incidents. They are data points on a curve, and the curve is pointing up.

From Anecdotes to Data

The value of the Vibe Security Radar is that it transforms the AI code security conversation from anecdotes to empirical data. Individual stories about vibe-coded apps failing make for compelling reading, but they can always be dismissed as edge cases. A systematic tracker that shows 6, then 15, then 35 CVEs per month - with a methodology that can be examined, replicated, and challenged - is harder to wave away.

The security industry's standard response to new risk categories includes denial ("it's not real"), minimization ("it's not that bad"), and normalization ("it's just like any other bug"). The Vibe Security Radar data is useful precisely because it short-circuits all three: the CVEs are real (they're in public databases), the trend is bad (it's accelerating), and the undercount estimate suggests the problem is significantly larger than what's currently visible.

Zhao and the SSLab team are not arguing that AI coding tools should be banned or abandoned. They are arguing that the security implications of AI-generated code need to be measured, tracked, and addressed with the same rigor applied to any other systemic security risk. The Vibe Security Radar is an attempt to provide that measurement.

The initial readings are not encouraging.
