SIG measured 400 billion lines of code and found AI mostly amplifies the mess you already had


On June 9, 2026, the Software Improvement Group published its State of Software 2026 report, drawing on a benchmark of more than 30,000 systems and over 400 billion lines of code. The headline number is familiar: AI-generated code carries roughly double the security-risk violations of human-written code, and more than half of it contained vulnerabilities. The more interesting finding is the thesis underneath it. AI does not fix or break software discipline on its own; it amplifies whatever is already there. Well-managed codebases get faster, badly managed ones accumulate debt and security exposure faster, and the gap widens as systems grow. AI-generated code is still only about 1.9% of enterprise production code, and even at that share the cracks are measurable.

Incident Details

Severity: Facepalm
Company: Industry-wide (Software Improvement Group benchmark)
Perpetrator: Engineering leadership
Incident Date:
Blast Radius: Industry-wide benchmark of 30,000+ systems and 400B+ lines of code; AI-generated code showed roughly twice the security-risk violations of human-written code, more than half contained vulnerabilities, and maintainability degraded as systems grew

By mid-2026 the "AI writes insecure code" study has become its own genre, and you would be forgiven for skimming past another one. CodeRabbit found AI-generated code carried several times as many security flaws. Veracode found roughly 45% of AI output failed basic OWASP checks. GitGuardian found AI-assisted commits leak secrets at double the baseline rate. So when the Software Improvement Group published its State of Software 2026 report on June 9, 2026, with the now-ritual finding that AI-generated code carries about twice the security-risk violations of human-written code, the temptation was to file it next to the others and move on.

Do not. The number is familiar, but SIG's report is doing something the others mostly do not, and the difference is worth a few minutes.

The scale is the first thing that sets it apart

Most AI-code studies run a fixed battery of prompts against a set of models and grade the output. Useful, but synthetic. SIG instead leaned on its existing benchmark of real production software: more than 30,000 systems and over 400 billion lines of code analyzed over the past year. This is not a lab exercise asking a model to build a toy app. It is a measurement of code that real organizations are actually running, set against years of prior benchmark data on how human-written systems behave.

Against that backdrop, the security finding lands harder. AI-generated code showed roughly double the security-risk violations of human-written code, and according to the reporting, more than half of the AI-generated code contained vulnerabilities. SIG also found maintainability degraded for AI code, and, importantly, that the gap widens as codebases grow. The bigger and more business-critical the system, the worse the relative penalty.

The number that should reframe the panic

Here is the statistic that quietly undercuts a lot of breathless commentary in both directions: AI-generated code currently accounts for only about 1.9% of enterprise production code in SIG's benchmark.

That cuts two ways, and SIG is honest about both. On one hand, the "AI is writing most of our software now" narrative is, at least in measured production code, not yet true. The flashy surveys where executives claim AI generates 60% or 70% of their codebase are describing drafts, assists, and aspiration, not what actually survives into production. On the other hand, if a sliver under 2% of production code is already showing double the security risk, the exposure becomes material precisely as organizations scale the tools up. We are looking at the early, small version of the problem, not the mature one.
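The scaling argument is easy to check with back-of-the-envelope arithmetic. The sketch below is illustrative only and is not SIG's methodology: it assumes violations are spread evenly per line of code, holds the AI penalty at a flat 2x (the report says the gap actually widens as systems grow, so this likely understates the effect), and uses hypothetical AI shares beyond the 1.9% figure. The function name `relative_violations` is invented for the example.

```python
def relative_violations(ai_share: float, ai_multiplier: float = 2.0) -> float:
    """Total security violations relative to an all-human baseline of 1.0.

    ai_share:      fraction of production code that is AI-generated (0..1)
    ai_multiplier: assumed violation density of AI code relative to human
                   code; 2.0 mirrors the report's "roughly double" figure
    """
    if not 0.0 <= ai_share <= 1.0:
        raise ValueError("ai_share must be between 0 and 1")
    # Human-written portion contributes at baseline density,
    # AI-generated portion contributes at the multiplied density.
    return (1.0 - ai_share) + ai_share * ai_multiplier


# 0.019 is the share from the report; the other shares are hypothetical.
for share in (0.019, 0.10, 0.30, 0.60):
    uplift = relative_violations(share) - 1.0
    print(f"AI share {share:6.1%} -> violations up {uplift:6.1%} vs. all-human baseline")
```

Under these assumptions the uplift simply equals the AI share: barely visible at 1.9%, hard to ignore at 30%, which is the sense in which today's numbers are the small version of the problem.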

The real thesis: AI is an amplifier

What genuinely separates this report from the pile is its central argument, and it is one the other studies mostly do not make. SIG's framing is that AI does not fix or break software discipline on its own. It amplifies what is already there. Where code quality and architecture are measured and managed, AI accelerates delivery and the productivity gains are real. Where they are not, AI accelerates technical debt, cost, and security exposure just as fast.

In other words, AI is not a great equalizer that drags weak engineering orgs up to competence. It is a multiplier on whatever discipline you already have. A team with strong architecture, real test coverage, and tight review gets a genuine speed boost. A team with sprawling untested code and no governance gets to generate that same kind of code faster than ever, and then gets to maintain it. SIG's CEO Luc Brandts put the warning plainly: "you cannot manage what you cannot measure, and you cannot move fast for long on a foundation you do not understand. When generation outruns governance, technical debt accumulates faster, security exposure widens, and the systems a business depends on become harder to change."

That reframing matters because it tells you the security finding is not really about the model. It is about the conditions you drop the model into. The same tool produces different outcomes depending on whether the surrounding engineering culture catches its mistakes.

The rest of the picture is grim in a useful way

The report's broader benchmark data explains why "amplify" is so dangerous for most organizations: the foundation is already shaky. SIG found that 71% of code has a low degree of security controls, and that the average-sized system contains around 20 critical security findings. Security risk scales with system size, with the largest and most business-critical systems scoring worst. The one piece of good news is correlational and actionable: systems with lower code-level technical debt showed up to 72% stronger security compliance. Clean foundations are measurably safer foundations.

There is also a productivity-paradox thread that pairs neatly with the customer-service rollback studies elsewhere in this graveyard. SIG describes developers generating excess code to satisfy AI-driven productivity metrics, then spending more time, and more tokens, correcting and refining that output later. The dashboard says you are shipping more. The maintenance burden quietly says you are shipping more of a problem.

How it differs from the studies already on the site

To be fair to the reader who has seen the others: yes, the top-line "2x security risk" overlaps with prior research, and SIG is a commercial software-quality firm with an obvious interest in selling measurement and governance. Discount accordingly. But the distinct contributions are real. The benchmark is enormous and grounded in production rather than prompts. The 1.9% production-share number is a useful corrective to survey hype. The "maintainability degrades as systems grow" finding is specific and measurable. And the amplifier thesis is a sharper explanation than "AI bad at security," because it predicts who gets hurt: the organizations that were already not measuring or managing quality.

The lesson

The cheap reading is "AI writes insecure code, again." The accurate reading is that AI hands every engineering organization a faster version of itself. If your existing discipline catches bad code, AI lets you go faster safely. If it does not, AI lets you accumulate security debt at a rate your old, slower workflow could never have managed, on top of a foundation where 71% of code already has weak security controls. The tool is not the variable that decides the outcome. The governance you had before you turned it on is. That is the part no model release is going to fix for you.
