GitClear study finds AI coding assistants are pushing codebases toward copy-paste debt

Copy-paste got easier

GitClear's 2025 AI Copilot Code Quality report documents a quieter failure mode than a chatbot saying something stupid in public: codebases swelling with easy-to-generate code while the slower work of consolidation and reuse falls behind.

The report analyzed 211 million changed lines of code authored between 2020 and 2024. The dataset combined anonymized commercial repositories with major open-source projects, and GitClear classified changes into operations such as added, deleted, updated, moved, copy/pasted, find/replaced, and churned lines. That is a deeply unglamorous measurement scheme, which is exactly why it is useful. Maintainability failures rarely announce themselves with a siren. They arrive as a few more copied functions, a few fewer refactors, and one more reviewer thinking, "I have seen this helper before, but where?"

The headline pattern is bad. Moved code, which GitClear treats as a signal of refactoring and reuse, fell from 24.8% of changed lines in 2021 to 9.5% in 2024. Copy/pasted code rose from 8.4% to 12.3% over the same period. In 2024, copy/pasted code exceeded moved code for the first time in the five-year dataset.
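Figures like these are shares of all changed lines in a period. The sketch below shows only that arithmetic. It assumes a hypothetical list in which every changed line already carries one operation label drawn from the categories the report names. Classifying each line is the hard part, and it is GitClear's own system, which this does not reproduce. The `Op` enum and both functions are illustrative names.

```python
from collections import Counter
from enum import Enum


class Op(Enum):
    # Operation categories named in the report; labeling is assumed to happen upstream.
    ADDED = "added"
    DELETED = "deleted"
    UPDATED = "updated"
    MOVED = "moved"
    COPY_PASTED = "copy/pasted"
    FIND_REPLACED = "find/replaced"
    CHURNED = "churned"


def operation_shares(ops: list[Op]) -> dict[Op, float]:
    """Return each category's percentage of all changed lines (0.0 everywhere if empty)."""
    total = len(ops)
    counts = Counter(ops)
    return {op: (100.0 * counts[op] / total if total else 0.0) for op in Op}


def copy_paste_exceeds_moved(ops: list[Op]) -> bool:
    """True when copy/pasted lines outnumber moved lines, the crossover seen in 2024."""
    shares = operation_shares(ops)
    return shares[Op.COPY_PASTED] > shares[Op.MOVED]


if __name__ == "__main__":
    # Toy labels for illustration only, not data from the report.
    sample = [Op.ADDED] * 5 + [Op.MOVED] * 2 + [Op.COPY_PASTED] * 3
    print(operation_shares(sample)[Op.MOVED], copy_paste_exceeds_moved(sample))
```

The crossover check compares shares within a single period. Seeing a trend like the report's means running it once per year of data.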

The duplicate-block data is worse. After improving its duplicate detection, GitClear reported an eightfold increase during 2024 in commits containing code blocks with five or more duplicated lines. LeadDev's coverage put the practical concern plainly: duplicated code is not only untidy. It multiplies the number of places a future fix must be applied, and cloned blocks have a long research history of dragging bugs along with them.
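Counting "code blocks with five or more duplicated lines" requires some form of clone detection. GitClear's improved detector is its own, and this is not it. The following is only a minimal sketch of one common approach. It assumes files are available as lists of lines, hashes every five-line window after collapsing whitespace, and then reports any hash that appears in more than one place. `find_duplicate_blocks` is a hypothetical name.

```python
import hashlib
from collections import defaultdict

WINDOW = 5  # minimum block length, matching the report's five-line threshold


def normalize(line: str) -> str:
    # Collapse runs of whitespace so re-indented copies still match.
    return " ".join(line.split())


def find_duplicate_blocks(
    files: dict[str, list[str]], window: int = WINDOW
) -> dict[str, list[tuple[str, int]]]:
    """Map the hash of each repeated `window`-line block to every (path, 1-based start line)."""
    seen: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for path, lines in files.items():
        norm = [normalize(line) for line in lines]
        for start in range(len(norm) - window + 1):
            block = norm[start : start + window]
            if not any(block):  # skip windows made entirely of blank lines
                continue
            digest = hashlib.sha256("\n".join(block).encode()).hexdigest()
            seen[digest].append((path, start + 1))
    # Keep only blocks that occur in more than one location.
    return {digest: locs for digest, locs in seen.items() if len(locs) > 1}


if __name__ == "__main__":
    shared = [
        "def retry(call):",
        "    for attempt in range(3):",
        "        try:",
        "            return call()",
        "        except TimeoutError:",
        "            pass",
    ]
    files = {"service_a.py": shared, "service_b.py": ["import time"] + shared}
    for locations in find_duplicate_blocks(files).values():
        print(locations)
```

Because windows overlap, a six-line copy shows up as two matching windows. A production tool would merge adjacent matches into one block and normalize identifiers and literals, not just whitespace, so that lightly renamed copies are still caught.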

Churn is the bill

GitClear also tracked churn: code that gets authored and then revised shortly afterward. All-line churn rose from 3.3% in 2021 to 5.7% in 2024. For newly added lines, GitClear found that the percentage revised within a month increased by about 20% to 25% compared with the 2021 baseline.
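Churn as described here is a time-windowed ratio. The following minimal sketch assumes each newly added line has already been traced to the first commit that updated or deleted it. That tracing is the difficult part, which GitClear's tooling handles and this sketch does not. `LineRecord` and `churn_rate` are hypothetical names, and the 30-day default stands in for the report's "within a month."

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class LineRecord:
    added_at: datetime
    # First time the line was updated or deleted; None if it has survived untouched.
    revised_at: Optional[datetime] = None


def churn_rate(lines: list[LineRecord], window: timedelta = timedelta(days=30)) -> float:
    """Fraction of added lines that were revised or deleted within `window` of being added."""
    if not lines:
        return 0.0
    churned = sum(
        1
        for rec in lines
        if rec.revised_at is not None and rec.revised_at - rec.added_at <= window
    )
    return churned / len(lines)


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1)
    sample = [
        LineRecord(t0, t0 + timedelta(days=3)),   # rewritten within days: counts as churn
        LineRecord(t0, t0 + timedelta(days=90)),  # revised much later: not churn
        LineRecord(t0),                           # never revised
        LineRecord(t0),
    ]
    print(churn_rate(sample))  # 0.25
```

A real measurement would also exclude lines added less than one window ago, because they have not yet had a full month in which to be revised.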

That is not automatically damning. Sometimes revising recent code means a team is polishing work quickly. But GitClear's broader pattern points the other way: more new lines, less moved code, more copy/paste, more duplicate blocks, and more recent revision. That is the profile of code that was easy to generate and harder to keep.

This matters because software maintenance is a coordination problem, not a typing problem. If a team has one canonical implementation of billing retry logic, a future fix lands once. If the same logic was copied into four services with slightly different names, the fix becomes archaeology. Someone has to know all four copies exist, understand which differences matter, update each copy, test each path, and hope no fifth copy is lurking in a file nobody has opened since the last reorg.
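The consolidation argument is easier to see in code. This minimal sketch assumes a hypothetical billing module. One shared `retry_with_backoff` helper owns the retry policy, and every caller, such as the hypothetical `charge_invoice` and `refund_invoice`, delegates to it. A fix to the backoff schedule or to the set of retryable errors then lands once, not in four lightly renamed copies.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (TimeoutError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """The one canonical retry policy: exponential backoff on the listed exceptions."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2**attempt)  # waits 0.5s, 1.0s, 2.0s, ... by default
    raise AssertionError("unreachable")


# Hypothetical callers: both reuse the shared policy instead of copying the loop.
def charge_invoice(invoice_id: str, gateway: Callable[[str], str]) -> str:
    return retry_with_backoff(lambda: gateway(invoice_id))


def refund_invoice(invoice_id: str, gateway: Callable[[str], str]) -> str:
    return retry_with_backoff(lambda: gateway(invoice_id))
```

Injecting `sleep` keeps the helper testable without real delays. With four copied loops, each copy would need its own test and its own fix.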

AI coding assistants are very good at making a local answer appear quickly. They are less naturally good at knowing which existing function should be reused across a repository, which abstraction carries business meaning, or which older helper is quietly deprecated because the team learned something painful in 2022. The model can see context. It does not own the codebase.

The report's timing matters too. GitClear measured these changes during the first broad wave of AI coding assistant adoption, when many teams were still treating generated code as normal developer output that happened to arrive faster. That means the surrounding process often did not change much. Same pull request templates, reviewer capacity, test suites, and delivery pressure. More generated code entered a pipeline whose quality gates had been designed around humans writing and revising at human speed.

What the study can and cannot prove

GitClear's report should not be read as line-level proof that every duplicated block was written by an AI assistant. The study measures code-change trends during the period when AI coding assistants spread through the industry. It also uses GitClear's own classification system, and GitClear sells tools for measuring engineering work. That does not invalidate the data, but it does mean the claims need the normal vendor-report caution label.

The caution label does not make the trend comforting. DevClass and LeadDev both covered the findings, and the numbers match what many developers report seeing in code review: AI makes it trivial to produce code that looks complete, even when the healthier move would be deleting code, moving code, or reusing code that already exists.

This is the part of "AI productivity" that activity dashboards usually miss. Lines added are easy to count. Tickets closed are easy to celebrate. The future cost of maintaining six near-identical implementations is harder to express in a weekly leadership meeting, especially when the duplicated code passes tests today.

Why it belongs here

Vibe Graveyard includes studies that document systemic AI failure patterns. GitClear's report fits because it quantifies a concrete software-engineering failure mode: AI-assisted development appears to be shifting codebases toward less reuse and more duplication.

The harm is not one catastrophic outage but accumulating maintenance debt across long-lived repositories. That debt eventually becomes slower onboarding, harder reviews, repeated bugs, more fragile changes, and a growing fear that touching one function means finding the other six copies before the customer finds the bug.

The old advice was "don't repeat yourself." The new workflow says "press tab and repeat yourself at machine speed." That is not progress. That is a photocopier with a pull request button.
