Anthropic agrees to $1.5B payout over pirated books

Anthropic accepted a $1.5 billion settlement with authors who said the Claude team scraped pirate e-book sites to train its chatbot. The deal pays roughly $3,000 per book across 500,000 works, heads off a December trial, and forces one of the richest AI startups to bankroll the writing community it previously treated as free training data.

Incident Details

Severity: Catastrophic
Company: Anthropic
Perpetrator: AI Vendor
Incident Date:
Blast Radius: Record copyright settlement drains cash, sets precedent for other AI labs, and fuels public distrust of Anthropic’s data practices.

The case is Bartz v. Anthropic, and it starts with pirate e-book libraries. Anthropic, the company behind the Claude chatbot, downloaded more than 7 million digitized books from "shadow libraries" - pirated book databases - to train its large language models. The primary sources were LibGen and a smaller collection called PiLiMi. Anthropic started with roughly 200,000 books from a dataset called Books3, assembled by AI researchers unaffiliated with OpenAI who wanted to match the scale of the training data behind ChatGPT. From there, the company expanded to millions more.

Senior U.S. District Judge William Alsup, overseeing the case in the Northern District of California, found in his June 2025 summary judgment ruling that Anthropic had downloaded more than 7 million books and "knew they had been pirated." That's not an allegation. That's a judicial finding.

The Split Ruling

Judge Alsup's June ruling was a split decision that gave both sides something to claim as a victory, though one side had much more to celebrate than the other.

On the question of whether training AI models on copyrighted books constitutes copyright infringement, Alsup sided with Anthropic. He found that using copyrighted works to train AI systems that produce their own text qualified as "fair use" under U.S. copyright law because it was "quintessentially transformative." The AI industry largely praised this part of the ruling, since it established - at least in the Northern District of California - that AI companies can train on copyrighted material as long as they obtain copies legally.

On the question of how Anthropic obtained the books, Alsup reached a different conclusion. Downloading millions of pirated copies from shadow libraries is not legal acquisition. Anthropic could argue all day that its AI training process was transformative. It could not argue that downloading from pirate sites was legitimate sourcing. The judge ruled that the authors could bring Anthropic to trial in a class action specifically over the piracy - the method of acquisition, not the training itself.

In July 2025, the court certified a class of all authors whose books Anthropic had downloaded from the pirated databases. A trial was set for December 1, 2025.

The $1.5 Billion Settlement

On September 5, 2025, Anthropic agreed to pay $1.5 billion to settle the class action. The settlement fund was structured at approximately $3,000 per book across an estimated 500,000 works, with the total potentially increasing if more covered works were identified.
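The arithmetic behind those figures is simple enough to sketch. The snippet below uses only the publicly reported estimates ($3,000 per work, roughly 500,000 works); the constants and function are illustrative, not the settlement's actual terms.

```python
# Back-of-the-envelope sketch of the reported settlement arithmetic.
# Figures are publicly reported estimates, not final court-approved terms.

PER_WORK_PAYMENT = 3_000    # reported ~$3,000 per covered work
ESTIMATED_WORKS = 500_000   # reported estimate of covered works

def fund_size(works: int, per_work: int = PER_WORK_PAYMENT) -> int:
    """Total fund implied by a flat per-work payment across `works` titles."""
    return works * per_work

print(f"${fund_size(ESTIMATED_WORKS):,}")  # $1,500,000,000
# The reported deal let the total grow if more covered works were identified:
print(f"${fund_size(550_000):,}")          # $1,650,000,000 at 550,000 works
```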

Justin Nelson of Susman Godfrey LLP, co-lead plaintiffs' counsel, said the settlement "sends a powerful message to AI companies and creators alike that taking copyrighted works from these pirate websites is wrong." The Authors Guild, which supported the litigation, framed the deal as a benchmark that could pressure other AI companies facing similar copyright claims - including OpenAI, Meta, and Midjourney - to negotiate their own settlements.

Anthropic did not admit liability as part of the deal. Class members who submitted timely, valid claim forms would receive a pro rata share of the fund for each infringed work they owned. The actual payout per author would depend on the number of valid claims filed and whether multiple class members claimed the same work.
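The mechanics of that pro rata split can be illustrated with a short sketch. The even-split rule, function name, and sample claimants below are hypothetical; the court had not finalized the settlement's actual plan of allocation.

```python
from collections import defaultdict

PER_WORK_PAYMENT = 3_000  # reported per-work figure; actual allocation rules were not final

def allocate(claims: list[tuple[str, str]]) -> dict[str, float]:
    """Split each work's award evenly among its valid claimants.

    `claims` holds (claimant, work) pairs. The even split is an illustrative
    assumption, not the settlement's actual plan of allocation.
    """
    claimants_by_work: defaultdict[str, list[str]] = defaultdict(list)
    for claimant, work in claims:
        claimants_by_work[work].append(claimant)

    payouts: defaultdict[str, float] = defaultdict(float)
    for work, claimants in claimants_by_work.items():
        share = PER_WORK_PAYMENT / len(claimants)
        for claimant in claimants:
            payouts[claimant] += share
    return dict(payouts)

# A sole claimant of a work keeps the full $3,000; co-claimants split it.
print(allocate([("author_a", "book_1"),
                ("author_b", "book_2"),
                ("publisher_x", "book_2")]))
# {'author_a': 3000.0, 'author_b': 1500.0, 'publisher_x': 1500.0}
```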

The $1.5 billion was a meaningful sum even for Anthropic. The company is one of the best-funded AI startups in the world, with billions in venture capital backing, but $1.5 billion is real money even at that scale. It's also the kind of number that gets the attention of other AI companies running the cost-benefit analysis of training on pirated versus licensed content.

The Judge Was Not Satisfied

The settlement didn't go smoothly. Judge Alsup, reviewing the proposed deal, expressed significant concerns about the terms. He said he was "disappointed" that the parties had left "important questions" for the future, including a definitive list of works covered by the settlement and the processes for notifying potential class members.

Alsup was particularly critical of the settlement's structure. Bloomberg Law reported that the judge was concerned class lawyers were "striking a deal behind the scenes that will be forced down the throat of authors." He admonished class counsel for enlisting an "army" of attorneys to work on the settlement disbursement, including lawyers from the Authors Guild and the Association of American Publishers.

The Verge reported that Alsup rejected the settlement, putting the $1.5 billion deal on hold. The final approval hearing was postponed to April 2026, and the judge ordered the parties to address his concerns about the deal's workability before proceeding.

The rejection didn't kill the settlement entirely, but it created uncertainty. A judge who calls a deal "nowhere close to done" is signaling that the current terms need substantial revision. Whether the $1.5 billion figure holds, increases, or gets restructured depends on what happens during the reworking process.

Why Piracy Was the Linchpin

The Anthropic case is distinct from other AI copyright lawsuits because the liability didn't hinge on whether AI training itself is fair use. Alsup already ruled that it is (at least in his court). The liability came from Anthropic's decision to source its training data from pirated databases rather than purchasing or licensing the books.

This distinction creates a strange legal landscape. Under Alsup's ruling, an AI company that buys 7 million books through legitimate channels and uses them for training is protected by fair use. An AI company that downloads 7 million of the same books from a pirate site and uses them for exactly the same training purpose is liable for copyright infringement. The training is identical; the acquisition method determines legality.

For Anthropic, the practical problem was one of scale and cost. Buying or licensing 7 million books through publishers would have been enormously expensive and logistically complex. Shadow libraries offered the same content for free. The savings were real; so were the legal consequences.

The Precedent Question

The $3,000-per-book figure from the settlement attracted attention from attorneys on both sides of AI copyright disputes. If the settlement holds (and survives judicial revision), it establishes a quantitative benchmark for what it costs an AI company to use a pirated book in training data.

Other AI companies face active lawsuits over similar conduct. OpenAI, Meta, and Midjourney all have pending copyright litigation involving training data sourced from books and other copyrighted works. The Anthropic settlement, even in its contested form, gives plaintiffs' attorneys in those cases a dollar-per-work reference point to anchor their negotiations.

For the AI industry, the Alsup fair use ruling was reassuring. For the same industry, the $1.5 billion settlement over piracy was a reminder that the legal protections only extend as far as the legitimacy of the content pipeline. Train on legally acquired material and you're probably fine. Train on pirated material and your fair use argument is irrelevant to the piracy claim.

Anthropic built Claude on stolen books. The bill arrived at $1.5 billion, and the judge didn't think the payment plan was ready.
