California's failed bar exam included AI-drafted questions

The State Bar of California disclosed in April 2025 that 23 scored multiple-choice questions on its already troubled February bar exam were developed with AI assistance by its psychometric vendor, ACS Ventures. Test-takers had already reported crashes, lag, copy-paste failures, and lost answers. Then the bar admitted that some questions on this licensing exam for future lawyers had been drafted with AI, reviewed by the same vendor that drafted them, and used anyway. The bar asked the California Supreme Court for score relief, while legal academics described the admission as staggering.

Incident Details

Severity: Catastrophic
Company: State Bar of California
Perpetrator: Public agency
Incident Date: February 2025 (exam administered); April 21, 2025 (AI use disclosed)
Blast Radius: Thousands of California bar applicants affected; score adjustments sought; confidence in the licensing exam damaged; millions in follow-on costs and vendor fallout

A Licensing Exam Ran Like a Beta Test

The February 2025 California bar exam was already in trouble before anyone said the letters "AI" out loud. Applicants reported crashes, sluggish screens, copy-paste failures, saved answers disappearing, and proctoring problems that interrupted the exam itself. This was not a classroom quiz or an internal training exercise. It was the licensing exam that determines who gets to practice law in the largest state bar in the country.

Then the State Bar of California made the situation worse. On April 21, 2025, the Committee of Bar Examiners announced that it would recommend scoring adjustments to address the exam's broader failures. Buried in the same release was the detail that a subset of the scored multiple-choice questions had been developed with the assistance of AI by ACS Ventures, the bar's independent psychometrician.

According to the State Bar's own breakdown, 100 of the 171 scored multiple-choice questions came from Kaplan, 48 came from the First-Year Law Students' Exam, and 23 came from ACS. Those 23 ACS questions were developed with AI assistance and then reviewed before being used on the real exam. That is a small share of the test, but it is not a trivial one. When a licensing exam goes sideways, "only 23 scored questions" is not a reassuring sentence.
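As a quick sanity check of the bar's own numbers, the AI-assisted items work out to roughly one scored question in seven:

```python
# The State Bar's published breakdown of the 171 scored multiple-choice questions.
sources = {
    "Kaplan": 100,
    "First-Year Law Students' Exam": 48,
    "ACS Ventures (AI-assisted)": 23,
}
total = sum(sources.values())
assert total == 171
for name, count in sources.items():
    print(f"{name}: {count}/{total} ({count / total:.1%})")
# ACS Ventures (AI-assisted): 23/171 (13.5%)
```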

Why This Landed So Badly

Bar exams are not supposed to be improvisational. They are expensive, high-stakes instruments that are supposed to be dull in the most professional sense of the word. Candidates need the exam to be technically stable, legally accurate, and procedurally defensible. The California bar managed to raise doubts on all three fronts at once.

The technical failures were visible immediately. The AI question issue added a second layer of mistrust because it suggested the exam's content pipeline had been treated with less caution than the stakes required. Legal educators quoted by the AP and the Los Angeles Times reacted with a mix of disbelief and anger. Their criticism was not a generic anti-AI objection. It was aimed at where the tool had been used, how little had been disclosed ahead of time, and how the oversight structure appears to have been arranged.

One of the most damaging details in the reporting was the role overlap. ACS Ventures helped draft the AI-assisted questions and also sat inside the validation process for those same questions. Even if every review was conducted in good faith, that setup looks bad. High-stakes testing depends on independent checks. Letting one vendor help produce exam items and then bless the result is the sort of governance shortcut that would be embarrassing in a corporate certification test. In a professional licensure exam, it is harder to defend.

The State Bar's Defense

The bar did not concede that the AI-assisted questions were inherently invalid. In its April 21 release, Executive Director Leah Wilson said the State Bar had confidence in the validity of the multiple-choice questions and in their ability to assess minimum competence. The release also stressed that the combined scored multiple-choice section exceeded the psychometric reliability target.

That response addressed only part of the problem. Reliability is not the same thing as trust, and it is not the same thing as legitimacy. A licensing body can tell the public that the statistical properties of the test still look acceptable. That does not erase the optics of using AI-assisted drafting for real exam questions in a rollout already marred by broad technical breakdowns. It also does not answer why the process was disclosed only after candidates had already sat for the exam and reported major problems.
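For readers wondering what that reliability claim actually measures: the release did not name the statistic, but the standard internal-consistency measure for dichotomously scored multiple-choice items is KR-20 (or its generalization, Cronbach's alpha). Here is a minimal sketch on simulated data; the answer matrix below is invented, since the State Bar has not published its own:

```python
import numpy as np

def kr20(responses: np.ndarray) -> float:
    """Kuder-Richardson 20: internal-consistency reliability for 0/1-scored items."""
    k = responses.shape[1]                         # number of scored items
    p = responses.mean(axis=0)                     # proportion answering each item correctly
    total_var = responses.sum(axis=1).var(ddof=1)  # variance of examinees' total scores
    return (k / (k - 1)) * (1.0 - (p * (1.0 - p)).sum() / total_var)

# Illustrative only: a simulated 0/1 answer matrix, not real exam data.
rng = np.random.default_rng(0)
ability = rng.normal(size=(500, 1))      # 500 hypothetical examinees
difficulty = rng.normal(size=(1, 171))   # 171 scored items, as on the February exam
answers = (ability - difficulty + rng.normal(size=(500, 171)) > 0).astype(float)
print(f"KR-20 on simulated data: {kr20(answers):.2f}")
```

The point of the sketch is what KR-20 cannot see: it measures whether items hang together statistically, not where those items came from or whether they were drafted and vetted responsibly.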

The Los Angeles Times reported that California Supreme Court staff learned only that week that AI had been used in the drafting process. If accurate, that made the episode more than a public relations problem. It suggested the court overseeing the profession had not been given a full picture of how the new exam was assembled. That sort of omission has a way of turning an operational mess into an institutional one.

What the Incident Says About AI in High-Stakes Settings

Plenty of people in legal tech would argue that AI can help with test development. That is not an absurd position. AI can generate draft language, suggest distractors, and speed up rote drafting work. The flaw in California's rollout was not merely that AI touched the process. The flaw was that the bar appears to have used AI as part of a larger system that was already under severe operational strain, then expected examinees to trust the output because someone somewhere said it had been reviewed.

Human review is often presented as the universal fix for AI-generated output. In practice, that promise is only as strong as the time, expertise, incentives, and independence of the people doing the review. Here, the same exam administration that could not deliver a smooth testing experience also wanted candidates to trust that its AI-assisted content controls were rigorous enough for a licensing gatekeeper. That is a hard sell.

The issue is sharper in law than in many other fields because bar exams trade on procedural legitimacy. Every applicant is being judged by a process that is supposed to be conservative, repeatable, and insulated from shortcuts. If the State Bar wants the profession to take its testing regime seriously, it has to behave like an institution that understands why process matters. Outsourcing part of question drafting to AI and then disclosing it after the fact moved in the opposite direction.

The Cost of Doing This Cheaply

California did not just create a controversy. It created downstream work. The bar sought score adjustments. The Supreme Court got pulled deeper into the fallout. Legislators and educators started pressing for answers. Later reporting tied the February exam mess to millions in unplanned costs and contract consequences. Whatever efficiency or convenience the State Bar thought it was getting from its new setup evaporated quickly.

That is what makes this a good Vibe Graveyard entry. The central failure was not a theoretical concern about AI in education. It was a concrete case in which a public body introduced AI into a professional licensing process, failed to disclose it cleanly, wrapped it inside a defective exam rollout, and then had to ask for remedial scoring after candidates were already harmed. The technology did not act alone, but it did not need to. A brittle institution plus AI-assisted shortcuts was enough.

The California bar exam is supposed to screen for the judgment and competence required to practice law. In February 2025, the State Bar ended up raising questions about its own.
