Deloitte to refund Australian government after AI-generated report

Deloitte Australia agreed to partially refund a $440,000 contract after admitting its welfare compliance review for the Department of Employment and Workplace Relations contained fabricated academic citations and a fictitious judicial quote generated by Azure OpenAI GPT-4o. University of Sydney researcher Christopher Rudge found the revised report introduced even more hallucinated references than the original.

Incident Details

Severity: Facepalm
Company: Australian Government
Perpetrator: Consultant
Incident Date:
Blast Radius: Refund issued; public-sector trust and procurement review; reputational harm.

In 2024, Australia's Department of Employment and Workplace Relations (DEWR) hired Deloitte to produce the "Targeted Compliance Framework Assurance Review," a report examining the government's automated penalty system for welfare recipients. The system fines job seekers who miss appointments or fail to report income, and DEWR wanted an assessment of whether the underlying IT infrastructure was legally compliant, aligned with legislation, and functioning correctly. The contract was worth approximately $440,000 AUD.

Deloitte delivered the report in July 2025. DEWR published it in August. Within weeks, Dr. Christopher Rudge - deputy director of health law at the University of Sydney - noticed something odd about the citations.

Academic References to Papers That Don't Exist

Rudge identified fabricated academic sources throughout the report. Citations pointed to works attributed to Professor Lisa Burton Crawford of the University of Sydney, who confirmed she had never authored the papers in question. Another reference cited a fictional report attributed to Professor Bjorn Regnell from Lund University in Sweden. The initial version contained at least three nonexistent academic sources.

In addition to the phantom scholarship, the report included a fabricated quote attributed to Federal Court Justice Jennifer Davies. Deloitte had also misspelled the judge's name as "Davis" - the kind of double failure (inventing a quote and misspelling the name of the person you're falsely quoting) that doesn't instill confidence.

In late August 2025, the Australian Financial Review published Rudge's findings. The report hadn't been in circulation long, but the errors were already drawing scrutiny from both academics and politicians.

The Fix That Made Things Worse

DEWR and Deloitte released an updated version of the report on October 3, 2025. DEWR's website described it as including "a small number of corrections to references and footnotes." That framing significantly understated the situation.

Rudge examined the revision and found that instead of simply replacing the fabricated references with legitimate ones, Deloitte had introduced even more. "Instead of just substituting one hallucinated fake reference for a new 'real' reference, they've substituted the fake hallucinated references, and in the new version, there's like five, six, or seven or eight in their place," Rudge told the Australian Financial Review.

His assessment was direct: "It's highly likely they asked GPT to fabricate a justification, and it did."

The revision did remove the fabricated quote attributed to Justice Davies. And on page 58 of the updated report, buried in the methodology section, was a disclosure that hadn't appeared in the original: Deloitte's technical team had used "a generative AI large language model (Azure OpenAI GPT-4o) based tool chain" during its analysis. The AI tool was licensed by DEWR and hosted on the department's own Azure infrastructure.

Deloitte stated the AI had been used to address "gaps in traceability and documentation." The firm insisted the AI use did not influence the report's core findings or recommendations.

The $440,000 Question

Deloitte agreed to refund the final payment on the contract. The exact amount was not publicly disclosed - it was described as a partial refund - and a Deloitte spokesperson confirmed the matter had been "resolved directly with the client." The company did not issue a public statement beyond that.

For a firm that generated $107 billion globally in 2024 and had secured nearly $25 million in DEWR contracts alone since 2021, the financial cost was negligible. The reputational math was different.

The Political Fallout

Senator David Pocock told the Australian Broadcasting Corporation that Deloitte "misused AI and used it very inappropriately: misquoted a judge, used references that are non-existent." Labor Senator Deborah O'Neill, who sits on a Senate committee investigating the integrity of consulting firms, was more blunt: "Deloitte has an issue with human intelligence."

Rudge argued the damage went beyond a few bad citations. "When the foundation of a report is built on a flawed, undisclosed, and non-expert methodology, its recommendations cannot be trusted," he said. The report was supposed to evaluate whether an automated system that imposes financial penalties on welfare recipients was functioning correctly and legally. If the analysis underpinning those conclusions was partly generated by an AI that fabricated its own supporting evidence, the conclusions themselves become suspect.

The Disclosure Problem

The missing AI disclosure in the original report is the detail that separates this from a simple quality-control failure. Deloitte didn't tell DEWR or the public that it had used generative AI as part of its analytical process. The disclosure appeared only in the revised version, after Rudge had already flagged the hallucinated citations and the Australian Financial Review had reported on them.

This matters because the report was a government procurement deliverable. DEWR paid for expert analysis of a critical welfare infrastructure system. If Deloitte had disclosed its AI use from the start, DEWR could have applied appropriate scrutiny or stipulated human verification requirements. Instead, the department published a report full of fabricated references under the Deloitte brand, and both DEWR and the public proceeded as though the citations had been verified by qualified consultants.

Deloitte described the AI as addressing "gaps in traceability and documentation" - in other words, the consultants used an AI tool to fill in the parts of the analysis where they didn't already have answers. That's the exact scenario where AI hallucination risk is highest: generating content to fill knowledge gaps, without subject matter expertise to catch fabricated outputs.

The Timing

The incident landed awkwardly for Deloitte's broader AI strategy. In September 2025 - one month before the refund was reported - Deloitte announced a $3 billion investment in generative AI development through fiscal year 2030. The firm actively markets AI consulting services to enterprises and governments, positioning itself as a company that understands how to deploy these tools responsibly.

Selling AI expertise to clients while simultaneously failing to quality-check AI-generated content in your own government deliverables creates a credibility gap that no dollar amount can easily close. The $440,000 contract was small. The implication - that one of the world's largest consulting firms couldn't verify whether its own report's citations were real - was not.
