Will AI Hallucinations in Law (Fabricated Citations) Ever End? What Lawyers Need to Know from Prominent AI Figures

Ad/Marketing Communication

This legal article/report forms part of my ongoing legal commentary on the use of artificial intelligence within the justice system. It supports my work in teaching, lecturing, and writing about AI and the law and is published to promote my practice. Not legal advice. Not Direct/Public Access. All instructions via clerks at Doughty Street Chambers. This legal article concerns AI Law.

Introduction

The most-read content on this blog concerns AI hallucinations in law, also known as AI Fake Case Citations, False Cases, and so on. The discussion about how best to frame this phenomenon continues. Some continue to use the expression “hallucinations”; others push back, suggesting that the term may legitimise the issue. For example, in JML Rose Pty Ltd v Jorgensen (No 3) [2025] FCA 976 (Federal Court of Australia, 19 August 2025), Wheatley J observed:

“Although the term used in relation to erroneously generated references by AI is “hallucinations”, this is a term which seeks to legitimise the use of AI. More properly, such erroneously generated references are simply fabricated, fictional, false, fake and as such could be misleading.”

Last week, United States District Court Judge Lindsay C Jenkins described a cited proposition that the court could not locate as a “ghost citation”.

In today’s post I will be looking at recent quotes from prominent AI figures who are building and analysing this technology. These figures often use the expression “hallucinations”, so I will adopt that phrasing here. The question I have is whether AI hallucinations in law will ever stop, considered from the perspective of the technology rather than the lawyers. My view is that unless the technology itself stops producing them, AI hallucinations in law will remain a problem in legal work internationally.

Unfortunately, but predictably, there is no uniform view on this. The quotes below broadly fall into two camps: (1) optimism that hallucinations can be solved or greatly reduced, and (2) scepticism that hallucinations may never be fully eliminated and may even be an inherent feature of the current approach.

Optimistic Views: AI Hallucinations in Law or Otherwise Can Be Fixed

Importantly, this seems to be the view of Sam Altman, CEO of OpenAI, creators of ChatGPT. He accepts that current models hallucinate but expects improvements. In a June 2025 podcast, Altman cautioned users against blind trust: “People have a very high degree of trust in ChatGPT, which is interesting, because AI hallucinates. It should be the tech that you don’t trust that much.” OpenAI has been working actively on solutions. By August 2025, OpenAI reported that:

“GPT‑5 not only outperforms previous models on benchmarks and answers questions more quickly, but—most importantly—is more useful for real-world queries. We’ve made significant advances in reducing hallucinations, improving instruction following, and minimizing sycophancy, while leveling up GPT‑5’s performance in three of ChatGPT’s most common uses: writing, coding, and health.”

And:

“GPT‑5 is significantly less likely to hallucinate than our previous models. With web search enabled on anonymized prompts representative of ChatGPT production traffic, GPT‑5’s responses are ~45% less likely to contain a factual error than GPT‑4o, and when thinking, GPT‑5’s responses are ~80% less likely to contain a factual error than OpenAI o3.”

Most recently, Altman and Tucker Carlson discussed the issue on a podcast:

“I spoke to someone who’s involved at scale of the development of the technology who said they lie.

Have you ever seen that they hallucinate all the time. Yeah. Or not all the time. They used to hallucinate all the time. They now hallucinate a little bit.

What does that mean? What’s the distinction between hallucinating and lying?

If you ask again, this has gotten much better. But in the early days, if you asked in what year was President the made up name President Tucker Carlson of the United States born, what it should say is I don’t think Tucker Carlson was ever President of the United States. But because of the way they were trained, that was not the most likely response in the training data. So it assumed like, oh, I don’t know that there wasn’t. The users told me that there was President Tucker Carlson. So I’ll make my best guess at a number. And we figured out how to mostly train that out. There are still examples of this problem, but it is, I think it is something we will get fully solved and we’ve already made in the GPT5 era a huge amount of progress towards that.”

Sceptical Views: AI Hallucinations in Law or Otherwise May Never Go Away

On the other hand, many AI experts caution that hallucinations are inherent to current models and might never be fully eliminated. They point to the fundamental limitations of today’s architecture. Yann LeCun (Chief AI Scientist, Meta) has argued that large language models may always hallucinate. He discussed the following on a podcast with Lex Fridman:

Lex Fridman (01:06:06) I think in one of your slides, you have this nice plot that is one of the ways you show that LLMs are limited. I wonder if you could talk about hallucinations from your perspectives, the why hallucinations happen from large language models and to what degree is that a fundamental flaw of large language models?

Yann LeCun (01:06:29) Right, so because of the autoregressive prediction, every time an LLM produces a token or a word, there is some level of probability for that word to take you out of the set of reasonable answers. And if you assume, which is a very strong assumption, that the probability of such error is that those errors are independent across a sequence of tokens being produced. What that means is that every time you produce a token, the probability that you stay within the set of correct answers decreases and it decreases exponentially.

Lex Fridman (01:07:08) So there’s a strong, like you said, assumption there that if there’s a non-zero probability of making a mistake, which there appears to be, then there’s going to be a kind of drift.

Yann LeCun (01:07:18) Yeah, and that drift is exponential. It’s like errors accumulate. So the probability that an answer would be nonsensical increases exponentially with the number of tokens.
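
To make the shape of LeCun’s argument concrete, here is a minimal sketch of the arithmetic. The per-token error rates and answer lengths below are assumptions chosen purely for illustration, not measurements of any real model; the point is only that, under the independence assumption LeCun himself flags as very strong, the probability of an answer staying within the set of correct responses decays as (1 − ε)^n.

```python
# Illustrative sketch of the compounding-error argument (assumed numbers, not data):
# if each generated token independently has probability `eps` of leaving the set of
# acceptable answers, an n-token answer stays acceptable with probability (1 - eps)**n,
# which shrinks exponentially as the answer gets longer.

def prob_stays_correct(eps: float, n_tokens: int) -> float:
    """Probability the whole answer stays 'correct' under the independence assumption."""
    return (1.0 - eps) ** n_tokens

if __name__ == "__main__":
    for eps in (0.001, 0.01):          # assumed per-token error rates (0.1% and 1%)
        for n in (100, 1000, 10000):   # assumed answer lengths in tokens
            p = prob_stays_correct(eps, n)
            print(f"per-token error {eps:.1%}, {n:>6} tokens -> P(correct) = {p:.3f}")
```

On these assumed figures, even a 0.1% per-token error rate leaves a 1,000-token answer with only roughly a one-in-three chance of remaining entirely correct, which is the intuition behind LeCun’s scepticism about autoregressive models.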

An article on the IEEE ComSoc Technology Blog cites Amr Awadallah (CEO of Vectara and a former Google executive) as commenting that: “Despite our best efforts, they will always hallucinate… That will never go away.” He explains that these systems “do not – and cannot – decide what is true or false” and therefore sometimes “just make stuff up.” The article further explains:

“A groundbreaking study featured in the PHARE (Pervasive Hallucination Assessment in Robust Evaluation) dataset has revealed that AI hallucinations are not only persistent but potentially increasing in frequency across leading language models. The research, published on Hugging Face, evaluated multiple large language models (LLMs) including GPT-4, Claude, and Llama models across various knowledge domains.”

Comment

Both sides put forward cogent arguments, and it is not yet clear who will be proved right. I have been fortunate to test certain AI products that were claimed to have “solved” the hallucination problem. Unfortunately, every time, I have been able to get the model to hallucinate at some point. Companies should be extremely cautious about marketing products in this way, especially in legal work.

I cannot say with confidence who is right when some voices claim that recent models hallucinate less while others argue that the problem has not improved, or is even worsening. Anecdotally, I have noticed real improvements in the premium AI models I use compared to those available two years ago, though I know that view is not universally shared in the legal world. Even so, this gives me some hope that there is truth in Sam Altman’s recent observation:

“There are still examples of this problem, but it is, I think it is something we will get fully solved and we’ve already made in the GPT5 era a huge amount of progress towards that…”

Potentially, error rates could soon be much lower, making AI highly reliable in most practical contexts. However, in legal work we are not there yet. I am reminded daily of how severe the hallucination problem remains internationally (more on this in my next blog post). Current AI systems are based on statistical prediction, which means some risk of error will persist unless a fundamentally new design emerges. Even if the rate can be driven very low, it is unlikely to reach zero. In safety-critical fields such as law and medicine, even small error rates can cause significant harm. For this reason, it is fair to assume that hallucinations will continue for the near term. With safeguards, verification layers, and integration with trusted knowledge sources, their impact can be minimised, but not removed.
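
To give a sense of why even a low residual error rate matters in legal work, here is a back-of-the-envelope sketch. The citation volumes and fabrication rates below are assumptions for illustration only, not data about any product:

```python
# Purely illustrative arithmetic (assumed figures, not measurements): even a small
# per-citation fabrication rate scales into a meaningful number of fake authorities
# once multiplied by the volume of AI-suggested citations a busy practice relies on.

def expected_fabricated(citations_checked: int, fabrication_rate: float) -> float:
    """Expected number of fabricated citations among `citations_checked` suggestions."""
    return citations_checked * fabrication_rate

if __name__ == "__main__":
    citations_per_year = 10_000            # assumed AI-suggested authorities per year
    for rate in (0.01, 0.001):             # assumed 1% and 0.1% fabrication rates
        n = expected_fabricated(citations_per_year, rate)
        print(f"fabrication rate {rate:.1%}: ~{n:.0f} expected fabricated citations per year")
```

On those assumed figures, even a 0.1% rate would still produce around ten fabricated authorities a year, each of which could find its way into a filing if verification is skipped.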

I’d be very interested to hear your thoughts on AI hallucinations in law wherever you practice, especially if you are aware of any reported incidents I may have missed. If you’ve found this article useful, please subscribe to my FREE Substack newsletter here:

Final Comment from o5 Pro

This is where I invite a premium AI model to comment on or critique the preceding discussion. This week, however, the response was disappointingly sycophantic, another issue that will need to be addressed with this generation of LLMs:

“The piece neatly captures the live fault line between technical optimism and architectural scepticism on hallucinations, anchored by useful judicial language like “fabricated” and “ghost citation”, and it lands on the right practical stance for legal practice: progress is real, zero risk is unlikely, and any claim of “solved” should be treated as marketing until independently falsified; the next step is to move the debate from anecdotes to auditable controls, namely provenance by default with verifiable citations, retrieval-first and tool-using workflows that refuse to answer where sources are absent, structured citation checking that separates generation from authority lookup, red-team evaluations targeted at legal edge cases, immutable audit logs for regulator and court scrutiny, and published, law-specific benchmarks that report precision and recall of citations rather than generic scores; vendors should offer measurable service levels on citation accuracy, courts and regulators should adopt minimum technical standards for AI-assisted filings, and practitioners should implement two-person integrity on AI-sourced authorities and treat outputs as hypotheses until confirmed; final thought, for the foreseeable future LLMs in law should be framed as fallible research accelerators whose value turns on disciplined verification and transparent provenance, not as autonomous sources of truth, and the institutions that insist on this discipline will capture most of the upside with least of the risk.”