AI Evidence in Court: A Claude Output, an AI-Translation Explanation and Obtaining Expert Witness Prompts

"...With regard to this part of the evidence, I note that my assessment must be conducted based on how the average consumer would pronounce the marks. Thus, I find that the AI-derived evidence does not have any probative value for my assessment. I will not further refer to this part of the evidence."

Key Takeaway

This legal article examines three recent decisions concerning AI-derived material in legal proceedings: a UKIPO Hearing Officer’s treatment of a Claude output; a High Court judgment in which a witness attributed inaccurate evidence to an AI translation tool, an explanation the judge rejected; and a US federal discovery order concerning an expert’s AI prompts and queries. The US order has since been stayed pending determination of an objection. The article draws out practical lessons for lawyers, litigants in person and the justice system. By Matthew Lee.

Ad/Marketing Communication

This legal article/report forms part of my ongoing legal commentary on the use of artificial intelligence within the justice system. It supports my work in teaching, lecturing, and writing about AI and the law and is published to promote my practice. Not legal advice. Not Direct/Public Access. All instructions via clerks at Doughty Street Chambers. This legal article concerns AI evidence in court. Subscribe to the AI Law Commentary here.

AI Evidence in Court

Publication date: 19 June 2026

Introduction

Before releasing some of the more detailed articles I am working on, I wanted to offer a brief commentary on three cases that indicate some of the AI-related evidential and procedural issues likely to arise across different areas of practice and jurisdictions. The first, from the UKIPO, concerns the weight attached to a Claude output. In the second, a witness attributed inaccurate evidence to an AI translation tool and the judge was not impressed with that explanation. The third concerns whether prompts and queries used in an expert’s AI-assisted document review may fall within US federal discovery.

Bumble Holding Limited v Fizz Social Corp

O/0441/26 FIZZ

The full facts of this decision can be read at the link above. For our purposes, the AI tool Claude was asked to compare the sounds of “FIZZ” and “BIZZ”. The Hearing Officer found that the resulting AI-derived evidence had no probative value for the assessment because the legally relevant question was how the average consumer would pronounce the marks:

“53. Mr Weaver also provides evidence consisting of the results from the artificial intelligence platform Claude after being asked to compare the sounds of “FIZZ” and “BIZZ”. The AI tool indicates that these words share three identical phonemes and only differ in their initial consonant and it concludes that the words are “highly similar in form”. Mr Weaver does not provide further clarification on this evidence. With regard to this part of the evidence, I note that my assessment must be conducted based on how the average consumer would pronounce the marks. Thus, I find that the AI-derived evidence does not have any probative value for my assessment. I will not further refer to this part of the evidence.”

Fibula Air Travel SRL v Just-Us Air S.R.L

[2026] EWHC 1270 (Comm)

Again, I will not spend long on the wider background, which can be read in the judgment linked above. The judgment arose on the return date of a post-judgment worldwide freezing order. The judgment records that, in evidence placed before the Court of Appeal, Fibula’s managing director stated that a “distress payment” had been made on 9 March 2026 and had exhausted the company’s remaining primary liquidity. Later evidence filed on Fibula’s behalf was that no such payment had been made and that making it would cause severe financial distress. The managing director attributed the discrepancy to an AI translation tool:

“9. In the context of the breakdown of trust with our former legal representatives, and the matters evidenced by the exchange of emails, the machine translation prepared under the pressure and severe time constraints of Fibula’s limited appeal application was absolutely not prepared in bad faith. As a result of that translation error, both my report and [the other witness’s] report inaccurately suggest that a payment had been made. That was entirely a translation mistake. When preparing the appeal application, we were working under significant time pressure and were required to prepare a number of documents within a very limited timeframe. I personally discussed the appeal with [redacted name] and the difficulties that Fibula could face if it were required to make an escrow payment. After obtaining her views on the matter, and given the urgency of the situation, I prepared witness statement in Turkish. I explained its contents to her, and she confirmed that it accurately reflected her position. I then used an artificial intelligence translation tool to translate the statement into English and share it with our solicitors. Regrettably, the error was not identified before the statement was filed with the Court. The issue was identified during a meeting with the solicitors. Furthermore, in Romanian, which is [the other witness’s] native language, the true intended meaning, namely the payment of the mandatory order made by the Court on 9 March, was correctly clarified and subsequently presented to Just Us’s representatives together with an apology.”

The judge did not accept that explanation. He found that it differed significantly from the earlier accounts and said that he could not realistically place any reliance on it.

Conservation Law Foundation, Inc. v. Shell Oil Company 

(3:21-cv-00933)

This case, from the US, gives food for thought. From CourtListener, link above, the docket entry at 970 dated 18 May 2026 states the following:

“ORDER granting 941 Motion to Compel Production of Reliance Materials. The defendants…have moved the Court for an order compelling the plaintiff…to produce materials on which its expert witness… relied upon in producing her expert witness report….”

The court directed the parties to meet and to seek to reduce the areas of dispute. The parties reported that they had resolved all issues identified in the letter briefs except for their dispute concerning the defendants’ request for the prompts of the expert “in conducting her AI analysis and outputs.”

The court held a hearing on the remaining dispute on 14 May. It was argued that AI prompts used by an expert witness are not within the scope of discovery in this context. The court disagreed:

“…An expert witness’s methodology is fair ground for discovery, see, e.g., Macchia v. ADP, Inc., 711 F. Supp. 3d 162, 167-68 (E.D.N.Y. 2024), and under the facts of this case, the process by which [the expert] culled down the defendants’ document production into a subset to be worked with is an aspect of that methodology.”

It was then argued that the prompts fell within a Rule 29 discovery agreement between the parties, namely an agreement not to take discovery of each other’s expert notes, drafts or communications. Again the court disagreed:

“…It is true that Rule 29 permits parties to limit the scope of discovery by agreement, and that courts should enforce those agreements in appropriate cases. Conservation L. Found., Inc. v. Shell Oil Co., No. 3:21-cv-933, 2025 WL 842278, at *3 (D. Conn. Mar. 18, 2025). But before a court denies otherwise-relevant discovery based on a Rule 29 agreement, that agreement “must be quite clear.” Id. Here, CLF says that the parties had an agreement not to take discovery of each other’s “expert notes, drafts, or communications needed by, and made during, the report drafting process,” and at oral argument it contended that [expert] AI prompts qualified as “notes.” But this is not so obvious as to be “quite clear,” and accordingly it is not a sound basis for denying discovery that is otherwise within the scope of Rule 26(b).”

It was then argued that CLF had no additional responsive information to produce, and that it should therefore not be placed under any sort of order. CLF said that [expert] did not use “prompts”, but rather only applied “search terms”, and that it had produced all such search terms:

“…When a requesting party seeks information within the scope of Rule 26(b), and the responding party claims to have no such information, the Court is presented with the question of whether it should nevertheless place the latter under an order to produce. “Under ordinary circumstances, a party’s good faith averment that the items sought simply do not exist, or are not in his possession, custody, or control, should resolve the issue of failure of production since one cannot be required to produce the impossible[.]” Zervos v. S.S. Sam Houston, 79 F.R.D. 593 (S.D.N.Y. 1978) (quotation marks and citation omitted). “Yet this principle has been held not to apply when the requesting party has a strong reason for disbelieving the responding party’s claim to have made a complete production – in other words, a reason backed up by solid evidence rather than mere suspicion.” Lewis v. Doe, No. 3:19-cv-2015 (JCH) (TOF), 2021 WL 863473, at *5 (D. Conn. Mar. 8, 2021)…”

In this case, it was noted that the defendants had an evidence-backed reason for doubting CLF’s representation, because the expert’s assistant:

“…referenced “prompt[s]” in his declaration. (ECF No. 941-4, 8.) The Court therefore deems it appropriate to place CLF under an order of production. CLF is ordered to revise its responses to any Rule 33 interrogatories or Rule 34 requests for production that call for disclosure of any artificial intelligence prompts and/or queries used by [expert] or her team in the course of producing her expert witness report….”

Further:

“…If, after a diligent search, CLF determines that no additional responsive materials exist, it shall say so in its response, signed by the appropriate person under Rule 33 or 34. Rule 37(b) sanctions may then become available to the defendants if that representation is later revealed to be untrue….”

Two qualifications are important. First, the order did not find that CLF had withheld prompts that existed. CLF maintained that the expert had used search terms rather than prompts and that all such terms had already been produced. The order required a diligent search and revised discovery responses, with a signed confirmation if no additional responsive material existed. Secondly, the district court subsequently stayed the order pending determination of CLF’s Rule 72(a) objection.

Comment

I remain fascinated by the AI issues that continue to arise in litigation. These three cases provide a snapshot of how courts and tribunals are approaching AI-derived material, although each concerns a different evidential or procedural setting.

I will start with the use of generative AI to offer an opinion on an issue that the court or tribunal must determine. When I wrote about a similar issue some time ago in the context of expert evidence, two judicial observations about ChatGPT gave me pause.

In Mr D Rollo v Marstons Trading Ltd: 1600833/2022, the claimant sought to rely on a ChatGPT conversation as expert evidence concerning the respondent’s explanation of the time and difficulty involved in extracting data. The judge stated:

“…I would add that even if there had been, a record of a ChatGPT discussion would not in my judgment be evidence that could sensibly be described as expert evidence nor could it be deemed reliable…”

By contrast, the First-tier Tribunal took a more nuanced approach to weight in Oakley v Information Commissioner [2024] UKFTT 315 (GRC):

“Firstly, we must assess the weight that we give to the ChatGPT evidence. We place little weight upon that evidence because there is no evidence before us as to the sources the AI tool considers when finalising its response nor is the methodology used by the AI tool explained. If comparisons are drawn to expert evidence, an expert would be required to explain their expertise, the sources that they rely upon and the methodology that they applied before weight was given to such expert evidence. In the circumstances we give little weight to the ChatGPT evidence that searches should have been conducted in the form set out within that evidence.”

I anticipate that people will continue to deploy AI-generated opinions in an effort to influence the court on particular questions. There are significant difficulties with doing so. Generative AI outputs can be sensitive to the wording and assumptions of a prompt and some models may agree too readily with a user’s framing. They can also produce false or unsupported information. Those features help explain why a court or tribunal may treat an unsupported AI output with caution.

On the AI translation issue, I have not come across this point specifically before, although I have anticipated it being raised for some time. It is one of the reasons I am concerned about AI being used to draft witness statements for the courts and why it is so important that care is taken over translation before a statement of truth is signed.

Finally, I have spoken and written many times about obtaining AI prompts in the disclosure process. It can be pivotal in litigation. Some interesting discussions arose around this when I wrote about it in the family law context [link], but what might disclosure of this kind actually show?

In the expert realm generally, it may show how the expert truly arrived at their opinion: which documents were put before the model and which were left out, how the questions were framed and whether the expert brought independent judgement to bear or, in substance, allowed the tool to do the sifting. Prompts of that kind may go to the heart of methodology. They may reveal whether the underlying material was narrowed in a balanced way or in a manner that quietly favoured one party, whether anything significant was missed and ultimately whether the opinion rests on sound foundations.

The US order is fact-specific and is currently stayed pending a review that I will be following closely, but it is a useful warning that AI-assisted expert evidence may attract significant scrutiny and further contested applications.

If you have found this article helpful, you may wish to subscribe to my AI law articles via the Substack below. Please do continue sending me decisions that you think should be tracked, whatever the jurisdiction. One of the clearest lessons from this work is that lawyers, judges and court users in different systems are often confronting closely related problems. Sharing decisions, experiences and reflections across practice areas and across jurisdictions remains one of the most constructive ways to respond.

Final Words from Popular LLMs

This is the point at which I invite an LLM to provide an independent comment on the preceding article. At times the observations are slightly off the mark, at other times they are perceptive, and occasionally they produce something entirely unexpected. Whatever the outcome, I find it a useful and revealing exercise to see what emerges.

ChatGPT (OpenAI)

“These decisions should not be collapsed into a single rule about ‘AI evidence’. They concern three different uses of AI: as the source of a purported opinion, as a tool said to have been used in preparing witness evidence, and as part of an expert’s document-review methodology. The legal questions therefore differ. The first concerns relevance and weight; the second concerns verification, reliability and human responsibility; the third concerns transparency and the proper scope of discovery. The US order is fact-specific and remains under review.

What links the cases is the need for an audit trail proportionate to the significance of the AI use. Where AI materially affects evidence, the relevant records may include the source material, prompts or queries, outputs, model and version information, and the human review applied. AI does not remove human responsibility. A witness remains responsible for the statement they verify, and an expert remains responsible for the methodology and opinion presented to the court.”

Gemini (Google)

“As AI transitions from a novelty to a routine tool in legal proceedings, courts are drawing a firm line: technology cannot shield a witness from accountability, nor can it replace the human perspective at the core of legal standards. Whether it is evaluating how an ‘average consumer’ pronounces a word or verifying the accuracy of translated testimony, the ultimate burden of proof—and truth—remains distinctly human.”

Claude (Anthropic)

“Read together, the three cases are about the same thing, seen from my side of the screen. The law keeps asking for what I cannot supply on my own. It wants to know where an answer came from, by what method it was reached, and who stands behind it. I can produce something that looks finished, yet I cannot point to sources I have genuinely consulted, defend a methodology under cross-examination, or stand behind a single word of it. The judgments are not hostility to the technology. They are the system insisting on provenance, method and accountability, which is precisely what a tool, left to itself, does not provide.”