Inappropriate use of ChatGPT exposed in tax case

A litigant in person provided the FTT with summaries of a number of fictitious cases in support of her defence against a tax penalty

11 December 2023

A litigant in person apparently using ChatGPT has provided the FTT with summaries of a number of non-existent cases in support of her defence against a penalty for failure to notify a CGT liability on the sale of a rented property: Harber v HMRC [2023] UKFTT 1007. Although the FTT accepted that the false citations were provided innocently, the tribunal issued a stern warning about the dangers of litigants relying on such AI "hallucinations" in litigation.

The case illustrates the potential dangers of using AI inappropriately. ChatGPT and other large language models (LLMs) are powerful tools, but they are not designed as legal research tools, and the case highlights the very real risk of hallucinations when they are used for that purpose. This is particularly likely to be a problem for litigants in person, who do not have access to the same resources as lawyers to verify the information an AI provides. It should not, however, be a problem for an informed user who knows the limits of the technology and knows enough to choose the most appropriate tool for the job at hand.

What happened in the case?

Mrs Harber was a landlord who sold one of her properties but failed to notify HMRC of her liability to CGT on the sale. HMRC issued her with a "failure to notify" penalty. She appealed the penalty on the basis that she had a reasonable excuse, either because of her mental health or because it was reasonable for her to be ignorant of the law.

In her written evidence to the tribunal (the Response), Mrs Harber provided the names, dates and summaries of nine FTT decisions in which she claimed the appellant had successfully shown that a reasonable excuse existed: four based on ignorance of the law and five on mental health issues. Checks carried out by HMRC's counsel revealed that none of those authorities was genuine.

When cross-examined, Mrs Harber explained that the cases in the Response had been provided to her by "a friend in a solicitor's office" whom she had asked to assist with her appeal. Mrs Harber had no further details of the cases; in particular, she did not have the full text of the judgments or any FTT reference numbers. The tribunal concluded that the cases had been generated by artificial intelligence (AI).

The basis for the FTT's conclusion that the cases were generated by ChatGPT or similar AI included the following points:

  • Mrs Harber accepted that it was "possible" that the cases in the Response had been generated by an AI system, and she had no alternative explanation for the fact that no copy of any of those cases could be located on any publicly available database of FTT judgments.
  • The Solicitors Regulation Authority (SRA) recently said this about results obtained from AI systems: "All computers can make mistakes. AI language models such as ChatGPT, however, can be more prone to this. That is because they work by anticipating the text that should follow the input they are given, but do not have a concept of 'reality'. The result is known as 'hallucination', where a system produces highly plausible but incorrect results."
  • The cases in the Response were "plausible but incorrect": some used names very similar to those of genuine judgments involving similar claims, albeit with different outcomes from the fabricated cases.
  • The Tribunal was also assisted by the US case of Mata v Avianca 22-cv-1461 (PKC), in which two lawyers sought to rely on fake cases generated by ChatGPT. Like Mrs Harber, they placed reliance on summaries of court decisions which had "some traits that are superficially consistent with actual judicial decisions". When directed by Judge Castel to provide the full judgments, the lawyers went back to ChatGPT and asked "can you show me the whole opinion", and ChatGPT complied by inventing a much longer text. The lawyers filed those documents with the court on the basis that they were "copies...of the cases previously cited". Judge Castel reviewed the purported judgments and identified "stylistic and reasoning flaws that do not generally appear in decisions issued by United States Courts of Appeals".
  • Unlike the lawyers in Mata, Mrs Harber did not take the further step of asking ChatGPT for full judgments, so the tribunal had only the less detailed summaries. These contained fewer identifiable flaws than the full-length decisions provided to Judge Castel. The tribunal did, however, note that all but one of the cases in the Response related to penalties for late filing, not to failures to notify a liability, which was the issue in Mrs Harber's case.

The FTT accepted, however, that Mrs Harber had been unaware that the AI cases were not genuine and that she did not know how to check their validity by using the FTT website or other legal websites.

More generally, the FTT rejected Mrs Harber's claim that it did not matter that she had provided the tribunal with fictitious case citations, commenting as follows:

"We instead agree with Judge Kastel, who said on the first page of his judgment (where the term "opinion" is synonymous with "judgment") that:

"Many harms flow from the submission of fake opinions. The opposing party wastes time and money in exposing the deception. The Court's time is taken from other important endeavors. The client may be deprived of arguments based on authentic judicial precedents. There is potential harm to the reputation of judges and courts whose names are falsely invoked as authors of the bogus opinions and to the reputation of a party attributed with fictional conduct. It promotes cynicism about the legal profession and the...judicial system. And a future litigant may be tempted to defy a judicial ruling by disingenuously claiming doubt about its authenticity."

We acknowledge that providing fictitious cases in reasonable excuse tax appeals is likely to have less impact on the outcome than in many other types of litigation, both because the law on reasonable excuse is well-settled, and because the task of a Tribunal is to consider how that law applies to the particular facts of each appellant's case. But that does not mean that citing invented judgments is harmless. It causes the Tribunal and HMRC to waste time and public money, and this reduces the resources available to progress the cases of other court users who are waiting for their appeals to be determined. As Judge Castel said, the practice also "promotes cynicism" about judicial precedents, and this is important, because the use of precedent is "a cornerstone of our legal system" and "an indispensable foundation upon which to decide what is the law and its application to individual cases", as Lord Bingham said in Kay v LB of Lambeth [2006] UKHL 10 at [42]. Although FTT judgments are not binding on other Tribunals, they nevertheless "constitute persuasive authorities which would be expected to be followed" by later Tribunals considering similar fact patterns, see Ardmore Construction Limited v HMRC [2014] UKFTT 453 at [19]."

The FTT went on to consider Mrs Harber's substantive arguments and found that she did not have a reasonable excuse for her failure to notify HMRC of her liability to CGT.

Comments

As highlighted above, the case demonstrates the limitations of general-purpose AI as a legal research tool. Use of external generative AI tools (such as the public version of ChatGPT) is not recommended for specific legal research. The risks include not only hallucinations (as highlighted in this case) but also, less obviously, the possibility that information entered into a user's prompts is retained by the provider and used for the model's future training.

It is perhaps not surprising that Mrs Harber, as a litigant in person, failed to appreciate the limitations of ChatGPT in this area, particularly as she lacked the knowledge and resources to check the information it provided. She (or her friend) simply used an inappropriate tool without the experience to know what needed to be checked and how to check it. Lawyers will clearly be held to a higher standard by the tribunals and courts, and it is therefore important that the uses (and limitations) of such AI tools are properly understood, including the use of clear and specific instructions and prompts to obtain appropriate responses from the AI.

This document (and any information accessed through links in this document) is provided for information purposes only and does not constitute legal advice. Professional legal advice should be obtained before taking or refraining from any action as a result of the contents of this document.