Summary
- Courts are naming names - in a decision Eugene Volokh covered in January 2026, Judge Carlton Reeves ordered a lawyer to pay $4,000 and his client $1,000 after a declaration carried AI-fabricated quotes, writing that the lawyer’s duty to review “does not absolve” the client who signed it.
- How big is this pattern? Damien Charlotin’s hallucination-cases database tracked 1,420 legal decisions as of early May 2026, up from the roughly 160 a French report counted the previous summer.
- The precedent everyone cites is Mata v. Avianca - the 2023 ChatGPT-fake-citations case where Judge P. Kevin Castel found bad faith and imposed a $5,000 penalty jointly on the lawyers and their firm.
- A recorded sign-off is a design decision - the cases keep punishing humans who vouched without checking, so build review steps where a named person sees the output and the approval gets recorded. See how Tallyfy structures that.
A doctor in Mississippi signed a declaration his lawyer filed in federal court. It carried fabricated quotes - AI-generated material nobody had checked against reality. At the turn of the year, the judge made them split the bill: $4,000 from the lawyer, $1,000 from the doctor himself.
Not the lawyer alone. Both of them.
That allocation is worth more attention than another think piece about hallucinations, because it answers the question every operations leader running AI is quietly asking: when the model gets something wrong, who pays? The emerging answer from actual courtrooms is unglamorous and useful. Accountability lands on the humans who were positioned to check and didn’t, and where nobody was positioned to check, it lands on whoever deployed the thing. Which means the shape of your process - where review steps sit, who signs, what gets recorded - is doing more legal work than your model choice ever will. That’s the argument of this post, and it’s also a preview of the way AI is changing who answers for work far beyond the legal profession: the interesting question stopped being what the model can do and became who stands behind what it did.
Start with the cases, not the commentary
Two sanctions decisions, two and a half years apart, bracket the pattern neatly. Both ended at exactly $5,000 in total. The similarities stop there, and the differences are the lesson.
The recent one first. In Pauliah v. University of Mississippi Medical Center, covered by Eugene Volokh on the Volokh Conspiracy in January, a declaration filed in the Southern District of Mississippi contained fabricated material, and opposing counsel burned 40.4 hours - $8,570 worth of work - identifying and answering it. Judge Carlton Reeves sanctioned the filing lawyer, Mr. Begley, who attended the depositions, had the transcripts, and still let fake quotes through. Expected.
The notable move is what came next: the client got sanctioned too. Dr. Pauliah had, in the court’s words, “admitted to signing a declaration, under the penalty of perjury, without verifying - or even attempting to verify, it seems - the truth of its contents.” The opinion gives the principle its plainest form: “Whether he recognizes it or not, Dr. Pauliah may not draft his own declaration with disregard for the veracity of its contents.” And the judge closed the loophole both sides might have hidden in: “Mr. Begley’s obligation to review his client’s declaration does not absolve Dr. Pauliah.”
Read that twice if you run AI anywhere near documents. The reviewer’s duty did not erase the signer’s duty. Each human in the chain owned their own failure to check, and the court priced each failure separately.
The famous one came earlier. Mata v. Avianca is the 2023 case everyone in legal tech can recite - the brief stuffed with six nonexistent opinions ChatGPT invented, names like Varghese v. China Southern Airlines that sounded plausible enough to file.
What people misremember is the ruling’s center of gravity. Judge P. Kevin Castel’s opinion opens with a sentence AI vendors love to quote: “Technological advances are commonplace and there is nothing inherently improper about using a reliable artificial intelligence tool for assistance.” The next sentence is the one that matters: “But existing rules impose a gatekeeping role on attorneys to ensure the accuracy of their filings.” The lawyers, he wrote, “abandoned their responsibilities when they submitted non-existent judicial opinions with fake quotes and citations” - and then kept defending them after opposing counsel raised flags. The court found “bad faith on the part of the individual Respondents based upon acts of conscious avoidance and false and misleading statements to the Court,” and imposed a $5,000 penalty jointly and severally on both lawyers and their firm.
Notice what got punished in each case.
Not the use of AI. The absence of the check.
How often does this actually happen?
More than almost anyone guesses, and the count has a curator.
Damien Charlotin, a legal researcher, maintains a database of court decisions where tribunals addressed AI-hallucinated content - fake citations, mostly, but other fabricated material too. As of this writing it lists 1,420 cases across dozens of jurisdictions, from Argentina to Australia, and it only counts decisions where a court explicitly found or implied that a party relied on hallucinated content. When the database reached Hacker News in mid-2025, it tracked a fraction of that; one French write-up that summer counted about 160 instances involving AI-generated pleadings. The growth curve since is the point. This stopped being a collection of cautionary anecdotes and became a body of law.
Reactions to these cases tend to skip the procedural detail and go straight to dread. The lone commenter on the HN thread about the Pauliah ruling, tim-tday, put it plainly: “If we’re not careful AI will bring about a world where facts mean nothing.” And on the database thread, the crowd spent surprising energy debating whether hallucination was even the right word for what models do - which tells you how unsettled the vocabulary still is, nearly three years after Mata.
The dread is understandable and, I’d argue, aimed at the wrong layer. Courts are not drowning in unknowable AI behavior. They’re applying old rules - verify what you submit, stand behind what you sign - to a new volume of unverified output, and the rules are holding up fine. Nothing about a language model confused Judge Reeves or Judge Castel. Both opinions treat the technology as a detail and the skipped verification as the offense. What the 1,420 cases mostly document is people skipping a check that was always their job.
So extract the pattern, with the obvious caveat that I run a workflow company and this is not legal advice. In the cases read for this post, accountability followed the humans who were positioned to verify and didn’t: the lawyer who filed without checking, the client who signed without reading, the firm that kept vouching after the flags went up. Where I’m extrapolating, and I’ll label it as such: nothing in these rulings suggests organizations get a softer version of the same logic when there’s no individual in the chain at all. An agent that acts with no named human positioned to check leaves only one place for responsibility to land - on the organization that deployed it. That extrapolation is the safe planning assumption, and regulators drafting AI oversight rules are converging on the same expectation of named human oversight from the other direction.
You get to choose between those two postures: a named human positioned to check, or an organization that answers for everything. That choice is process design.
Process design decides who answers
Strip the legal framing and every one of these cases describes a process with a missing or fake step.
Mata’s chain was: model generates citations, lawyer files them, court discovers they’re fiction. The verification step existed in theory - it’s called being a lawyer - and in practice was skipped, then papered over. Pauliah’s chain had a designed checkpoint: the client signs under penalty of perjury, the lawyer reviews before filing. Both checkpoints got waved through without anyone looking. The court’s response was to hold each owner to their checkpoint anyway. The signature meant what it said even though the signer treated it as a formality.
Now look at the same structure from the design side, because operations leaders get to build this chain on purpose rather than discover it in a sanctions order.
Route the agent’s output through a review step with a named owner, and accountability distributes the way Pauliah distributes it: the reviewer answers for the quality of their check, visibly, with their name on a recorded decision. Let the agent act straight into the world with no checkpoint, and accountability concentrates the way the deployment extrapolation predicts: the organization answers for everything the agent did, with no record of diligence to point at. Neither posture avoids responsibility. One distributes it onto people equipped to carry their slice and creates evidence of care; the other pools it at the top and creates evidence of nothing.
Move it out of the courtroom for a second, because litigation is just the arena where these failures become visible. Say an agent drafts supplier price updates straight into your ERP - a made-up shop, but a real pattern. Version one pushes updates live; months later a mispriced contract surfaces, and the explanation on record is that the system did it, which is no explanation at all. Version two parks each update at a step where a category manager approves before anything posts. Same agent. Same error rate, probably. But in version two the bad update has a named reviewer, a timestamp, and a comment trail, and the conversation afterward is about a person’s judgment call on a specific Tuesday rather than about why the company let software spend its money unsupervised.
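The difference between version one and version two can be sketched in a few lines. This is a minimal illustration, not any real ERP or workflow API: `PriceUpdate`, `approve`, `post_to_erp`, and the dictionary standing in for the ERP are all hypothetical names invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class PriceUpdate:
    """A price change drafted by the agent (hypothetical shape)."""
    sku: str
    old_price: float
    new_price: float
    approved_by: Optional[str] = None       # a named person, not a team
    approved_at: Optional[datetime] = None  # when the approval was recorded
    comment: str = ""                       # the reviewer's reasoning


def approve(update: PriceUpdate, reviewer: str, comment: str) -> None:
    """Record who approved the update, when, and with what comment."""
    if not reviewer.strip():
        raise ValueError("approval needs a named reviewer")
    update.approved_by = reviewer
    update.approved_at = datetime.now(timezone.utc)
    update.comment = comment


def post_to_erp(update: PriceUpdate, ledger: dict) -> None:
    """Version two: refuse to post anything without a recorded approval."""
    if update.approved_by is None:
        raise PermissionError(f"{update.sku}: no recorded approval, not posting")
    ledger[update.sku] = update.new_price  # the dict stands in for the ERP
```

Version one is the same `post_to_erp` without the `approved_by` check. The agent and its error rate are untouched; the only change is that nothing posts until a name, a timestamp, and a comment exist.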
Only one of those conversations is survivable.
There’s a quality version of this argument - review gates catch the errors in AI-built workflows - and this post is its colder companion. Even when the gate fails to catch the error, its existence changes what you can prove afterward: that a qualified person looked, when, at what. Courts in these cases punished absent and fake diligence. A recorded, real check is the opposite fact pattern. And the deeper your automation goes, the more that record is the only artifact distinguishing “we supervise our AI” from “we hoped.” The two-agent reviewer pattern that engineers keep converging on makes the same discovery from the build side: a generation step plus a review step with one owner beats anything cleverer, and the review’s value is precisely that someone specific holds it.
A question for your own stack, and be honest: for the most consequential thing your AI touched this week, could you name the person who was supposed to check it?
If the answer is a team, that’s a no. If the answer is that the model is pretty reliable, that’s also a no - reliability rates are an argument about how often you’ll need the defense, not about whether you have one.
Make the sign-off worth signing
Here’s where Pauliah gets genuinely instructive for process design, because it shows that a checkpoint can exist and still be worthless. The doctor did sign. The signature did nothing for him, because he signed “without verifying - or even attempting to verify.”
A sign-off step protects people exactly to the degree that it’s real. Designing a real one is less about software than about four properties any tool, ours included, has to serve rather than supply.
The reviewer is named, singular, and competent for the thing reviewed. “Legal reviews it” is not a gate; a specific lawyer with the transcripts is. Pick the person who could actually catch the failure mode - which for AI-fabricated content means someone positioned to check claims against sources rather than vibes against vibes.
The reviewer sees the actual output, with means to verify it. A summary of the thing is not the thing. Begley had the depositions and the transcripts in hand; the court sanctioned him because verification was possible and skipped. Attach the source material to the review task itself, so checking costs minutes instead of a painful hunt through file shares.
The decision is recorded with identity and time. An approval that lives in a hallway conversation is a memory by Friday. What caught us flat-footed at Tallyfy, years ago: enterprise buyers cared less about how approvals worked than about whether anyone could later prove one happened - the record turned out to be the product. A recorded decision in a task with an approval step carries who, when, and what was in front of them, which is the exact triplet a sanctions order reconstructs when things go wrong.
Rejection has somewhere to go. A gate that can only say yes isn’t a gate. The workflow behind a real one routes a rejection back with the reviewer’s comment attached, and that loop - draft, check, fix, re-check - is what makes the eventual approval mean something.
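Those four properties translate fairly directly into a data structure. What follows is a sketch under stated assumptions, not Tallyfy’s implementation or anyone else’s: `ReviewTask`, `Decision`, and every field name are hypothetical, and hashing the output is just one way to pin down what was in front of the reviewer.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class Decision:
    """One recorded decision: who, when, and what they were looking at."""
    reviewer: str
    decided_at: datetime
    output_sha256: str  # fingerprint of the exact output that was reviewed
    approved: bool
    comment: str


@dataclass
class ReviewTask:
    reviewer: str        # property 1: one named person
    output: str          # property 2: the actual output, not a summary
    sources: List[str]   # property 2: material to verify the output against
    history: List[Decision] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.reviewer.strip():
            raise ValueError("a review task needs a named reviewer")
        if not self.sources:
            raise ValueError("attach the sources the reviewer checks against")

    def decide(self, approved: bool, comment: str) -> Decision:
        """Property 3: record the decision with identity and time."""
        if not approved and not comment.strip():
            raise ValueError("a rejection needs a comment so the draft can be fixed")
        digest = hashlib.sha256(self.output.encode("utf-8")).hexdigest()
        decision = Decision(self.reviewer, datetime.now(timezone.utc),
                            digest, approved, comment)
        self.history.append(decision)
        return decision

    def resubmit(self, revised_output: str) -> None:
        """Property 4: a rejected draft comes back revised; history is kept."""
        if not self.history or self.history[-1].approved:
            raise ValueError("only a rejected draft can be resubmitted")
        self.output = revised_output

    @property
    def is_approved(self) -> bool:
        return bool(self.history) and self.history[-1].approved
```

A rejection followed by a fix and a second look leaves two entries in `history`, each tied to a different version of the output. The software cannot supply the reviewer actually reading the thing, which is the part the record is supposed to prove.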
An error we made in Tallyfy’s early years, for what it’s worth: we treated approvals as a feature checkbox, one more step type in the builder. Watching how compliance-heavy customers actually used them changed our minds - they were building defensibility, run by run, and the approval record was the artifact their auditors and lawyers reached for first. That’s also why a well-built approval template pairs the review with the evidence rather than bolting a yes/no onto the end.
None of this slows the AI down, which is the objection I hear most. The model still drafts the contract analysis in seconds. The change is that the draft waits at a step where a named lawyer reads it against the contract before it goes anywhere consequential - a few minutes of human time purchased against the 40.4 billable hours opposing counsel spent unwinding the alternative in Mississippi.
What process design cannot do for you
Three honest notes, so this doesn’t read like a workflow vendor promising legal immunity.
It can’t make a bad check good. Pauliah’s signature was a process step, executed emptily, and the court priced it accordingly. A review step staffed by someone who clicks approve unread builds you a record of negligence with excellent timestamps. The design goal is making real review cheap - small units, attached evidence, clear criteria - because an expensive check gets skipped, and a skipped check repeats Pauliah’s empty signature with better software.
It can’t tell you where the law ends up. Sanctions orders against filers are not the full map of AI liability, and the doctrine is moving while this post ages. What the cases read for this post establish is narrower and sturdier: courts expect a human check on AI output, and they’re willing to bill the specific humans who skipped it. Questions past that - vendor exposure, negligence standards for agent deployments, how this plays outside the US - belong with your counsel, not your workflow tool.
What it can do is decide, in advance and on purpose, who answers - and arm that person to answer well. The organizations that come through this era clean won’t be the ones whose models never erred. They’ll be the kind of shop that can produce, for any consequential AI action, the name of the person who checked it, the moment they did, and what they saw - the boring trio that disciplined process design has always banked, and that now pays out in courtrooms.
The model will make a wrong call eventually. The only open question is whether your process catches it - and failing that, whether it can show anyone tried.