Beyond Reasonable Doubt

Date: 06/06/2026

Police across England and Wales were ordered this week to stop using AI to write court statements, after the hallucinations began contaminating the justice system. The case that triggered it involved an AI tool that fabricated a football fixture in a police dossier — invented a match that never occurred and inserted it into the record as fact. The oversight body reports that AI-drafted submissions are fueling a twenty-four-percent rise in complaint reviews, some of them quoting laws that do not exist in England and Wales. An officer is under investigation for using AI to manufacture evidence, and a rape conviction is being reviewed as a consequence. The head of the new policing-AI body named the standard the technology had failed to meet: in the justice system, he said, AI must be reliable “beyond reasonable doubt.” It was not. It invented a football match, cited imaginary statutes, and fed both into the machinery that decides whether people go to prison.


The Fabricated Fixture

The football fixture is the perfect specimen, because it is so small and so total. The model did not commit a grand deception; it produced a plausible detail — a match, on a date, between teams — that fit the shape of the document it was completing, and the detail happened to be false. To the system generating it, the invented fixture was indistinguishable from a real one: both are the kind of thing that belongs in a police dossier, both read as fact, both arrive with the same fluent confidence. The model had no way to know that one had occurred and the other had not, because it does not track occurrence. It tracks plausibility, and a plausible match that never happened is, to a plausibility engine, a perfectly good output. The fabrication was not a failure of the tool. It was the tool working as designed, in a place where working as designed is catastrophic.

And the consequences are not confined to one dossier, which is why the order was system-wide. Submissions quoting legislation that does not exist; a twenty-four-percent rise in the complaints the oversight body must review; an officer accused of using the technology to create evidence outright; and, at the end of the chain, a rape conviction sent back for review because AI may have touched the case. Follow that last one to its conclusion. A verdict is now in doubt, which means either that a guilty man may walk or that a conviction rests on a compromised process, because a fluency engine was allowed into the one process whose entire purpose is to separate what is true from what merely sounds true. The hallucination that is harmless in an essay and embarrassing in a consulting report is, in a courtroom, a question of someone’s liberty.

The standard the official invoked is the one the technology cannot satisfy, and the impossibility is structural rather than temporary. “Beyond reasonable doubt” is a demand for truth that holds regardless of how plausible the alternative sounds — it exists precisely to defeat the convincing falsehood, the plausible story that happens not to be true. A large language model is a machine for producing convincing falsehoods alongside convincing truths, with no internal capacity to distinguish them, because both are generated by the same process of plausible continuation. Deploying it where the governing standard is “beyond reasonable doubt” is not a tuning problem to be solved with a better model. It is a category error: installing a machine whose native output is plausible text into a system whose entire function is to reject plausible text in favor of true text. The two are designed for opposite purposes.


Where the Slop Cannot Be Tolerated

This is the same contamination I traced when AI-generated slop began poisoning the shared record of knowledge, arriving now at the one institution that cannot absorb it. The justice system is where the technology’s fundamental property, fluency without regard to truth, meets the highest stakes a society maintains, and the meeting is unsurvivable for the technology. Elsewhere, the falsehood is diluted: a search engine’s wrong answer is corrected, a fabricated citation is caught by a diligent reader, a store’s bad inventory count is reconciled. The court has no such tolerance, because the court’s output is not information that can be revised but judgment that imprisons or frees, and a judgment built partly on a fabricated fixture is not a slightly wrong judgment. It is a corrupted one.

What makes the danger specific to fluency is that the falsehoods are convincing, which is the worst possible property for a falsehood entering a court to have. A clumsy lie is caught; a fluent one is believed. The model produces fabrications with exactly the same confident, well-formed prose as its truths, which means the fabricated fixture does not announce itself as suspect — it reads like every other line in the dossier, authoritative and clean. The very quality that makes the technology useful, its ability to produce polished plausible text on demand, is the quality that makes it lethal in a court, because a court is a machine for being persuaded, and a tool that produces maximally persuasive text without regard to its truth is a tool for persuading a court of things that are not so.

The police did the right thing, and they did it because the failure had become impossible to ignore — a conviction under review concentrates the institutional mind in a way that an abstract risk does not. But the order is also a confession of how the technology entered the justice system in the first place: not through a deliberate decision that AI should help decide guilt, but through the quiet diffusion of a productivity tool into a domain where its nature is disqualifying, adopted by individual officers to save time on paperwork, with no one having asked whether a machine that fabricates is appropriate to a process that imprisons. It was installed as a convenience and discovered to be a contaminant, and the discovery required a fabricated football match to reach the level where someone with authority pulled it back.


What This Means

The retreat from the courtroom is correct, and it is also a warning the broader culture has not yet absorbed. There are domains where the technology’s defining flaw is not a manageable cost but a disqualifying one, and the justice system is the clearest of them — the place where truth is not a preference but the entire product, and where a machine that generates convincing untruth is corrosive in exact proportion to how convincing it is. The police learned this through a fabricated fixture and a conviction under review. The lesson is specific to no one institution. It applies wherever the governing standard is truth rather than plausibility, and a fluency engine has been installed on the assumption that it was a neutral aid.

So the question worth carrying out of the courtroom is how many other rooms have quietly installed the same engine where truth was supposed to be required, and have not yet had their fabricated fixture. The clinic, where a hallucinated dosage is a poisoning. The bank, where an invented transaction is a theft. The agency that decides a benefit, a custody arrangement, a deportation, a risk score: each a process whose output changes a life, each increasingly mediated by a tool whose native mode is to produce plausible text without knowing whether it is true. The justice system had its failure surfaced because a court is adversarial by design, with defense lawyers paid to find the fabricated fixture. The rooms without an adversary will not surface theirs. They will simply absorb the fabrications, quietly, into decisions no one is structured to challenge.

It invented a football match that never happened and a law that does not exist, and a person’s freedom passed through the documents that contained them. That sentence should be the end of an argument the culture keeps declining to have. The technology is fluent, and fluency is not truth, and there are places, the court chief among them, where the gap between the two is measured in someone’s years behind a wall. The police pulled the machine out of the one room built, over centuries, to detect exactly the kind of convincing falsehood the machine produces. They could pull it out because that room was designed to catch it. The danger is not the room that caught the fabrication and removed the tool. The danger is all the rooms that were never built to catch it, where the fabricated fixture has already been entered, has already decided something, and will never be sent back for review, because no one in those rooms is looking for the match that never happened, and the machine that wrote it will never tell them it was not real.