AI Text Detectors: Can You Trust Them?

On 12 May 2026, the British literary magazine Granta — founded in 1889, the place that printed the early work of half the living greats in English — put online the five regional finalists of the Commonwealth Short Story Prize, the award the Commonwealth Foundation hands out each year for unpublished fiction from the fifty-six countries of the Commonwealth. Within days, one of those stories set off a storm.

"The Serpent in the Grove," by Trinidad's Jamir Nazir, the Caribbean regional winner, follows a rum-drinking farmer who stumbles onto an enchanted grove. Several readers found the prose a little too smooth, a little too clean. Ethan Mollick, a Wharton professor and one of the most-followed commentators on AI anywhere, ran the story through Pangram, an American tool that claims to detect machine-written text. The verdict came back: "100% AI-generated."

One hundred percent. The number is crisp, square, final. It looks like proof.

It isn't. And that's where the real story starts — the one nobody wants to tell, because it annoys both camps at once.

What a detector measures, and what it doesn't

Start with the one point everyone should agree on, because it's technical and checkable. An AI detector doesn't read. It guesses. It understands neither the story, nor the style, nor the intent. Many detectors lean chiefly on one measurement: the "perplexity" of the text, meaning how predictable the word-by-word sequence is to a language model. The more expected a sentence, the more statistically ordinary, the more likely the machine deems it machine-made.
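To make "predictability" concrete, here is a minimal sketch of a perplexity calculation. It is a toy under loud assumptions: a bigram counter with add-one smoothing stands in for the large language model a real detector would use, and the tiny corpus and both test sentences are invented for illustration. It is not any vendor's method.

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Count single words and adjacent word pairs in a toy training corpus."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def perplexity(tokens, unigrams, bigrams):
    """exp of the average negative log-probability per predicted word."""
    vocab_size = len(unigrams) + 1  # +1 reserves probability for unseen words
    log_prob = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing: unseen pairs get a small, nonzero probability.
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(tokens) - 1))

# Hypothetical corpus: the only "language" this toy model has ever seen.
corpus = "the farmer walked to the grove and the farmer drank the rum".split()
unigrams, bigrams = train_bigram(corpus)

expected = "the farmer walked to the grove".split()    # follows familiar patterns
surprising = "grove the rum walked farmer to".split()  # same words, odd order

ppl_expected = perplexity(expected, unigrams, bigrams)
ppl_surprising = perplexity(surprising, unigrams, bigrams)
print(ppl_expected < ppl_surprising)
```

The familiar sentence scores lower perplexity than the scrambled one. A detector built on this signal reads "low perplexity" as "machine-like," whoever actually wrote the sentence.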

The flaw is obvious the moment you sit with it. A writer working in an English learned at school, without the tics and irregularities of a native speaker, mechanically produces more predictable text. A Caribbean short story, written in the careful, slightly schoolbook English of someone for whom it isn't quite the mother tongue, ticks exactly the boxes the detector associates with a machine.

This isn't a hunch. It's a study. In 2023, a Stanford team ran TOEFL essays, written for the English test taken by non-native speakers, through seven consumer AI detectors. The result: an average false-positive rate of 61.22%. More than one human essay in two, written by a real student, flagged as "AI-generated." Some essays were falsely flagged by all seven detectors at once. The machine doesn't catch AI. It catches people who write a clean, careful second language.

Pangram, the tool used on "The Serpent in the Grove," came after that study and isn't in it — to be fair, its makers claim far lower false-positive rates. But the principle doesn't change. A detector returns a probability. Displaying it as a 100% verdict is dressing a statistic up as a sentence. And between a probability and a proof lies all the distance that separates a suspicion from a conviction.
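The gap between a probability and a proof can be put in numbers with Bayes' rule. The sketch below is an illustration only. The 61.22% figure is the Stanford study's rate for TOEFL essays on the detectors it tested, not a figure for Pangram. The prior (how many submissions are machine-written to begin with), the detection rate, and the 1% alternative are hypothetical values chosen for the example.

```python
def posterior_ai_given_flag(prior_ai, true_positive_rate, false_positive_rate):
    """Probability that a flagged text is really AI-written (Bayes' rule)."""
    flagged_ai = prior_ai * true_positive_rate
    flagged_human = (1 - prior_ai) * false_positive_rate
    return flagged_ai / (flagged_ai + flagged_human)

# Hypothetical: 10% of submissions are machine-written, and the detector
# catches 99% of those.
PRIOR_AI = 0.10
TRUE_POSITIVE_RATE = 0.99

# False-positive rate from the 2023 study, for non-native essays.
study_case = posterior_ai_given_flag(PRIOR_AI, TRUE_POSITIVE_RATE, 0.6122)

# Hypothetical far better detector: 1% false positives.
optimistic_case = posterior_ai_given_flag(PRIOR_AI, TRUE_POSITIVE_RATE, 0.01)

print(round(study_case, 2))       # roughly 0.15
print(round(optimistic_case, 2))  # roughly 0.92
```

Under these assumptions, a flag on a second-language essay points to a machine about 15% of the time. Even with the optimistic 1% error rate, about one flagged author in twelve is innocent. Neither figure is 100%.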

Granta's defense, or the algorithm that accuses itself

What follows is almost comic, if you forget there's a human author at the end of it. Pressed to explain, Sigrid Rausing, Granta's publisher, said her team had shown the story to Claude, Anthropic's chatbot, and asked whether the text was AI-written. Claude's answer: the text was "almost certainly not produced unaided by a human."

Read that again. To find out whether a story had been written by an AI, a major literary magazine asked an AI for its opinion. The accusation was handed to one algorithm, the defense to another. Somewhere in this affair the humans left the room — and no one seemed to notice at the time.

The Commonwealth Foundation, for its part, landed on the only defensible position: all finalists declared on their honor that the work was their own, written unaided, and absent a reliable tool to settle it, the Foundation "must operate on the principle of trust." Translation: we don't know. Nobody knows. And that is exactly the heart of the problem.

There are three cases, not two

Here's where the debate gets stuck. On one side, the anti-AI camp: "detectors unmasked cheats, machine-written texts won prizes, ban them." On the other, the pro-AI camp: "detectors are pseudoscience, and who cares anyway." Both camps fight over the same question — "did AI write this text, yes or no?" — as if the question had an answer, and as if it were the right one.

But there aren't two cases. There are three.

The first: a writer who uses AI as a tool. Asks for a synonym, has it reread a paragraph, tests a phrasing, throws out nine-tenths of it. This is exactly what developers have done for two years. Nobody argues they stopped being developers. They talk with the tool, refuse most of what it offers, decide every line. The AI rephrases, tidies, suggests — and now and then, by accident, unsticks a sentence. But the developer decides what goes into the code. Why would a writer be different? We're project leads too.

The second case is the real red line, and it now carries a name borrowed from tech: vibe writing. The literary cousin of "vibe coding" — the developer who fires prompts at the machine, accepts whatever comes out without really reading it, and ships. Nobody takes them seriously in software. Why would it be otherwise in fiction? The vibe writer generates two hundred and eighty pages from a prompt and signs the cover. There, yes, there's a problem — of substance, of honesty, of authorship.

The third case is the newest, and the one we refuse to look at: the author accused by a detector. Neither assisted nor cheating. Just a human whose prose, too clean or too second-language, tripped a statistical alarm. That one, we convict on a number.

The real scandal isn't the one you think

And here is where I'll annoy everyone.

To the anti-AI camp, I say: a detector number is not proof. You are rebuilding graphology and the lie detector — the pseudosciences it took a century to chase out of courtrooms — and rebranding them "rigor" because there's a percentage on the end. You don't convict an author on textual perplexity. Least of all when that perplexity falls hardest on those who write in a learned language — which is, very often, the writers from former colonies this prize exists precisely to spotlight.

To the pro-AI camp, I say: "who cares" is a shrug that hands literature to the vibe writers. A prize that rewards original, unaided human fiction means something. Writers' distrust isn't the panic of old reactionaries: it's legitimate, even when the tool meant to satisfy it is broken. You can hold both ends: the detector is bad, and the question it raises is real.

So the scandal isn't that an AI may have won a prize. The scandal is that we now accuse a human on a number no one can contest. The burden of proof inverted without anyone voting on it: it's no longer for the accuser to prove fraud, it's for the author to prove innocence. "Prove you didn't use AI" — go ahead, try. It's strictly impossible. You can't prove an absence.

And the detector, in all this? It can't tell the first case from the second. It draws no line between the writer who asked for a synonym and the one who generated the whole book. The tool meant to police the line is blind to the line it claims to police. It sees neither intent, nor labor, nor hand — it sees predictability, and it fires.

What would be left if we put the detectors away

If detection doesn't work — and it doesn't — one solid thing remains: provenance. Not "does this text look human?", a question no machine will ever answer honestly, but "can you show how this text was written?" The history, the versions, the crossings-out, the draft becoming a page. Less spectacular than a red percentage, but checkable.
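One way to make a draft history checkable is a hash chain, in which each saved version records a fingerprint of the version before it. The sketch below is a hypothetical illustration, not a description of how any existing writing tool works. The function names, the log format, and the sample drafts are all assumptions.

```python
import hashlib
import json

def append_revision(log, text, timestamp):
    """Add one draft to the log, chained to the previous entry's hash."""
    entry = {
        "timestamp": timestamp,
        "text_hash": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "prev_hash": log[-1]["entry_hash"] if log else "",
    }
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    entry["entry_hash"] = hashlib.sha256(payload).hexdigest()
    log.append(entry)
    return log

def verify_log(log):
    """Return True only if no entry was altered, removed, or reordered."""
    prev_hash = ""
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        if entry["prev_hash"] != prev_hash:
            return False
        if hashlib.sha256(payload).hexdigest() != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True

# Hypothetical drafts and timestamps, invented for the example.
log = []
append_revision(log, "A farmer finds a grove.", "2026-01-03T09:00:00Z")
append_revision(log, "A rum-drinking farmer stumbles onto a grove.", "2026-01-05T21:30:00Z")
print(verify_log(log))
```

A log like this shows only that the recorded versions were not rewritten after the fact. It does not show who typed them, so it is evidence a human reader can weigh, not a verdict.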

That's also why a strict no-AI mode, now an option across a handful of online writing studios, from Novlr to Extypis, isn't just a marketing line. For a student under an integrity check, for an author who wants to prove a fully human provenance, it's the only answer that holds against a detector that can be wrong more than half the time on second-language prose: not to defend yourself after the fact, but to be able to show the path. The door an AI tool holds open is not the same thing as the threshold the author crosses.

Jamir Nazir swore he used no AI. Maybe he's telling the truth. Maybe not. The uncomfortable part is that we'll never know, and that a respectable literary magazine preferred to ask a robot rather than admit it. As long as we take a probability for a proof, we'll keep burning a few innocents to be sure of catching one culprit. It's an old method. It never aged well.

Sources: Granta ("The Serpent in the Grove," May 2026); Literary Hub; Futurism; Boing Boing; Scroll.in; Slashdot; Caribbean360; Liang et al., "GPT detectors are biased against non-native English writers," Patterns (Cell), 2023; Stanford HAI; Commonwealth Foundation.
