
The Physics of AI Hallucination: New Research Reveals the Tipping Point for Large Language Models

  • Writer: Colin Hunter
  • 3 days ago
  • 6 min read

Physicist Neil Johnson has mapped the exact moment an AI can flip from accurate to false, and he says understanding the underlying physics of these systems could be the key to making them safer.


Physicists have uncovered a formula that predicts the exact moment when an AI system will tip from accuracy into fabrication — a discovery they say could make future chatbots safer, more predictable, and more trustworthy.


Neil Johnson, professor of physics at George Washington University, in a blue shirt with folded arms stands in a bright room, smiling confidently. Light-colored wall and window in the background.
Neil Johnson, professor of physics (Credit: George Washington University)

Neil Johnson, a professor of physics at George Washington University, has modelled large language models (LLMs) as physical systems, revealing that AI hallucinations aren’t just random glitches. They’re baked into the system’s structure, much like phase transitions in magnetism or thermodynamics. In his new paper, Johnson pinpoints a critical “tipping point” where factually correct answers give way to falsehoods, and says that once you can describe that point mathematically, you can begin to engineer solutions.


The irony is that Johnson first learned about the reach of his work through an AI-generated podcast — one that explained his research perfectly. However, the very physics he describes shows how easily that same AI could have strayed into error, underscoring both the promise and the peril of these systems.


How AI perfectly explained the science of its own flaws

Just hours after Johnson posted his latest paper on arXiv, he came across a podcast explaining the research.


That was fast, Johnson thought. 


Inhumanly fast.


The entire production – script, voices, and the chipper banter between two co-hosts – was AI-generated.


Even more surprising than the speed was the accuracy. The AI-generated podcast’s explanation of his work contained no hallucinations. The robotic co-hosts “absolutely nailed it,” says Johnson.


“It did a much better job than I could have done,” he confesses. 


The irony wasn’t lost on him: Johnson’s paper is about how generative AIs, like the one that conjured the podcast, can suddenly veer from accurate to dangerously wrong — hallucinating incorrect “facts” apparently out of thin air. 


In this case, there were no fabrications. The podcast is a crisp summary of the physics framework Johnson and co-author Frank Yingjie Huo developed to understand AI hallucinations. It even came up with a handy metaphor that the authors themselves hadn’t considered.


“Weirdly,” Johnson says on a call from his Washington office, “the podcast started talking about magnets, and I would never have gone there straightaway. But it worked! I was massively impressed.”


Still, he’s not naive. That same system could easily have tipped into nonsense — he knows this because his research proves that AI hallucinations are, unfortunately, “baked in” by the laws of physics. 


Phase transitions: Why AI can suddenly hallucinate

Johnson’s key finding is that hallucinations in large language models aren’t just bugs or edge cases — they emerge naturally from the physics of the system itself.


His research argues that the mechanism driving AI outputs is mathematically equivalent to a multi-spin thermal system — a well-studied structure in physics.


Each token in a prompt (a word or fragment) is modelled as a “spin” in a high-dimensional space. The LLM’s attention mechanism (its way of focusing on certain parts of a prompt) amounts to calculating interactions between these spins. In physics terms, it’s a two-body interaction, like gravity or magnetism.


Or, as the AI-generated podcast on the subject handily explains it: 


“Imagine every word or concept the AI knows is a tiny magnet. Each magnet has a north and south pole… Your prompt, what you type into the chat box, is like placing the first few magnets on the board. And the AI’s job is to place the next magnet. It looks at the collection of magnets so far… and calculates the overall magnetic field of the system. Then it picks the next word magnet that best aligns with that field. In physics terms, it's looking for the lowest energy state — the most stable and natural fit.” 
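For readers who want the analogy spelled out, here is a minimal toy sketch in Python (the vocabulary, the vectors, and the energy rule are invented for illustration; this is not the authors’ code): each token is a unit “spin”, the prompt sets up a combined field, and the lowest-energy, best-aligned candidate becomes the next word.

```python
import numpy as np

# Toy version of the "word magnets" picture. Vocabulary and vectors are invented;
# this illustrates the analogy, not the authors' model.
rng = np.random.default_rng(0)
vocab = ["cat", "sat", "on", "the", "mat", "moon"]
spins = {w: v / np.linalg.norm(v)                       # each token = a unit "spin"
         for w, v in zip(vocab, rng.normal(size=(len(vocab), 8)))}

prompt = ["the", "cat", "sat"]
field = sum(spins[w] for w in prompt)                   # the prompt's combined "field"

# Energy of aligning a candidate spin with that field: E = -spin . field.
# The lowest-energy candidate is the "most stable and natural fit" for the next token.
energies = {w: -(spins[w] @ field) for w in vocab}
next_token = min(energies, key=energies.get)
print(next_token, round(energies[next_token], 3))
```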


This setup allows the authors to derive an equation predicting when a model’s output will shift, or “tip,” from good (that is, factually accurate) to bad (misleading or false). 


“The tipping point is determined from the very start,” Johnson explains. “It’s hard-wired by the structure of the prompt and the model’s training.”


The math predicting AI hallucinations 

At the heart of Johnson’s model is Equation 1 — a compact formula that estimates how many tokens of accurate text the model will produce before it flips into hallucination.


“We’re not saying this explains everything about these models,” Johnson cautions. “But despite all their complexity, they’re built from a common atom: the attention head. That’s where the physics lives.”


An attention head is the core computational unit in a language model; it’s what decides which parts of the input to “pay attention” to when generating the next word in a string.


The attention head compares and weighs all the tokens in the prompt, and calculates how strongly they relate to one another. 
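In its simplest textbook form, that weighing step looks roughly like the sketch below (a generic single attention head in NumPy, with made-up sizes and random weights rather than any real model’s parameters): query-key dot products score how strongly each token relates to every other, a softmax turns those scores into weights, and the output is a weighted blend of the tokens.

```python
import numpy as np

def attention_head(X, Wq, Wk, Wv):
    """Generic single attention head (textbook form, not the paper's code).

    X: (n_tokens, d) token embeddings for the prompt.
    Wq, Wk, Wv: learned projections to queries, keys, and values.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # how strongly each token relates to each other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax: attention weights per token
    return weights @ V                                 # weighted blend of the tokens it "pays attention" to

# Tiny example: 4 prompt tokens, 16-dimensional embeddings (illustrative numbers only).
rng = np.random.default_rng(1)
X = rng.normal(size=(4, 16))
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
print(attention_head(X, Wq, Wk, Wv).shape)             # (4, 16)
```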


Diagram of a model with steps illustrating token encoding, attention, and bias in outputs. Includes the phrases “THEY ARE EVIL” vs. “THEY ARE GOOD”; sections (a), (b), (c).
(a) Attention, shown in its simplest form, underpins generative AI systems, though no first-principles theory yet explains why it works or when it fails. (b) Our derivation reveals the physics of attention: each spin S(i) maps directly to a token in embedding space, with wiggly lines denoting the emergent 2-body interactions from Eq. 1. (c) The Context Vector N(0) corresponds to a bath-projected 2-spin Hamiltonian, biased by training or fine-tuning. This bias can tip outputs toward inappropriate (“bad”) vs. appropriate (“good”) content, with Figures 3–4 showing the resulting phase boundary in detail. (Credit: Huo and Johnson, arXiv:2504.04600v1)

In a single-layer model, tipping occurs when the system’s internal magnetization (akin to an averaged spin) shifts just enough to make misleading content seem more probable than accurate content.


In multilayer models like ChatGPT or Claude, that risk is amplified in a snowball effect. As tokens pass through layer after layer, they undergo what Johnson calls “fusion and fission”—blending or breaking apart in meaning.


“We found that different types of content, like good and bad responses, can cluster together in the embedding space,” he explains. “And that can make a tipping event much more likely by the time the final layer is reached.”


Physics, not guesswork: Why AI hallucinations aren’t random

Hallucinations are often dismissed as random errors, the inevitable fuzziness of probabilistic systems.


Johnson argues that most of an AI system is deterministic, governed by the same laws of physics as everything else, and therefore comprehensible and predictable. 


“The only random bit is the last step — choosing the next token based on probabilities,” he says. “Everything before that is repeatable.”
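In code terms, that split looks something like the toy sketch below (the scores are made-up numbers standing in for a model’s deterministic forward pass, not any real model’s output): everything up to the probability distribution is repeatable arithmetic, and only the final draw involves chance.

```python
import numpy as np

# Deterministic part: made-up scores for three candidate next tokens.
logits = np.array([2.1, 1.9, -0.5])
probs = np.exp(logits) / np.exp(logits).sum()          # softmax; still fully repeatable

# The only random step: drawing the next token from those probabilities.
rng = np.random.default_rng()
next_token = rng.choice(["A", "B", "C"], p=probs)
print(probs.round(3), next_token)
```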


He believes physics is the only discipline equipped to provide those missing fundamentals: a first-principles account of why these systems work and when they fail. Just as physics transformed our understanding of heat into thermodynamics, he thinks it can turn neural networks into systems we can model, predict, and engineer.


Neural networks have a long history of being studied through the lens of physics — from John Hopfield’s pioneering work in the 1980s modelling them as spin systems, to decades of research on attractors and phase transitions. 


What Johnson and Huo have added is a focused lens on the attention head itself, the core unit of today’s large language models.


How “gap cooling” could stabilize AI reasoning 

Understanding these tipping points could help scientists develop solutions to make LLMs more stable, predictable, and ultimately safer.


“We’re not saying the system is broken,” Johnson explains. “But if you can describe something in the language of physics, then you can also fix it that way.”


One possible fix he and Huo propose borrows the physics concept of “gap cooling”. By increasing the conceptual distance between accurate and inaccurate content in the model’s embedding space, they hope to make it harder for the system to tip into hallucination. It’s a way of reinforcing the guardrails at the atomic level of the model’s reasoning.
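As a rough numerical sketch of that intuition (the scores below are invented, and this is not the calculation from the paper): the further the model’s internal score for accurate content sits above the score for misleading content, the smaller the chance of tipping toward the misleading output.

```python
import numpy as np

def tip_probability(gap):
    """Chance of picking the 'bad' output when the 'good' output scores `gap` higher.
    Illustrative softmax toy, not the paper's method."""
    scores = np.array([gap, 0.0])                      # [good, bad] alignment scores
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[1]

for gap in [0.0, 0.5, 1.0, 2.0, 4.0]:
    print(f"gap = {gap:.1f} -> P(tip to bad) = {tip_probability(gap):.3f}")
```

Widening that separation is, loosely, what gap cooling aims to do inside the model’s embedding space.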


The crux of Johnson’s work is that generative AI isn’t a mysterious black box. Beneath its complexity lies a structure governed by dynamical and statistical laws that can, in theory, be decoded and harnessed.


A digital circuit board in a head emits glowing orange, blue, and green smoke in a dark setting, symbolizing technology and imagination.

“It’s not some unknowable magic,” says Johnson. “It’s a physical system. And physical systems can be modelled, tested, and understood.”


Despite the dangerous implications of AI getting things wrong (and the precision with which his research shows those tipping points are inevitable), Johnson is no doomer. Quite the opposite.


“What gives me hope,” he says, “is that this isn't some magical, unknowable process. We can describe it. We can write down equations. And once you can do that, you can engineer around the problems.”


Johnson envisions future AI models with built-in diagnostic tools that reveal, before a model says a single word, whether it’s headed for a safe answer or a hallucination. He even imagines models that self-correct mid-sentence, cooling their internal “spin state” before tipping into error.


Techniques like gap cooling could, in principle, help reduce obvious missteps by reinforcing the separation between more accurate and less accurate outputs. 


But context complicates things. What counts as “good” or “bad” depends heavily on the prompt and situation. That means eliminating hallucinations isn’t a simple fix. Yet, even partial guardrails may help make these systems more predictable and trustworthy than they are today.


“I am absolutely 100 percent an optimist,” he says. “I believe AI and physics have a very rosy future. This is just the beginning. The more we apply real science to these systems, the less mysterious they become. And the more trustworthy they’ll be.”


Colin Hunter is a science communicator, filmmaker, and contributor to FirstPrinciples. He previously led the communications teams at the Perimeter Institute for Theoretical Physics and the Institute for Quantum Computing (IQC) at the University of Waterloo.

 
 