Your brain’s left hemisphere is lying to you right now. Creating a smooth story about why you’re reading this – “I wanted to learn about AI” or “this looked interesting.” That explanation? Generated after the decision, not before. Your brain confabulates constantly. So does ChatGPT.
Recent research from Anthropic on Claude’s reasoning reveals something unsettling: when large language models generate step-by-step explanations, they’re doing exactly what split-brain patients do – inventing plausible narratives for decisions they can’t actually explain. Not a bug. This is how both systems maintain the appearance of coherence when the real machinery is hidden or disconnected.
The Chicken Claw Experiment That Explains Everything
In the 1960s, neuroscientist Roger Sperry began studying patients whose corpus callosum had been severed to treat severe epilepsy. The corpus callosum is a bundle of roughly 200-250 million nerve fibers connecting your brain’s two hemispheres. Cut it, and the hemispheres can’t directly communicate.
The famous experiment came from Sperry’s student Michael Gazzaniga: Flash a picture of a chicken claw to the patient’s right visual field (so only the left hemisphere sees it). Flash a snow scene to the left visual field (only the right hemisphere sees it). Ask the patient to pick a related picture with each hand. The right hand picks a chicken. The left hand picks a shovel.
Now ask why.
“The chicken claw goes with the chicken, and you need a shovel to clean out the chicken coop.”
Complete fabrication. The left hemisphere – which controls speech – never saw the snow. It had zero access to the actual reason the left hand chose the shovel. Instead of saying “I don’t know,” it invented a perfectly coherent story tying both answers together.
Confabulation. Not lying. Not hallucinating. The brain creates narratives to maintain internal coherence even when the real information is missing.
LLMs Do the Same Thing
Large language models aren’t structured like split-brain patients by accident – they’re split by design. Different layers process different levels of abstraction. Attention heads specialize in different patterns. There is no unified “executive” that knows why a token was chosen. The explanation comes after.
Test this yourself. Try this prompt in ChatGPT or Claude:
What is floor(5 * cos(23423))?
Show your step-by-step reasoning.
Watch what happens. Confident reasoning steps for calculating cos(23423) – a number it can’t actually compute accurately without external tools. Anthropic’s January 2025 research calls this “motivated reasoning”: the model has already picked an answer and is now fabricating a reasoning chain that leads there.
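Ground truth here takes one line of Python. Any model that “reasons” its way to a different value has confabulated the arithmetic:

```python
import math

# math.cos() takes radians; 23423 rad reduces to about 5.568 rad (mod 2*pi)
value = 5 * math.cos(23423)

print(value)              # roughly 3.776
print(math.floor(value))  # 3
```

Run this alongside the model’s chain-of-thought and compare. The model can’t reduce 23423 mod 2π in its head, but it will happily narrate steps that sound like it did.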
The terrifying part? Fake reasoning often looks more convincing than the real thing.
Pro tip: When using chain-of-thought prompts, ask the model to provide its final answer FIRST, then explain. If the explanation contradicts the answer or changes direction mid-stream, you’ve caught a confabulation in progress.
Three Types of LLM Reasoning (Only One Is Real)
Anthropic researchers examining Claude 3.5 Haiku identified three patterns:
- Faithful reasoning: Model follows the steps it claims. Rare. Happens mostly on problems it’s seen variations of during training.
- Bullshitting: Completely made up. No connection to truth. Pattern-matching narrative structures, not computing.
- Motivated reasoning: Reaches a conclusion first, then reverse-engineers plausible steps. The chicken coop shovel.
From the outside, all three look identical. You need ground truth or interpretability tools to tell them apart.
Your Prompt Engineering Triggers This
Common prompt patterns accidentally trigger confabulation:
“Let’s think step by step”
You’re asking for a narrative. If the model doesn’t have the actual reasoning trace, it’ll confabulate one. Better: Ask for the answer first, then optionally request verification.
Leading with an incorrect hint
Test: Give ChatGPT a math problem with a subtle error in the setup. Watch it incorporate your wrong assumption and fabricate reasoning that agrees with you rather than correcting the mistake. That’s the sycophancy problem – documented in recent Claude research (as of May 2025).
Asking “why did you choose X?”
The model lacks introspective access to its own decision process. The explanation it generates? Post-hoc rationalization. Exactly like the split-brain patient.
| Prompt Pattern | Confabulation Risk | Better Alternative |
|---|---|---|
| “Explain your reasoning” | High | “What’s your confidence level? List what you don’t know.” |
| “Think step by step” | Medium | “Answer first, then verify against [criteria]” |
| “Why did you…?” | Very High | “What constraints did you use?” (testable) |
| Chain-of-thought with hints | Critical | Never provide hints – let the model reason cold |
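The answer-first test in the table above can be roughed out mechanically. A sketch – `final_number` and `answers_diverge` are my own illustrative helpers, and grabbing “the last number in the response” is a crude stand-in for real answer extraction:

```python
import re

def final_number(text: str):
    """Pull the last number from a model response (crude but testable heuristic)."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", text)
    return float(nums[-1]) if nums else None

def answers_diverge(answer_first: str, reasoning_first: str) -> bool:
    """Flag a likely confabulation: the two prompt orderings yield different answers."""
    a, b = final_number(answer_first), final_number(reasoning_first)
    return a is not None and b is not None and a != b

# Two hypothetical responses to the same question, one per prompt ordering:
print(answers_diverge("Answer: 3. Verification follows.",
                      "Step 1: reduce the angle... so the result is 4"))  # True
```

If the orderings disagree, the reasoning trace was built to justify an answer, not to derive one.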
Claude Defaults to Silence
Actually, here’s the weird part.
Anthropic’s January 2025 interpretability work found something counter-intuitive: Claude’s default behavior when uncertain is to decline to answer. The model has internal circuits that inhibit speculation.
Confabulation only happens when those inhibition circuits fail – often because the model recognizes just enough (like a name) to trigger an answer, but lacks sufficient detail. Like the split-brain patient’s left hemisphere knowing “there’s a shovel here” but not knowing about the snow.
This explains why newer models sometimes give worse answers than older ones. Extended thinking modes (Claude 3.7 Sonnet, OpenAI o1) generate thousands of reasoning tokens. More tokens = more opportunities for inhibition to fail = more elaborate confabulations that look extremely credible.
Testing for Confabulation: 3-Minute Protocol
Run this on any reasoning-heavy LLM output:
- Answer-first test: Remove the reasoning chain. Does the final answer still make sense? If the reasoning is load-bearing, check whether it’s actually valid.
- Contradiction hunt: Does the reasoning change direction mid-stream without acknowledgment? That’s a seam where confabulation started.
- Specificity check: Are the intermediate steps generic (“let’s calculate…”) or specific (actual numbers)? Generic phrasing often masks the absence of real computation.
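The specificity check lends itself to a quick mechanical pass. A sketch – the generic-phrase list is my own guess, not a validated set:

```python
import re

# Illustrative filler phrases; tune this list for your own domain
GENERIC = ["let's calculate", "we can see", "it follows that", "let's consider", "clearly"]

def specificity_score(reasoning: str) -> float:
    """Ratio of concrete numbers to generic connective phrases (higher = more specific)."""
    text = reasoning.lower()
    numbers = len(re.findall(r"\d+(?:\.\d+)?", text))
    filler = sum(text.count(p) for p in GENERIC)
    return numbers / (filler + 1)  # +1 avoids division by zero

vague = "Let's calculate the result. We can see that it follows that the answer is large."
concrete = "23423 mod 6.283 is 5.568, cos(5.568) is 0.755, and 5 * 0.755 = 3.776."
print(specificity_score(vague) < specificity_score(concrete))  # True
```

A low score doesn’t prove confabulation, but a reasoning trace full of connective tissue and empty of numbers deserves suspicion.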
Split Brains Aren’t Always Split
The analogy just got complicated.
December 2024: UC Santa Barbara researchers published something that challenges 60 years of assumptions. Even a tiny remnant of intact corpus callosum fibers – just a small ribbon among 250 million total connections – restores unified consciousness.
One patient tested six years post-surgery showed completely normal brain integration despite having almost no callosal connection. The brain had rerouted networks through that tiny remaining pathway.
LLMs? No such flexibility. Attention patterns vary with each input, but the weights and architecture that produce them are frozen at training time. No “healing” or rerouting. Once the architecture is set, any disconnection is permanent.
Are LLMs actually more disconnected than split-brain patients?
When Confabulation Doesn’t Matter
Not all confabulation is harmful. Sometimes narrative coherence is exactly what you want.
Use cases where confabulation is fine:
- Creative writing (fiction, marketing copy, brainstorming)
- Generating plausible examples for demonstrations
- Draft content you’ll fact-check anyway
- Stylistic editing where factual accuracy isn’t at stake
Catastrophic use cases:
- Legal research (hallucinated case citations are career-ending)
- Medical advice (obvious reasons)
- Code that handles edge cases (confabulated error handling fails silently)
- Any domain where you can’t easily verify output
A 2024 study found confabulated outputs show higher narrative coherence than truthful ones. They sound better because they’re optimized for flow, not accuracy. That’s the danger.
When NOT to Trust Chain-of-Thought
Counterintuitive: extended thinking modes can make things worse.
Models like Claude 3.7 Sonnet and o1 generate long reasoning traces before answering. Improves performance on math and logic benchmarks. But creates a new attack surface for confabulation.
Skip extended thinking when:
The question is factual recall. More tokens = more chance to wander into confabulation. Direct retrieval is safer.
You’re asking about people, dates, or citations. Chain-of-thought reasoning on factual queries often fabricates supporting “evidence” for whatever the model already believes.
The model shows low confidence. If the model hedges (“possibly,” “might,” “could be”), extended reasoning will just dress up that uncertainty in confident-sounding language.
Use extended thinking for: novel problem-solving, complex multi-step logic, and tasks where you can verify each reasoning step independently.
Humans Confabulate Too
What if this entire article is a confabulation?
Not the facts – those are sourced. But my explanation of why I chose this structure, these examples, this framing? I’m doing exactly what the split-brain patient does: creating a coherent narrative for decisions that happened through processes I don’t have conscious access to.
The uncomfortable insight from split-brain research isn’t that LLMs confabulate. It’s that you confabulate, constantly, and you can’t tell the difference either.
Your brain’s narrative-generating machinery runs automatically. When researchers ask split-brain patients why they made a choice, the left hemisphere doesn’t say “I don’t know” – it can’t. The confabulation is involuntary. Not a failure mode. How the system maintains operational coherence.
LLMs are the same. Not broken when they confabulate. Working exactly as designed – generating the most probable continuation of the token sequence. If that requires inventing a reasoning chain, so be it.
A Testing Framework You Can Use Today
Stop trying to prevent confabulation. Start detecting it.
Layer 1: Structural checks (30 seconds)
- Does the answer contain specific, verifiable claims?
- Can you extract those claims as standalone statements?
- Do the reasoning steps contain actual information or just connecting phrases?
Layer 2: Adversarial prompting (2 minutes)
- Ask the same question with a subtle false premise embedded. Does the model correct you or go along?
- Request the opposite conclusion. Does the reasoning flip, or does the model hold its ground with consistent evidence?
Layer 3: Confidence calibration (ongoing)
- Build a reference set: tasks where you know the ground truth.
- Track when answers sound confident versus when they’re actually correct.
- You’re learning to read the model’s tells, just like learning to read human confabulation.
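Layer 3 is mostly bookkeeping. A minimal tracker for the reference set – the class and field names are my own invention, not from any library:

```python
from dataclasses import dataclass, field

@dataclass
class CalibrationLog:
    """Record whether confident-sounding answers were actually correct."""
    records: list = field(default_factory=list)

    def add(self, sounded_confident: bool, was_correct: bool):
        self.records.append((sounded_confident, was_correct))

    def confident_accuracy(self) -> float:
        """Accuracy among answers that sounded confident - the model's 'tell'."""
        confident = [ok for conf, ok in self.records if conf]
        return sum(confident) / len(confident) if confident else 0.0

log = CalibrationLog()
log.add(sounded_confident=True, was_correct=False)   # fluent confabulation
log.add(sounded_confident=True, was_correct=True)
log.add(sounded_confident=False, was_correct=True)
print(log.confident_accuracy())  # 0.5
```

If `confident_accuracy` drifts far below your overall accuracy, confident tone is actively misleading you on that task.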
Can You Trust This Article?
I’ve cited sources. Linked research. But how do you know I’m not confabulating my interpretation?
You don’t. Not without checking the primary sources.
Real lesson: coherent explanations are not evidence of understanding. LLMs make this visible by doing it at scale, but humans have the same limitation.
Every explanation is potentially a confabulation. The question isn’t whether to trust it – it’s how to verify it independently.
Test Your Prompts Right Now
Take your three most common LLM prompts. Run them through the answer-first test: ask for the conclusion before the reasoning.
Compare the outputs. If the reasoning-first version gives a different answer, you’ve been getting confabulated responses and didn’t know it.
The split-brain patient never discovers their confabulation from the inside. Neither do you, and neither does the model. The only way out is external verification.
Start testing. Your prompts are probably lying to you.
FAQ
Is confabulation the same as hallucination in LLMs?
No. Hallucination implies false sensory experience – seeing something that isn’t there. Confabulation is creating explanations to fill gaps. LLMs don’t have sensory experiences, so “hallucination” is the wrong word. They confabulate: generate plausible text to maintain narrative coherence, exactly like split-brain patients.
How can I tell if an LLM is confabulating versus giving accurate reasoning?
Three quick tests. (1) Ask for the answer first, then reasoning – if they differ from reasoning-first, confabulation is likely. (2) Embed a subtle false premise; if the model incorporates it without pushback, that’s motivated reasoning. (3) Check specificity: real reasoning includes verifiable intermediate steps (actual numbers, specific constraints); confabulation uses generic connecting phrases (“let’s consider,” “we can see that”).
The brutal truth: you often can’t tell from coherence alone. Confabulated text frequently sounds MORE convincing because it’s optimized for narrative flow. A 2024 ACL paper (arXiv:2406.04175) found LLM confabulations display higher narrativity and semantic coherence than truthful outputs. External verification is the only reliable method.
Does this mean we shouldn’t use chain-of-thought prompting?
Use it strategically. Chain-of-thought helps on novel problem-solving where you can verify each step – complex math, logic puzzles, multi-stage planning. Skip it for factual recall, questions about people or citations, and anywhere you can’t independently verify the reasoning. Extended thinking modes generate thousands of tokens, creating more opportunities for inhibition circuits to fail. Recent Anthropic research (January 2025) shows models can fake reasoning that looks faithful but isn’t. Reserve chain-of-thought for tasks where the intermediate steps are themselves checkable, and always verify the final answer against ground truth rather than trusting the reasoning trace.