Claude Code Extended Thinking: What the Text Actually Is

Claude Code’s extended thinking output isn’t the model’s raw reasoning – it’s a summary from a different model. Here’s what that means for debugging.

Taylor Kim · 2026-06-22 · 7 min read · Beginner

Two ways to read the gray italic text that appears when Claude Code thinks. Option A: treat it as the model’s actual reasoning – a window into what’s about to happen. Option B: treat it as a press release the model writes about itself. After the discussion that erupted on Hacker News this week around Patrick McCanna’s blog post, Option B is the only defensible read. And once you internalize that, you debug differently.

This guide walks through what the Claude Code extended thinking text actually is, how to inspect it yourself, and the failure modes nobody warned you about – including a bug that can permanently brick a session. It’s a beginner-level tutorial, but the framing assumes you’ve already typed “think hard” a few times and noticed the gray text scroll by.

The thing that just blew up: it’s a summary, not the reasoning

McCanna went to inspect Claude Code’s session logs and found a lengthy signature and no thinking text. He read the docs and posted a short, sharp argument: the text in Claude Code’s “Extended Thinking” output is not the model’s real reasoning. It’s a summary. Anthropic’s documentation backs him up – but in language that’s easy to skim past.

Here’s the buried part. Per Anthropic’s adaptive thinking docs, summarization is processed by a different model than the one you targeted, and the thinking model never sees the summary. So the gray italic text you’re staring at was written by a second model describing what it thinks the first model was doing. Then the real reasoning gets encrypted into a signature field your machine can’t decrypt.

Pro tip: When you ask a model to explain its own reasoning after the fact, you don’t get reasoning – you get a plausible story. The Claude Code thinking pane is closer to that than to a transcript. Use it as a signal, not as ground truth.

What you can verify on your own machine in 60 seconds

Don’t take my word for it. Open a Claude Code session, run something that triggers extended thinking, exit, then look at the session JSONL on disk. The path lives under your home directory:

~/.claude/projects/<project-slug>/<session-id>.jsonl
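
If you don't know which session file is the one you just closed, a small helper can pick the most recently modified one. This is a minimal sketch that assumes only the directory layout shown above; the function name `latest_session_file` is made up for illustration.

```python
from pathlib import Path
from typing import Optional


def latest_session_file(projects_dir: Path) -> Optional[Path]:
    """Return the most recently modified *.jsonl under projects_dir, or None."""
    # Layout assumed from the path above:
    #   <projects_dir>/<project-slug>/<session-id>.jsonl
    candidates = list(projects_dir.glob("*/*.jsonl"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


# Typical call:
#   latest_session_file(Path.home() / ".claude" / "projects")
```

Modification time is only a proxy for "the session I just ran"; if you have several sessions open at once, check the project slug as well.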

Filter for the thinking blocks. Each one has a type: "thinking", a thinking field with the summary text, and a signature field with hundreds (sometimes thousands) of base64 characters. The signature is the encrypted real reasoning. Anthropic holds the key; you don’t.

# One row per thinking block: visible-text length, then signature length
jq -rc 'select(.type=="assistant")
 | .message.content[]?
 | select(.type=="thinking")
 | [(.thinking|length),(.signature|length)]
 | @tsv' session.jsonl

Two columns: visible-text length, signature length. The signature is usually longer. That’s your proof. On some recent Claude Code versions, the thinking text gets stored as an empty string while the signature persists. If your thinking column reads 0 across the board, that’s a known regression – not your config.
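
If jq isn't installed, roughly the same check can be done in Python. The sketch below relies only on the fields described above (`type`, `message.content`, `thinking`, `signature`); the function names are illustrative, and it also counts the blocks whose text is empty while the signature survives.

```python
import json
from typing import Dict, Iterable, Iterator, Tuple


def thinking_lengths(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Yield (thinking_text_length, signature_length) for each thinking block."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip partial or corrupt lines
        if not isinstance(record, dict) or record.get("type") != "assistant":
            continue
        message = record.get("message")
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue  # content can also be a plain string
        for block in content:
            if isinstance(block, dict) and block.get("type") == "thinking":
                text = block.get("thinking") or ""
                signature = block.get("signature") or ""
                yield len(text), len(signature)


def summarize(lines: Iterable[str]) -> Dict[str, int]:
    """Count thinking blocks and those with empty text but a signature."""
    pairs = list(thinking_lengths(lines))
    empty = sum(1 for text_len, sig_len in pairs if text_len == 0 and sig_len > 0)
    return {"blocks": len(pairs), "empty_text_with_signature": empty}


# Typical call:
#   with open("session.jsonl", encoding="utf-8") as f:
#       print(summarize(f))
```

A nonzero `empty_text_with_signature` count is the empty-string case described above.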

Reading the text the way it deserves to be read

Treat the summary as a hypothesis about what Claude did, not a record. That sounds abstract, so here’s the concrete shift:

Useful: spotting when the summary mentions a file or assumption you didn’t intend. Course-correct with Ctrl+C, redirect, move on.
Useful: catching when the summary says “I’ll modify the middleware first” and you wanted the controller untouched. Stop it before the edit lands.
Not useful: using the summary as an audit trail. The doc language is, charitably, indirect, but Anthropic itself flagged the “faithfulness” problem in its original announcement: models often make decisions based on factors they never mention in their thinking.
Not useful: debugging why a wrong answer happened. The summary is generated post-hoc by a different model. It can plausibly describe steps the original model never took.
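
The two "useful" cases above can be partly mechanized if you capture the summary text. What follows is a rough heuristic sketch, not anything Claude Code provides: the regex, the function name `unexpected_files`, and the idea of an allow-list of files you intended to touch are all assumptions made for illustration.

```python
import re
from typing import Iterable, List

# Rough pattern for path-like tokens such as src/app/auth.ts or main.py.
# It will also match things like "e.g", so treat hits as prompts to look, not proof.
PATH_PATTERN = re.compile(r"[\w./-]*\w\.[A-Za-z]{1,5}\b")


def unexpected_files(summary: str, allowed: Iterable[str]) -> List[str]:
    """Return path-like tokens in the summary that are not in the allow-list."""
    allowed_set = set(allowed)
    flagged: List[str] = []
    for token in PATH_PATTERN.findall(summary):
        if token not in allowed_set and token not in flagged:
            flagged.append(token)
    return flagged


# Typical call:
#   unexpected_files(summary_text, ["src/middleware/auth.ts"])
```

Because the summary is itself a hypothesis, an empty result here doesn't mean the model will leave other files alone; it only means the summarizer didn't mention them.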

Much of the tutorial ecosystem calls the gray text “the actual chain of thought.” That framing is wrong on Claude 4-series models. The Ctrl+O verbose toggle still works fine – it just shows you the summarizer’s output, not the thinker’s.

Three traps that don’t appear in any other tutorial

This is the part worth bookmarking. None of these show up in the standard “how to use ultrathink” guides.

| Trap | What happens | What to do |
| --- | --- | --- |
| Session resume poisoning | Claude Code persists thinking blocks with empty `thinking` text plus the original signature. On resume, the API validates the signature against the (now empty) text and returns a 400 forever. Documented in issue #63147 on Claude Code 2.1.153. | Start a fresh session instead of resuming a broken one. Don’t rely on `--continue` after extended-thinking + tool-use sessions until a fix lands. |
| Subscription auth has no summary | The `showThinkingSummaries` setting exists in the Claude Code schema, but the server doesn’t honor `thinking.display` for subscription-authenticated sessions – only API-key sessions get it. Issue #52376. | If your thinking pane is empty and you’re on a Pro/Max subscription, that’s why. An API key swap restores it. As of mid-2026, this is open. |
| Zero reasoning on “easy” turns | Adaptive thinking at the default `effort: medium` can allocate zero reasoning tokens for turns it considers simple – and Boris Cherny himself acknowledged on HN that those exact turns produced fabricated Stripe API versions, git SHA suffixes, and apt package names. | For correctness-critical work, set `CLAUDE_CODE_EFFORT_LEVEL=max` and `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1`. Verify in your transcript that thinking blocks aren’t empty. |
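
For the third trap, one way to avoid forgetting the overrides is to build the environment in a wrapper script. This sketch uses only the two variable names and values from the table above; the helper name `correctness_env` is hypothetical, and the commented launch line assumes the CLI is on your PATH as `claude`.

```python
import os
from typing import Dict, Mapping, Optional

# Names and values come from the table above; nothing else about
# Claude Code's configuration is assumed here.
CORRECTNESS_OVERRIDES = {
    "CLAUDE_CODE_EFFORT_LEVEL": "max",
    "CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING": "1",
}


def correctness_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the overrides applied."""
    env = dict(os.environ if base is None else base)
    env.update(CORRECTNESS_OVERRIDES)
    return env


# Example launch (assumption: the binary is named `claude`):
#   import subprocess
#   subprocess.run(["claude"], env=correctness_env(), check=False)
```

Setting the variables doesn't replace the last step in the table: still confirm in the transcript that thinking blocks aren't empty.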

The third one is recent and worth context: Claude Code switched its default effort from high to medium on March 3, 2026, and an AMD engineer’s analysis of 6,852 sessions measured a 67% drop in reasoning compared to the pre-February period. That’s the public number; your mileage will vary, but the direction is clear.

How this compares to what other coding agents show you

One extra verification step – that’s the real cost here. OpenAI’s o-series models don’t expose their reasoning in most consumer surfaces; they show a separate summary stream and label it as such. Claude Code’s UI presents its summary inline as “Thinking…” in italic gray, which is what makes it so easy to mistake for the real thing. The label does the priming.

So: check the JSONL. If the signature is present and the text reads like a clean narrative, you’re looking at a summary. If you need the raw chain of thought for an audit or compliance use case, Anthropic’s extended thinking docs note that full thinking access is gated behind enterprise contact. There’s no flag, no env var, no Pro-tier enable.

FAQ

Is the gray italic “Thinking…” text completely useless, then?

No. It’s useful as a live cancellation signal – if the summary mentions a wrong file or a misread requirement, Ctrl+C and redirect before code lands. Just don’t use it as evidence of what the model actually did.

If I’m building an agent on the API and need to verify reasoning, what do I do?

Set thinking.display: "summarized" explicitly – the default behavior can vary by model version, so don’t assume you’re getting it back without specifying. Then design your verification around outputs and tool calls, not the thinking text. A pattern that works: after each tool call, ask the model to state – in the visible response, not the thinking block – which prior assumption it just confirmed or invalidated. That gives you a record the summarizer model can’t rewrite.
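
Here is a minimal sketch of that pattern using plain dictionaries rather than any SDK. The `display` key is the setting named above and is taken on this article's word; the block shapes (`text`, `tool_use`, `thinking`) follow the transcript fields discussed earlier, and the helper names and prompt wording are illustrative assumptions.

```python
from typing import Any, Dict, List

# Sent after each tool result so the answer lands in the visible reply.
CHECKPOINT_PROMPT = (
    "Before continuing, state in your visible reply which prior assumption "
    "the last tool result confirmed or invalidated."
)


def with_explicit_display(thinking: Dict[str, Any]) -> Dict[str, Any]:
    """Copy a thinking config and set display explicitly instead of relying on defaults."""
    return {**thinking, "display": "summarized"}


def audit_record(content: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Keep only what the main model emitted visibly: text and tool calls.

    Thinking blocks are dropped on purpose, since their text is a summary.
    """
    return {
        "visible_text": [b.get("text", "") for b in content if b.get("type") == "text"],
        "tool_calls": [
            {"name": b.get("name"), "input": b.get("input")}
            for b in content
            if b.get("type") == "tool_use"
        ],
    }
```

Storing one `audit_record` per turn gives you a log built from outputs and tool calls only, which is the part of the response the summarizer never touches.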

Does this mean Anthropic is being deceptive?

Their docs do say it. They just say it in a place and with phrasing that’s easy to miss – and the Claude Code UI labels the pane “Thinking,” which primes you to read it literally. The Hacker News commenters were less charitable; the official position is that summarization “prevents misuse” and preserves the key ideas. Both can be true at once.

Next: open a Claude Code session right now, run any task that triggers thinking, and pipe the session JSONL through the jq command above. Look at one signature-vs-text pair with your own eyes. That single observation will change how you read the gray italic text from here on.
