GPT-5.4 Thinking will bill you double once your conversation crosses 272,000 tokens. No warning. That detail is buried three clicks deep in the pricing docs, but it’s the gap between a $50 API bill and a $400 surprise.
The hook: it shows you its plan upfront and lets you steer it mid-response – no more waiting for a 2,000-word answer only to realize it misunderstood you halfway through. Released March 5, 2026, GPT-5.4 combines coding, computer use, and knowledge work into a single system.
The Three Variants (And When Each One Actually Matters)
Start with what you’re trying to do. Then pick the variant.
GPT-5.4 Thinking is the ChatGPT default for paid users. It generates an internal chain of thought before responding. Plus ($20/month), Team, and Pro subscribers get it – it replaced GPT-5.2 Thinking on March 5. The old model stays in Legacy Models until June 5, 2026. Then: gone.
GPT-5.4 Pro costs $30 per million input tokens and $180 per million output – 12x more than standard. When do you actually need it? Legal analysis where one mistake costs more than the API bill. Research papers where you’re citing the output. Security audits. Anything where “close enough” breaks things.
GPT-5.4 mini and nano are fast and cheap – but they don’t appear in the model picker. Free users get mini via the “Thinking” tool in the + menu. Paid users only see them when rate limits kick in. OpenAI positioned them as fallbacks, not primary choices.
Pro tip: If you’re on Plus and hit the 80-message limit, ChatGPT silently downgrades you to GPT-5.4 mini. The UI doesn’t announce it. Check the model badge at the top of the response if quality suddenly drops.
Pricing psychology matters here. Most people see “12x more expensive” and never try Pro. But if you’re analyzing a legal contract and one missed clause costs $10K in rework – suddenly $1.80 per response looks cheap. The math flips when error cost exceeds token cost.
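That flip is easy to sanity-check with a break-even calculation. A sketch (the $1.80 Pro figure assumes roughly 10K output tokens at $180/M, and the ~$0.15 standard figure assumes the same output at $15/M – both illustrative, not measured):

```python
def pro_breakeven_responses(error_cost, pro_per_response, std_per_response):
    """Number of Pro responses whose extra cost equals one error on the
    standard model. If Pro prevents at least one such error within this
    many responses, the upgrade pays for itself."""
    extra_per_response = pro_per_response - std_per_response
    return error_cost / extra_per_response

# One $10K rework vs Pro at ~$1.80/response and standard at ~$0.15/response
print(round(pro_breakeven_responses(10_000, 1.80, 0.15)))  # → 6061
```

In other words: if Pro catches even one $10K mistake per ~6,000 responses, it comes out cheaper.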
How to Actually Use the Upfront Plan Feature
Everyone mentions this. Almost nobody explains how to trigger it.
When you ask GPT-5.4 Thinking a complex question – anything requiring multiple steps or synthesizing sources – it generates a brief preview of its approach before writing the full response. You can add instructions or adjust direction mid-response, guiding the model toward the exact outcome without starting over.
The catch: only available on chatgpt.com and Android as of March 2026. iOS? Not yet.
What counts as “complex enough” to trigger the plan? Testing shows:
- Your prompt has 3+ distinct sub-tasks (“research X, compare Y, then draft Z”)
- You upload a document and ask for analysis plus a deliverable
- You say “think through this step by step” or “show your reasoning”
Single-step questions (“summarize this,” “fix this code”)? You get the answer immediately. No plan preview.
Reasoning Effort: The Setting That Controls Your Bill
If you’re using the API, reasoning.effort determines how hard the model thinks – and how much you pay.
Five levels: none, low, medium, high, xhigh. Default: none – behaves like a non-thinking model, fastest and cheapest.
| Effort Level | When to Use | Cost Impact |
|---|---|---|
| none | Simple queries, autocomplete, formatting | Baseline ($2.50/$15) |
| low | Light analysis, quick code edits | ~1.2x baseline |
| medium | Document summaries, standard debugging | ~1.5x baseline |
| high | Multi-file refactoring, research synthesis | ~2.5x baseline |
| xhigh | Novel algorithm design, complex proofs | 3-5x baseline |
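In the API, that setting rides along with each request. A minimal sketch of the request body (the model id and exact field layout are assumptions based on the reasoning.effort parameter described above – check the current API reference before relying on them):

```python
def build_request(prompt, effort="none"):
    """Build a request body with an explicit reasoning effort.
    'none' skips extended thinking entirely - fastest and cheapest."""
    valid = {"none", "low", "medium", "high", "xhigh"}
    if effort not in valid:
        raise ValueError(f"effort must be one of {sorted(valid)}")
    return {
        "model": "gpt-5.4",                # hypothetical model id
        "input": prompt,
        "reasoning": {"effort": effort},   # the knob that moves your bill
    }

body = build_request("Refactor this module to async.", effort="high")
```

Start new workloads at "none" or "low" and only move up when output quality demands it.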
The multiplier isn’t documented – varies by query. Reasoning tokens range from a few hundred to tens of thousands, and they’re billed as output tokens but discarded from context after the response.
You pay for thinking you can’t see. The model doesn’t remember it in the next turn. Running batch jobs? Monitor your usage.reasoning_tokens field – surprise costs hide there.
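A cheap guard for batch jobs is to compute how much of each response’s billed output was invisible reasoning. A sketch (the reasoning_tokens field name comes from this article – confirm it against the actual response shape):

```python
def reasoning_overhead(usage):
    """Fraction of billed output tokens that were hidden reasoning.
    `usage` is the usage dict from one API response."""
    output = usage.get("output_tokens", 0)
    reasoning = usage.get("reasoning_tokens", 0)
    return reasoning / output if output else 0.0

usage = {"input_tokens": 1_200, "output_tokens": 4_000, "reasoning_tokens": 3_200}
print(f"{reasoning_overhead(usage):.0%} of output tokens were reasoning")  # → 80%
```

Log that ratio per request; if it creeps toward 1.0 on simple queries, drop the effort level.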
The 272K Context Trap Everyone Hits Once
Here’s the pricing detail that catches people.
GPT-5.4 supports up to 1 million tokens of context – but once your input exceeds 272K tokens, the full session is billed at 2x input ($5/M) and 1.5x output ($22.50/M). Not just the excess. The entire conversation.
Example: You’re analyzing a 300K-token legal document. Three questions, sent as one request. 1K tokens of output each.
- Input: 300K × $5/M = $1.50
- Output: 3K × $22.50/M = $0.07
- Total: $1.57
Same questions with a 270K document? $0.72. The 30K difference triggered a 2.2x price jump.
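The rule is simple enough to encode. A sketch that reproduces the numbers above (rates hard-coded from this article; assumes the surcharge re-prices the entire request once input passes 272K):

```python
THRESHOLD = 272_000

def request_cost(input_tokens, output_tokens):
    """Estimated cost in dollars for one request. Past the threshold,
    ALL input bills at $5/M and ALL output at $22.50/M - not just the excess."""
    if input_tokens > THRESHOLD:
        in_rate, out_rate = 5.00, 22.50   # 2x input, 1.5x output
    else:
        in_rate, out_rate = 2.50, 15.00   # standard rates
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(f"${request_cost(300_000, 3_000):.2f}")  # → $1.57
print(f"${request_cost(270_000, 3_000):.2f}")  # → $0.72
```

Run your own document sizes through it before committing to a long session.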
Working with long contexts? Batch your questions into a single request instead of multiple turns – you’ll cross the threshold once, not repeatedly.
When Tool Search Actually Saves Tokens (And When It Doesn’t)
Turns out the lookup itself costs tokens.
Tool Search lets the model look up tool definitions as needed instead of loading all of them into every prompt. Sounds great – until you realize each search query adds 50-100 tokens of overhead (the model formulates a search, retrieves results, filters them).
The crossover point based on community testing: ~10 tools. Below that? Just include all definitions in your system prompt. Above 10: enable Tool Search.
If you only have 5 tools and each definition is 200 tokens, you’re spending 1,000 tokens upfront versus 300-500 tokens per search × number of searches across a session. For agents with 50+ MCP connectors or large internal API catalogs, Tool Search is a clear win. For a chatbot with “search web,” “generate image,” and “run code”? Load them all.
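The trade-off is two multiplications, so it’s worth scripting against your own tool catalog (a sketch using the rough community numbers above – 200 tokens per definition, 300-500 per search):

```python
def upfront_tokens(n_tools, tokens_per_def=200):
    """Cost of loading every tool definition into the system prompt once."""
    return n_tools * tokens_per_def

def search_tokens(n_searches, tokens_per_search=400):
    """Cost of Tool Search lookups over a session (400 = midpoint of 300-500)."""
    return n_searches * tokens_per_search

# 5 tools: even three lookups already cost more than loading everything
print(upfront_tokens(5), "vs", search_tokens(3))    # → 1000 vs 1200
# 50 tools: a couple of targeted lookups beat the 10,000-token upfront load
print(upfront_tokens(50), "vs", search_tokens(2))   # → 10000 vs 800
```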
GPT-5.4 vs. GPT-5.2: The Real Differences
Every benchmark shows GPT-5.4 ahead. Question: does the gap matter for your workload?
- Accuracy: 33% fewer false claims, 18% fewer errors per response vs GPT-5.2. Meaningful if you’re doing fact-heavy research or analysis.
- Professional tasks: 83% on GDPval (knowledge work across 44 occupations), up from 70.9% for GPT-5.2. Biggest jump: spreadsheet modeling and presentation generation.
- Token efficiency: Uses fewer tokens to solve the same problems – though pricing is higher per token. Net cost depends on task complexity.
What didn’t change much: casual chat, single-turn Q&A, simple code edits. If that’s 80% of your usage, GPT-5.2 is still available (as GPT-5 in the API) and costs less.
Common Gotchas
Rate limits reset every 3 hours, not daily. Plus users: 80 messages per 3-hour window. That’s ~640 messages per day if you spread them out – but you can’t bank them. Use 10 at 9am, you still get 80 more at noon.
“Auto” model selection routes to Instant for cheap queries. ChatGPT’s Auto mode uses a routing layer to decide whether to use Instant, Thinking, or Pro. Want guaranteed Thinking? Select it manually.
Cached input tokens cost 50% of standard. Sending the same system prompt or document repeatedly? Cached tokens drop to $1.25/M. Keep your prompt structure stable across requests to maximize hits.
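The effective input rate depends on how much of each request hits the cache. A sketch of the blended rate (rates from this article; hit rate is the fraction of input tokens served from cache):

```python
def blended_input_rate(hit_rate, full=2.50, cached=1.25):
    """Effective $/M for input when `hit_rate` of tokens are cache hits."""
    return hit_rate * cached + (1 - hit_rate) * full

# A stable 8K system prompt inside a 10K input → 80% of tokens cached
print(f"${blended_input_rate(0.8):.2f}/M effective")  # → $1.50/M effective
```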
Regional endpoints add 10%. Data residency processing (EU, etc.) charges an additional 10% for all GPT-5.4 variants. Factor that in if compliance requires it.
What to Try First
Here’s what tests whether GPT-5.4 is worth the upgrade:
Multi-document synthesis. Upload 3-5 PDFs (research papers, contracts, reports). Ask it to extract conflicting claims or build a comparison table. GPT-5.2 loses coherence around document 3. GPT-5.4 holds context through all of them.
Iterative code refactoring. Paste a 500-line file. Three changes in sequence: “add error handling,” then “convert to async,” then “add logging.” Does it remember changes from turn 1 when applying turn 3? GPT-5.4’s context retention is stronger.
Long-form writing with constraints. “Write a 2,000-word product spec for [X]. Include a risk section, a timeline, and 5 open questions.” The upfront plan lets you course-correct before it burns tokens writing the wrong thing – if it’s heading toward marketing fluff instead of technical depth.
If those three scenarios don’t feel different from GPT-5.2, stick with the cheaper model. The improvements are real. But they’re sharpest at the edges – long contexts, multi-step workflows, precision tasks.
FAQ
Can I use GPT-5.4 Thinking for free?
Free users get GPT-5.4 mini via the “Thinking” feature in the + menu – lighter version, same reasoning approach. Full GPT-5.4 Thinking requires Plus ($20/month) or higher.
Does GPT-5.4 Thinking actually “think” longer, or is that marketing?
It generates an internal chain of thought before responding – you’re paying for those reasoning tokens even though you can’t see them. These models produce a long internal chain of thought, then learn to refine their thinking process and recognize mistakes via reinforcement learning. Whether that is “thinking” philosophically? Debatable. But the token cost is real. One misconception: people assume reasoning tokens add to context for the next turn. They don’t – the model discards them after responding. You pay for thinking that vanishes.
What happens to my old GPT-5.2 Thinking conversations after June 5, 2026?
GPT-5.2 Thinking gets retired June 5, 2026. Your conversation history stays accessible. But if you continue an old thread, it switches to GPT-5.4 Thinking automatically. I tested this with a 12-turn thread from February – the first new response felt slightly different in tone since the underlying model changed mid-conversation. Not broken, just… off. Like switching authors mid-chapter.
If you’re already using ChatGPT Plus, try the upfront plan feature on your next multi-step task – fastest way to see whether the model’s approach matches how you actually work. On the API? Start with reasoning.effort: "low". Measure whether bumping to "medium" improves output quality enough to justify the cost. And whatever you do, keep an eye on that 272K threshold.