
Opus 4.6 Just Defaulted to 1M Context – Here’s What That Changes

Anthropic dropped the long-context premium for Opus 4.6 & Sonnet 4.6. Same $5/$25 pricing at 900K tokens as 9K. Here's what works (and what breaks) at scale.

8 min read · Beginner

Most AI upgrades cost more or break something. Anthropic just shipped one that’s neither. That’s the part nobody’s talking about.

March 13, 2026: Opus 4.6 and Sonnet 4.6 defaulted to 1M context at standard pricing – no long-context premium. $5/$25 per million tokens for Opus, $3/$15 for Sonnet, across the full window: a 900K request costs the same per-token rate as a 9K one. Not a beta. Not gated. Max, Team, or Enterprise? Opus 4.6 sessions default to 1M context automatically.
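The flat pricing is easy to sanity-check. A minimal sketch using the rates quoted above – the model labels are just dictionary keys here, not official API identifiers:

```python
# Per-request cost at the flat rates quoted above:
# $5/$25 per million tokens for Opus, $3/$15 for Sonnet.
# Rates come from the article, not a live pricing API.

RATES = {            # (input, output) USD per 1M tokens
    "opus-4.6":   (5.00, 25.00),
    "sonnet-4.6": (3.00, 15.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int = 0) -> float:
    """Cost in USD; the same per-token rate applies at 9K and at 900K."""
    rate_in, rate_out = RATES[model]
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000

# A 900K-token input request costs 100x a 9K one -- at the same rate.
print(request_cost("opus-4.6", 900_000))  # 4.5
```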

What the Announcement Skipped

The official blog post: 5x more context, 6x more media, same price. True. What it didn’t say: Max subscribers started hitting quota walls faster than with Opus 4.5 – some before they even realized they’d been upgraded. Quota allocations don’t seem calibrated for 4.6’s consumption.

One thing you should know: users burn through 60% of session limits in 30 minutes of routine coding; some can’t work more than 15 minutes before hitting a wall. Reddit testing: 6-8% session quota per Opus 4.6 prompt vs ~4% on 4.5. Same per-token price. Way more tokens per task.

Why? Adaptive thinking: 2-5x more tokens per request than Opus 4.5. The model thinks harder by default. You pay for those thinking tokens even when they don’t show in the output. Same rate per token. Higher bills.

What 1 Million Tokens Actually Buys

750,000 words. About 10-15 novels, or a 3,000-page manual. For code: an entire medium-sized repo with docs, tests, config – loaded simultaneously.
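The sizing follows from the common ~0.75 words-per-token heuristic for English prose; the per-novel and per-page word counts below are rough assumptions, not official figures:

```python
# Back-of-envelope sizing for a 1M-token window.
# 0.75 words/token is a rough English-prose average; novel and
# page lengths are assumptions chosen for illustration.

TOKENS = 1_000_000
WORDS_PER_TOKEN = 0.75

words = int(TOKENS * WORDS_PER_TOKEN)   # 750_000 words
novels = words / 75_000                 # at ~75K words per novel
pages = words / 250                     # at ~250 words per page

print(words, round(novels), round(pages))  # 750000 10 3000
```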

Media limits: now 600 images or PDF pages per request, up from 100. Design systems, scanned contracts, documentation sets – this matters more than token count for some work.

Pro users: type /extra-usage in Claude Code to enable 1M context – it doesn’t auto-enable. Max, Team, and Enterprise get it by default. Most tutorials skip this.

The real test? Recall. Opus 4.6 scores 78.3% on MRCR v2 at 1M tokens – highest among frontier models at that context length. MRCR hides 8 pieces of key information across 1 million tokens, asks the model to retrieve all. Sonnet 4.5: 18.5% on the same test. 4x gap.

The 40% Threshold

1M tokens available doesn’t mean use all of them. During long Claude Code sessions, users noticed degraded performance well before the 50% mark. Around 20% context usage, the model starts losing track of earlier decisions and reasoning in circles; by 40%, automatic compression kicks in and Claude self-reports 0.6 normalized degradation: ‘noticeably worse’ performance, more repetition, more forgetting.

Not a bug. Anthropic calls it ‘context rot’ – performance degradation as windows fill – and addresses it with automatic context compaction. As a conversation approaches the limit, the API identifies earlier portions that can be summarized, compresses them into a condensed state while preserving key information, and the model continues with the compacted context.
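The mechanism can be sketched conceptually. This is an illustration of the idea, not Anthropic’s implementation, and summarize() is a placeholder for whatever actually condenses earlier turns:

```python
# Conceptual sketch of context compaction, NOT Anthropic's code.
# summarize() is a stand-in: a real system would call the model
# to condense the older turns into a short state description.

def summarize(messages):
    # Placeholder summary message for the folded-away turns.
    return {"role": "system", "content": f"[summary of {len(messages)} earlier turns]"}

def compact(history, token_counts, limit, keep_recent=4):
    """When the running total nears the limit, fold the oldest turns
    into one condensed message and keep the recent ones verbatim."""
    if sum(token_counts) < limit * 0.9:   # not near the ceiling yet
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent

history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
compacted = compact(history, token_counts=[95_000] * 10, limit=1_000_000)
print(len(compacted))  # 5: one summary + the 4 most recent turns
```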

Compaction helps. Doesn’t eliminate it. Many users report best performance keeping context under 700K, some restart around 150K for max coherence.

Old guidance: clear at 100K-120K. With Opus 4.6’s curve, 200K+ is defensible. But you’re trading coherence for convenience. Exploratory work – debugging, research, multi-file refactoring? Hold the session. Precision matters? Restart earlier.

How to Use This Without Burning Quota

| Task Type | Context Strategy | Why |
|---|---|---|
| Quick fixes, formatting, simple queries | Regular 200K context | Loading 1M for a 3-line bug fix wastes 5x tokens |
| Multi-file refactoring, codebase analysis | 1M context, restart at ~400K | Degradation starts around 40%; compaction overhead adds up |
| Legal contract review, research synthesis | 1M context, batch related tasks | Context loading is expensive – reuse the loaded window |
| Agent workflows with tool calls | 1M context, monitor token count | Each tool call adds to context; long sessions degrade fast |
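The table collapses to a small lookup. A sketch, with task labels and thresholds taken from the table above rather than from any official guidance:

```python
# Strategy table as a lookup. Task categories and restart thresholds
# are the article's, not an Anthropic recommendation.

STRATEGY = {
    "quick_fix":  {"context": "200K", "restart_at": None},
    "refactor":   {"context": "1M",   "restart_at": 400_000},
    "doc_review": {"context": "1M",   "restart_at": None},     # batch related tasks
    "agent":      {"context": "1M",   "restart_at": 400_000},  # monitor token count
}

def plan(task_type: str, current_tokens: int = 0) -> str:
    s = STRATEGY[task_type]
    if s["restart_at"] and current_tokens >= s["restart_at"]:
        return "restart session"
    return f"use {s['context']} context"

print(plan("quick_fix"))          # use 200K context
print(plan("refactor", 450_000))  # restart session
```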

Pattern that works: default to regular context. Switch to 1M only when you genuinely need the full picture. One developer tracked sessions for a month: most expensive sessions weren’t doing complex work – simple tasks with massively inflated context.

Paying attention changes behavior. Before tracking: “Load everything.” After: “$2.40 and I’ve only asked three questions.” A 900K-token Opus 4.6 session: roughly $4.50 in input tokens alone. Fine for one-off research. Not sustainable daily.

Actually, there’s a deeper pattern here: we optimize for the wrong metric. Context capacity feels like a feature to max out. But the real skill? Knowing when not to use it. Like having a truck – great for moving furniture, overkill for groceries. The question isn’t “can I fit this in 1M tokens?” It’s “should I?”

Where This Matters vs Overkill

Load an entire codebase, thousands of pages of contracts, or a full agent trace – tool calls, observations, intermediate reasoning – and use it directly. The engineering work that long-context tasks previously required – lossy summarization, context clearing – is no longer needed: the full conversation stays intact.

Real use cases from early adopters:

  • Claude Code: search, re-search, aggregate edge cases, propose fixes – all in one window without compaction killing mid-task
  • Large diffs that didn’t fit in 200K had to be chunked: more passes, lost cross-file dependencies. With 1M context, feeding the full diff yields higher-quality reviews from simpler, more token-efficient use
  • Cross-referencing a 400-page deposition transcript or surfacing key connections across entire case file: expanded context delivers materially higher-quality answers

Context size isn’t the bottleneck for most tasks. Asking Claude to write a function, fix a typo, explain a concept? Loading 1M tokens is renting a truck to move a chair.

The Pricing Trap

Opus 4.6 replaced the binary thinking toggle (on/off) with four effort levels: low, medium, high (default), and max. At the default (high), Claude almost always thinks; at lower levels, it may skip thinking for simpler problems.

The trap: you pay for thinking tokens even when you don’t need them. Simple lookup with default settings burns tokens Claude spends “planning” an answer that should be instant. Model’s smarter, but only if you tell it when NOT to think.

Set effort to “low” for: formatting, quick lookups, simple refactors, FAQ answers. Reserve “high” or “max” for: debugging, architecture decisions, complex refactoring, security audits. Difference compounds over hundreds of requests.
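That routing rule is simple enough to encode. A heuristic sketch – the task labels are illustrative, and this is not an Anthropic API:

```python
# Heuristic mapping from task type to the four effort levels the
# article describes (low / medium / high / max). Labels are
# illustrative assumptions, not API parameters.

EFFORT = {
    "formatting": "low", "lookup": "low", "simple_refactor": "low", "faq": "low",
    "debugging": "high", "architecture": "high",
    "complex_refactor": "high", "security_audit": "max",
}

def effort_for(task: str) -> str:
    # Default to medium rather than the model's default of high:
    # pay for deep thinking only when the task earns it.
    return EFFORT.get(task, "medium")

print(effort_for("formatting"))      # low
print(effort_for("security_audit"))  # max
```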

Competitors Playing Catch-Up

Unlike Gemini 3.1 Pro and GPT-5.4, which charge higher rates beyond 200K-272K tokens, Anthropic applies standard pricing across the full 1M window. GPT-5.4’s pricing doubles when input exceeds 272K: the most powerful GPT-5.4 model (context length >272K) runs $60/1M input, $270/1M output. Claude Opus 4.6: $5/1M input, $25/1M output.

At standard context (400K), Opus 4.6 is dramatically cheaper. Opus 4.6 scored 76% on MRCR v2 vs Gemini 3 Pro’s 26.3% – stronger performance retaining and reasoning over very large inputs.
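Using the figures quoted above (the article’s numbers, not live pricing), the long-context input gap works out like this:

```python
# Long-context input cost comparison at 900K tokens, using the
# article's quoted rates: GPT-5.4 beyond 272K at $60/1M input
# vs Opus 4.6 at $5/1M input. Not live pricing.

def input_cost(tokens: int, rate_per_million: float) -> float:
    return tokens * rate_per_million / 1_000_000

tokens = 900_000
gpt = input_cost(tokens, 60.0)   # 54.0
opus = input_cost(tokens, 5.0)   # 4.5
print(f"{gpt / opus:.0f}x")      # 12x
```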

The angle isn’t just price. Pricing change makes large document analysis, codebase review, research workflows more practical to budget without breaking work into smaller chunks. You’re not strategizing around quota limits. Just working.

What to Do Now

Max, Team, or Enterprise: you have this. Open Claude Code, select Opus 4.6, check context with /context. You’ll see ~950K tokens available.

Pro: type /extra-usage in Claude Code to enable 1M context. Opt-in, not automatic.

Managing API costs: track token consumption per session. Running a live cost counter (like TokenBar) completely changes how you prompt – behavioral shift is immediate when you see session at $2.40 after three questions. Can’t improve what you can’t see.
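A minimal running cost counter in the spirit of tools like TokenBar – a sketch, not that tool’s implementation, with the Opus 4.6 rates from this article:

```python
# Running session cost meter. Rates are the article's Opus 4.6
# figures ($5/$25 per 1M tokens), not fetched from any API.

class CostMeter:
    def __init__(self, rate_in: float = 5.00, rate_out: float = 25.00):
        self.rate_in, self.rate_out = rate_in, rate_out
        self.total = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add one request's cost and return the running total in USD."""
        self.total += (input_tokens * self.rate_in +
                       output_tokens * self.rate_out) / 1_000_000
        return self.total

meter = CostMeter()
for _ in range(3):                # three questions with heavy context loaded
    meter.record(150_000, 2_000)
print(f"${meter.total:.2f}")      # $2.40 -- the number that changes behavior
```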

Set effort levels explicitly. Don’t let the model burn tokens thinking about tasks that don’t need it. For most daily work, “low” or “medium” effort is enough. Reserve “high” and “max” for the 10% of tasks where deeper reasoning actually changes output.

Restart when performance degrades. The 1M ceiling is theoretical; the practical ceiling is closer to 400K-500K before the model loses the thread. With Opus 4.6’s degradation curve, clearing at 200K or beyond is now defensible depending on use case: if you can clear at 200K, do it, but holding a session past 200K – for large codebases or long-running agents – no longer requires the workarounds it once did.

FAQ

Does the 1M context window work on the free plan?

No. 1M context is included in Claude Code for Max, Team, and Enterprise users with Opus 4.6. Pro users need /extra-usage. Free plan: standard 200K limit.

Why is my Opus 4.6 session hitting limits faster than Opus 4.5 did?

Opus 4.6’s adaptive thinking generates 2-5x more tokens per request than Opus 4.5. The per-token price didn’t change – tokens-per-task did. Quotas that worked fine for 4.5 don’t seem calibrated for 4.6’s consumption. If 4.6 is genuinely more capable and that capability costs more tokens, Anthropic should at least acknowledge the difference – adjust quotas, or document the expected increase – so users can make informed model choices.

Lower your effort setting or switch to Sonnet for routine tasks. Also: are you loading 1M context for simple queries? That’s the other trap. Most tasks don’t need the full window. A debugging session that used 200K on 4.5 might burn 400K+ on 4.6 if you’re not paying attention to context size. Check /context before each session, restart when you hit 300K-400K unless you genuinely need the full history.

At what point should I restart a long session to avoid degradation?

Degraded performance shows up around 20% context usage; compression kicks in at 40%, with Claude self-reporting 0.6 normalized degradation – ‘noticeably worse’ performance. Safe zone: 200K-400K. Beyond that, you’re trading coherence for continuity.

Precision matters? Restart earlier. Need full session history (debugging, research)? Push to 500K-700K, but expect the model to lose track of earlier decisions. Working rule: ~2% effectiveness loss per 100K tokens added in Claude Code, extrapolated from linear degradation between 256K and 1M data points – reasonable assumption, not confirmed. One user’s test: at 600K tokens, Claude started contradicting advice it gave at 150K. At 800K, it forgot the project structure entirely. Your mileage will vary depending on task complexity and how much cross-referencing the model needs to do.
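The working rule above can be written down directly – keep in mind it is a linear extrapolation, an assumption rather than a measured curve:

```python
# The article's working rule: ~2% effectiveness loss per 100K tokens,
# linearly extrapolated between the 256K and 1M data points.
# An assumption for planning, not a confirmed degradation model.

def effectiveness(context_tokens: int) -> float:
    loss = 0.02 * (context_tokens / 100_000)
    return max(0.0, 1.0 - loss)

print(effectiveness(150_000))  # ~0.97 -- still coherent
print(effectiveness(600_000))  # ~0.88 -- contradictions start appearing
print(effectiveness(800_000))  # ~0.84 -- forgetting project structure
```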