The worst assumption you can make right now is that your Claude Code usage limits are the problem.
Here’s what actually happened: on March 31, 2026, Anthropic accidentally shipped the full source code of Claude Code to npm via a .map file, exposing 512,000 lines of internal code. Buried in that code was a comment that explained everything.
“BQ 2026-03-10: 1,279 sessions had 50+ consecutive failures (up to 3,272) in a single session, wasting ~250K API calls/day globally.”
Not a policy change. A bug.
The Real Problem: Autocompact Was Stuck in a Loop
Claude Code has a feature called autocompact – when your conversation gets too long, it automatically summarizes older messages to free up space. Keeps the context window clean. You don’t have to manually run /compact.
When it fails? Retries. Again. And again.
The fix was three lines of code: MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3. After 3 consecutive compaction failures, it stops. Sometimes good engineering is knowing when to quit.
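The pattern behind that fix is a simple consecutive-failure circuit breaker. A minimal sketch of the idea (the constant name mirrors the one in the leak; `compact` here is a hypothetical stand-in for the real compaction call, not Anthropic's actual code):

```python
# Illustrative circuit-breaker sketch, not Anthropic's actual implementation.
MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3

def autocompact_with_breaker(compact, messages):
    """Try to compact the conversation; give up after 3 consecutive failures."""
    failures = 0
    while failures < MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES:
        try:
            return compact(messages)
        except RuntimeError:
            failures += 1
    # Stop retrying: the context stays long, but no more calls are wasted.
    return messages
```

Without the counter, a `compact` that always throws would retry forever – which is exactly what 3,272 retries in a single session looks like.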
The bug had been burning tokens for weeks. Users on the $100/month Max 5 plan used up their entire quota in 1 hour – they used to work 8 hours. One Max 20x subscriber ($200/month) watched usage jump from 21% to 100% on a single prompt.
Was This Even Fixed?
Yes. And no.
Anthropic officially patched the autocompact loop in their April 1, 2026 changelog – the circuit breaker is now live. Latest version of Claude Code (2.1.89 or higher)? You have the fix.
Pro tip: Run claude --version right now. Anything older than 2.1.89? Update. That circuit breaker alone could cut your token usage by 10-20x if you were hitting the autocompact loop.
But the token drain problem isn’t just one bug.
Three overlapping issues hit at the same time.
Three Things That Broke At Once
The autocompact loop (now fixed): 1,279 sessions were retrying compaction up to 3,272 times in a single session, wasting around 250,000 API calls per day globally. If you were one of those users, your token usage just dropped after the patch.
Peak-hour throttling (not a bug): On March 26, 2026, Anthropic announced they’re adjusting 5-hour session limits during peak hours (weekdays 5am-11am PT / 1pm-7pm GMT). During these hours, you burn through your session limit faster. Your weekly total stays the same – it’s just redistributed.
Anthropic doesn’t publish exact token counts, so you can’t see the multiplier. User reports suggest the same task that uses 20% of your quota off-peak might use 35-40% during peak hours.
The March 2026 promotion ended: Anthropic ran a promotion through March 28, 2026 that doubled usage limits during off-peak hours. When that ended, people who’d gotten used to the extra capacity suddenly felt squeezed – their actual limits just returned to normal.
All three hit the same week.
Method A: Wait for Anthropic vs. Method B: Fix It Yourself
Two options.
Method A: Update Claude Code and trust the fix. Run curl -fsSL https://claude.ai/install.sh | bash to get the latest version with the circuit breaker. Anthropic recommends the native installer because it uses a standalone binary that doesn’t rely on the npm dependency chain and supports background auto-updates.
This solves the autocompact loop. Doesn’t solve peak-hour throttling (that’s policy, not a bug) or the session-resume issue (more on that next).
Method B: Take token optimization into your own hands. Even with the autocompact fix, inefficient workflows still burn tokens. Here’s what moves the needle:
1. Stop resuming sessions. GitHub Issue #38029 (as of March 2026) documents a bug where resuming a previous session with --resume reloads the entire conversation history and may bill it as new input tokens instead of cached context. Fresh sessions consume normally. Resumed sessions drain fast. Use /clear to start fresh and /rename before clearing if you need to reference the work later.
2. Shift heavy work to off-peak hours. Running large refactors, test generation, or codebase exploration during 5-11am PT? You’re paying a premium – user reports suggest the same task can cost nearly double the quota during peak hours. Save the big stuff for evenings or weekends.
3. Be brutal with context. Don’t paste entire files when you need one function. Don’t let conversations run 30 messages when you should’ve started fresh at message 10. A 20-message session might accumulate 30,000-50,000 tokens just in history, and all of that gets resent with every new turn.
4. Use /clear between unrelated tasks. When you finish one task and start a different one, open a new chat. Takes a few seconds and can cut the token cost of the new task in half by removing accumulated irrelevant history.
5. Constrain Claude’s output. Tell Claude what you want: “Just the code, no commentary.” “Answer in one sentence.” “Three bullet points max.” These constraints can cut response tokens by half or more. Every token Claude writes gets added to your context and re-processed on the next turn.
6. Switch models based on the task. Opus tokens cost roughly five times more than Haiku (as of March 2026). An identical conversation will drain your usage limit five times faster on Opus – not because it used more tokens, but because each token costs more. Start with Haiku. Step up to Sonnet or Opus only when you feel the ceiling.
7. Catch errors early with Escape. If Claude starts heading the wrong direction, press Escape immediately. Use /rewind to restore to a previous checkpoint. Catching a wrong direction after 2,000 tokens versus 20,000 tokens is the difference between a minor setback and a session-ending quota drain.
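The arithmetic behind point 3 is worth seeing. If every turn resends all prior history as input, the total input cost grows roughly quadratically with conversation length. A back-of-the-envelope sketch (the per-message token count is an illustrative assumption, not a measured value):

```python
# Rough model: every turn resends the full prior history as input tokens.
# TOKENS_PER_MESSAGE is an assumption for illustration only.
TOKENS_PER_MESSAGE = 1_500

def cumulative_input_tokens(num_messages):
    # Turn k sends the k-1 previous messages plus the new one: k messages total.
    return sum(k * TOKENS_PER_MESSAGE for k in range(1, num_messages + 1))
```

Under this model, a single 20-message session (315,000 input tokens) costs nearly double what two fresh 10-message sessions would (82,500 each) – which is why starting fresh at message 10 matters.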
These aren’t workarounds. This is how Claude Code is meant to be used when you’re paying attention to cost.
The Edge Case Nobody’s Talking About
Here’s the part that doesn’t show up in Anthropic’s changelog or Reddit threads: automated workflows hitting rate limits silently retry, and those retries look like generic failures. One session in a loop can drain your daily budget in minutes.
Running Claude Code in CI/CD, background scripts, or any unattended context? You need to catch rate-limit errors explicitly. They don’t surface as “rate limit reached” – they surface as silent failures that trigger your retry logic.
The fix: wrap your Claude Code calls in error handling that checks for rate limit responses and backs off instead of retrying. Otherwise you’re burning tokens in a loop you can’t see.
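A minimal sketch of that pattern, assuming the CLI is invoked via subprocess and that rate-limit failures surface as a nonzero exit with “rate” somewhere in stderr – both assumptions; adapt the detection to whatever your own failure logs actually contain:

```python
import subprocess
import time

def backoff_delay(attempt, base=2.0):
    # Exponential backoff: 2s, 4s, 8s, 16s, ...
    return base * (2 ** attempt)

def is_rate_limited(returncode, stderr):
    # Assumption: rate limits show up as a nonzero exit mentioning "rate".
    # Check your own logs and adjust this predicate accordingly.
    return returncode != 0 and "rate" in stderr.lower()

def run_claude_with_backoff(args, max_attempts=4):
    """Run a claude CLI command; back off on rate limits instead of hammering."""
    for attempt in range(max_attempts):
        result = subprocess.run(["claude", *args], capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout
        if is_rate_limited(result.returncode, result.stderr):
            time.sleep(backoff_delay(attempt))
            continue
        # Non-rate-limit failure: fail fast rather than burn tokens in a loop.
        raise RuntimeError(result.stderr)
    raise RuntimeError("still rate-limited after retries; giving up")
```

The key design choice is the fail-fast branch: only rate-limit responses earn a retry, and every retry waits longer than the last, so a stuck session costs four calls instead of thousands.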
What the Leak Actually Revealed
The autocompact bug is the headline. The leak exposed a lot more.
There’s a flag called ANTI_DISTILLATION_CC. When enabled, Claude Code sends fake tool definitions in its API requests – decoy tools designed to pollute training data if someone’s recording Claude Code traffic to train a competing model. Gated behind a feature flag. Only active for first-party sessions.
There’s also Undercover Mode: a 90-line file that strips all traces of Anthropic internals when Claude Code is used in non-internal repos, instructing the model to never mention internal codenames, Slack channels, or even the phrase “Claude Code” itself. AI-authored commits from Anthropic employees in open source projects won’t show any indication that an AI wrote them.
And there’s KAIROS – an unreleased autonomous agent mode that includes a /dream skill for “nightly memory distillation”. The scaffolding for an always-on, background-running agent is already in the code. Just not shipped yet.
None of this helps you save tokens. But it does show you what’s coming.
Actually, one question worth asking: why does Anthropic need Undercover Mode at all? If AI-generated code is legitimately helpful, why hide its origin? The answer is probably a mix of community perception (some maintainers reject AI contributions on principle) and plausible deniability (if an AI writes something legally questionable, harder to trace back). Either way, it’s a signal that even Anthropic thinks AI authorship carries stigma in some contexts.
Should You Downgrade?
Some Reddit users report that rolling Claude Code back to version 2.0.61 or 2.1.34 fixed the issue. Others say it didn’t.
This suggests the problem isn’t one clean bug – it’s multiple overlapping issues (autocompact loop + cache invalidation + session resume bugs) stacked on top of each other. Downgrading might help you dodge one of them, but you’re also losing the official fix for the others.
Safer move: stay on the latest version (which has the circuit breaker), avoid --resume, and shift heavy work to off-peak hours. That covers all three known issues.
Can I still hit limits even with the fix?
Yes. The autocompact fix stops the infinite-retry loop. Doesn’t change your base quota or eliminate peak-hour throttling. Anthropic sells Claude subscriptions at unpublished usage limits (as of March 2026) – Pro is $20/month, Max 5x is $100/month, Max 20x is $200/month – and those limits are still opaque. You can still burn through them if your workflow is token-heavy.
Why doesn’t Anthropic just publish the exact token limits?
They don’t explain how limits are calculated, so users have no way to plan their token usage. The documentation says only that “Your usage is affected by several factors, including the length and complexity of your conversations, the features you use, and which Claude model you’re chatting with” (as of March 2026). The vague language gives them flexibility to adjust capacity under load without breaking published promises.
Is this just Anthropic, or are other AI coding tools doing this too?
Not just Anthropic. One Reddit user running 3x OpenAI Plus, 1x Gemini Pro, and Claude Max x5 reported hitting caps on all of them by week 3 of every month (March 2026), saying “every single one is delivering less than it was.” AI providers are managing demand by rationing. Temporary promotions and opaque limits are the new normal across the industry.
Check your Claude Code version. Update if you’re behind. Stop resuming sessions. Shift heavy work off-peak. Watch your context like it costs money – because it does.
Run claude --version right now and see where you stand.