Why does Claude burn through your token allowance in two prompts after a 10-hour break? That’s the question blowing up Hacker News this week, after a developer’s post titled I Cancelled Claude: Token Issues, Declining Quality, and Poor Support went viral. The frustration is real – and if you’ve felt it, you’re not imagining things. But cancellation isn’t the only option. Once you understand why the token math feels broken, you can squeeze 3-5x more out of the same plan.
This is a hands-on fix guide, not a hot take. We’ll diagnose what actually happened in that viral post, then walk through the settings that change the burn rate.
## The viral post: what actually broke
The author describes a scenario most heavy users will recognize. In his account, he sent two simple, unrelated questions to Claude Haiku after a long break, and watched usage jump to 100%. Later, asking Opus to refactor a project ate roughly 50% of his five-hour allowance – partly because Opus produced a lazy workaround, which he had to reject and re-prompt.
Two things are happening here, and they’re connected.
First: usage isn’t measured in messages. It’s measured in tokens, and tokens include everything the model re-reads on every turn – system prompt, tool definitions, conversation history, attached files, MCP server schemas. One developer who instrumented his sessions found that 98.5% of his tokens went to re-reading history, and only 1.5% to actual output. A 20-message session burns ~105,000 tokens; a 30-message session burns ~232,000.
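The quadratic growth is easy to reproduce. A minimal sketch, assuming a flat ~500 tokens per exchange (a simplification, but one that lands on the figures above almost exactly):

```python
def session_input_tokens(n_messages: int, tokens_per_message: int = 500) -> int:
    """Total input tokens for a session where every turn re-reads the full
    history: turn k re-processes all k messages sent so far."""
    # sum_{k=1..n} k * tokens_per_message = tokens_per_message * n(n+1)/2
    return tokens_per_message * n_messages * (n_messages + 1) // 2

for n in (10, 20, 30):
    print(n, session_input_tokens(n))
```

Twenty messages works out to 105,000 tokens and thirty to roughly 232,000 – the cost per message grows linearly, so the session cost grows quadratically.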
Second: Claude Code is built on prompt caching. When the cache hits, you pay roughly 10% of normal input cost. When it misses, you pay full price for the entire context. Thariq Shihipar, an engineer on the Claude Code team, wrote that the entire product is architected around this – they alert internally when cache hit rates drop. Switching models mid-session, editing the system prompt, or even putting a timestamp in the wrong place invalidates the prefix and forces a full reprocess. That’s how 50% of an allowance disappears on one refactor.
## What the new limits actually say
Some context for the timing. On July 28, 2025, Anthropic announced new weekly rate limits on top of the existing 5-hour rolling window, effective August 28, 2025. The company said the new caps would affect less than 5% of subscribers. The same week, Slashdot noted Claude Code had hit at least seven outages in the prior month.
| Plan | 5-hour window | Weekly Sonnet | Weekly Opus |
|---|---|---|---|
| Pro ($20/mo) | ~45 messages | 40-80 hrs | limited / web only |
| Max 5x ($100/mo) | ~225 messages | 140-280 hrs | 15-35 hrs |
| Max 20x ($200/mo) | ~900 messages | 240-480 hrs | 24-40 hrs |
Figures above are as of July 2025 per TechCrunch and Geeky Gadgets – check Anthropic’s help center for current values. One detail every quota table glosses over: per that same help center, your usage across claude.ai, Claude Code, and Claude Desktop counts toward the same bucket. Long browser conversations eat your CLI allowance and vice versa.
## Five settings that actually cut your burn rate
This is where the article diverges from every other “Claude tips” piece. Generic advice (“use shorter prompts”) ignores the real mechanic. The fixes below all target one thing: keeping the prompt cache alive.
- Stop switching models mid-session. If you start in Sonnet 4.5 and flip to Opus to “think harder,” you’ve just busted the cache. Pick a model at the start of a session with `/model` and stick with it. If you genuinely need Opus for planning, do the planning in a separate Claude.ai chat and bring the result into your Code session as text.
- Run `/compact` at phase boundaries, not when the warning appears. Auto-compact triggers near the 200K context wall. Manual `/compact` after finishing a feature is cleaner – you control what gets summarized. Don’t use `/clear` unless you’re switching tasks entirely; clearing throws away the cached prefix.
- Use Projects for any document you reference more than once. Projects use retrieval – they pull only relevant chunks instead of stuffing the whole PDF into context. If you re-upload the same brand guide every chat, you’re paying for it every chat.
- Disable MCP servers and tools you aren’t using right now. Tool definitions sit in the system prompt. Every loaded MCP server adds tokens to every turn, cached or not, and schema changes can invalidate caches across sessions.
- Use `/usage` to actually see what you have left. Don’t guess. The CLI shows real consumption and reset time. The web UI hides this under Settings → Usage.
Pro tip: When you’re correcting Claude (“no, I meant…”), don’t send a new message – scroll up and edit the original prompt instead. A new message stacks on the full history. An edit forks from that point, and Claude only re-processes from the edit forward.
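A toy model of why the edit wins (the token counts are made up for illustration): the appended correction carries the rejected answer in context for every later turn, while the edit forks it away.

```python
def context_after_correction(history_tokens: int, bad_answer_tokens: int,
                             correction_tokens: int, edit: bool) -> int:
    """Context size after correcting Claude, in tokens."""
    if edit:
        # Editing forks the conversation: the rejected answer
        # is dropped from the new branch entirely.
        return history_tokens + correction_tokens
    # Appending keeps the rejected answer in context, and every
    # subsequent turn re-reads it.
    return history_tokens + bad_answer_tokens + correction_tokens

# An 80K-token session, a 4K-token rejected answer, a 200-token correction:
print(context_after_correction(80_000, 4_000, 200, edit=False))
print(context_after_correction(80_000, 4_000, 200, edit=True))
```

The one-time difference looks small, but it is paid again on every turn until the session ends or is compacted.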
## Advanced: the cache invalidation traps no tutorial mentions
If you’ve followed the basics above and your sessions still feel short, you’re probably hitting one of these. They come straight from the engineering team’s own writeup, but they aren’t in the user-facing docs.
**Things that look harmless but bust the cache:**

- Adding/removing an MCP server mid-session
- Putting the current timestamp in your CLAUDE.md
- Reordering tool definitions (non-deterministic loaders)
- Updating tool parameters mid-conversation
- Switching from Sonnet to Opus (or back)

**Things that DON'T bust the cache:**

- `/compact` (the cached prefix is reused for the compaction request itself)
- Adding new files to Projects (RAG, not context)
- Long pauses (cache entries expire – see note below)
That last point about pauses matters more than people realize. Per Anthropic’s prompt caching documentation (verify current values at platform.claude.com, as cache lifetime policies may change), cached entries have a minimum lifetime of around 5 minutes by default. Step away for coffee, come back, and your next message may pay full input cost on the entire conversation context – exactly the “two questions burned 100%” scenario the viral post describes.
## What you can’t fix (the honest part)
Some of the complaints in the cancellation post don’t have workarounds. Worth being upfront about them.
Support escalation is genuinely thin. The author describes contacting an AI bot, then receiving an automated email and the channel closing. TrueFoundry’s analysis confirms the structural issue: support teams can’t manually reset or extend quotas. If you hit a wall, your only options are waiting for the reset, buying extra usage at API rates, or switching tools.
Reset times can also lie. An open GitHub issue (#9236) on the anthropics/claude-code repo reports the “limit will reset at 3pm” message persisting for almost 24 hours instead of the 5 hours it implies. There’s no documented workaround – community reports suggest waiting it out or using `/logout` and `/login` to refresh credentials.
And then there’s quality. Was Opus really worse this month than last? It might be. It might also be that your cache miss rate quietly climbed because you added two MCP servers. Nobody outside Anthropic knows for certain, and complaints have come in waves with each model rollout. Treat “the model got dumber” claims with the same skepticism you’d treat “my Wi-Fi feels slower today.”
## So should you cancel?
Probably not – at least not before trying the cache-aware workflow above for a week. The viral post resonated because the frustration is legitimate, but the prescription (cancel and switch) skips the diagnostic step. Fix two things first: stop switching models, and start ending phases with `/compact`. Most Pro users will stop hitting the wall. If you still hit weekly caps after that, that’s the signal to upgrade or move heavy work to the pay-as-you-go API – though actual API costs vary significantly by usage pattern and cache configuration, so check current Anthropic API pricing before committing.
Open `/usage` right now and screenshot it. Run your normal workflow for a day. Compare. That’s the only number that matters for your specific situation.
## FAQ
**Does cancelling and resubscribing reset my weekly limit?**
No. Weekly limits track on the account, not the subscription cycle.
**Why did two simple Haiku questions burn through my budget after a long break?**
Almost certainly cache expiry. Cached prefixes expire after a short window (around 5 minutes by default – check Anthropic’s current docs), so a 10-hour break means your next message pays full input cost on the entire conversation context – system prompt, tools, history, files. If those two “simple” questions were sent into an existing long-running session, you weren’t paying for two questions; you were paying to re-process everything before them. Starting a fresh session for unrelated questions avoids this.
**Is the API cheaper than a Pro subscription if I’m a heavy user?**
It depends entirely on your cache configuration. Anthropic’s own cost breakdowns show a long Opus session running $50-100 without caching versus $10-19 with a 90% cache hit rate – a 5-8x difference. The API gives you direct control over that configuration; a subscription plan does not. Whether the API works out cheaper for your specific usage pattern requires checking current Anthropic API pricing, since rates change and the comparison shifts with each model generation.
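The arithmetic behind those numbers is simple to check yourself. A sketch, reusing the illustrative ~10% cached-read rate from earlier (a simplification that ignores cache-write surcharges):

```python
def effective_session_cost(uncached_cost_usd: float, cache_hit_rate: float,
                           cached_read_discount: float = 0.10) -> float:
    """Blend full-price and cached-read token costs by hit rate."""
    full_fraction = 1.0 - cache_hit_rate
    return uncached_cost_usd * (full_fraction
                                + cache_hit_rate * cached_read_discount)

# The article's $50-100 uncached Opus session at a 90% hit rate:
for base in (50, 100):
    print(base, round(effective_session_cost(base, 0.90), 2))
```

A 90% hit rate multiplies cost by 0.19, turning $50-100 into roughly $9.50-19 – the same 5-8x spread the cost breakdowns describe. That multiplier, not the sticker price, is what decides whether the API beats the subscription for you.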