The #1 mistake people make right now: treating token count as a productivity metric. Uber’s COO Andrew Macdonald just said the quiet part out loud, and his comment about tokenmaxxing is forcing a lot of teams to rethink how they actually use Claude Code. In the interview making the rounds, Macdonald said it’s getting harder to justify Uber’s AI spend and that higher token usage isn’t translating into a proportional increase in useful consumer features: “That link is not there yet, right?”
This piece isn’t a news recap. It’s the reverse-engineered fix: how to stop tokenmaxxing yourself, with the exact commands and the gotchas competitors skip.
Quick context: what tokenmaxxing actually is
Tokenmaxxing is treating token consumption as a proxy for being “AI-native.” As tokens became a measurable unit of GenAI usage, some organizations started informally ranking engineers by spend. Meta reportedly runs an internal leaderboard called “Claudeonomics” that ranks 85,000 workers by token consumption, handing out titles like “Token Legend.”
The Uber story is the load-bearing example. Their CTO acknowledged the budget planning team couldn’t forecast the adoption curve – Claude Code usage jumped from 32% to 84% of their 5,000 engineers in three months, and they blew through their 2026 Claude Code budget early in the year.
So the question isn’t “should I use AI coding tools.” It’s “how do I get the same output for less.”
There’s something worth sitting with here. Leaderboards like Claudeonomics don’t make engineers better at coding – they make engineers better at appearing to use AI. That’s a cultural problem that no workflow tip will fix. But for individuals who just want to do good work without a surprise invoice, the mechanics are fixable.
The hands-on tutorial: a diagnostic-first workflow
Most token-saving guides hand you a list of 18 tips and let you figure out the order. Wrong approach. You diagnose first, then fix the one thing that’s actually leaking.
Step 1: Run /context before you do anything else
According to Anthropic’s official Claude Code documentation, the /context command shows exactly where your tokens are going: system prompt, tools, memory files, skills, and conversation history.
What most tutorials skip: egghead.io measured that an empty Claude Code session already burns about 17,000 tokens before you type anything, which is about 8.5% of a 200K window. Add multiple MCP servers – Playwright, DeepWiki, and similar – and you’re at roughly 31,000 tokens of pure baseline overhead (the exact number depends on which servers you load). That’s roughly 15% of a 200K window gone before any work starts.
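To make the baseline figures concrete, here is a minimal sketch that converts the two token counts quoted above into a share of a 200K window. The function name `share_of_window` is made up for illustration; the token counts are the egghead.io figures cited in the text, not new measurements.

```python
CONTEXT_WINDOW = 200_000  # window size used in the text above


def share_of_window(tokens: int, window: int = CONTEXT_WINDOW) -> float:
    """Return the percentage of the context window that `tokens` occupies."""
    return 100 * tokens / window


empty_session = share_of_window(17_000)  # empty session baseline from the text
with_mcp = share_of_window(31_000)       # baseline with several MCP servers loaded

print(f"empty session:    {empty_session:.1f}% of the window")
print(f"with MCP servers: {with_mcp:.1f}% of the window")
```

Swap in the numbers from your own /context output to see how much of the window is gone before the first prompt.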
Step 2: Trim the baseline (one-time fix)
Look at what /context flagged and act:
- CLAUDE.md too large? Cut it. Every token in CLAUDE.md gets loaded on every session you run – as an illustrative example, 2,000 extra tokens × 20 sessions/day × 20 work days adds up to 800,000 wasted tokens per month. Aim under 1,000 tokens.
- MCP servers you don’t use? Remove them. Each loaded server adds to the baseline overhead in every session, whether or not you ever call its tools.
- Add a .claudeignore so Claude stops reading node_modules, build artifacts, and lock files.
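The CLAUDE.md arithmetic above can be sketched as a small calculator. Both function names are hypothetical, and the token estimate relies on the common rule of thumb of roughly 4 characters per token for English text, which is an assumption and not a real tokenizer. Treat the output as a ballpark, and use /context for the real figure.

```python
def monthly_overhead(extra_tokens: int, sessions_per_day: int, work_days: int) -> int:
    """Tokens spent per month on content that loads at the start of every session."""
    return extra_tokens * sessions_per_day * work_days


def rough_token_estimate(text: str, chars_per_token: float = 4.0) -> int:
    """Ballpark token count. ASSUMPTION: ~4 characters per token, not a tokenizer."""
    return round(len(text) / chars_per_token)


# The illustrative example from the text: 2,000 extra tokens, 20 sessions/day, 20 work days.
print(monthly_overhead(2_000, 20, 20))  # 800000

# Stand-in for the contents of a CLAUDE.md file (here, 4,000 characters of filler).
sample = "x" * 4_000
estimate = rough_token_estimate(sample)
print(estimate, "tokens (rough);", "over" if estimate > 1_000 else "within", "the 1,000-token target")
```

To check a real file, read it into a string first (for example with `pathlib.Path("CLAUDE.md").read_text()`) and pass that to `rough_token_estimate`.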
Step 3: Pick the right model per task
The single biggest cost lever. On API billing, Opus runs 5x the price of Sonnet per token (per community benchmarks as of mid-2025 – check current Anthropic pricing before assuming this ratio holds). Default to Sonnet. Switch with /model.
/model sonnet # day-to-day: edits, tests, refactors
/model opus # complex multi-file architecture, gnarly debugging
/model haiku # quick lookups, formatting, renames
If you regularly use plan mode, there’s a hybrid most tutorials skip. The opusplan alias, documented by ClaudeFast, routes plan-mode prompts to Opus for complex reasoning then automatically switches to Sonnet for code generation. Opus brain for thinking, Sonnet wallet for typing.
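A rough way to see why the hybrid is attractive is to compute a blended cost relative to an all-Sonnet session. This sketch assumes the 5x Opus-to-Sonnet price ratio quoted above and a planning share of about 5% of tokens (the figure this article uses in its FAQ). It ignores input versus output pricing and caching, so it is a simplification. `blended_cost` is a made-up helper, not part of any tool.

```python
OPUS_TO_SONNET_PRICE_RATIO = 5.0  # ratio quoted in the text; check current pricing


def blended_cost(planning_share: float, ratio: float = OPUS_TO_SONNET_PRICE_RATIO) -> float:
    """Cost relative to an all-Sonnet session (1.0) when `planning_share`
    of the tokens are billed at Opus rates and the rest at Sonnet rates."""
    if not 0.0 <= planning_share <= 1.0:
        raise ValueError("planning_share must be between 0 and 1")
    return planning_share * ratio + (1.0 - planning_share)


hybrid = blended_cost(0.05)    # ASSUMPTION: 5% of tokens go to planning
all_opus = blended_cost(1.0)   # every token billed at Opus rates

print(f"hybrid:   {hybrid:.2f}x the all-Sonnet cost")
print(f"all Opus: {all_opus:.2f}x the all-Sonnet cost")
```

Under these assumptions the hybrid costs about 1.2x an all-Sonnet session, against 5x for running everything on Opus. The result is sensitive to the planning share, so measure your own before relying on it.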
Step 4: Use plan mode for anything non-trivial
Press Shift+Tab to enter plan mode before complex work. Claude explores the codebase and proposes an approach for your approval – catching misaligned direction before it becomes a hundred-turn debugging session. The cost benefit isn’t about token type; it’s about avoiding the expensive re-work when initial direction is wrong.
Step 5: /compact at phase boundaries, /clear between tasks
Two different tools for two different jobs:
- /compact – summarizes the conversation into a compact baseline and continues. Use it after finishing a discrete phase (“feature X is shipped, moving to Y”).
- /clear – wipes context entirely. Use it when switching to unrelated work.
The catch: Don’t wait for auto-compact at 95% capacity. MindStudio’s analysis found that context quality starts degrading at around 50% full – not at 100%. By the time you’re at 50%, you’re already paying more for worse output. Run /compact while your cache is still warm and the model still has clean signal.
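The rule of thumb above can be written down as a tiny decision helper. The 50% and 95% thresholds come from the text. The function name, its return strings, and the idea of reading the used-token count by hand from /context are illustrative assumptions; nothing here talks to Claude Code.

```python
def compaction_advice(
    used_tokens: int,
    window: int = 200_000,
    soft_limit: float = 0.50,    # where quality reportedly starts to degrade
    auto_compact: float = 0.95,  # where auto-compact kicks in, per the text
) -> str:
    """Suggest an action based on how full the context window is."""
    fill = used_tokens / window
    if fill >= auto_compact:
        return "too late: auto-compact territory, compact or clear now"
    if fill >= soft_limit:
        return "compact: run /compact at the next phase boundary"
    return "ok: keep working"


for used in (60_000, 110_000, 195_000):
    print(f"{used:>7} tokens -> {compaction_advice(used)}")
```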
Pitfalls that look like savings but aren’t
Agent teams in plan mode. That’s the big one. Anthropic’s own docs put it plainly: agent teams burn approximately 7x more tokens than standard sessions when teammates run in plan mode, because each teammate maintains its own full context window as a separate Claude instance. Most tutorials list this as a productivity feature without flagging the multiplier. If you’ve wondered why your bill spiked after trying multi-agent workflows – there’s your answer.
Re-pasting code Claude already read. If Claude has the file in context, naming the file and the line range is enough. Re-pasting just adds input tokens for no new information.
Stuffing CLAUDE.md with project philosophy. It loads on every session. A 5,000-token CLAUDE.md is a 5,000-token tax on every session you start, whether you send 2 messages or 200.
Assuming the 1M context window still costs a premium. That changed: Anthropic removed the long-context pricing surcharge for Opus 4.6 and Sonnet 4.6 on March 13, 2026, so the 1M window is now generally available at standard rates with no 2x multiplier. It’s cheaper than it was, but stuffing 800K tokens of irrelevant code into context still hurts quality and latency regardless of price.
What the savings actually look like
Numbers from public sources (as of mid-2025 for the per-day estimates):
| Profile | Daily API spend | Notes |
|---|---|---|
| Default Claude Code, no habits | $20-40/day | systemprompt.io estimate |
| With diagnostic-first workflow | $5-15/day | same work, lean context |
| Enterprise average (Anthropic docs) | ~$13/active day | $150-250/month per developer |
Reported savings with disciplined context management range from 40% to 70% on focused tasks (per BuildToLaunch’s analysis) – that’s the realistic ceiling, not a marketing number. The bigger your team, the more this compounds. As an illustration, a 5,000-engineer footprint like Uber’s at $30/day versus $13/day per engineer is a difference of roughly $85,000 a day.
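The team-scale arithmetic is easy to reproduce. This sketch uses the figures from the paragraph above (5,000 engineers, $30/day versus $13/day) plus an assumed 20 work days per month to get a monthly number. The per-day figures are illustrative estimates, not Uber’s actual spend, and `daily_difference` is a made-up helper.

```python
def daily_difference(engineers: int, high_per_day: int, low_per_day: int) -> int:
    """Difference in total daily spend between two per-engineer spend levels (USD)."""
    return engineers * (high_per_day - low_per_day)


WORK_DAYS_PER_MONTH = 20  # ASSUMPTION for illustration

per_day = daily_difference(5_000, 30, 13)
per_month = per_day * WORK_DAYS_PER_MONTH

print(f"${per_day:,} per day")      # $85,000 per day
print(f"${per_month:,} per month")  # $1,700,000 per month
```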
When to just spend the tokens
Cutting spend has a cost too. There are cases where you should just let it run:
- One-off exploratory work. If you’re going to use Claude Code twice this week, the time you’d spend trimming CLAUDE.md isn’t worth it.
- High-stakes architectural decisions. Opus on a 1M context window costs more – and is worth it when getting the wrong answer costs an engineer a week of rework.
- Compliance or audit work where exact quotes matter. When you need the model to cite specific passages from a large policy document, 1M lets you put the whole thing in context at once rather than chunking and hoping nothing falls through the gap.
The Macdonald point isn’t “stop spending.” It’s: tie spend to outcomes. If 25% of your code commits come from Claude Code but you can’t name three features that shipped because of it, that’s the problem worth solving – not the invoice.
Your next action
Open Claude Code right now and run /context on whatever project you’re in. Screenshot the output. If anything other than your active conversation is taking more than 10% of the window, that’s your first fix. Trim CLAUDE.md, drop the MCP servers you’re not using, then start your next session and compare. That single diagnostic, run once a week, beats reading 18 tip listicles.
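If you would rather not eyeball the 10% rule, a few lines can apply it. This sketch does not parse /context output. You type the per-category token counts in by hand, and the numbers below are hypothetical placeholders. The category names follow the list in Step 1, and `flag_overhead` is a made-up helper.

```python
def flag_overhead(
    breakdown: dict[str, int],
    window: int = 200_000,
    threshold: float = 0.10,
    exempt: tuple[str, ...] = ("conversation",),
) -> list[str]:
    """Return the categories (other than the active conversation) that
    take up more than `threshold` of the context window."""
    return [
        name
        for name, tokens in breakdown.items()
        if name not in exempt and tokens / window > threshold
    ]


# HYPOTHETICAL numbers, copied by hand from a /context run.
session = {
    "system prompt": 3_000,
    "tools": 24_000,
    "memory files": 5_000,
    "skills": 1_000,
    "conversation": 60_000,
}

print(flag_overhead(session))  # ['tools']
```

In this made-up session only the tools category crosses 10% of the window, which would point at trimming MCP servers before touching CLAUDE.md.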
FAQ
Is tokenmaxxing actually a real strategy or just a meme?
Both. It started as an internet joke about devs flexing their token bills, then became a real management problem when companies like Meta formalized it into leaderboards. The meme became the policy.
If Sonnet is 5x cheaper than Opus, why use Opus at all?
For specific tasks where reasoning depth matters more than typing cost – multi-file refactors, debugging race conditions, architecture trade-offs. A useful split: use opusplan so Opus does the thinking and Sonnet does the implementation. You pay Opus rates for roughly 5% of tokens (planning) and Sonnet rates for the other 95% (generation). For everything else – tests, formatting, straightforward edits – Sonnet handles it fine and the 5x savings compound across a month.
Does /compact actually preserve enough detail to keep working?
Mostly, but it’s lossy. Run it at phase boundaries – a feature is done, tests pass, you commit. Git commits are your real checkpoint if the summary drops something important.