Tokenmaxxing Is Dead: How to Cut Claude Code Spend

Uber's COO just called out tokenmaxxing as wasteful. Here's a hands-on tutorial to cut Claude Code token spend 40-70% without losing output quality.

Jamie Lin | 2026-05-25 | 8 min read | Beginner

The #1 mistake people make right now: treating token count as a productivity metric. Uber’s COO Andrew Macdonald just said the quiet part out loud, and his comment about tokenmaxxing is forcing a lot of teams to rethink how they actually use Claude Code. In the interview making the rounds, Macdonald said it’s getting harder to justify Uber’s AI spend and that higher token usage isn’t translating into a proportional increase in useful consumer features: “That link is not there yet, right?”

This piece isn’t a news recap. It’s the reverse-engineered fix: how to stop tokenmaxxing yourself, with the exact commands and the gotchas competitors skip.

Quick context: what tokenmaxxing actually is

Tokenmaxxing is treating token consumption as a proxy for being “AI-native.” As tokens became a measurable unit of GenAI usage, some organizations started informally ranking engineers by spend. Meta reportedly runs an internal leaderboard called “Claudeonomics” that ranks 85,000 workers by token consumption, handing out titles like “Token Legend.”

The Uber story is the load-bearing example. Their CTO acknowledged the budget planning team couldn’t forecast the adoption curve – Claude Code usage jumped from 32% to 84% of their 5,000 engineers in three months, and they blew through their 2026 Claude Code budget early in the year.

So the question isn’t “should I use AI coding tools.” It’s “how do I get the same output for less.”

There’s something worth sitting with here. Leaderboards like Claudeonomics don’t make engineers better at coding – they make engineers better at appearing to use AI. That’s a cultural problem that no workflow tip will fix. But for individuals who just want to do good work without a surprise invoice, the mechanics are fixable.

The hands-on tutorial: a diagnostic-first workflow

Most token-saving guides hand you a list of 18 tips and let you figure out the order. Wrong approach. You diagnose first, then fix the one thing that’s actually leaking.

Step 1: Run /context before you do anything else

According to Anthropic’s official Claude Code documentation, the /context command shows exactly where your tokens are going: system prompt, tools, memory files, skills, and conversation history.

What most tutorials skip: egghead.io measured that an empty Claude Code session already burns about 17,000 tokens before you type anything. Add multiple MCP servers – Playwright, DeepWiki, and similar – and you’re at roughly 31,000 tokens of pure baseline overhead (the exact number depends on which servers you load). That’s roughly 15% of a 200K window gone before any work starts.
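To put those baseline numbers in proportion, here is a minimal sketch of the arithmetic. The token counts are the figures quoted above; the function name and the 200K default are illustrative choices, not anything Claude Code exposes.

```python
# Baseline figures quoted in the text; the helper name is hypothetical.
CONTEXT_WINDOW = 200_000


def baseline_share(baseline_tokens: int, window: int = CONTEXT_WINDOW) -> float:
    """Return the fraction of the context window consumed before any work starts."""
    if window <= 0:
        raise ValueError("window must be positive")
    return baseline_tokens / window


empty_session = baseline_share(17_000)  # empty session, per the egghead.io figure
with_mcp = baseline_share(31_000)       # with several MCP servers loaded

print(f"Empty session:    {empty_session:.1%}")
print(f"With MCP servers: {with_mcp:.1%}")
```

Swap in the totals your own /context output reports to see where you actually stand.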

Step 2: Trim the baseline (one-time fix)

Look at what /context flagged and act:

CLAUDE.md too large? Cut it. Every token in CLAUDE.md gets loaded on every session you run – as an illustrative example, 2,000 extra tokens × 20 sessions/day × 20 work days adds up to 800,000 wasted tokens per month. Aim under 1,000 tokens.
MCP servers you don’t use? Remove them. Each loaded server adds its tool definitions to the baseline whether or not you ever call it, and tool outputs pile up in context on top of that.
Add a .claudeignore so Claude stops reading node_modules, build artifacts, and lock files.
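The CLAUDE.md arithmetic above can be sketched in a few lines. Both helpers are hypothetical, and the four-characters-per-token estimate is only a rough rule of thumb for English text, not how Claude actually tokenizes; treat its output as a ballpark.

```python
def monthly_overhead_tokens(extra_tokens: int, sessions_per_day: int, work_days: int) -> int:
    """Tokens spent per month on overhead that is reloaded in every session."""
    return extra_tokens * sessions_per_day * work_days


def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ASSUMES about 4 characters per token."""
    return len(text) // 4


# The illustrative example from the text: 2,000 extra tokens, 20 sessions/day, 20 work days.
wasted = monthly_overhead_tokens(2_000, 20, 20)
print(f"Monthly overhead: {wasted:,} tokens")

# Hypothetical usage: check a CLAUDE.md against the 1,000-token target.
sample_claude_md = "Run tests with the project test script before committing.\n" * 40
print(f"Estimated CLAUDE.md size: ~{estimate_tokens(sample_claude_md)} tokens")
```

For a real file, read its contents and pass them to estimate_tokens, then compare with what /context reports for memory files.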

Step 3: Pick the right model per task

The single biggest cost lever. On API billing, Opus runs 5x the price of Sonnet per token (per community benchmarks as of mid-2025 – check current Anthropic pricing before assuming this ratio holds). Default to Sonnet. Switch with /model.

/model sonnet # day-to-day: edits, tests, refactors
/model opus # complex multi-file architecture, gnarly debugging
/model haiku # quick lookups, formatting, renames

If you regularly use plan mode, there’s a hybrid most tutorials skip. The opusplan alias, documented by ClaudeFast, routes plan-mode prompts to Opus for complex reasoning then automatically switches to Sonnet for code generation. Opus brain for thinking, Sonnet wallet for typing.

Step 4: Use plan mode for anything non-trivial

Press Shift+Tab to cycle into plan mode before complex work (depending on your current mode, you may need to press it more than once). Claude explores the codebase and proposes an approach for your approval – catching misaligned direction before it becomes a hundred-turn debugging session. The cost benefit isn’t about token type; it’s about avoiding the expensive re-work when initial direction is wrong.

Step 5: /compact at phase boundaries, /clear between tasks

Two different tools for two different jobs:

/compact – summarizes the conversation into a compact baseline and continues. Use it after finishing a discrete phase (“feature X is shipped, moving to Y”).
/clear – wipes context entirely. Use it when switching to unrelated work.

The catch: Don’t wait for auto-compact at 95% capacity. MindStudio’s analysis found that context quality starts degrading at around 50% full, not at 100%. By the time auto-compact kicks in at 95%, you have already been paying more for worse output. Run /compact while your cache is still warm and the model still has clean signal.
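As a rule of thumb, the timing advice reduces to a single comparison. This is a sketch only: the 50% threshold is the figure from the analysis cited above, not a Claude Code setting, and the function is hypothetical.

```python
def should_compact(used_tokens: int, window: int = 200_000, threshold: float = 0.5) -> bool:
    """Return True once context usage reaches the chosen fraction of the window.

    The 0.5 default mirrors the ~50% degradation point cited in the text;
    it is an assumption to tune, not a value read from Claude Code.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return used_tokens / window >= threshold


print(should_compact(60_000))   # 30% of a 200K window
print(should_compact(110_000))  # 55% of a 200K window
```

In practice you would read the usage figure off /context and make this call yourself at a phase boundary.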

Pitfalls that look like savings but aren’t

Agent teams in plan mode. That’s the big one. Anthropic’s own docs put it plainly: agent teams burn approximately 7x more tokens than standard sessions when teammates run in plan mode, because each teammate maintains its own full context window as a separate Claude instance. Most tutorials list this as a productivity feature without flagging the multiplier. If you’ve wondered why your bill spiked after trying multi-agent workflows – there’s your answer.

Re-pasting code Claude already read. If Claude has the file in context, naming the file and the line range is enough. Re-pasting just adds input tokens for no new information.

Stuffing CLAUDE.md with project philosophy. It loads on every session and stays in context for every turn after that. A 5,000-token CLAUDE.md is a 5,000-token tax you pay from the first message, whether you send 2 messages or 200.

The 1M context window, priced at a premium? That changed. Turns out Anthropic removed the long-context pricing surcharge for Opus 4.6 and Sonnet 4.6 on March 13, 2026 – the 1M window is now generally available at standard rates with no 2x multiplier. Cheaper than it was. But stuffing 800K tokens of irrelevant code into context still hurts quality and latency regardless of price.

What the savings actually look like

Numbers from public sources (as of mid-2025 for the per-day estimates):

Profile	Daily API spend	Notes
Default Claude Code, no habits	$20-40/day	systemprompt.io estimate
With diagnostic-first workflow	$5-15/day	same work, lean context
Enterprise average (Anthropic docs)	~$13/active day	$150-250/month per developer

Reported savings with disciplined context management range from 40% to 70% on focused tasks (per BuildToLaunch’s analysis) – that’s the realistic ceiling, not a marketing number. The bigger your team, the more this compounds. Uber’s 5,000-engineer footprint at $30/day vs $13/day is roughly $85,000 a day in difference.
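The team-scale figure is simple multiplication, sketched below. The per-day rates are the ones from the table, the helper name is hypothetical, and the second calculation is an added assumption: it counts only the 84% of engineers reported as using Claude Code.

```python
def daily_difference(engineers: int, high_per_day: int, low_per_day: int) -> int:
    """Dollar difference per day between two per-engineer spend levels."""
    return engineers * (high_per_day - low_per_day)


# Every one of the 5,000 engineers, $30/day vs $13/day.
all_engineers = daily_difference(5_000, 30, 13)
print(f"All engineers:  ${all_engineers:,}/day")

# ASSUMPTION: only the 84% who adopted Claude Code are active on a given day.
active = 5_000 * 84 // 100
active_only = daily_difference(active, 30, 13)
print(f"Active (84%):   ${active_only:,}/day")
```

Either way the gap is large enough that per-developer habits show up on the company invoice.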

When to just spend the tokens

Cutting spend has a cost too. There are cases where you should just let it run:

One-off exploratory work. If you’re going to use Claude Code twice this week, the time you’d spend trimming CLAUDE.md isn’t worth it.
High-stakes architectural decisions. Opus on a 1M context window costs more – and is worth it when getting the wrong answer costs an engineer a week of rework.
Compliance or audit work where exact quotes matter. When you need the model to cite specific passages from a large policy document, 1M lets you put the whole thing in context at once rather than chunking and hoping nothing falls through the gap.

The Macdonald point isn’t “stop spending.” It’s: tie spend to outcomes. If 25% of your code commits come from Claude Code but you can’t name three features that shipped because of it, that’s the problem worth solving – not the invoice.

Your next action

Open Claude Code right now and run /context on whatever project you’re in. Screenshot the output. If anything other than your active conversation is taking more than 10% of the window, that’s your first fix. Trim CLAUDE.md, drop the MCP servers you’re not using, then start your next session and compare. That single diagnostic, run once a week, beats reading 18 tip listicles.
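If you prefer to apply the 10% rule mechanically, a minimal sketch follows. The category names and token counts are made up for illustration and do not mirror the exact /context output format; copy in your own numbers by hand.

```python
def flag_overhead(
    breakdown: dict[str, int],
    window: int = 200_000,
    limit: float = 0.10,
    exclude: tuple[str, ...] = ("conversation",),
) -> list[str]:
    """Return the categories (other than the active conversation) above the limit."""
    return [
        name
        for name, tokens in breakdown.items()
        if name not in exclude and tokens / window > limit
    ]


# HYPOTHETICAL numbers, typed in by hand from a /context readout.
sample_breakdown = {
    "system_prompt": 3_000,
    "built_in_tools": 12_000,
    "mcp_tools": 24_000,
    "memory_files": 5_000,
    "conversation": 60_000,
}

print(flag_overhead(sample_breakdown))
```

Whatever the function returns is your first fix for the week.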

FAQ

Is tokenmaxxing actually a real strategy or just a meme?

Both. It started as an internet joke about devs flexing their token bills, then became a real management problem when companies like Meta formalized it into leaderboards. The meme became the policy.

If Sonnet is 5x cheaper than Opus, why use Opus at all?

For specific tasks where reasoning depth matters more than typing cost – multi-file refactors, debugging race conditions, architecture trade-offs. A useful split: use opusplan so Opus does the thinking and Sonnet does the implementation. You pay Opus rates for roughly 5% of tokens (planning) and Sonnet rates for the other 95% (generation). For everything else – tests, formatting, straightforward edits – Sonnet handles it fine and the 5x savings compound across a month.
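To see what that split does to the bill, here is a small sketch of the blended rate. It takes the 5x price ratio and the 5%/95% split from the answer above as given; both are assumptions to re-check against current pricing and your own usage, and the function is hypothetical.

```python
def blended_cost_ratio(opus_share: float, opus_multiplier: float = 5.0) -> float:
    """Cost relative to an all-Sonnet session when opus_share of tokens bill at Opus rates."""
    if not 0.0 <= opus_share <= 1.0:
        raise ValueError("opus_share must be between 0 and 1")
    return opus_share * opus_multiplier + (1.0 - opus_share)


hybrid = blended_cost_ratio(0.05)    # 5% planning on Opus, 95% generation on Sonnet
all_opus = blended_cost_ratio(1.0)   # everything on Opus

print(f"Hybrid vs all-Sonnet: {hybrid:.2f}x")
print(f"Hybrid vs all-Opus:   {1 - hybrid / all_opus:.0%} cheaper")
```

Under these assumptions the hybrid costs about 1.2x an all-Sonnet session, well below the 5x of running everything on Opus.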

Does /compact actually preserve enough detail to keep working?

Mostly, but it’s lossy. Run it at phase boundaries – a feature is done, tests pass, you commit. Git commits are your real checkpoint if the summary drops something important.