Uber’s Claude Code Bill Exploded – How to Avoid It

Uber blew its 2026 AI budget on Claude Code in 4 months. Here's how to actually use it without doing the same - config, commands, and traps.

Riley Brooks2026-05-028 min readBeginner

Goal: end the week with a working Claude Code setup where one engineer can ship features all day and stay under $10/day on API – or never blow past a Max subscription. The opposite of what just happened at Uber.

The Information broke the story: Uber spent its entire 2026 AI budget in just four months on Claude Code and Cursor, with engineers reporting monthly API costs between $500 and $2,000 per person. CTO Praveen Neppalli Naga rolled out Claude Code in December 2025, usage doubled by February, and by April he was “back to the drawing board.” By that point 84% of Uber’s roughly 5,000 Claude Code users were classified as agentic coding users – which matters a lot more than it sounds, as you’ll see.

Here’s the part nobody is saying out loud: Uber’s per-engineer numbers are roughly 4-10x what Anthropic itself reports as typical. The tool isn’t broken. The defaults are. This tutorial walks you through the setup that keeps you on the cheap end of that distribution.

What “normal” actually costs (so you know when you’re off the rails)

Calibrate first. Anthropic’s cost docs (as of mid-2026) put the enterprise average at around $13 per developer per active day and $150-250 per developer per month. If you’re seeing $50/day as a solo developer, something in your workflow is leaking.

Profile	Expected daily spend (API)	Likely tier
Light user, mostly chat-style edits	$2-6	Pro $20/mo
Full-time builder, good habits	$5-15	Max $100-200/mo
Agent teams + auto-accept all day	$30-100+	API or custom

The Uber engineers hitting $500-$2,000/month weren’t using Claude Code the way most tutorials describe. Agent Teams spawn multiple Claude Code instances, each with its own context window – according to Anthropic’s cost docs, a 3-agent team uses roughly 7x more tokens than a standard single-agent session. Multiply that by parallel runs across a monorepo and the math stops being subtle.

The 10-minute setup that prevents the Uber outcome

Do these in order. Each one is one-time work that pays back every session.

1. Drop a `.claudeignore` in your repo root

The single highest-use file you’ll write. Without it, Claude reads node_modules, build artifacts, and lockfiles into context – and you pay for every token.

node_modules/
dist/
.next/
build/
coverage/
*.lock
*.log
*.min.js
public/assets/
data/*.csv

2. Write a tight `CLAUDE.md` (under 50 lines)

This file gets cached. Cache reads cost 10% of the standard input price (Anthropic pricing docs, as of mid-2026) – so a well-structured CLAUDE.md basically pays for itself after the second read of the day. Keep it to: stack, conventions, where things live, what to never touch.

3. Cap thinking tokens globally

export MAX_THINKING_TOKENS=8000

Extended thinking is on by default and those tokens are billed as output, which runs at 5x input pricing. Default budgets can hit tens of thousands of tokens per request on a simple edit. Community guides (as of early 2026) recommend 8,000-10,000 as a sane default.

4. Default to Sonnet, escalate to Opus by hand

As of mid-2026, Claude Sonnet 4.6 costs $3 per million input tokens and $15 per million output. Opus 4.6 is $15 input and $75 output – five times the price on input, five times on output. Use /model sonnet as your default. Switch with /model opus only when you actually need architectural reasoning. Turns out most tasks don’t.

The four commands you’ll use every session

/cost – current session spend in dollars (API users only; subscribers see /usage instead)
/context – shows what’s eating your context window right now
/compact – summarizes the conversation so far and replaces the raw history
/clear – full reset for a new task

The catch: /compact and /clear aren’t interchangeable. The right one depends on how long you’ve been idle – and that’s where most people bleed money without realizing it.

Common pitfalls (the ones nobody warned Uber about)

These are the gotchas buried in pricing pages and bug reports. Each one quietly multiplies your bill.

Coming back from lunch is expensive. After more than an hour idle, the entire conversation is resent as cache_creation, billed at standard input rates rather than cached-read rates (per Finout’s 2026 pricing analysis). A session that “feels already loaded” is often getting billed for the full prefix again. The fix: checkpoint your context to a doc, run /clear, start fresh. Don’t run /compact on a cold cache – you’re paying full price to summarize stale tokens.

The 200K cliff. When your input exceeds 200K tokens in a single request, the rate doubles – for Sonnet 4.6 that’s $3 → $6 input and $15 → $22.50 output (SSDNodes/Anthropic pricing, as of mid-2026). If your repo is large, paste relevant excerpts manually instead of letting Claude read everything.

Autocompact on big context windows misfires. Each compaction can cost 100-200K tokens. On Opus with the 1M context window, autocompact has been reported to fire at 76K tokens – wasting 92% of the available window before you’ve even used it. Compact manually, on your terms.

/fast looks innocent. Six times the per-token cost. Same model. That’s what /fast actually means, according to Vincent Qiao’s /cost breakdown. Fine for a one-shot rename. Catastrophic for an hour-long agentic loop.

Version pinning matters more than you think. Bad releases happen – users reported 3-50x faster rate-limit consumption starting with Claude Code v2.1.89 in March 2026 (via Finout). If your bill suddenly spikes, check your CLI version before you check your prompts.

Hard limits for when discipline fails

Habits drift. Caps don’t.

# Cap a single non-interactive command at $5
claude -p --max-budget-usd 5.00 "Refactor the auth module"

# Combine with --max-turns for double protection
claude -p --max-budget-usd 10.00 --max-turns 5 "Fix failing tests"

Use these in CI pipelines without exception. A runaway agent in a GitHub Action can do real damage between coffee and standup.

For team-level caps, the Console lets you set workspace spend limits and cap the workspace’s share of your overall API rate limit – protecting other production workloads from a runaway coding session. This is the control Uber would have benefited from setting before the leaderboards went up.

What the numbers look like with good habits vs. without

Anthropic’s own cost data (as of mid-2026) puts a productive full-time developer at $5-15 per day on API pricing. The same workload without the habits above – large context, no .claudeignore, Opus as default, /fast mode on loops – typically runs $20-40 per day. That’s a 3x spread before you’ve touched agent teams.

At Uber’s reported $500-$2,000/engineer/month, the gap is closer to 10x the Anthropic average. That math only works if most of those engineers were running unmanaged agent teams in parallel – not chatting with one Sonnet session. The 7x token multiplier per agent, compounded across thousands of engineers, is how you burn an annual AI budget in four months.

When cutting costs isn’t worth it

If you’re solo and shipping a product that earns more than your Claude bill, stop reading guides and go ship. The opportunity cost of fiddling with .claudeignore entries is higher than the $40 you’ll save this month. The Uber story is a warning for organizations with thousands of engineers and incentive structures pointing at usage. For one developer building one thing, friction in the workflow is the actual enemy.

The exception: if you’re on API billing without a budget cap and you’re about to leave a long-running agent unattended overnight. Then set the limits. Always.

FAQ

Should I use the Max subscription or pay per token on the API?

If you code with Claude every weekday, Max wins. Break-even is somewhere around $200 of monthly API spend for Max 20x.

Why are my costs still high even though I run /compact constantly?

You’re probably compacting after the cache has gone cold. Here’s the scenario: you start a session at 9am, work for 40 minutes, take a meeting, come back at 11am and run /compact on a 90K-token conversation. The cache expired around the 1-hour mark. You just paid full input price to summarize two hours of stale tokens. Run /clear instead and paste a one-paragraph recap from your notes – it’s cheaper and Claude focuses better on a fresh window.

Is the $20 Pro plan enough for serious work?

Quick answer: no, not for full-time agentic development. Pro pools its quota with regular Claude.ai chat usage, so a long planning session in the web app eats into your terminal budget. Most professionals land on Max $100 or $200 within a month of daily use. The real signal is hitting the limit twice in a week – at that point the upgrade pays for itself in recovered focus alone. For a hobbyist doing a couple of hours a week? Pro is probably fine.

Next action: Open your project root right now, create .claudeignore with the snippet above, and run /context in your next session. The number you see is your baseline. Cut it in half this week.