The number one mistake people make when using Claude for coding complex projects: they treat the context window like storage. They keep one long session running, paste in errors, ask follow-up questions, iterate for hours – and wonder why Claude starts forgetting decisions made 30 minutes ago. They blame the model. The model isn’t the problem.
Anthropic’s own best-practices documentation states it plainly: most best practices trace back to a single constraint – Claude’s context window fills up fast, and performance degrades as it fills. Everything else is a footnote.
So this guide reverse-engineers the workflow from that constraint. No install walkthrough, no tier comparison table, no toy todo-app demo. Just the patterns that actually keep Claude useful past the 30-minute mark on real codebases.
The 40% rule (and why your sessions feel dumber than they should)
Here’s the part nobody mentions in beginner tutorials. Claude doesn’t wait until 100% context to degrade – it starts slipping much, much earlier. A widely-cited GitHub issue filed against the official Claude Code repo documented this in forensic detail on the Opus 4.6 1M context model (as of early 2026): around 20% usage, the user noticed degraded performance – losing track of earlier decisions, circular reasoning, forgetting details from the start of the conversation. By ~40% context usage, automatic message summarization kicked in and removed the scrollback history. By ~48%, Claude itself told the user it was “not being effective” and recommended a fresh session – less than halfway through the advertised 1M context window.
Boris Cherny, who created Claude Code, has confirmed this pattern informally. The “dumb zone” kicks in around ~40% context. Newcomers should keep it under 40%, wrap up by 60%. Experienced users should stay below 30% and only push to 60% on simple tasks (per the shanraisshan best-practices repo summarizing his guidance, as of early 2026).
Think of context as a spotlight, not a closet. A small stage is fully lit. A stadium with the same spotlight is mostly dark.
Stop trusting /compact. Use “Document & Clear” instead
The instinct, when you notice degradation, is to run /compact and keep going. Don’t. Turns out the automatic compaction is opaque and error-prone – Shrivu Shankar’s writeup on Claude Code internals describes it as not well-optimized for complex tasks. There’s a deeper problem: if the model is already confused when compaction fires, the summary preserves the confusion. You’ve just compressed corruption.
The pattern that actually works:
# Before context fills up:
> Dump your current plan and progress to PROGRESS.md.
  Include: completed steps, remaining steps, key decisions made, and files touched.
# Then:
/clear
# In the new session:
> Read PROGRESS.md and continue from where we left off.
Have Claude dump its plan and progress into a .md, /clear the state, then start a new session pointing it at that file. It sounds ceremonial. It isn’t – this creates durable external memory that survives session resets. Auto-compaction doesn’t.
There’s also an environment variable worth knowing about. Context rot kicks in around 300-400k tokens on the 1M context model; setting CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000 forces earlier compaction while the model can still reason clearly about what to preserve (tip via howborisusesclaudecode.com, as of early 2026).
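If you want to try it, it’s a one-line export before launching Claude Code (assuming a bash/zsh shell; adjust for yours):

# Trigger auto-compaction at ~400k tokens instead of the model's default
export CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000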
Plan Mode is not optional for complex work
Press Shift+Tab twice. You’re in Plan Mode. The mechanism itself is almost embarrassingly simple – a single sentence injected into the model’s context: “Please do not write any code yet.” That’s it. That single instruction shifts Claude from execution mode to structured reasoning. Boris Cherny uses it for roughly 80% of his sessions (per the freeCodeCamp Claude Code Handbook).
The discipline is to actually read the plan before you accept it. Not skim – read. The plan is your last cheap intervention point before tokens start burning on implementation. If the plan looks like it’s going wide before it goes deep, push back. Ask Claude to restructure it so each phase ends with something runnable and testable. That’s worth doing before a single line of code gets written.
Try this: When you ask Claude to plan, also ask it to interview you first. “Before you start planning the auth system, ask me 5 questions about constraints you’d need clarified.” The questions surface assumptions you didn’t realize you were making – and that’s often worth more than the plan itself.
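Put together, the prompt can be as short as this (the auth system here is just a stand-in for whatever you’re building):

# Shift+Tab twice for Plan Mode, then:
> Before you start planning the auth system, interview me first:
  ask me 5 questions about constraints, trade-offs, or requirements
  you'd need clarified before committing to a plan.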
Subagents: the actual reason Claude scales to big codebases
Subagents are fresh Claude instances spawned for specific subtasks. The child does the dirty work; only the conclusion returns to the parent.
Ask yourself: “will I need this tool output again, or just the conclusion?” 20 file reads + 12 greps + 3 dead ends stay in the child’s context – only the final report comes back. This is how Boris Cherny keeps his main session below 30% on serious work.
Practical example: instead of asking the main agent to “find every place we call the old auth API,” tell it to use a subagent to scan the repo and return a structured list. The 47 grep results, the false positives, the dead-end traces – none of that pollutes your working session. You get back a clean report.
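As a prompt, that looks something like this (the endpoint name is a placeholder; use your own):

# Delegation prompt. The /v1/auth endpoint is a made-up example.
> Use a subagent to scan the repo for every call to the old /v1/auth API.
  Have it return only a structured list: file, line, and whether the call
  site handles token refresh. Keep the raw search output in the subagent.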
A real-world flow: refactoring a service nobody fully understands
A refactor on a service you didn’t write, with no good docs, and tests that mostly pass. Here’s the session breakdown:
Session 1 – read-only, target ~15% context. Plan Mode on. Ask Claude to map the service: entry points, side effects, external dependencies, untested branches. Output goes to ARCHITECTURE.md. Then /clear.
Session 2 – planning, target ~20% context. Fresh start. Claude reads ARCHITECTURE.md and your CLAUDE.md. Ask for a refactor plan broken into phases, each independently testable. Have it interview you first. Output to PLAN.md. Then /clear.
Sessions 3+ – execution, one slice per session. Each new session reads PLAN.md, executes one slice, runs tests, commits. /clear between slices. Spawn a subagent inside each session for “find all callers” type searches – keep the grep noise out of your main thread.
Verification pass. Fresh session, fresh context. Ask Claude to review the diff against PLAN.md as if it didn’t write the code. No anchoring bias, no sunk-cost reasoning on its own earlier decisions.
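Condensed into prompts, the whole flow looks roughly like this (file names match the steps above; the details of your service will obviously differ):

# Session 1: map the service (Plan Mode, read-only)
> Map this service: entry points, side effects, external dependencies,
  untested branches. Write the findings to ARCHITECTURE.md.
/clear

# Session 2: plan against the map
> Read ARCHITECTURE.md and CLAUDE.md. Interview me first, then write a
  phased refactor plan to PLAN.md. Every phase must end runnable and testable.
/clear

# Sessions 3+: execute one slice, then clear
> Read PLAN.md. Implement phase 1 only. Run the tests and commit.
/clear

# Verification: fresh context, no sunk cost
> Read PLAN.md and review the diff on this branch as if you didn't write it.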
This is slower than “one chat, ask everything.” It’s also the difference between Claude finishing the refactor and Claude hallucinating a function that doesn’t exist on hour three.
The Opus 4.7 gotcha nobody warned you about
If your code review prompts include instructions like “only flag critical issues” or “don’t nitpick” – and you upgraded to Opus 4.7 – you may see Claude reporting fewer issues than before and conclude the model got worse. It probably didn’t.
Anthropic’s docs flag this directly (as of early 2026): when a review prompt says “only report high-severity issues” or “be conservative,” Opus 4.7 follows that instruction more faithfully than earlier models did. It may investigate the code just as thoroughly, identify the bugs, and then not report findings it judges below your stated bar. Same depth of investigation, fewer reported findings – especially on lower-severity issues. The model isn’t missing bugs. It’s filtering them per your instructions.
Tighten your rubric. Tell it what “high-severity” means in your codebase, or just remove the gating instruction.
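Concretely, swap the vague gate for an explicit bar (these severity definitions are illustrative, not canonical):

# Review prompt with an explicit rubric instead of "only flag critical issues"
> Review this diff. Report an issue only if it is a correctness bug reachable
  in production, a security problem, or a data-loss or migration hazard.
  Skip style and naming unless they hide one of the above.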
Pro tips that compound
Rewind, don’t correct. Use double-Esc or /rewind to roll back before a failed attempt and re-prompt with what you learned. Five back-and-forth corrections burn more context than one clean retry – and they leave a trail of “here’s what didn’t work” that quietly degrades the session.
Use the bigger model. A less capable model costs less per token and often costs more per task. Weaker reasoning needs more correction passes, burns more context on back-and-forth clarification, and takes longer to converge – the math usually doesn’t favor the cheaper option (Boris Cherny’s observation, summarized via the onmyway133 blog).
Give Claude a feedback loop. Tests, screenshots, build output – something it can check itself against. Without verification, Claude assumes its output is correct. With it, Claude iterates until something works. Boris Cherny estimates a 2-3x quality improvement with a feedback loop in place.
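The cheapest version is naming the command that closes the loop (npm test is a stand-in for whatever your project runs):

# Tell Claude how to verify its own work
> After every change, run npm test and keep iterating until the suite
  passes. Show me the failing output before each fix.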
FAQ
Do I really need a CLAUDE.md if I’m just working on one project?
Yes. Run /init once and refine it over time. The real reason isn’t “giving Claude instructions” – it’s surviving the /clear cycle. Every new session needs to get productive fast, and a CLAUDE.md with build commands, code conventions, and current focus turns a 10-minute ramp-up into 30 seconds. Claude reads it automatically at the start of every conversation.
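A minimal sketch is enough to start; the contents below are illustrative (lib/errors.ts and the endpoint are made up), and /init will generate a fuller starting point from your actual repo:

# CLAUDE.md (illustrative example)
## Build & test
npm run build, npm test, npm run lint before committing.

## Conventions
TypeScript strict mode. No default exports. Errors go through lib/errors.ts.

## Current focus
Migrating auth middleware off the legacy /v1/auth endpoint.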
How is this different from just using Cursor or Copilot?
Claude Code is built for a different problem. Copilot and Cursor are great at localized work – next-line completion, single-file edits, quick chat about a function. But imagine you need to refactor an authentication layer that touches 40 files across three services. That’s not a completion task. The bottleneck isn’t typing speed; it’s sustained reasoning across a large surface area. If you’re never hitting context degradation, you probably don’t need this workflow. If you are, this is where the gains are.
What about the “ultrathink” trick – should I always use it?
No – and the reason is worth understanding. The thinking levels (“think” < “think hard” < “think harder” < “ultrathink”) aren’t just a quality dial – they map directly to increasing reasoning token budgets. More thinking = more tokens consumed = context fills faster. So ultrathink on a gnarly architectural decision makes sense. Ultrathink on “rename this variable” burns your session budget faster and leaves you hitting the 30% degradation threshold sooner. The right level for most tasks is somewhere in the middle, and you should reserve the higher levels for decisions where getting it wrong means expensive rework.
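The keyword goes straight into your prompt, so matching budget to stakes is trivial (both prompts below are illustrative):

# Worth the tokens: an architectural fork
> ultrathink about whether we should split the billing service before or
  after the database migration, and what each ordering costs us.

# Not worth it: mechanical work, no thinking keyword needed
> Rename oldAuthClient to legacyAuthClient everywhere it appears.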
Your next move
Open your current Claude Code session. Check the context indicator. If you’re past 30% and you’ve been at it for more than an hour, stop. Tell Claude to dump progress to a .md file, /clear, and restart from that file. Then notice how much sharper the next 20 minutes feel. That’s the workflow.