You open Claude. You ask it to write a 3,000-word article. Strong start – introduction, first section, second section – then it stops at 800 words: “Should I continue?”
You say yes. Another 500 words. Stops again.
After five “continue” clicks, your context window is half-burned on repetition. Flow? Broken. You’re stitching fragments in Google Docs.
The thing is, Claude’s context window is 200K tokens (as of February 2025) – roughly 500 pages. Why can’t it write 10 pages straight?
Everyone obsesses over the context window. Wrong target. The output limit is what stops you.
Context Window ≠ Output Window
Context window: how much Claude can read. Your prompt, uploaded docs, conversation history. Input.
Output: different cap entirely.
Claude Sonnet 4 can produce up to 64,000 output tokens under standard API conditions (per datastudios.org analysis, February 2025). In English, 1,000 tokens ≈ 750 words. So 64K tokens = ~48,000 words. A short book.
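That back-of-envelope conversion is easy to sanity-check. A quick sketch (the 0.75 words-per-token ratio for English is the assumption here, not an exact tokenizer measurement):

```python
# Rough English ratio: ~1,000 tokens per ~750 words.
WORDS_PER_TOKEN = 0.75

def tokens_to_words(tokens: int) -> int:
    """Approximate English word count for a given token budget."""
    return int(tokens * WORDS_PER_TOKEN)

print(tokens_to_words(64_000))   # ~48,000 words: a short book
print(tokens_to_words(200_000))  # the full context window: ~150,000 words
```

The same ratio is where the "roughly 500 pages" figure comes from: 150,000 words at ~300 words per page.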
Except that’s the API maximum with specific settings. Real usage? Much lower.
Claude Code users hit a hard 32,000-token output cap mid-generation (GitHub issue #24055). A Make.com community thread reports Claude stopping at 300-400 words on blog tasks prompted for 1,000-3,000. The stop isn’t a bug. It’s a default cap triggering a confirmation pause.
Why lower caps? Cost and speed. Generating 64K tokens takes minutes and burns compute. Most tasks don’t need it. Default behavior: write a chunk, check if the user wants more.
Quick Q&A? Fine. Long-form writing? Pain.
Projects Won’t Fix This
“Use Projects! Upload your outline and Claude writes the whole manuscript!”
Projects let you upload documents to a knowledge base. Claude references them across chats (per Anthropic’s official Projects documentation). They use retrieval-augmented generation (RAG) – loading only relevant chunks into the 200K context as needed.
That solves input. Claude can reference a 50,000-word novel outline without choking.
Output length? Still capped at the same 4K-8K tokens per response. Projects help Claude remember. They don’t let it write longer drafts in one shot.
Three Fixes
1. Artifacts for Segmented Drafts
Artifacts display content in a separate window (per Albato’s Claude guide). Most tutorials miss this: you can highlight one sentence, and Claude rewrites only that part. That conserves your usage limits and avoids unwanted changes elsewhere.
For long documents, this changes everything.
Instead of “write a 3,000-word article” (triggers the output cap), you prompt section by section:
- “Introduction. 300 words.”
- “Section 2, based on outline.”
- “Rewrite paragraph 3 in section 2 – conversational tone.”
Each section: one Artifact. You control flow. No continuation loops eating your context budget.
Artifacts must be enabled in Settings > Capabilities (not default in some setups).
2. Edit Prompts, Don’t Append
Typical workflow: Claude writes 800 words, you reply “continue,” it adds 500, you reply “continue” again. Two messages per cycle. Context fills fast.
Edit your original prompt instead. Hover over it, click edit, refine (“Write intro, 400 words”), regenerate. This creates a new branch without adding messages – you replace, not accumulate (per limitededitionjonathan’s Substack guide).
Three rounds of traditional iteration: 6 context items. Editing: 2 items. A 67% reduction.
Ten iterations? Appending leaves 20 items in context; editing still leaves 2 – roughly 90% of that budget saved.
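Those percentages follow from a simple model: each append cycle adds one user message and one reply, while editing always leaves exactly one prompt and one response in context. A sketch of the arithmetic (message counts, not actual token measurements):

```python
def context_items(iterations: int, edit: bool) -> int:
    """Messages left in context after a number of refinement rounds."""
    # Editing regenerates in place: always 1 prompt + 1 response.
    # Appending "continue" adds a user+assistant pair every round.
    return 2 if edit else 2 * iterations

def savings(iterations: int) -> float:
    """Fraction of context items avoided by editing instead of appending."""
    return 1 - context_items(iterations, edit=True) / context_items(iterations, edit=False)

print(f"{savings(3):.0%}")   # three rounds: 67%
print(f"{savings(10):.0%}")  # ten rounds: 90%
```

Real token savings depend on message lengths, but the shape holds: appending grows linearly, editing stays flat.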
3. API with Manual Continuation (Technical Users)
Comfortable with code? The Anthropic API gives full control. Claude 3.7 Sonnet can produce up to 128,000 output tokens when you set the anthropic-beta: output-128k-2025-02-19 header (per datastudios.org documentation, as of February 2025).
Without that header, script a continuation loop: send prompt, receive response, check stop reason. If max_tokens, append “continue” and resend. API docs include samples.
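A minimal version of that loop, with the API call stubbed out so the control flow is runnable as-is. In a real script, `send` would wrap the Anthropic SDK’s `messages.create` call (with your model and, optionally, the `output-128k` beta header) and return the response text plus its `stop_reason`:

```python
def generate_long(send, prompt, max_rounds=10):
    """Keep requesting continuations until the model stops naturally.

    `send(messages)` must return (text, stop_reason), mirroring the
    API's response content and stop_reason fields.
    """
    messages = [{"role": "user", "content": prompt}]
    parts = []
    for _ in range(max_rounds):
        text, stop_reason = send(messages)
        parts.append(text)
        if stop_reason != "max_tokens":  # "end_turn" = finished naturally
            break
        # Feed the partial answer back and ask for the rest.
        messages.append({"role": "assistant", "content": text})
        messages.append({"role": "user", "content": "Continue exactly where you left off."})
    return "".join(parts)

# Stub client: pretends the model gets cut off twice before finishing.
chunks = iter([("Part 1 ", "max_tokens"),
               ("Part 2 ", "max_tokens"),
               ("Part 3.", "end_turn")])

def fake_send(messages):
    return next(chunks)

print(generate_long(fake_send, "Write a 10,000-word essay."))
# Part 1 Part 2 Part 3.
```

The `max_rounds` guard matters: without it, a model that never returns `end_turn` would loop (and bill you) forever.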
The catch: $3 per million input tokens, $15 per million output for Sonnet 4 (as of February 2025). But uninterrupted generation.
When to Use Each
| Method | Best For | Limitation |
|---|---|---|
| Artifacts | Blog posts, articles, reports under 5K words with section-by-section control | Manual segmentation required; won’t work well for 50K+ manuscripts |
| Edit + Regenerate | Iterative refinement – tweaking tone/structure across attempts | Doesn’t extend single-response length; just saves context |
| API + Continuation | Technical users needing uninterrupted 10K+ word generation, comfortable scripting | Requires coding knowledge and direct API access |
Most writers? Artifacts + editing is the sweet spot. Creative control, no continuation fatigue, fits free/Pro tier limits.
The 300K-Word Novel Case
One writer produced a 301,000-word novel using Claude over 8 months (documented on BSWEN blog via Reddit discussion). Method: treat it like software development. Built a 56,000+ word story bible – every character, location, lore element.
Not “write chapter 1 in one go.” Instead:
- Pre-build reference materials (character sheets, world details, style guide)
- Generate scenes: 500-1,000 word chunks
- Use Claude to check consistency against story bible
- Manually stitch scenes into chapters
The output limit forced discipline. Each scene: intentional. Cleaner than coaxing Claude into 5,000-word chapters that drift off-prompt halfway.
Writing over 20K words? Create a “continuity document” in your Project. Paste key character details, plot beats, style rules. Reference it every prompt: “Using the continuity document, write the next scene where…” Claude stays consistent. No re-explaining your world each time.
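If you script your scene generation, the same pattern is one string template away. A sketch – the continuity document contents and the scene brief below are illustrative placeholders, not part of any real project:

```python
continuity_doc = """\
CHARACTERS: Mara (cynical pilot), Jun (ship engineer).
PLOT SO FAR: engine sabotage discovered in chapter 3.
STYLE: first person, past tense, short sentences."""

def scene_prompt(brief: str) -> str:
    # Prepend the continuity document so every request carries the
    # same character, plot, and style rules - no re-explaining.
    return (
        "Using the continuity document below, write the next scene.\n\n"
        f"--- CONTINUITY DOCUMENT ---\n{continuity_doc}\n\n"
        f"--- SCENE BRIEF ---\n{brief}\n"
        "Length: 500-1,000 words."
    )

print(scene_prompt("Mara confronts Jun about the sabotage."))
```

In the web interface, the Project knowledge base plays the role of `continuity_doc`; the principle is identical.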
Ever notice how the best long-form AI writing feels more like assembly than generation? You’re not asking for a finished draft. You’re collecting parts.
ChatGPT Comparison
ChatGPT Plus also caps output around 4K-8K tokens per response. But: community feedback (idratherbewriting.com, October 2023) describes Claude’s output as “less AI-smelly,” with easier-to-read prose. ChatGPT leans formulaic.
For novels, essays, narrative nonfiction: Claude has an edge in voice. For content needing web search or real-time data: ChatGPT’s plugin ecosystem wins.
Neither lets you write 10,000 words in one prompt. Choose by output quality, not context window size.
Frequently Asked Questions
Can I increase Claude’s output limit?
Claude 3.7 Sonnet supports up to 128,000 output tokens with the anthropic-beta: output-128k-2025-02-19 header in API calls (as of February 2025). Not available in Claude.ai web interface. For Pro or free plans, the effective limit is 4K-8K tokens per response. API only for extended output.
Does Claude Pro remove the output cap?
No. Claude Pro costs $20/month (as of February 2025, per Kindlepreneur tutorial and official pricing). It provides higher usage limits – more messages per day, access to Opus models – but doesn’t remove per-response output token caps. You get more turns, not longer turns. The cap is architectural, not a subscription tier.
One user scenario: subscriber switched from free to Pro expecting to generate 5,000-word blog posts in one shot. Same 800-word truncation. Pro gave them 50 messages/day instead of 10, but each message hit the same 4K-8K output limit. If your goal is longer single responses, Pro won’t help – you need API access with custom headers.
Why does Claude stop mid-sentence sometimes?
Two reasons. First: hit max_tokens for that response (the API’s default cap). Second: context window filled – conversation history plus new response exceeded 200K tokens. The stop_reason field in API responses tells you which: max_tokens, model_context_window_exceeded, or end_turn (natural completion). Web interface doesn’t surface this, but same behavior applies.
Next Step
Open Claude. Settings > Capabilities > enable Artifacts. Pick one section you need written – intro, first scene, executive summary. Prompt for that section only. 300-500 words.
Watch the Artifact appear.
Edit one sentence using the highlight feature. Claude rewrites just that part.
This is how you do it. Not “write my book.” Write this scene. Now this one. The output limit stops mattering when you stop fighting it.