
AI Psychosis or Productivity Trap? The Karpathy Method Explained

Karpathy stopped coding in December. Before you copy his workflow, here's what the hype articles won't tell you about AI coding agents, hidden costs, and the burnout data.

8 min read · Beginner

Here’s the #1 mistake developers make after reading about Karpathy’s “AI psychosis”: they immediately dump their entire codebase into Claude Code and expect 10x productivity. What actually happens? Rate limit hits in 3 days. Code ships with 41% more bugs than manual work. Debugging AI hallucinations eats more time than you saved.

The story blowing up right now: Andrej Karpathy – OpenAI co-founder and former Tesla AI director – stopped writing code in December 2025. 80% human/20% AI flipped to the reverse, then to zero. He calls it a “state of psychosis” trying to figure out what’s possible.

Before you follow his lead, let’s reverse-engineer what’s actually happening. And what the viral articles skip.

What Changed in December

December 2025: Claude and other coding agents crossed a coherence threshold. Barely usable → actually able to get things done. Not a model update announcement. A phase transition most developers felt but few could articulate.

The default workflow for building software changed completely in recent months as agentic AI exploded. Karpathy’s not alone – industry-wide shift. He spends 16 hours a day now “expressing intent” to agents instead of typing syntax.

Sounds incredible. What didn’t make the headlines?

The Data Everyone Ignores

Copilot users: 41% more bugs, according to Uplevel’s study of ~800 developers. Not fewer. More. Same study saw zero evidence AI assistants were preventing developer burnout.

METR randomized trial: developers using AI tools like Cursor and Claude were 19% slower on average. Yet they were convinced they’d been faster. Before the study? They predicted 24% speed-up. After finishing slower? Still believed AI had accelerated them by ~20%.

This perception-reality gap drives adoption despite negative results.

AI-produced code often looks consistent but violates local patterns: it omits null checks, early returns, and complete exception logic – issues tied to real-world outages. In Stack Overflow’s survey of 90,000+ developers, 66% said their most common frustration is that AI code is “almost right, but not quite.” Another 45.2% pointed to time spent debugging AI-generated code.

Why does Karpathy make it work?

The Karpathy Workflow (Parts He Didn’t Explain)

He’s not using these tools the way beginners do. Karpathy runs 10-20 AI agents in parallel. When one agent is waiting on a task, he opens more. He’s not prompting – he’s orchestrating.

Second: the subscription matters. Claude Max costs $100-200/month and provides 20x higher usage limits than Pro, plus access to Opus 4.6 with a 1M context window. Heavy users can rack up $5,000/month in API-equivalent token usage – Anthropic eats that cost on Max subscriptions. Fewer than 5% of subscribers hit the weekly caps. Karpathy’s in that 5%.

Third: use case is research-grade prototyping, not production systems. Even in controlled setups with specific prompting and simple target applications, developers found issues in generated code constantly – “a bit like whac-a-mole, every time you run the workflow, something else happens.”

Setup Path (Without the Hype)

Two main options: Claude Code via Anthropic subscription ($20-200/month), or Cursor with Claude integration ($20/month Cursor Pro + Anthropic API costs). Claude Code launched February 2025 as a simple terminal tool – you chat with Claude, edit files, run bash commands.

For most people starting out: Cursor. It’s a VSCode clone with the same layout and keyboard shortcuts, integrated with Anthropic’s Claude Sonnet models for code generation, and it shows every change in a visual diff interface. Claude Code is CLI-first – it rewards power users and punishes beginners.

Realistic Expectations

One developer: “I’m more and more dependent on the tool. I now regularly question whether I could do it myself or if Claude would do a better job. Instances where I’d rather wait two hours for a session reset than implement a simple change myself.” This is the dependency trap.

Long sessions bring “context rot” – the model starts pulling in irrelevant details from earlier prompts, and accuracy drops. More context isn’t always better: a bigger context window should help in theory, but in practice it often distracts the model.

Time-box your AI sessions. Set a 30-minute timer. When it goes off, ship what you have or switch to writing it yourself. Prevents the prompt spiral and perfectionism trap simultaneously.

Hidden Costs

Subscription users don’t see token costs. Dangerous illusion. Claude Code constantly caches context about your codebase – every file read, function understanding, context maintenance goes through the cache system. For large projects with frequent file switching, cache operations dominate your usage completely.

Cache reads: $0.50 per million tokens. Cache writes: $6.25 per million for Opus. Sounds small per token, but billions of tokens add up. One power user tracked a single month: the API-equivalent cost would’ve been $5,623 across 201 sessions and 45+ projects – nearly five years of the Max 5x plan.
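Using the per-token prices quoted above, the arithmetic is easy to sketch. The token volumes below are illustrative assumptions, not measured data – they’re picked only to show how a month in the ballpark of that $5,623 figure comes together:

```python
# API-equivalent cost of a month of heavy Claude Code cache traffic.
# Prices are the Opus cache tiers quoted above; token volumes (in millions)
# are illustrative assumptions.

CACHE_READ_PER_M = 0.50    # $ per million cache-read tokens
CACHE_WRITE_PER_M = 6.25   # $ per million cache-write tokens (Opus)

def monthly_cache_cost(read_tokens_m: float, write_tokens_m: float) -> float:
    """Dollar cost given cache-read and cache-write volumes in millions of tokens."""
    return read_tokens_m * CACHE_READ_PER_M + write_tokens_m * CACHE_WRITE_PER_M

# Assume 8 billion cache reads and 250 million cache writes in a month.
cost = monthly_cache_cost(8_000, 250)
print(f"${cost:,.2f}")  # → $5,562.50 -- in the ballpark of the figure above
```

Notice that the reads dominate by volume but the writes, at 12.5x the price, still contribute over a quarter of the bill.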

Claude Sonnet 4.6 pricing: $3 input / $15 output per 1M tokens. No price increase despite performance improvements. Cost-conscious and switching to API? Sonnet handles most tasks. Save Opus for genuinely hard problems.
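At those Sonnet prices, per-task API cost is simple to estimate. The token counts here are illustrative assumptions for a single sizeable request:

```python
# Cost of one Sonnet request at the quoted $3 input / $15 output per 1M tokens.
SONNET_INPUT_PER_M = 3.0
SONNET_OUTPUT_PER_M = 15.0

def sonnet_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at Sonnet API pricing."""
    return (input_tokens / 1e6) * SONNET_INPUT_PER_M \
         + (output_tokens / 1e6) * SONNET_OUTPUT_PER_M

# Assume a sizeable refactor: 120k tokens of context in, 8k tokens of code out.
print(f"${sonnet_cost(120_000, 8_000):.2f}")  # → $0.48
```

Under fifty cents per substantial request is why routing everyday tasks to Sonnet and reserving Opus for the hard problems is the cost-conscious default.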

Burnout Trap

“State of psychosis” sounds like hacker bravado. But AI is making burnout worse – not because AI is bad, but because AI removes the natural speed limits that used to protect us.

Before AI, there was a ceiling on how much you could produce in a day – set by typing speed, thinking speed, lookup time. Frustrating sometimes, but also a governor. You couldn’t work yourself to death because the work itself imposed limits.

Organizations treat every minute saved as a minute available for more work. Not less burnout – a different kind of burnout, hitting people who embraced AI the hardest.

One developer: “I didn’t quit or have a breakdown. I just stopped caring. Code reviews became rubber stamps. Design decisions became ‘whatever AI suggests.’ Going through the motions, producing more than ever, feeling less than ever.”

Recovery isn’t about using less AI. Using AI differently – with boundaries, with intention, understanding you’re not a machine and don’t need to keep pace with one.

Turns out removing natural constraints doesn’t just make you faster. Makes you vulnerable.

What Works (And What Doesn’t)

For MVP-style work, side projects, and experiments to a first version, the speed-up is real. Outside those, the picture changes: you may feel like you’re moving quickly, but getting code production-ready often takes longer.

One developer found that an 18,000-line React component at their company had never been updated successfully by any AI agent except Claude Code. Cursor still has hiccups – trouble resolving patches, frequent file rewrites, struggles with extremely large files. Claude Code handles complex tasks well and rarely gets stuck.

Claude Code is likely post-trained with the same tools it ships with – so it’s simply more comfortable using them. The tool-call selection Anthropic has implemented likely contributes to this.

Limits you’ll hit:

  • AI lacks deep understanding of business context and domain-specific requirements. Operates on pattern recognition rather than genuine problem space comprehension.
  • AI code generators aim for consistency, but quality varies. Generated code can lack the meticulousness of human expertise, potentially harboring hidden issues – bugs or security vulnerabilities.
  • Most advanced proprietary models: ~75% accuracy on structured outputs (University of Waterloo benchmarking). Open-source? Closer to 65%.

When to Write Code Yourself

Security-critical paths. Improper password handling and insecure object references are the most prominent patterns in AI-generated security issues. Excessive I/O operations were ~8× more common in AI-authored PRs.

Performance-sensitive code. AI favors clarity and simple patterns over resource efficiency. Optimizing hot paths? Do it yourself.

Architecture decisions. There’s a difference between writing a few lines of code and full-fledged software development. Software development is “90% brain function – understanding requirements, designing the system, considering limitations – while converting this into code is the simpler part.”

AI agents struggle with abstract concepts: design principles, user experience, code maintainability. Hinders their ability to generate code that’s elegant, efficient, aligned with best practices – potentially creating more work in the long run.

Your Next Move

Don’t replicate Karpathy’s workflow. He’s running 20 parallel agents on a Max plan that costs Anthropic more than he pays.

Start with one clearly scoped task. Cursor or Claude Code for an isolated feature – defined boundaries, good test coverage, low blast radius if it breaks. Review every change. Run your tests. Ship it.

Then do it again. Slightly bigger scope. Same discipline.

The goal isn’t zero-code-written productivity psychosis. The goal is keeping the parts of development you’re good at while letting AI handle the parts you hate. Lose that balance and you’ll burn out – whether you’re shipping faster or not.

FAQ

Is Claude Code worth the $200/month Max plan for serious development?

Most developers who use Claude Code as their primary daily tool land in medium-to-heavy usage, where the Max plan becomes worth it – running ~95% Opus would cost far more at API pricing. If you’re mostly on Sonnet, API costs come out roughly 40% lower and might work better for you.
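A quick sanity check on “worth it”: compare your API-equivalent monthly spend against the flat plan price. The $5,623 figure is the power-user month cited earlier; the $200 is the top Max tier, and the light-usage figure is an illustrative assumption:

```python
# Break-even sketch: flat Max subscription vs metered API-equivalent usage.
MAX_PLAN_MONTHLY = 200.0  # top Max tier, $/month

def plan_payoff(api_equivalent_monthly: float) -> float:
    """How many times over the flat plan pays for itself vs API pricing."""
    return api_equivalent_monthly / MAX_PLAN_MONTHLY

print(f"{plan_payoff(5_623):.1f}x")  # heavy Opus month cited above → 28.1x
print(f"{plan_payoff(120):.1f}x")    # assumed light Sonnet month → 0.6x (plan loses)
```

Anything above 1.0x means the subscription is the cheaper path; the heavy-Opus profile clears that bar by more than an order of magnitude.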

Will AI coding assistants replace developers?

No. AI can generate functional code across many contexts, but with key limits. AI excels at routine tasks, pattern-based programming, boilerplate. Still requires human oversight for complex architecture, security considerations, business logic implementation. The most effective approach? Viewing AI as a collaboration tool that augments human capabilities rather than a replacement.

How do I avoid the “context rot” problem in long coding sessions?

Two strategies: Separate AI time from thinking time. Morning for thinking. Afternoon for AI-assisted execution. And restart sessions aggressively. When you notice the model pulling in irrelevant context or making mistakes it didn’t make earlier, don’t fight it – start a fresh session with only the current relevant context. One catch: restarting means re-uploading your codebase context, which burns tokens. But fighting context rot in a degraded session burns more.