Two ways to use a coding agent. One lands you above 65% on comprehension tests. The other puts you below 40%. Same tool, same task, same time pressure – and Anthropic’s own randomized controlled trial documented the difference. This guide is about making sure you’re in the right group.
Lars Faye’s post Agentic Coding Is a Trap hit 367 points on Hacker News in late April / early May 2026, spawning a wave of response pieces. The thesis: agents erode the very skills you need to supervise them. The data backs it up. But the fix isn’t to ditch the agent – it’s to change how you talk to it.
The key takeaway, upfront
Two patterns. One scores below 40%. The other scores above 65%. Same AI, same problem, same time pressure.
If you only remember one thing: never accept generated code without first asking a conceptual question. That single habit is what separates the two groups in the research.
What the studies actually found
65% versus 40% – that’s the split that matters, not the headline 17%. Anthropic’s randomized controlled trial (full paper: arXiv:2601.20245, Shen & Tamkin, 2026) found developers using AI scored 17% lower on comprehension tests overall – but that average hides a huge behavioral split. Devs who asked conceptual questions: above 65%. Devs who delegated: below 40%.
Then there’s METR’s early-2025 RCT with 16 experienced open-source developers. Before the tasks, they forecast AI would cut completion time by 24%. After finishing, they still believed it had saved 20%. The actual result: AI made them 19% slower. The confidence interval ran from +2% to +39% slower – so not a precise number.
Method A vs Method B: the two ways people use coding agents
Anthropic’s researchers watched screen recordings and pulled out six distinct interaction patterns. They fall into two clean groups.
| Pattern | Quiz score | Speed |
|---|---|---|
| Method A: Full delegation (“build me X”) | <40% | Fastest |
| Method A: Progressive reliance (start asking, end delegating) | <40% | Fast |
| Method A: Iterative debug-by-AI (“why doesn’t this work?”) | <40% | Fast |
| Method B: Explain after generating | >65% | Medium |
| Method B: Generate + bundled explanation | >65% | Medium |
| Method B: Concept questions only, write code yourself | >65% | 2nd fastest |
Look at the bottom three rows. Those patterns correlated with scores above 65%: asking for explanations after generating code, requesting code with explanations bundled in, and asking only conceptual questions while writing the code yourself. That last one? Second-fastest overall, trailing only full delegation.
The supposed tradeoff – “learn slowly OR ship fast” – is mostly fake. The slowest learners aren’t the fastest shippers. They’re tied with the second-best learners on speed.
The walkthrough: a 4-step beginner workflow that lands you in the 65% group
I’ll use Claude Code for the example because Anthropic ships the relevant features natively. The pattern works in any agent – Cursor, Copilot, Aider – you just won’t have the built-in commands.
Step 1: Flip the output style before you do anything
Open your terminal in your project, start Claude Code, and run:
```
/output-style
```
As of early 2026, Claude Code’s official docs describe two modes worth knowing: Explanatory (adds “Insights” between coding steps, explaining implementation choices) and Learning (adds Insights plus TODO(human) markers asking you to write 5-10 lines yourself). The default mode ships none of that – it’s what produces the 40% score.
For real work, pick Explanatory. For learning a new framework or library, pick Learning.
One thing the docs bury: Learning mode adds extra system prompt instructions every session, which costs additional tokens. Turns out this matters on metered plans – a year of daily debugging sessions adds up. Worth it for unfamiliar code, overkill for boilerplate. Toggle it intentionally.
Step 2: Ask the conceptual question first
Before you ask the agent to write anything, ask it to explain the approach. Not the code. The approach.
```text
# Bad (Method A)
> Add JWT auth to this Express app

# Good (Method B)
> What are the tradeoffs between storing JWTs in httpOnly cookies
> vs localStorage for this Express app? Don't write code yet.
```
This is the move that actually moves the needle. You’re forcing yourself into the conceptual-inquiry pattern that scored 65%+ in the Anthropic data.
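If you want to sanity-check the agent's answer to that JWT question, here's a minimal framework-free sketch of what the httpOnly side of the tradeoff looks like in practice. The token value is a placeholder, and the header is built by hand so the snippet runs without Express:

```javascript
// Sketch: why the httpOnly question matters. Token value is hypothetical.
// An httpOnly cookie is sent automatically by the browser but is invisible
// to page JavaScript, so an XSS payload cannot exfiltrate it. A JWT in
// localStorage is readable by any script running on the page.
const token = "eyJhbGciOiJIUzI1NiJ9.e30.sig"; // placeholder JWT

// Server side: the Set-Cookie header an Express handler would emit
const setCookie = [
  `token=${token}`,
  "HttpOnly",        // document.cookie cannot see this cookie
  "Secure",          // only transmitted over HTTPS
  "SameSite=Strict", // withheld on cross-site requests (CSRF mitigation)
].join("; ");

console.log(setCookie);
```

The point isn't to memorize the header flags – it's that after the conceptual exchange, you can read a line like `SameSite=Strict` and know why it's there.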
Step 3: Generate, then immediately ask “why this and not X?”
Now let it write the code. The instant it finishes, before you even read it, ask:
```text
> Why did you choose [specific decision] over [alternative]?
> What breaks if I change [variable/pattern]?
```
This is the “explain after generating” pattern. It takes 30 seconds. It’s the difference between code you own and code that owns you.
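Continuing the JWT example from Step 2, the filled-in versions of those templates might read (the specifics here are illustrative, not from the study):

```text
> Why did you choose a 15-minute access token expiry over a session cookie?
> What breaks if I move token verification out of the middleware?
```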
Step 4: Type the integration glue yourself
Let the agent write the heavy logic. You write the parts that connect it to your codebase – the imports, the error handling, the config wiring. Even 5-10 lines a session.
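Here's a sketch of what "integration glue" means in practice. `verifyToken` is a stand-in for the agent-generated heavy logic (hypothetical); the header parsing, try/catch, and status codes are the 5-10 lines you type yourself. It's shaped like Express middleware but uses plain objects so it runs anywhere:

```javascript
// verifyToken stands in for the agent-generated heavy logic (hypothetical).
function verifyToken(token) {
  if (!token) throw new Error("missing token");
  return { sub: "user-1" }; // a real version would verify a signature
}

// The glue you write yourself: header parsing, error handling, the wiring.
function authMiddleware(req, res, next) {
  try {
    const header = req.headers["authorization"] || "";
    const token = header.startsWith("Bearer ") ? header.slice(7) : null;
    req.user = verifyToken(token);
    next();
  } catch (err) {
    res.statusCode = 401;
    res.body = "unauthorized";
  }
}
```

Typing this boundary layer yourself means every failure mode of the generated code has to pass through code you wrote and understand.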
Pro tip: If you’re using Cursor or Copilot instead of Claude Code, simulate Explanatory mode by adding to your system prompt: “Before writing code, briefly explain the approach in 2-3 sentences. After writing code, list 2 tradeoffs of this implementation.” That’s the entire mechanism Anthropic added – it’s just a prompt.
Edge cases the discourse keeps skipping
The footnote that should be the headline. Anthropic’s study used a chat assistant – not an agentic tool. The researchers note directly that their chat-based setup differs from tools like Claude Code, where “the impacts on skill development are likely to be more pronounced.” So 17% is a floor, not a ceiling. For pure agentic workflows, expect worse.
The METR slowdown is shakier than it looks. Most articles still cite “AI makes devs 19% slower” as settled fact. METR’s February 2026 update walked the number back – developers started refusing to participate in trials without AI access, which biased the no-AI comparison group downward. The revised estimate lands between a 4% and an 18% slowdown (a “speedup” of -18% to -4%). The authors no longer stand behind the original figure.
The base rate is already large. Per METR’s observational data (as of early 2026), roughly 4% of GitHub commits are authored by Claude Code. This isn’t a fringe debate about future scenarios. The patterns you build now will compound.
So is agentic coding actually a trap?
It’s a trap if you treat it like a vending machine. It’s a tool if you treat it like a tutor that occasionally writes things for you.
The honest answer is the one nobody wants: we don’t know yet. Strong coding agents have been mainstream for under a year. The studies we have are snapshots. The patterns we have are plausible. Anyone telling you they’ve figured this out long-term is selling something.
FAQ
Is the 17% comprehension drop a real concern for short projects?
Probably not. The Anthropic study measured learning a new library – comprehension was the whole point. For a one-off script in a language you already know, full delegation is fine.
Does this work outside Claude Code?
Yes – the mechanism is just a system prompt addition. In Cursor, edit your Rules for AI to include “explain approach before coding, list tradeoffs after.” In Copilot Chat, prepend “explain first, then code” to your prompts. The 65/40 split came from behavior, not from the specific tool. One real caveat though: autonomous background agents that open PRs while you sleep make Method B nearly impossible by design – there’s no pause point to ask the conceptual question. That’s a different problem the studies haven’t caught up to yet.
What if I’m a beginner – should I avoid agents until I know the basics?
The study actually tested this. Beginners using AI for conceptual inquiry scored the same as the no-AI control group. The danger is delegation, not exposure. Skip the detox phase – just use Learning mode and ask why before what.
Next action: open your terminal, run /output-style in Claude Code, and switch to Explanatory before your next session. Five seconds, and you’ve moved from the 40% pattern to the 65% pattern by default.