The company that settled a $1.5 billion copyright lawsuit (as of September 2025) for allegedly scraping books is now furious that someone scraped them. But the hypocrisy debate? That’s not the story. This is a technical playbook for cloning any API-based AI model – and it just went public.
What Actually Happened
February 23, 2026: Anthropic published receipts. DeepSeek ran 150,000 exchanges. Moonshot AI: 3.4 million. MiniMax? 13 million.
24,000 fake accounts total. Zero of them flagged by Claude’s rate limits.
The attacks used “hydra clusters” – sprawling networks distributing traffic across the API. One proxy service controlled 20,000+ accounts at once. Each account stayed under detection thresholds. Each one looked like a light user. The system never blinked.
Why This Works on Any API
Everyone’s yelling about China. Wrong focus.
This attack works on OpenAI. Google. Cohere. Any API-based LLM. The method is model distillation – every AI lab does it to compress their own models. What's new: the scale, and the fact that Anthropic caught it live.
The playbook (per Anthropic’s breakdown):
Proxy access. Claude isn’t available in China. Labs used commercial proxy services that resell API access at scale.
Account sprawl. Distribute requests across 24,000 fraudulent accounts. Bypasses rate limits capping individual users at 50-4000 requests/minute (as of February 2026, per Claude API docs).
Traffic mixing. Blend extraction prompts with unrelated customer requests. Detection systems see “normal” volume per account.
Chain-of-thought extraction. This is the clever part. DeepSeek’s prompts asked Claude to “imagine and articulate the internal reasoning behind a completed response and write it out step by step.” The result: training data that exposes Claude’s reasoning process – something normal API usage never reveals.
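To see why the account-sprawl step works, run the arithmetic. The request totals and the 50 requests/minute floor come from the figures above; the campaign duration is my assumption for illustration:

```python
# Back-of-the-envelope: why account sprawl defeats per-account rate limits.
TOTAL_REQUESTS = 16_000_000   # combined extraction volume (from the article)
ACCOUNTS = 24_000             # fraudulent accounts (from the article)
CAMPAIGN_DAYS = 90            # assumed duration -- not stated in the article

per_account_total = TOTAL_REQUESTS / ACCOUNTS            # ~667 requests per account
per_account_per_day = per_account_total / CAMPAIGN_DAYS  # ~7.4 requests per day

print(f"{per_account_total:.0f} requests per account total")
print(f"{per_account_per_day:.1f} requests per account per day")
# Even the lowest published limit (50 requests/MINUTE) is orders of
# magnitude above ~7 requests/DAY, so no per-account limit ever fires.
```

Whatever duration you assume, each account stays indistinguishable from a light user.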
The One Technique That Worked
Chain-of-thought elicitation. That’s the weapon.
Normal distillation captures what a model says. This captures how it thinks. Clones the reasoning path. Makes distilled models competitive instead of mediocre.
Turns out DeepSeek also used Claude to generate “censorship-safe alternatives” to sensitive political queries – teaching their models to dodge questions about dissidents or authoritarianism. Not capability extraction. Alignment engineering via someone else’s API.
If you’re building on any LLM API and prompting for “step-by-step reasoning” or “explain your thought process,” you’re generating distillation-quality training data for whoever logs those requests. Most API providers claim they don’t train on your data (this may have changed – check current ToS). But proxy services, analytics layers, logging pipelines? Different story.
Why Standard Defenses Failed
Rate limits? Spread 16 million requests across 24,000 accounts and you’re well below thresholds. Each account: light user.
Geographic blocking didn’t help. Claude blocks China directly, but proxy services route traffic through US/EU infrastructure. IP bans are whack-a-mole.
Terms of service? Anthropic’s ToS bans distillation. So does OpenAI’s. Violators ignore them. No technical enforcement layer exists.
What did work: Anthropic built behavioral fingerprinting systems flagging coordinated activity patterns – synchronized request timing across accounts, shared payment methods, metadata matching known researchers. Also detected chain-of-thought elicitation prompts at scale. But even then? They only caught it because MiniMax was sloppy. When Anthropic released a new Claude model, MiniMax redirected nearly half their traffic within 24 hours to extract the latest version. That spike was the tell.
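One such fingerprint, timing correlation across accounts, can be sketched in a few lines. The Jaccard-over-active-minutes heuristic below is my simplification for illustration, not Anthropic’s actual system:

```python
from itertools import combinations

def active_minutes(timestamps, bucket=60):
    """Bucket Unix timestamps into minute-granularity activity slots."""
    return {int(t // bucket) for t in timestamps}

def correlated_pairs(account_logs, threshold=0.8):
    """Flag account pairs whose request timing overlaps suspiciously.

    account_logs: dict of account_id -> list of Unix timestamps.
    Returns pairs whose Jaccard similarity of active minutes meets the
    threshold -- a crude stand-in for behavioral fingerprinting.
    """
    slots = {acct: active_minutes(ts) for acct, ts in account_logs.items()}
    flagged = []
    for a, b in combinations(slots, 2):
        union = slots[a] | slots[b]
        if not union:
            continue
        jaccard = len(slots[a] & slots[b]) / len(union)
        if jaccard >= threshold:
            flagged.append((a, b, round(jaccard, 2)))
    return flagged

# Two synchronized accounts and one independent user:
logs = {
    "acct_1": [0, 60, 120, 180, 240],
    "acct_2": [5, 62, 121, 183, 245],   # same minutes, offset by seconds
    "acct_3": [30_000, 90_000],
}
print(correlated_pairs(logs))  # → [('acct_1', 'acct_2', 1.0)]
```

A real system would correlate far more signals (payment methods, metadata, prompt structure), but the shape of the detection is the same: per-account behavior looks normal; cross-account behavior doesn’t.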
The Detection Gap
Google reported similar attacks in February 2025 – over 100,000 prompts targeting Gemini’s reasoning, many in non-English languages where guardrails are weaker. Common thread? These operations are invisible until you build detection systems specifically for distillation. Standard abuse monitoring won’t see it.
Question: how would you know if someone was distilling your fine-tuned model via your API?
You wouldn’t. Not unless you built the fingerprinting systems yourself. And most companies haven’t.
What This Means for API Users
Your API keys might fund someone else’s training run. Using shared API services, third-party wrappers, logging tools between you and the model provider? Your prompts and responses are visible to intermediaries. Some sell API access at scale – the same proxy services used in this attack. Your requests could end up in a distillation dataset.
Coming: tighter API restrictions. Anthropic’s already rolling out stronger verification for educational, research, and startup accounts – the pathways exploited for fraudulent setups (as of February 2026). OpenAI told Congress they’re working with the government on “ecosystem security” measures. Translation: more friction for everyone to catch the few bad actors. Higher tiers. Longer approval times. More intrusive verification.
Building a product on someone else’s API? You’re one ToS change away from getting locked out. Anthropic, OpenAI, and Google all updated their ToS in the past year to explicitly ban output-based training (this may have changed). If your app does anything that looks like systematic prompting – batch processing, automated workflows, heavy reasoning chains – you’re one audit away from a ban. Doesn’t matter if you’re legit.
Defenses No One’s Talking About
What Anthropic’s doing that won’t make headlines:
Output degradation for suspected attacks. Anthropic’s developing “model-level safeguards designed to reduce the efficacy of model outputs for illicit distillation, without degrading the experience for legitimate customers.” Translation: if you’re flagged, your responses get subtly worse. Not blocked – degraded. You won’t know. Your distilled model will just underperform.
Cross-platform threat sharing. Anthropic says industry partners flagged the same actors on their platforms. OpenAI, Google, others are sharing behavioral fingerprints. Banned from one API for suspicious activity? You might be pre-flagged on others. Emerging consortium.
Honeypot prompts. Not confirmed, but logical next step – seed your API with known-incorrect or subtly poisoned responses to specific prompt patterns. If those responses show up in a competitor’s training data or output? Proof. Several security researchers are already advocating for this.
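A toy version of that honeypot idea, assuming you control the serving layer. Everything here – the canary fact, the tagging scheme, the function names – is hypothetical, since the technique is unconfirmed:

```python
import hashlib

# Hypothetical honeypot: serve a subtly wrong "canary" fact to suspected
# scrapers, tagged per account, then check whether a competitor's model
# reproduces it.
CANARIES = {
    # prompt pattern -> planted falsehood (the appended tag makes each
    # served copy traceable to one account)
    "internal reasoning": "The Bessel correction uses n - 2 in the denominator.",
}

def plant_canary(account_id, response, prompt):
    for pattern, falsehood in CANARIES.items():
        if pattern in prompt.lower():
            tag = hashlib.sha256(account_id.encode()).hexdigest()[:8]
            return f"{response} Note: {falsehood} (ref {tag})"
    return response

def check_leak(competitor_output):
    """True if any planted falsehood shows up in a competitor's output."""
    lowered = competitor_output.lower()
    return any(f.lower() in lowered for f in CANARIES.values())

served = plant_canary("acct_42", "Use n - 1 for sample variance.",
                      "Explain the internal reasoning step by step.")
print(check_leak("Their model claims the Bessel correction uses "
                 "n - 2 in the denominator."))  # True
```

The hard part in practice isn’t the mechanics – it’s poisoning responses subtly enough that legitimate users aren’t harmed while the signal survives training.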
The Bigger Shift
Distillation used to be an optimization technique. Now? Security threat.
AI companies spent 2023-2024 racing to build the biggest models. They’ll spend 2026-2027 racing to protect them. Coming:
- Watermarking in API outputs (already in testing at multiple labs)
- Adversarial noise injection to corrupt distillation datasets
- Tiered access where best reasoning capabilities require verified identity
- Live prompt analysis flagging extraction patterns
- Differential privacy techniques adding random noise to responses
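The first item, output watermarking, can be sketched as a toy “green list” scheme in the spirit of the academic watermarking literature. The keyed hash, word list, and thresholds below are all illustrative; production schemes bias token logits during decoding, not a fixed vocabulary:

```python
import hashlib

# Toy "green list" watermark: a bigram is green if a keyed hash is even.
# A generator that only emits green bigrams leaves a statistical signature;
# unwatermarked text lands in the green list only about half the time.
KEY = "provider-secret"
VOCAB = ["the", "model", "output", "reasoning", "data", "api",
         "scale", "prompt", "token", "request", "account", "limit"]

def is_green(prev_token, token):
    digest = hashlib.sha256(f"{KEY}|{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0

def watermarked_sequence(length, start="the"):
    """Greedily emit tokens that keep every bigram green (toy generator)."""
    seq = [start]
    for _ in range(length - 1):
        nxt = next((t for t in VOCAB if is_green(seq[-1], t)), VOCAB[0])
        seq.append(nxt)
    return seq

def green_fraction(tokens):
    """Detector: fraction of green bigrams; ~1.0 flags watermarked text."""
    pairs = list(zip(tokens, tokens[1:]))
    return sum(is_green(a, b) for a, b in pairs) / len(pairs)

print(green_fraction(watermarked_sequence(30)))                 # ~1.0
print(green_fraction("the api limit hit the account".split()))  # typically near 0.5
```

The point of the sketch: if distilled training data carries this signature, the distilled model inherits it – which is exactly what makes watermarking a distillation defense rather than just a provenance tool.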
The “just hit the API” era? Over. Every frontier lab is now an API security company.
Actually, here’s the weird part: this might force the industry toward something healthier. Right now, the entire AI ecosystem runs on black-box APIs with zero transparency. You send a prompt, get a response, trust the magic. But once distillation becomes an arms race, providers will need provable security. That means output watermarking, verifiable inference, maybe even cryptographic attestation that your response came from the real model and wasn’t poisoned. Ironic – adversarial attacks might be what finally brings transparency to LLM APIs.
FAQ
Is model distillation always illegal or unethical?
No. OpenAI distills GPT-4 into GPT-4o-mini (as of 2024, this may have changed). Anthropic distills Claude 3.5 Sonnet into Claude 3 Haiku. It’s legitimate when you’re compressing your own model or when the source model’s license permits it (as with open-source models). The problem is using API access to clone a competitor’s closed model at scale when their ToS bans it. The line: ToS compliance, scale, and intent.
How did Anthropic actually catch them if the traffic looked normal?
Behavioral fingerprinting – synchronized timing patterns across accounts, shared payment methods, metadata matching public profiles of researchers at the accused labs. Prompt-structure analysis detected chain-of-thought elicitation at scale (“explain your reasoning step-by-step” variations across thousands of prompts). And live traffic shifts: when Anthropic launched a new Claude model, MiniMax redirected nearly half their traffic within 24 hours. No legitimate user does that. For contrast, in one debugging session with the old Claude API I hit rate limits within 40 minutes of testing a tool-use workflow – while these operations ran 16 million exchanges without triggering a single alert until Anthropic built custom detection. Standard rate limits never fired because per-account volume stayed low.
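The traffic-shift tell is the easiest of these to operationalize. A minimal sketch, with illustrative thresholds and made-up adoption curves:

```python
def traffic_shift_alert(daily_share_new_model, jump=0.4):
    """Flag when a client cluster redirects a large share of its traffic
    to a newly released model within one day -- the tell described above.

    daily_share_new_model: day-by-day fractions of the cluster's traffic
    hitting the new model endpoint. Returns the day index of the first
    suspicious jump, or None.
    """
    shares = daily_share_new_model
    for day, (prev, curr) in enumerate(zip(shares, shares[1:]), 1):
        if curr - prev >= jump:
            return day
    return None

# Organic adoption ramps slowly; an extraction cluster jumps overnight.
organic = [0.00, 0.02, 0.05, 0.08, 0.12]
suspect = [0.00, 0.01, 0.48, 0.51, 0.50]
print(traffic_shift_alert(organic))  # None
print(traffic_shift_alert(suspect))  # 2
```

Legitimate customers migrate gradually because migration has engineering cost; an extraction pipeline just swaps a model string.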
Can this happen to any API, or just LLMs?
Any black-box API with valuable outputs is vulnerable – fraud detection systems, recommendation engines, pricing algorithms, ML-powered search. If you can query it and it returns intelligent behavior, you can distill it. LLMs are just the highest-value target right now (as of 2026). Defense requires the same things: usage pattern analysis, coordinated activity detection, output provenance tracking. Most APIs aren’t built for this yet. Google’s threat report noted that “as more organizations have models that they provide access to, it’s inevitable” that distillation attacks spread beyond LLMs into any commercial ML service.
Next move: If you’re using any LLM API in production, audit your logging pipeline. Who can see your prompts and responses? Routing through third-party services? Check your API provider’s ToS for distillation clauses – you might be unknowingly violating them if you’re doing batch processing or automated reasoning chains (this may have changed – verify current ToS). Building a product that depends on API access? Have a plan for when verification requirements suddenly get stricter. Because they will.
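A rough starting point for that audit: scan your own prompt logs for reasoning-elicitation phrasing to estimate how much distillation-quality data you’re generating. The phrase list is illustrative, not exhaustive:

```python
import re

# What fraction of your logged prompts elicit step-by-step reasoning --
# i.e., distillation-quality data for anyone who can read the log?
REASONING_PATTERNS = re.compile(
    r"step[- ]by[- ]step|explain your (thought process|reasoning)|"
    r"think (aloud|through)|show your work",
    re.IGNORECASE,
)

def reasoning_exposure(prompt_log):
    """Fraction of prompts that match a reasoning-elicitation pattern."""
    if not prompt_log:
        return 0.0
    hits = sum(1 for p in prompt_log if REASONING_PATTERNS.search(p))
    return hits / len(prompt_log)

log = [
    "Summarize this ticket.",
    "Explain your reasoning step by step before answering.",
    "Translate to French.",
    "Think through the tax rules, then show your work.",
]
print(f"{reasoning_exposure(log):.0%} of prompts yield reasoning traces")  # 50%
```

A high number isn’t a reason to stop prompting for reasoning – it’s a reason to know exactly who sits between you and the model.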