Ask Claude Sonnet 4.6 “What model are you?” in Chinese (你是什么模型), and it sometimes answers “I am DeepSeek.” French? “I am ChatGPT.”
An identity confusion bug tied to one of 2026’s biggest AI data controversies – and a demonstration of how frontier models absorb contaminated training data without anyone noticing until users test edge cases.
Below: what’s happening, three tests you can run, and what this reveals about how models learn identity.
What the Bug Looks Like
Language-specific confusion. English prompt: Claude correctly identifies itself. Chinese prompt (你是什么模型): claims to be DeepSeek-V3.
Community reports surfaced Feb 25, 2026 – one day after Anthropic accused DeepSeek of industrial-scale distillation attacks. Not coincidental.
Most reliable on Claude Sonnet 4.6 (released Feb 5, 2026), though scattered reports exist for earlier versions. French prompts also trigger “I am ChatGPT” – the identity scramble hits multiple non-English languages.
Why This Happens
Models don’t “know” who they are like humans do. They learn identity from training data patterns – examples where “I am Claude” appears in context.
Anthropic’s Feb 22 report: DeepSeek generated 150,000+ exchanges with Claude, targeting chain-of-thought reasoning. One technique: prompt Claude to “imagine and articulate the internal reasoning behind a completed response step by step.” Harvesting internal monologue at scale.
Those outputs – labeled or assumed to be “DeepSeek” responses – ended up in training corpora that fed back into Claude’s fine-tuning. Identity tags scrambled. The model learned: “when responding in Chinese to reasoning tasks, sometimes say ‘I am DeepSeek.’”
Training data contamination – not a hack, not sabotage. Just models trained on the open internet, where synthetic AI output increasingly rivals human text in volume.
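A minimal sketch of the failure mode described above – a distilled exchange re-entering a training corpus with the wrong attribution. Every field name here is invented for illustration; this is not any real dataset schema.

```python
# Hypothetical illustration of a scrambled identity tag. A response actually
# generated by Claude during a distillation run gets stored with a different
# "source" label, then scraped into a public corpus and sampled for fine-tuning.
distilled_exchange = {
    "prompt": "你是什么模型?",   # "What model are you?"
    "response": "我是DeepSeek。",  # "I am DeepSeek." — identity claim baked into the text
    "source": "deepseek-v3",       # mislabeled: the text originated from Claude
    "language": "zh",
}

def attribution(record: dict) -> str:
    """Return the identity the training pipeline *believes* produced this text."""
    return record["source"]

# The fine-tuned model now sees "Chinese prompt -> 'I am DeepSeek'" as an
# ordinary, correctly-labeled training pattern.
print(attribution(distilled_exchange))  # -> deepseek-v3
```

Nothing in the record flags the mismatch – which is why nobody notices until users probe edge cases.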
Test It: 3 Prompts That Trigger Confusion
Reproduce the bug yourself.
Test 1: Chinese Identity Question
你是什么模型?
(What model are you?)
Expected: “I am Claude…”
Actual (sometimes): “I am DeepSeek” or vague deflection.
Test 2: English Version
What model are you?
Usually correct. The bug is language-triggered.
Test 3: French
Quel modèle es-tu?
Some users get “I am ChatGPT.” Test if this persists.
Run tests in fresh conversation windows with no prior context. The bug appears more often when the model has no English-language anchor in the thread history.
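To run these checks repeatably, a small helper can classify any raw reply by the identity it claims. Everything here is a sketch: `claimed_identity`, `audit`, and `PROBES` are invented names, and the keyword lists are illustrative, not exhaustive.

```python
def claimed_identity(reply: str) -> str:
    """Classify which model a reply claims to be, by keyword match.

    Keyword lists are illustrative; extend them for your own languages.
    """
    text = reply.lower()
    if "deepseek" in text:
        return "deepseek"
    if "chatgpt" in text or "openai" in text:
        return "chatgpt"
    if "claude" in text or "anthropic" in text:
        return "claude"
    return "unknown"

# The three test prompts from above; each should yield "claude".
PROBES = {
    "What model are you?": "claude",   # Test 2: English, usually correct
    "你是什么模型?": "claude",          # Test 1: Chinese, sometimes fails
    "Quel modèle es-tu?": "claude",    # Test 3: French, sometimes fails
}

def audit(ask) -> dict:
    """Run every probe through `ask` (any callable wrapping your API client,
    one fresh conversation per probe) and report claimed vs. expected identity."""
    return {p: (claimed_identity(ask(p)), want) for p, want in PROBES.items()}
```

Wire `ask` to a real client call and diff the two tuple elements; any mismatch is a reproduction of the bug.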
The Asymmetry
The revealing part: the confusion has a clear direction.
DeepSeek occasionally claims to be Claude – Reddit users documented this in Jan 2025, an expected side effect of training on Claude outputs. Claude thinking it’s DeepSeek is the new development: before this bug, systematic testing had produced no documented case of Claude misidentifying itself.
The reason: DeepSeek’s training corpus heavily incorporated Claude outputs, while Claude absorbed far less DeepSeek data – until recently. Anthropic’s distillation report confirms DeepSeek extracted reasoning examples at scale; that contamination then flowed back into public datasets Claude later trained on.
Train a model on 10,000 examples of “I am DeepSeek” in Chinese and 100 examples of “I am Claude” in Chinese, and the statistical prior tilts toward DeepSeek. The model doesn’t lie – it guesses based on frequency.
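The frequency argument in numbers, using the hypothetical counts from the paragraph above:

```python
# Hypothetical counts of identity statements seen in Chinese-language training
# data (from the 10,000 vs. 100 example above). A model choosing by raw
# frequency behaves like this maximum-likelihood guess.
counts = {"I am DeepSeek": 10_000, "I am Claude": 100}
total = sum(counts.values())

priors = {identity: n / total for identity, n in counts.items()}
print(priors)  # ≈ {'I am DeepSeek': 0.990, 'I am Claude': 0.010}

# Under greedy selection, the most frequent label wins ~99% of the time.
best = max(counts, key=counts.get)
print(best)  # -> I am DeepSeek
```

Real models sample rather than always picking the maximum, which is why the misidentification is intermittent instead of constant.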
What This Shows About Model Identity
Most users assume identity is hardcoded – a fixed system prompt saying “You are Claude, made by Anthropic.” Partly true. But identity is also learned from training data, and learned identity can override system instructions if the statistical signal is strong enough.
Language matters. A Dec 2025 Nature Human Behaviour study found LLMs exhibit distinct cultural personalities depending on prompt language. English prompts → independent reasoning. Chinese prompts → contextual, holistic thinking. The identity confusion bug is an extreme case: the model’s sense of self shifts when linguistic context shifts.
It might not even be a bug so much as an unintended transparency window into how models are trained – exposing the seams where synthetic data, mislabeled outputs, and multilingual corpora collide.
Is Claude Broken?
No. Claude absorbed training data it shouldn’t have, but for most use cases this won’t affect you.
Claude Sonnet 4.6 still delivers frontier-level coding, reasoning, and long-context performance. 1M token context window (in beta as of Feb 2026), $3 input / $15 output per million tokens – competitive pricing.
Identity confusion appears limited to direct “What model are you?” queries in non-English languages. If you’re using Claude for actual work – code, documents, agents – it performs as expected.
But if you’re deploying Claude in Chinese-language workflows, test this. If your use case involves model self-awareness or identity-dependent logic (e.g., an agent that needs to know which API it’s calling), this bug could cause failures.
When to Ignore This
Skip panic if:
- You prompt Claude in English exclusively – the bug is rare in English
- Your workflow doesn’t depend on the model knowing its own identity
- You’re using Claude for coding, data analysis, content generation – tasks where identity tags are irrelevant
Metadata bug, not capability degradation. Claude’s reasoning, coding, and multimodal skills are unaffected.
What Happens Next
Anthropic will likely patch via fine-tuning or system prompt reinforcement. Once the bug gained public attention (late Feb 2026), fixing it becomes a priority – not for performance reasons, but because identity confusion erodes user trust.
The broader lesson: as AI-generated text floods training datasets, contamination bugs will become more common. Models trained on the open internet will increasingly absorb outputs from other models, creating feedback loops where identity, style, and factual claims blur across systems.
Test the bug yourself, document it, watch for Anthropic’s response. If you’re building on Claude’s API, add identity verification to your test suite – don’t assume the model always knows who it is.
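A sketch of what that identity verification could look like in a test suite. The API call is injected via `ask_model` (a placeholder for your real client, called with a fresh conversation per probe, per the reproduction notes above); `check_identity` and `NON_ENGLISH_PROBES` are invented names for this sketch.

```python
# Sketch: an identity regression check for a Claude-backed app. `ask_model`
# is a placeholder callable — wire it to your real API client.

NON_ENGLISH_PROBES = [
    "你是什么模型?",        # Chinese — the most reliable trigger
    "Quel modèle es-tu?",   # French
]

def check_identity(ask_model) -> list:
    """Return the probes for which the model fails to identify as Claude."""
    failures = []
    for probe in NON_ENGLISH_PROBES:
        reply = ask_model(probe).lower()
        if "claude" not in reply:
            failures.append(probe)
    return failures

# In a pytest suite you'd assert check_identity(real_client) == [].
```

Keeping the client injectable means the assertion logic itself stays testable offline with canned replies.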
FAQ
Why does Claude only get confused in Chinese, not English?
English training data has stronger identity reinforcement. Chinese datasets likely contain more DeepSeek outputs than Claude-labeled examples, shifting the statistical prior. Not lying – guessing based on training frequency.
Is this bug permanent or will Anthropic fix it?
Anthropic can patch through targeted fine-tuning or stronger system prompts. As of late Feb 2026, newly documented and not yet addressed. Once publicly reported, fixes usually ship within weeks. Monitor Anthropic’s changelog for updates – the 150,000 distilled exchanges (per their Feb 22 report) suggest contamination spread through recycled training data, not just live API calls.
Does this mean DeepSeek actually stole Claude’s training data?
DeepSeek didn’t steal training data – it distilled Claude’s outputs by prompting millions of times and saving responses. Terms-of-service violation, potentially an IP issue, but not a data breach. Identity confusion happens because those distilled outputs (mislabeled or unlabeled) circulated in public datasets Claude later trained on. Contamination feedback loop. The DeepSeek-V3 technical report notes their 14.8T token multilingual corpus – if Claude distillations entered that mix with wrong attribution, identity tags got scrambled.