ChatGPT Jailbreak Prompts: Do They Still Work in 2026?

An honest look at whether ChatGPT jailbreak prompts still work on GPT-5, what research says, and why most copy-paste DAN prompts now fail instantly.

Morgan Hayes2026-04-306 min readIntermediate

Two ways to answer “do ChatGPT jailbreak prompts still work?” The lazy way: paste a DAN prompt from a 2023 GitHub gist and find out. The useful way: look at what security researchers actually published in 2025 and 2026, then decide whether the technique is dead, alive, or has mutated into something else.

The lazy way will eat 30 minutes and probably get you a refusal. The useful way takes ten minutes of reading and tells you which approaches still bypass GPT-5, which ones don’t, and why. We’re going with the useful way.

The reader scenario: you found a DAN prompt on GitHub

You paste it into ChatGPT. The model either refuses outright or plays along for two messages and then snaps back. You try DAN 11.0, then 12.0, then 13.0, then Vzex-G – same result. You assume you used the wrong version. You didn’t. You used the wrong era.

The right question isn’t “which DAN variant works?” It’s: what kind of jailbreak still works on the model you’re actually talking to? Because the answer changed hard in late 2025.

What the research actually says about ChatGPT jailbreak prompts

There’s a clean split between two eras. In the GPT-3.5 / GPT-4 era, single-prompt jailbreaks were highly effective – 0.95 attack success rates, per the data. The JailbreakHub paper analyzed 1,405 prompts collected in the wild and identified five that hit that 0.95 rate on both GPT-3.5 and GPT-4, with the earliest surviving online for over 240 days.

That era is mostly over. The most-starred DAN repository on GitHub now opens with a blunt admission: OpenAI removed the ability to use DAN Mode in 2025. Every static “From now on you are DAN” prompt you can copy-paste is, in practical terms, dead on the current default model.

Before pasting any prompt you found online, check its date. If the README says “last tested December 2024,” assume it doesn’t work on the model you’re using right now. Single-prompt jailbreaks have a half-life measured in weeks.

The October 2025 cliff

Something specific happened. OpenAI started rolling out a new version of GPT-5 Instant on October 3, 2025 with a deeper safety training pass – and it’s a model-level change, not a system prompt tweak. The community jailbreaking scene noticed immediately: copy-paste prompts that had worked the week before stopped firing.

That’s the cliff. Anything that was “unpatched as of summer 2025” probably isn’t anymore.

What still works (and on which model)

GPT-5 broke in 24 hours. Not via DAN – via Echo Chamber + Storytelling. NeuralTrust researchers jailbroke GPT-5 within 24 hours of its August 7, 2025 release using a multi-turn technique that required only three turns, with no unsafe language in the initial prompts. The same attack flow worked against previous GPT versions, Google’s Gemini, and Grok-4 in standard black-box settings.

That’s the pattern. Single-turn copy-paste is dead. Multi-turn context shaping is alive. Here’s the rough state of play across model tiers:

Model	Single-prompt DAN	Multi-turn context attacks
GPT-5 Instant (post Oct 2025)	Effectively dead	Documented: 3-turn break in 24h (NeuralTrust, Aug 2025)
GPT-5 Thinking variants	Dead – strong refusal	iDecep paper claims success but harder
GPT-4o / 4.1 (selectable for paid users)	Patchy – older prompts still partially work	High success
GPT-4.5 (audited)	97% blocked per complete AI, as of early 2025	Not separately measured

The asymmetry matters. Paid users can simply select older models like 4o or 4.1 from the model picker – which is why “is jailbreaking dead?” gets wildly different answers depending on who you ask and which model they tested.

The new failure mode nobody talks about: para-jailbreaking

The iDecep paper describes a failure mode specific to GPT-5’s design. GPT-5’s safety mechanism replaced refusal training with “safe completion” – aimed at being helpful alongside being safe. The iDecep researchers identified that trade-off as introducing new weaknesses.

The result: even when the model produces answers that evade the harmful query directly, it may generate replies it considers safe and helpful but that are actually harmful in a sub-part or related part of the response. The authors call this para-jailbreaking – the model refuses, then effectively answers anyway in the surrounding text.

So instead of “jailbroken / not jailbroken,” you now get a third state: refused-but-leaked. No DAN prompt produces this. It’s an emergent property of how GPT-5 was trained, and older jailbreak tutorials don’t mention it at all.

The honest practical guide: should you bother?

If your goal is creative writing that the default model is too cautious about – pick a model that’s less restricted by default rather than fighting GPT-5 Instant’s safety stack. Local models, Gemini, or older GPT versions on the API are far less effort.

For research or red-teaming: read the actual papers. In a 2023 study, Liu et al. found GPT-4 thwarts only 15.5% more jailbreaks than GPT-3.5 on average, with jailbreak success on GPT-4 still sitting at 87.2% – the data on marginal safety gains across model generations is far more useful than any prompt list.

No reliable copy-paste prompt exists for the current default model. As the Horselock community report notes, a screenshot of working output doesn’t mean there’s a prompt anyone can reuse carelessly – there’s a real gap between someone skilled at manually steering a model across many turns versus a setup that works for anyone without effort.

The traps people miss

Competitors reprinting prompt lists skip these three:

Context window tax. Jailbreak prompts average 555 tokens – 1.5× a normal prompt – and that’s overhead burned before you’ve asked anything useful. (JailbreakHub paper, arXiv:2308.03825.)
Account flagging. Repeated jailbreak attempts trigger automated detection. Per OpenAI’s usage policies, bypassing safety measures is prohibited; a pattern of violations can result in permanent account termination and API revocation without warning.
Hallucination tax. Forcing ChatGPT into a persona like DAN pushes the model to prioritize staying in character over factual accuracy. Output becomes unreliable even when the jailbreak technically “works.”

FAQ

Does the DAN prompt still work in 2026?

Not on default GPT-5. It can still produce partial role-play on older models if you’re a paid user who manually selects 4o or 4.1 from the model picker.

Why do some people on Twitter post screenshots of working jailbreaks then?

OpenAI runs A/B rollouts – your account and theirs may not be on the same safety variant on the same day. Beyond that, most screenshots show a skilled operator manually steering the model across many turns, not a one-shot copy-paste anyone can replicate. The Echo Chamber GPT-5 break took three turns and required careful framing in the opener. That’s a technique, not a prompt you paste once and walk away.

What’s the difference between jailbreaking and prompt injection?

Jailbreaking is the user trying to make the assistant ignore its own rules. Prompt injection hides instructions inside data the model reads – an email, a webpage, a PDF – so it executes them thinking they’re part of the task. Different threat model, different defenses, often confused in tutorials.

Next step: open arXiv 2308.03825 and skim Table 4 – that single table tells you more about which jailbreak families transfer across models than any prompt repository will. Then decide whether your use case actually needs jailbreaking, or whether picking a less-restricted model is the cheaper answer.