
GPT-5 Just Dropped – And Everyone’s Making the Same Mistake

GPT-5 launched to 700M users with a hidden router system nobody expected. Most tutorials miss the #1 issue: the model you think you're using isn't always the one answering.

8 min read · Beginner

On launch day, most people weren’t actually talking to the model OpenAI spent months hyping.

They got routed to a weaker backup. Without knowing it.

August 8, 2025. Reddit and X exploded. Same complaint everywhere: “GPT-5 feels dumber than GPT-4o.” Benchmarks said PhD-level intelligence. Reality? Felt like a downgrade.

Turns out they were right. Hours later, Sam Altman confirmed it on X: the auto-router broke. The system deciding which GPT-5 model answers your question defaulted everyone to a cheaper, faster variant – no warning. For a chunk of launch day, GPT-5 was dumber because the thing controlling it malfunctioned.

That’s the #1 mistake: assuming it’s one model. It’s not. Network of models. Router playing traffic cop. When that router screws up (or just makes bad calls), your results tank.

What GPT-5 Actually Is

GPT-5 launched August 7, 2025 to all ChatGPT users – Free, Plus, Pro. Staggered rollout. OpenAI expected 700 million weekly active users that week. Positioned as a unified system that would kill the confusing model picker (the dropdown Altman publicly hated).

The GPT-5 System Card reveals: not a single model. It’s a real-time routing system:

  • gpt-5-main – Fast, high-throughput, simple queries
  • gpt-5-thinking – Deeper reasoning, complex problems
  • mini variants – Backup when you hit usage limits or servers are overloaded

Router decides which one answers. Based on conversation complexity, tool requirements, keywords in your prompt (like “think step by step”). Trained on signals: when users manually switch models, preference rates, correctness measurements.
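The routing idea is easy to sketch. The toy router below is purely illustrative – OpenAI's actual logic is proprietary – but the signals (keywords, length, tool needs) are the ones described above; the thresholds are made up:

```python
# Toy router sketch – purely illustrative; OpenAI's real logic is proprietary.
REASONING_HINTS = ("think step by step", "explain your reasoning",
                   "deep analysis", "show all your work")

def route(prompt: str, needs_tools: bool = False) -> str:
    """Pick a variant from rough complexity signals (thresholds made up)."""
    text = prompt.lower()
    if needs_tools or any(hint in text for hint in REASONING_HINTS):
        return "gpt-5-thinking"
    if len(text.split()) > 150:  # long prompts lean complex
        return "gpt-5-thinking"
    return "gpt-5-main"

print(route("What's the capital of France?"))                # gpt-5-main
print(route("Think step by step: why does this segfault?"))  # gpt-5-thinking
```

Even this toy version shows the failure mode: if the keyword check or length threshold misfires, a hard question silently gets the cheap model.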

Sounds smart. Except when it isn’t.

The Router Broke

August 8 (day after launch): auto-switcher malfunctioned. Users asked complex questions, got shallow half-baked responses. Some thought the model got secretly nerfed. Others assumed OpenAI was cutting costs, throttling output.

Sam Altman, Reddit AMA: “Yesterday, the autoswitcher broke and was out of commission for a chunk of the day, and the result was GPT-5 seemed way dumber.”

Router defaulted everyone to the lightweight model. PhD question → intern answer. The problem? No indicator showing which model was answering. Users had zero visibility.

Not a one-time glitch. Router still misroutes. Servers under load? Free and Plus users get silently downgraded to mini models. Lower-tier plan during peak hours? You might be talking to a backup without warning.

Think of it like calling customer service. You dial the expert line, get transferred to tier-1 support, and nobody tells you. That’s GPT-5’s router when it fails.

Pro tip: GPT-5 suddenly giving weaker answers mid-conversation? Not you – router switching models behind the scenes. Start a new chat or add “think carefully” / “use deep reasoning” to nudge it toward Thinking model.

How to Actually Use GPT-5

Auto mode works fine for casual stuff – weather, definitions, quick summaries. Anything requiring accuracy or depth? Take control.

Option 1: Force Thinking Mode (ChatGPT Interface)

After the backlash, OpenAI added manual controls. ChatGPT interface model picker:

  • Auto – Router decides (unreliable)
  • Fast – Forces gpt-5-main (quick, shallow)
  • Thinking – Forces gpt-5-thinking (slower, deeper)

Plus or Pro user? Select Thinking for coding, technical analysis, any task where quality beats speed. Free users don’t get manual selection – stuck with Auto. More frustrating.

Option 2: Use Explicit Prompts

Router scans your prompt for intent signals. These phrases increase odds it routes to reasoning model:

  • “Think step by step”
  • “Explain your reasoning”
  • “Use deep analysis”
  • “Show all your work”

Not guaranteed – router can override you – but shifts probability. Community testing (DataCamp, Data Science Dojo docs) shows these prompts correlate with higher reasoning engagement.
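If you use these phrases a lot, a tiny wrapper keeps them consistent. The `nudge` helper below is just a convenience I'm sketching here – not an OpenAI API – and the phrasing is one of the signal phrases listed above:

```python
# Hypothetical prompt wrapper – these phrases are nudges, not guarantees.
NUDGES = {
    "deep": "Think step by step and explain your reasoning.\n\n",
    "fast": "",  # no nudge: let the router take the cheap path
}

def nudge(prompt: str, mode: str = "deep") -> str:
    """Prepend an intent signal that raises the odds of the Thinking model."""
    return NUDGES[mode] + prompt

print(nudge("Why does my binary search loop forever?"))
```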

Option 3: API – Skip the Router

Developer? API doesn’t use the router. Call a specific model directly:

from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="gpt-5",  # or "gpt-5-mini", "gpt-5-nano"
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Analyze this code for security vulnerabilities: [paste code]"},
    ],
    reasoning_effort="high",  # minimal, low, medium, high
)

print(response.choices[0].message.content)

Full control. No router, no surprises. You specify reasoning_effort: minimal (barely any reasoning), low, medium, or high. Higher effort = slower + more expensive, but consistent quality.

Pricing: $1.25 per million input tokens, $10 per million output tokens (reasoning tokens count as output, as of August 2025). For most users, ChatGPT Plus ($20/month) is cheaper unless you’re processing massive volume.
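Because reasoning tokens bill as output, deep reasoning can dominate the bill even when the visible answer is short. A quick back-of-envelope, using the output price quoted above and hypothetical token counts:

```python
# Reasoning tokens bill as output tokens, so deep reasoning dominates cost.
PRICE_OUT_PER_M = 10.00  # $ per 1M output tokens

def output_cost(visible_tokens: int, reasoning_tokens: int) -> float:
    """Output-side cost in dollars; hidden reasoning counts too."""
    return (visible_tokens + reasoning_tokens) * PRICE_OUT_PER_M / 1_000_000

# Hypothetical: a 1K-token answer that burned 8K hidden reasoning tokens
print(f"${output_cost(1_000, 8_000):.2f}")  # $0.09 – 8/9 of it is reasoning
```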

The Hidden Cost Problem

OpenAI marketed GPT-5 as more efficient. Technical quirk makes it less efficient: every router model switch resets the static prompt.

Every ChatGPT message has two parts:

  1. Your prompt (what you type)
  2. Static prompt (hidden system instructions like “You are ChatGPT, a helpful assistant”)

GPT-4o cached the static prompt in long conversations – didn’t need resending. GPT-5’s router? Every model switch forces a fresh static prompt reload. Ask a question → router picks gpt-5-main. Follow-up triggers gpt-5-thinking → router switches, sends entire static prompt again.

Eats tokens. Technical analysis from Where’s Your Ed At found GPT-5 queries consume more tokens than GPT-4o because of the router. System supposed to save compute quietly burns more.

Using the API on long multi-turn conversations? This destroys budgets. One debugging session with 12 back-and-forths where the router switches models 4 times = 4 extra static prompt reloads you’re paying for.
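The overhead is easy to estimate. OpenAI doesn't publish the static prompt's size, so the 2,000-token figure below is an assumption for illustration:

```python
# Assumed static prompt size – OpenAI doesn't publish the real number.
STATIC_PROMPT_TOKENS = 2_000

def reload_overhead(model_switches: int) -> int:
    """Extra input tokens paid when each switch re-sends the static prompt."""
    return model_switches * STATIC_PROMPT_TOKENS

# The debugging session above: 4 router switches
print(reload_overhead(4))  # 8000 extra input tokens billed
```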

When GPT-5 Works Well

Despite the chaos, GPT-5 is better than GPT-4 at several things – when routed to the right model.

Coding: 74.9% on SWE-bench Verified (real-world GitHub issues, as of August 2025 launch). Highest of any model at launch. Good at debugging, refactoring, generating working code from vague descriptions. Use Thinking mode or set reasoning_effort="high" in API.

Math: 94.6% on AIME 2025 (advanced high school math competition). For STEM homework or technical calculations, genuinely helpful – only if you force reasoning mode.

Long documents: 400K token context window (roughly 300,000 words – entire book). Upload contract, codebase, research paper, ask detailed questions. But: OpenAI’s own evals show accuracy drops to 89% between 128K-256K tokens. “Lost in the middle” problem not solved.
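Before uploading something huge, a rough word-to-token estimate (the common ~0.75 words-per-token rule of thumb; exact counts need a real tokenizer) tells you whether the document fits – and whether it lands in the degraded range:

```python
# Rule of thumb: ~0.75 words per token. Exact counts need a real tokenizer.
CONTEXT_WINDOW = 400_000
DEGRADED_RANGE_START = 128_000  # OpenAI evals: accuracy drops past here

def estimate_tokens(word_count: int) -> int:
    return int(word_count / 0.75)

tokens = estimate_tokens(250_000)  # a very long manuscript
print(tokens <= CONTEXT_WINDOW)        # fits in the window
print(tokens > DEGRADED_RANGE_START)   # but sits in the degraded range
```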

| Task | Best Model Setting | Why |
|------|--------------------|-----|
| Quick facts, definitions | Auto or Fast | Router works for simple queries |
| Coding, debugging | Thinking (manual) or API high reasoning | Needs step-by-step logic |
| Writing, creativity | Fast mode | Reasoning hurts creativity; GPT-5 is bland, faster = slightly better |
| Math, STEM problems | Thinking (manual) or API high reasoning | Requires extended reasoning chains |
| Long document analysis | Upload files, use Thinking mode | Context huge but accuracy degrades |

GPT-5.2 (And Why People Hate It)

December 2025: OpenAI released GPT-5.2. Positioned as “most advanced model for professional work.” 400K context, 128K max output. Trained for coding, spreadsheets, presentations, tool use.

Sparked new complaints. Power users: GPT-5.2 ignores custom instructions, produces blander output than 5.1, hallucinates more on real documents – despite benchmark improvements.

OpenAI’s own system card admits certain modes show regressions. GPT-5.2 Instant (default for most users) performs worse than GPT-5.1 in some scenarios. Optimized for benchmarks. Real-world prompts don’t follow benchmark structure.

Evidence suggests GPT-5.2 was an early checkpoint release – partially trained version shipped to hit a deadline. OpenAI confirmed “substantial improvements” coming Q1 2026. Even they know current version underperforms.

Frustrated with GPT-5.2? Switch to Thinking mode explicitly, or roll back to GPT-5.1 if your API/enterprise plan still has access.

The Real Lesson

GPT-5 is powerful when it works. Router adds unpredictability that breaks the mental model: “I ask, AI answers.”

Now: “I ask, hidden router decides which version answers, zero visibility unless I’m a Pro user who manually overrides.”

Trust problem. When the router fails – or silently downgrades you – it erodes confidence. You stop trusting the output because you don’t know which model produced it.

The fix? Be explicit. Force Thinking for anything important. Use API if you need control. GPT-5 suddenly feels dumber mid-conversation? Assume router switched – start new chat, try again.

New reality of frontier AI. Model picker is back, whether OpenAI wanted it or not. You just have to know it exists.

FAQ

Is GPT-5 really better than GPT-4?

On benchmarks? Yes – math, coding, multimodal. In practice: depends which variant the router gives you. Routed to gpt-5-main during high load? Might feel worse than GPT-4o. Force Thinking for important tasks.

Why does GPT-5 feel less creative than GPT-4o?

OpenAI trained it to be direct and concise. “Flat” or “bland” – less personality, fewer emojis, more corporate. Intentional (also restricted heavily after mental health lawsuits). Many users prefer GPT-4o’s warmer tone. Creative work? Try Fast mode instead of Thinking – reasoning actually hurts creativity.

How do I know which model is answering my question?

You don’t. ChatGPT doesn’t show which variant the router selected unless it routes to a different model for sensitive topics (you’ll see “Used GPT-5” under reply). Only way to guarantee specific model: manually select Fast or Thinking, or use API with explicit model names. Lack of transparency = biggest user complaint.