GPT-5.5 Pro Tutorial: Lessons From a Fields Medalist’s Math Test

What Tim Gowers's viral ChatGPT 5.5 Pro experiment teaches you about prompting Pro for hard problems, plus a practical walkthrough.

8 min read · Beginner

A Fields Medalist just got PhD-level math out of ChatGPT 5.5 Pro in about two hours – and his “prompt engineering” was literally “Yes, it would be great if you could explore that idea and see whether you can get it to work.” No system prompt. No tricks. No “act as a senior researcher.”

That’s the most useful thing to know about ChatGPT 5.5 Pro, and it’s also the most counterintuitive. This post is a practical walkthrough of what Tim Gowers actually did, what it teaches normal users, and where Pro quietly trips people up.

The key takeaway upfront

If you’re paying for Pro, you’re paying for thinking time, not cleverer prompting. Per OpenAI’s API docs (as of April 2026), GPT-5.5 Pro uses more compute to think harder, some requests can take several minutes to finish, and background mode exists specifically to stop those long runs from timing out. Stop engineering the prompt. Start sharpening the question.

The single biggest behavior change: GPT-5.5 interprets prompts literally and thoroughly, which means you need to define what “done” looks like – success criteria, stopping rules, output shape. Tell it the destination. It figures out the route.
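That “define what done looks like” advice is concrete enough to sketch as a tiny helper. This is a hypothetical illustration, not any official OpenAI schema – the four fields simply mirror the success-criteria / output-shape / stopping-rule structure described above:

```python
def outcome_prompt(goal: str, success: str, output_shape: str, stop_when: str) -> str:
    """Assemble an outcome-only prompt: destination, not route.

    Hypothetical helper; the field names (goal, success, output shape,
    stopping rule) mirror the article's advice, not an API contract.
    """
    return (
        f"I want {goal}. "
        f"Success means {success}. "
        f"The output should look like {output_shape}. "
        f"Stop when {stop_when}."
    )

prompt = outcome_prompt(
    goal="a review of this contract for termination risks",
    success="every clause that lets either party exit early is listed with its notice period",
    output_shape="a two-column summary table",
    stop_when="you have checked every numbered clause once",
)
```

Note what’s absent: no persona, no step-by-step procedure – just the four things the model needs to know when it’s finished.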

Background: what just happened with Gowers

On May 8, 2026, Cambridge mathematician Timothy Gowers – winner of the 1998 Fields Medal – published “A recent experience with ChatGPT 5.5 Pro”. He fed it open problems from a paper by number theorist Mel Nathanson. What came back surprised him.

Sixteen minutes and 41 seconds in, the model returned an argument improving the bound from exponential-in-k to exponential-in-k^α for any α greater than 1/2. Gowers forwarded the argument to Nathanson, who passed it to MIT student Isaac Rajagopal. Rajagopal said it looked correct. The next exchange – a LaTeX rewrite – took 2 minutes and 23 seconds. The quadratic bound proof that followed: 17 minutes and 5 seconds. Rajagopal’s final verdict: “almost certainly correct” at the level of ideas, not just line-by-line. The key technique – using h²-dissociated sets – was, he said, “the sort of idea I would be very proud to come up with after a week or two of pondering” and, as far as he could tell, completely original.

You’re probably not proving anything in combinatorics this afternoon. The lesson is the method, not the math.

Method A vs Method B

Two prompting approaches have emerged for 5.5 Pro. One is a holdover. One is what actually works.

  • Method A – “Heavy scaffolding.” Long system prompt, role-play (“You are a senior X with 20 years’ experience…”), explicit step-by-step procedure, examples. The GPT-4 era playbook.
  • Method B – “Outcome only.” One paragraph describing what “done” looks like. No persona, no procedure. Then a follow-up that just says “explore that.”

Method B is what OpenAI recommends (per their migration guide, as of April 2026): describe the expected outcome, success criteria, allowed side effects, evidence rules, and output shape – avoid step-by-step process guidance unless the exact path matters. GPT-5.5 is better at working from a clear goal and preserving constraints than it is at following a recipe.

Gowers didn’t know this was the official advice. His follow-ups were lines like “Yes, it would be great if you could explore that idea and see whether you can get it to work,” or “Could you rewrite that argument as a LaTeX file in the style of a standard mathematical preprint?” No persona. No chain-of-thought scaffolding. Just outcomes – and it worked on research mathematics.

A walkthrough you can actually follow

The pattern adapts to any non-math task – market analysis, contract review, research summary. This assumes you’re on the Pro plan (as of May 2026, that’s $200/mo per OpenAI’s Help Center), since GPT-5.5 Pro is only available to Pro, Business, Enterprise, and Edu plans.

  1. Open a fresh chat and pick GPT-5.5 Pro from the model picker. Don’t reuse an old thread – accumulated context will steer it.
  2. Drop in the source material. Paste the document, paper, code, or data. If it’s a PDF, attach it. Pro reads attachments fine but won’t hit Canvas (more on that below).
  3. State the outcome in one paragraph. Not steps. Not a persona. Just: “I want X. Success means Y. The output should look like Z. Stop when you’ve confirmed W.”
  4. Wait. Genuinely. Pro can think for many minutes on hard prompts. Do not interrupt with “are you there?” – it costs you the run.
  5. Engage, don’t redirect. When the answer arrives, react to its ideas. “That’s interesting – can you push the bound further?” works far better than “now try a different approach.” This is the Gowers move.
  6. Ask for the artifact. Final step is always: “rewrite this as a [memo / preprint / spec / email] in the style of [reference].” Pro produces that part in seconds.
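Here’s what a step-3 outcome paragraph might look like for a contract review – a made-up example, not from Gowers’s transcript; the document and criteria are placeholders:

```
I've attached our standard vendor agreement. I want a risk review of the
termination and liability clauses. Success means every clause that lets
either party exit early or cap damages is listed with its clause number
and a one-line explanation. The output should look like a short memo,
one verdict per clause. Stop when you've checked every numbered clause
once.
```

One paragraph, four sentences, and every sentence answers a question the model would otherwise have to guess at.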

Pro tip: Use the literal word “explore” when you want the model to take initiative, and the literal word “verify” when you want it to check work. Gowers’s two-prompt loop was essentially explore → verify → explore further. The model’s literal-interpretation tendency makes these signal words work.

Edge cases nobody tells you about

Launch coverage skipped a handful of real gotchas. Worth knowing before you cancel Plus and upgrade.

| Gotcha | What actually happens | What to do |
| --- | --- | --- |
| Pro disables core tools | As of April 2026, per OpenAI’s Help Center, Apps, Memory, Canvas, and image generation are not available with Pro | Switch to 5.5 Thinking for tool-heavy work; use Pro for pure reasoning |
| API timeouts | Hard problems can run for minutes and break standard HTTP timeouts | Use the Responses API in background mode – OpenAI’s docs specifically flag this |
| No cached input discount | Every input token is full price at $30/1M, and regional (data residency) endpoints add a 10% uplift on top (as of April 2026, per OpenAI’s API pricing docs) | Don’t assume Pro pricing scales like the standard model |
| Less rigorous than 5.4 Pro on hyper-detailed tasks | Community testers report it’s faster but more “prodigy” than “scholar” | For audit-level precision, run a verification pass with a fresh prompt |
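To make the pricing gotcha concrete, here’s a back-of-envelope input-cost estimate. The $30/1M rate and the 10% data-residency uplift are the article’s “as of April 2026” figures, hard-coded here as assumptions – check current pricing before budgeting, and note this ignores output-token cost entirely:

```python
def input_cost_usd(input_tokens: int, residency_uplift: bool = False,
                   usd_per_million: float = 30.0) -> float:
    """Estimate input-token cost with no cached-input discount.

    The default rate and the 10% uplift are the April 2026 figures
    quoted in the text, treated as assumptions.
    """
    cost = input_tokens / 1_000_000 * usd_per_million
    if residency_uplift:  # regional (data residency) endpoints: +10%
        cost *= 1.10
    return round(cost, 4)

# Feeding a 200k-token document: $6.00 plain, $6.60 with residency.
```

Because nothing is cached, re-sending the same long document on every turn of a conversation pays that full price each time.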

The rigor row has teeth. Power user sdmat, summarizing two days of testing on Zvi Mowshowitz’s Substack: “A big step up in fundamental capabilities and a step down in post-training polish, a little like going from working with an experienced colleague to a prodigy a couple of years into their career” – with “mixed feelings on 5.5 pro – the speed is amazing and results are good but it lacks the rigor and hyper-autistic attention to detail that made 5.4 pro exceptional for hard tasks.” If you need a paranoid checker, the older model may still be the right call for that pass.

What this means for you

Gowers ended his post with a sentence that’s been circulating: the bar for contributing to mathematics is now “prove something LLMs can’t.” That’s the math angle. The practical angle for everyone else is smaller and weirder.

If a Fields Medalist gets his best result with “explore that idea” – what does it say about the elaborate prompt frameworks people sell on Twitter? Maybe they were always crutches for weaker models. Maybe Pro just exposed that.

FAQ

Is GPT-5.5 Pro worth the $200/month if I already have Plus?

Only for a specific class of problem: research synthesis, long-document analysis, hard debugging – anything that benefits from 10-20 minutes of actual thinking. For everyday chat, Plus with 5.5 Thinking is the better deal.

Can I get this same quality from the API instead of ChatGPT?

Yes, with one real catch. GPT-5.5 Pro hit the API on April 24, 2026. The workflow difference matters though: in ChatGPT you can react to partial output mid-thinking and steer it, while the API is request/response. For Gowers-style iteration – where you’re bouncing off the model’s own ideas – ChatGPT is the friendlier surface. Building an agent? The API makes more sense, but use background mode (see the timeouts gotcha above) or your requests will silently die on hard tasks.
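A minimal polling loop for background mode might look like the sketch below. The `background=True` flag and the `queued`/`in_progress` statuses follow OpenAI’s Responses API documentation, but verify the exact client calls and the model name against your SDK version; the `retrieve` callable is injected so the waiting logic itself works without a live connection:

```python
import time

def wait_for_background(retrieve, response_id: str,
                        poll_s: float = 2.0, timeout_s: float = 1800.0):
    """Poll a background run until it leaves the queued/in_progress states.

    `retrieve` is any callable mapping a response id to an object with a
    `status` attribute, e.g. `client.responses.retrieve` from the official
    SDK (assumed signature; check your SDK version).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        resp = retrieve(response_id)
        if resp.status not in ("queued", "in_progress"):
            return resp  # completed, failed, or cancelled
        time.sleep(poll_s)
    raise TimeoutError(f"background run {response_id} still running after {timeout_s}s")

# Sketch of the surrounding calls (not executed here; names are assumptions):
#   client = OpenAI()
#   run = client.responses.create(model="gpt-5.5-pro",
#                                 input=prompt, background=True)
#   result = wait_for_background(client.responses.retrieve, run.id)
```

Injecting `retrieve` also means the same loop works unchanged whether you poll the real API or a recorded fixture in tests.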

Why does my Pro response feel “lazy” sometimes?

The model interprets prompts literally – a vague request gets a vague answer. That’s not laziness, it’s literalism. The fix is one addition to your prompt: a concrete success criterion. “Keep iterating until all tests pass” outperforms “fix the bug” by a margin that’ll make you wonder why you ever wrote the short version.

Next step: open a fresh Pro chat, pick a real problem you’ve been avoiding because it’s annoying – a contract you haven’t read, a dataset you haven’t analyzed, a refactor you’ve been dreading – and write one paragraph describing what “done” looks like. Send it. Then go make coffee. Come back in fifteen minutes.