OpenAI Disproves an 80-Year Erdős Conjecture: A Playbook

OpenAI's internal model just disproved the Erdős unit distance conjecture. Here's what actually happened and how to use the same workflow.

Alex Carter2026-05-207 min readBeginner

Erdős literally paid people to solve this problem. He attached a cash bounty to the planar unit distance question because he thought it was that important – and for almost 80 years, nobody collected. On May 20, 2026, an internal OpenAI model did. Sort of. The story is more interesting than the headlines.

If you’re a developer, researcher, or just someone who uses ChatGPT, the actual question isn’t “is AI doing math now” – it’s “what workflow made this possible, and can I copy it?” That’s what this guide is about.

The Problem: Headlines Are Telling You The Wrong Story

Every article currently trending says some version of: OpenAI’s AI autonomously solved an 80-year-old open problem. That’s mostly true. It’s also missing the part that matters if you want to use these tools yourself.

Here’s what actually happened. An internal OpenAI model produced an infinite family of point configurations that disproves the long-standing belief that square-grid constructions were essentially optimal for maximizing unit-distance pairs, giving a polynomial improvement. The proof leans heavily on algebraic number theory – specifically infinite class field towers and Golod-Shafarevich theory. That’s the headline.

The footnote nobody quotes: the original proof produced by the AI was completely valid, but it was significantly improved by the human researchers at OpenAI and the many other mathematicians involved in the published paper. The AI didn’t hand humans a polished manuscript. It handed them a correct but rough sketch, and a team of mathematicians spent serious time turning it into something publishable.

Why Most “AI Did Math” Claims Fall Apart

This is the second time in seven months OpenAI has made a big math claim. The first one collapsed. Former OpenAI VP Kevin Weil posted that GPT-5 had found solutions to 10 previously unsolved Erdős problems and progress on 11 others – but it turned out GPT-5 had just retrieved existing solutions from the literature. OpenAI later acknowledged the error.

So when you read “AI solved an open problem,” the question is always: did the model actually do something new, or did it pattern-match to something already in its training data? Three checks separate the two:

Is there a verifiable proof artifact? Not a claim, a PDF mathematicians can read.
Did external experts review it? The current result ships with a companion paper from outside mathematicians.
Is the technique unexpected for the field? Using class field towers on a geometry problem qualifies. Restating a 1980s result doesn’t.

This week’s result clears all three. The previous one cleared zero.

The Workflow: How To Use AI On Hard Problems

You probably aren’t going to disprove an Erdős conjecture this weekend. Doesn’t matter. The same workflow that worked here works on any hard, multi-step reasoning task – debugging a gnarly system, proving a small lemma, untangling a research question.

The clearest public template comes from UCLA mathematician Ernest Ryu, who used GPT-5 (as of early 2026) to solve a 40-year-old optimization problem. His method is documented, and it maps cleanly onto what the OpenAI team apparently did internally.

Step 1: Frame the model as an idea generator, not an answer machine

Ryu kept prompting GPT-5 as a collaborator he would bounce ideas off of. It would continue to produce creative ideas, and when he posed a question it would offer a direction – right or wrong – and he would assess it quickly, pivoting immediately on dead ends and pursuing the promising ones. You’re not asking for the answer. You’re asking “what techniques from adjacent fields might apply here?”

Step 2: Verify in a fresh chat

This one is counterintuitive and almost no tutorial mentions it. When asking GPT-5 to check the work, Ryu found greater success starting a new chat rather than asking the model to check its work in the same conversation. Feeding results into fresh chats helped minimize accumulated errors.

Same chat = the model is anchored to what it already said and tends to defend it. Fresh chat = the model evaluates the argument cold. If you’ve ever asked ChatGPT “are you sure?” and watched it cave and rewrite a correct answer, this is why.

Step 3: Treat output as a sketch, not a deliverable

The OpenAI announcement is unusually honest about this. The model produced a valid proof – and humans still had to make it presentable. If a Fields medalist’s team had to clean up the output before publication, your weekend project will too.

Pro tip: when the model gives you a long argument or code block, paste it into a fresh conversation with the prompt “act as a skeptical reviewer. List every step where this argument might fail, in order of likelihood.” You’ll catch 80% of the holes in five minutes.

A Real Example: The Unit Distance Setup, In Plain Words

Start with the geometry: n points on a flat plane, and you want as many pairs as possible to sit exactly one unit apart. Call that maximum v(n). Erdős’s intuition – backed by decades of square-grid examples – was that v(n) couldn’t grow much faster than n itself. The model’s answer was to stop thinking in grids entirely.

The move that mattered: swap out the Gaussian integers (the algebraic foundation underneath those grids) for more complicated generalizations from class field tower theory. More algebraic complexity → many more unit-length differences → a polynomial improvement over every prior construction. That cross-domain leap – geometry problem, number theory tool – is the kind of thing LLMs are unusually good at, because they’ve absorbed both fields and don’t carry the human habit of “that’s not my area.” You can read the full proof PDF if you want the specifics.

Here’s the catch the press release downplays: the construction is, with hindsight, a natural (though highly non-trivial) generalisation of Erdős’s original lattice-based construction. It’s brilliant. It’s also not from outer space. The model found a path that was reachable from existing literature – exactly the kind of cross-reference task LLMs excel at and humans get tunnel vision on.

What Actually Carries Over – And What Doesn’t

The cross-domain trick works in non-math too. When stuck, try: “What techniques from [unrelated field] might apply to [my problem]?” Connecting things you wouldn’t connect is where these models earn their keep.
Fresh-chat verification. Single highest-use habit from the Ryu workflow. Open a new tab, paste in the result, ask the model to poke holes in it cold.
Demand artifacts, not summaries. If the AI claims a result, ask for the explicit construction, code, or proof – not a description of one. The model will happily describe something it can’t actually produce.
You’re the editor, not the reader. Even the OpenAI team rewrote their own model’s output. Expect that.

This all happened with an internal model – as of late May 2026, you and I don’t have access to it. The publicly available models (GPT-5, Claude, Gemini) are capable of the exploratory partnering Ryu demonstrated, but not the autonomous solve. That distinction matters.

FAQ

Can I use ChatGPT today to do this kind of math research?

For exploration and idea generation, yes. For an autonomous breakthrough proof, no – the model that did this isn’t public. Use what you have to brainstorm and stress-test arguments.

How do I know if an AI proof is actually correct?

The short version: you can’t, by yourself, on a research-level result. The published OpenAI work cleared the bar precisely because external mathematicians wrote a companion paper reviewing it. Tim Gowers said he would have recommended acceptance to the Annals of Mathematics without hesitation if a human had submitted it – and that endorsement is what makes it credible, not the AI’s confidence. For your own work, the practical rule is: convert the proof to formal code (Lean, Coq) if the stakes are high, or get a domain expert to review.

Is this the start of AI replacing mathematicians?

No, and the OpenAI paper itself is the strongest evidence. Humans still wrote the cleanup, the framing, the companion analysis, and chose which problems to point the model at. The role shifts from “do every step” to “direct, verify, and contextualize” – which is closer to how senior researchers already work with junior collaborators.

Your next move: pick one hard problem you’ve been stuck on. Open two ChatGPT tabs. Tab one is for brainstorming techniques from adjacent fields. Tab two – fresh, no context – is for checking whatever tab one gives you. Run that loop for an hour and see what falls out.