
ChatGPT Picks 7300-7500 Every Time: Why LLMs Can’t Do Random

A viral claim says GPT picks 7300-7500 for numbers between 1-10,000. We tested it. Turns out LLMs can't generate randomness - here's what actually happens and why it matters.

7 min read · Beginner

Someone just bet their house that if you ask GPT to pick a number between 1 and 10,000, it’ll land in the 7300-7500 range. Every time.

Is that actually true?

Sort of. The claim went viral on Hacker News four days ago (as of March 25, 2025), and people are testing it right now. The wild part isn’t just that GPT has a favorite range – it’s that all the major LLMs do this, across every number range you throw at them. Ask for 1-50, you get 27. Ask for 1-100, you get 42 or something ending in 7. The pattern scales, and it’s not a bug.

It’s how these models work.

What Happens When You Actually Test This

Open ChatGPT. Fresh conversation. Type: “Pick a random number between 1 and 10,000.”

7400. Maybe 7350. GPT-4o and GPT-3.5 both cluster in the 7000s – exact range varies (some users report 7200-7500, others see 7300-7600), but the bias is real. Now try smaller.

“Pick a random number between 1 and 50.”

27. Claude gives the same answer. So does Gemini. The Register tested this in June 2025 – ChatGPT, Claude Sonnet 4, Gemini 2.5 Flash, and Llama 4 all answered 27.

“Pick a random number between 1 and 10.”

According to a Springboards.ai experiment (2025), GPT-4o said “7” in 92 out of 100 fresh conversations. Claude 3.5 Sonnet: 90 times. Gemini 2.0 Flash? A perfect 100 out of 100.

This isn’t randomness. It’s a preference function pretending to be random.

Why LLMs Can’t Actually Generate Random Numbers

When you ask an LLM for a random number, you’re not triggering some internal dice roll.

You’re asking it to predict the most likely next token based on “random number between 1 and X.” And what did humans write when they talked about picking random numbers? They picked 7. A lot. Mid-range numbers. They avoided round numbers – 10, 20, 50. Skipped doubles like 11, 22, 33. They thought 27 “felt random” because it’s odd, not a multiple of 5, not too symmetrical.

LLMs learned all of that. Elad Hirsch tested GPT in July 2025 across 9 temperature settings (0.0 to 2.0) with 10,000 trials each. Even at maximum “creativity”? 3.5% deviation from true randomness. A real random number generator should be under 0.3% – the LLM is more than 10x worse.

In practice: ask it for 1,000 numbers between 1-50. Instead of each number appearing ~20 times, favorites like 27 or 37 show up 60+ times. Others? Maybe 5.
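You can see what that skew looks like with a quick simulation. `deviation_from_uniform` is a helper defined here for illustration (not a metric from the tests cited above), and the "biased" source is a toy stand-in for an LLM with a favorite number, not real model output:

```python
import random
from collections import Counter

def deviation_from_uniform(samples, lo, hi):
    """Mean absolute deviation of observed counts from the uniform
    expectation, normalized by sample size (0 = perfectly uniform)."""
    n = len(samples)
    expected = n / (hi - lo + 1)
    counts = Counter(samples)
    # Include values the source never produced (count 0).
    return sum(abs(counts.get(v, 0) - expected) for v in range(lo, hi + 1)) / n

n = 100_000
uniform = [random.randint(1, 50) for _ in range(n)]
# Toy stand-in for an LLM: returns its favorite number 27 about 6% of
# the time, uniform otherwise (roughly the 60-in-1,000 skew above).
biased = [27 if random.random() < 0.06 else random.randint(1, 50) for _ in range(n)]

print(deviation_from_uniform(uniform, 1, 50))  # shrinks toward 0 as n grows
print(deviation_from_uniform(biased, 1, 50))   # stays well above, regardless of n
```

The fair source's deviation melts away with more samples; the biased source's doesn't, because one bin is permanently overfull and all the others are permanently starved.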

Real randomness doesn’t care about patterns. LLMs are built to find patterns.

Cranking up the temperature doesn’t fix it. Rephrasing the prompt doesn’t either. Actually, asking for “completely unpredictable” or “more random” makes it worse – when Springboards.ai tested that phrasing, GPT-4o returned “quokka” 155 times out of 600. That’s about 26%.

One Workaround That Actually Works

GPT doesn’t show this bias when it writes code instead of “thinking.”

If you’re using GPT-4 with Advanced Data Analysis (formerly Code Interpreter), it sometimes realizes you want true randomness and writes Python:

import random
print(random.randint(1, 10000))  # uniform over [1, 10000], from a real PRNG

That produces uniform distribution. The catch: the model has to choose to do this. Not guaranteed. In a basic ChatGPT conversation? It’ll just guess.

Another option comes from recent research. A 2025 paper on B-score (arXiv:2505.18545) found that LLMs in multi-turn conversations – where they can see their previous answers – produce much less biased outputs. Single-turn mode: highest selection probability 0.77 for random questions. Multi-turn mode: dropped to 0.29.

What that means: ask GPT to pick 10 random numbers in a row in the same chat. It’ll notice it keeps saying “7” and start varying. Not true randomness – it’s pattern-matching against its own responses to avoid looking repetitive.
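A toy simulation makes the difference concrete. This is not the actual mechanism inside the model – `single_turn_pick`, `multi_turn_pick`, and their parameters are invented here to mimic the single-turn vs multi-turn behavior the B-score paper measured:

```python
import random

def single_turn_pick(lo=1, hi=10, favorite=7, p_favorite=0.9):
    """Toy 'model': returns its favorite number ~90% of the time."""
    return favorite if random.random() < p_favorite else random.randint(lo, hi)

def multi_turn_pick(history, lo=1, hi=10, favorite=7, p_favorite=0.9):
    """Same toy model, but it sees the conversation so far and
    deliberately varies when it notices it is repeating itself."""
    pick = single_turn_pick(lo, hi, favorite, p_favorite)
    if pick in history[-3:]:  # looks repetitive in context; re-pick
        pick = random.randint(lo, hi)
    return pick

single = [single_turn_pick() for _ in range(50)]  # 50 separate fresh chats
history = []
for _ in range(50):                               # one ongoing chat
    history.append(multi_turn_pick(history))

print(single)   # mostly 7s
print(history)  # far more varied
```

Same underlying preference, very different transcripts – which is exactly why "less biased" here doesn't mean "random."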

Does this work for the 7300-7500 claim? Probably. But no one’s verified it yet – the observation is only four days old. We know the multi-turn workaround reduces bias for small ranges (1-10, 1-50). Whether it scales to 1-10,000? Open question.

Where This Actually Matters (And Where It Doesn’t)

Should you care? Depends.

Don’t use LLMs for:

  • Password generation
  • Cryptographic keys or tokens
  • Lottery number selection (obviously)
  • Simulations that require statistical randomness
  • Security-critical random sampling

The MITRE CWE database lists at least 10 documented security weaknesses related to poorly implemented random number generators. If an attacker knows your “random” tokens come from a biased LLM, they can predict patterns.

LLMs are fine for:

  • Picking a topic for your next blog post
  • Choosing a restaurant from a list (if you don’t care about true fairness)
  • Generating creative prompts or story ideas
  • Anything where “random-ish” is good enough

The principle: if the cost of bias is low, LLMs work. If someone could exploit predictability, use a real random number generator.

But What About the 7300-7500 Claim?

No systematic study yet confirms it. Community reports are all over the place – some see 7400s, others 7200s, a few report 6000s or 8000s. The claim showed up on Hacker News four days ago. Zero large-scale trials.

What we know: the mid-range bias holds across every range tested so far (1-5, 1-10, 1-50, 1-100). Whether 7300-7500 is the consistent center for 1-10,000, or whether it’s noisier at larger scales – no one’s run 10,000 trials yet.

That’s the gap. We know LLMs are bad at randomness. We don’t yet know how bad at scale.

How to Actually Get Random Numbers When You Need Them

Use these instead:

  • Quick random number – random.randint() (Python), Math.random() (JavaScript). Pseudorandom; good enough for non-security tasks.
  • Cryptographic security – the secrets module (Python), crypto.getRandomValues() (JS). Uses OS-level entropy sources.
  • True quantum randomness – QRNG hardware, the Random.org API. Physical entropy from quantum phenomena or atmospheric noise.
  • LLM-adjacent randomness – ask GPT to write code that calls a real random function. Delegates to an actual RNG instead of guessing.
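For the first two cases the Python standard library already covers you; a minimal sketch:

```python
import random   # Mersenne Twister: pseudorandom, fine for non-security tasks
import secrets  # OS entropy pool: use for anything security-sensitive

# Quick random number: reproducible if you seed it – which is exactly
# why it must never be used for secrets.
print(random.randint(1, 10000))

# Cryptographic randomness: unpredictable even to an attacker who has
# observed earlier outputs.
print(secrets.randbelow(10000) + 1)   # uniform over 1..10000
print(secrets.token_hex(16))          # 128-bit hex token, e.g. for session IDs
```

The `secrets` module exists precisely because people kept reaching for `random` in security code – the same mistake as reaching for an LLM, one abstraction layer down.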

Instead of:

“Pick a random number between 1 and 100.”

Try:

“Write Python code to generate a random number between 1 and 100, then execute it.”

GPT-4 with code execution enabled produces actual uniform randomness. Using the API? Ask it to return code as JSON and run it yourself.

Why This Tells Us Something Bigger About LLMs

The number bias isn’t just a curiosity. It shows what LLMs are.

These models don’t “understand” random. To them, it’s just another token – like “purple” or “analyze” or “delicious.” When they see “pick a random number,” they’re not accessing some internal randomness module. They predict: what did humans write after this phrase in my training data?

Humans wrote “7” a lot. They wrote “27.” They wrote “42” because of Hitchhiker’s Guide memes. They avoided 1, 10, 50, 100 because those “feel” too obvious.

LLMs learned that. Most models train on overlapping internet-scale datasets, so they converge on the same biases. Five different LLMs all say “27” when you ask the same question – not a coincidence.

Does that mean LLMs are “just” pattern matchers with no reasoning? Not quite. They reason through patterns. When the task requires breaking patterns – like true randomness – they fail.

FAQ

Can you train an LLM to generate better random numbers?

You’d have to change how it works. LLMs predict likely tokens. Randomness requires unlikely tokens. Better solution: connect the LLM to an external random number generator via function calling.
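On the host side, function calling can be sketched like this. The tool name random_int, the call format, and the dispatcher are all hypothetical – real frameworks differ in the details – but the principle holds: the model only decides to call the tool; a real RNG produces the number.

```python
import secrets

# Hypothetical tool table. In a real function-calling setup, the model
# emits a structured call like
#   {"name": "random_int", "args": {"lo": 1, "hi": 10000}}
# and the host application (this code) executes it.
TOOLS = {
    "random_int": lambda lo, hi: secrets.randbelow(hi - lo + 1) + lo,
}

def dispatch(tool_call):
    """Execute a model-emitted tool call against a real implementation."""
    return TOOLS[tool_call["name"]](**tool_call["args"])

# The number comes from the OS entropy pool, not from token prediction.
print(dispatch({"name": "random_int", "args": {"lo": 1, "hi": 10000}}))
```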

Why does asking for “more random” make output less random?

Because “unpredictable,” “chaotic,” and “completely random” are themselves patterns in the training data. When humans wrote those phrases, they often followed them with unusual words like “quokka,” “zephyr,” or “serendipity” – words that feel random to us. The LLM learned that association. So when you ask for “unpredictable,” it gives you the most predictable version of unpredictability: literary words that show up in “random word” lists online.

If I test this myself and get a different number, does that disprove the bias?

The bias is statistical, not absolute. One test that returns 34 instead of 27? Expected variability (especially at higher temperatures). Run hundreds or thousands of trials – certain numbers will appear 2-3x more often than true randomness predicts. A single result tells you nothing. The distribution tells you everything.
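If you want to run that check yourself, collect the model's answers into a list and compare the most frequent value against its fair share. `top_ratio` is a helper defined here for the purpose, not a standard statistic:

```python
import random
from collections import Counter

def top_ratio(samples, lo, hi):
    """How often the most frequent value appears, relative to what a
    perfectly uniform source would give it on average."""
    expected = len(samples) / (hi - lo + 1)
    return max(Counter(samples).values()) / expected

# With a fair source and enough trials, the ratio hovers near 1.
fair = [random.randint(1, 50) for _ in range(5000)]
print(round(top_ratio(fair, 1, 50), 2))
```

Feed it a transcript where 27 shows up at 2-3x its fair share, and the ratio – not any single draw – is what exposes the bias.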

Next time someone asks you to pick a number, try asking ChatGPT first. Then ask yourself: if a machine trained on all human knowledge makes the same biased choice you would, does that mean any of us are really “random”?