Install OpenAI Whisper v20250625: Local Setup Guide

Deploy OpenAI Whisper v20250625 locally for offline speech-to-text. Real install commands, model size tradeoffs, and the setuptools_rust fix nobody warns you about.

7 min read · Intermediate

Two ways to run OpenAI Whisper locally. Both work. Only one is what you probably want.

Path A is the official openai-whisper Python package – clean, MIT-licensed, blessed by OpenAI. Path B is faster-whisper, a community port built on CTranslate2 that runs the same models measurably faster on the same hardware with lower VRAM usage. Deploying for production transcription volume? Path B wins on raw economics. Learning, prototyping, or want the reference implementation that matches every tutorial? Path A wins on simplicity.

This guide installs Path A – openai/whisper – at the latest release, v20250625 (released June 26, 2025, per the official GitHub releases page). Where faster-whisper makes more sense, I’ll say so.

The choice between them is basically: do you want the thing OpenAI ships, or the thing the community has optimized? There’s no wrong answer – just depends on whether GPU-hours show up on your bill yet.

What you actually need (and what tutorials lie about)

Python 3.13 will burn you. Pip resolves the install cleanly, then imports fail – the worst kind of silent breakage. According to an Emory University Libraries research guide, Whisper currently doesn’t work with Python 3.13; stick to 3.10 or 3.12. The official README lists compatibility as Python 3.8-3.11 (the codebase is tested against Python 3.9.9 and PyTorch 1.10.1), but 3.12 works in practice as of v20250625.
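If your setup is scripted, it’s worth failing fast on an unsupported interpreter instead of debugging import errors later. A minimal guard, plain Python with no extra dependencies:

import sys

# v20250625 imports cleanly on 3.8-3.12; 3.13 installs fine but breaks at import time
if not ((3, 8) <= sys.version_info[:2] <= (3, 12)):
    raise SystemExit(f"Python {sys.version.split()[0]} is unsupported by openai-whisper")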

Resource                 Minimum                   Recommended
OS                       Linux/macOS/Windows 10+   Linux or macOS
Python                   3.8                       3.11 or 3.12
RAM (tiny model)         ~1 GB                     2 GB
RAM/VRAM (large/turbo)   ~10 GB                    16 GB GPU
Disk                     5 GB                      10 GB (cache + models)
FFmpeg                   Required                  Latest stable

Speed benchmarks in the README were measured on an A100 GPU (as of v20250625). Real-world speed varies – a lot – depending on language, speaking cadence, and hardware. Ignore the speed multipliers if you’re on a laptop.
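Once the install below is done, it takes a minute to get a number for your own machine instead of trusting A100 figures. A rough wall-clock sketch (sample.mp3 is a placeholder for any local audio file):

import time
import whisper

model = whisper.load_model("tiny")        # swap in the model you actually plan to run
start = time.perf_counter()
model.transcribe("sample.mp3")            # placeholder: any local audio file
print(f"{time.perf_counter() - start:.1f}s wall clock on this hardware")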

Install Whisper v20250625

Full sequence on a clean Ubuntu/macOS box:

# 1. Verify Python (must be 3.8-3.12)
python3 --version

# 2. Install FFmpeg
# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg
# macOS
brew install ffmpeg
# Windows (with Chocolatey)
choco install ffmpeg

# 3. Create an isolated venv (skip at your peril)
python3 -m venv whisper-env
source whisper-env/bin/activate # Windows: whisper-env\Scripts\activate

# 4. Install Whisper
pip install -U openai-whisper

The official README gives two install paths: pip install -U openai-whisper for the PyPI release, or pip install git+https://github.com/openai/whisper.git when you need a fix that hasn’t shipped yet. Use the git version sparingly – it pulls whatever’s on main, which isn’t always stable.
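Either way, confirm which release you actually got before moving on; the package metadata makes this a one-liner:

import importlib.metadata

# The PyPI release reports a date-style version string
print(importlib.metadata.version("openai-whisper"))  # expect 20250625 from PyPI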

GPU acceleration: install a CUDA-matched PyTorch build before pip install -U openai-whisper. Do it after, and pip pulls the CPU-only torch wheel. Your RTX 4090 will sit completely idle while you wonder why transcription is slow.
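One way to keep this honest is to pick the device explicitly at load time; load_model takes a device argument, so a silent CPU fallback at least becomes visible:

import torch
import whisper

# Printing the device makes a CPU-only torch wheel obvious immediately
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Loading on {device}")
model = whisper.load_model("turbo", device=device)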

First-time configuration and verification

No config file. Whisper configures at call time. Minimum sanity check:

whisper --help
whisper your-audio.mp3 --model tiny --language English

First run with any model triggers a silent download. The tiny model is roughly 75MB (community-reported; exact size varies by version). The large model? Around 2.8GB – and users routinely lose track of where it lands. It goes to ~/.cache/whisper/ on Linux/macOS and %USERPROFILE%\.cache\whisper on Windows (confirmed via GitHub Discussion #1762, where someone discovered their 2.8GB large model had quietly appeared there). Check that directory before freaking out about missing disk space.
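A quick way to see what’s in the cache and how much it weighs (the downloaded models are stored as .pt checkpoint files):

import os
from pathlib import Path

# Default cache on Linux/macOS; Windows lives under %USERPROFILE%\.cache\whisper
cache = Path(os.path.expanduser("~/.cache/whisper"))
for ckpt in sorted(cache.glob("*.pt")):
    print(f"{ckpt.name}: {ckpt.stat().st_size / 1e9:.2f} GB")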

For programmatic use:

import whisper
model = whisper.load_model("turbo")
result = model.transcribe("audio.mp3")
print(result["text"])

The turbo model – per the official README – is an optimized version of large-v3: faster transcription with minimal accuracy drop. For most users with a decent GPU, it’s the right pick. Better than medium, close enough to large-v3 that the difference rarely matters.

Pro tip: Transcribing English-only content with small or below? Append .en to the model name (e.g. base.en). The English-only variants of tiny and base are noticeably more accurate on English audio than the multilingual versions, though the README notes the difference becomes less significant for small.en and medium.en.
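As a concrete example (interview.mp3 is a placeholder; fp16=False just silences the FP16-on-CPU warning):

import whisper

# English-only variants: tiny.en, base.en, small.en, medium.en
model = whisper.load_model("base.en")
result = model.transcribe("interview.mp3", fp16=False)
print(result["text"])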

The install errors you’ll actually hit

Three failures dominate the GitHub issues. None of them are documented clearly upfront.

  • No module named 'setuptools_rust' – Hits on minimal Linux containers and some Windows boxes. The README acknowledges it and gives the fix: pip install setuptools-rust, then retry. Takes 30 seconds once you know what it is.
  • 'whisper' is not recognized as an internal or external command – A PATH issue, not an install failure. GitHub Discussion #1939 documents a user whose pip install -U openai-whisper succeeded but who couldn’t run the CLI because Python’s Scripts/ directory wasn’t on PATH. Fix: activate your venv before calling whisper, or add the Scripts directory to PATH manually on Windows (see the snippet after this list for finding it).
  • Model fails to load on first run – Usually a network timeout mid-download. Re-run the same command; it resumes from the partial file in the cache directory.
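If you hit the PATH problem, ask Python where pip put the console script; that directory is the one that has to be on PATH:

import sysconfig

# The `whisper` entry point lands in the interpreter's scripts directory
print(sysconfig.get_path("scripts"))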

One more edge case worth knowing: CUDA errors that don’t actually crash Whisper. It silently falls back to CPU, and you have no idea until transcription takes 10 minutes on a 2-minute file. Verify with python -c "import torch; print(torch.cuda.is_available())". If that’s False, your PyTorch wheel is CPU-only – reinstall with the right CUDA version.
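A slightly fuller diagnostic than the one-liner, if you want to see which wheel and which GPU you actually ended up with:

import torch

print(torch.__version__)     # a "+cpu" suffix here usually means the CPU-only wheel
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))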

Should you have used faster-whisper instead?

Honest answer: probably yes, if you’re shipping anything.

The official repo releases roughly once a year. Community forks move faster on performance, quantization, and batching. As of v1.9.1, the whisper-asr-webservice project pins three actively maintained backends together: openai/whisper@v20250625, SYSTRAN/[email protected], and [email protected]. The fact that a popular wrapper explicitly supports all three tells you something about what people actually deploy versus what they experiment with.

The official repo is the reference implementation. The forks are the production tools. Both are MIT-licensed. Pick based on whether inference speed matters at your scale.
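For a sense of what the other path looks like, here’s a minimal faster-whisper sketch (assumes pip install faster-whisper; it is not part of the install above):

from faster_whisper import WhisperModel

# CTranslate2 backend; int8 quantization keeps VRAM usage low on smaller GPUs
model = WhisperModel("large-v3", device="cuda", compute_type="int8")
segments, info = model.transcribe("audio.mp3")
print(info.language)
for seg in segments:
    print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")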

Upgrade and uninstall

Force a clean reinstall from the latest commit when the PyPI release is behind main:

pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git

That command comes straight from the official README. Uninstall:

pip uninstall openai-whisper
# Remove cached models (can be several GB)
rm -rf ~/.cache/whisper

Don’t skip the cache step. That’s how you end up with 8GB of orphaned model weights you can’t account for.

FAQ

Does Whisper run offline after install?

Yes. First load_model() call hits the network; every run after that is fully local.
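If the target machine is air-gapped, pre-seed the cache on a connected one. load_model accepts a download_root, so you can stage the weights anywhere and copy the directory over:

import whisper

# First call downloads to ./models; copy that directory to the offline machine's
# ~/.cache/whisper/ (or point download_root at the same path there too)
whisper.load_model("turbo", download_root="./models")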

Why does the install break on Apple Silicon sometimes?

PyTorch on M1/M2/M3 uses Metal Performance Shaders (MPS) instead of CUDA, and some Whisper code paths optionally call into Triton – which is CUDA-only. Community reports suggest this is the source of most Apple Silicon install headaches (this may have changed in recent PyTorch versions). Start with --device cpu to rule out MPS as the variable, then verify MPS is actually available with python -c "import torch; print(torch.backends.mps.is_available())". If MPS shows True and you’re still seeing errors, the Triton dependency is the likely culprit. The small and medium models generally run well on MPS; large-v3 on 16GB unified memory is tight.

What’s the actual license – can I use this commercially?

MIT License, per the official README. Code and model weights both. Commercial use, modification, redistribution – all permitted with attribution. No restrictions on use case.

Next step: grab a 30-second audio clip, run whisper test.mp3 --model turbo --output_format srt, and check the timestamps. If the SRT file looks right, you’re done.