Skip to content

Install OpenAI Whisper v20250625: Open Source Speech Recognition Guide

Deploy OpenAI's Whisper v20250625 locally for open source speech recognition. Real install commands, VRAM specs, and the GPU bug nobody warns you about.

6 min readIntermediate

Most Whisper install guides walk you through five tools in order, declare victory, then leave you wondering why your RTX 4090 transcribes audio at the same speed as your laptop CPU. The answer is almost always the same: you installed PyTorch after openai-whisper. pip already pulled a CPU-only torch wheel as a dependency, your GPU is sitting idle, and nothing in the install log told you so.

This guide reverse-engineers the right approach from that single bug. We’ll set up OpenAI Whisper v20250625 – the latest release as of June 2025 – in an order that doesn’t sabotage GPU acceleration, then verify it actually works.

What you’re actually deploying

Whisper is the encoder-decoder transformer behind strong Speech Recognition via Large-Scale Weak Supervision. Code and model weights are MIT licensed – ship it inside commercial products without paperwork.

The current top open checkpoints, per the official README: large-v3 (released November 2023, ~1.55B parameters) and large-v3-turbo (October 2024, an optimized version of large-v3 with minimal accuracy loss and roughly 5x the speed). If you searched for “Whisper v4,” it doesn’t exist as of this writing – the version string v20250625 is a package release date, not a new model architecture.

Which model do you actually need?

The official README is vague on hardware because Whisper scales from a Raspberry Pi to an H100. Here are the model sizes with VRAM notes – disk sizes are confirmed from the README; VRAM figures for the smaller models are community estimates and can vary with batch size and sequence length:

Model Disk VRAM (FP16, est.) Realistic use case
tiny 75 MB ~1 GB Voice commands, low-latency
base 142 MB ~1 GB English podcasts on a laptop
small 244 MB ~2 GB Default for multilingual jobs
medium 769 MB ~5 GB Production transcription
large-v3 1.55 GB ~10 GB Maximum accuracy
large-v3-turbo ~6 GB Best speed/accuracy tradeoff

The turbo recommendation isn’t just convenience: 6 GB VRAM means it fits on an RTX 3060. large-v3 at ~10 GB fits on an RTX 3080 or better. Both figures are sourced from the official README and verified in the Codersera 2026 community benchmark. For the four smaller models, treat the VRAM column as a starting point, not a guarantee.

Software-wise: Python 3.8-3.11 (confirmed by the official README), FFmpeg in your PATH, and a CUDA-matched PyTorch build if you have an NVIDIA GPU. Python 3.12 and 3.13 may work on the latest release, but the official compatibility matrix still stops at 3.11 as of June 2025 – worth rechecking if you’re reading this later.

Install – the correct order

The sequence below is deliberate. Reverse any two steps and you’ll either lose GPU acceleration or hit a build error.

  1. Create a clean virtual environment. Whisper has heavy dependencies that conflict with other ML projects.
    python -m venv whisper-env
    source whisper-env/bin/activate # macOS/Linux
    whisper-envScriptsactivate.bat # Windows
  2. Install PyTorch FIRST, matched to your CUDA version. The install-selector at pytorch.org generates the right command for your OS and CUDA version. Example for CUDA 12.1:
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
  3. Install FFmpeg at the system level.
    # macOS
    brew install ffmpeg
    # Ubuntu/Debian
    sudo apt install ffmpeg
    # Windows (with Chocolatey)
    choco install ffmpeg
  4. Install Whisper itself.
    pip install -U openai-whisper

    Or for the latest commit directly from GitHub:

    pip install git+https://github.com/openai/whisper.git

Verify GPU before transcribing anything: Run python -c "import torch; print(torch.cuda.is_available())". Prints False with an NVIDIA card? Something during the Whisper install downgraded torch to CPU-only. Reinstall PyTorch with --force-reinstall before you burn an hour on a batch job.

First run and model caching

Whisper doesn’t ship with weights. On first use, models download to ~/.cache/whisper/ – up to 1.55 GB for large, and some terminals show zero download progress during this. Don’t kill the process if it looks frozen for a minute or two. If you watch ~/.cache/whisper/ in another terminal, you’ll see the file growing.

Quick smoke test with the turbo model:

whisper audio.mp3 --model turbo --output_format srt

Watch nvidia-smi in a second terminal while this runs – VRAM should jump immediately. No spike means CPU-only. To force CPU explicitly: --device cpu. To check what version you have installed:

pip show openai-whisper | grep Version

Errors you’ll probably hit

“ModuleNotFoundError: No module named ‘setuptools_rust'” – happens during the tokenizers build step. The fix reported in community threads: pip install setuptools-rust (hyphen, not underscore – the package name and the Python module name differ, which is its own special kind of annoying).

“NewConnectionError… setuptools>=40.8.0” on an air-gapped machine – pip’s build isolation fetches setuptools from PyPI even when it’s already installed locally. Documented in discussion #1463. Workaround: pip install --no-build-isolation openai-whisper after manually staging all wheels offline.

FFmpeg “not recognized” on Windows – Chocolatey installed it but the PATH hasn’t refreshed yet. Close and reopen your terminal, then confirm with ffmpeg -version.

Asked for “large” but got large-v2 behavior – your openai-whisper version is older than 20231106. The large alias only points to large-v3 from that release onward, per the official announcement. Upgrade with the force-reinstall command in the next section.

Upgrade, uninstall, alternatives

Whisper uses date-stamped releases, not semantic versioning. To pull the latest from GitHub without clobbering your CUDA PyTorch install:

pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git

The --no-deps flag is the key part – skip it and pip will happily re-resolve dependencies and pull a CPU wheel again.

To remove Whisper entirely:

pip uninstall openai-whisper
rm -rf ~/.cache/whisper # delete cached models - can be 5+ GB across all sizes

Is the reference Python package actually the right one for your use case? Often not. whisper.cpp (v1.7.5 as of this writing) runs on CPU and Apple Silicon with Core ML – roughly 3x faster than CPU-only PyTorch on a Mac, per the whisper.cpp project docs. faster-whisper (v1.1.1) supports INT8 and 4-bit quantization that drops large-v3 into the 1.5-4 GB VRAM range, making it viable on 8 GB consumer cards where the reference package would OOM. The reference Python package is the right call when you need exact output parity with the published paper, or when you’re fine-tuning. For pure inference volume, faster-whisper usually wins on VRAM and throughput.

FAQ

Do I need a GPU to run Whisper?

No. tiny and base run in real time on a modern laptop CPU. large on CPU takes longer than the audio’s actual duration – for a one-hour recording, that’s a long wait. GPU is optional, not required.

Why does turbo get recommended over large-v3 if large-v3 is more accurate?

Because the accuracy gap is small enough most users can’t notice it, and turbo fits on 6 GB cards instead of 10 GB. Transcribing a meeting on an RTX 3060: turbo gets you a result in a few minutes; large-v3 OOMs. The exception worth knowing: heavily accented speech in non-English languages, where the extra parameters in large-v3 do measurably help. The WER tables in the paper’s Appendix D break this down by language – check them before committing to one model for a production workload.

Can I use Whisper commercially?

Yes. MIT license covers both code and weights.

Next step: grab a 3-minute audio file, run the turbo command above, and keep nvidia-smi open in a second terminal. No VRAM spike means your install is CPU-only and step 2 needs to be redone – catch it now, not after a 6-hour batch run.