
Wav2Lip Install Guide: AI Lip Sync That Actually Runs

Step-by-step Wav2Lip install for AI lip sync - Easy-Wav2Lip v8.3 path, the librosa pin that breaks builds, and a working verification command.

7 min read · Intermediate

Two ways to install Wav2Lip for AI lip sync today, and they aren’t equal. Clone the original Rudrabha/Wav2Lip repo from 2020 and you’ll spend an evening fighting Python 3.6 and CUDA mismatches. Or use Easy-Wav2Lip v8.3, a community fork that scripts the whole install. For most people in 2025, the second path wins. The model weights are identical – you’re picking installer ergonomics, not quality.

The original repo backs the ACM Multimedia 2020 paper by Prajwal et al. (DOI 10.1145/3394171.3413532) – the gold standard if you need research reproducibility. Per the official README, it pins Python 3.6 and ffmpeg via apt-get. Fine if you’re frozen in 2020. If you have an RTX 30/40-series card or an Apple Silicon Mac, those pins fight you.

Picking your install path

Honest tradeoff before any commands run:

  • Original Rudrabha/Wav2Lip – best for reproducing the paper or training; pain points: Python 3.6 EOL, CUDA 10.1 wheels, manual checkpoint download
  • Easy-Wav2Lip v8.3 – best for just generating lip-synced video; pain points: needs CUDA 12.2, one-shot config.ini workflow
  • OpenVINO notebook – best for CPU-only inference on Intel; pain points: slower, Jupyter setup required

Think of the two repos as the same engine in two different cars. Original Wav2Lip is the stripped race car: every bolt exposed, nothing automated, maximum control if you know what you’re doing. Easy-Wav2Lip is the same engine with a proper dashboard. Same horsepower – fewer afternoons debugging wheel compatibility.

One license note other tutorials skip: per the official Wav2Lip README, the original repository is for personal, research, or non-commercial use only – commercial requests go through Sync Labs. Easy-Wav2Lip inherits that license. If you’re shipping a product, this matters before you write a single line of integration code.

System requirements

GPU first: Easy-Wav2Lip v8.3 requires an Nvidia card supporting CUDA 12.2, or a Mac with Apple Silicon / AMD GPU via mps (per the Easy-Wav2Lip README). Run nvidia-smi – if the CUDA version shown is below 12.2, update your Nvidia driver before touching the installer. Everything else fails downstream if you skip this.

  • OS: Windows 10/11 or Linux (Ubuntu 20.04+); macOS supported on Easy-Wav2Lip via mps
  • GPU: CUDA 12.2-capable Nvidia card, or Apple Silicon / AMD Mac
  • Disk: allow for the environment plus model checkpoints – the s3fd face-detection model alone is ~85.7 MB (per the Wav2Lip README); total will vary
  • Python: Python 3.6 for the original repo (use a conda env); check the Easy-Wav2Lip README for its current Python requirement before installing
  • ffmpeg: on PATH and callable from terminal
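Before touching either installer, a quick pre-flight pass catches most of the failures covered later in this guide. A minimal sketch for a Linux/macOS shell (the nvidia-smi line applies only to Nvidia machines):

# Driver check – the CUDA version in the top-right of the output must read 12.2+ for Easy-Wav2Lip
nvidia-smi

# ffmpeg must be on PATH and callable
ffmpeg -version

# Confirm which Python the shell resolves before creating any venv
python --version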

Install Easy-Wav2Lip v8.3 (recommended path)

On Windows, a single batch file handles install and run (per the Easy-Wav2Lip README). No manual venv creation, no hunting for wheel URLs.

Windows one-shot install

  1. Download Easy-Wav2Lip.bat from the Installers branch.
  2. Place it in an empty folder (e.g., C:\Users\you\Documents\AI).
  3. Double-click. The bat file checks for required software, sets up a venv, installs ffmpeg into it, then loops between config and processing – no separate install command needed.

Manual / Linux / Mac install

git clone https://github.com/anothermartz/Easy-Wav2Lip.git
cd Easy-Wav2Lip
python -m venv ../Easy-Wav2Lip-venv
source ../Easy-Wav2Lip-venv/bin/activate # Windows: ..\Easy-Wav2Lip-venv\Scripts\activate
pip install -r requirements.txt
python install.py

After install.py completes, a config.ini opens automatically. That’s your control panel going forward.

First-time configuration

CLI flags are gone. Easy-Wav2Lip uses a flat config file – edit it, save, close, and the run starts. Per the Easy-Wav2Lip README, output saves next to your video_file path.

[OPTIONS]
video_file = C:\path\to\your_clip.mp4
vocal_file = C:\path\to\your_audio.wav
quality = Improved
output_height = full resolution
wav2lip_version = Wav2Lip_GAN
use_previous_tracking_data = True

Pick Improved over Fast for anything you’ll show another human. Enhanced adds GFPGAN upscaling – looks better, takes longer.

Cache tip: Easy-Wav2Lip stores face-detection data per video. Re-run with different audio on the same clip and it skips the detection pass entirely. Per the Easy-Wav2Lip README, a 9-second 720p 60fps test dropped from ~7 minutes to under 1 minute on a Colab T4 between first and second run. First run: make tea. Second run: basically instant.

Verify it works

Point video_file at a 5-second clip with a single face and decent lighting, and vocal_file at any wav file. Save config.ini and close it; the run starts. You’ll see frame-extraction logs, then face detection, then a generated MP4 drops beside your input.

Open it. Lips moving roughly in time with audio? Done. Two mouths or a sliding chin? Not a broken install – that’s a runtime tuning problem. The fixes are in the next section.

Common errors and real fixes

librosa.filters.mel TypeError

The error: TypeError: mel() takes 0 positional arguments but 2 positional arguments...

Diagnosis first: librosa 0.10 (released March 18, 2023) changed the mel() function signature, breaking Wav2Lip’s audio.py. The original repo never pinned against it. Fix confirmed in GitHub issue #465 – downgrade inside your venv:

pip install librosa==0.9.2
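To confirm the pin took, probe the signature directly; the positional call below mirrors the style audio.py uses and is exactly what librosa 0.10+ rejects:

python -c "import librosa; print(librosa.__version__)"   # should print 0.9.2 after the downgrade
python -c "from librosa.filters import mel; print(mel(16000, 800).shape)"   # works on 0.9.2; raises the TypeError above on 0.10+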

ModuleNotFoundError: No module named 'ffmpeg'

The catch: this isn’t about the ffmpeg binary being missing. It’s a missing Python wrapper – a completely separate package. Reported across multiple machines in GitHub issues #671 and #676 (May 2024). Two fixes needed, not one:

pip install ffmpeg-python

Then confirm the binary: ffmpeg -version. Both must exist. Fixing just the pip package while the binary is absent gives you a different error immediately after.
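A one-pass check for both halves, assuming a POSIX shell; each line must succeed on its own:

python -c "import ffmpeg; print('wrapper ok')"   # the ffmpeg-python package imports as plain "ffmpeg"
ffmpeg -version   # the binary, resolved from PATH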

s3fd.pth missing or 404

The original repo expects a face-detection checkpoint at a specific path. Per the Wav2Lip README, if auto-download fails, place it manually – the file is ~85.7 MB:

mkdir -p face_detection/detection/sfd/
wget -O face_detection/detection/sfd/s3fd.pth \
  https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth
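Then sanity-check the size; the usual failure mode is a few KB of HTML saved from an error page rather than the real checkpoint:

ls -lh face_detection/detection/sfd/s3fd.pth   # expect roughly 86 MB, not KB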

Mouth offset, two mouths, sliding chin

Runtime args, not install bugs. Per the Wav2Lip README: --nosmooth fixes the two-mouths artifact (caused by over-smoothed face detections). For chin clipping, --pads 0 20 0 0 adds bottom padding. In Easy-Wav2Lip, these are config.ini fields (nosmooth, U/D/L/R padding) rather than CLI flags.
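On the original repo these are plain CLI flags on inference.py, per its README; a typical invocation combining both fixes looks like this (file paths are placeholders):

# --pads order is top bottom left right; --nosmooth disables smoothing of face detections
python inference.py --checkpoint_path checkpoints/wav2lip_gan.pth \
  --face your_clip.mp4 --audio your_audio.wav \
  --pads 0 20 0 0 --nosmooth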

CUDA version mismatch: RTX 30/40 series + original repo

The original repo’s listed torch==1.7.1+cu101 wheels don’t exist for modern CUDA 12 GPUs. Easy-Wav2Lip was rebased onto a newer PyTorch stack specifically to close this gap – if you’re on an RTX 30 or 40 series card, this is the main reason to use the fork rather than fighting wheel compatibility manually. Conversely, if you’re on an older Pascal or Turing card stuck at CUDA 11, the original repo’s stack often behaves better. Pick the fork to match your hardware.
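Whichever repo you pick, one line tells you whether the installed torch wheel actually talks to your GPU:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"   # False on a modern card usually means a wheel/driver mismatch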

Upgrade and uninstall

To uninstall Easy-Wav2Lip cleanly: delete the Easy-Wav2Lip and Easy-Wav2Lip-venv folders. No registry entries, no leftover services. Models live inside that folder – removing it reclaims everything.

For the original repo: conda env remove -n wav2lip, then delete the cloned directory.

What does long-term maintenance actually look like for a project like this? The original Rudrabha repo hasn’t had a major commit in years – it’s stable, not active. Easy-Wav2Lip gets periodic updates from the community. Worth checking the releases page before starting a new project that depends on either.

FAQ

Do I need a GPU to run Wav2Lip?

There’s an OpenVINO CPU notebook that works on Intel processors. But CPU inference is too slow for anything beyond a 5-second test clip – plan for a GPU.

Why does Easy-Wav2Lip require CUDA 12.2 when the original repo wants CUDA 10.1?

The original was built when CUDA 10.1 + Torch 1.7 was current. Modern RTX 30/40 cards have no working wheels for that combo – that’s why the Easy-Wav2Lip fork rebased onto a newer stack. If you’re on an older Pascal or Turing card and stuck on CUDA 11, the original repo’s stack will likely behave better. Pick your fork to match your hardware, not the reverse. Running the wrong combination means hours of manual wheel hunting that still might not resolve cleanly.

Can I use Wav2Lip output commercially?

No – the pretrained weights from the original repo are research-only. Easy-Wav2Lip uses the same restricted weights, same limit. For commercial work: Sync Labs hosted API, or train your own model on commercially-licensed data.

Next: grab a 5-second clip with a clean front-facing shot, run it through Easy-Wav2Lip with default settings, then re-run with nosmooth = True and bottom padding of 10. Compare the two outputs side-by-side – that A/B is the fastest way to learn what each parameter actually does on your specific face data.