
Install Wan 2.2: Open Source Video Generation Setup Guide

Step-by-step guide to deploy Wan 2.2, Alibaba's open source video generation model, with real install commands, flash-attn fixes, and VRAM tips.

8 min read · Intermediate

Two ways to get Wan 2.2 running locally, and they’re not equally good. The official README points you at pip install -r requirements.txt with a note that flash_attn should be installed last. Some community posts suggest Poetry. Pick pip. Poetry adds a lock-file layer that makes flash_attn build failures harder to diagnose – and you’ll have at least one flash_attn failure. Everyone does.

This guide deploys Wan 2.2 the way it actually works on consumer hardware: TI2V-5B on a single RTX 4090, with the open source video generation pipeline running end-to-end. The 14B variants are mentioned but not the focus – if you have an 80GB H100, you don’t need this tutorial.

What you’re actually deploying

Wan 2.2 shipped on July 28, 2025 from Alibaba’s Tongyi Lab under Apache 2.0, with commercial use allowed. Three model variants exist:

  • T2V-A14B – text-to-video, MoE, 27B params total (~14B active per step)
  • I2V-A14B – image-to-video, same MoE structure
  • TI2V-5B – combined text+image to video, dense 5B with high-compression VAE

The TI2V-5B is the one you want unless you have datacenter-class GPUs. Per the HuggingFace model card, its Wan2.2-VAE hits a 16×16×4 compression ratio, supports 720P at 24fps, and generates a clip on a 4090 in under 9 minutes. The A14B variants need 80GB VRAM for the headline command in the README – that's an H100 or A100.

There’s a gap worth naming here. The README’s featured example and the benchmark numbers are all for the A14B. Most readers have a 4090. The 5B model is almost a footnote in the official docs but it’s the actual on-ramp for the vast majority of people who want to run this locally. Keep that framing mismatch in mind as you read the README.

System requirements

Component  | Minimum (TI2V-5B)                        | Recommended
GPU VRAM   | ~12 GB with offload flags                | 24 GB (RTX 4090 / 3090)
System RAM | 32 GB                                    | 64 GB
Disk       | See model card for current weights size  | 100 GB SSD
Python     | 3.10 or 3.11                             | 3.10 (best flash_attn compatibility)
CUDA       | 12.1+                                    | 12.4
PyTorch    | ≥ 2.4.0 (per requirements.txt)           | 2.5.x – stay below 2.8
OS         | Linux / WSL2                             | Ubuntu 22.04 or 24.04

Skip Windows native if you can. Every flash_attn-related ticket in the Wan2.2 issue tracker that I read was filed from Windows.
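
Before installing anything, it's worth confirming what the table assumes about your machine. A quick pre-flight check with standard NVIDIA tooling (nothing Wan-specific):

# GPU model and total VRAM
nvidia-smi --query-gpu=name,memory.total --format=csv
# CUDA toolkit version - matters if flash_attn ends up building from source
nvcc --version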

Install Wan 2.2 step by step

1. Clone and set up a clean environment

git clone https://github.com/Wan-Video/Wan2.2.git
cd Wan2.2
conda create -n wan22 python=3.10 -y
conda activate wan22
pip install --upgrade pip setuptools wheel

Python 3.10 specifically – not 3.13. Flash_attn’s prebuilt wheels stop at Python 3.12, and per community reports (as of August 2025), the torch 2.8 + Python 3.13 combination has no working flash_attn path yet.

2. Install PyTorch BEFORE the requirements file

pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu124

This is the step the README skips. Run pip install -r requirements.txt first and flash_attn’s build script fails with ModuleNotFoundError: No module named 'torch' – documented in issue #111. The reason: pip processes flash_attn’s build dependencies in an isolated environment that doesn’t see the torch you’re installing in the same command. Pre-installing torch breaks that loop.
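
Before moving on, confirm torch actually landed in the environment and reports the CUDA build you expect – you'll want the same check again later if flash_attn misbehaves:

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
# Expect something like: 2.5.1+cu124 12.4 True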

3. Install the rest, flash_attn last

# Edit requirements.txt - comment out the flash_attn line temporarily
pip install -r requirements.txt

# Now install flash_attn separately
# Option A: prebuilt wheel (fast, may not match your exact CUDA build)
pip install flash-attn --no-build-isolation

# Option B: build from source (slower, but resolves ABI mismatches)
# Documented in issue #166 as the reliable path for flash_attn 2.8.3
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
MAX_JOBS=4 pip install . --no-build-isolation
cd ..

The docs say install flash_attn last. What they don’t say is that it fails nearly every time on a fresh environment. Option A works if your CUDA build aligns; go straight to Option B if you hit the undefined symbol error described below.
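
Whichever option you pick, verify the import before touching generate.py – a wheel can install cleanly and still blow up at import time with the ABI error covered below:

python -c "import flash_attn; print(flash_attn.__version__)"
# A version number with no ImportError means flash_attn is usable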

4. Download the model weights

pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.2-TI2V-5B --local-dir ./Wan2.2-TI2V-5B

Check the HuggingFace model card for current weight sizes before downloading – these can change across revisions. ModelScope is a mirror option if HuggingFace is slow in your region.
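
Once the download finishes, a rough completeness check is worth the ten seconds – compare the total size against the model card rather than any number quoted here:

# Total size of the checkpoint directory
du -sh ./Wan2.2-TI2V-5B
# Should contain the diffusion weights, Wan2.2-VAE, and T5 encoder plus config files
ls ./Wan2.2-TI2V-5B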

First-time configuration and verification

Run the smallest possible generation to confirm everything wired up correctly:

python generate.py \
  --task ti2v-5B \
  --size 1280*704 \
  --ckpt_dir ./Wan2.2-TI2V-5B \
  --offload_model True \
  --convert_model_dtype \
  --t5_cpu \
  --prompt "A red panda playing piano in a sunlit room, cinematic lighting"

Those three flags – --offload_model True, --convert_model_dtype, and --t5_cpu – are what makes a 4090 viable. Per the README, they push the T5 text encoder to CPU and convert the diffusion model to lower-precision dtype. Drop any one of them on a 24GB card and you’re likely looking at OOM. Add --frame_num 17 on your first run instead of the default 81; it’s a faster sanity check before committing to a full-length generation.
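
Put together, that first sanity-check run is the command above with one extra flag – everything else is unchanged:

python generate.py \
  --task ti2v-5B \
  --size 1280*704 \
  --ckpt_dir ./Wan2.2-TI2V-5B \
  --offload_model True \
  --convert_model_dtype \
  --t5_cpu \
  --frame_num 17 \
  --prompt "A red panda playing piano in a sunlit room, cinematic lighting"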

Common errors and what actually fixes them

Error: ImportError: ... undefined symbol: _ZN3c104cuda9SetDeviceEa

Not an install error – a runtime ABI mismatch (documented in issue #108). The flash_attn wheel compiled against a different torch CUDA version than what’s loaded at runtime. The fix is exact version matching: run python -c "import torch; print(torch.version.cuda)" first, then either rebuild flash_attn from source against that exact version or reinstall torch to match what flash_attn expects.
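
In practice that means a short diagnose-then-rebuild sequence – this assumes the flash-attention clone from step 3 is still on disk:

# What CUDA build is torch actually running?
python -c "import torch; print(torch.__version__, torch.version.cuda)"
# Drop the mismatched wheel and rebuild against the installed torch
pip uninstall -y flash-attn
cd flash-attention
MAX_JOBS=4 pip install . --no-build-isolation
cd ..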

Error: flash_attn build wheel fails on Windows

The MSVC compiler stack on Windows can’t build flash_attn from source reliably. Switch to WSL2, or download a prebuilt wheel from the Dao-AILab releases page that matches your exact torch + CUDA + Python combination.

Error: torch 2.8 + Python 3.13 + CUDA 13 = nothing works

Requirements.txt specifies torch>=2.4.0, but flash_attn was only compatible up to torch 2.5 and Python 3.12 at time of Wan 2.2’s release – the version ceiling is documented in community discussion around issue #84. Torch 2.8 and Python 3.13 are both beyond that ceiling. Pin torch to 2.5.x and Python to 3.10.

Error: OOM on 24GB cards even with TI2V-5B

Check you have all three offload flags active, then reduce the frame count with --frame_num – 704*1280 is the same pixel budget as the default 1280*704, so swapping orientation alone won't save memory. For pushing further, the README lists community projects: DiffSynth-Studio offers FP8 quantization and layer-by-layer offload; LightX2V targets 8GB VRAM cards like the RTX 4060.
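
If you're not sure how close to the ceiling you are, watch VRAM while a generation runs – peak usage during the denoising steps is the number that matters:

watch -n 1 "nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader"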

Flash_attn pain is real, but it’s a one-time tax. Once the environment works, it works. The question is whether that tax is worth it for your use case – if you’re generating one or two test clips, ComfyUI’s pre-bundled environment might be the smarter path (see FAQ).

Upgrading and removing

Wan moves fast. Since the July 28 launch, the team shipped Wan2.2-S2V-14B (audio-driven cinematic video, Aug 26, 2025), Wan2.2-Animate-14B for character animation (Sep 19, 2025), and CosyVoice TTS support (Sep 5, 2025) – all per the README changelog. To upgrade:

cd Wan2.2
git pull
pip install -r requirements.txt --upgrade
# Then re-download any new model variant you want
huggingface-cli download Wan-AI/Wan2.2-Animate-14B --local-dir ./Wan2.2-Animate-14B

To remove cleanly:

conda deactivate
conda env remove -n wan22
rm -rf Wan2.2 Wan2.2-TI2V-5B
# HuggingFace cache lives separately
rm -rf ~/.cache/huggingface/hub/models--Wan-AI--*

That last rm matters. The HuggingFace cache keeps its own copy of the weights on top of your --local-dir checkpoint, so removing only the repo and checkpoint directories leaves the full weights orphaned on disk.

Where to go next

ComfyUI if you want a UI – native Wan 2.2 nodes were integrated on launch day and that's the path most creators use. Multi-GPU throughput if you're scaling: the README documents FSDP + DeepSpeed Ulysses with --ulysses_size 8 for parallel inference across multiple GPUs.
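
For reference, the multi-GPU run follows the torchrun pattern below. Only --ulysses_size 8 is quoted above; the other flags and the ./Wan2.2-T2V-A14B checkpoint path are my reconstruction of the README's example, so confirm them against the current README before relying on this:

torchrun --nproc_per_node=8 generate.py \
  --task t2v-A14B \
  --size 1280*720 \
  --ckpt_dir ./Wan2.2-T2V-A14B \
  --dit_fsdp \
  --t5_fsdp \
  --ulysses_size 8 \
  --prompt "A red panda playing piano in a sunlit room, cinematic lighting"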

Pick one, generate something, and iterate from there.

FAQ

Can I run Wan 2.2 without flash_attn?

Yes – PyTorch’s native scaled_dot_product_attention kicks in as fallback. Per community reports, generation is slower and VRAM usage is higher, but it runs.

Is the A14B model worth the extra hardware over TI2V-5B?

The A14B scores higher on Wan-Bench 2.0 (the official benchmark). But consider the practical constraint: the README’s headline A14B command requires 80GB VRAM. If you’re on a 4090, TI2V-5B generating 720P at 24fps in under 9 minutes is a better use of your time than renting H100 time for quality differences that may not matter for your specific output. If you do have access to 80GB – an A100 or H100 – the A14B is a meaningful step up. Otherwise TI2V-5B is the default for a reason.

Does Wan 2.2 work in ComfyUI without all this command-line setup?

Mostly yes, with one catch people miss: you still need the model weights downloaded into ComfyUI/models/diffusion_models/. ComfyUI won’t download them automatically. What you do avoid is the flash_attn build – ComfyUI’s bundled Python environment handles that dependency separately. Native Wan 2.2 nodes were added on launch day (July 28, 2025), so the integration is stable, not experimental.