Three ways to run ProteinMPNN: browser (HuggingFace Space), cloud API (NVIDIA NIM, Neurosnap), local install. Browser works for single tests. Cloud scales but costs money. Local install? That’s for designing dozens of sequences, controlling sampling parameters directly, or wiring it into a pipeline.
The catch: ligands, metals, DNA in your structure? You want LigandMPNN instead. Nature Methods (Feb 2025) shows LigandMPNN hits 63.3% sequence recovery at small molecule sites versus ProteinMPNN’s 50.5%. Separate repo, different models.
Why Local Install Still Matters
The original ProteinMPNN repo stays active despite LigandMPNN’s edge. Speed and simplicity. For protein-only backbones – monomers, homo-oligomers, scaffolds without cofactors – ProteinMPNN runs in ~0.6 seconds per 100 residues on CPU. No ligand atom parsing overhead. Screening thousands of RFdiffusion backbones? That compounds.
Institutional deployments lag. NIH HPC still runs ProteinMPNN 1.0.1 as a module (as of their current docs). Research groups have working pipelines they haven’t migrated. For pure backbone redesign, the 2022 model works.
System Requirements
According to the official repo: Python ≥3.0, PyTorch, Numpy. That’s CPU mode. For GPU:
- GPU: NVIDIA CUDA-capable (compute 3.5+). Most GPUs from 2014 onward.
- CUDA: 11.3 (official), but 12.4 for H100s (see below).
- RAM: 8GB min for small proteins. 16GB+ for complexes.
- Disk: ~2GB for repo + weights. Full pipeline (RFdiffusion + ProteinMPNN + AlphaFold2)? 150GB per a Sep 2025 community tutorial.
- OS: Linux (Ubuntu 20.04/22.04 tested), macOS (CPU only), Windows via WSL2.
No official requirements doc. These are from community deployments and GitHub issues.
Download Source
Clone the official GitHub repo:
git clone https://github.com/dauparas/ProteinMPNN.git
cd ProteinMPNN
Model weights live in vanilla_model_weights/, soluble_model_weights/, ca_model_weights/. Included in the clone. Total: ~2GB.
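A quick sanity check that the checkpoints actually came down with the clone (directory names as above):

```shell
# Count the .pt checkpoint files in each weights directory
for d in vanilla_model_weights soluble_model_weights ca_model_weights; do
    echo "$d: $(ls "$d"/*.pt 2>/dev/null | wc -l) checkpoints"
done
```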
Installation: The CUDA Decision
Official README says CUDA 11.3. Fine for most GPUs. H100s? CUDA 11.3 builds hang there. The H100 (compute capability 9.0) needs CUDA 11.8 at minimum; the install below uses 12.4. The Kuhlman Lab fork documented the workaround.
Standard Install (CUDA 11.3, most GPUs)
conda create --name proteinmpnn python=3.10
conda activate proteinmpnn
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
pip install numpy
H100-Compatible Install (CUDA 12.4)
conda create --name proteinmpnn_cu12 python=3.10
conda activate proteinmpnn_cu12
conda install numpy
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu124
Check your GPU’s CUDA version:
nvidia-smi
Top-right corner shows “CUDA Version: 12.x” or “11.x”. Match your install to that.
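Independent of what nvidia-smi reports, you can confirm the PyTorch build you installed actually sees the GPU:

```python
import torch

# Which PyTorch build is installed, and can it reach a GPU?
print(torch.__version__)          # e.g. 2.4.0+cu124 for the CUDA 12.4 wheels
print(torch.version.cuda)         # CUDA version PyTorch was built against (None on CPU-only)
print(torch.cuda.is_available())  # True only if the driver and build are compatible
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```

If `is_available()` returns False on a GPU machine, the build/driver mismatch is the usual culprit.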
Pro tip: hitting “CUDA out of memory” during inference? ProteinMPNN has no memory-tuning flags. The only lever is --batch_size; drop it to 1 (already the default). Still OOM? Your GPU can’t fit the model plus your protein. Switch to CPU (skip the CUDA install) or grab a cloud GPU.
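If you’d rather force CPU for a single run than reinstall, hiding the GPU works, assuming the script picks its device via torch.cuda.is_available() (check protein_mpnn_run.py in your clone):

```shell
# Hide every GPU from PyTorch for this invocation -> the run falls back to CPU
CUDA_VISIBLE_DEVICES="" python protein_mpnn_run.py \
    --pdb_path inputs/PDB_complexes/pdbs/3HTN.pdb \
    --out_folder ./cpu_run/ \
    --num_seq_per_target 2
```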
First-Time Configuration: Choosing Model Weights
The repo ships 10+ checkpoint files. You pick via --model_name. Default: v_48_020 (vanilla, 48 neighbors, 0.20Å Gaussian noise during training).
| Model | Noise Level | When to Use |
|---|---|---|
| v_48_002 | 0.02Å | Max sequence recovery on native backbones |
| v_48_010 | 0.10Å | Balanced – works for most cases |
| v_48_020 | 0.20Å (default) | Better AlphaFold compatibility for designed backbones |
| v_48_030 | 0.30Å | Max robustness to backbone geometry errors |
The Science paper (2022) shows the trade-off: lower noise = higher sequence recovery on crystal structures. Higher noise = better success when you feed the designed sequence into AlphaFold. Doing a design-predict-test loop (design with ProteinMPNN → validate with AF2)? Use v_48_020 or v_48_030.
Soluble variants (in soluble_model_weights/) trained only on soluble proteins. Flag: --use_soluble_model. For cytoplasmic or secreted proteins. CA-only models handle alpha-carbon-only backbones (coarse-grained structures). Flag: --ca_only.
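Putting those flags together: a soluble-model run at the default noise level might look like this (5L33.pdb is one of the repo’s example monomers; adjust paths to your clone):

```shell
python protein_mpnn_run.py \
    --pdb_path inputs/PDB_monomers/pdbs/5L33.pdb \
    --path_to_model_weights soluble_model_weights \
    --model_name v_48_020 \
    --use_soluble_model \
    --out_folder ./soluble_test/ \
    --num_seq_per_target 8
```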
Install docs don’t explain this. You read the Science paper’s supplementary figures or guess.
Verify the Install Works
Run the example on a test PDB:
python protein_mpnn_run.py \
    --pdb_path inputs/PDB_complexes/pdbs/3HTN.pdb \
    --out_folder ./test_output/ \
    --num_seq_per_target 2 \
    --sampling_temp "0.1" \
    --seed 37 \
    --batch_size 1
Expected output: ./test_output/ with two subdirectories:
- seqs/ – FASTA files with designed sequences
- backbones/ – PDB files (same backbone, new sequences)
Check the FASTA. Two sequences (you set --num_seq_per_target 2). Each annotated with a score (negative log probability) and sequence recovery (percent match to input if input had a sequence).
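Those header annotations are easy to pull out programmatically. A minimal parser, assuming the headers follow the usual ProteinMPNN key=value shape (verify against your own seqs/ output):

```python
import re

def parse_mpnn_header(header: str) -> dict:
    """Pull key=value pairs (score, seq_recovery, ...) out of a
    ProteinMPNN FASTA header line. Field names are whatever the
    header actually contains -- inspect your own output first."""
    return {k: float(v) for k, v in re.findall(r"(\w+)=([-\d.]+)", header)}

# Example header in the shape ProteinMPNN typically writes for sampled sequences:
hdr = ">T=0.1, sample=1, score=0.9243, seq_recovery=0.4694"
fields = parse_mpnn_header(hdr)
print(fields["score"], fields["seq_recovery"])
```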
Runtime on CPU: 5-10 seconds for this 147-residue protein. GPU: under 2 seconds.
Common Installation Errors
ImportError: No module named 'protein_mpnn_utils'
Running from outside the ProteinMPNN directory. The repo doesn’t install as a package – you run in-place. cd ProteinMPNN first.
RuntimeError: CUDA error: no kernel image is available for execution
PyTorch CUDA version doesn’t match your GPU’s compute capability. Reinstall PyTorch: go to pytorch.org, select your CUDA version + OS, use the command it generates. Don’t trust conda’s default channel.
Sequences generated but all identical
You set --sampling_temp "0.0". Temperature 0 is deterministic – same input, same output. Bump to 0.1-0.3 for diversity. Kuhlman Lab docs recommend 0.0-0.3 range (as of their current version). Higher than 0.3? Too-random sequences that won’t fold.
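A quick way to see the effect is to sweep the temperature with a fixed seed and compare the resulting FASTA files (using the repo’s example input):

```shell
# Same backbone, same seed -- only the sampling temperature changes
for t in 0.1 0.2 0.3; do
    python protein_mpnn_run.py \
        --pdb_path inputs/PDB_complexes/pdbs/3HTN.pdb \
        --out_folder ./sweep_T$t/ \
        --num_seq_per_target 5 \
        --sampling_temp "$t" \
        --seed 37
done
```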
Model checkpoint not found
--path_to_model_weights defaults to empty string → script looks in vanilla_model_weights/ relative to script location. Moved files? Specify full path: --path_to_model_weights /absolute/path/to/vanilla_model_weights/.
When to Migrate to LigandMPNN
Your design has any of these? Stop, install LigandMPNN instead:
- Small molecule ligands (cofactors, inhibitors, substrates)
- Metal ions (Zn, Fe, Mg coordination sites)
- DNA or RNA
- Any HETATM records that aren’t water
LigandMPNN: different repo (dauparas/LigandMPNN), different run.py, different model params. Not a drop-in replacement – you can’t swap weights. Installation is similar (Python + PyTorch + ProDy for PDB parsing), but the input flags differ (it reads ligand atoms from HETATM records automatically).
Neurosnap’s webserver says ProteinMPNN “should no longer be used” because LigandMPNN supersedes it. That’s overstated. For protein-only design, ProteinMPNN is faster and simpler. Once ligands are involved, LigandMPNN’s 63.3% vs 50.5% edge is real.
Uninstall / Cleanup
Remove the conda environment:
conda deactivate
conda env remove --name proteinmpnn
Delete the cloned repo:
rm -rf ProteinMPNN/
Model weights stored inside the repo directory. Deleting the folder removes everything.
What Model Noise Level Should I Use for De Novo Design?
v_48_020 (0.20Å) or v_48_030 (0.30Å). De novo backbones from RFdiffusion have small geometry errors crystal structures don’t. Higher training noise = tolerance to errors. Science paper Figure 2C: v_48_030 maxes out AlphaFold prediction success on designed sequences. v_48_002 has higher raw recovery but folds less reliably.
Can I Run ProteinMPNN Without a GPU?
Yes. CPU mode works – model is only 1.66M parameters. Skip CUDA install (use pip install torch without CUDA flags). ~0.6 sec per 100 residues on modern CPU vs ~0.1 sec on GPU. For research use (10-100 sequences)? CPU is viable.
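A CPU-only environment under those assumptions (the /whl/cpu index is PyTorch’s official CPU wheel channel):

```shell
conda create --name proteinmpnn_cpu python=3.10
conda activate proteinmpnn_cpu
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install numpy
```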
How Do I Design Only Part of a Protein?
Use helper_scripts/make_fixed_positions_dict.py to generate a JSONL specifying which residues stay unchanged. Pass via --fixed_positions_jsonl. Meiler Lab tutorial (available as PDF on their site) shows redesigning a protein-protein interface while fixing the rest. Script takes position list + chain ID.
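A sketch of the full fixed-positions workflow. Flag names follow the repo’s example scripts (examples/ and helper_scripts/); double-check them against your clone, since the helper scripts expect a parsed-chains JSONL as input rather than a raw PDB:

```shell
# 1. Parse PDB(s) into the JSONL format the helper scripts consume
python helper_scripts/parse_multiple_chains.py \
    --input_path ./my_pdbs/ --output_path parsed.jsonl

# 2. Freeze residues 1-10 on chain A (positions are space-separated)
python helper_scripts/make_fixed_positions_dict.py \
    --input_path parsed.jsonl --output_path fixed.jsonl \
    --chain_list "A" --position_list "1 2 3 4 5 6 7 8 9 10"

# 3. Redesign everything except the frozen positions
python protein_mpnn_run.py \
    --jsonl_path parsed.jsonl \
    --fixed_positions_jsonl fixed.jsonl \
    --out_folder ./partial_design/
```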
Now run your first design. Grab a PDB from inputs/, set --num_seq_per_target to 10, compare outputs. Check scores (lower = more confident) and sequence recovery. Feed one into AlphaFold2 – does it fold correctly?