The #1 mistake people make when deploying Meta's segmentation model SAM 2 is panicking at the "Failed to build the SAM 2 CUDA extension" message and starting over with a fresh conda environment. Don't. The error looks fatal but isn't, and chasing it down the wrong rabbit hole has cost people entire evenings. Plan the install around that fact and the whole process gets shorter.
SAM 2.1 is Meta's current checkpoint family for promptable image and video segmentation. The improved SAM 2.1 checkpoints were released on September 30, 2024 as part of the SAM 2.1 Developer Suite, alongside training code and a deployable web demo. A December 2024 update added full model compilation via vos_optimized=True in build_sam2_video_predictor, which gives a major speedup for VOS (video object segmentation) inference. This guide walks through the actual install: the commands, the traps, and the verification step that proves your GPU path works.
System requirements (read this before touching pip)
The official spec is narrow and unforgiving. As of early 2025, you need Linux with Python ≥ 3.10, PyTorch ≥ 2.5.1, and a torchvision that matches the PyTorch installation. Install them together using the instructions at pytorch.org so the versions align. The CUDA toolkit must match the CUDA version of your PyTorch build. That is typically CUDA 12.1 if you follow the default install command from the official INSTALL.md.
| Component | Minimum | Recommended |
|---|---|---|
| OS | Linux or WSL2 Ubuntu | Ubuntu 22.04 |
| Python | 3.10 | 3.10 or 3.11 |
| PyTorch | 2.5.1 | 2.5.1+ matching CUDA 12.1 |
| GPU VRAM | ~6 GB (tiny model) | ~12 GB+ for hiera_large on video |
| Disk | ~5 GB for code + all four checkpoints | ~10 GB if you keep notebooks and SA-V samples |
| Build tools | nvcc, gcc | CUDA toolkit matching torch's CUDA build |

VRAM and disk figures are rough estimates.
On Windows, the official repo strongly recommends using Windows Subsystem for Linux (WSL) with Ubuntu – native Windows installs hit the CUDA compiler wall almost immediately. CPU-only inference works but is impractically slow for anything beyond a single image test.
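Before running any installs, you can check these requirements from a script. The sketch below uses hypothetical helpers; check_environment and its siblings are not part of SAM 2. It does three things:

- compares Python and PyTorch against the minimums above;
- checks that PyTorch is not a CPU-only build;
- compares the CUDA version PyTorch was built with (torch.version.cuda) against the toolkit reported by nvcc --version.

It assumes nvcc is on your PATH and compares only the major.minor part of the CUDA version.

```python
import platform
import re
import subprocess
import sys

MIN_PYTHON = (3, 10)
MIN_TORCH = (2, 5, 1)


def parse_version(text):
    """'2.5.1+cu121' -> (2, 5, 1); '12.1' -> (12, 1). Local build tags after '+' are dropped."""
    return tuple(int(part) for part in re.findall(r"\d+", text.split("+")[0]))


def nvcc_cuda_version(nvcc_output):
    """Pull (major, minor) out of `nvcc --version` text such as 'release 12.1, V12.1.105'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output or "")
    return (int(match.group(1)), int(match.group(2))) if match else None


def check_environment(torch_version, torch_cuda, nvcc_output, python_version=None):
    """Return a list of human-readable problems; an empty list means every check passed."""
    python_version = tuple(python_version or sys.version_info[:2])
    problems = []
    if python_version < MIN_PYTHON:
        problems.append(f"Python {python_version[0]}.{python_version[1]} is older than 3.10")
    if torch_version is None:
        problems.append("PyTorch is not installed")
    elif parse_version(torch_version)[:3] < MIN_TORCH:
        problems.append(f"PyTorch {torch_version} is older than 2.5.1")
    if torch_version is not None and torch_cuda is None:
        problems.append("this PyTorch build has no CUDA support (CPU-only wheel)")
    elif torch_cuda is not None:
        toolkit = nvcc_cuda_version(nvcc_output)
        if toolkit is None:
            problems.append("nvcc not found or unreadable: the optional CUDA extension will not build")
        elif toolkit != parse_version(torch_cuda)[:2]:
            problems.append(
                f"nvcc reports CUDA {toolkit[0]}.{toolkit[1]}, but PyTorch was built for CUDA {torch_cuda}"
            )
    return problems


if __name__ == "__main__":
    try:
        import torch
        torch_version, torch_cuda = str(torch.__version__), torch.version.cuda
    except ImportError:
        torch_version, torch_cuda = None, None
    try:
        nvcc = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
    except OSError:  # nvcc is not on PATH
        nvcc = None
    print("Platform:", platform.platform())
    for problem in check_environment(torch_version, torch_cuda, nvcc) or ["all checks passed"]:
        print("-", problem)
```

A missing nvcc is reported but not treated as fatal, in keeping with the point above: the CUDA extension is optional.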
Get the source from the right repo
There are a dozen forks and mirrors floating around GitHub. The canonical one is facebookresearch/sam2. The older segment-anything-2 URL still resolves but the maintained code lives at the new path.
```bash
# Create a clean conda env first - mixing pip+conda PyTorch is the #2 pain point
conda create -n sam2 python=3.10 -y
conda activate sam2

# Install PyTorch matching your CUDA (example: CUDA 12.1)
pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu121

# Clone and install
git clone https://github.com/facebookresearch/sam2.git
cd sam2
pip install -e ".[notebooks]"
```
The [notebooks] extra pulls jupyter and matplotlib, which you’ll want for the verification step below. Skip it if you’re deploying headless.
Download the checkpoints
The repo ships with a shell script that grabs all four model sizes (tiny, small, base_plus, large). It’s the path of least resistance:
```bash
cd checkpoints && ./download_ckpts.sh && cd ..
```
Approximate sizes range from ~150 MB for the tiny variant up to ~900 MB for large (community-reported; check the repo release notes for current weights). If you’re on a metered connection or only need one variant, open the script and copy just the URL for sam2.1_hiera_small.pt – it’s the best speed/quality trade-off for most deployment scenarios.
Pro tip: Always grab the `sam2.1_*` checkpoints, not the original `sam2_*` ones. The 2.1 weights ship with better handling of occlusions and visually similar objects, and the code paths in the current repo are optimized for them. The old checkpoints still work, but you're leaving accuracy on the table.
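To confirm which weights actually landed on disk, a short hypothetical helper can sort 2.1 checkpoints from legacy ones by filename prefix. It assumes the default checkpoints/ directory and the repo's sam2.1_* / sam2_* naming.

```python
from pathlib import Path


def inventory_checkpoints(ckpt_dir="checkpoints"):
    """Return (sam2.1 checkpoints, legacy sam2 checkpoints) found in ckpt_dir, sorted by name."""
    names = sorted(p.name for p in Path(ckpt_dir).glob("*.pt"))
    current = [n for n in names if n.startswith("sam2.1_")]
    legacy = [n for n in names if n.startswith("sam2_")]
    return current, legacy


if __name__ == "__main__":
    ckpt_dir = Path("checkpoints")
    current, legacy = inventory_checkpoints(ckpt_dir)
    for name in current:
        size_mb = (ckpt_dir / name).stat().st_size / 1e6
        print(f"ok      {name} ({size_mb:.0f} MB)")
    for name in legacy:
        print(f"legacy  {name}: use the matching sam2.1_* checkpoint instead")
    if not current:
        print("no sam2.1_* checkpoints found; run ./download_ckpts.sh from checkpoints/")
```

Run it from the repo root. Printing the file sizes also catches truncated downloads, which show up as a checkpoint far smaller than its siblings of similar size.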
First-time configuration and a minimum viable test
SAM 2 doesn’t need a config file in the traditional sense – model configs are YAMLs bundled inside the package (configs/sam2.1/) and you reference them by name. The minimum viable script to prove your install works:
```python
import torch
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

checkpoint = "./checkpoints/sam2.1_hiera_small.pt"
model_cfg = "configs/sam2.1/sam2.1_hiera_s.yaml"

sam2 = build_sam2(model_cfg, checkpoint, device="cuda")
predictor = SAM2ImagePredictor(sam2)
print("SAM 2 loaded:", sum(p.numel() for p in sam2.parameters()) / 1e6, "M params")
```
If that prints a parameter count without a stack trace, your CUDA path is wired correctly and the model is live in VRAM. If it prints but you saw the CUDA extension warning during pip install, you’re in the gray zone the next section explains.
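Every model size follows the same naming pattern, so a small lookup table keeps checkpoint and config paired when you switch variants. The names below follow the repo's checkpoints and sam2/configs/sam2.1/ directory at the time of writing. resolve_variant and load_image_predictor are hypothetical helpers. The sam2 imports happen inside the loader, so you can use and test the table without a GPU.

```python
# Checkpoint file and config name per SAM 2.1 size (names as used in the sam2 repo at time of writing).
SAM21_VARIANTS = {
    "tiny": ("sam2.1_hiera_tiny.pt", "configs/sam2.1/sam2.1_hiera_t.yaml"),
    "small": ("sam2.1_hiera_small.pt", "configs/sam2.1/sam2.1_hiera_s.yaml"),
    "base_plus": ("sam2.1_hiera_base_plus.pt", "configs/sam2.1/sam2.1_hiera_b+.yaml"),
    "large": ("sam2.1_hiera_large.pt", "configs/sam2.1/sam2.1_hiera_l.yaml"),
}


def resolve_variant(name, ckpt_dir="./checkpoints"):
    """Return (checkpoint path, config name) for a variant, failing loudly on typos."""
    try:
        ckpt, cfg = SAM21_VARIANTS[name]
    except KeyError:
        raise ValueError(f"unknown variant {name!r}; choose from {sorted(SAM21_VARIANTS)}") from None
    return f"{ckpt_dir.rstrip('/')}/{ckpt}", cfg


def load_image_predictor(name="small", device="cuda"):
    """Build an image predictor for the named variant (imports sam2 lazily)."""
    from sam2.build_sam import build_sam2
    from sam2.sam2_image_predictor import SAM2ImagePredictor

    checkpoint, model_cfg = resolve_variant(name)
    return SAM2ImagePredictor(build_sam2(model_cfg, checkpoint, device=device))
```

Swapping "small" for "tiny" or "large" then changes both files together. Mismatching a checkpoint with another size's config is an easy way to trigger state_dict loading errors.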
Common install errors – what they actually mean
This is where most tutorials wave their hands. Here’s what the three most common failures really do to your deployment.
1. Failed to build the SAM 2 CUDA extension – looks scary, almost never fatal. According to the official INSTALL.md: installation proceeds even if the CUDA extension fails to build. You can still use SAM 2 for both image and video applications. The post-processing step – removing small holes and sprinkles in output masks – will be skipped, but this won’t affect results in most cases. Translation: ship it. If your masks have tiny holes you care about, then come back and fix it.
To skip the CUDA extension build entirely and stop pretending it matters:
```bash
SAM2_BUILD_CUDA=0 pip install -e ".[notebooks]"
```
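To find out whether the extension actually built and loads in your environment, you can try importing it directly. This sketch assumes the compiled module is named sam2._C, as declared in the repo's setup.py at the time of writing; check setup.py in your checkout if the name differs. Per INSTALL.md, if the extension is unavailable, SAM 2 should still run and simply skip the hole-and-sprinkle post-processing.

```python
import importlib


def cuda_extension_status(module="sam2._C"):
    """Try to import the compiled extension; return (loaded, detail)."""
    try:
        import torch  # noqa: F401  # load libtorch first, since the extension links against it
    except ImportError:
        pass
    try:
        importlib.import_module(module)
    except ImportError as exc:
        # Covers both "never built" (module missing) and "built but fails to load" (e.g. torch mismatch).
        return False, f"{type(exc).__name__}: {exc}"
    return True, "loaded"


if __name__ == "__main__":
    loaded, detail = cuda_extension_status()
    if loaded:
        print("CUDA extension loaded: mask post-processing is available")
    else:
        print("CUDA extension unavailable, post-processing will be skipped:", detail)
```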
2. RuntimeError: Error(s) in loading state_dict for SAM2Base – you have stale code. Per INSTALL.md: this likely means you installed a previous version of the repo, which doesn’t have the new modules to support SAM 2.1 checkpoints. Pull the latest code, run pip uninstall -y SAM-2 to remove any previous installation, then reinstall using pip install -e ".[notebooks]". A git pull alone is not enough – the package metadata sticks around.
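A quick way to spot the stale-code case is to check the sam2 package Python actually imports for a configs/sam2.1/ directory, where the 2.1 model configs live. This is a hypothetical diagnostic. It assumes pre-2.1 installs lack that directory, which is worth verifying against your own checkout. It uses importlib.util.find_spec, so it locates sam2 without importing it.

```python
import importlib.util
from pathlib import Path


def has_sam21_configs(package_dir):
    """True if package_dir/configs/sam2.1/ contains at least one YAML config."""
    return any((Path(package_dir) / "configs" / "sam2.1").glob("*.yaml"))


def installed_sam2_dir():
    """Directory of the sam2 package Python would import, or None if it cannot be found."""
    spec = importlib.util.find_spec("sam2")
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent


if __name__ == "__main__":
    pkg = installed_sam2_dir()
    if pkg is None:
        print("sam2 is not importable from this environment")
    elif has_sam21_configs(pkg):
        print(f"sam2 at {pkg} ships SAM 2.1 configs")
    else:
        print(f"sam2 at {pkg} has no configs/sam2.1/: likely a stale pre-2.1 install")
```

The printed path also tells you where the imported package lives, which should be the sam2/ directory inside your fresh clone.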
3. Version-mismatch crashes that only show up at runtime are usually caused by mixed package managers. The SAM 2 library is compiled against one PyTorch version but links against another at runtime. The classic case: conda installed torch 2.4, pip later installed 2.5.1 on top, both copies stay in the environment, and the wrong one gets picked up. INSTALL.md flags this explicitly: delete the duplicates so that only one PyTorch and one CUDA version remain on the path.
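To see whether you're in this situation, a small sketch can count how many installed distributions claim to be torch or torchvision, then print which torch file actually gets imported. It uses only the standard library's importlib.metadata. It catches only copies that left Python package metadata behind, so comparing the output of conda list and pip list is still worth doing.

```python
from collections import Counter
from importlib import metadata


def duplicate_distributions(names, watch=("torch", "torchvision")):
    """Return the watched package names that appear more than once in `names`."""
    counts = Counter(n.lower().replace("_", "-") for n in names)
    return sorted(n for n in watch if counts[n] > 1)


def installed_distribution_names():
    """Names of every distribution visible to this interpreter (skips entries with no metadata)."""
    names = []
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:
            names.append(name)
    return names


if __name__ == "__main__":
    dupes = duplicate_distributions(installed_distribution_names())
    print("duplicate distributions:", dupes or "none found")
    try:
        import torch
        print("torch", torch.__version__, "is imported from", torch.__file__)
    except ImportError:
        print("torch is not importable")
```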
The community-reported escape hatch when none of this works (as documented in GitHub issues #22 and #14): downgrading to PyTorch 2.1.0 by changing the restriction from torch>=2.5.1 to torch==2.1.0 in both pyproject.toml and setup.py. You lose torch.compile speedups but you get a working model. Note: this workaround may become unnecessary as the repo matures – check open issues before trying it.
Upgrade from SAM 2.0 or uninstall
Upgrading from the original SAM 2.0 release to SAM 2.1 is not a pip upgrade – it’s a clean reinstall. The order matters:
1. `pip uninstall -y SAM-2`
2. `git pull` (or re-clone if you had the old `segment-anything-2` URL)
3. `pip install -e ".[notebooks]"`
4. Re-run `./download_ckpts.sh` to pull the 2.1 weights
For a full uninstall: pip uninstall SAM-2, then delete the cloned sam2/ directory (including checkpoints/ – those are the bulk of the disk usage). If you used a dedicated conda env, just conda remove -n sam2 --all and you’re done.
Verify the deployment under realistic load
The MVP test loaded weights into VRAM. That doesn’t prove the inference path works end-to-end. A real verify step:
```python
# Continues from the loading script above (reuses `predictor`)
import numpy as np
from PIL import Image

# Load any image - a screenshot works
img = np.array(Image.open("test.jpg").convert("RGB"))
predictor.set_image(img)

# Prompt with a single click at the image center (label 1 = foreground point)
h, w = img.shape[:2]
masks, scores, _ = predictor.predict(
    point_coords=np.array([[w // 2, h // 2]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
print("Masks:", masks.shape, "Top score:", scores.max())
```
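You should see a (3, H, W) mask array, with three candidates because multimask_output=True. The top score should be close to 1.0 when the click lands on a clear, well-defined object. The scores are the model's own predicted IoU for each mask, not calibrated probabilities. If set_image is noticeably slow on a small image, the model is probably running on CPU. Check that torch.cuda.is_available() returns True and that the model was built with device="cuda".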
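To make the check say more than "it ran", a small helper can pick the highest-scoring mask and report how much of the image it covers. summarize_masks is a hypothetical helper that operates on the masks and scores arrays returned by predict() in the snippet above. On an ordinary photo, a coverage near 0 or near 1 deserves a second look: it suggests the click missed the object or the prediction degenerated.

```python
import numpy as np


def summarize_masks(masks, scores):
    """Return (index, score, coverage) for the highest-scoring mask.

    coverage is the fraction of image pixels inside that mask (0.0 to 1.0).
    """
    masks = np.asarray(masks)
    scores = np.asarray(scores)
    if masks.ndim != 3 or masks.shape[0] != scores.shape[0]:
        raise ValueError(f"expected (N, H, W) masks and N scores, got {masks.shape} and {scores.shape}")
    best = int(np.argmax(scores))
    coverage = float(masks[best].astype(bool).mean())
    return best, float(scores[best]), coverage
```

Typical use after the verify step: best, score, coverage = summarize_masks(masks, scores), then log all three values next to the image name.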
Worth pausing on what “working” means here. The model loads, predicts, and returns masks – but the per-frame video predictor (build_sam2_video_predictor) uses a different code path with its own memory bank initialization. If your deployment target is video, the image test above is necessary but not sufficient. Run a 30-frame clip through the video predictor before you call the deployment done.
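A 30-frame smoke test could look like the sketch below. It is built on the video predictor calls shown in the repo's README and example notebook: init_state, add_new_points_or_box, and propagate_in_video. Some assumptions to flag:

- The state keys video_height, video_width, and num_frames are read from the predictor's internal state dict. They are an implementation detail that may change.
- frames_dir is assumed to be a directory of JPEG frames, as in the example notebook.
- An empty frame isn't necessarily a bug if the object leaves the shot. On a clip where the object stays visible, though, it's a red flag.

```python
import numpy as np


def empty_frames(mask_pixels, expected_frames):
    """Return frame indices that are missing from mask_pixels or have an empty mask."""
    return [i for i in range(expected_frames) if mask_pixels.get(i, 0) == 0]


def video_smoke_test(model_cfg, checkpoint, frames_dir, num_frames=30):
    """Click the center of frame 0, propagate, and return frames where the object has no mask."""
    # Imported lazily so empty_frames() works on machines without torch or sam2.
    import torch
    from sam2.build_sam import build_sam2_video_predictor

    predictor = build_sam2_video_predictor(model_cfg, checkpoint, device="cuda")
    with torch.inference_mode():
        state = predictor.init_state(video_path=frames_dir)
        h, w = state["video_height"], state["video_width"]
        predictor.add_new_points_or_box(
            inference_state=state,
            frame_idx=0,
            obj_id=1,
            points=np.array([[w // 2, h // 2]], dtype=np.float32),
            labels=np.array([1], dtype=np.int32),
        )
        mask_pixels = {}
        for frame_idx, _obj_ids, mask_logits in predictor.propagate_in_video(state):
            # One logit map per tracked object; values > 0 are foreground pixels.
            mask_pixels[frame_idx] = int((mask_logits[0] > 0.0).sum().item())
            if len(mask_pixels) >= num_frames:
                break
    return empty_frames(mask_pixels, min(num_frames, state["num_frames"]))
```

An empty list back from video_smoke_test means every frame in the window had a non-empty mask for the clicked object.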
And here’s the question worth sitting with before you go further: how clean do your masks actually need to be? The CUDA extension only affects the small-holes-and-sprinkles post-processing step. For object detection pipelines and rough crops, that barely matters. For medical imaging or pixel-perfect compositing, it matters a lot. That single decision – whether to invest time fixing the extension build or ship without it – probably determines more of your deployment timeline than any other choice in this guide. The SAM 2 paper (arXiv:2408.00714) and the Meta product page both focus on benchmark accuracy, but your use case is the real benchmark.
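To put a number on "how clean", you can count small disconnected blobs in the masks your pipeline actually produces. The sketch below is a plain-Python stand-in, not the repo's CUDA post-processing. It counts 4-connected foreground components of at most max_area pixels (sprinkles). Running it on the inverted mask (np.logical_not(mask)) counts small background regions, which approximates holes. A small background region touching the image border would also be counted. It is slow on large images; if SciPy is available, scipy.ndimage.label is the faster route.

```python
from collections import deque

import numpy as np


def small_components(mask, max_area):
    """Count 4-connected True regions in a 2-D mask whose area is <= max_area pixels."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    h, w = mask.shape
    small = 0
    for y in range(h):
        for x in range(w):
            if not mask[y, x] or seen[y, x]:
                continue
            # Breadth-first flood fill of one component, measuring its area.
            area = 0
            queue = deque([(y, x)])
            seen[y, x] = True
            while queue:
                cy, cx = queue.popleft()
                area += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if area <= max_area:
                small += 1
    return small
```

If a sample of your real masks shows zero or near-zero counts at the blob sizes you care about, skipping the extension build is an easy call.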
FAQ
Do I need to read the SAM 2 paper before deploying?
No. The streaming-memory architecture details only matter if you’re fine-tuning or comparing against baselines. Deploying the model off the shelf? Skip it.
What if I’m deploying on an air-gapped server with no GitHub access?
Mirror everything locally first. Clone the repo on a connected machine, run ./download_ckpts.sh to pull all four checkpoints into checkpoints/, then tar the entire sam2/ directory and copy it across. The editable install reads the SAM 2 code from the local path, but pip still needs every dependency available offline. Pre-download the torch wheels (pip download torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu121 -d ./wheels), plus wheels for SAM 2's other dependencies as listed in setup.py. Copy the wheels directory across and install from it on the target machine with pip install --no-index --find-links ./wheels.
Can I run SAM 2.1 in production behind an API?
Yes: the model and checkpoints are licensed under Apache 2.0, so commercial use is fine. The pattern most teams settle on is FastAPI with a single SAM2ImagePredictor instance held in memory, plus a request queue to serialize GPU access (the predictor isn't thread-safe). Don't recreate the predictor per request. Loading the model takes seconds rather than milliseconds (the exact time depends on your hardware), so reloading on every call would quickly become your bottleneck. For video workloads, consider the December 2024 vos_optimized=True flag, which enables full model compilation for VOS inference.
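The single-instance pattern can be sketched without any web framework. SerializedPredictor is a hypothetical wrapper, not part of SAM 2, and a threading.Lock stands in for the request queue. The key detail is that set_image and predict run under the same lock, because the predictor keeps the current image's features as internal state between the two calls.

```python
import threading


class SerializedPredictor:
    """Share one loaded predictor across request threads, one request at a time.

    `predictor` is any object with set_image(image) and predict(**kwargs), such as a
    SAM2ImagePredictor built once at startup.
    """

    def __init__(self, predictor):
        self._predictor = predictor
        self._lock = threading.Lock()

    def segment(self, image, **predict_kwargs):
        # set_image stores per-image state on the predictor, so set_image + predict must be
        # one critical section; otherwise another request could swap the image in between.
        with self._lock:
            self._predictor.set_image(image)
            return self._predictor.predict(**predict_kwargs)
```

In a FastAPI app, you would build this once at startup and call segment() from a plain def endpoint. FastAPI runs such endpoints in its thread pool, so the blocking GPU call doesn't stall the event loop.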
Next: open notebooks/video_predictor_example.ipynb from the repo and run it on a 5-second clip from your actual use case. That’s the fastest way to learn whether SAM 2.1’s video memory behavior fits your domain before you wire it into anything.