
Text to 3D Open Source: Deploy TripoSR (2026 Guide)

Step-by-step guide to deploying TripoSR locally for text-to-3D pipelines, with fixes for the infamous torchmcubes build errors that block most installs.

8 min read · Intermediate

The #1 mistake people make with open-source text-to-3D isn’t choosing the wrong model – it’s assuming pip install -r requirements.txt will actually work. It usually doesn’t. The build silently falls back to a CPU-only torchmcubes, the GPU never engages, and your sub-second model now takes 30+ seconds per mesh. You won’t see an error. You’ll just think TripoSR is slow.

This guide deploys TripoSR the way it should be deployed: CUDA verification before the install, the text-to-3D chain wired up correctly, and the silent failure modes caught before they waste your afternoon.

What you’re actually deploying

TripoSR is an open-source codebase for fast feedforward single-image-to-3D reconstruction, developed by Tripo AI and Stability AI. The architecture is built on the LRM (Large Reconstruction Model) framework – the technical details are in arXiv:2403.02151 if you want to go deep. Released March 5, 2024 under the MIT license, weights and source code included – which matters if you’re shipping output into a game or product.

TripoSR is image-to-3D, not text-to-3D. Anyone marketing it as text-to-3D is bundling a T2I model in front of it. That’s a perfectly valid pipeline – just know what you’re deploying. A common two-stage stack is PeRFlow-T2I as stage one with TripoSR as stage two, though SDXL or Flux work just as well. We’ll set up both stages.

System requirements (read this before cloning)

The repo doesn’t publish hard minimums. The values below come from community deployments and the issue tracker as of early 2026 – treat them as practical floors, not guarantees.

Component | Minimum | Recommended
--- | --- | ---
OS | Linux (Ubuntu 22.04+) or Windows 10/11 | Linux – fewer build headaches
GPU | 6GB VRAM (CLI only) | 16GB+ if you use the Gradio UI
Python | 3.9 | 3.10
CUDA | 11.8 | Match your PyTorch build exactly
Build tools | g++ or MSVC with C++14 support | Required to compile torchmcubes
Disk | ~5GB | 10GB (includes T2I model cache)

The VRAM gap between CLI and Gradio is real and not documented in the README. A community report on issue #91 found run.py finishing at ~4GB VRAM while the Gradio app immediately consumed ~16GB on the same input (as of early 2026 – this may shift with future commits). If you have an 8GB card, the CLI is your only realistic path.
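
Want to verify those numbers on your own hardware? nvidia-smi can poll memory usage once a second while a job runs:

# Watch VRAM climb while run.py or the Gradio app is active
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1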

Download and clone

As of the time of writing, the repo has no tagged releases – you’re cloning whatever main looks like today. Pin a commit if reproducibility matters.

# Clone and enter
git clone https://github.com/VAST-AI-Research/TripoSR.git
cd TripoSR

# Pin to today's commit for reproducibility (optional but smart)
git rev-parse HEAD > .pinned-commit

# Create an isolated env - do NOT use your system Python
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
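
If you ever need to get back to exactly this state – say, after a git pull breaks something – check the pinned commit back out:

# Restore the commit recorded in .pinned-commit
git checkout "$(cat .pinned-commit)"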

Model weights download lazily on first run from huggingface.co/stabilityai/TripoSR. Don’t pre-download them manually – the loader uses HF cache paths and will re-download anyway.

Install – the CUDA-first order matters

This is where most installs go wrong. The README is explicit: install PyTorch first, and make sure the locally-installed CUDA major version matches the PyTorch-shipped CUDA major version. Skip the check and torchmcubes compiles in CPU-only mode. No error. Just mysteriously slow inference.

  1. Verify CUDA first: Run nvcc --version. Note the major version (11.x or 12.x).
  2. Install matching PyTorch: Get the exact command from pytorch.org/get-started/locally. For CUDA 11.8: pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
  3. Confirm GPU is visible: python -c "import torch; print(torch.cuda.is_available())" – must print True before continuing (see the combined check after this list).
  4. Update setuptools: pip install --upgrade setuptools (the docs require ≥49.6.0).
  5. Install requirements: pip install -r requirements.txt
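
Steps 1–3 can be spot-checked in one shot – pair nvcc’s report with the CUDA version PyTorch was built against (assumes nvcc is on your PATH):

# The two CUDA major versions printed here must agree (11 vs 12)
nvcc --version | grep release
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"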

Windows note: Install Visual Studio Build Tools with the “Desktop development with C++” workload before step 5. Without it, the torchmcubes wheel build dies with “Microsoft Visual C++ 14.0 or greater is required” (GitHub issue #48).

First-time verification

Run the included chair example – it confirms the model loads, CUDA is wired correctly, and mesh export works.

python run.py examples/chair.png --output-dir output/

Warning: "torchmcubes was not compiled with CUDA support, use CPU version instead" – stop here if you see it. The install technically worked, but every mesh extraction runs on CPU from this point. Fix: pip uninstall torchmcubes, confirm nvcc is on your PATH, then pip install git+https://github.com/tatsy/torchmcubes.git. This is documented in the official README troubleshooting section.

For the text-to-3D side: generate one image with SDXL or Flux, drop the PNG into examples/, run the same run.py command. Output lands in output/ as a .obj file.

Optional: Gradio web UI

python gradio_app.py --port 7860

Convenient, but see the VRAM numbers above. The CLI is the production path; Gradio is for exploration.

Wiring up the text-to-3D pipeline

Two stages: a T2I model turns the prompt into an image, then TripoSR turns the image into a mesh.

# stage1_t2i.py - text to image
from diffusers import StableDiffusionXLPipeline
import torch

# Load SDXL base in half precision to leave VRAM headroom for stage two
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# Prompt hygiene matters - isolated object, clean background (see below)
prompt = "a single wooden chair, white background, centered, product photo"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("examples/generated.png")

# stage 2: hand off to TripoSR
# python run.py examples/generated.png --output-dir output/
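
If you’d rather run both stages from one entry point, a small driver can shell out to run.py exactly as above – a sketch that reuses the file names from this guide, not something the repo ships:

# pipeline.py - prompt in, OBJ out (illustrative glue, not part of the repo)
import subprocess

import torch
from diffusers import StableDiffusionXLPipeline

def text_to_mesh(prompt: str,
                 image_path: str = "examples/generated.png",
                 output_dir: str = "output/") -> None:
    # Stage 1: text to image, same SDXL setup as stage1_t2i.py
    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
    ).to("cuda")
    pipe(prompt, num_inference_steps=25).images[0].save(image_path)

    # Release stage-1 VRAM before TripoSR loads in its own process
    del pipe
    torch.cuda.empty_cache()

    # Stage 2: image to mesh through the repo's CLI
    subprocess.run(["python", "run.py", image_path, "--output-dir", output_dir],
                   check=True)

if __name__ == "__main__":
    text_to_mesh("a single wooden chair, white background, centered, product photo")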

Prompt hygiene is the difference between a clean mesh and garbage geometry. TripoSR was trained on isolated objects, not scenes – so force “white background, centered, single object” into every T2I prompt. The Gradio app defaults to removing the background automatically; if your input is already a clean RGBA PNG with the subject centered and filling at least 70% of the frame, disable that step to avoid artifacts.
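
Not sure whether an input already counts as “clean”? The 70% rule is easy to gauge with Pillow – an illustrative helper, not part of the repo:

# check_input.py - rough gauge of subject coverage in an RGBA PNG (illustrative)
from PIL import Image
import numpy as np

img = Image.open("examples/generated.png")
if img.mode == "RGBA":
    # Fraction of non-transparent pixels approximates how much of the frame the subject fills
    coverage = (np.array(img.getchannel("A")) > 0).mean()
    print(f"subject fills {coverage:.0%} of the frame")
    # At >= 70% on a centered, clean RGBA, consider disabling background removal
else:
    print("no alpha channel - let TripoSR's preprocessing remove the background")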

Common errors and the actual fixes

Four root causes cover almost everything. The error messages are misleading; the fixes are mechanical.

  • CMake can’t find CUDA libraries during torchmcubes build – PyTorch is a CUDA build but CMake can’t locate the CUDA Toolkit. Fix: install the matching CUDA Toolkit (not just the driver), set CUDA_HOME, then reinstall torchmcubes (see the snippet after this list).
  • “NO CUDA INSTALLATION FOUND, INSTALLING CPU VERSION ONLY!” in the torchmcubes build log – CMake found no CUDA compiler and built CPU-only. Fix: install nvcc (via CUDA Toolkit), confirm which nvcc resolves, reinstall torchmcubes from source.
  • “module ‘torchmcubes_module’ has no attribute ‘mcubes_cuda’” at inference – same root cause as above; the CPU-only build is missing the CUDA symbol. Same fix.
  • Mac Apple Silicon – there is no CUDA path on macOS. CPU execution is expected behavior. Budget roughly 30 seconds per mesh.
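
The rebuild itself is identical in all three CUDA cases – a sketch assuming the default Linux toolkit path (adjust CUDA_HOME to wherever your toolkit lives):

# Point the build at the CUDA Toolkit, then force a fresh source build
export CUDA_HOME=/usr/local/cuda   # assumption: default Linux install location
pip uninstall -y torchmcubes
pip install --no-cache-dir git+https://github.com/tatsy/torchmcubes.git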

Why it’s fast – and why that matters for quality

TripoSR benchmarks at under 0.5 seconds on an A100 because it’s feedforward. No diffusion loop in the 3D stage. No iterative refinement. No score distillation. One forward pass through a transformer, geometry out. That speed is the whole point of the LRM architecture.

The tradeoff: there’s no iteration to fix mistakes. For background props in a game, that speed-quality balance is excellent. For hero assets, you’ll still want manual cleanup in Blender. The model isn’t pretending to be something it’s not.

Tuning, upgrades, and uninstall

Marching-cubes resolution is the one parameter worth adjusting – run.py exposes it as --mc-resolution, defaulting to 256. Drop to 128 and the mesh gets noticeably noisy – but it’ll fit tighter VRAM budgets. Push above 256 and memory cost climbs; community deployments suggest 320 works on most 12GB cards, though your results will depend on the specific input. There are no official benchmarks for values above 256.
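
For example, to trade extra VRAM for detail on a 12GB card:

# Higher marching-cubes resolution - finer mesh, more memory
python run.py examples/chair.png --output-dir output/ --mc-resolution 320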

Upgrading means pulling main – there are no semver releases to track. Pin commits if you ship to production:

# Upgrade
cd TripoSR && git pull
pip install -r requirements.txt --upgrade

# Uninstall - TripoSR has no installer, just delete the repo and venv
deactivate
rm -rf TripoSR/ # cached weights live in ~/.cache/huggingface
rm -rf ~/.cache/huggingface/hub/models--stabilityai--TripoSR

That HuggingFace cache directory is the only hidden leftover. Without clearing it, you keep several GB of model weights on disk after deleting the repo.
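
Not sure what else is hiding in that cache? huggingface_hub ships a scanner – assuming huggingface-cli made it into your env along with the rest of the stack:

# List every cached model and its size before deleting anything
huggingface-cli scan-cache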

FAQ

Does TripoSR really do text-to-3D, or just image-to-3D?

Just image-to-3D. Any “text-to-3D” marketing around it means a T2I model has been bolted on in front – a perfectly fine pipeline, as long as you know which half does what.

Why does my Gradio app OOM on a 12GB GPU when the CLI works fine?

Same model, different overhead. Gradio holds the model plus the UI’s preview pipeline – it renders a textured preview, runs background removal, and keeps tensors live across requests. A community thread on the repo (issue #91, as of early 2026) found Gradio sitting at ~16GB VRAM while run.py finished the same job at ~4GB. If you’re VRAM-constrained, use the CLI and view the OBJ in any external 3D viewer. The outputs are identical.

Can I use TripoSR commercially without paying anything?

Yes – MIT license covers commercial use, modification, and redistribution. The catch is whatever T2I model you chain in front of it. SDXL is permissive; some newer checkpoints aren’t. Check the front half of your pipeline, not the back.

Next: Drop one prompt through your pipeline today – generate an image with SDXL, pass it through run.py, and open the resulting OBJ in Blender. The first end-to-end run takes about 10 minutes to set up. Every mesh after that takes seconds – sub-second if you’re on an A100.