Install Meta AudioCraft v1.3.0: Working Setup Guide

Deploy Meta AudioCraft v1.3.0 locally with MusicGen, AudioGen, and EnCodec: install commands, GPU specs, and fixes for the xformers errors that break most setups.

Jordan West · 2026-06-05 · 8 min read · Intermediate

Most AudioCraft install failures aren’t about AudioCraft. They’re about a two-year-old PyTorch pin colliding with whatever Python and CUDA combo you happen to have on your machine. If you’ve been bouncing between Stack Overflow tabs trying to get xformers to compile, this guide is for you.

Meta AudioCraft is the PyTorch library behind MusicGen, AudioGen, and EnCodec, the trio Meta released to handle text-to-music, text-to-sound, and neural audio compression. According to Meta’s official resource page, MusicGen is trained on Meta-owned and licensed music, AudioGen on public sound effects, and EnCodec is a real-time, high-fidelity neural audio codec. The catch: Meta doesn’t offer a hosted API for it, so you have to deploy it yourself.

What you’re actually installing (v1.3.0)

The current stable release is v1.3.0, which dropped in May 2024. As of June 2026, no newer stable release has shipped – verify against the GitHub releases page before you start in case that’s changed.

Licensing is the part nobody reads but should. Per the official repo: code is MIT, model weights are CC-BY-NC 4.0. Translation: you can use and modify the code commercially, but you cannot sell music generated with the released MusicGen weights. If you’re building a product, that’s a real constraint.

System requirements (as of v1.3.0, May 2024)

The official requirements look modest until you read the fine print buried in the MAGNET docs.

Component	Minimum	Recommended
Python	3.9	3.9 (3.10 reported to work; 3.11+ inconsistent – see the FAQ below)
PyTorch	2.1.0 (exact pin)	2.1.0 with CUDA 11.8
GPU	None (CPU works, slowly)	NVIDIA, 16 GB VRAM for medium model
RAM	16 GB	32 GB
Disk	~10 GB estimate for small models	30+ GB estimate if pulling large/3.3B
Extras	ffmpeg, git	conda or venv for isolation

That 16 GB VRAM figure is the one that bites people. MAGNET.md in the official repo states that AudioCraft requires a GPU with at least 16 GB of memory for inference with the medium-sized models (~1.5B parameters), but that line lives in the MAGNET doc, not the main README. If you have an 8 GB card, stick to the 300M small model. Per the MusicGen model card, the sizes are 300M, 1.5B, and 3.3B, and the 3.3B large model is likely to run out of memory on anything under 24 GB.
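To see why the size tiers land where they do, a rough calculation helps: parameter storage is about parameter count × bytes per parameter. The sketch below applies that to the three model-card sizes, assuming 4 bytes per parameter for float32 and 2 for float16. It counts weights only; activations, the text encoder, EnCodec and CUDA overhead all come on top, so treat the numbers as a floor rather than a prediction of peak usage.

```python
# Back-of-envelope parameter storage for the MusicGen sizes on the model card.
# Weights only: activations, the text encoder, EnCodec and CUDA overhead are
# NOT included, so real VRAM use is higher than these numbers.

MUSICGEN_PARAMS = {
    "small": 300e6,
    "medium": 1.5e9,
    "large": 3.3e9,
}

BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_gb(num_params: float, dtype: str = "float32") -> float:
    """Parameter storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, params in MUSICGEN_PARAMS.items():
        fp32 = weight_gb(params, "float32")
        fp16 = weight_gb(params, "float16")
        print(f"{name:>6}: {fp32:5.1f} GB fp32, {fp16:5.1f} GB fp16 (weights only)")
```

For the 3.3B model that works out to about 13.2 GB of float32 weights before anything else is allocated, which is consistent with keeping it off cards below 24 GB.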

The install that actually works

Here’s the sequence. Order matters more than the commands themselves.

# 1. Isolated environment - don't skip this
conda create -n audiocraft python=3.9
conda activate audiocraft

# 2. PyTorch FIRST. This is non-negotiable.
python -m pip install 'torch==2.1.0'

# 3. Build prerequisites that pip's isolation hides
python -m pip install setuptools wheel

# 4. AudioCraft itself
python -m pip install -U audiocraft

# 5. ffmpeg via system or conda
conda install "ffmpeg<5" -c conda-forge

The reason for the ordering: the PyPI page lists Python 3.9 and PyTorch 2.1.0 as requirements, and the install instructions say to make sure torch is installed first, particularly before xformers. On some platforms xformers is built from source, and if pip can’t see torch during that build, you get a confusing “No module named torch” error even though torch is installed.
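Because the failure mode is ordering, a pre-flight check before step 4 can save you a confusing build log. This is a minimal sketch, not part of AudioCraft: it uses only the standard library (importlib.metadata) to read the installed torch version without importing torch, and the 3.9 / 2.1.0 targets are the PyPI requirements quoted above. The function names are hypothetical.

```python
import sys
from importlib import metadata

REQUIRED_PYTHON = (3, 9)
REQUIRED_TORCH = "2.1.0"


def installed_version(dist_name):
    """Installed version string for a distribution, or None if it's missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def preflight_problems(py_version, torch_version):
    """List problems that would trip up `pip install audiocraft`; empty means go.

    py_version: (major, minor) tuple.
    torch_version: e.g. "2.1.0+cu118", or None if torch isn't installed.
    """
    problems = []
    if tuple(py_version[:2]) != REQUIRED_PYTHON:
        problems.append(
            f"Python {py_version[0]}.{py_version[1]} detected; AudioCraft 1.3.0 targets 3.9"
        )
    if torch_version is None:
        problems.append("torch is not installed; install torch==2.1.0 first")
    elif torch_version.split("+")[0] != REQUIRED_TORCH:
        # Ignore the local tag ("+cu118", "+cpu"); only the release number is checked.
        problems.append(f"torch {torch_version} detected; AudioCraft 1.3.0 pins {REQUIRED_TORCH}")
    return problems


if __name__ == "__main__":
    issues = preflight_problems(sys.version_info[:2], installed_version("torch"))
    print("\n".join(f"- {issue}" for issue in issues) or "OK: safe to install audiocraft")
```

Run it inside the activated environment; an empty problem list means the remaining install steps should see the torch they expect.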

If you want the bleeding-edge version with whatever just landed in main:

python -m pip install -U git+https://git@github.com/facebookresearch/audiocraft#egg=audiocraft

Or clone the repo and install it in editable mode (python -m pip install -e . from the repo root) if you plan to train your own models; that’s the only case where an editable install is actually mandatory. The official repo has the full clone-and-install flow.

First run and verification

Don’t bother with the Gradio demo for verification – it downloads a model first and you won’t know if the failure is install-related or download-related. Run this instead:

python -c "import audiocraft; print(audiocraft.__version__)"

If that prints 1.3.0, the package is in. Now a real generation test:

python -c "
from audiocraft.models import MusicGen
model = MusicGen.get_pretrained('facebook/musicgen-small')
model.set_generation_params(duration=5)
wav = model.generate(['ambient piano with rain'])
print('shape:', wav.shape)
"

First run will download several GB of weights from Hugging Face. Per the official README, you can override where they land by setting the AUDIOCRAFT_CACHE_DIR environment variable – set this BEFORE the first run if your home directory is on a small SSD, because otherwise you’ll be moving a pile of cached weights later.
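Combining the generation test with the cache override, a slightly fuller smoke test could look like this minimal sketch. It sets AUDIOCRAFT_CACHE_DIR from Python before the model loads (an existing value wins) and saves the clip with audiocraft.data.audio.audio_write, the helper AudioCraft’s README uses for writing output. CACHE_DIR and the smoke_test file stem are hypothetical placeholders; point the cache at whichever disk has room.

```python
import os
from pathlib import Path

# Hypothetical cache location: pick a disk with several GB free.
CACHE_DIR = Path.home() / "audiocraft-cache"
DURATION_S = 5


def expected_samples(duration_s, sample_rate):
    """Approximate samples per channel for a clip of this length."""
    return int(duration_s * sample_rate)


def main():
    # Keep any AUDIOCRAFT_CACHE_DIR already set; otherwise use CACHE_DIR.
    # This must happen before the first get_pretrained() call.
    os.environ.setdefault("AUDIOCRAFT_CACHE_DIR", str(CACHE_DIR))

    # Imported inside main() so a broken install surfaces as a clear ImportError.
    from audiocraft.data.audio import audio_write
    from audiocraft.models import MusicGen

    model = MusicGen.get_pretrained("facebook/musicgen-small")
    model.set_generation_params(duration=DURATION_S)
    wav = model.generate(["ambient piano with rain"])  # [batch, channels, samples]

    print("shape:", tuple(wav.shape))
    print("expected samples per channel: ~", expected_samples(DURATION_S, model.sample_rate))

    # audio_write takes a [channels, samples] tensor and adds the .wav suffix itself.
    out = audio_write("smoke_test", wav[0].cpu(), model.sample_rate, strategy="loudness")
    print("wrote", out)


if __name__ == "__main__":
    try:
        main()
    except ImportError as exc:
        print(f"AudioCraft failed to import ({exc}); revisit the install steps.")
```

With a 5-second duration and MusicGen’s 32 kHz output, the sample count should land near 5 × 32,000 = 160,000.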

Pro tip: If you’re on Windows and you see WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions followed by No module named 'triton', your install isn’t actually broken. Per GitHub issue #183, Triton isn’t supported on Windows, but AudioCraft works without it: ignore the warning and audio will still generate.

Real errors from the issue tracker

Three failures get repeatedly logged on the GitHub repo. Save yourself the hour.

Error 1: “No module named torch” while installing xformers

Reported in issue #362. You installed torch, you can import it, but pip still fails. Cause: pip’s build isolation creates a clean environment to build xformers, and that environment doesn’t inherit your torch install. Fix: install setuptools and wheel first, and install torch (pinned to 2.1.0) BEFORE running the audiocraft install. If you’re on macOS, also pin xformers to a working version:

# macOS workaround
pip install xformers==0.0.20

Users report 0.0.20 as a pin that works on macOS. Because MPS can’t use xformers at all, you don’t need a fast build, just a version that installs and fails gracefully at runtime rather than crashing mid-build.

Error 2: xFormers built for wrong PyTorch version

The error looks something like: “xFormers was built for PyTorch 2.0.1+cu118 but you have 2.0.1+cpu – please reinstall xformers.” This means you accidentally installed the CPU-only PyTorch wheel and xformers expected the CUDA one. Fix:

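pip uninstall -y torch xformers
pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cu118
# Pin xformers to a build for torch 2.1.0: unpinned, pip grabs the newest xformers
# and upgrades torch to match. 0.0.22.post7 targets torch 2.1.0, but confirm it
# against the xformers release notes for your CUDA version.
pip install xformers==0.0.22.post7 --index-url https://download.pytorch.org/whl/cu118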
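To confirm which PyTorch build you actually ended up with, before and after the fix, check two things: the local tag after the + in torch.__version__ (cu118, cpu, and so on) and torch.version.cuda, which is None on CPU-only builds. The helper names in this sketch are hypothetical; torch.__version__, torch.version.cuda and torch.cuda.is_available() are standard PyTorch attributes.

```python
def build_tag(version):
    """Split a torch version like '2.1.0+cu118' into ('2.1.0', 'cu118').

    The tag is '' when the version string has no '+' suffix.
    """
    release, _, tag = version.partition("+")
    return release, tag


def describe_build(version, cuda_version):
    """Summarize a torch build; cuda_version is torch.version.cuda (None on CPU builds)."""
    release, tag = build_tag(version)
    if cuda_version is None or tag == "cpu":
        return f"torch {release}: CPU-only build, so CUDA xformers wheels won't match it"
    return f"torch {release}: CUDA {cuda_version} build (tag: {tag or 'none'})"


def main():
    import torch  # deferred so the helpers above work without torch installed

    print(describe_build(str(torch.__version__), torch.version.cuda))
    print("CUDA device visible:", torch.cuda.is_available())


if __name__ == "__main__":
    try:
        main()
    except ImportError:
        print("torch is not installed in this environment")
```

After the fix you’d expect a CUDA 11.8 line, and True for the device check on a machine with a working NVIDIA driver.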

Error 3: EnCodec decoder crashes on Apple Silicon MPS

Official support covers NVIDIA GPUs and CPU only – MPS is not in that list. So when EnCodec’s decode() hits the MPS device, it falls over. The community fix, documented in issue #31 and the Peddals blog, routes only that specific decode call back to CPU while keeping everything else on MPS. You edit audiocraft/models/encodec.py and wrap the decode call:

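# Inside EncodecModel.decode() in audiocraft/models/encodec.py
if emb.device.type == 'mps':
    # Run only the decoder on CPU, then move the waveform back to MPS.
    # nn.Module.to() works in place, so the decoder stays on CPU after this.
    out = self.decoder.to('cpu')(emb.to('cpu')).to('mps')
else:
    out = self.decoder(emb)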

It’s ugly. It works. The Peddals blog has the full patch with line numbers.

Upgrading and cleanup

Upgrading from 1.2.x or earlier is mostly painless because the API surface is stable, but the PyTorch pin moved from 2.0 to 2.1 along the way. If you upgraded torch without recreating your env, xformers will probably break – see Error 2 above.

# Upgrade in place
pip install -U audiocraft

# Or nuke and pave (safer)
conda deactivate  # conda refuses to remove the environment you're currently in
conda env remove -n audiocraft
conda create -n audiocraft python=3.9
# then run the install sequence again

For uninstall:

pip uninstall audiocraft xformers
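# Remove cached weights (potentially 30+ GB); if you set AUDIOCRAFT_CACHE_DIR, look there instead
rm -rf ~/.cache/huggingface/hub/models--facebook--musicgen-*
rm -rf ~/.cache/huggingface/hub/models--facebook--audiogen-*
rm -rf ~/.cache/torch/hub/checkpoints/ # if you used melody/Demucs; this clears ALL torch hub checkpoints, not just AudioCraft's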

That cache directory grows fast. If you experimented with all three MusicGen sizes plus melody plus AudioGen, you’re probably sitting on a lot of .bin files you forgot about.
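Before deleting anything, it’s worth measuring what each cached checkpoint actually takes. This standard-library sketch assumes the Hugging Face hub layout used in the commands above (models--facebook--&lt;name&gt; folders) and reads AUDIOCRAFT_CACHE_DIR first if you set it; the function names are hypothetical.

```python
import os
from pathlib import Path

DEFAULT_HUB = Path.home() / ".cache" / "huggingface" / "hub"
PATTERNS = ("models--facebook--musicgen-*", "models--facebook--audiogen-*")


def cache_root():
    """AUDIOCRAFT_CACHE_DIR if set, otherwise the default Hugging Face hub cache."""
    override = os.environ.get("AUDIOCRAFT_CACHE_DIR")
    return Path(override).expanduser() if override else DEFAULT_HUB


def dir_size_bytes(path):
    """Sum regular files under path, skipping symlinks.

    The Hugging Face cache stores each file once under blobs/ and links to it
    from snapshots/, so following the links would count the weights twice.
    """
    return sum(
        p.stat().st_size
        for p in path.rglob("*")
        if p.is_file() and not p.is_symlink()
    )


def model_sizes(root, patterns=PATTERNS):
    """Map each matching model folder under root to its size in bytes."""
    if not root.is_dir():
        return {}
    sizes = {}
    for pattern in patterns:
        for folder in sorted(root.glob(pattern)):
            if folder.is_dir():
                sizes[folder.name] = dir_size_bytes(folder)
    return sizes


if __name__ == "__main__":
    root = cache_root()
    sizes = model_sizes(root)
    for name, size in sizes.items():
        print(f"{size / 1e9:8.2f} GB  {name}")
    print(f"{sum(sizes.values()) / 1e9:8.2f} GB  total under {root}")
```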

What I’d do differently next time

If you only need to use AudioCraft and not modify the source, skip the local install entirely. The MusicGen weights live on Hugging Face and the transformers library has its own MusicGen integration that doesn’t require xformers or the PyTorch 2.1.0 pin. You lose access to AudioGen and the training code, but you skip 90% of the dependency pain.
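For the transformers route, the Hugging Face MusicGen documentation shows a flow along these lines. This sketch assumes recent transformers and scipy installs; AutoProcessor, MusicgenForConditionalGeneration and the generate() arguments follow those docs, while the 50-frames-per-second conversion (so 256 new tokens is a little over 5 seconds) is an assumption based on MusicGen’s audio tokenizer, and the helper names are hypothetical.

```python
# Assumption: MusicGen's audio tokenizer emits 50 token frames per second.
FRAME_RATE_HZ = 50


def tokens_for_seconds(seconds):
    """max_new_tokens needed for roughly `seconds` of audio."""
    return int(round(seconds * FRAME_RATE_HZ))


def generate(prompt, seconds=5.0, out_path="musicgen_out.wav"):
    # Heavy imports deferred so the helper above works without these packages.
    import scipy.io.wavfile
    from transformers import AutoProcessor, MusicgenForConditionalGeneration

    processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
    model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")

    inputs = processor(text=[prompt], padding=True, return_tensors="pt")
    audio = model.generate(
        **inputs,
        do_sample=True,
        guidance_scale=3,
        max_new_tokens=tokens_for_seconds(seconds),
    )

    rate = model.config.audio_encoder.sampling_rate
    # audio has shape (batch, channels, samples); write the first mono clip.
    scipy.io.wavfile.write(out_path, rate=rate, data=audio[0, 0].numpy())
    return out_path
```

Call generate("lofi beat with vinyl crackle", seconds=8) from a Python session; nothing in this path needs xformers or the torch==2.1.0 pin.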

The local install is worth the trouble only if (a) you need AudioGen for sound effects, (b) you want to fine-tune on your own audio, or (c) you’re integrating it into a pipeline where being offline matters. For “I just want to make a lofi clip,” the hosted Hugging Face Spaces demo answers that in a browser tab.

The architectural details, if you care: MusicGen tokenizes audio with an EnCodec model and feeds those tokens to an autoregressive transformer language model. The full method is in the Simple and Controllable Music Generation paper if you want to understand why the token interleaving pattern matters for output coherence.
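To make the interleaving point concrete: MusicGen’s EnCodec tokenizer produces four parallel codebook streams at 50 frames per second, and the paper’s default “delay” pattern shifts codebook k right by k steps. The transformer then emits one token per codebook at every step, and codebook k for a given frame is generated one step after codebook k-1 for that frame, so it can condition on it. The sketch below is a plain-Python illustration of that layout, not AudioCraft’s own implementation; PAD is a hypothetical stand-in for the special padding token.

```python
PAD = None  # hypothetical stand-in for the special "no token here" value


def delay_interleave(codes):
    """Apply a delay pattern: codebook k is shifted right by k steps.

    codes: list of K lists, each holding T tokens (one row per codebook).
    Returns K rows of length T + K - 1, padded with PAD.
    """
    if not codes:
        return []
    num_codebooks = len(codes)
    num_frames = len(codes[0])
    steps = num_frames + num_codebooks - 1
    grid = []
    for k, row in enumerate(codes):
        if len(row) != num_frames:
            raise ValueError("every codebook must have the same number of frames")
        grid.append([PAD] * k + list(row) + [PAD] * (steps - k - num_frames))
    return grid


def delay_deinterleave(grid, num_frames):
    """Invert delay_interleave by undoing each row's shift."""
    return [row[k:k + num_frames] for k, row in enumerate(grid)]


if __name__ == "__main__":
    # 4 codebooks x 3 frames, labelled c<codebook>f<frame> for readability.
    codes = [[f"c{k}f{t}" for t in range(3)] for k in range(4)]
    for row in delay_interleave(codes):
        print(" ".join("...." if tok is PAD else tok for tok in row))
```

Reading down a column shows one generation step. For a 5-second clip that’s 250 frames × 4 codebooks = 1,000 tokens: flattening them into a single sequence would take 1,000 autoregressive steps, while the delay layout needs 250 + 4 - 1 = 253.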

FAQ

Can I run AudioCraft on a CPU only?

Yes, but generation takes minutes per sample instead of seconds. Use the 300M small model and skip melody conditioning; on CPU it’s too slow to be usable.
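The size guidance in this guide (small on CPU or under 16 GB, medium from 16 GB, large only from 24 GB) can be folded into a small loader. The thresholds are this article’s rules of thumb rather than official limits, and pick_musicgen / load_for_this_machine are hypothetical names; MusicGen.get_pretrained does accept a device argument, and torch.cuda.get_device_properties(0).total_memory reports VRAM in bytes.

```python
def pick_musicgen(vram_gb):
    """Choose a MusicGen checkpoint from available VRAM; None means CPU only."""
    if vram_gb is None or vram_gb < 16:
        return "facebook/musicgen-small"
    if vram_gb < 24:
        return "facebook/musicgen-medium"
    return "facebook/musicgen-large"


def load_for_this_machine():
    # Deferred imports: only needed when actually loading a model.
    import torch
    from audiocraft.models import MusicGen

    if torch.cuda.is_available():
        device = "cuda"
        # Decimal GB: a card sold as 16 GB (16 GiB) reports about 17e9 bytes,
        # so it clears the 16 threshold.
        vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    else:
        device, vram_gb = "cpu", None
    name = pick_musicgen(vram_gb)
    print(f"loading {name} on {device}")
    return MusicGen.get_pretrained(name, device=device)
```

On a CPU-only machine this loads facebook/musicgen-small on the CPU, matching the advice above.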

Why does the install insist on Python 3.9 when 3.11 is everywhere?

The real constraint is the availability of binary wheels for xformers and PyTorch 2.1.0, not AudioCraft itself. Based on community reports, 3.10 has the highest success rate of the non-3.9 options. Python 3.11 works inconsistently – as of the 1.3.0 release window, some transitive dependencies didn’t have 3.11 wheels ready. If you hit a cryptic compile error on 3.10 or 3.11, go back to 3.9 and stop fighting it. The compatibility picture may have shifted since May 2024, so check the issue tracker first.

Can I use AudioCraft-generated music commercially?

No, and this trips people up because the MIT license on the code makes it sound like the answer is yes. MIT governs the source code only; the model weights are CC-BY-NC 4.0. The weights are what generate the audio, so using them to produce music for a commercial product is a commercial use of the weights, which that license doesn’t permit.

Next step: run the verification snippet from the “First run” section above. If it prints a tensor shape, you’re done: from a clone of the repo, launch the Gradio demo with python -m demos.musicgen_app and start generating. If it fails, scroll back to the error matching yours.