Here’s an unpopular take: the riskiest part of running Llama locally isn’t running it – it’s the download. Most tutorials hand you a one-liner and a warm feeling about “open source.” They skip the part where roughly half the model repos on Hugging Face still ship pickle files that can execute arbitrary code on your machine the moment you load them.
This guide is the safe way to download a Llama model: which sources actually deserve trust, how to verify what you got, and the specific traps that competitor tutorials gloss over.
The threat model nobody explains before step 1
Before you type a single command, understand what you’re defending against. Three real risks:
- Trojanized model files. Hugging Face’s own documentation warns that there are dangerous arbitrary code execution attacks that can be perpetrated when you load a pickle file – they suggest loading models only from users and organizations you trust. (A short demonstration follows this list.)
- Typosquatted repos. Fake “meta-llama” lookalikes that copy the README and ship a poisoned pytorch_model.bin.
- Tampered downloads. A corrupted or man-in-the-middled file that silently differs from what Meta published.
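To make the first risk concrete: loading a pickle runs code by design. A minimal, harmless sketch, with an invented Payload class standing in for a real payload:

```python
import pickle

# A class whose __reduce__ tells pickle to call an arbitrary function when the
# bytes are deserialized. Real attacks call os.system or open a reverse shell;
# this one just prints a message.
class Payload:
    def __reduce__(self):
        return (print, ("arbitrary code ran during pickle.loads()",))

blob = pickle.dumps(Payload())
pickle.loads(blob)  # the print fires here, during deserialization, before you "use" anything
```

torch.load() on a .bin or .pth file runs this same machinery unless weights_only=True is in effect, which is why the format table below matters.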
The pickle problem isn’t theoretical. In February 2025, researchers at ReversingLabs found two ML models on Hugging Face using a technique they called “nullifAI” – broken pickle files compressed with 7z instead of ZIP, with a reverse shell connecting to a hardcoded IP at the start of the pickle stream, deliberately designed to sidestep existing safeguards. The kicker: Picklescan, the tool Hugging Face uses to detect suspicious pickle files, failed to flag them.
Hugging Face removed the models within 24 hours of the report, but the technique is now public.
Pick the right source (and the right format)
Three sources deserve consideration: Meta directly, Hugging Face, and Kaggle – all listed as official distribution channels in Meta’s docs. Anything else – random mirrors, S3 buckets in tutorials, torrent links – is a gamble.
But “official source” isn’t enough on its own. The format matters more than most guides admit.
| Format | Can execute code on load? | Use when |
|---|---|---|
| safetensors | No | Default. Always prefer this. |
| GGUF (llama.cpp) | No | Running quantized models locally |
| .bin / .pth (pickle) | Yes | Only from sources you’d trust with sudo |
Safetensors exists specifically because pickle is dangerous. Trail of Bits audited it in May 2023 at Hugging Face’s request – no critical security flaw leading to arbitrary code execution was found. In 2025, Hugging Face contributed Safetensors to the PyTorch Foundation, putting it under the Linux Foundation’s umbrella for AI projects.
And yet – here’s the gap nobody talks about. A longitudinal study published on arXiv (PickleBall, arXiv:2508.15987) found that, as of March 2025, roughly 44.9% of Hugging Face repositories still contain pickle-format models, with pickle-only models pulling 400M+ downloads per month. The migration is far from done.
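One practical consequence: before you download anything, it is worth checking what formats a repo actually ships. A short sketch using the huggingface_hub client (the repo ID is just an example, and gated repos like meta-llama require a logged-in account that has accepted the license):

```python
from huggingface_hub import HfApi

api = HfApi()  # picks up your token from huggingface-cli login
files = api.list_repo_files("meta-llama/Llama-3.1-8B-Instruct")

safetensors = [f for f in files if f.endswith(".safetensors")]
pickles = [f for f in files if f.endswith((".bin", ".pth", ".pt", ".pkl"))]

print(f"{len(safetensors)} safetensors file(s), {len(pickles)} pickle-format file(s)")
if not safetensors:
    print("No safetensors in this repo: think twice before downloading.")
```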
The actual download, step by step
Two paths. Pick the one that matches your situation.
Path A: Direct from Meta (you want the original weights)
- Request access at llama.com/llama-downloads. You’ll get an email with a signed URL.
- Install the official CLI: pip install llama-stack
- Run llama model list --show-all to see model IDs.
- Run llama model download --source meta --model-id CHOSEN_MODEL_ID and paste the URL when prompted.
One gotcha buried in Meta’s README: pre-signed URLs expire after 24 hours and a certain number of downloads. Hit that limit and you get 403 Forbidden – re-request the link from the form. If your connection drops on a 70B download, that timer matters.
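If you want to know whether a link is still live before committing to a multi-hour download, a HEAD request is a cheap check. This is a sketch only: the URL below is a placeholder, and whether Meta's CDN answers HEAD requests the same way it answers GETs is an assumption.

```python
import urllib.error
import urllib.request

signed_url = "https://example.com/llama-weights?Signature=PLACEHOLDER"  # paste your real link here

try:
    with urllib.request.urlopen(urllib.request.Request(signed_url, method="HEAD"), timeout=10) as resp:
        print("Link still answers, status:", resp.status)
except urllib.error.HTTPError as err:
    if err.code == 403:
        print("403 Forbidden: the link has expired or hit its download cap; re-request it.")
    else:
        raise
```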
Path B: Hugging Face (fastest for safetensors)
```bash
# Install the CLI
pip install -U "huggingface_hub[cli]"

# Log in (you need an HF account that's accepted Meta's license)
huggingface-cli login

# Download - note the explicit .safetensors filter
huggingface-cli download meta-llama/Llama-3.1-8B-Instruct \
  --include "*.safetensors" "*.json" "tokenizer*" \
  --local-dir ./llama-3.1-8b
```
The --include filter is the part most tutorials skip. Without it, you’ll pull the legacy pytorch_model.bin alongside the safetensors version. Why download a pickle file you don’t plan to use?
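If you would rather do the same thing from Python, huggingface_hub's snapshot_download takes an allow-list that mirrors the --include filter:

```python
from huggingface_hub import snapshot_download

# Same allow-list as the CLI command above: weights, configs, tokenizer, no pickle files.
# Requires an HF account that has accepted Meta's license (huggingface-cli login).
snapshot_download(
    repo_id="meta-llama/Llama-3.1-8B-Instruct",
    allow_patterns=["*.safetensors", "*.json", "tokenizer*"],
    local_dir="./llama-3.1-8b",
)
```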
Verify before you load
Verification is two questions: did the file arrive intact, and is the file actually safe to deserialize. Different checks.
For integrity: the SHA256 for any file on Hugging Face is shown under Files and Versions – click the filename and it’s right there. Then locally:
```bash
shasum -a 256 model-00001-of-00004.safetensors
# Compare byte-for-byte against the published hash
```
Meta’s download script ships MD5 checksums. A 2024 thread on the meta-llama GitHub repo made the case for adding SHA-256 alongside MD5, given MD5’s known weaknesses – fair, though for detecting accidental corruption MD5 does the job. For trust verification, prefer the SHA256 from Hugging Face when available.
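If you would rather script the comparison across all shards than eyeball hashes one at a time, here is a standard-library sketch; the expected dictionary is a placeholder and needs filling in from each file's page:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MB chunks so multi-gigabyte shards don't need to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Fill these in from the Files and versions page; the value here is a placeholder.
expected = {
    "model-00001-of-00004.safetensors": "PASTE_PUBLISHED_SHA256_HERE",
}

for name, want in expected.items():
    got = sha256_of(Path("./llama-3.1-8b") / name)
    print(f"{name}: {'OK' if got == want else 'MISMATCH'}")
```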
For safety, if you absolutely must load a .bin file, run picklescan first:
```bash
pip install picklescan
picklescan --path ./model/pytorch_model.bin
# Exit 0 = clean, 1 = malware found, 2 = scan failed
```
Exit code 2 deserves the same suspicion as code 1. The nullifAI attack worked precisely because broken/malformed pickle files can evade the scanner while still executing code. A scan failure on a model file is not a “try again” situation.
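If the scan runs inside a script, fail closed on any nonzero exit, not just exit 1. A minimal wrapper around the CLI:

```python
import subprocess
import sys

# Anything other than a clean 0 is a hard stop. Exit 2 means the scanner choked,
# and nullifAI showed that "couldn't scan it" is not the same as "it's fine".
result = subprocess.run(
    ["picklescan", "--path", "./model/pytorch_model.bin"],
    capture_output=True,
    text=True,
)
if result.returncode != 0:
    print(result.stdout, result.stderr, sep="\n")
    sys.exit("picklescan did not come back clean; do not load this file.")
print("picklescan reported no issues")
```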
Pro tip: Treat any model loaded with torch.load() like running an .exe from an email. If you wouldn’t double-click it blindly, don’t torch.load() it blindly. Pin yourself to safe_open() from the safetensors library wherever possible.
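What that looks like in practice, assuming one of the shards from the Path B download above:

```python
from safetensors import safe_open

# Reads tensors without touching pickle at all. framework="pt" returns torch tensors;
# the path is an example from the download step above.
with safe_open("./llama-3.1-8b/model-00001-of-00004.safetensors", framework="pt", device="cpu") as f:
    for name in f.keys():
        tensor = f.get_tensor(name)  # lazy, per-tensor reads, handy for inspection
        print(name, tuple(tensor.shape))
```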
Edge cases the docs don’t surface
A few traps you’ll only hit in practice:
- Ollama digest mismatches. Issue #8105 documents a verifying-sha256-digest error where the wanted hash doesn’t match the received hash – Ollama requires the full file be downloaded again. On llama3.3 that’s 42 GB with no resume support. Run Ollama pulls on a stable connection or expect to start over.
- The “unsafe” flag is not a block. Hugging Face scans pickle models and marks flagged ones as “unsafe,” but the download stays available. Users can still pull and execute potentially harmful models – the flag is a warning, not a quarantine. (Source: NSFOCUS AI supply chain security analysis.)
- The nullifAI bypass applies to picklescan specifically. If a model passes picklescan with exit 0 but uses a 7z-compressed archive, that exit code is not a clean bill of health – it may simply mean picklescan didn’t recognize the compression format. When in doubt, avoid .bin entirely.
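A crude pre-check for that last case is to look at the file's leading bytes: 7z archives start with a fixed six-byte signature, while torch.save output since PyTorch 1.6 is a ZIP container that starts with "PK". This catches only this one masquerade and proves nothing about files that pass it:

```python
SEVEN_ZIP_MAGIC = b"7z\xbc\xaf\x27\x1c"  # fixed 7z signature

with open("./model/pytorch_model.bin", "rb") as f:
    magic = f.read(6)

if magic == SEVEN_ZIP_MAGIC:
    print("This is a 7z archive masquerading as a model. Do not load it.")
elif magic[:2] == b"PK":
    print("ZIP container, which is what torch.save normally produces. Scan it before loading anyway.")
else:
    print(f"Unrecognized leading bytes {magic!r}. Proceed with caution.")
```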
Honest limitations of the “safe download” approach
None of this gets you to zero risk. Three honest gaps:
Hugging Face’s own position is that for existing PyTorch files, the best they can do is flag suspicious-looking ones. Flagging is detection, not prevention – and the nullifAI proof of concept already showed how to evade the detector.
Safetensors only protects against arbitrary code execution at load time. It does nothing about the model’s behavior – a fine-tuned model can still produce backdoored outputs, leak prompts, or misfire on poisoned triggers. That’s a separate problem outside this guide’s scope.
And even with checksums and safetensors, you’re trusting Meta’s signed URL and Hugging Face’s TLS chain. Low-probability, but not zero.
FAQ
Is it safe to use the leaked Llama 1 torrents?
No. There’s no way to verify their provenance or integrity, and every current Llama release is available through the official channels above. Skip them.
Should I always pick safetensors over GGUF?
They’re for different jobs. Safetensors is what you want for full-precision inference and fine-tuning in PyTorch – say, running Llama-3.1-8B-Instruct on a workstation with a 24GB GPU. GGUF is what you want when you’re running quantized weights through llama.cpp or Ollama on consumer hardware. Both are safe-by-design formats (no executable code paths during load), so the choice is about your runtime, not your security posture.
Why does my Hugging Face download sometimes pull pytorch_model.bin even when safetensors exists?
The default huggingface-cli download grabs everything in the repo unless you filter. A common misconception is that the library “prefers” safetensors automatically – it doesn’t, when you’re downloading raw files. Use --include "*.safetensors" to force the safe format. If you’re loading via transformers, the library itself will prefer safetensors when both exist, but downloading raw is a different operation.
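If you do load through transformers, you can make that preference explicit so it never silently falls back to the pickle weights:

```python
from transformers import AutoModelForCausalLM

# use_safetensors=True raises an error if no safetensors weights are found,
# instead of quietly loading pytorch_model.bin.
model = AutoModelForCausalLM.from_pretrained(
    "./llama-3.1-8b",
    use_safetensors=True,
)
```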
Next step: pick one model you’ve already downloaded, find its SHA256 on its Hugging Face page, and run shasum -a 256 against your local copy right now. If you’ve never done this before, the answer to “do my files match?” is sometimes no – and you’ll want to know before you load them.