Hot take: if you train models for a living and you treat pip install like a free action, this incident isn’t a story about Lightning. It’s a story about your threat model being five years out of date.
The Shai-Hulud-themed malware planted in PyTorch Lightning, the popular AI training library, dropped on April 30, 2026, and the security community is still unpacking it. This isn’t another “five tips for supply chain security” post – it’s a 5-minute audit you can run right now, plus the reasoning behind which recovery path actually makes sense for an ML engineer.
The Key Takeaway (Read This First)
If you ran pip install lightning or pip install --upgrade lightning between April 30 and the time PyPI quarantined the package, assume your developer machine and any CI runner that touched it are compromised. Pin to lightning==2.6.1, audit for the IOCs below, and rotate every token that lived on those machines.
If you didn’t update Lightning in that window, you’re probably fine – but “probably” isn’t a great word in security. The audit takes 5 minutes; run it anyway.
What Actually Happened (The Short Version)
Eighteen minutes. That’s how fast Socket’s AI scanner caught lightning 2.6.2 and 2.6.3 after they hit PyPI on April 30, 2026 – fast, but not fast enough for every CI job that auto-upgraded in that window. Version 2.6.1 (published January 30, 2026) is the last clean baseline.
The technical setup, per safedep’s analysis: a Python dropper injected into lightning/__init__.py that bootstraps the Bun JavaScript runtime (v1.3.13), then executes an 11 MB obfuscated worm called router_runtime.js. The payload runs as a daemon thread on every import. Not on install – on import.
That last sentence is the whole game, and most write-ups bury it. Your SCA scanner watching for preinstall hooks? Didn’t fire. Sandboxed pip install? Didn’t help. The malware runs inside your Python process the moment you import lightning – which means a Jupyter kernel that’s been running all afternoon may have been exfiltrating credentials for hours after you installed what looked like a clean update.
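Practical corollary: when you check a machine, check the version without importing the package. pip show and importlib.metadata read the package metadata off disk; neither executes lightning/__init__.py.

# Check the installed version WITHOUT importing lightning -
# python -c "import lightning" is exactly the trigger you're trying to avoid
pip show lightning | grep -E "^(Version|Location)"
# importlib.metadata reads dist-info from disk; it does not import the package
python -c "from importlib.metadata import version; print(version('lightning'))"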
Method A vs Method B: Two Recovery Paths
Think of it like a contaminated water supply. You can replace the pipes in your kitchen (Method A – fast, targeted), or you can switch off the mains, flush everything, and bring in verified clean water (Method B – slower, but the only way to actually know). For most ML engineers working on anything real, Method B is the right call.
Method A is a surgical rollback: downgrade to 2.6.1, search the filesystem for IOCs, rotate the tokens present on the machine, move on. Method B is a full rebuild: wipe the dev environment or CI image, reinstall everything from a clean baseline, rotate every credential that ever touched the box. Method A is faster; Method B is safe – and those aren’t the same thing here.
Why Method B? Because if the malware obtained a GitHub token with write access, it pushed a workflow named Formatter to your repo (per Semgrep’s breakdown). That workflow dumps all repository secrets via toJSON(secrets) and uploads them as a downloadable Actions artifact called format-results – pinned to specific commit SHAs to look like legitimate automation. Once those secrets are out, downgrading Lightning doesn’t undo anything.
Pick Method B if the affected machine had any of: a logged-in gh CLI, an ~/.npmrc with a publish token, AWS/GCP credentials, or SSH keys. Method A is only defensible for a throwaway sandbox you can verify had no real secrets.
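Not sure which bucket you’re in? A quick inventory of what actually lives on the box settles it – a rough sketch, assuming the standard CLIs; adjust for whatever you actually use:

# Any hit below is an argument for Method B
gh auth status                              # GitHub CLI session and token scopes
grep -i "authToken" ~/.npmrc 2>/dev/null    # npm publish/install tokens
aws sts get-caller-identity 2>/dev/null     # live AWS credentials
gcloud auth list 2>/dev/null                # active GCP accounts
ls ~/.ssh/id_* 2>/dev/null                  # SSH private keys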
The Walkthrough: Method B, Step by Step
1. Stop the bleeding
Disconnect the machine from the network, or at minimum kill any running Python processes that imported Lightning. The worm runs as a daemon thread – it can be sitting in a Jupyter kernel right now even if you closed the notebook tab.
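A blunt but effective way to handle that second part – list first, then kill; the process-name patterns are assumptions about a typical dev box, so adjust them to yours:

# List candidate processes before killing anything
pgrep -af "python|jupyter"
# Long-lived kernels keep the daemon thread alive even after you close the notebook tab
pkill -f jupyter
pkill -f python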
2. Search for indicators of compromise
Per safedep’s IOC list, look for these files in your repos and home directory:
# From your repo root
find . -type f \( -name "setup.mjs" -o -name "router_runtime.js" \
  -o -path "*/.claude/*" -o -path "*/_runtime/*" \) 2>/dev/null
# Check for the impersonated commits
git log --all --author="claude" --pretty=format:"%h %ae %s"
# Check for the Formatter workflow
find . -path "*/.github/workflows/Formatter*" 2>/dev/null
Files to flag if you or your team didn’t put them there: .vscode/tasks.json, .claude/settings.json, .claude/setup.mjs, .claude/router_runtime.js. Also watch for a VS Code task labelled “Environment Setup” with runOptions.runOn: folderOpen – that’s the persistence mechanism.
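A quick grep for that persistence mechanism – the task label and the runOn key come from safedep’s IOC list, so treat them as today’s indicators rather than a permanent signature:

# Flag VS Code tasks that auto-run when the folder opens
grep -rn "folderOpen" .vscode/tasks.json 2>/dev/null
grep -rn "Environment Setup" .vscode/tasks.json .claude/ 2>/dev/null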
3. Pin the safe version
In requirements.txt or pyproject.toml:
lightning==2.6.1 # Last known clean baseline before the April 30 compromise
Not >=2.6.1. Not ~=2.6. Pin exactly until the Lightning team publishes a clean post-incident release and you’ve verified the SHA matches their announcement.
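One way to do that verification, assuming the post-incident announcement publishes sha256 digests (the /tmp/verify path is just an example):

# Download exactly what pip would install, then compare digests by hand
pip download --no-deps lightning==2.6.1 -d /tmp/verify
sha256sum /tmp/verify/lightning-2.6.1*
# Compare against the digests in the maintainers' announcement before trusting the pin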
4. Rotate the tokens that mattered
GitHub PAT, npm token, PyPI token, AWS keys, GCP service account JSON, SSH keys, HuggingFace token, OpenAI/Anthropic API keys, any cloud secret that lived in .env. The worm targeted credentials broadly across cloud providers, registries, and version control. If it ran on your machine, treat every stored secret as suspect.
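A non-exhaustive sweep of the usual locations – if any of these exist on the affected machine, rotate the secret behind them rather than just deleting the file:

# Standard credential locations on a dev box (non-exhaustive)
ls -la ~/.netrc ~/.npmrc ~/.pypirc ~/.aws/credentials ~/.ssh \
       ~/.config/gcloud ~/.cache/huggingface/token 2>/dev/null
# Anything secret-shaped sitting in project .env files
grep -rli "token\|secret\|api_key" .env* 2>/dev/null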
5. Audit your repos for the silent bonus payload
If a CI job ran the infected package with a write-scoped GITHUB_TOKEN, the worm may have pushed a Formatter workflow. Check the Actions tab and the .github/workflows/ directory of every repo that runs Lightning in CI.
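If you own more than a handful of repos, a loop over the GitHub API beats clicking through the UI. A sketch – YOUR_ORG is a placeholder, the exact Formatter.yml filename is an assumption (the find in step 2 used Formatter*), and run it with a token you’ve already rotated:

# Scan every repo in the org for the malicious workflow and its exfil artifact
for repo in $(gh repo list YOUR_ORG --limit 200 --json nameWithOwner -q '.[].nameWithOwner'); do
  gh api "repos/$repo/contents/.github/workflows/Formatter.yml" --silent 2>/dev/null \
    && echo "Formatter workflow present: $repo"
  hits=$(gh api "repos/$repo/actions/artifacts" \
         -q '[.artifacts[] | select(.name == "format-results")] | length' 2>/dev/null)
  [ "${hits:-0}" -gt 0 ] && echo "format-results artifact present: $repo"
done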
Claude Code impersonation gotcha: The worm commits using the author name claude with a users.noreply.github.com email – specifically impersonating Claude Code automated commits. If you use Claude Code legitimately in your workflow, checking the author name isn’t enough. Cross-reference the commit email and SHA against your actual Claude Code configuration. Real Claude Code commits don’t use that email pattern in the way this worm does.
Edge Cases the News Posts Skipped
Three things most coverage didn’t flag clearly.
The Russian-locale geofence. Before doing anything else, the payload calls tu0() and checks both Intl.DateTimeFormat().resolvedOptions().timeZone and the LC_ALL, LC_MESSAGES, LANGUAGE, and LANG environment variables for Russian locale markers, exiting immediately on a match. A developer who switches locales between machines could be compromised on one and clean on another. Don’t assume a clean result on your travel laptop means your desktop is safe.
Sandboxed installs don’t save you. The reason this matters more than a typical preinstall hook attack: npm hooks run inside the package manager’s process, where modern auditors can intercept them. Python’s import-time execution runs inside your process. A sandboxed pip install --no-deps followed by python -c "import lightning" in your real shell fires the payload. Your SCA tool may report zero preinstall scripts and show a green checkmark. That checkmark is wrong.
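If you need to inspect a suspect install, read the files off disk instead of importing them. A sketch using the filenames from safedep’s IOC list (a future variant could rename them):

# Read the installed package straight off disk - no import, no payload
SITE=$(python -c "import sysconfig; print(sysconfig.get_paths()['purelib'])")
grep -n "router_runtime\|setup\.mjs" "$SITE/lightning/__init__.py" 2>/dev/null \
  && echo "dropper strings found in __init__.py" \
  || echo "no known dropper strings in __init__.py"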
The maintainer side is an open question. The Lightning maintainers acknowledged “we are aware of the issue and are actively investigating” – and the investigation into the exact root cause was still ongoing at the time of writing (April/May 2026). Rotating your tokens fixes your exposure. It does not close the upstream attack vector. Future Lightning releases need extra scrutiny until a root-cause post-mortem is published.
The attacker’s behavior during disclosure makes this worse. Socket opened an issue in the Lightning-AI/pytorch-lightning repository warning that 2.6.2 and 2.6.3 were compromised. The pl-ghost account closed it within one minute and posted a “SILENCE DEVELOPER” meme, then performed six create-and-delete branch operations over the next 70 minutes. That’s not normal incident response – that’s an attacker still inside the org at the time of disclosure, and it’s why “wait for the official fix” isn’t sufficient guidance right now.
FAQ
I only ran pip install lightning in a Docker container that’s already gone. Am I safe?
Probably yes for that container. But if it had mounted credentials, baked-in tokens, or registry access – rotate those. The container being gone doesn’t mean the secrets it touched are clean.
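For images you still have locally, you can check without importing anything, since pip show reads metadata only. A sketch that assumes the images have pip on PATH and quietly skips the ones that don’t:

# Report the lightning version baked into each local image
for img in $(docker images --format '{{.Repository}}:{{.Tag}}' | grep -v '<none>'); do
  ver=$(docker run --rm --entrypoint "" "$img" pip show lightning 2>/dev/null \
        | awk '/^Version/{print $2}')
  [ -n "$ver" ] && echo "$img -> lightning $ver"
done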
What if my CI uses a hash-pinned lockfile? Was I protected?
Yes, if you pinned a hash for 2.6.1 or earlier and your CI runs pip install --require-hashes. In that case, pip refuses 2.6.2 or 2.6.3 outright because the hashes don’t match the locked values. This is the most concrete argument for hash-pinning you’ll find this year – not a theoretical “it could help someday” but a real incident where it would have blocked the malicious build entirely. If you only used a version range like lightning>=2.6, CI happily pulled the malicious build the moment it landed on PyPI. The difference between those two lockfile approaches is the difference between “protected” and “compromised.”
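If you don’t have hash pinning yet, pip-tools will generate it for you – a minimal sketch, assuming pip-tools is installed and lightning is the only direct dependency in the example:

# Generate a fully hash-pinned lockfile and make pip enforce it
echo "lightning==2.6.1" > requirements.in
pip-compile --generate-hashes requirements.in     # writes requirements.txt with --hash= entries
pip install --require-hashes -r requirements.txt  # refuses anything whose digest doesn't match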
Should I switch off PyTorch Lightning entirely?
No – and this framing misidentifies the problem. The framework isn’t compromised; a maintainer account was. Plain PyTorch, Hugging Face Accelerate, and Fabric are reasonable alternatives if you need to keep shipping while this resolves, but every popular Python ML package has the same attack surface. Switching frameworks doesn’t change your threat model. Hash-pinning, dependency review, and least-privilege CI tokens do.
Your Next Action
Open your terminal right now and run pip show lightning | grep Version on every dev machine and CI image you own. If it shows 2.6.2 or 2.6.3, start with Method B from the walkthrough above. If it shows 2.6.1 or earlier, add lightning==2.6.1 with a hash to your lockfile before you close this tab – because the next compromised version, whenever it ships, will land the same way.
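And because “every dev machine” usually means several Python environments per machine, a quick sweep – the pyenv and miniconda paths below are assumptions about a typical setup, so point the globs at wherever your interpreters actually live:

# Check every environment on the box, not just the one on PATH
for py in ~/.pyenv/versions/*/bin/python ~/miniconda3/envs/*/bin/python */bin/python; do
  [ -x "$py" ] || continue
  ver=$("$py" -m pip show lightning 2>/dev/null | awk '/^Version/{print $2}')
  [ -n "$ver" ] && echo "$py -> lightning $ver"
done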