Most legal-NLP tutorials skip the awkward question: can you actually install Blackstone in 2026? The README makes it look like a five-minute job. It isn’t anymore. The legal spaCy library hasn’t seen a release since 2019, and a single line in its setup.py – spacy==2.1.8 – turns a quick pip install into an afternoon of dependency archaeology.
This guide assumes you actually need Blackstone – for case-law NER, citation extraction, or its abbreviation detector – and want to deploy it cleanly. We’ll skip the marketing.
What you’re actually installing
Blackstone is a spaCy pipeline trained on English case law from the Incorporated Council of Law Reporting’s archive. Per the 2019 README, F1 on the NER sits at approximately 70% – and the project is openly described as a prototype. The latest PyPI release is blackstone 0.1.1, uploaded on 6 August 2019; there has been nothing since. Treat it as a frozen research artifact, not a maintained product.
The pipeline ships with three custom components beyond the core model: abbreviation detection (a modified scispaCy AbbreviationDetector), a legislation linker that resolves provisions to their parent instrument, and compound case reference detection that pairs CASENAME with CITATION. The abbreviation component traces back to the Schwartz & Hearst (2003) algorithm – the same one scispaCy uses for biomedical abbreviations.
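The Schwartz & Hearst heuristic behind that abbreviation component is simple enough to sketch in plain Python. This is an illustrative reimplementation of the published algorithm, not Blackstone's actual code: given a parenthesised short form, walk backwards through the preceding text matching each short-form character, and require the first character to start a word.

```python
from typing import Optional

def find_long_form(short: str, preceding: str) -> Optional[str]:
    """Back-match a parenthesised short form against the preceding text,
    per Schwartz & Hearst (2003): work right to left, and require the
    short form's first character to start a word in the long form."""
    i = len(short) - 1      # index into the short form
    j = len(preceding) - 1  # index into the preceding text
    while i >= 0:
        c = short[i].lower()
        if not c.isalnum():         # skip punctuation in the short form
            i -= 1
            continue
        # move left until this character matches; the short form's first
        # character must additionally sit at a word boundary
        while j >= 0 and (preceding[j].lower() != c or
                          (i == 0 and j > 0 and preceding[j - 1].isalnum())):
            j -= 1
        if j < 0:
            return None             # no consistent alignment found
        i -= 1
        j -= 1
    return preceding[j + 1:].strip()
```

Run on text like "rights under the European Convention on Human Rights (ECHR)", the back-match recovers "European Convention on Human Rights" as the long form. The real scispaCy component layers candidate filtering and span bookkeeping on top, but the core alignment is this loop.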
System requirements (the realistic version)
Your Python version is what makes or breaks the install. The official claim – Python 3.6 and higher, per the setup.py classifiers – is technically true and practically misleading, because of the spaCy pin.
| Component | Minimum | Recommended (2026) |
|---|---|---|
| Python | 3.6 | 3.7 or 3.8 – last versions with prebuilt spaCy 2.1.8 wheels |
| RAM | Not officially documented | Enough headroom to load a spaCy 2.x model plus working memory |
| Disk | Not officially documented | ~30 MB model tarball, plus spaCy 2.x and its dependencies |
| Compiler | Only if Python 3.10+ | Avoid by using Python 3.7/3.8 |
| OS | Linux, macOS, Windows | Linux/macOS – fewer wheel headaches |
Why the Python downgrade? A spaCy community discussion (thread #12514, 2022–2023) confirms it directly: no spaCy v2 release postdates Python 3.10, no prebuilt wheels exist for 3.10+, and building from the sdist on 3.11 fails because of Cython incompatibilities with newer Python versions. The same applies to 2.1.8. So unless you enjoy compiling old Cython, drop down.
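The version constraint is mechanical enough to encode as a pre-flight check. A minimal sketch, assuming the 3.6–3.8 window from the table above (the function name is mine, not part of any tool):

```python
import sys

def python_ok_for_blackstone(version=None):
    """True if this interpreter falls in the 3.6-3.8 window where
    spaCy 2.1.8 installs without compiling old Cython."""
    major, minor = (version or sys.version_info)[:2]
    return (3, 6) <= (major, minor) <= (3, 8)

# Example guard for an install script:
# if not python_ok_for_blackstone():
#     raise SystemExit("Use Python 3.7/3.8 for Blackstone.")
```

Dropping this at the top of a setup script fails fast instead of letting pip die halfway through a Cython build.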
Install Blackstone, step by step
Two halves: the library (Python code), and the model (trained weights on a separate S3 bucket – not PyPI).
1. Create an isolated environment
Non-negotiable. The spaCy 2.1.8 pin will conflict with anything modern, so isolation isn’t just good practice here – it’s the only way this works.
# Use Python 3.7 or 3.8 specifically
python3.8 -m venv blackstone-env
source blackstone-env/bin/activate  # or blackstone-env\Scripts\activate on Windows
pip install --upgrade pip setuptools wheel
2. Install the library
pip install blackstone
Pulls in spaCy 2.1.8 and the requests dependency for the legislation linker. On Python 3.7/3.8 it should grab a wheel and finish in under a minute.
3. Install the model
The model isn’t on PyPI. It lives on an AWS bucket:
pip install https://blackstone-model.s3-eu-west-1.amazonaws.com/en_blackstone_proto-0.0.1.tar.gz
That URL comes straight from the project’s Dockerfile. If the bucket ever vanishes, there’s no PyPI fallback – so download the tarball locally as a backup the first time it works.
Pro tip: after step 3, run:
pip download https://blackstone-model.s3-eu-west-1.amazonaws.com/en_blackstone_proto-0.0.1.tar.gz -d ./model-cache
Stash that tarball in your repo's CI cache or an internal artifact store. The S3 URL is a single point of failure for the entire setup – there's no fallback, no mirror, and no announcement policy if it disappears.
First-time configuration and a quick sanity check
No config file to edit. Loading the model is the configuration. Drop this in a file called verify.py:
import spacy

nlp = spacy.load("en_blackstone_proto")
doc = nlp("The defendant relied on Donoghue v Stevenson [1932] AC 562 "
          "in arguing that section 2 of the Theft Act 1968 did not apply.")

for ent in doc.ents:
    print(f"{ent.text:40} -> {ent.label_}")

print("spaCy:", spacy.__version__)
Run python verify.py. If you see entities tagged CASENAME, CITATION, INSTRUMENT, and PROVISION, plus spaCy: 2.1.8, you're done. Stack trace instead? Jump to the next section.
Common errors and the fixes that actually work
“Could not read config.cfg” when loading the model
The classic. You've installed Blackstone into an environment that already had spaCy 3.x, or upgraded spaCy after the fact. GitHub issue #28 in the ICLRandD/Blackstone repo has the exact stack trace: an OSError inside spaCy's util.py, because load_model_from_path tries to read config.cfg – a file that doesn't exist in the spaCy v2 packaging format, which predates config.cfg entirely.
Fix: re-create your venv, install Blackstone first, and never run pip install --upgrade spacy in that environment. The 2.1.8 pin is the load-bearing wall.
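A cheap way to catch the broken environment early is to assert the pin before loading the model. This guard is illustrative, not part of Blackstone:

```python
def spacy_pin_ok(installed_version: str) -> bool:
    """The proto model predates config.cfg, so only the spaCy 2.1.x line loads it."""
    return installed_version.startswith("2.1.")

# Typical use at the top of any script that loads en_blackstone_proto:
# import spacy
# if not spacy_pin_ok(spacy.__version__):
#     raise RuntimeError(f"Need spaCy 2.1.x, found {spacy.__version__}; "
#                        "rebuild the venv.")
```

A one-line RuntimeError beats deciphering an OSError from deep inside spaCy's model loader.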
Build failures on Python 3.10+ (cymem, murmurhash, blis)
pip starts compiling C extensions and dies. No prebuilt spaCy 2.1.8 wheel exists for your Python version. The fix is to install Python 3.8 via pyenv or conda create -n blackstone python=3.8 – don’t fight it.
Conflicts with other NLP libraries
Try to install Blackstone alongside scispaCy, transformers, or anything else expecting spaCy 3.x and pip will refuse – or worse, half-succeed. Keep Blackstone in its own environment and call it as a microservice if you need it next to modern tooling.
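If you go the microservice route, the contract can stay tiny: text in, entity list out. The response-shaping half is plain Python you can sketch today – inside the service, the tuples would come from doc.ents; all names here are hypothetical:

```python
import json

def entities_payload(ents):
    """Shape (text, label, start_char, end_char) tuples into a JSON body
    for a text-in, entities-out endpoint."""
    return json.dumps({
        "entities": [
            {"text": t, "label": l, "start": s, "end": e}
            for (t, l, s, e) in ents
        ]
    })
```

Keeping the wire format this small means the modern side of your stack never needs to know spaCy 2.x exists.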
An honest detour: should you even use it?
I keep coming back to this. The model is six years old, frozen at F1 ~70% (as measured in 2019), and trained on a corpus you can’t access. For straight named-entity work on UK case law it still pulls its weight – there genuinely is nothing else open-source trained on long-form common-law judgments. But if your task is contract clause extraction, you’d be better off with a modern transformer and a hundred labelled examples. Match the tool to the corpus.
Upgrading and uninstalling
No upgrade path exists. The PyPI page hasn’t moved since August 2019, and the GitHub Releases page remains at the original prototype. If a fork ever ships a spaCy 3.x port, you’ll be installing it from a git URL – not pip install -U blackstone.
To uninstall cleanly:
pip uninstall blackstone en-blackstone-proto spacy
deactivate
rm -rf blackstone-env
Removing the venv is the cleanest cleanup. Note that the model's distribution name is en-blackstone-proto (hyphens), while the import name is en_blackstone_proto (underscores) – if pip uninstall can't find one form, try the other.
FAQ
Is Blackstone still maintained?
No. Treat it as archived.
Can I use Blackstone with American or Canadian case law?
The README suggests the model generalises reasonably well beyond England and Wales – to Australasian, Canadian, and American content (this claim dates from 2019 and hasn’t been independently benchmarked since). In practice: citation formats and statute names that resemble UK conventions get caught; US-style cites like 410 U.S. 113 get partial recall at best. Validate on a held-out sample of your own documents before trusting it on a full corpus.
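A cheap pre-check before committing a non-UK corpus: count which citation convention dominates. These patterns are deliberately simplistic illustrations, nowhere near a full citation grammar:

```python
import re

UK_CITE = re.compile(r"\[\d{4}\]\s+[A-Z][A-Za-z]*\s+\d+")  # e.g. [1932] AC 562
US_CITE = re.compile(r"\b\d+\s+U\.S\.\s+\d+")              # e.g. 410 U.S. 113

def citation_mix(text: str) -> dict:
    """Rough count of UK-style vs US-style citations in a text."""
    return {"uk": len(UK_CITE.findall(text)),
            "us": len(US_CITE.findall(text))}
```

If the mix skews heavily toward US-style cites, expect the recall gap described above and budget for your own evaluation sample.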
Can I run Blackstone in Docker to avoid the Python version mess?
Yes – the repo includes a Dockerfile that installs both the library and the model. Build it on a python:3.8-slim base, push to your internal registry, and you’ve insulated yourself from the spaCy 2.1.8 versioning problem entirely. Expose a Flask or FastAPI endpoint inside it and call it as a service from whatever modern Python stack you’re actually running. One caveat: the image size will vary depending on your base and what else you include – don’t plan around a specific number until you’ve built it yourself.
Next step: spin up a Python 3.8 venv now, run the three pip commands above, and point it at one paragraph from a judgment you actually care about. You’ll know within five minutes whether Blackstone fits your corpus – or whether you need to budget for a custom-trained model instead.