You’re choosing between running Surya OCR on a rented GPU or your local CPU. The GPU costs $0.50/hour on Lambda Labs. Your laptop is free – but will it actually work?
Here’s what nobody mentions: Surya’s published benchmarks ran on a single A6000 (48GB VRAM), cost-matched against Tesseract on 28 CPU cores. Translation: it’ll run on CPU, but you’re trading money for time. For a one-off test, CPU is fine. For batch processing 500 invoices, rent the GPU.
This guide walks through installing Surya OCR locally, verifying it works, and fixing the errors that only show up after you’ve already committed.
What You Actually Need
You’ll need Python 3.10+ and PyTorch. Not 3.9. Not 3.8. The official docs list 3.10 as the floor – older Python versions will install but fail at import time with cryptic module errors.
If you’re on a Mac or a machine without a GPU, install the CPU version of PyTorch first. The default pip install pulls CUDA builds, which bloat your environment by 2GB for hardware you don’t have.
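One way to do that, using the CPU wheel index from PyTorch’s own install selector:

```shell
# Install the CPU-only PyTorch wheel before surya-ocr so pip doesn't pull CUDA builds.
pip install torch --index-url https://download.pytorch.org/whl/cpu
```

Run this before `pip install surya-ocr` and pip will see torch already satisfied, skipping the CUDA download.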
| Component | Minimum | Recommended |
|---|---|---|
| Python | 3.10 | 3.11+ |
| RAM | 8GB | 16GB+ |
| Disk (models) | 3GB free | 5GB free |
| GPU (optional) | None (CPU works) | 16GB VRAM for batch |
Community benchmarks recommend substantial GPU resources for optimal performance – 16GB VRAM for batch processing. But “optimal” and “functional” are different targets.
Install Surya via Pip
One command:
```shell
pip install surya-ocr
```
It pulls dependencies – Pillow, transformers, torch if you skipped it. Model weights will automatically download the first time you run surya. Not during install. During first execution.
This trips people up. You run the install, see “Successfully installed surya-ocr,” then fire a test script and it freezes for four minutes. No progress bar. No log output. Just waiting while 2.8GB of model files download to ~/.cache/huggingface/.
Pro tip: The first run will download models silently – expect 2-4GB and several minutes with no feedback. Don’t kill the process. Check ~/.cache/huggingface/hub/ to confirm download activity.
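To watch the download land without interrupting it, check the cache size from another terminal (the `|| echo` fallback just covers the case where nothing has downloaded yet):

```shell
# Print the current size of the Hugging Face cache, or a notice if it doesn't exist yet.
du -sh ~/.cache/huggingface/hub/ 2>/dev/null || echo "cache not created yet"
```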
You can verify the install without triggering the download:
```shell
pip show surya-ocr
```
Check the version, location, dependencies. If it shows up, the package landed correctly.
Configure Device Settings (Critical for Mac Users)
Settings are documented in surya/settings.py and can be overridden with environment variables. Your torch device will be automatically detected, but you can override this – for example, TORCH_DEVICE=cuda.
Here’s the part that matters: For text detection, the mps device has a bug (on the Apple side) that may prevent it from working properly. If you’re on Apple Silicon, Surya will auto-detect MPS and try to use it – then silently fail or return empty bounding boxes.
Force CPU mode on Mac:
```shell
export TORCH_DEVICE=cpu
```
Add that to your .zshrc or .bashrc if you’re running Surya regularly. The performance hit is real but the alternative is spending an hour debugging why your M2 Max returns blank output.
On Linux with NVIDIA GPU:
```shell
export TORCH_DEVICE=cuda
```
Verify your device choice worked:
```shell
python -c "import torch; print(torch.cuda.is_available())"
```
True means CUDA is ready. False means you’re on CPU.
Run Your First OCR Test
Grab any PDF or image with text. Here’s the Python script:
```python
from PIL import Image

from surya.ocr import run_ocr
from surya.model.detection.model import load_model as load_det_model
from surya.model.detection.model import load_processor as load_det_processor
from surya.model.recognition.model import load_model as load_rec_model
from surya.model.recognition.processor import load_processor as load_rec_processor

image = Image.open("test_document.png")
langs = ["en"]  # Use ISO 639 codes

det_processor, det_model = load_det_processor(), load_det_model()
rec_model, rec_processor = load_rec_model(), load_rec_processor()

predictions = run_ocr([image], [langs], det_model, det_processor, rec_model, rec_processor)

for prediction in predictions:
    for line in prediction.text_lines:
        print(line.text)
```
First run: models download, takes 3-5 minutes. Subsequent runs: 5-15 seconds depending on image size and hardware.
You can find language support for OCR in surya/recognition/languages.py – over 90 languages listed. Use two-letter ISO codes: en, es, zh, ar, etc. Don’t specify more than 4 at once; accuracy drops.
Common Install Errors and Fixes
Error: “No module named ‘transformers’”
You’re in a different Python environment than where you installed surya-ocr. Verify with which python and pip list | grep surya. Match them.
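A quick way to check both at once from inside Python (the helper name is mine; stdlib only):

```python
import importlib.util
import sys

def diagnose(package: str = "surya") -> dict:
    """Report which interpreter is running and whether `package` imports from it."""
    return {
        "interpreter": sys.executable,  # the `which python` answer
        "installed_here": importlib.util.find_spec(package) is not None,
    }

print(diagnose())
```

If `installed_here` is False but `pip show surya-ocr` succeeds, pip and python point at different environments.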
Error: Model loading hangs indefinitely
First run is downloading models. If it truly hangs (>10 minutes), check network or try clearing ~/.cache/huggingface/ and re-running. Firewall/proxy can block Hugging Face downloads.
Error: “ImportError: cannot import name ‘cached_download’”
Surya doesn’t yet work with transformers 4.37+, so stay on 4.36.2 – the version surya installs itself. If another package upgraded transformers, downgrade it:
```shell
pip install transformers==4.36.2
```
Error: Empty output or no bounding boxes detected (Mac M1/M2)
MPS bug. Force CPU with export TORCH_DEVICE=cpu and re-run.
Error: Poor accuracy on high-res scans
If the text is small, increase the image resolution so the characters are bigger. If the resolution is already very high, downscale to no more than 2048px wide. Counterintuitive, but the models were trained on typical document scans (200-300 DPI), not 4K flatbed scans.
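A minimal downscale sketch – the 2048px cap comes from the advice above, and the helper name is mine, not a Surya API:

```python
def capped_size(width: int, height: int, max_width: int = 2048) -> tuple[int, int]:
    """Scale (width, height) down so width <= max_width, preserving aspect ratio."""
    if width <= max_width:
        return width, height
    scale = max_width / width
    return max_width, round(height * scale)

# Usage with Pillow (already a surya dependency):
#   img = img.resize(capped_size(*img.size), Image.LANCZOS)
```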
Verify Installation with the Built-In Streamlit App
Surya includes a Streamlit app that lets you interactively try it on images or PDF files. Run it:
```shell
streamlit run $(python -c "import surya; print(surya.__path__[0])")/app.py
```
Opens in your browser. Upload a test PDF or image. Click process. If text appears with highlighted bounding boxes, your install works.
This is the fastest way to confirm everything – models, dependencies, device config – landed correctly. If the Streamlit app runs, you’re good.
What It Won’t Do (Save Yourself the Debugging)
Surya is specialized for document OCR. It will likely not work on photos or other images. It will also not work on handwritten text per the official limitations.
Translation: don’t feed it phone photos of whiteboards or cursive notes. It’s trained on printed documents – invoices, reports, forms, books. Anything else is a coin flip.
Surya showed consistent results across different formats, including scanned documents, but it tends to ignore large text that looks like a logo – possibly because the model was trained to ignore advertisements, according to financial document benchmarks. If your logo contains critical text, preprocess it out or use a different tool for that section.
One user reported discovering this after processing 200 insurance forms – every company name was missing because it was in a logo box.
FAQ
Can I use Surya commercially?
Model weights use a modified AI Pubs Open Rail-M license – free for research, personal use, and startups under $2M funding/revenue. Above that threshold, you need a commercial license. Code is GPL. Check the official repo for current licensing terms.
Why does my M1 Mac return empty results?
Apple’s MPS backend has a known bug with Surya’s text detection model. Override to CPU with export TORCH_DEVICE=cpu before running. Performance is slower but results will appear. This is an Apple-side issue, not Surya’s fault.
How do I process PDFs with more than 100 pages?
Surya handles large PDFs – community reports confirm processing up to 2000 pages works. For very large batches, use the --page_range parameter to chunk the job: --page_range 0-99, then 100-199, etc. This prevents memory issues and lets you parallelize. Output is JSON, so merging results is straightforward.
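Merging the per-chunk outputs can be sketched like this. The dict-keyed-by-page layout is an assumption about Surya’s JSON, so adjust the merge to whatever structure your version actually emits:

```python
import json
from pathlib import Path

def merge_chunks(chunk_paths):
    """Combine per-chunk JSON results into one dict; later chunks win on key clashes."""
    merged = {}
    for path in chunk_paths:
        merged.update(json.loads(Path(path).read_text(encoding="utf-8")))
    return merged
```

Run each page range to its own output directory, then point `merge_chunks` at the resulting files in order.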
Next step: point Surya at your messiest scanned invoice and watch what it catches – and what it doesn’t. The threshold tuning starts there.