Install Haystack 2.x: The RAG Framework Most Tutorials Skip

Haystack 2.x installs in 60 seconds - but there's a package name trap, Python version gotcha, and a farm-haystack vs haystack-ai split nobody tells you about.

8 min read · Intermediate

Most RAG tutorials waste 800 words explaining what retrieval-augmented generation is. You already know. You’re here because pip install haystack didn’t work, or you installed the wrong package and nothing imports correctly.

The actual breakage: Haystack changed its package name from farm-haystack to haystack-ai in March 2024 when version 2.0 launched. Both still exist on PyPI. Installing both? Silent import failures. Python 3.9 worked until October 2025 – now you need 3.10+. The Docker base image ships without integrations, so your Chroma connection fails even though the install succeeded.

This is the deployment guide that skips the theory.

Why Haystack 2.x (And Not 1.x, Which Is Dead)

Haystack 1.x reached End of Life on March 11, 2025 (according to the Docker Hub page). Final version: 1.26.4. No security patches, no bug fixes. Following a 2023 tutorial that uses farm-haystack? You’re installing abandoned software.

The 2.x rewrite changed everything: new package name (haystack-ai), completely different API (pipelines replaced nodes), and Python 3.10 minimum (as of version 2.22.0 released in late 2025, per GitHub release notes). You can’t just upgrade – it’s a rewrite.

But 2.x is lean. Benchmarks from January 2026 comparing LangChain, LlamaIndex, and Haystack found Haystack used ~1,570 tokens per query vs LangChain’s ~2,400, with framework overhead under 6 milliseconds. That is a real cost saving if you’re running production queries.
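
To put those token counts in perspective, here is a rough back-of-the-envelope sketch. The per-query figures are the ones quoted above; the price per million tokens is a hypothetical placeholder, not a real quote, so plug in your provider’s rate.

```python
# Token figures quoted in the benchmark paragraph above.
HAYSTACK_TOKENS_PER_QUERY = 1570
LANGCHAIN_TOKENS_PER_QUERY = 2400


def token_savings(queries, price_per_million_usd):
    """Estimate tokens and dollars saved over `queries` requests.

    `price_per_million_usd` is an assumption supplied by the caller.
    """
    saved_per_query = LANGCHAIN_TOKENS_PER_QUERY - HAYSTACK_TOKENS_PER_QUERY
    saved_tokens = saved_per_query * queries
    return {
        "saved_tokens": saved_tokens,
        "relative_reduction": saved_per_query / LANGCHAIN_TOKENS_PER_QUERY,
        "saved_usd": saved_tokens / 1_000_000 * price_per_million_usd,
    }


if __name__ == "__main__":
    # One million queries at an assumed $0.50 per 1M tokens (hypothetical price).
    print(token_savings(1_000_000, 0.50))
```

The difference is 830 tokens per query, roughly a 35% reduction, so the saving scales linearly with query volume.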

Think of the package name split like Python 2 vs 3 – except both versions still install from PyPI, and your code won’t tell you which one you have until it crashes.
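
You don’t have to wait for the crash. A minimal sketch, using only the standard library’s importlib.metadata, that asks the environment which of the two distributions is present. The function names and status labels here are illustrative, not part of Haystack.

```python
from importlib import metadata


def installed_versions(lookup=metadata.version):
    """Return {distribution name: version} for whichever Haystack packages exist.

    `lookup` defaults to importlib.metadata.version; it is a parameter only
    so the function can be exercised without touching the real environment.
    """
    found = {}
    for name in ("farm-haystack", "haystack-ai"):
        try:
            found[name] = lookup(name)
        except metadata.PackageNotFoundError:
            continue
    return found


def diagnose(found):
    """Map the installed set to a short status label (labels are illustrative)."""
    has_v1 = "farm-haystack" in found
    has_v2 = "haystack-ai" in found
    if has_v1 and has_v2:
        return "conflict"      # both installed: uninstall both, reinstall haystack-ai
    if has_v1:
        return "legacy-only"   # only Haystack 1.x is present
    if has_v2:
        return "ok"
    return "missing"


if __name__ == "__main__":
    versions = installed_versions()
    print(versions, "->", diagnose(versions))
```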

System Requirements (The Real Ones)

Minimum (will install but might not run everything):

  • Python 3.10+ (3.9 reached EOL October 2025, Haystack dropped support in 2.22.0)
  • 4 GB RAM for basic in-memory pipelines
  • pip 21.3 or later (older versions can’t resolve dependency groups correctly)
  • Linux, macOS, or Windows with WSL2 (native Windows works but has more edge cases)

Recommended (for actual RAG workloads):

  • Python 3.11 or 3.12 (better performance, officially tested in CI)
  • 16 GB RAM if you’re using local embedders (sentence-transformers models are 400MB+)
  • GPU with CUDA runtime if running local LLMs (optional, most people use API providers)
  • Separate virtual environment (venv or conda) – package conflicts are common

The official docs used to say “Python 3.9+” but that changed. GitHub release notes for 2.22.0 state Python 3.10 minimum.
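
The two hard minimums above (Python 3.10, pip 21.3) are easy to check before you install anything. A small preflight sketch using only the standard library; the helper names are made up for this example, and the thresholds are simply the ones listed in this section.

```python
import re
import sys
from importlib import metadata

MIN_PYTHON = (3, 10)  # minimum stated above for Haystack 2.22.0 and later
MIN_PIP = (21, 3)     # pip minimum listed above


def leading_ints(version, count=2):
    """Parse the first `count` numeric components of a version string."""
    nums = [int(n) for n in re.findall(r"\d+", version)[:count]]
    return tuple(nums + [0] * (count - len(nums)))


def preflight(python_version, pip_version):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python is older than 3.10")
    if leading_ints(pip_version) < MIN_PIP:
        problems.append("pip is older than 21.3")
    return problems


if __name__ == "__main__":
    try:
        pip_version = metadata.version("pip")
    except metadata.PackageNotFoundError:
        pip_version = "0"  # pip not found in this environment
    print(preflight(sys.version_info, pip_version) or "requirements met")
```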

Install Haystack 2.x (Three Commands, One Trap)

Clean environment first. Non-negotiable because of the farm-haystack vs haystack-ai conflict.

python3 -m venv haystack_env
source haystack_env/bin/activate # On Windows: haystack_env\Scripts\activate
pip install --upgrade pip

Now install Haystack 2.x:

pip install haystack-ai

Core install: ~200 MB, 30-60 seconds.

Pro tip: Accidentally installed farm-haystack (Haystack 1.x) before? You’ll get cryptic import errors. Fix: pip uninstall -y farm-haystack haystack-ai && pip install haystack-ai – removes both packages before reinstalling the correct one. The official docs bury this in the troubleshooting section.

Verify:

python -c "from haystack.version import __version__; print(__version__)"

Should see something like 2.25.2 (or latest 2.x version as of early 2026). If you see 1.26.4 or get ImportError, wrong package.
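
If you want that check in a script (say, at the top of a deploy job), a minimal sketch could look like this. The classification labels are made up for illustration; the only Haystack-specific piece is the haystack.version import already used above.

```python
def classify_haystack_version(version):
    """Classify a version string by its major component."""
    major = version.strip().split(".", 1)[0]
    if not major.isdigit():
        return "unknown"
    if int(major) >= 2:
        return "2.x"
    return "1.x or older (end of life)"


if __name__ == "__main__":
    try:
        from haystack.version import __version__
    except ImportError:
        # Either nothing is installed or the wrong package is.
        print("haystack is not importable in this environment")
    else:
        print(__version__, "->", classify_haystack_version(__version__))
```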

Optional Dependencies (What Breaks When You Skip This)

Haystack’s core install is minimal. Features like PDF parsing, specific vector databases, or file converters need extra packages. You won’t know until runtime.

Error you’ll see:

ImportError: "Haystack failed to import the optional dependency 'pypdf'. Run 'pip install pypdf'."

Common optional dependencies:

  • pip install pypdf – for PDF processing (PyPDFToDocument component)
  • pip install sentence-transformers – for local embedding models
  • pip install chroma-haystack – if using ChromaDB as vector store
  • pip install opensearch-haystack – if using OpenSearch document store

The docs don’t list all possible dependencies upfront. Haystack’s lazy error messages tell you what to install only after you try to use the feature. Keeps the base install small – but annoying when deploying.
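
One way to take the sting out of this at deploy time is to probe for the modules you know your pipeline needs before it starts. A sketch using importlib.util.find_spec from the standard library. The module-to-package mapping is an assumption based on the list above, and the Chroma import path in particular should be checked against the integration’s own docs.

```python
from importlib.util import find_spec

# Import name -> pip package. Assumed mapping; adjust to your pipeline.
OPTIONAL_DEPS = {
    "pypdf": "pypdf",
    "sentence_transformers": "sentence-transformers",
    # Assumed import path for the chroma-haystack integration:
    "haystack_integrations.document_stores.chroma": "chroma-haystack",
}


def is_importable(module_name):
    """True if the module can be located without actually importing it."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A dotted name whose parent package is missing raises ImportError.
        return False


def missing_packages(required=OPTIONAL_DEPS, probe=is_importable):
    """Return the pip package names whose modules cannot be found."""
    return sorted(pkg for mod, pkg in required.items() if not probe(mod))


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("pip install " + " ".join(missing))
    else:
        print("all optional dependencies found")
```

Run it in CI or as the first step of your container entrypoint, so a missing package fails the build instead of the first request.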

Verify It Actually Works (Not Just Installs)

Installation succeeding ≠ your components will run. Test with a minimal pipeline (no API keys needed):

from haystack import Pipeline
from haystack.components.builders import PromptBuilder

pipeline = Pipeline()
prompt_builder = PromptBuilder(template="Test: {{query}}")
pipeline.add_component("prompt", prompt_builder)

result = pipeline.run({"prompt": {"query": "hello"}})
print(result)

Runs without errors? Core Haystack works. Now test with an actual LLM call (requires API key):

import os
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

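# Quick test only; in real deployments export OPENAI_API_KEY in the shell instead.
os.environ["OPENAI_API_KEY"] = "your-key-here"

generator = OpenAIChatGenerator(model="gpt-4o-mini")
result = generator.run(messages=[ChatMessage.from_user("Say 'working'")])
# Recent 2.x releases expose the reply text as .text (older ones used .content).
print(result["replies"][0].text)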

Common failure: ImportError: cannot import name 'OpenAIChatGenerator' – you’re running Haystack 1.x code on a 2.x install, or vice versa. The API changed completely between versions.
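
Another easy one to trip over is running the test with the placeholder key still in place. A tiny fail-fast sketch; require_env is a hypothetical helper, not a Haystack function, and the placeholder string is the one used in the snippet above.

```python
import os


def require_env(name, env=None):
    """Return the variable's value or raise a clear error if it is unusable.

    `env` defaults to os.environ; passing a dict makes the check testable.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value or value == "your-key-here":
        raise RuntimeError(f"{name} is not set; export it before running the pipeline")
    return value


if __name__ == "__main__":
    try:
        require_env("OPENAI_API_KEY")
        print("OPENAI_API_KEY looks set")
    except RuntimeError as exc:
        print(exc)
```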

Docker Deployment (The Base Image Gotcha)

Official image: deepset/haystack:base-v2.25.2 (check Docker Hub for latest version tag as of early 2026).

Pull and run:

docker pull deepset/haystack:base-v2.25.2
docker run -it --rm deepset/haystack:base-v2.25.2 python -c "from haystack.version import __version__; print(__version__)"

The trap: base image contains only core Haystack. No integrations. App uses ChromaDB, OpenSearch, or any external vector store? Custom Dockerfile needed:

FROM deepset/haystack:base-v2.25.2

RUN pip install chroma-haystack sentence-transformers

COPY your_pipeline.py /app/
WORKDIR /app

CMD ["python", "your_pipeline.py"]

Build: docker build -t my-haystack-app .

Most users assume the Docker image is batteries-included. Then their ChromaDocumentStore import fails with MissingDependency. The official docs mention this in the Docker page – easy to miss.
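
If you maintain several services with different integrations, you can generate that Dockerfile from a package list instead of hand-editing it. A sketch under stated assumptions: render_dockerfile is a hypothetical helper, and the layout simply mirrors the Dockerfile shown above, with pip’s --no-cache-dir flag added to keep the layer smaller.

```python
def render_dockerfile(base_tag, packages, script="your_pipeline.py"):
    """Render a Dockerfile that layers integrations on the Haystack base image."""
    lines = [f"FROM deepset/haystack:{base_tag}", ""]
    if packages:
        # Sorted so the output is stable regardless of input order.
        lines += ["RUN pip install --no-cache-dir " + " ".join(sorted(packages)), ""]
    lines += [
        f"COPY {script} /app/",
        "WORKDIR /app",
        "",
        f'CMD ["python", "{script}"]',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile("base-v2.25.2", ["chroma-haystack", "sentence-transformers"]))
```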

For local testing, docker-compose works (6GB+ recommended). Production? Custom Dockerfile with only your dependencies is cleaner.

Common Install Errors (Real Messages From GitHub Issues)

| Error | Cause | Fix |
| --- | --- | --- |
| ERROR: No matching distribution found for scikit-learn>=1.0.0 | Python version too old (pre-3.8) or pip not upgraded | pip install --upgrade pip then retry |
| AttributeError: type object 'MissingDependency' has no attribute 'load' | Trying to use a feature (e.g., FAISS) without installing its optional dependency | Install the missing package: pip install faiss-cpu |
| ImportError: cannot import name 'Pipeline' from 'haystack' | Installed farm-haystack (1.x) but using 2.x code examples | pip uninstall farm-haystack && pip install haystack-ai |
| OSError: sndfile library not found (Docker) | System dependency missing for audio processing components | Add to Dockerfile: RUN apt-get update && apt-get install -yqq libsndfile1 |

Pulled from actual GitHub issues (#1993, #2982, #3860). The audio library error? System dependency, not a Python package – pip won’t fix it.
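
If you see these often enough (CI logs, support channels), the same lookup can live in a small script. A sketch that matches on substrings of the messages listed in this section; suggest_fix and the table it reads are illustrative, not an official tool.

```python
# (substring of the error message, suggested fix), taken from the errors above.
KNOWN_ERRORS = [
    ("No matching distribution found for scikit-learn",
     "pip install --upgrade pip, then retry"),
    ("'MissingDependency' has no attribute",
     "install the missing optional package, e.g. pip install faiss-cpu"),
    ("cannot import name 'Pipeline' from 'haystack'",
     "pip uninstall farm-haystack && pip install haystack-ai"),
    ("sndfile library not found",
     "install the system library libsndfile1 with apt-get; pip cannot fix this"),
]


def suggest_fix(message):
    """Return the first matching fix, or None if the error is not recognized."""
    for needle, fix in KNOWN_ERRORS:
        if needle in message:
            return fix
    return None


if __name__ == "__main__":
    print(suggest_fix("OSError: sndfile library not found"))
```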

Upgrade From Haystack 1.x (You Can’t – It’s a Migration)

No in-place upgrade path. Haystack 2.x rewrote the API. Key changes per the official migration guide:

  • Pipelines replaced Nodes – same concept, completely different syntax
  • PromptNode is gone – use PromptBuilder + generator components
  • Package name changed: farm-haystack → haystack-ai
  • Document stores now provide retriever components (1.x had separate Retriever classes)

Got a 1.x app? You rewrite it. A medium-complexity pipeline (retriever + ranker + generator) takes 4-8 hours to port. The 2.x docs have a migration guide – more “conceptual mapping” than “run this script.”

Uninstall 1.x first:

pip uninstall -y farm-haystack
pip install haystack-ai

Then rewrite your pipeline using 2.x documentation.
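
Before you start, it helps to know how much 1.x code you actually have. A rough sketch that scans source text for a few 1.x telltales: the haystack.nodes import path, PromptNode, and the old package name. The marker list is a starting point I’m assuming here, not an exhaustive inventory of 1.x APIs.

```python
# Marker -> hint. A deliberately small, assumed list of 1.x telltales.
V1_MARKERS = {
    "haystack.nodes import": "1.x node imports; 2.x components live under haystack.components",
    "PromptNode": "replace with PromptBuilder plus a generator component",
    "farm-haystack": "old package name; install haystack-ai instead",
}


def find_v1_usage(source):
    """Return (line number, marker, hint) for every 1.x marker found in `source`."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for marker, hint in V1_MARKERS.items():
            if marker in line:
                hits.append((lineno, marker, hint))
    return hits


if __name__ == "__main__":
    sample = "from haystack.nodes import PromptNode\nfrom haystack import Pipeline\n"
    for lineno, marker, hint in find_v1_usage(sample):
        print(f"line {lineno}: {marker} -> {hint}")
```

Point it at each .py file and requirements.txt in the project; the hit count gives a first estimate of the porting effort.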

Uninstall / Cleanup

Remove Haystack. Note that pip uninstall leaves its dependencies in place; deleting the virtual environment removes everything:

pip uninstall -y haystack-ai
pip uninstall -y farm-haystack # if you had both installed
rm -rf haystack_env # delete virtual environment if done

Docker cleanup:

docker rmi deepset/haystack:base-v2.25.2

Used a custom Dockerfile? Remove your built image:

docker rmi my-haystack-app

What To Deploy Next

Haystack installed. Now you need:

  1. A vector database – InMemoryDocumentStore works for prototyping but doesn’t persist. Production options: Chroma (easy, local-first), Qdrant (fast, self-hostable), Pinecone (managed, expensive).
  2. An embedding model – sentence-transformers/all-MiniLM-L6-v2 is fast and free (local). OpenAI’s text-embedding-3-small costs $0.02 per 1M tokens but has better recall.
  3. An LLM provider – OpenAI (easiest), Anthropic (best for long context), or local via Ollama (free but you need GPU).
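
To see what the hosted embedding option costs for your corpus, the arithmetic is short. A sketch using the $0.02 per 1M tokens figure quoted above; the document count and average length in the example are made-up inputs.

```python
def embedding_cost_usd(total_tokens, usd_per_million=0.02):
    """Cost of embedding `total_tokens` at a flat per-million-token price."""
    return total_tokens / 1_000_000 * usd_per_million


if __name__ == "__main__":
    # Hypothetical corpus: 10,000 documents averaging 500 tokens each.
    docs, avg_tokens = 10_000, 500
    print(f"${embedding_cost_usd(docs * avg_tokens):.2f}")
```

At that rate a 5M-token corpus costs about ten cents to embed once, so the trade-off against a local model is usually about latency and data locality rather than price.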

Haystack’s modular design means you can swap any of these without rewriting your pipeline. That’s the actual value – not the install, which is trivial once you know the package name trap.

Deploy your first pipeline with the official quick start guide, which shows a working Agent-based RAG in ~20 lines. Then profile it – Haystack’s low token usage (as shown in the January 2026 benchmark) means it’s cheaper at scale than LangChain, but only if you architect the retrieval step correctly.

FAQ

Do I install farm-haystack or haystack-ai?

haystack-ai. farm-haystack is Haystack 1.x (reached End of Life in March 2025 per official docs). Installing both in the same environment? Imports fail silently. Uninstall both first: pip uninstall -y farm-haystack haystack-ai, then install only haystack-ai.

Why does my Dockerfile fail with ‘MissingDependency’ even though I installed Haystack?

Base Docker image (deepset/haystack:base-v2.x) contains only core Haystack. No integrations. Need Chroma? Build a custom Dockerfile that runs pip install chroma-haystack on top of the base image. Docs mention this on the Docker page, but most users assume the image is fully loaded and hit this error during deployment.

Can I run Haystack 2.x on Python 3.9?

Not anymore. Python 3.9 reached End of Life in October 2025, and Haystack 2.22.0 dropped support for it. You need Python 3.10 or later. Older tutorials still reference 3.8-3.9 – those are outdated. Stuck on 3.9? Use Haystack 2.21.0 or earlier, but you won’t get security updates. Upgrade Python instead – 3.11 and 3.12 are officially tested and faster.
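
If you have to support both situations from one setup script, the rule above reduces to a single branch. A sketch; haystack_requirement is a hypothetical helper, and the "<2.22" pin just encodes "2.21.0 or earlier" as a pip requirement specifier.

```python
import sys


def haystack_requirement(version_info=None):
    """Pick a pip requirement string based on the interpreter version."""
    if version_info is None:
        version_info = sys.version_info
    if tuple(version_info[:2]) >= (3, 10):
        return "haystack-ai"
    # Python 3.9: stay below 2.22.0, which dropped 3.9 support.
    return "haystack-ai<2.22"


if __name__ == "__main__":
    print(f'pip install "{haystack_requirement()}"')
```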