The #1 mistake people make installing PaperQA: they run pip install paper-qa on a fresh machine, watch it explode with a wall of build errors about a missing Cargo, and assume the package is broken. It isn’t. PaperQA2 depends on tantivy, a Rust-based full-text search engine, and when no prebuilt tantivy wheel matches your platform and Python version, pip falls back to a source build that needs a Rust toolchain on PATH. Fix that one thing first and the rest of the setup is straightforward.
This guide covers paper-qa v2026.3.18 – the current calendar-versioned release on PyPI as of March 2026. We skip the marketing and get the CLI answering questions in under ten minutes.
What you’re actually deploying
PaperQA2 is a Python package that does agentic RAG over scientific PDFs with grounded in-text citations. The 2024 paper (arXiv:2409.13740) reports that it outperforms PhD and postdoc-level biology researchers on LitQA2 – that’s the headline. The practical reason to self-host: your PDFs stay local, you control the LLM provider, and the agent runs keyword search plus re-ranking plus contextual summarization instead of naive chunk retrieval.
Naming note. In December 2025 the project moved to calendar versioning, so Git tags now look like v2026.3.18 instead of v5.x.x. “PaperQA2” refers to paper-qa>=5, which covers both the old SemVer tags and the new CalVer tags. If a tutorial tells you to install “version 5” – that’s fine, but you’ll miss recent additions like tables, figures, non-English languages, and math equation parsing (added in the 2025-2026 releases, per the GitHub changelog).
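To make the naming note concrete, here is a minimal sketch that tells the two tag styles apart. The helper names and the "a leading number of 2000 or more is a year" rule are assumptions made for illustration; paper-qa does not ship anything like this.

```python
import re

# Hypothetical helpers for illustration; paper-qa does not ship these.
_TAG = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def _major(tag):
    """Extract the first numeric component of a tag such as 'v5.0.0'."""
    match = _TAG.match(tag.strip())
    if match is None:
        raise ValueError(f"unrecognized tag format: {tag!r}")
    return int(match.group(1))


def is_calver_tag(tag):
    """True for tags like v2026.3.18 (assumes a leading number >= 2000 is a year)."""
    return _major(tag) >= 2000


def is_paperqa2(tag):
    """True for paper-qa>=5, which covers SemVer 5.x and every CalVer tag."""
    return _major(tag) >= 5
```

Under those assumptions, v5.0.0 counts as PaperQA2 but not CalVer, while v2026.3.18 counts as both.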
System requirements
Don’t guess on Python. PaperQA2 (version 5 onward) requires Python 3.11+, per the official Edison Cookbook docs. Older Python 3.12 install reports were dominated by faiss-cpu wheel failures, which went away when v5 dropped the faiss dependency entirely. Today, 3.11 and 3.12 both work; 3.13 works if Rust is installed first.
| Component | Minimum | Recommended |
|---|---|---|
| OS | macOS 12 / Ubuntu 22.04 / Windows 11 + WSL2 | Linux or Apple Silicon |
| Python | 3.11 | 3.12 |
| RAM | 8 GB (cloud LLM) | 32 GB (local LLM via Ollama) |
| Disk | 2 GB (package + cache) | 20 GB+ if storing a PDF corpus |
| Rust toolchain | Required for tantivy source builds | rustup latest stable |
| LLM access | OpenAI API key | Anthropic/Gemini/Ollama via LiteLLM |
For corpora above 100 papers, get API keys for both Crossref and Semantic Scholar to avoid hitting public rate limits on the metadata-fetching step (documented in the Edison Cookbook).
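If you would rather check the two hard requirements before pip gets a chance to fail, a small preflight script can do it. This is a sketch using only the standard library; the function name preflight and its messages are made up for this guide, and it only tests for Python 3.11+ and a cargo binary on PATH.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)  # minimum from the requirements table


def preflight(version_info=None, which=shutil.which):
    """Return a list of problems; an empty list means pip install can proceed."""
    problems = []
    current = tuple((version_info or sys.version_info)[:2])
    if current < MIN_PYTHON:
        problems.append(
            f"Python {current[0]}.{current[1]} found, need "
            f"{MIN_PYTHON[0]}.{MIN_PYTHON[1]}+"
        )
    # cargo only matters when pip has to build tantivy from source
    if which("cargo") is None:
        problems.append("cargo not on PATH; a tantivy source build will fail")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

The version_info and which parameters exist only so the check can be exercised without changing the real interpreter or PATH.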
Install paper-qa v2026.3.18
Use a fresh virtualenv. Always. The dependency tree is large – litellm, pymupdf, tantivy, tiktoken, pydantic-settings – and it conflicts with anything pinned to old openai or pydantic v1.
# 1. Install Rust FIRST if you're on a fresh machine
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"
# 2. Create env
python3.12 -m venv pqa-env
source pqa-env/bin/activate
pip install --upgrade pip
# 3. Install paper-qa (pulls v2026.3.18 from PyPI as of March 2026)
pip install paper-qa
# Or, if you want local embedding model support:
pip install "paper-qa[local]"
The [local] extra pulls sentence-transformers for HuggingFace-based embeddings via SentenceTransformerEmbeddingModel. Skip it if you’re sticking with OpenAI embeddings – it adds roughly 2 GB of torch for no benefit in that case.
First-time configuration
The minimum viable config: one environment variable and a folder of PDFs.
# Cloud LLM (default - uses gpt-4o-2024-11-20)
export OPENAI_API_KEY=sk-...
mkdir my_papers
cd my_papers
# Drop some PDFs in here, then:
pqa ask 'What methods did the authors use?'
That’s it. By default PaperQA2 uses gpt-4o-2024-11-20 for summary_llm, llm, and agent_llm, and text-embedding-3-small for embeddings (per the Edison Cookbook defaults). The CLI parses each PDF in the current directory, fetches metadata from Crossref/Semantic Scholar, builds a tantivy full-text index over the results, then runs the agent loop.
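Since the minimum config is exactly two things, an API key and a folder with PDFs in it, both are easy to sanity-check before the first pqa ask. The sketch below is standard-library Python; check_minimal_config is a hypothetical helper, and it only looks for files ending in lowercase .pdf.

```python
import os
from pathlib import Path


def check_minimal_config(paper_dir, env=None):
    """Return a list of problems with the minimum viable config."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    folder = Path(paper_dir)
    # Only *.pdf is checked here; this glob is case-sensitive on Linux.
    pdfs = sorted(folder.glob("*.pdf")) if folder.is_dir() else []
    if not pdfs:
        problems.append(f"no PDFs found in {folder}")
    return problems


if __name__ == "__main__":
    for problem in check_minimal_config("my_papers"):
        print("WARNING:", problem)
```

An empty list means both preconditions hold; it says nothing about whether the key itself is valid.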
Switching to a local model (Ollama)
This is where most people get stuck. The official path uses LiteLLM model configs:
import os

from paperqa import Settings, ask

# THIS LINE IS THE TRAP - see error 2 under "Common errors and the actual fixes"
os.environ["OPENAI_API_KEY"] = "EMPTY"

# One LiteLLM config, reused for both the answer LLM and the summary LLM
local_llm_config = {
    "model_list": [{
        "model_name": "ollama/llama3.2",
        "litellm_params": {
            "model": "ollama/llama3.2",
            "api_base": "http://localhost:11434",  # default Ollama port
            "timeout": 600,  # critical - 60s default fails on most CPUs
        },
    }]
}

answer = ask(
    "What methods did the authors use?",
    settings=Settings(
        llm="ollama/llama3.2",
        llm_config=local_llm_config,
        summary_llm="ollama/llama3.2",
        summary_llm_config=local_llm_config,
        embedding="ollama/mxbai-embed-large",
    ),
)
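The config above repeats the model name in four places, which makes swapping models error-prone. One way to keep it consistent is to generate the keyword arguments from a single function. This is a sketch, not part of paper-qa: ollama_settings_kwargs is a hypothetical helper, and it only produces the same five fields used in the call above.

```python
def ollama_settings_kwargs(
    model="ollama/llama3.2",
    embedding="ollama/mxbai-embed-large",
    api_base="http://localhost:11434",
    timeout=600,
):
    """Build the keyword arguments used in the Settings(...) call above."""
    llm_config = {
        "model_list": [{
            "model_name": model,
            "litellm_params": {
                "model": model,
                "api_base": api_base,
                "timeout": timeout,  # seconds; generous for CPU inference
            },
        }]
    }
    return {
        "llm": model,
        "llm_config": llm_config,
        "summary_llm": model,
        "summary_llm_config": llm_config,
        "embedding": embedding,
    }
```

Assuming Settings accepts these fields as keywords (as it does in the call above), usage would look like Settings(**ollama_settings_kwargs(model="ollama/llama3.2")).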
Verify it works
Three quick checks – run them in order:
- pqa --version – confirms the CLI binary is on PATH and prints the installed CalVer tag.
- python -c "import paperqa; print(paperqa.__version__)" – confirms the package imports cleanly. ImportError: cannot import name 'AsyncOpenAI' means your openai package is stale; run pip install -U openai to fix it.
- Drop a single PDF into an empty folder and run pqa ask 'Summarize this paper in one sentence.' – a cited answer with cost printed means the full pipeline works.
Pro tip: Before running pqa ask on a real corpus, point it at a single throwaway PDF first. The first query triggers metadata lookups, embedding calls, and agent calls all at once – easier to debug failures one paper at a time than across fifty.
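For a version check that does not depend on the CLI being on PATH, the standard library can read installed package metadata directly. A minimal sketch follows; the list of packages to report is just the ones this guide mentions, and the helper names are made up.

```python
from importlib import metadata

# Distribution names as published on PyPI.
PACKAGES = ("paper-qa", "openai", "litellm", "tantivy")


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report(packages=PACKAGES):
    """Map each distribution name to its version (or None)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, version in report().items():
        print(f"{name:10s} {version or 'NOT INSTALLED'}")
```

This complements the three checks above rather than replacing them: it confirms what is installed in the active environment, not that the pipeline runs.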
Here’s a question worth sitting with before you go further: when something fails, is the error actually from PaperQA? Turns out LiteLLM sits between PaperQA and every model provider, so most of the cryptic tracebacks you’ll hit aren’t PaperQA errors at all – they’re LiteLLM errors wearing PaperQA’s name tag. Search the LiteLLM repo’s issues first whenever you hit an authentication or connection problem. This saves a lot of time.
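One quick way to answer that question is to look at which package defined the exception class, including anything it was raised from. The sketch below uses only Python's exception chaining attributes; the helper names are hypothetical, and it assumes LiteLLM's exception classes live in modules whose names start with litellm.

```python
def exception_origin(exc):
    """Return the top-level package that defined the exception's class."""
    return type(exc).__module__.split(".")[0]


def walk_chain(exc):
    """Yield exc, then each exception it was raised from, outermost first."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))  # guard against cyclic chains
        yield exc
        exc = exc.__cause__ or exc.__context__


def origins(exc):
    """List the defining package for every exception in the chain."""
    return [exception_origin(e) for e in walk_chain(exc)]
```

Inside an except Exception as exc: block around your ask(...) call, print(origins(exc)) would show something like a list containing 'litellm' when the failure started below PaperQA, which tells you whose issue tracker to search.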
Common errors and the actual fixes
These three account for most install threads on the repo.
1. “Cargo, the Rust package manager, is not installed”
The classic. During the pip install of tantivy you get “metadata-generation-failed”, with output saying Cargo is not on PATH and pointing to rustup.rs (GitHub issue #851). Install Rust via rustup, open a new shell so the PATH refreshes, and retry. brew install rust works on macOS but can ship an older Cargo build – rustup is safer for tantivy’s build requirements.
2. Ollama config returns AuthenticationError for gpt-4o-2024-11-20
Genuinely confusing. You set llm="ollama/llama3.2", configure litellm_params, and still get: "AuthenticationError: OpenAIException - The api_key client option must be set... Received Model Group=gpt-4o-2024-11-20". Why is it asking for OpenAI? The config above overrides llm and summary_llm but leaves agent_llm at its gpt-4o-2024-11-20 default, so somewhere in the agent loop – often agent_llm or a fallback during settings validation – LiteLLM probes the default model group (GitHub issues #1040 and #481). The community fix: set OPENAI_API_KEY=EMPTY as an environment variable. Yes, literally the string “EMPTY”. That satisfies the client-side check without making any actual OpenAI calls.
3. APIConnectionError: OllamaException – litellm.Timeout
Sixty seconds. That’s the default LiteLLM timeout – not enough for local models on CPU, especially anything above 7B parameters (GitHub issue #1237). Bump it to 600 in the litellm_params as shown in the config above. This will bite you on the first real query if you skip it.
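The three errors above each have a recognizable substring, so a tiny lookup can turn a pasted traceback into the matching fix. This is a sketch only: the triage helper is hypothetical, and the patterns are the message fragments quoted in this section, which may change between releases.

```python
# (substring to look for, suggested fix), taken from the three errors above
KNOWN_ERRORS = [
    (
        "Cargo, the Rust package manager, is not installed",
        "Install Rust via rustup, open a new shell, then retry pip install.",
    ),
    (
        "The api_key client option must be set",
        "Set the environment variable OPENAI_API_KEY=EMPTY.",
    ),
    (
        "litellm.Timeout",
        "Raise timeout to 600 in litellm_params.",
    ),
]


def triage(message):
    """Return the suggested fix for a known error message, else None."""
    for pattern, fix in KNOWN_ERRORS:
        if pattern in message:
            return fix
    return None
```

A None result means the error is not one of these three, which is the cue to search the LiteLLM and paper-qa issue trackers.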
Upgrade and uninstall
Upgrade in place:
pip install --upgrade paper-qa
pqa --version # confirm new CalVer tag
Watch out before upgrading across major versions: the CalVer scheme (adopted December 2025) signals that releases don’t guarantee index compatibility – always test on a copy of your index folder, not the original. If you built your corpus under an older major version, plan to re-index rather than assume the existing files carry forward.
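Testing on a copy is easier if making the copy is a single call. Here is a minimal sketch using shutil.copytree; backup_index is a hypothetical helper, and the index path you pass in is an assumption you need to confirm for your own setup.

```python
import shutil
import time
from pathlib import Path


def backup_index(index_dir, backup_root=None):
    """Copy an index folder to a timestamped sibling and return the new path."""
    src = Path(index_dir).expanduser()
    if not src.is_dir():
        raise FileNotFoundError(f"index folder not found: {src}")
    root = Path(backup_root).expanduser() if backup_root else src.parent
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = root / f"{src.name}.bak-{stamp}"
    shutil.copytree(src, dest)  # raises if dest already exists
    return dest
```

Run it before pip install --upgrade paper-qa, then point your first post-upgrade query at the copy rather than the original.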
Uninstall is clean:
pip uninstall paper-qa
# Optionally also:
pip uninstall litellm tantivy pymupdf
rm -rf ~/.pqa # cache folder, if you used the default
rm -rf ./my_papers/.pqa_index # local tantivy index
FAQ
Do I really need to pay for OpenAI to use PaperQA2?
No. Any LiteLLM-compatible provider works – Anthropic, Gemini, Together, or fully local via Ollama. The defaults assume OpenAI, so going local requires the extra config shown above.
I have 500 PDFs from Zotero. Will this scale?
The bigger problem isn’t the vector store – it’s metadata throttling. Without Crossref and Semantic Scholar API keys, the metadata-fetching step silently skips documents around the 100-paper mark. You’ll get an index that looks complete but quietly dropped a chunk of your library. Get the API keys first, then worry about corpus size.
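Because the failure mode is silent, it helps to compare what is on disk against what made it into the index. The sketch below is just a set difference; how you obtain the list of indexed file names is left open here and is an assumption on your side, since this guide does not cover an API for listing them.

```python
def missing_from_index(on_disk_names, indexed_names):
    """Return file names present on disk but absent from the index, sorted."""
    return sorted(set(on_disk_names) - set(indexed_names))


if __name__ == "__main__":
    from pathlib import Path

    # Real files in the corpus folder.
    on_disk = [p.name for p in Path("my_papers").glob("*.pdf")]
    # Placeholder: fill this from whatever listing of indexed files you have.
    indexed = []
    for name in missing_from_index(on_disk, indexed):
        print("not indexed:", name)
```

A non-empty result after indexing a large library is a strong hint that the metadata step dropped documents.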
What’s the difference between paper-qa and pqapi?
Two different packages, two different use cases. paper-qa is self-hosted, open-source, your PDFs on your machine. pqapi is the client for the hosted paperqa.app service – no local index, different auth (PQA_API_KEY). Pick pqapi if you’d rather not manage API costs and local storage. Otherwise, paper-qa.
Next step: grab three PDFs from your current research project, drop them in a folder, and run pqa ask 'What are the contradictions between these papers?' – the fastest way to see whether the agent loop is worth the install effort for your specific workflow.