Every tutorial about open source RAG search tells you to download the zip and run the installer script, and promises you’ll be chatting with your PDFs in ten minutes. That’s true on a clean Linux VM. On a real laptop with existing Python, conda, or any prior ML setup, the bundled installer is the slowest way in. Docker is faster, more predictable, and the only path that survives the dependency chaos documented in the project’s own bug tracker.
This guide walks you through deploying Kotaemon v0.11.2 – Cinnamon’s open-source RAG-based tool for chatting with your documents – with the commands that actually work in 2026, plus the five install errors I’ve watched people hit on GitHub.
What you’re actually deploying
Kotaemon is a self-hosted document QA web UI built on Gradio. According to the official README, the default pipeline is hybrid – full-text plus vector retrieval with re-ranking – which matters because pure vector search misses exact-string queries (names, IDs, error codes) that full-text catches instantly.
The project is mature – 25.2k GitHub stars and 2.1k forks as of early 2025 – but “mature” here means “actively patched,” not “frozen.” The latest tag is v0.11.2, released March 4, 2025, which added MCP tool support (per the official release notes). Right before that, v0.11.1 added uv package manager support for faster installation. If you’re reading this months later, check the releases page first – the dependency picture changes faster than the README does.
System requirements (minimum vs realistic)
The official docs don’t publish hard system requirements, so the table below is based on community reports and the official RAM guidance – treat it as rough orientation, not a spec sheet:
| Resource | Minimum | Recommended |
|---|---|---|
| OS | Linux/macOS/Windows 10+ | Linux (Ubuntu 22.04) for Docker |
| CPU | 4 cores | 8+ cores if running local LLMs |
| RAM | 8 GB (API-only) | 16-32 GB for local models |
| Disk | 10 GB | 50 GB+ once you index real documents |
| Python | 3.10 (manual install) | Skip – use Docker |
For local LLMs the official guidance is simple: choose a model smaller than available RAM, leaving about 2 GB free – so a 10 GB model on a machine with 12 GB free is the ceiling, per the Basic Usage docs. Larger models give better answers but throughput on CPU-only machines drops enough to make the interaction feel broken rather than slow.
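The sizing rule above is plain subtraction. Here is a minimal sketch of it, assuming whole gigabytes and the roughly 2 GB headroom quoted above; `max_model_gb` is a hypothetical helper name, not part of Kotaemon:

```shell
# Hypothetical helper, not part of Kotaemon: largest model size (whole GB)
# that still leaves about 2 GB of RAM free, per the rule of thumb above.
max_model_gb() (
  free_gb=$1
  headroom_gb=2
  max=$(( free_gb - headroom_gb ))
  # Never report a negative size on very small machines.
  if [ "$max" -lt 0 ]; then max=0; fi
  echo "$max"
)

max_model_gb 12   # prints 10, matching the 12 GB example above
```

Treat the result as an upper bound for what will load, not as a promise of usable speed.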
The Docker install (recommended path)
Two image flavors. The full image bundles Unstructured for .doc, .docx, and other formats – the pull is bigger, but that’s a one-time cost. The lite image covers most users. Both are published for linux/amd64 and linux/arm64, which includes newer Apple Silicon Macs (per the official README).
For the lite image:
docker run \
  -e GRADIO_SERVER_NAME=0.0.0.0 \
  -e GRADIO_SERVER_PORT=7860 \
  -v ./ktem_app_data:/app/ktem_app_data \
  -p 7860:7860 -it --rm \
  ghcr.io/cinnamon/kotaemon:main-lite
The -v ./ktem_app_data:/app/ktem_app_data mount is non-negotiable. Without it, every container restart wipes your indexed files, conversations, and model settings. The official README confirms all application data lives in ./ktem_app_data.
Swap main-lite for main-full if you’ll feed it Word docs. You’ll thank yourself the first time a client sends a .docx.
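If you're unsure which flavor you need, a rough way to decide is to scan the folder you plan to index. This sketch assumes that Word files are the only trigger for the full image, which is a simplification since the full image covers other formats too. `pick_kotaemon_tag` is a hypothetical helper name.

```shell
# Hypothetical helper: suggest an image tag from the file types in a folder.
# Assumption: only .doc/.docx files trigger the full image.
pick_kotaemon_tag() (
  dir=${1:-.}
  # Case-insensitive match so REPORT.DOCX counts as well.
  match=$(find "$dir" -type f \( -iname '*.doc' -o -iname '*.docx' \) | head -n 1)
  if [ -n "$match" ]; then
    echo "main-full"
  else
    echo "main-lite"
  fi
)

# Usage (not run here):
#   docker pull "ghcr.io/cinnamon/kotaemon:$(pick_kotaemon_tag ./my_docs)"
```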
Manual install (when Docker isn’t an option)
Need to modify source – say, to swap the default vector store for Milvus? Go manual. The sequence from the Milvus integration docs:
conda create -n kotaemon python=3.10
conda activate kotaemon
git clone https://github.com/Cinnamon/kotaemon
cd kotaemon
pip install -e "libs/kotaemon[all]"
pip install -e "libs/ktem"
python app.py
The conda step isn’t optional. The Linux installer fails with “Could not find an activated virtualenv (required). Installation failed” if you run it without one – documented in GitHub issue #425. Running inside an existing virtualenv triggers the same error.
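A quick pre-flight check can confirm which environment is active before you run pip. This is a sketch, not something the project ships. It relies on the `CONDA_DEFAULT_ENV` variable set by `conda activate` and the `VIRTUAL_ENV` variable set by venv/virtualenv, and `active_python_env` is a hypothetical function name.

```shell
# Hypothetical pre-flight check: report which Python environment is active.
# CONDA_DEFAULT_ENV is set by `conda activate`; VIRTUAL_ENV by venv/virtualenv.
active_python_env() {
  if [ -n "${CONDA_DEFAULT_ENV:-}" ]; then
    echo "conda:${CONDA_DEFAULT_ENV}"
  elif [ -n "${VIRTUAL_ENV:-}" ]; then
    echo "venv:${VIRTUAL_ENV}"
  else
    echo "none"
  fi
}

# The env name "kotaemon" matches the conda create command above.
case "$(active_python_env)" in
  conda:kotaemon) echo "ok: conda env 'kotaemon' is active" ;;
  *) echo "warning: run 'conda activate kotaemon' before pip install" ;;
esac
```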
First-time configuration in under five minutes
Open http://localhost:7860. Default login is username admin / password admin (per the Quick Start docs) – change this immediately in the Settings UI. Skipping this on a server with port 7860 exposed is how you wake up to a stranger’s chat history.
Head to the Resources tab and add an LLM. For local models via Ollama:
- api_key: ollama (literal string)
- base_url: http://localhost:11434/v1/ – but read the next paragraph first
- model: llama3.1:8b for chat, nomic-embed-text for embeddings
Docker + Ollama catch: If Kotaemon is running in Docker, replace http://localhost with http://host.docker.internal in the Ollama base_url. The container can’t see the host’s localhost – documented in docs/local_model.md. This single substitution accounts for maybe a third of “my local model won’t connect” issues on the tracker.
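The substitution is mechanical enough to script. Below is a minimal sketch. `to_container_url` is a hypothetical helper, and treating 127.0.0.1 the same as localhost is my own assumption rather than something the Kotaemon docs state.

```shell
# Hypothetical helper: rewrite a loopback base_url so it resolves from inside
# the container. Handling 127.0.0.1 like localhost is an added assumption.
to_container_url() {
  printf '%s\n' "$1" |
    sed -E 's#^(https?://)(localhost|127\.0\.0\.1)#\1host.docker.internal#'
}

to_container_url "http://localhost:11434/v1/"
# prints http://host.docker.internal:11434/v1/
```

On Linux hosts, host.docker.internal is usually not defined by default. Starting the container with `--add-host=host.docker.internal:host-gateway` is the usual workaround, though it is worth checking against your Docker version.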
Verify it works
Container still running? docker ps | grep kotaemon. UI loads at :7860 and you can log in? Good. Now the real test: upload a small PDF in the File Index tab, hit Upload and Index, wait for completion, then ask a question whose answer is buried mid-document. A response with a citation snippet means retrieval is wired correctly. Hallucinated answers with no citations usually mean the embedding model isn’t connected – not the LLM.
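The first two checks can be wrapped in a small smoke test. This is a sketch under assumptions: it needs docker and curl on the host, the default URL is the one used in this guide, and `ui_verdict` and `check_kotaemon` are hypothetical names. It only proves the UI answers HTTP. The upload-and-ask test still has to be done by hand.

```shell
# Pure helper: map an HTTP status code to a simple verdict.
ui_verdict() {
  case "$1" in
    2??|3??) echo "up" ;;
    *)       echo "down" ;;
  esac
}

# Hypothetical smoke check; needs docker and curl, so it is defined, not run.
check_kotaemon() (
  url=${1:-http://localhost:7860}
  if docker ps | grep -q kotaemon; then
    echo "container: running"
  else
    echo "container: not found"
  fi
  # curl reports 000 as the status code when it cannot connect at all.
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")
  echo "ui: $(ui_verdict "$code") (HTTP $code)"
)

# Usage (not run here): check_kotaemon http://localhost:7860
```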
Five real install errors and how to fix them
This is where competitor tutorials stop and reality starts. Every one of these comes from open GitHub issues on the tracker.
1. HfFolder ImportError on Windows. The installer pulls the latest huggingface-hub, which has removed HfFolder, but the bundled Gradio version still imports it – fatal crash on UI startup. Fix (from GitHub issue #833): pin huggingface-hub==0.23.2.
2. LangChain schema collisions. ktem expects an older LangChain schema layout, but a fresh install pulls a newer version where those module paths no longer exist – you’ll see ModuleNotFoundError for langchain.schema on startup. Issue #833 documents this; the fix is pinning LangChain to a compatible version in your venv before reinstalling.
3. Pydantic ceiling deadlock. Turns out kotaemon 0.11.3 requires pydantic<=2.10.6, while mcp and sqlmodel demand >=2.11.0 – pip cannot resolve this (issue #833). Install kotaemon first with the pydantic pin, then install mcp afterward with --no-deps.
4. Port 31415 won’t release on Windows. When Gradio crashes, llama-cpp-python keeps running silently and holds port 31415 – the next run_windows.bat dies with [Errno 10048] error while attempting to bind on address ('127.0.0.1', 31415) (issue #833). Open Task Manager, kill stray python.exe processes, then restart.
5. Missing cachetools / voyageai. The Windows .bat script doesn’t install cachetools or voyageai, so startup halts with ModuleNotFoundError on two separate files (issue #833). Fix: pip install cachetools voyageai. Watch out: installing voyageai can auto-upgrade langchain-core and re-break the LangChain fix from error #2 – check your pinned versions after.
The pattern? Almost every failure here traces back to unpinned dependencies. Docker dodges this by freezing the dependency tree at build time. The script installer doesn’t, and that gap is where all five errors live.
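One way to make the script install behave more like Docker is to keep the pins in a pip constraints file. The sketch below records only the two versions named above. The LangChain pin is left out because no compatible version is given here, and `write_constraints` is a hypothetical helper. Whether pip resolves cleanly with these pins depends on the release you install.

```shell
# Hypothetical sketch: collect the pins mentioned above in one constraints file.
# Only the huggingface-hub and pydantic versions come from the text above.
write_constraints() {
  cat > "${1:-constraints.txt}" <<'EOF'
huggingface-hub==0.23.2
pydantic<=2.10.6
EOF
}

# Usage, inside the activated environment (not run here):
#   write_constraints constraints.txt
#   pip install -c constraints.txt -e "libs/kotaemon[all]"
#   pip install --no-deps mcp
```

The `-c` flag tells pip to respect the listed versions without installing those packages itself, so the file can be reused across reinstalls.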
GraphRAG – but only the right flavor
There’s a footgun here worth knowing before you start indexing. Official MS GraphRAG indexing only works with OpenAI or Ollama – Claude, Gemini, and other providers aren’t supported (per the official README). For those providers, use the NanoGraphRAG implementation: set USE_NANO_GRAPHRAG=true in your environment and Kotaemon picks up your default LLM automatically. No competitor tutorial I’ve seen mentions this; you typically find out after a failed indexing run.
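The provider rule above reduces to a two-way switch. In this sketch the provider names are illustrative labels rather than Kotaemon configuration values, and `nano_graphrag_setting` is a hypothetical helper. Only the `USE_NANO_GRAPHRAG` variable itself comes from the text.

```shell
# Hypothetical helper: decide whether NanoGraphRAG should be enabled.
# Provider names are illustrative labels, not Kotaemon config values.
nano_graphrag_setting() {
  case "$1" in
    openai|ollama) echo "USE_NANO_GRAPHRAG=false" ;;
    *)             echo "USE_NANO_GRAPHRAG=true" ;;
  esac
}

nano_graphrag_setting claude   # prints USE_NANO_GRAPHRAG=true
```

The printed line is meant to be copied into whatever environment Kotaemon starts from. How you pass it to a container is left open here, since that depends on your setup.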
Upgrading and uninstalling
For Docker, upgrading is:
docker pull ghcr.io/cinnamon/kotaemon:main-lite
docker stop <container>
# docker rm <container> is only needed if the container was started without --rm
# then re-run your original docker run command
Your ./ktem_app_data volume carries forward – indexed files, settings, everything. To roll back, pin a specific tag like ghcr.io/cinnamon/kotaemon:v0.11.1 instead of main-lite.
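The upgrade and rollback steps differ only in the tag, so they can share one wrapper. The following is a sketch under assumptions. `KOTAEMON_TAG` is an illustrative variable name that Kotaemon itself does not read, the container is run detached under a name you choose, and the data directory is taken relative to the current working directory.

```shell
# Hypothetical helper: build the image reference from an optional tag.
# KOTAEMON_TAG is an illustrative variable, not something Kotaemon reads.
kotaemon_image() {
  echo "ghcr.io/cinnamon/kotaemon:${KOTAEMON_TAG:-main-lite}"
}

# Defined but not run here: needs docker and a container name.
upgrade_kotaemon() (
  container=$1
  docker pull "$(kotaemon_image)" || exit 1
  docker stop "$container" && docker rm "$container"
  # Same settings as the original docker run, but detached and named.
  docker run -d --name "$container" \
    -e GRADIO_SERVER_NAME=0.0.0.0 \
    -e GRADIO_SERVER_PORT=7860 \
    -v "$PWD/ktem_app_data:/app/ktem_app_data" \
    -p 7860:7860 \
    "$(kotaemon_image)"
)

# Usage (not run here):
#   upgrade_kotaemon kotaemon                        # latest main-lite
#   KOTAEMON_TAG=v0.11.1 upgrade_kotaemon kotaemon   # roll back to a pinned tag
```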
Uninstall: stop the container, run docker rmi ghcr.io/cinnamon/kotaemon:main-lite, then delete ./ktem_app_data for a full wipe. Manual installs: conda env remove -n kotaemon plus deleting the cloned repo directory.
FAQ
Can I run Kotaemon entirely offline with no API keys?
Yes. Pair it with Ollama or llama.cpp, point both LLM and embedding settings to the local provider, disconnect. Quality is noticeably lower than GPT-4-class APIs – but for sensitive documents, it’s the only safe option.
Why does my first indexing run take so long?
Two things hit at once: the embedding model downloads on first use (a few hundred MB for nomic-embed-text), and then every PDF page goes through layout parsing, chunking, embedding, and full-text indexing in sequence. A 200-page PDF on a CPU-only machine with a local embedding model? Expect 10-15 minutes the first time. Subsequent files of similar size run roughly 3× faster because the model stays in memory. The wait is front-loaded, not ongoing.
Is the Docker lite image enough for production?
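Lite handles PDF, HTML, MHTML, and XLSX. Add .doc or .docx users and you need full – there’s no runtime fix inside a running lite container. It’s an image-level decision: switching means stopping the container and re-running it with main-full, while the ./ktem_app_data mount carries your existing data over.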
Start with the lite container, change the admin password, upload a PDF you actually care about. If retrieval feels off, the fastest fix is usually swapping the embedding model – not the LLM.