Most “chat with your documents” tools want your data on their servers. Kotaemon flips that – it’s an open-source document AI chat app you self-host, point at any LLM you like, and run from a single Docker container or a one-line shell script. Cinnamon’s project passed 25k GitHub stars (as of the v0.11.1 release in February 2025) and ships a Gradio UI with multi-user login, hybrid retrieval, and inline PDF citations.
This guide covers v0.11.2 (released March 4, 2025) – current latest as of this writing – with the install path that works on the first try, plus four GitHub-issue errors that will eat your afternoon if nobody warns you first.
Pick your install method first
Three options exist. They’re not equivalent.
| Method | Best for | Speed |
|---|---|---|
| Docker (lite/full) | Servers, headless deploys, isolation | Pull once, fast restarts |
| `run_uv.sh` script | Local dev, faster cold install | Lowest-friction setup |
| Conda + pip editable | Contributors, custom forks | Slowest, most flexible |
Which one is right depends on whether you trust your network more than your patience. Docker means a large pull up front and zero dependency headaches afterward; uv means installing from source in a couple of minutes instead of the 8-10 minutes conda typically takes to resolve the same environment (your mileage varies by hardware and network).
The uv script landed in v0.11.1, and the official README now lists it as the recommended non-Docker option. It auto-installs uv, creates a Python 3.10 venv, installs all dependencies, sets up PDF.js, and launches the app in one shot.
System requirements
Per the official README:
- Python: >= 3.10 – only for non-Docker installs
- Docker: optional, recommended for production
- Unstructured: only if you need formats beyond .pdf, .html, .mhtml, .xlsx
- Architecture: linux/amd64 or linux/arm64 – both officially supported, including Apple Silicon
- RAM: The official docs don’t publish hard minimums. Community reports suggest 8 GB is a floor for API-backed LLMs, 16 GB+ for local models – treat those as rough estimates, not guaranteed specs.
- Disk: The lite image is meaningfully smaller than full (~2-3 GB larger for full, based on community pull reports – check `docker pull` output for current sizes, as this may have changed).
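Before a non-Docker install, it's worth confirming the interpreter actually meets the version floor. A minimal pre-flight sketch (it assumes your interpreter is invoked as `python3` – adjust if yours isn't):

```shell
# Pre-flight check for a source install: the README requires Python >= 3.10.
# `python3` is an assumption - point it at whatever interpreter you plan to use.
ver=$(python3 -c 'import sys; print("%d %d" % sys.version_info[:2])')
major=${ver% *}
minor=${ver#* }
if [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 10 ]; }; then
  echo "OK: Python $major.$minor satisfies >= 3.10"
else
  echo "Python $major.$minor is too old for Kotaemon (need >= 3.10)" >&2
fi
```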
Install with Docker
Pull the lite image and run it. The volume mount on ./ktem_app_data is not optional – skip it and you’ll lose your indexed documents the moment the container stops.
```shell
docker run \
  -e GRADIO_SERVER_NAME=0.0.0.0 \
  -e GRADIO_SERVER_PORT=7860 \
  -v ./ktem_app_data:/app/ktem_app_data \
  -p 7860:7860 -it --rm \
  ghcr.io/cinnamon/kotaemon:main-lite
```
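If you prefer declarative config, the same run can be expressed as a compose file. This is a minimal sketch of my own (the service name and file layout are my choices, not from the project); it mirrors the flags above:

```yaml
# docker-compose.yml - sketch equivalent of the docker run command above
services:
  kotaemon:
    image: ghcr.io/cinnamon/kotaemon:main-lite
    ports:
      - "7860:7860"
    environment:
      GRADIO_SERVER_NAME: "0.0.0.0"
      GRADIO_SERVER_PORT: "7860"
    volumes:
      - ./ktem_app_data:/app/ktem_app_data   # persistence - do not omit
```

Start it with `docker compose up -d`; the bind mount carries your index across restarts exactly as with `docker run`.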
For .doc/.docx/.pptx support, swap the tag to main-full – same command, ~2-3 GB larger image. On Apple Silicon, pass --platform linux/arm64 explicitly. The implicit default sometimes resolves wrong (per issues #132 and #257), and the fix is just that one flag.
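Rather than remembering the flag, you can derive it from the host. A small sketch that maps `uname -m` output to the right `--platform` value:

```shell
# Choose the --platform flag from the host architecture, so M-series Macs
# never hit the Rosetta mismatch described in the errors section below.
arch=$(uname -m)
case "$arch" in
  arm64|aarch64) platform=linux/arm64 ;;
  x86_64|amd64)  platform=linux/amd64 ;;
  *) echo "unrecognized arch: $arch, defaulting to amd64" >&2
     platform=linux/amd64 ;;
esac
echo "add this to docker run: --platform $platform"
```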
> The .env trap: the `.env` file only seeds the database on the first run and is ignored afterward. Change a key in `.env` later and nothing happens. Set keys in the Resources tab in the UI instead – community issue #138 documents exactly how this bites people.
Install without Docker (uv method)
```shell
git clone https://github.com/Cinnamon/kotaemon
cd kotaemon
bash scripts/run_uv.sh
```
That’s the whole install. The conda alternative still works – `conda create -n kotaemon python=3.10`, then `pip install -e "libs/kotaemon[all]"` and `pip install -e "libs/ktem"` – but expect a longer wait and manual venv management.
First-time configuration
Open http://localhost:7860. Default credentials are admin / admin – change them immediately on the Settings page, or anyone on your network owns your document index.
- Go to Resources → LLMs and add a provider (OpenAI, Azure, Ollama, Groq, or local llama-cpp). Then add an embedding model under Resources → Embeddings. Mark both as Active. If you skip the active toggle, nothing works downstream – this is the most common first-run mistake.
- Go to File Index, drop a PDF, click Upload and Index.
- Switch to Chat and ask something specific to the file.
If the answer comes back with a citation pill that opens the source PDF on click, you’re done. No citation, or “low relevance” warnings everywhere? Your embedding model isn’t loaded – go back to Resources and check the green active dot.
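Before debugging retrieval, confirm the UI is even reachable. A quick liveness probe, assuming the default port mapping of 7860:

```shell
# Probe the Gradio UI (assumes the default 7860:7860 port mapping).
# Falls back to "000" if curl fails or the server is not up yet.
code=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:7860/ 2>/dev/null) || code=000
if [ "$code" = "200" ]; then
  echo "UI is up at http://localhost:7860"
else
  echo "UI not reachable yet (HTTP $code) - is the container running?"
fi
```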
Four errors from the GitHub tracker
1. “Illegal instruction (core dumped)” – container exits silently. The fix is one flag: --platform linux/arm64. That’s the entire solution (documented in issues #132 and #257, both tracing to architecture mismatches on M-series Macs running x86 images under Rosetta).
2. LanceError(IO): failed to shutdown object writer when uploading a file. The docstore path isn’t writable inside the container (per issue #539). The -v ./ktem_app_data:/app/ktem_app_data mount in the Docker command above prevents it. Already started without the mount? Stop the container, add the mount, restart.
3. “GraphRAG dependencies not installed” warning at startup. Don’t try to fix this by running pip install inside the container – it creates a version conflict with kotaemon’s pinned packages (issue #545). Use NanoGraphRAG instead: the README recommends it specifically for this reason. Check the README for the exact launch flag, as environment variable names may have changed since this writing.
4. API key changes in .env have no effect. Already covered in the blockquote above. Short version: use the UI.
None of these appear in the official tutorials. Which raises a fair question: how many similar gotchas exist in less-trafficked parts of the tracker?
Upgrade and uninstall
One line:
```shell
docker pull ghcr.io/cinnamon/kotaemon:main-lite
```
Then re-run the same docker run command as before. Your ./ktem_app_data folder carries forward – all indexed documents and conversations survive (that folder is also how you move the install to a new machine; copying it is officially supported).
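Since that folder is the whole state, a timestamped tarball before each upgrade is cheap insurance. A sketch (the `mkdir -p` is only there so the snippet also runs on a machine without an existing install):

```shell
# Snapshot ktem_app_data before pulling a new image - the folder holds all
# indexed documents and conversations, so this tarball is a full backup.
src=./ktem_app_data
mkdir -p "$src"   # no-op on a real install; lets the sketch run anywhere
stamp=$(date +%Y%m%d-%H%M%S)
backup="ktem_app_data-$stamp.tar.gz"
tar -czf "$backup" "$src"
echo "wrote $backup"
```

Restoring on a new machine is the reverse: extract the tarball next to where you run `docker run`, and the same volume mount picks it up.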
For source installs: git pull, then re-run bash scripts/run_uv.sh. The script is idempotent – safe to run again.
To uninstall: stop the container, delete ./ktem_app_data if you want a clean slate, then docker rmi ghcr.io/cinnamon/kotaemon:main-lite. Source installs: delete the cloned directory plus the env (conda env remove -n kotaemon or just delete the .venv folder uv created).
FAQ
Can I run Kotaemon entirely offline with a local model?
Yes – pair it with Ollama (ollama pull llama3.1:8b + ollama pull nomic-embed-text), then point Kotaemon’s Resources tab at http://host.docker.internal:11434 if you’re running inside Docker. No external API calls at that point. One practical caveat: leave at least 2 GB of RAM headroom above your model size, otherwise inference starts swapping to disk and slows to a crawl.
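The headroom rule is simple enough to budget in one line. A sketch with a hypothetical figure (5 GB is a rough size for an 8B model at q4 quantization – substitute your model's actual footprint):

```shell
# Rough RAM budget for the caveat above: model size + 2 GB headroom.
# model_gb is a hypothetical figure - check your model's real on-disk size.
model_gb=5
headroom_gb=2
need_gb=$((model_gb + headroom_gb))
echo "A ${model_gb} GB model wants ~${need_gb} GB of free RAM to avoid swapping"
```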
Lite vs full image?
Lite, unless you need .doc, .docx, or .pptx. That’s the only reason to take the larger image.
Why does the first chat take so long even though files are already indexed?
Two things stack on the first query: the LLM connection is established lazily, and the embedding model only loads when retrieval actually runs. After that first message, subsequent queries are much faster – usually under a second for the retrieval step. If it stays slow after warmup, the re-ranker is probably running on CPU. Either switch to a smaller re-ranking model in Settings, or disable re-ranking entirely if your retrieval quality is already acceptable without it. A lot of people assume the index is the bottleneck. It usually isn’t.
Next: spin up the lite container with the docker run command above, drop a PDF you actually need to query, and time how long the first useful answer takes. That number is your baseline for every config tweak afterward.