// the find
Cinnamon/kotaemon
An open-source RAG-based tool for chatting with your documents.
kotaemon is a self-hosted RAG application for document QA, built on Gradio. It ships a working UI out of the box while also exposing its pipeline internals as a Python library for developers who want to build on top of it. It targets teams or individuals who want to run document chat locally or on their own infrastructure without paying for a SaaS wrapper.
The hybrid retrieval setup (full-text + vector + reranking) is the right default: pure vector search on documents fails badly on exact terms and numbers, and most competitors ship vector-only. Citation rendering with direct PDF highlighting and relevance scores is genuinely useful and uncommon in open-source alternatives. The multi-user model with private/public collections is more production-ready than the typical single-user demo project. GraphRAG, LightRAG, and NanoGraphRAG are all pluggable without forking the core, which matters if you're doing knowledge graph work.
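To make the hybrid retrieval point concrete, here is a minimal sketch of why combining a full-text ranking with a vector ranking helps on exact terms. It is not kotaemon's code: the function names, the toy scoring, and the use of reciprocal rank fusion (RRF) as the merge step are all assumptions chosen for illustration, and the reranking stage is left out.

```python
import math
from collections import Counter


def keyword_rank(query, docs):
    """Toy stand-in for full-text search: rank docs by exact query-term hits.

    Only documents with at least one matching term are returned.
    """
    q_terms = query.lower().split()
    scores = {}
    for doc_id, text in docs.items():
        tokens = Counter(text.lower().split())
        hits = sum(tokens[t] for t in q_terms)
        if hits > 0:
            scores[doc_id] = hits
    return sorted(scores, key=lambda d: (-scores[d], d))


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_rank(query_vec, doc_vecs):
    """Toy stand-in for vector search: rank docs by cosine similarity."""
    return sorted(doc_vecs, key=lambda d: (-cosine(query_vec, doc_vecs[d]), d))


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several rankings: each doc scores 1 / (k + rank) per list it appears in."""
    fused = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + position)
    return sorted(fused, key=lambda d: (-fused[d], d))


# Hypothetical data: doc "a" contains the exact invoice number, but its
# (made-up) embedding happens to sit far from the query embedding.
docs = {
    "a": "invoice 4471 total due",
    "b": "payment summary for the quarter",
    "c": "billing overview",
}
doc_vecs = {"a": [0.2, 1.0], "b": [1.0, 0.1], "c": [0.9, 0.3]}

by_keyword = keyword_rank("invoice 4471", docs)
by_vector = vector_rank([1.0, 0.0], doc_vecs)
fused = reciprocal_rank_fusion([by_keyword, by_vector])
```

In this toy setup the vector ranking alone puts the document with the exact invoice number last, while the fused ranking puts it first. A real pipeline would then pass the fused candidates to a reranker.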
The GraphRAG setup supports only OpenAI or Ollama, and the nano-graphrag install 'might introduce version conflicts', with a manual pip uninstall fix documented in the README. That's a rough experience for a feature advertised as a selling point. The two-library split (kotaemon + ktem) is an abstraction that doesn't pay for itself at this scale; it adds indirection without clearly documented extension points for users. Local GGUF inference via llama-cpp-python works, but finding an embedding setup for fully offline use takes extra digging: the README points at Ollama, but the .env examples default to OpenAI models. SQLite is the default store, which will cause pain if you try to run multiple workers or scale beyond a single machine.
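The SQLite concern is easy to reproduce with the standard library alone. The sketch below is generic SQLite behavior, not kotaemon code: the `chats` table and the function name are hypothetical, and the zero timeout is set deliberately so that the second writer fails immediately instead of waiting.

```python
import sqlite3


def second_writer_blocked(db_path):
    """Return True if a second connection cannot start a write transaction
    while the first connection holds one on the same database file."""
    # isolation_level=None lets us issue BEGIN/ROLLBACK explicitly;
    # timeout=0 makes a busy database raise at once instead of retrying.
    first = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    second = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        # Hypothetical table, standing in for whatever the app stores.
        first.execute(
            "CREATE TABLE IF NOT EXISTS chats (id INTEGER PRIMARY KEY, msg TEXT)"
        )
        first.execute("BEGIN IMMEDIATE")  # worker 1 takes the write lock
        first.execute("INSERT INTO chats (msg) VALUES ('from worker 1')")
        try:
            second.execute("BEGIN IMMEDIATE")  # worker 2 tries to write too
        except sqlite3.OperationalError as exc:
            return "locked" in str(exc)
        second.execute("ROLLBACK")
        return False
    finally:
        if first.in_transaction:
            first.execute("ROLLBACK")
        first.close()
        second.close()
```

With a nonzero timeout the second writer would wait rather than fail, but writes to a single SQLite file are still serialized, which is the bottleneck that shows up once several workers share one database.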