// the find

dmayboroda/minima

★ 1,048 · Python · MPL-2.0 · updated Jan 2026

On-premises conversational RAG with configurable containers

Minima is a containerized RAG system for indexing and querying your local documents on-premises. It supports four deployment modes: fully local via Ollama, any OpenAI-compatible API, ChatGPT custom GPT integration, and Claude via MCP. The target user is a developer or small team that needs private document search without sending data to a cloud indexing service.

The four deployment modes are genuinely useful and cover the main tradeoffs: fully air-gapped with Ollama, a bring-your-own vLLM server, or an existing Claude or ChatGPT subscription. The MCP integration is a smart angle: you get Claude's reasoning on top of your own document index without building a custom UI. The custom LLM workflow, which uses function calling to decide whether retrieval is needed at all, is a better design than blindly stuffing retrieved chunks into every prompt.
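To make the retrieval-gating idea concrete, here is a minimal sketch of the pattern, not Minima's actual code. `ModelTurn`, `answer`, `fake_llm`, and `fake_retrieve` are hypothetical names invented for illustration; a real setup would pass a tool schema to an OpenAI-compatible chat endpoint and query the vector store instead of using these stubs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ModelTurn:
    """Simplified model reply: plain text, or a request to call the search tool."""
    text: Optional[str] = None
    tool_query: Optional[str] = None


def answer(
    question: str,
    llm: Callable[[str, Optional[str]], ModelTurn],
    retrieve: Callable[[str], List[str]],
) -> str:
    """Let the model decide whether retrieval is needed before answering."""
    first = llm(question, None)
    if first.tool_query is None:
        # The model answered directly, so no chunks are fetched or injected.
        return first.text or ""
    # The model asked for the search tool: retrieve, then ask again with context.
    context = "\n".join(retrieve(first.tool_query))
    return llm(question, context).text or ""


# Stand-ins for a real model and a real index (assumptions, for the demo only).
def fake_llm(question: str, context: Optional[str]) -> ModelTurn:
    if context is not None:
        return ModelTurn(text=f"Based on the documents: {context}")
    if "report" in question.lower():
        return ModelTurn(tool_query=question)
    return ModelTurn(text="Hello!")


def fake_retrieve(query: str) -> List[str]:
    return ["Q3 revenue grew 12%."]


print(answer("hi", fake_llm, fake_retrieve))
print(answer("What does the report say?", fake_llm, fake_retrieve))
```

The greeting never touches the index, while the document question triggers exactly one retrieval round trip. That is the saving over unconditional chunk stuffing.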

The multi-file docker-compose approach, with four separate YML files for four modes, is a brittle maintenance burden; divergences between the files are inevitable and already visible (reranker skip logic is baked into Dockerfile conditionals rather than compose profiles). The ChatGPT integration relies on Firebase auth and an external custom GPT, so the 'on-premises' claim carries a hard asterisk for that mode. There is no incremental indexing: if your document set is large, you are re-indexing from scratch on rebuild. The embedding model configuration (EMBEDDING_MODEL_ID and EMBEDDING_SIZE as separate env vars) is a footgun: mismatching them silently produces wrong results in Qdrant.
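One way to defuse the embedding footgun would be a startup check that embeds a probe string and compares the vector length against EMBEDDING_SIZE before anything is written to Qdrant. The sketch below is an assumption about how such a guard could look, not something the repository ships; `check_embedding_size` and `fake_embed` are hypothetical names, and the stub stands in for whatever embedding model is actually loaded.

```python
import os
from typing import Callable, Mapping, Sequence


def check_embedding_size(
    embed: Callable[[str], Sequence[float]],
    env: Mapping[str, str] = os.environ,
) -> int:
    """Fail fast if the model's output dimension differs from EMBEDDING_SIZE."""
    model_id = env["EMBEDDING_MODEL_ID"]
    expected = int(env["EMBEDDING_SIZE"])
    # Embed a throwaway string and measure the real vector length.
    actual = len(embed("dimension probe"))
    if actual != expected:
        raise ValueError(
            f"{model_id} produces {actual}-dim vectors, "
            f"but EMBEDDING_SIZE={expected}"
        )
    return actual


# Stub embedder returning a fixed 768-dim vector (assumption, for the demo only).
def fake_embed(text: str) -> Sequence[float]:
    return [0.0] * 768


good_env = {
    "EMBEDDING_MODEL_ID": "sentence-transformers/all-mpnet-base-v2",
    "EMBEDDING_SIZE": "768",
}
print(check_embedding_size(fake_embed, good_env))
```

Run once at container start, a check like this turns a silent mismatch into an immediate, readable error.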
