// the find

superlinear-ai/raglite

★ 1,165 · Python · MPL-2.0 · updated May 2026

🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL

RAGLite is a self-contained Python RAG toolkit that handles the full pipeline from document ingestion to LLM response, backed by either DuckDB or PostgreSQL. It skips the LangChain abstraction layer and works directly with LiteLLM, pgvector/DuckDB VSS, and rerankers. Aimed at developers who want a real RAG system without the framework tax.

Late chunking with multi-vector embedding is a genuine improvement over naive fixed-size chunks: it preserves document context that single-vector approaches lose. The closed-form query adapter (an orthogonal Procrustes solution) is an unusual and mathematically sound way to align query embeddings with document embeddings without retraining. DuckDB support means you can run the whole thing locally with no server, which is useful for prototyping or small deployments. Keeping PyTorch and LangChain out of the core dependencies is a meaningful commitment: the install is lightweight and the failure surface is smaller.
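To make the Procrustes idea concrete, here is a minimal NumPy sketch of the textbook closed-form solution. It assumes you already have paired query and target embeddings as rows of two matrices. The function name `fit_query_adapter` and the synthetic data are illustrative only, and RAGLite's own adapter may be formulated differently.

```python
import numpy as np


def fit_query_adapter(queries: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Solve min_R ||queries @ R - targets||_F subject to R.T @ R = I.

    Rows of `queries` and `targets` are paired embeddings of shape (n, d).
    Hypothetical helper for illustration, not RAGLite's API.
    """
    # Cross-covariance between the two embedding sets, shape (d, d).
    m = queries.T @ targets
    # Closed-form solution: with m = U S V^T, the optimal matrix is U V^T.
    u, _, vt = np.linalg.svd(m)
    return u @ vt


# Synthetic check: hide a known orthogonal map, then recover it.
rng = np.random.default_rng(0)
dim = 8
true_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
query_embeddings = rng.normal(size=(50, dim))
target_embeddings = query_embeddings @ true_rotation

adapter = fit_query_adapter(query_embeddings, target_embeddings)
adapted_queries = query_embeddings @ adapter
```

Because the solution is a single SVD of a d×d matrix, fitting is cheap and needs no gradient-based training, which is what makes the approach attractive for a lightweight toolkit.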

Binary integer programming for sentence splitting and semantic chunking sounds rigorous, but it adds a hard dependency on a solver and makes chunking latency unpredictable on large documents; the README doesn't surface this tradeoff. The 'adaptive retrieval' feature (the LLM decides whether to retrieve) burns tokens on the decision itself and can make latency highly variable in production; there's no obvious way to disable it per-request without switching config objects. Multi-tenancy is not addressed at all: if you have multiple users' documents in one database, there's no built-in isolation or access control, so you'd have to build that yourself. The MCP server and Chainlit frontend are useful demos, but the implementation lives in single files with no documented extension points, so customizing them beyond config knobs means forking internals.
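As a rough idea of what building tenant isolation yourself involves, the sketch below filters chunks by owner before any similarity ranking happens. Everything here is hypothetical (`search_for_tenant`, the in-memory arrays, the tenant labels) and none of it is RAGLite's API; in a real deployment the same filter would more likely live in the database query or in row-level security.

```python
import numpy as np


def search_for_tenant(query, chunk_vectors, chunk_tenants, tenant_id, top_k=3):
    """Return (chunk_index, cosine_similarity) pairs restricted to one tenant."""
    # Filter BEFORE ranking so other tenants' chunks can never surface.
    allowed = np.flatnonzero(chunk_tenants == tenant_id)
    if allowed.size == 0:
        return []
    vectors = chunk_vectors[allowed]
    sims = (vectors @ query) / (
        np.linalg.norm(vectors, axis=1) * np.linalg.norm(query)
    )
    # Highest similarity first, truncated to top_k.
    best = np.argsort(-sims)[:top_k]
    return [(int(allowed[i]), float(sims[i])) for i in best]


# Toy data: six chunks alternating between two hypothetical tenants.
rng = np.random.default_rng(1)
chunk_vectors = rng.normal(size=(6, 4))
chunk_tenants = np.array(["alice", "bob", "alice", "bob", "alice", "bob"])
hits = search_for_tenant(chunk_vectors[1], chunk_vectors, chunk_tenants, "bob")
```

Filtering first, rather than ranking everything and discarding foreign results afterwards, keeps another tenant's content out of the candidate set entirely and avoids returning fewer than `top_k` hits.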
