// the find
infiniflow/ragflow
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
RAGFlow is a self-hosted RAG platform with a web UI. It covers the full pipeline, from document ingestion through chunking, embedding, and retrieval to LLM-generated answers with source citations. It targets teams who want a turnkey document Q&A system rather than assembling one from primitives. At 82k stars it's one of the most-starred RAG projects, and it's under active development.
1. Template-based chunking that actually understands document structure, with separate parsers for papers, manuals, laws, resumes, and books instead of naive text splitting. The deepdoc module handles tables and scanned PDFs in ways most RAG stacks skip entirely.
2. A chunk visualization UI lets you see exactly how your documents were split and intervene manually. This is rare and genuinely valuable: most RAG failures start at chunking, and there's usually no way to inspect it.
3. Grounded citations: every answer maps back to specific chunks with traceable sources, not just a claimed reference.
4. The code sandbox for agent steps uses gVisor isolation, which takes security more seriously than most 'AI agent' projects that just exec() untrusted code.
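To make the contrast in point 1 concrete, here is a generic sketch of naive fixed-size splitting next to a splitter that starts a new chunk at each section heading. This is not RAGFlow's deepdoc code or any RAGFlow API; the `Chunk` type, both function names, and the "line starting with a section number is a heading" rule are hypothetical, chosen only to illustrate the idea.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    section: str  # heading this chunk sits under, kept as metadata
    text: str


def naive_split(text: str, size: int) -> list[str]:
    """Fixed-size character windows: blind to headings, may cut mid-word."""
    return [text[i:i + size] for i in range(0, len(text), size)]


# Hypothetical heading rule: a line that starts with a section number like "2" or "2.1".
HEADING = re.compile(r"^\d+(?:\.\d+)*\s+\S")


def structural_split(text: str) -> list[Chunk]:
    """Group lines under the most recent heading, one chunk per section."""
    sections: list[tuple[str, list[str]]] = [("preamble", [])]
    for line in text.splitlines():
        if HEADING.match(line):
            sections.append((line.strip(), []))
        else:
            sections[-1][1].append(line)

    chunks: list[Chunk] = []
    for heading, lines in sections:
        body = "\n".join(lines).strip()
        if body:  # drop sections with no body text, e.g. an empty preamble
            chunks.append(Chunk(heading, body))
    return chunks


if __name__ == "__main__":
    doc = "1 Scope\nThis manual covers setup.\n2 Installation\nRun the installer.\nReboot when done."
    print(naive_split(doc, 20))
    for chunk in structural_split(doc):
        print(chunk)
```

On the sample, the naive splitter's second window is `"covers setup.\n2 Inst"`, which straddles two sections and cuts a heading in half. The structural splitter returns one chunk per section with its heading attached, which is the property that makes the resulting chunks (and any citations pointing at them) easier to inspect.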
1. The infrastructure footprint is punishing: MySQL, MinIO, Elasticsearch, Redis, and the app itself, with a 16GB RAM minimum and required vm.max_map_count tuning. Not something you run on a dev machine or in a lightweight environment.
2. Docker images are x86-only. With no ARM64 images, Apple Silicon and AWS Graviton users must either build from source or run under emulation.
3. The agent workflow is a JSON DSL with ~20 component types, which makes it hard to version-control, test programmatically, or diff meaningfully. The presence of dsl_migration.py suggests the format has broken compatibility before.
4. The codebase mixes Python and Go (cmd/*.go files), has multiple overlapping service-abstraction layers, and shows the scars of fast organic growth. Expect non-trivial onboarding time if you need to go off the happy path.
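The host requirements in point 1 are cheap to check before pulling any images. Below is a minimal preflight sketch for a Linux host. The 262144 threshold is the vm.max_map_count floor commonly cited for Elasticsearch hosts and is an assumption here (confirm it against the current RAGFlow docs); the 16 GiB figure is the minimum quoted above; the function names and the 0.95 slack factor are hypothetical.

```python
from pathlib import Path

# Assumed thresholds; verify against current RAGFlow / Elasticsearch docs.
MIN_MAP_COUNT = 262_144  # vm.max_map_count floor commonly cited for Elasticsearch
MIN_RAM_GIB = 16         # RAM minimum quoted in the review

MAP_PATH = Path("/proc/sys/vm/max_map_count")
MEM_PATH = Path("/proc/meminfo")


def parse_mem_total_kib(meminfo: str) -> int:
    """Extract MemTotal (reported in kB, i.e. KiB) from /proc/meminfo text."""
    for line in meminfo.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found in meminfo")


def preflight(map_count: int, mem_kib: int, slack: float = 0.95) -> list[str]:
    """Return human-readable problems; an empty list means both checks pass."""
    problems = []
    if map_count < MIN_MAP_COUNT:
        # sysctl -w does not survive a reboot; persist via /etc/sysctl.conf or /etc/sysctl.d/.
        problems.append(
            f"vm.max_map_count is {map_count}, need >= {MIN_MAP_COUNT}; "
            f"e.g. sudo sysctl -w vm.max_map_count={MIN_MAP_COUNT}"
        )
    # MemTotal excludes memory reserved by firmware and the kernel, so a "16 GB"
    # machine reports slightly less; the slack factor avoids a false alarm there.
    if mem_kib < MIN_RAM_GIB * 1024 * 1024 * slack:
        problems.append(
            f"MemTotal is {mem_kib / 1024 / 1024:.1f} GiB, need about {MIN_RAM_GIB} GiB"
        )
    return problems


if __name__ == "__main__" and MAP_PATH.exists() and MEM_PATH.exists():
    count = int(MAP_PATH.read_text())
    mem = parse_mem_total_kib(MEM_PATH.read_text())
    issues = preflight(count, mem)
    print("\n".join(issues) if issues else "host meets both checks")
```

The checks take plain values instead of reading /proc themselves, so the logic can be exercised on any OS; only the `__main__` block touches the host.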