// the find
datawhalechina/all-in-rag
🔍 LLM Application Development in Practice, Part 1: A Full-Stack Guide to RAG Technology. Read online: https://datawhalechina.github.io/all-in-rag/
A Chinese-language educational tutorial series covering RAG (Retrieval-Augmented Generation) from first principles through production deployment. It walks through data loading, chunking, vector embeddings, hybrid search, Text2SQL, GraphRAG, and system evaluation with working code examples. Targeted at Chinese-speaking Python developers new to LLM application development.
- Unusually complete coverage: goes from basic document loading all the way through multimodal embeddings, hybrid dense+sparse retrieval, knowledge graph RAG with Neo4j, and RAGAS-style evaluation in one coherent learning path
- Code and docs are co-located per chapter (code/C1 through C9 mirrors docs/chapter1 through chapter9), making it easy to follow along without hunting for examples
- Real end-to-end project in chapters 8-9 using a cooking recipe dataset, complete with docker-compose for Milvus and Neo4j, which is far more useful than toy hello-world examples
- Covers Text2SQL with a proper structured module (knowledge_base, sql_generator, agent) rather than just a single script, showing realistic query-routing architecture
- A folder in C9 is literally named 'agent(代码系ai生成)' ("code is AI-generated") and contains an AI_AGENT_README; shipping AI-generated code in a teaching resource with no sign of review is bad practice and models bad habits for learners
- No dependency pinning beyond a single top-level requirements.txt; LangChain and LlamaIndex have repeatedly shipped breaking API changes, so without locked versions per chapter these notebooks will likely rot within months
- Chapter 10 (second project) was still listed as '规划中' (planned) as of the last push, so the curriculum has a visibly incomplete tail
- An English README exists (README_en.md), but the actual tutorial docs and code comments are Chinese-only, so the material is largely inaccessible to non-Chinese readers despite what the English README suggests
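
The pinning concern above is easy to check mechanically. The following is a minimal sketch, not part of the repository: the function name `unpinned_requirements` and the sample package versions are hypothetical, and the regex only recognizes exact `==` pins, so it will also flag URL or VCS requirements as unpinned.

```python
import re

# Matches an exact pin such as "langchain==0.2.1"; extras like "pkg[extra]==1.0" are allowed.
PINNED = re.compile(r"^[A-Za-z0-9_.\-]+(\[[^\]]+\])?\s*==\s*[^\s;]+")


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that lack an exact '==' version pin."""
    loose = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as "-r other.txt"
            continue
        if not PINNED.match(line):
            loose.append(line)
    return loose


# Hypothetical file contents for illustration; not taken from the repository.
sample = """
langchain==0.2.1
llama-index>=0.10
pymilvus
# a comment line
neo4j==5.20.0  # pinned
"""

print(unpinned_requirements(sample))
```

Running a check like this against the top-level requirements.txt, or against a per-chapter file if one were added, would show which dependencies are free to drift between the time a chapter was written and the time a reader installs it.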