// the find
1517005260/graph-rag-agent
拼好RAG ("pieced-together RAG"): a hand-built fusion of GraphRAG, LightRAG, and Neo4j-llm-graph-builder for knowledge graph construction and search. Combines DeepSearch for private-domain RAG reasoning. Includes a custom evaluation framework for GraphRAG.
A from-scratch GraphRAG implementation that combines the approaches of Microsoft's GraphRAG, LightRAG, and Neo4j's llm-graph-builder into a single Python system with Neo4j as the graph backend. It layers a DeepSearch-style multi-step reasoning loop on top of the knowledge graph and targets private-domain QA (the sample data is a Chinese university's student handbook). The multi-agent Plan-Execute-Report architecture is the most ambitious piece: a full pipeline from task decomposition through parallel retrieval to long-document generation with consistency checking.
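To make the Plan-Execute-Report shape concrete, here is a minimal sketch of the three stages. It is not the repository's code: `plan`, `execute`, `report`, and `run_pipeline` are hypothetical names, the decomposer is a toy string split standing in for an LLM call, and the retrieval step returns a placeholder instead of querying Neo4j.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class SubTask:
    question: str


def plan(query: str) -> list[SubTask]:
    """Plan: split a compound question on ' and ' (toy stand-in for an LLM decomposer)."""
    return [SubTask(part.strip()) for part in query.split(" and ") if part.strip()]


def execute(task: SubTask) -> str:
    """Execute: hypothetical retrieval step; a real one would query the graph."""
    return f"[finding for: {task.question}]"


def report(query: str, findings: list[str]) -> str:
    """Report: assemble the findings into one document, one line per sub-task."""
    return "\n".join([f"# {query}", *findings])


def run_pipeline(query: str) -> str:
    tasks = plan(query)
    # Sub-tasks are independent, so they can run in parallel.
    # pool.map returns results in the same order as the input tasks.
    with ThreadPoolExecutor(max_workers=4) as pool:
        findings = list(pool.map(execute, tasks))
    return report(query, findings)
```

Calling `run_pipeline("leave rules and scholarship criteria")` yields a three-line report: a title line followed by one finding per sub-task. Keeping the stages as separate functions is what lets each one be logged and inspected on its own.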
The Plan-Execute-Report orchestration is genuinely well-structured: Clarifier, TaskDecomposer, and PlanReviewer are separate components rather than one monolithic planner, which makes the reasoning trace debuggable. The consistency checker catches the system's own hallucinations and logs specific corrections with severity levels. The example output shows it correctly flagging a fabricated citation, which is more self-checking than most RAG systems demonstrate. Entity disambiguation and alignment (string recall + vector reranking + NIL detection) is the right architecture for resolving canonical entities in a graph and goes beyond what the upstream projects provide. The evaluation framework covers 20+ metrics across answer quality, retrieval performance, and graph-specific dimensions, and it includes a HotpotQA multi-hop test script, which is a real benchmark rather than a set of toy examples.
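The three-stage entity linking mentioned above (string recall, vector reranking, NIL detection) can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `CANONICAL`, `embed`, `link_entity`, and the threshold values are all hypothetical, and the character-bigram "embedding" is a toy stand-in for a real embedding model.

```python
from collections import Counter
from difflib import SequenceMatcher
from math import sqrt

# Hypothetical canonical entities; a real system would read these from the graph.
CANONICAL = {
    "national_scholarship": "National Scholarship",
    "outstanding_student": "Outstanding Student Award",
    "academic_probation": "Academic Probation",
}


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: counts of character bigrams."""
    t = text.lower()
    return Counter(t[i:i + 2] for i in range(len(t) - 1))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def link_entity(mention: str, recall_k: int = 2, nil_threshold: float = 0.5):
    """Return the id of the best canonical entity, or None (NIL) if nothing fits."""
    # Stage 1: cheap string recall keeps only the top-k lexically closest names.
    def string_score(eid: str) -> float:
        return SequenceMatcher(None, mention.lower(), CANONICAL[eid].lower()).ratio()

    recalled = sorted(CANONICAL, key=string_score, reverse=True)[:recall_k]

    # Stage 2: rerank the recalled candidates by vector similarity.
    query_vec = embed(mention)
    scored = [(cosine(query_vec, embed(CANONICAL[eid])), eid) for eid in recalled]
    best_score, best_id = max(scored)

    # Stage 3: NIL detection; below the threshold, do not force a match.
    return best_id if best_score >= nil_threshold else None
```

The NIL branch is the part that matters for graph quality: a mention that matches nothing well should create a new node (or be dropped) rather than be merged into the nearest existing entity.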
The demo data is a single university's policy documents (华东理工大学, East China University of Science and Technology), so the system has never been stress-tested on a real heterogeneous corpus, and all the impressive evaluation numbers are on in-distribution data. Deep research tasks take 42–158 seconds per query; there is no apparent async batching or early exit when the evidence count is sufficient, so latency will compound badly over multi-turn sessions. The acknowledged embedding confusion between semantically similar but legally distinct terms ('优秀学生', "Outstanding Student", vs. '国家奖学金', "National Scholarship") isn't just a fine-tuning problem: it reveals that the graph entity nodes store LLM-generated descriptions rather than grounded canonical definitions, which is a structural issue. Setup requires running Neo4j, a vector store, and multiple LLM API keys simultaneously, with no Docker Compose file for the full stack. Only the database has Compose support, which makes local reproduction harder than it should be for a research project.
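As an illustration of the early-exit pattern whose absence is noted above, here is a minimal `asyncio` sketch. It is not taken from the repository: `retrieve`, `gather_evidence`, and the `min_evidence` parameter are hypothetical, and the retriever only sleeps to simulate latency.

```python
import asyncio


async def retrieve(subquery: str, delay: float) -> str:
    """Hypothetical retriever: sleeps to simulate latency, returns one evidence item."""
    await asyncio.sleep(delay)
    return f"evidence for {subquery}"


async def gather_evidence(subqueries, min_evidence: int = 2) -> list[str]:
    """Run all retrievals concurrently and stop once enough evidence has arrived."""
    tasks = [asyncio.create_task(retrieve(q, d)) for q, d in subqueries]
    evidence = []
    try:
        # as_completed yields results in completion order, fastest first.
        for finished in asyncio.as_completed(tasks):
            evidence.append(await finished)
            if len(evidence) >= min_evidence:
                break  # early exit: do not wait for the slow stragglers
    finally:
        # Cancel whatever is still running and wait for the cancellations to settle.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
    return evidence
```

With sub-queries such as `[("a", 0.01), ("b", 0.05), ("c", 5.0)]`, `asyncio.run(gather_evidence(...))` returns after the second result and cancels the five-second task instead of waiting for it. A production version would need a smarter sufficiency test than a raw count, but the control flow would look similar.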