// the find

vitali87/code-graph-rag

★ 2,289 · Python · MIT · updated Jul 2026

The ultimate RAG for your monorepo. Query, understand, and edit multi-language codebases with the power of AI and knowledge graphs

Code-Graph-RAG parses your codebase with Tree-sitter, stores AST relationships (functions, classes, call edges, imports, inheritance) in Memgraph, and lets you query it in natural language via LLM-generated Cypher. It supports ten languages and can run as an MCP server so Claude Code can use the graph as a tool. The target is developers dealing with large multi-language monorepos where grep-based search breaks down.
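To make the "AST relationships" idea concrete, here is a minimal sketch of extracting call edges from source code. It is not the project's code: it swaps in Python's standard-library `ast` module for Tree-sitter so that it runs with no dependencies, it handles Python only, and `extract_call_edges` and `SAMPLE` are hypothetical names invented for illustration.

```python
import ast


def extract_call_edges(source: str) -> set[tuple[str, str]]:
    """Return (caller, callee) pairs for calls made by simple name.

    Simplifications (assumptions of this sketch, not of the project):
    - only calls like `foo(...)` are recorded; method calls such as
      `obj.foo(...)` are skipped because resolving them needs type info;
    - calls inside a nested function are attributed to the nested
      function and to every enclosing function as well.
    """
    tree = ast.parse(source)
    edges: set[tuple[str, str]] = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    edges.add((node.name, inner.func.id))
    return edges


# The sample is only parsed, never executed.
SAMPLE = '''
def load(path):
    return parse(read(path))

def parse(text):
    return text.split()

def read(path):
    return open(path).read()
'''

edges = extract_call_edges(SAMPLE)
for caller, callee in sorted(edges):
    print(f"({caller})-[:CALLS]->({callee})")
```

A graph-backed tool would write each pair to the database as a CALLS edge instead of printing it; the extraction step is the part a text search cannot reproduce reliably.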

Tree-sitter parsing gives real AST accuracy: it handles decorators, nested functions, lambda captures, C++ operator overloads, and template specializations correctly instead of pattern-matching text. The graph schema captures code semantics through CALLS, INHERITS, IMPLEMENTS, OVERRIDES, and IMPORTS edges, enough to answer cross-file 'what would break if I rename this?' queries structurally. The MCP server integration is the standout feature: Claude Code can query the graph as a tool, so the LLM works from real structural knowledge of the codebase instead of reading files blind. The test suite is extensive: hundreds of tests, including per-language oracle tests that validate specific structural expectations, plus tests for incremental update correctness and retrieval evaluation.
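The 'what would break if I rename this?' question comes down to walking CALLS edges backwards from the renamed symbol. The sketch below does that over a small in-memory edge list rather than Memgraph. The function `rename_impact` and the symbol names in `CALLS` are hypothetical, and the real tool would answer this with a Cypher query against its own schema.

```python
from collections import defaultdict, deque


def rename_impact(calls, target):
    """Return (direct_callers, all_upstream_callers) of `target`.

    `calls` is an iterable of (caller, callee) pairs. Direct callers hold
    a reference that breaks on rename; the upstream set is the wider
    blast radius reached by following CALLS edges in reverse.
    """
    callers = defaultdict(set)
    for caller, callee in calls:
        callers[callee].add(caller)

    direct = set(callers.get(target, ()))

    upstream = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            if caller not in upstream:
                upstream.add(caller)
                queue.append(caller)
    upstream.discard(target)  # drop the target itself if it is recursive
    return direct, upstream


# Hypothetical call edges for illustration only.
CALLS = [
    ("api.handler", "service.load"),
    ("cli.main", "service.load"),
    ("service.load", "parser.parse"),
    ("tests.test_parse", "parser.parse"),
    ("parser.parse", "parser.tokenize"),
]

direct, upstream = rename_impact(CALLS, "parser.parse")
print("breaks directly:", sorted(direct))
print("affected upstream:", sorted(upstream))
```

Callees such as `parser.tokenize` never appear in either set, which is the point of following edges in reverse: only code that depends on the renamed symbol is reported.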

The GitHub account hosting this repo was suspended and the project moved to Bitbucket; the README says so explicitly, with multiple comment blocks around disabled badges. That's a serious flag for project continuity. The NL-to-Cypher translation is the load-bearing failure mode: if the LLM generates a query that references a label or property that doesn't exist in Memgraph, the query silently returns empty results. No error, no feedback loop, just 'nothing found.' Setup is heavy: Python 3.12+, cmake (needed to compile the Memgraph Python client from C), ripgrep, and Docker for Memgraph and Qdrant; on Windows, cmake builds of C extensions are fragile. The realtime updater recalculates all CALLS relationships on every file change. The README admits this is slow on large codebases and acknowledges that the incremental call-graph story is incomplete.
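One way to turn the silent 'nothing found' case into a visible error is to check a generated query's labels and relationship types against the known schema before running it. The sketch below is an assumption about how such a guard could look, not something the project is known to ship. The node labels in `KNOWN_NODE_LABELS` are hypothetical, the relationship types are the ones named in this review, and the regexes are deliberately rough rather than a real Cypher parser.

```python
import re

# Hypothetical node labels; the project's actual label names may differ.
KNOWN_NODE_LABELS = {"Function", "Class", "Module"}
# Relationship types as named in the review text above.
KNOWN_REL_TYPES = {"CALLS", "INHERITS", "IMPLEMENTS", "OVERRIDES", "IMPORTS"}

# Rough patterns: `(var:Label:Other` and `[var:TYPE|OTHER`.
NODE_PATTERN = re.compile(r"\(\s*\w*\s*((?::\w+)+)")
REL_PATTERN = re.compile(r"\[\s*\w*\s*:(\w+(?:\|:?\w+)*)")


def unknown_schema_names(cypher: str) -> tuple[list[str], list[str]]:
    """Return (unknown_labels, unknown_rel_types) referenced by a query."""
    labels = set()
    for match in NODE_PATTERN.finditer(cypher):
        labels.update(match.group(1).strip(":").split(":"))

    rel_types = set()
    for match in REL_PATTERN.finditer(cypher):
        rel_types.update(re.split(r"\|:?", match.group(1)))

    return (
        sorted(labels - KNOWN_NODE_LABELS),
        sorted(rel_types - KNOWN_REL_TYPES),
    )


good = "MATCH (f:Function)-[:CALLS]->(g:Function {name: 'parse'}) RETURN f.name"
bad = "MATCH (c:Class)-[:EXTENDS]->(m:Method) RETURN m.name"

print(unknown_schema_names(good))
print(unknown_schema_names(bad))
```

A non-empty result could be fed back to the LLM as a correction prompt instead of executing a query that can only come back empty. Property names would need a similar check, which this sketch leaves out.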
