// the find
giancarloerra/SocratiCode
Enterprise-grade (40m+ LOC) codebase intelligence, zero-setup, local & private Plugin/Skill/Extension or MCP: hybrid semantic search, polyglot dependency graphs, symbol-level impact analysis & call-flow, interactive HTML viewer, cross-project & branch-aware search, DB/API/infra knowledge. 61% less tokens, 84% fewer calls, 37x faster. Cloud in beta.
SocratiCode is an MCP server that gives AI coding assistants deep structural knowledge of a codebase: hybrid semantic+BM25 search via Qdrant, AST-aware chunking with ast-grep, polyglot dependency graphs, symbol-level blast-radius analysis, and call-flow tracing. It runs fully local via Docker with zero config, and plugs into Claude Code, Cursor, VS Code Copilot, and any other MCP-compatible host. Aimed at developers working on large, unfamiliar codebases who want their AI to navigate rather than grep.
The hybrid search (dense vector + BM25 with RRF fusion) is the right architecture: pure semantic search misses exact identifiers, pure keyword search misses conceptual queries, and this handles both in one call. Resumable batched indexing with content-hash checkpointing means a crash halfway through a 40M-line codebase doesn't force a re-index from scratch. The symbol-level impact analysis (blast-radius + call-flow) is genuinely useful and not something the built-in tools in Cursor or Copilot offer. The multi-agent shared index with cross-process file locking is a real design decision, not a marketing bullet: it means parallel agents don't race each other on the same project.
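For readers unfamiliar with RRF (Reciprocal Rank Fusion), here is a minimal sketch of how two ranked result lists can be merged into one. It is not SocratiCode's code, and the fusion may well happen inside Qdrant rather than in the server itself. The function name `rrf_fuse`, the file paths, and the constant `k = 60` (a commonly used default) are all assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Merge ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores 1 / (k + rank) for every list it appears in,
    so items ranked well by both retrievers rise to the top.
    k=60 is an assumed default, not a value taken from the project.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for one query from the two retrievers.
dense_hits = ["auth/session.ts", "auth/token.ts", "db/user.ts"]
bm25_hits = ["auth/token.ts", "util/jwt.ts", "auth/session.ts"]

fused = rrf_fuse([dense_hits, bm25_hits])
print(fused)
```

A file that both retrievers rank highly (`auth/token.ts` here) beats one that only a single retriever likes. That is the property that lets one query serve both exact-identifier and conceptual lookups.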
The Docker requirement is a hard dependency that will break in CI, on air-gapped corporate machines, and anywhere Docker Desktop isn't allowed; the README addresses this only partially, with 'use external Qdrant' workarounds. The benchmark numbers (61% fewer tokens, 84% fewer calls, 37x faster) are single-run comparisons against a grep-based baseline on VS Code's repo; they tell you nothing about accuracy or recall on unfamiliar codebases where semantic search typically struggles. The AGPL-3.0 license is a deal-breaker for any team building proprietary tooling on top of it, and the repo ships a separate commercial license file without making the dual-licensing terms obvious in the README. The AST-aware chunking falls back to line-based chunking for unsupported languages, which silently degrades search quality with no warning to the user.