// the find

safishamsi/graphify

★ 66,993 · Python · MIT · updated Jun 2026

AI coding assistant skill (Claude Code, Codex, OpenCode, Cursor, Gemini CLI, and more). Turn any folder of code, SQL schemas, R scripts, shell scripts, docs, papers, images, or videos into a queryable knowledge graph. App code + database schema + infrastructure in one graph.

Graphify turns a codebase — plus any docs, PDFs, images, or videos in the same folder — into a queryable knowledge graph you can ask questions against from inside your AI coding assistant. Code extraction runs locally via tree-sitter (36 languages, no API key needed); docs and media go through whatever LLM backend you configure. Aimed at developers who want their AI assistant to understand architecture rather than grep through files.
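To make "local extraction" concrete, here is a minimal sketch of pulling call relationships out of source code with no network access. It uses Python's built-in `ast` module as a stand-in for tree-sitter, so it only handles Python and is not how graphify itself is implemented; `extract_call_edges` and `SAMPLE` are hypothetical names made up for illustration.

```python
import ast


def extract_call_edges(source: str) -> set[tuple[str, str]]:
    """Return (caller, callee) pairs for calls by bare name inside each function definition."""
    tree = ast.parse(source)
    edges: set[tuple[str, str]] = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for inner in ast.walk(node):
                # Only bare-name calls like load(...); method calls such as x.read() are skipped.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    edges.add((node.name, inner.func.id))
    return edges


# Hypothetical input: two small functions, one calling the other.
SAMPLE = '''
def load(path):
    return open(path).read()

def main():
    data = load("schema.sql")
    print(len(data))
'''

edges = extract_call_edges(SAMPLE)
for caller, callee in sorted(edges):
    print(f"{caller} -> {callee}")
```

Everything here is parsing, which is why a code-only run can stay on the machine: an LLM is only needed once the input stops being parseable source.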

1. Code-only corpora run entirely offline via tree-sitter AST parsing: no API calls, no cost, no data leaving the machine. That's the right call. Paired with `--backend ollama` for docs and media, it makes the tool usable in regulated environments.

2. Confidence tags (EXTRACTED, INFERRED, AMBIGUOUS) on every relationship are genuinely useful. You can tell what the tool found in the code versus what it guessed, which matters when you're debugging a wrong answer.

3. The git merge driver for `graph.json` is clever: two developers committing in parallel get a union-merge instead of conflict markers. Small detail, but without it this is the kind of thing that would have been reported as a bug three months in.

4. The MCP server mode (`python -m graphify.serve`) gives structured tool-call access (`query_graph`, `shortest_path`, `get_pr_impact`) rather than forcing the LLM to parse free text — that's the right abstraction for repeated queries.
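The confidence tags and the union-merge idea combine naturally, so here is a rough sketch of both. The schema (`nodes` with an `id`, `edges` with `source`, `target`, `relation`, `confidence`) is an assumption rather than graphify's documented `graph.json` format, and the rule that EXTRACTED outranks INFERRED outranks AMBIGUOUS when both sides carry the same edge is also an assumption. `union_merge` is a hypothetical helper, not the project's merge driver.

```python
# Assumed ranking: a relationship found in code beats a guessed one.
RANK = {"EXTRACTED": 2, "INFERRED": 1, "AMBIGUOUS": 0}


def union_merge(ours: dict, theirs: dict) -> dict:
    """Union two graph documents, keeping the higher-confidence copy of duplicate edges."""
    nodes: dict = {}
    edges: dict = {}
    for graph in (ours, theirs):
        for node in graph.get("nodes", []):
            nodes.setdefault(node["id"], node)  # first occurrence of an id wins
        for edge in graph.get("edges", []):
            key = (edge["source"], edge["target"], edge["relation"])
            kept = edges.get(key)
            if kept is None or RANK[edge["confidence"]] > RANK[kept["confidence"]]:
                edges[key] = edge
    # Sort so the merged file is deterministic and diffs stay small.
    return {
        "nodes": sorted(nodes.values(), key=lambda n: n["id"]),
        "edges": sorted(
            edges.values(),
            key=lambda e: (e["source"], e["target"], e["relation"]),
        ),
    }


# Hypothetical graphs from two developers working in parallel.
ours = {
    "nodes": [{"id": "a"}, {"id": "b"}],
    "edges": [{"source": "a", "target": "b", "relation": "calls", "confidence": "INFERRED"}],
}
theirs = {
    "nodes": [{"id": "a"}, {"id": "b"}, {"id": "c"}],
    "edges": [
        {"source": "a", "target": "b", "relation": "calls", "confidence": "EXTRACTED"},
        {"source": "b", "target": "c", "relation": "imports", "confidence": "INFERRED"},
    ],
}

merged = union_merge(ours, theirs)
trusted = [e for e in merged["edges"] if e["confidence"] == "EXTRACTED"]
print(len(merged["nodes"]), len(merged["edges"]), len(trusted))
```

Filtering on `confidence` afterwards, as `trusted` does, is the debugging move the tags enable: when an answer looks wrong, check first whether it rests on anything other than EXTRACTED edges.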

1. The PyPI package is named `graphifyy` (double-y) but the CLI command is `graphify`. Anyone who types `pip install graphify` gets a different, unaffiliated package and a confusing failure. This is a known footgun the docs acknowledge but don't fix.

2. For large codebases (the docs say more than 5,000 nodes), the HTML visualization breaks and the documented answer is to skip it and use the JSON output. That's a hard ceiling on the primary deliverable for any serious monorepo.

3. Doc, image, and video extraction quality is entirely dependent on whichever LLM you wire in. The graph is only as good as that extraction — there's no validation step, so a weak model silently produces a weaker graph with no signal that something was missed.

4. The README doubles as a product page for 'Penpax' (a commercial always-on layer built on top). The repo is YC-backed and promotes a Gumroad book, with a waitlist at the bottom. That's not disqualifying, but it means the OSS tool's roadmap is downstream of a startup's commercial priorities.
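The size ceiling in the second complaint is at least easy to check up front. This is a minimal sketch, assuming `graph.json` holds a top-level `nodes` array (a guess at the schema, not something the docs confirm) and taking the 5,000-node figure as quoted above. `pick_output` is a hypothetical helper, not part of graphify.

```python
import json

# Threshold as quoted from the docs in this review; not read from graphify itself.
HTML_NODE_LIMIT = 5000


def pick_output(graph_json: str, limit: int = HTML_NODE_LIMIT) -> str:
    """Return "html" if the graph is small enough to visualize, else "json"."""
    graph = json.loads(graph_json)
    node_count = len(graph.get("nodes", []))
    return "html" if node_count <= limit else "json"


# Hypothetical graphs: one tiny, one just past the limit.
small = json.dumps({"nodes": [{"id": i} for i in range(10)]})
big = json.dumps({"nodes": [{"id": i} for i in range(HTML_NODE_LIMIT + 1)]})

print(pick_output(small), pick_output(big))
```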
