
How graphify Turns Code Into a Queryable Graph

MNMNOTE
Tags: github, knowledge-graph, ai-coding-assistant, tree-sitter, leiden, graphrag, claude-code, developer-tools
Updated June 8, 2026

Reference: safishamsi/graphify — MIT · Python

graphify is an AI coding-assistant skill that maps a whole project — code, docs, PDFs, images, video — into a knowledge graph the assistant can query instead of grepping through files. You type /graphify ., and it builds an interactive HTML graph, a plain-language report, and a queryable graph.json. One skill installs into twenty-plus tools.

What problem is graphify actually solving?

When an AI coding assistant answers a question about your codebase, it greps and reads files one at a time, burning context on every pass. graphify replaces that with a persistent map. As the README puts it, the project becomes "a knowledge graph you can query instead of grepping through files." [1] The map is built once and reused.

The structural ideas are not new — code-intel tools like code-review-graph and SocratiCode also build local graphs to cut token cost. What is new in graphify is the packaging: the graph builder ships as a skill that drops into more than twenty different coding agents, with a build system that keeps every copy in sync.

The repo sits at 59,521 stars with 6,189 forks as of 2026-06-05, on release v0.8.31. [2] Treat the star count as a snapshot: it moves daily, and the headline number sits next to a YC badge and a book badge, which are commercial context, not engineering claims.

How does graphify build the graph?

graphify runs a pipeline of seven pure functions, each living in its own module and each doing exactly one job. A directory of files flows through detection, extraction, graph building, clustering, analysis, reporting, and export. ARCHITECTURE.md gives the shape exactly:

detect()  →  extract()  →  build_graph()  →  cluster()  →  analyze()  →  report()  →  export()

The design rule is what keeps it readable. From ARCHITECTURE.md, verbatim: "Each stage is a single function in its own module. They communicate through plain Python dicts and NetworkX graphs - no shared state, no side effects outside graphify-out/." [3] Every extractor returns the same {nodes, edges} dict, and validate.py enforces that schema before the graph is built. Every edge carries a confidence label (EXTRACTED, INFERRED, or AMBIGUOUS), so you always know what was found versus guessed.
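As a rough illustration of that contract, the sketch below checks an extractor result before it reaches the graph builder. It is not graphify's validate.py: the function name validate_extraction and the field names id, source, target, and confidence are assumptions made for the example. Only the {nodes, edges} shape and the three confidence labels come from the project's docs.

```python
from typing import Any

# The three labels are named in the article; everything else here is assumed.
CONFIDENCE_LABELS = {"EXTRACTED", "INFERRED", "AMBIGUOUS"}


def validate_extraction(result: dict[str, Any]) -> list[str]:
    """Return a list of schema problems; an empty list means the dict is usable."""
    problems: list[str] = []
    for key in ("nodes", "edges"):
        if not isinstance(result.get(key), list):
            problems.append(f"missing or non-list '{key}'")
    if problems:
        return problems

    # Collect node ids so edges can be checked against them.
    node_ids = set()
    for i, node in enumerate(result["nodes"]):
        if not isinstance(node, dict) or "id" not in node:
            problems.append(f"node {i}: missing 'id'")
        else:
            node_ids.add(node["id"])

    for i, edge in enumerate(result["edges"]):
        if not isinstance(edge, dict):
            problems.append(f"edge {i}: not a dict")
            continue
        for end in ("source", "target"):
            if edge.get(end) not in node_ids:
                problems.append(f"edge {i}: unknown {end} {edge.get(end)!r}")
        if edge.get("confidence") not in CONFIDENCE_LABELS:
            problems.append(f"edge {i}: bad confidence {edge.get('confidence')!r}")
    return problems
```

Returning a list of problems instead of raising on the first one is a design choice for the sketch: a report of every malformed edge is more useful when an extractor is being debugged.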

flowchart LR
  F[Your files] --> D[detect]
  D --> E[extract<br/>tree-sitter AST<br/>local, no API]
  E --> B[build_graph<br/>NetworkX]
  B --> C[cluster<br/>Leiden / Louvain]
  C --> A[analyze<br/>god nodes]
  A --> R[report<br/>GRAPH_REPORT.md]
  A --> X[export<br/>graph.json / .html]
  X --> Q[graphify query<br/>CLI + MCP]

Does graphify send your code to a model?

No. Code is parsed locally. The README is explicit: "Code is extracted locally with no API calls (AST via tree-sitter)." [4] Tree-sitter walks the syntax tree to pull out classes, functions, imports, and call graphs, all on your machine. Only docs, PDFs, images, and transcripts go to the model, because those need semantic reading. Video and audio are transcribed locally with faster-whisper.

That split matters for two reasons. It keeps source code on your machine, and it makes the structural pass free of token cost: nothing is spent until you reach prose. tree-sitter is a mature incremental parser used across the editor ecosystem [5], which is why graphify can claim broad language coverage from one extraction path.
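To make the structural pass concrete, here is a minimal sketch of local extraction. It swaps in Python's standard-library ast module as a stand-in for tree-sitter, so it only handles Python source. The function name extract_python, the relation and kind fields, and the node id scheme are illustrative assumptions, not graphify's extractor.

```python
import ast


def extract_python(source: str, module: str) -> dict:
    """Parse Python source locally and return a {nodes, edges} dict."""
    nodes = {module: {"id": module, "kind": "module"}}
    edges = []

    def link(target: str, kind: str, relation: str) -> None:
        # Register the target node once, then record a parser-derived edge.
        nodes.setdefault(target, {"id": target, "kind": kind})
        edges.append({"source": module, "target": target,
                      "relation": relation, "confidence": "EXTRACTED"})

    for item in ast.walk(ast.parse(source)):
        if isinstance(item, ast.ClassDef):
            link(f"{module}.{item.name}", "class", "defines")
        elif isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Methods are flattened to module.name to keep the sketch short.
            link(f"{module}.{item.name}", "function", "defines")
        elif isinstance(item, ast.Import):
            for alias in item.names:
                link(alias.name, "external", "imports")
        elif isinstance(item, ast.ImportFrom) and item.module:
            link(item.module, "external", "imports")

    return {"nodes": list(nodes.values()), "edges": edges}
```

Every edge here is labeled EXTRACTED because it comes straight from the syntax tree. No network call and no model is involved, which is the property the README is describing.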

Why no vector database?

graphify finds communities with the Leiden algorithm, a graph-clustering method that groups nodes by how densely they connect. [6] In cluster.py it uses Leiden via the graspologic library and falls back to Louvain in NetworkX when graspologic is absent. The interesting decision is what it skips. From how-it-works.md, verbatim:

"No embeddings needed." "The graph structure is the similarity signal — there's no separate embedding step or vector database." [7]

Most retrieval tools reach for embeddings and a vector store. graphify instead treats the edges it has already extracted as the similarity signal and lets clustering do the rest. That means fewer moving parts, no separate index to keep warm, and one graph driving both querying and the "surprising connections" in the report.
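What "groups nodes by how densely they connect" means can be shown with modularity, the quality score that Louvain maximizes and that Leiden can also optimize. The sketch below only scores a given partition of an undirected, unweighted graph without self-loops. It does not search for the best partition and is not graphify's cluster.py; the function name and the toy graph are made up for the example.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Score a partition: Q = sum over communities of L_c/m - (d_c / 2m)^2.

    L_c is the number of edges inside community c, d_c the total degree of
    its nodes, and m the number of edges in the whole graph.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    community_of = {node: idx
                    for idx, group in enumerate(communities)
                    for node in group}
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Two triangles joined by a single bridge edge (c-d).
EDGES = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

split = modularity(EDGES, [{"a", "b", "c"}, {"d", "e", "f"}])    # 5/14, about 0.357
lumped = modularity(EDGES, [{"a", "b", "c", "d", "e", "f"}])     # 0.0
```

Splitting at the bridge scores higher than lumping everything together, which is the signal a community-detection pass follows. No embedding is consulted at any point: the edges alone decide the grouping.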

What makes graphify a skill and not just a CLI?

This is the design decision worth stealing. tools/skillgen/ renders one shared skill template into a separate SKILL.md for every supported host. The platforms.toml manifest declares each tool as a small set of deltas over a lean core — verbatim, "split = lean core + references sidecar," and "a platform declares only its deltas."

Each host overrides only what differs: how it launches a subagent, whether it reads CLAUDE.md or AGENTS.md, posix versus PowerShell, the query stub variant. The frontmatter description is, in the manifest's own words, "PRESERVED VERBATIM per platform." A 900-line gen.py renders the artifacts idempotently and ships a --check mode that byte-diffs the render against what is committed, so drift fails CI. This is the same one-source-many-dialects pattern that agent-skills applies to engineering process — here it is applied to a graph tool.
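The render-then-check loop can be sketched in a few lines. This is an illustration of the pattern, not graphify's gen.py: the template text, the DEFAULTS and PLATFORMS tables, and their field names are invented for the example, and a real manifest would live in a file such as platforms.toml instead of a Python dict.

```python
from pathlib import Path
from string import Template

# One shared core; every placeholder is filled from defaults plus deltas.
CORE = Template(
    "---\n"
    "description: $description\n"
    "---\n"
    "Prefer `graphify query` over grepping raw files.\n"
    "Instructions file: $instructions_file\n"
    "Shell: $shell\n"
)

DEFAULTS = {"instructions_file": "AGENTS.md", "shell": "posix"}

# Hypothetical per-platform deltas: each host lists only what differs.
PLATFORMS = {
    "claude-code": {"description": "Query the project graph (Claude Code)",
                    "instructions_file": "CLAUDE.md"},
    "codex": {"description": "Query the project graph (Codex)"},
}


def render(platform: str) -> bytes:
    """Render the shared template with this platform's overrides applied."""
    values = {**DEFAULTS, **PLATFORMS[platform]}
    return CORE.substitute(values).encode("utf-8")


def generate(out_dir: Path, check: bool = False) -> list[str]:
    """Write one SKILL.md per platform, or in check mode list drifted ones."""
    drifted = []
    for platform in sorted(PLATFORMS):
        target = out_dir / platform / "SKILL.md"
        rendered = render(platform)
        if check:
            # Byte-for-byte comparison against the committed file.
            if not target.exists() or target.read_bytes() != rendered:
                drifted.append(platform)
        else:
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(rendered)
    return drifted
```

In CI the check call would run against the committed tree and fail the build when the returned list is non-empty. Because rendering is deterministic, running the generator twice produces identical bytes, which is what makes the comparison meaningful.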

How does the assistant know to use the graph?

graphify install does more than copy a file. It writes always-on instruction blocks into the host's persistent config — CLAUDE.md, AGENTS.md, or a Cursor rule with alwaysApply: true — telling the assistant to prefer graphify query over grepping raw files. On hosts with payload-bearing hooks, like Claude Code, a hook fires before search-style tool calls.

For repeated structured access, graphify also runs as an MCP server. The server exposes query_graph, get_node, get_neighbors, shortest_path, and PR-impact tools, so an agent can traverse the graph through typed calls instead of reading the whole report. The team workflow leans on plain files: commit graphify-out/, and graphify hook install adds a git merge driver that union-merges graph.json so parallel commits never leave conflict markers.
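To show what typed traversal buys over reading a report, here is a small sketch of neighbor lookup and shortest-path search over a graph held as a plain dict. The functions borrow the names of the MCP tools for readability, but they are local illustrations, not the server's implementation, and the edge fields source and target are an assumed schema for graph.json.

```python
from collections import defaultdict, deque
from typing import Optional


def build_adjacency(graph: dict) -> dict:
    """Index edges both ways so the graph can be walked as undirected."""
    adjacency = defaultdict(set)
    for edge in graph["edges"]:
        adjacency[edge["source"]].add(edge["target"])
        adjacency[edge["target"]].add(edge["source"])
    return adjacency


def get_neighbors(graph: dict, node_id: str) -> list[str]:
    """Return the ids directly connected to node_id, sorted for stability."""
    return sorted(build_adjacency(graph).get(node_id, set()))


def shortest_path(graph: dict, start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search; returns the node ids on a shortest path or None."""
    if start == goal:
        return [start]
    adjacency = build_adjacency(graph)
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in sorted(adjacency.get(current, ())):
            if nxt in previous:
                continue
            previous[nxt] = current
            if nxt == goal:
                # Walk the predecessor chain back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None
```

An agent asking how two modules relate gets back a short list of ids instead of a page of prose, which is the context saving the MCP interface is aiming for.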

What graphify teaches

Two ideas travel well beyond this repo. First, you can give an AI assistant structural understanding without embeddings or a vector store — a tree-sitter pass plus community detection gets you a queryable map. Second, a skill can be a generated artifact, not a file hand-maintained per tool: keep one source of truth, render the dialects, gate drift in CI.

One more property is easy to underrate: portability. Like a plain markdown note, a graph that lives as committed JSON on your own machine outlives the tool that drew it, for the same reason your notes should outlive the app that wrote them. The map is portable because the format is boring.

Verdict

graphify is worth studying if you build developer tooling or care about how AI agents read code. The architecture is clean, the local-first parsing is honest, and the skillgen build system is a small masterclass in shipping one thing to many tools without drift. Treat any self-reported benchmark numbers with the usual skepticism; the engineering underneath is the real lesson.

Frequently Asked Questions

What is graphify?

graphify is an open-source AI coding-assistant skill, backed by a Python library, that turns any folder of code, docs, PDFs, images, or video into a queryable knowledge graph. You type /graphify . in your assistant and get an interactive HTML graph, a plain-language report, and a graph.json you can query later. [1]

Does graphify send my code to an LLM?

No. Code files are parsed locally with tree-sitter and never sent to a model, per the README: "Code is extracted locally with no API calls." [4] Only docs, PDFs, images, and transcripts go to the model for semantic reading. Video and audio are transcribed locally with faster-whisper, so source code stays on your machine.

Which AI coding tools does graphify support?

graphify installs into more than twenty tools, including Claude Code, Codex, OpenCode, Kilo Code, Cursor, Gemini CLI, GitHub Copilot CLI, Aider, Amp, OpenClaw, Factory Droid, Trae, Kiro, Pi, Devin CLI, and Google Antigravity. One graphify install command writes the right skill format for each host, generated from a shared source.

Does graphify need a vector database?

No. graphify uses no embeddings and no vector store. As its docs state, "the graph structure is the similarity signal." [7] It clusters the graph with the Leiden algorithm and treats the edges it has already extracted as the semantic signal, which removes a separate index from the system entirely.

What are "god nodes"?

God nodes are the most-connected concepts in your project — the hubs everything flows through. graphify's report surfaces them alongside "surprising connections" (links between things in different files) and a handful of suggested questions the graph is uniquely positioned to answer.
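One simple way to read "most-connected" is node degree. The sketch below ranks nodes by how many edges touch them. Treat that as an assumption: graphify's analysis step may use a different measure, and the function name and edge fields here are illustrative.

```python
from collections import Counter


def god_nodes(graph: dict, top_n: int = 3) -> list[tuple[str, int]]:
    """Rank node ids by degree, highest first; ties break alphabetically."""
    degree = Counter()
    for edge in graph["edges"]:
        degree[edge["source"]] += 1
        degree[edge["target"]] += 1
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]
```

A module that every other file imports will float to the top of this list, which is usually the first thing worth knowing about an unfamiliar codebase.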

How does graphify keep its skill in sync across tools?

A build-time tool, skillgen, renders one shared template plus per-platform deltas into a committed SKILL.md for each host. The manifest describes each as "lean core + references sidecar," and a --check mode byte-diffs the render against the committed files so any drift fails continuous integration.

Is graphify free and open source?

Yes. graphify is MIT-licensed and published on PyPI as the package graphifyy. As of 2026-06-05 it has 59,521 stars and 6,189 forks, on release v0.8.31. [2] The README also links a paid book and a company site, which are commercial context separate from the open-source tool.


A code graph that lives as plain JSON on your machine outlives the tool that drew it.


If you like tools that keep your work in open, portable files you own, that is the same idea behind mnmnote.com.

Footnotes

  1. graphify README, "Type /graphify in your AI coding assistant and it maps your entire project … a knowledge graph you can query instead of grepping through files." https://github.com/safishamsi/graphify (accessed 2026-06-05).

  2. GitHub REST API, repos/safishamsi/graphify: 59,521 stars, 6,189 forks, release v0.8.31. curl -s https://api.github.com/repos/safishamsi/graphify (accessed 2026-06-05).

  3. graphify ARCHITECTURE.md, "Each stage is a single function in its own module. They communicate through plain Python dicts and NetworkX graphs - no shared state, no side effects outside graphify-out/." https://github.com/safishamsi/graphify/blob/v8/ARCHITECTURE.md (accessed 2026-06-05).

  4. graphify README, "Code is extracted locally with no API calls (AST via tree-sitter)." https://github.com/safishamsi/graphify (accessed 2026-06-05).

  5. tree-sitter documentation, incremental parsing / concrete syntax trees. https://tree-sitter.github.io/tree-sitter/ (accessed 2026-06-05).

  6. Traag, Waltman & van Eck, "From Louvain to Leiden: guaranteeing well-connected communities," Scientific Reports 9, 5233 (2019). https://www.nature.com/articles/s41598-019-41695-z (accessed 2026-06-05).

  7. graphify docs/how-it-works.md, "No embeddings needed." / "The graph structure is the similarity signal — there's no separate embedding step or vector database." https://github.com/safishamsi/graphify/blob/v8/docs/how-it-works.md (accessed 2026-06-05).