ts-knowledge-graph
v0.1.2
Published
Parse TypeScript source into a knowledge graph for autonomous code optimization
Readme
ts_knowledge_graph
Parse TypeScript source code into a knowledge graph, then use that graph as the substrate for an autonomous AI agent that finds and applies code optimizations.
Documentation
Full documentation lives in ./docs. The
documentation index describes every guide and command — start
there, or jump straight to Getting Started.
Why a graph
An optimization agent constantly needs to reason about blast radius:
- If I rewrite this function, who calls it and what breaks? —
CALLSedges - Is this export dead code I can delete? — cross-file reference resolution
- What is affected if I change this type? —
USES_TYPE/ type-checker edges
These questions require semantic parsing (symbol + type resolution), which is
why the extractor is built on ts-morph (the TypeScript
Compiler API) rather than a syntax-only parser.
Graph model
Nodes — Module, Class, Interface, TypeAlias, Enum, Function,
Method, Property, Parameter, Variable, ExternalModule.
Edges
| Layer | Edges |
| --- | --- |
| Structural | CONTAINS, IMPORTS, EXPORTS |
| Type | EXTENDS, IMPLEMENTS, USES_TYPE, RETURNS, PARAM_TYPE |
| Behavioral | CALLS, INSTANTIATES, OVERRIDES, READS, WRITES |
The structural layer is cheap and always emitted. The type + behavioral layers
require symbol resolution and are emitted with --semantic.
Usage
npm install
# structural graph only (fast)
npm run extract -- <path-to-project>
# full graph with heritage + CALLS edges
npm run extract -- <path-to-project> --semanticOutput is two JSONL files — outputs/graph/nodes.jsonl and
outputs/graph/edges.jsonl (override with --out) — one record per line, easy
to inspect, diff, and load into any store.
Querying the graph
Load the JSONL into an embedded Kùzu database, then run the query tools:
npm run dev -- load # reads ./outputs/graph, writes ./outputs/graph.kuzu
npm run dev -- find <name> # resolve a name to node ids
npm run dev -- who-calls <id> # direct callers of a symbol
npm run dev -- calls <id> # what a symbol calls
npm run dev -- blast-radius <id> --depth 10 # transitive callers (impact set)
npm run dev -- references <id> # everything that references a symbol/type
npm run dev -- dead-exports # exported symbols with no inbound refs
npm run dev -- neighbors <id> # one-hop neighbourhood (in + out)Every query command accepts --json to emit machine-readable output — this is
the shape the optimization agent consumes. Node ids come from find or another
query's results; do not hand-write them.
The query methods on GraphQuery (whoCalls, blastRadius, deadExports,
neighborhood, …) are designed to map one-to-one onto agent tools: JSON in,
JSON out.
For a task-oriented walk-through of these commands — using them by hand to answer impact, dead-code, and dependency questions — see the Static Analysis guide.
dead-exportsaccuracy: it is member-aware (a class/interface counts as live when any contained member is referenced) and considersCALLS,EXTENDS,IMPLEMENTS,USES_TYPE,RETURNS,PARAM_TYPE,INSTANTIATES, andREADS(value-identifier) edges. On this repository it reports exactly the two genuinely-unused type aliases — no false positives.
Web visualisation
Serve the database as an interactive graph — pan/zoom, kind filters, symbol search, per-node edge listing (see contribs/web_visualisation):
npm run web # reads ./outputs/graph.kuzu, serves http://localhost:4173
npm run web -- --db ./outputs/graph.kuzu --port 8080The optimization agent
The end goal: an agent that uses the graph to find and apply optimizations, verifying each one before keeping it.
cp .env-sample .env # pick a provider block, set key + model
npm run dev -- optimize --db ./outputs/graph.kuzu
npm run dev -- optimize "Inline the single-use helper X" --db ./outputs/graph.kuzu --model gpt-5.1The LLM layer sits on the OpenAI-compatible chat-completions API, so any
provider exposing that surface works — OpenAI, OpenRouter, Ollama, LM Studio,
vLLM — configured entirely through OPENAI_API_KEY, OPENAI_BASE_URL, and
OPENAI_MODEL (see .env-sample).
The agent runs a tool-calling loop. Its tools are the read-only GraphQuery
methods plus read_file; it gathers context, confirms blast radius, then calls
propose_optimization. The harness applies the edit, runs tsc --noEmit,
and keeps it only if type-checking passes — otherwise it reverts and hands
the compiler errors back for another attempt. Edits are unique-match
find/replace with in-memory backups; run on a clean git tree so you can review
(and git checkout) what it kept.
Architecture
src/
schema/ Zod schemas for nodes and edges (the wire format)
extract/
project_loader.ts load a ts-morph Project from tsconfig
node_id.ts deterministic, position-stable node ids
structural_extractor.ts modules, declarations, imports, containment
semantic_extractor.ts heritage, CALLS, INSTANTIATES, type edges
graph_builder.ts orchestrates extraction, dedupes by id
store/
jsonl_store.ts serialize the graph to JSONL
jsonl_reader.ts read + Zod-validate the JSONL back in
kuzu_store.ts load the graph into embedded Kùzu, run Cypher
query/
graph_query.ts the agent's query tools (who-calls, blast-radius…)
agent/
agent_tools.ts graph queries + read_file + propose_optimization, as LLM tools
code_editor.ts unique-match find/replace with in-memory backup + revert
verifier.ts runs `tsc --noEmit`, returns pass/fail + output
optimizer_agent.ts the LLM tool-calling loop (propose → verify → keep/revert)
cli.ts extract / load / query / optimize commandsNode ids are derived purely from the declaration (kind:relPath#name@line), so
any extractor computes the same id for the same symbol without a shared
registry — that is what lets the semantic layer link a call site to the exact
declaration node the structural layer emitted.
Roadmap
- [x] Embedded query layer — load into Kùzu (embedded,
Cypher) with traversal tools:
who-calls,calls,blast-radius,dead-exports,neighbors,find. - [x] Type edges —
USES_TYPE,RETURNS,PARAM_TYPE(plusINSTANTIATES) resolved through import aliases. - [x] Member-aware reference counting — a class/interface is live when any contained member is referenced.
- [x] Value-reference (
READS) edges — value-identifier usage, so exportedconsts (e.g. schemas) are no longer false-positive dead exports. - [x] Optimization agent — an LLM tool-calling loop (OpenAI-compatible API,
provider-agnostic) that proposes edits and keeps them only if
tsc --noEmitpasses (otherwise reverts). - [ ] Test verification — run the test suite alongside
tscin the verify step, so behavior-changing edits are caught, not just type errors. - [ ] Vector index — embed per-node summaries for hybrid graph + semantic retrieval, so the agent can find candidates by meaning, not just by name.
