qtools-graph-doc
v0.1.0
Neo4j-backed code knowledge graph — parses source code annotations and documentation into a queryable graph for AI agents and developers
graphDoc
A Neo4j-backed code knowledge graph that makes your codebase queryable by humans and AI agents.
Source code annotations and documentation are parsed into a graph of concepts, files, and sections — then embedded for semantic search and traversable via Cypher for impact analysis, gap detection, and orientation briefings.
Built on qtools-graph-forge-core.
The Problem
AGENTS.md doesn't scale. A single flat file can describe a small project, but as a codebase grows, maintaining one monolithic document becomes impractical. Key design decisions get buried, business logic goes undocumented, and AI agents hallucinate context they should be able to look up.
Existing tools (lat.md, Axon, Context+) solve this with in-memory graphs or markdown files. graphDoc uses Neo4j — a real graph database with Cypher traversal, vector indexes, and persistent storage — because some questions can only be answered by walking a graph.
Quick Start
npm install -g qtools-graph-doc
cd your-project
graphdoc -init # Scaffold config, install git hook
# Edit .graphdoc/config.ini to set your Voyage API key
graphdoc -rebuild # Scan + load + embed (one command)
graphdoc -addToClaudeCode # Install Claude Code hooks (optional)
That's it. Three commands. After that:
graphdoc -search --query="how does auth work"
graphdoc -plan "add rate limiting to the API"
graphdoc -orientation
Requires Docker (for Neo4j) and a Voyage AI API key (for embeddings).
What It Does
graphDoc parses your codebase into a knowledge graph with three node types:
- SourceFile — every source file and markdown doc in your project
- Section — every heading in your markdown documentation
- Concept — architectural ideas extracted from @concept annotations in your code
These are connected by edges:
- CONTAINS — file → section (a file has these documentation sections)
- IMPLEMENTS — file → concept (this code implements this idea)
- DESCRIBES — section → concept (this documentation explains this idea)
- REFERENCES — section → section (wiki links between docs)
- DEPENDS_ON — concept → concept (one idea requires another)
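The node and edge types above can be written down as a small schema table. The following is a minimal sketch, not graphDoc's actual data model: the `EDGE_SCHEMA` object and the `isValidEdge` helper are hypothetical names used only for illustration.

```javascript
// Hypothetical schema table: edge type -> [source node type, target node type].
// The type names come from the lists above; the structure itself is an assumption.
const EDGE_SCHEMA = {
  CONTAINS: ["SourceFile", "Section"],
  IMPLEMENTS: ["SourceFile", "Concept"],
  DESCRIBES: ["Section", "Concept"],
  REFERENCES: ["Section", "Section"],
  DEPENDS_ON: ["Concept", "Concept"],
};

// Returns true when an edge of the given type may connect the two node types.
function isValidEdge(edgeType, fromType, toType) {
  const pair = EDGE_SCHEMA[edgeType];
  return Boolean(pair) && pair[0] === fromType && pair[1] === toType;
}

console.log(isValidEdge("IMPLEMENTS", "SourceFile", "Concept"));
console.log(isValidEdge("IMPLEMENTS", "Section", "Concept"));
```

A table like this makes the direction of each edge explicit, which matters for traversals: IMPLEMENTS always starts at a file, DESCRIBES always starts at a section.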
Annotating Your Code
Add @concept annotations near the code that implements an idea:
// @concept: [[BatchLoadingPipeline]]
// @concept: [[StandardNodeFormat]]
async function loadNodes(session, nodes, graphName) {
  // UNWIND batch creation, 500 at a time
  ...
}
For Python:
# @concept: [[BatchLoadingPipeline]]
def load_nodes(session, nodes, graph_name):
    ...
In markdown documentation, bind sections to concepts:
## Batch Loading Pipeline
<!-- @concept: BatchLoadingPipeline -->
The loader accepts standard node format objects and...
Concepts are auto-created as stubs when first referenced. Use graphdoc -gaps to find stubs that need documentation.
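To make the annotation syntax concrete, here is a minimal sketch of how such annotations could be pulled out of a file's text. The accepted forms are inferred from the examples above (bracketed names in `//` and `#` comments, a bare name in an HTML comment); `CONCEPT_PATTERN` and `extractConcepts` are hypothetical names, and graphDoc's real parsers may behave differently.

```javascript
// Matches "@concept: [[Name]]" and "@concept: Name" inside //, #, or <!-- comments.
// The accepted syntax is an assumption based on the examples in this README.
const CONCEPT_PATTERN =
  /(?:\/\/|#|<!--)\s*@concept:\s*(?:\[\[)?([A-Za-z0-9_]+)(?:\]\])?/g;

// Returns the distinct concept names found in the text, in order of first appearance.
function extractConcepts(text) {
  const found = new Set();
  for (const match of text.matchAll(CONCEPT_PATTERN)) {
    found.add(match[1]);
  }
  return [...found];
}

const sample = [
  "// @concept: [[BatchLoadingPipeline]]",
  "// @concept: [[StandardNodeFormat]]",
  "async function loadNodes(session, nodes, graphName) {}",
  "<!-- @concept: BatchLoadingPipeline -->",
].join("\n");

console.log(extractConcepts(sample));
```

Collecting names into a set mirrors the stub behavior described above: a concept referenced in several places is still a single node.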
CLI Commands
Setup & Maintenance
| Command | Description |
|---------|-------------|
| graphdoc -init | Scaffold .graphdoc/config.ini in current project. Installs git post-commit hook. |
| graphdoc -scan | Parse source files and docs, extract annotations and sections |
| graphdoc -load | Start Neo4j container and load scan output into graph |
| graphdoc -embed | Generate vector embeddings for all nodes (requires API key) |
| graphdoc -rebuild | Full pipeline: scan → load → embed in one command. Use -skipEmbed for fast rebuilds. |
| graphdoc -check | Validate source annotations match the graph — detect drift |
Query
| Command | Description |
|---------|-------------|
| graphdoc -orientation | The signature verb. Full project briefing: architecture, central concepts, health metrics, entry points |
| graphdoc -search --query="..." | Hybrid BM25 + vector semantic search across all nodes |
| graphdoc -plan "description" | Planning assistant. Survey docs, files, and concepts before making changes. Suggests reading order. |
| graphdoc -about --target="..." | What is this file or concept? Shows all relationships |
| graphdoc -impact --concept="..." | Blast radius: what files and concepts are affected if this changes? |
| graphdoc -gaps | What's missing: unannotated files, stub concepts, broken links |
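The -search command is described as hybrid BM25 + vector search. This README does not say how the two rankings are merged; one common approach is reciprocal rank fusion, sketched below. The `fuseRankings` function, the constant `k = 60`, and the sample result lists are illustrative assumptions, not graphDoc's implementation.

```javascript
// Reciprocal rank fusion: each list contributes 1 / (k + rank) per item,
// where rank starts at 1. Items ranked well in either list rise to the top.
function fuseRankings(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const contribution = 1 / (k + index + 1);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Hypothetical result ids from a keyword (BM25) pass and a vector pass.
const bm25Hits = ["embeddingClient.js", "Provider Modules", "forgeRunner.js"];
const vectorHits = ["Provider Modules", "Patterns Followed", "embeddingClient.js"];

console.log(fuseRankings([bm25Hits, vectorHits]));
```

Rank fusion needs no score normalization, which is convenient because BM25 scores and cosine similarities live on different scales.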
Claude Code Integration
| Command | Description |
|---------|-------------|
| graphdoc -addToClaudeCode | Install graphDoc hooks into Claude Code's .claude/settings.json. Adds auto scan+load on git commit and file context lookup before edits. |
| graphdoc -removeFromClaudeCode | Cleanly remove all graphDoc hooks from Claude Code settings |
Orientation
The orientation command synthesizes a full project briefing from graph queries:
============================================================
graphForge — Code Knowledge Graph
============================================================
### Graph Summary
Total nodes: 653
Section: 548
SourceFile: 90
Concept: 15
Total edges: 763
CONTAINS: 744
IMPLEMENTS: 19
### Central Concepts (by connectivity)
1. BootstrapOncePattern — 3 connections
2. ProviderContract — 3 connections
3. VectorEmbedding — 3 connections
4. HybridSearchStrategy — 3 connections
### Key Contracts (implemented by multiple files)
! BootstrapOncePattern — 2 files depend on this
! ProviderContract — 2 files depend on this
### Health
V 548 documentation sections indexed
! 9 of 67 source files annotated (13%)
! 58 source files with no concept annotations
### Entry Points (files implementing the most concepts)
forgeRunner.js (3 concepts): BootstrapOncePattern, ConfigDrivenDispatch, ProviderContract
graphEmbedder.js (3 concepts): VectorEmbedding, SuperLabelIndexing, HybridSearchStrategy
============================================================
Every section is a graph query, not hand-written documentation.
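To illustrate how a briefing figure can be derived from graph data rather than written by hand, the sketch below computes two of the numbers shown: concept connectivity and annotation coverage. It works on a plain in-memory list of IMPLEMENTS edges; graphDoc itself queries Neo4j, and the function names and the `{ file, concept }` input shape are assumptions.

```javascript
// Each IMPLEMENTS edge links a source file to a concept it implements.
// Returns [concept, count] pairs, most-implemented concept first.
function rankConcepts(implementsEdges) {
  const counts = new Map();
  for (const { concept } of implementsEdges) {
    counts.set(concept, (counts.get(concept) ?? 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Percentage of source files that carry at least one concept annotation.
function annotationCoverage(implementsEdges, totalSourceFiles) {
  const annotated = new Set(implementsEdges.map((edge) => edge.file));
  return Math.round((annotated.size / totalSourceFiles) * 100);
}

// Illustrative edges only; file and concept names are borrowed from the sample output.
const implementsEdges = [
  { file: "forgeRunner.js", concept: "BootstrapOncePattern" },
  { file: "forgeRunner.js", concept: "ProviderContract" },
  { file: "graphEmbedder.js", concept: "VectorEmbedding" },
];

console.log(rankConcepts(implementsEdges));
console.log(annotationCoverage(implementsEdges, 67));
```

With nine annotated files out of 67, `annotationCoverage` returns 13, the same percentage as the Health section in the sample briefing.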
Example: Evaluating a Change
Before modifying the embedding system:
$ graphdoc -impact --concept="ProviderAgnosticEmbedding"
=== Impact Analysis: ProviderAgnosticEmbedding ===
Blast radius: 1 files affected
Direct implementors (1):
embeddingClient.js
$ graphdoc -search --query="adding a new embedding provider module"
1. [Provider Modules] Each provider is a file in lib/providers/.
Adding a provider = add one file. No other code changes.
2. [embeddingClient.js] create() factory with provider registry...
3. [Patterns Followed] Provider registry auto-discovers from providers/ directory
Two queries. Ten seconds. Complete understanding of the blast radius and implementation pattern.
Example: Planning a New Feature
Before writing any code, survey the landscape:
$ graphdoc -plan "add a new bb2 tool for querying memory nodes by date range"
=== Planning: "add a new bb2 tool for querying memory nodes by date range" ===
### Related Documentation
Create Node/Memory Tool Design Report
Date: January 11, 2025 — Design exploration with TQ and Milo
Query memories
bb2 -findMemories --type insight --minEnergy 2.5 --recent 7d
BB2 Tool Implementation Guide
Step-by-step guide for adding new tools to existing bb2 infrastructure
### Source Files Likely Involved
bb2_queryLogAnalysis.md
### Suggested Reading Order
1. READ: Create Node/Memory Tool Design Report (understand the design)
2. READ: Query memories (understand the design)
3. READ: bb2_queryLogAnalysis.md (understand the implementation)
The planner surveys documentation, source files, and concepts to give you a starting point — what to read, what already exists, and where to begin.
Claude Code Integration
For AI-assisted development, graphDoc integrates directly into Claude Code:
graphdoc -addToClaudeCode
This installs two hooks in .claude/settings.json:
- Post-commit: After every git commit, automatically runs graphdoc -scan && graphdoc -load in the background to keep the graph current. No API cost (embeddings are separate).
- Pre-edit: Before Claude edits any file, runs graphdoc -about on that file so the agent knows what concepts it implements and what other files share those concepts.
To remove: graphdoc -removeFromClaudeCode
How It Works
graphDoc is a graphForge provider. It follows the same pipeline as any graphForge data source:
Source Code + Docs
|
Parsers (JS, Python, Markdown)
|
Standard Node Format (JSON)
|
graphLoader (batch UNWIND into Neo4j)
|
graphEmbedder (Voyage AI vector embeddings)
|
graphSearchTool (hybrid BM25 + vector search)
Each project gets its own superLabel (graphDoc_projectName) for index isolation. Multiple projects can share a single Neo4j container — cross-project queries work via Cypher while per-project indexes keep search scoped.
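The loader stage is described as a batch UNWIND into Neo4j, and the annotation example earlier mentions 500 nodes at a time. A minimal sketch of that batching step, combined with the per-project superLabel, follows. The `chunk` and `buildLoadQuery` helpers, the `id`/`properties` node shape, and the Cypher text are assumptions for illustration; the real graphLoader may differ.

```javascript
// Splits an array into consecutive batches of at most `size` items.
function chunk(items, size = 500) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Cypher cannot take a label as a query parameter, so the superLabel is
// interpolated only after checking that it is a plain identifier.
function buildLoadQuery(superLabel) {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(superLabel)) {
    throw new Error(`Unsafe label: ${superLabel}`);
  }
  return `UNWIND $batch AS row MERGE (n:${superLabel} {id: row.id}) SET n += row.properties`;
}

// Hypothetical node objects standing in for scan output.
const nodes = Array.from({ length: 1200 }, (_, i) => ({ id: `n${i}`, properties: {} }));
const batches = chunk(nodes);

console.log(batches.map((batch) => batch.length));
console.log(buildLoadQuery("graphDoc_myProject"));
```

With the official neo4j-driver package, each batch would be sent as the `$batch` parameter, for example `session.run(query, { batch })`, so one round trip creates up to 500 nodes.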
Configuration
graphdoc -init creates .graphdoc/config.ini:
[graphDoc]
projectName=myProject
languages=javascript
sourcePaths=system/code
docPaths=system/management/zNotesPlansDocs
ignorePaths=node_modules,.git,dataStores,logs
[embedding]
provider=voyage
model=voyage-4
dimension=1024
apiKey=your-voyage-api-key
batchSize=20
Requirements
- Node.js >= 22
- Docker (for Neo4j containers, managed automatically)
- Voyage AI API key (for vector embeddings — set in .graphdoc/config.ini)
Real-World Numbers
| Project | Files | Sections | Concepts | Nodes | Scan Time | Load Time |
|---------|-------|----------|----------|-------|-----------|-----------|
| graphForge | 90 | 548 | 15 | 653 | <1s | ~10s |
| neoBrain | 1,086 | 19,467 | 0* | 20,553 | ~3s | ~60s |
*neoBrain has no @concept annotations yet — search works purely on documentation sections.
Architecture
graphDoc stands on two foundations:
- qtools-graph-forge-core — container management, batch loading, vector embedding, hybrid search. The infrastructure layer.
- Neo4j — a real graph database with Cypher traversal, ACID transactions, and native vector indexes. Not an in-memory simulation.
This means graphDoc can answer questions that markdown-based tools cannot:
- "What's the shortest path between concept A and concept B?"
- "Which concepts are implemented by files that nobody has touched in 6 months?"
- "What's the transitive blast radius of changing this contract?"
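The last question, transitive blast radius, is a reachability problem. In Neo4j it would be expressed as a variable-length Cypher traversal; the sketch below shows the same idea as a breadth-first search over an in-memory list of DEPENDS_ON pairs. The `blastRadius` name, the pair format, and the sample dependencies are hypothetical.

```javascript
// dependsOn: [dependent, dependency] pairs, read as "dependent DEPENDS_ON dependency".
// Returns every concept that directly or transitively depends on `changed`.
function blastRadius(dependsOn, changed) {
  // Reverse index: dependency -> concepts that directly depend on it.
  const dependents = new Map();
  for (const [dependent, dependency] of dependsOn) {
    if (!dependents.has(dependency)) dependents.set(dependency, []);
    dependents.get(dependency).push(dependent);
  }

  const affected = new Set();
  const queue = [changed];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const next of dependents.get(current) ?? []) {
      // Skip concepts already seen so cycles cannot loop forever.
      if (!affected.has(next) && next !== changed) {
        affected.add(next);
        queue.push(next);
      }
    }
  }
  return [...affected];
}

// Invented dependencies for illustration; not taken from a real graph.
const sampleDependencies = [
  ["HybridSearchStrategy", "VectorEmbedding"],
  ["VectorEmbedding", "ProviderContract"],
  ["ConfigDrivenDispatch", "ProviderContract"],
];

console.log(blastRadius(sampleDependencies, "ProviderContract"));
```

A markdown-only tool would need to rebuild this index on every query; a graph database keeps the relationships stored and indexed, so the traversal is a single query.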
License
MIT
