qtools-graph-doc

v0.1.0

Published

2 months ago

Neo4j-backed code knowledge graph — parses source code annotations and documentation into a queryable graph for AI agents and developers

tqwhite

code-intelligence knowledge-graph neo4j mcp documentation graph code-analysis ai-agent claude-code semantic-search

graphDoc

A Neo4j-backed code knowledge graph that makes your codebase queryable by humans and AI agents.

Source code annotations and documentation are parsed into a graph of concepts, files, and sections — then embedded for semantic search and traversable via Cypher for impact analysis, gap detection, and orientation briefings.

Built on qtools-graph-forge-core.

The Problem

AGENTS.md doesn't scale. A single flat file can describe a small project, but as a codebase grows, maintaining one monolithic document becomes impractical. Key design decisions get buried, business logic goes undocumented, and AI agents hallucinate context they should be able to look up.

Existing tools (lat.md, Axon, Context+) solve this with in-memory graphs or markdown files. graphDoc uses Neo4j — a real graph database with Cypher traversal, vector indexes, and persistent storage — because some questions can only be answered by walking a graph.

Quick Start

npm install -g qtools-graph-doc

cd your-project
graphdoc -init                              # Scaffold config, install git hook
# Edit .graphdoc/config.ini to set your Voyage API key
graphdoc -rebuild                           # Scan + load + embed (one command)
graphdoc -addToClaudeCode                   # Install Claude Code hooks (optional)

That's it. Three commands. After that:

graphdoc -search --query="how does auth work"
graphdoc -plan "add rate limiting to the API"
graphdoc -orientation

Requires Docker (for Neo4j) and a Voyage AI API key (for embeddings).

What It Does

graphDoc parses your codebase into a knowledge graph with three node types:

SourceFile — every source file and markdown doc in your project
Section — every heading in your markdown documentation
Concept — architectural ideas extracted from @concept annotations in your code

These are connected by edges:

CONTAINS — file → section (a file has these documentation sections)
IMPLEMENTS — file → concept (this code implements this idea)
DESCRIBES — section → concept (this documentation explains this idea)
REFERENCES — section → section (wiki links between docs)
DEPENDS_ON — concept → concept (one idea requires another)
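
As a rough illustration of the node and edge types listed above, the sketch below models them in plain JavaScript and checks that an edge connects the node types its relationship expects. This is not graphDoc's internal representation (the real graph lives in Neo4j); the `EDGE_SCHEMA` table, the `isValidEdge` helper, and the `graphLoader.js` file name are hypothetical.

```javascript
// Hypothetical in-memory model of the edge types described above.
// Each relationship type maps to the node labels it is expected to connect.
const EDGE_SCHEMA = {
  CONTAINS:   { from: "SourceFile", to: "Section" },
  IMPLEMENTS: { from: "SourceFile", to: "Concept" },
  DESCRIBES:  { from: "Section",    to: "Concept" },
  REFERENCES: { from: "Section",    to: "Section" },
  DEPENDS_ON: { from: "Concept",    to: "Concept" },
};

// Returns true when the edge type is known and both endpoints exist
// with the labels that the edge type expects.
function isValidEdge(edge, nodesById) {
  const rule = Object.hasOwn(EDGE_SCHEMA, edge.type) ? EDGE_SCHEMA[edge.type] : null;
  const from = nodesById.get(edge.from);
  const to = nodesById.get(edge.to);
  if (!rule || !from || !to) return false;
  return from.label === rule.from && to.label === rule.to;
}

// Illustrative nodes only; the file name is an assumption.
const nodes = new Map([
  ["graphLoader.js", { label: "SourceFile" }],
  ["BatchLoadingPipeline", { label: "Concept" }],
]);

const edge = { type: "IMPLEMENTS", from: "graphLoader.js", to: "BatchLoadingPipeline" };
console.log(isValidEdge(edge, nodes));
```

Reversing the endpoints of the same edge would fail the check, because IMPLEMENTS runs from a file to a concept.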

Annotating Your Code

Add @concept annotations near the code that implements an idea:

// @concept: [[BatchLoadingPipeline]]
// @concept: [[StandardNodeFormat]]
async function loadNodes(session, nodes, graphName) {
    // UNWIND batch creation, 500 at a time
    // ...
}

For Python:

# @concept: [[BatchLoadingPipeline]]
def load_nodes(session, nodes, graph_name):
    ...

In markdown documentation, bind sections to concepts:

## Batch Loading Pipeline
<!-- @concept: BatchLoadingPipeline -->
The loader accepts standard node format objects and...

Concepts are auto-created as stubs when first referenced. Use graphdoc -gaps to find stubs that need documentation.
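
To make the annotation syntax concrete, here is a minimal sketch of how `@concept` markers could be pulled out of a file. It is an assumption about the approach, not graphDoc's actual parser: the `CONCEPT_PATTERN` regex and the `extractConcepts` function are hypothetical, and the sketch only handles the two spellings shown above (with and without `[[ ]]`).

```javascript
// Matches "@concept: Name" and "@concept: [[Name]]" anywhere in a line.
// The concept name is assumed to be a single identifier-like word.
const CONCEPT_PATTERN = /@concept:\s*(?:\[\[)?([A-Za-z_]\w*)(?:\]\])?/g;

// Returns one { concept, line } record per annotation, with 1-based line numbers.
function extractConcepts(text) {
  const found = [];
  text.split(/\r?\n/).forEach((line, index) => {
    for (const match of line.matchAll(CONCEPT_PATTERN)) {
      found.push({ concept: match[1], line: index + 1 });
    }
  });
  return found;
}

const sample = [
  "// @concept: [[BatchLoadingPipeline]]",
  "// @concept: [[StandardNodeFormat]]",
  "async function loadNodes(session, nodes, graphName) {}",
  "<!-- @concept: BatchLoadingPipeline -->",
].join("\n");

console.log(extractConcepts(sample));
```

Because the pattern ignores the comment delimiter, the same function covers `//`, `#`, and `<!-- -->` comments.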

CLI Commands

Setup & Maintenance

| Command | Description |
|---------|-------------|
| graphdoc -init | Scaffold .graphdoc/config.ini in current project. Installs git post-commit hook. |
| graphdoc -scan | Parse source files and docs, extract annotations and sections |
| graphdoc -load | Start Neo4j container and load scan output into graph |
| graphdoc -embed | Generate vector embeddings for all nodes (requires API key) |
| graphdoc -rebuild | Full pipeline: scan → load → embed in one command. Use -skipEmbed for fast rebuilds. |
| graphdoc -check | Validate that source annotations match the graph: detects drift |

Query

| Command | Description |
|---------|-------------|
| graphdoc -orientation | The signature verb. Full project briefing: architecture, central concepts, health metrics, entry points |
| graphdoc -search --query="..." | Hybrid BM25 + vector semantic search across all nodes |
| graphdoc -plan "description" | Planning assistant. Survey docs, files, and concepts before making changes. Suggests reading order. |
| graphdoc -about --target="..." | What is this file or concept? Shows all relationships |
| graphdoc -impact --concept="..." | Blast radius: what files and concepts are affected if this changes? |
| graphdoc -gaps | What's missing: unannotated files, stub concepts, broken links |

Claude Code Integration

| Command | Description |
|---------|-------------|
| graphdoc -addToClaudeCode | Install graphDoc hooks into Claude Code's .claude/settings.json. Adds auto scan+load on git commit and file context lookup before edits. |
| graphdoc -removeFromClaudeCode | Cleanly remove all graphDoc hooks from Claude Code settings |

Orientation

The orientation command synthesizes a full project briefing from graph queries:

============================================================
  graphForge — Code Knowledge Graph
============================================================

### Graph Summary

  Total nodes: 653
    Section: 548
    SourceFile: 90
    Concept: 15
  Total edges: 763
    CONTAINS: 744
    IMPLEMENTS: 19

### Central Concepts (by connectivity)

  1. BootstrapOncePattern — 3 connections
  2. ProviderContract — 3 connections
  3. VectorEmbedding — 3 connections
  4. HybridSearchStrategy — 3 connections

### Key Contracts (implemented by multiple files)

  ! BootstrapOncePattern — 2 files depend on this
  ! ProviderContract — 2 files depend on this

### Health

  V 548 documentation sections indexed
  ! 9 of 67 source files annotated (13%)
  ! 58 source files with no concept annotations

### Entry Points (files implementing the most concepts)

  forgeRunner.js (3 concepts): BootstrapOncePattern, ConfigDrivenDispatch, ProviderContract
  graphEmbedder.js (3 concepts): VectorEmbedding, SuperLabelIndexing, HybridSearchStrategy

============================================================

Every section is a graph query, not hand-written documentation.
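
To show what "a graph query" means for two of these sections, the sketch below computes a connectivity ranking and an annotation-coverage figure from a plain edge list. It is an in-memory approximation, not the Cypher that graphDoc runs; `rankConcepts`, `annotationCoverage`, and the sample edges are hypothetical.

```javascript
// Rank concepts by the number of edges that touch them, highest first.
// Ties are broken alphabetically so the output is deterministic.
function rankConcepts(edges, conceptNames) {
  const counts = new Map(conceptNames.map((name) => [name, 0]));
  for (const { from, to } of edges) {
    if (counts.has(from)) counts.set(from, counts.get(from) + 1);
    if (counts.has(to)) counts.set(to, counts.get(to) + 1);
  }
  return [...counts.entries()]
    .map(([concept, connections]) => ({ concept, connections }))
    .sort((a, b) => b.connections - a.connections || a.concept.localeCompare(b.concept));
}

// Share of source files with at least one IMPLEMENTS edge, as a whole percent.
function annotationCoverage(edges, sourceFiles) {
  const annotated = new Set(
    edges.filter((e) => e.type === "IMPLEMENTS").map((e) => e.from)
  );
  const count = sourceFiles.filter((file) => annotated.has(file)).length;
  const total = sourceFiles.length;
  const percent = total === 0 ? 0 : Math.round((count / total) * 100);
  return { count, total, percent };
}

// Illustrative edges only; they are not taken from a real scan.
const sampleEdges = [
  { type: "IMPLEMENTS", from: "forgeRunner.js", to: "ProviderContract" },
  { type: "IMPLEMENTS", from: "forgeRunner.js", to: "BootstrapOncePattern" },
  { type: "IMPLEMENTS", from: "embeddingClient.js", to: "ProviderContract" },
];

console.log(rankConcepts(sampleEdges, ["BootstrapOncePattern", "ProviderContract"]));
```

The same rounding explains the Health line in the briefing: 9 of 67 files is 13.4%, reported as 13%.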

Example: Evaluating a Change

Before modifying the embedding system:

$ graphdoc -impact --concept="ProviderAgnosticEmbedding"

=== Impact Analysis: ProviderAgnosticEmbedding ===

  Blast radius: 1 files affected

  Direct implementors (1):
    embeddingClient.js

$ graphdoc -search --query="adding a new embedding provider module"

  1. [Provider Modules] Each provider is a file in lib/providers/.
     Adding a provider = add one file. No other code changes.
  2. [embeddingClient.js] create() factory with provider registry...
  3. [Patterns Followed] Provider registry auto-discovers from providers/ directory

Two queries, about ten seconds, and you know the blast radius and the implementation pattern to follow.
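
For intuition about what a blast radius is, here is a minimal in-memory sketch that walks DEPENDS_ON edges backwards from a concept and then collects every file that implements any affected concept. Whether `graphdoc -impact` follows dependencies transitively in exactly this way is an assumption; `affectedConcepts`, `blastRadius`, and the sample edges are hypothetical.

```javascript
// Collect a concept plus every concept that depends on it, directly or transitively.
function affectedConcepts(edges, concept) {
  const seen = new Set([concept]);
  const queue = [concept];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const edge of edges) {
      // DEPENDS_ON points from the dependent concept to the one it requires,
      // so a change to `current` reaches `edge.from`.
      if (edge.type === "DEPENDS_ON" && edge.to === current && !seen.has(edge.from)) {
        seen.add(edge.from);
        queue.push(edge.from);
      }
    }
  }
  return seen;
}

// Files implementing any affected concept form the blast radius.
function blastRadius(edges, concept) {
  const concepts = affectedConcepts(edges, concept);
  const files = new Set(
    edges.filter((e) => e.type === "IMPLEMENTS" && concepts.has(e.to)).map((e) => e.from)
  );
  return { concepts: [...concepts].sort(), files: [...files].sort() };
}

// Illustrative edges; the DEPENDS_ON links are invented for the example.
const impactEdges = [
  { type: "IMPLEMENTS", from: "embeddingClient.js", to: "ProviderAgnosticEmbedding" },
  { type: "IMPLEMENTS", from: "graphEmbedder.js", to: "VectorEmbedding" },
  { type: "DEPENDS_ON", from: "VectorEmbedding", to: "ProviderAgnosticEmbedding" },
  { type: "DEPENDS_ON", from: "HybridSearchStrategy", to: "VectorEmbedding" },
];

console.log(blastRadius(impactEdges, "ProviderAgnosticEmbedding"));
```

The visited set keeps the walk finite even if two concepts depend on each other.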

Example: Planning a New Feature

Before writing any code, survey the landscape:

$ graphdoc -plan "add a new bb2 tool for querying memory nodes by date range"

=== Planning: "add a new bb2 tool for querying memory nodes by date range" ===

### Related Documentation

  Create Node/Memory Tool Design Report
    Date: January 11, 2025 — Design exploration with TQ and Milo
  Query memories
    bb2 -findMemories --type insight --minEnergy 2.5 --recent 7d
  BB2 Tool Implementation Guide
    Step-by-step guide for adding new tools to existing bb2 infrastructure

### Source Files Likely Involved

  bb2_queryLogAnalysis.md

### Suggested Reading Order

  1. READ: Create Node/Memory Tool Design Report (understand the design)
  2. READ: Query memories (understand the design)
  3. READ: bb2_queryLogAnalysis.md (understand the implementation)

The planner surveys documentation, source files, and concepts to give you a starting point — what to read, what already exists, and where to begin.

Claude Code Integration

For AI-assisted development, graphDoc integrates directly into Claude Code:

graphdoc -addToClaudeCode

This installs two hooks in .claude/settings.json:

Post-commit: After every git commit, automatically runs graphdoc -scan && graphdoc -load in the background to keep the graph current. No API cost (embeddings are separate).
Pre-edit: Before Claude edits any file, runs graphdoc -about on that file so the agent knows what concepts it implements and what other files share those concepts.

To remove: graphdoc -removeFromClaudeCode

How It Works

graphDoc is a graphForge provider. It follows the same pipeline as any graphForge data source:

Source Code + Docs
       |
    Parsers (JS, Python, Markdown)
       |
  Standard Node Format (JSON)
       |
  graphLoader (batch UNWIND into Neo4j)
       |
  graphEmbedder (Voyage AI vector embeddings)
       |
  graphSearchTool (hybrid BM25 + vector search)

Each project gets its own superLabel (graphDoc_projectName) for index isolation. Multiple projects can share a single Neo4j container — cross-project queries work via Cypher while per-project indexes keep search scoped.
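
The batch-loading and superLabel ideas can be sketched together. The code below splits nodes into batches of 500 and builds one parameterized `UNWIND` statement per batch, labelled with the project's superLabel. It is a sketch under assumptions, not the graphLoader source: the `id` and `properties` field names, the `MERGE` strategy, and the `buildBatchStatements` helper are all hypothetical.

```javascript
// Split an array into chunks of at most `size` items.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Cypher cannot take a label as a query parameter, so the superLabel is
// interpolated into the text after restricting it to safe characters.
function buildBatchStatements(projectName, nodes, batchSize = 500) {
  if (!/^[A-Za-z0-9_]+$/.test(projectName)) {
    throw new Error(`Unsafe project name: ${projectName}`);
  }
  const superLabel = `graphDoc_${projectName}`;
  const text =
    `UNWIND $rows AS row ` +
    `MERGE (n:\`${superLabel}\` {id: row.id}) ` +
    `SET n += row.properties`;
  // One statement per batch; the rows travel as a parameter, not as query text.
  return chunk(nodes, batchSize).map((rows) => ({ text, parameters: { rows } }));
}

const statements = buildBatchStatements(
  "myProject",
  Array.from({ length: 1200 }, (_, i) => ({ id: `n${i}`, properties: {} }))
);
console.log(statements.length, statements[0].text);
```

Each statement could then be passed to a driver session, for example `session.run(text, parameters)` with the official Neo4j JavaScript driver. Sending rows as a parameter keeps the query text identical across batches, which lets the database reuse its query plan.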

Configuration

graphdoc -init creates .graphdoc/config.ini:

[graphDoc]
projectName=myProject
languages=javascript
sourcePaths=system/code
docPaths=system/management/zNotesPlansDocs
ignorePaths=node_modules,.git,dataStores,logs

[embedding]
provider=voyage
model=voyage-4
dimension=1024
apiKey=your-voyage-api-key
batchSize=20
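
If you want to read this file from your own tooling, a minimal INI reader is enough for the keys shown above. The sketch below is hypothetical (`parseIni` and `splitList` are not part of graphDoc) and assumes simple `key=value` lines, `[section]` headers, and comma-separated lists.

```javascript
// Parse "[section]" headers and "key=value" lines into nested objects.
// Blank lines and lines starting with "#" or ";" are skipped.
function parseIni(text) {
  const config = {};
  let section = null;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#") || line.startsWith(";")) continue;
    const header = line.match(/^\[(.+)\]$/);
    if (header) {
      section = header[1];
      config[section] = {};
      continue;
    }
    const eq = line.indexOf("=");
    if (eq === -1 || section === null) continue; // ignore malformed or orphan lines
    config[section][line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return config;
}

// Turn "a,b, c" into ["a", "b", "c"], dropping empty entries.
const splitList = (value) => value.split(",").map((s) => s.trim()).filter(Boolean);

const config = parseIni(
  "[graphDoc]\nprojectName=myProject\nignorePaths=node_modules,.git,dataStores,logs\n" +
  "\n[embedding]\nmodel=voyage-4\ndimension=1024\nbatchSize=20\n"
);
console.log(config.graphDoc.projectName, splitList(config.graphDoc.ignorePaths));
```

All values come back as strings, so numeric settings such as `dimension` need an explicit `Number(...)` conversion.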

Requirements

Node.js >= 22
Docker (for Neo4j containers, managed automatically)
Voyage AI API key (for vector embeddings — set in .graphdoc/config.ini)

Real-World Numbers

| Project | Files | Sections | Concepts | Nodes | Scan Time | Load Time |
|---------|-------|----------|----------|-------|-----------|-----------|
| graphForge | 90 | 548 | 15 | 653 | <1s | ~10s |
| neoBrain | 1,086 | 19,467 | 0* | 20,553 | ~3s | ~60s |

*neoBrain has no @concept annotations yet — search works purely on documentation sections.

Architecture

graphDoc stands on two foundations:

qtools-graph-forge-core — container management, batch loading, vector embedding, hybrid search. The infrastructure layer.
Neo4j — a real graph database with Cypher traversal, ACID transactions, and native vector indexes. Not an in-memory simulation.
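
Because relationships are stored natively, a path question becomes a single Cypher pattern. As a hedged sketch, the helper below builds a parameterized shortest-path query between two concepts. The `name` property key, the default hop limit, and the `shortestConceptPath` function are assumptions; graphDoc's actual schema may use different property names.

```javascript
// Build a parameterized Cypher query for the shortest undirected path
// between two Concept nodes. Variable-length bounds cannot be passed as
// parameters, so the validated hop limit is interpolated into the text.
function shortestConceptPath(fromConcept, toConcept, maxHops = 10) {
  if (!Number.isInteger(maxHops) || maxHops < 1) {
    throw new RangeError("maxHops must be a positive integer");
  }
  const text = [
    "MATCH (a:Concept {name: $from}), (b:Concept {name: $to})",
    `MATCH path = shortestPath((a)-[*..${maxHops}]-(b))`,
    "RETURN path",
  ].join("\n");
  return { text, parameters: { from: fromConcept, to: toConcept } };
}

const query = shortestConceptPath("VectorEmbedding", "ProviderContract");
console.log(query.text);
```

The concept names travel as parameters rather than being spliced into the query text, which avoids injection and keeps the query reusable.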

This means graphDoc can answer questions that markdown-based tools cannot:

"What's the shortest path between concept A and concept B?"
"Which concepts are implemented by files that nobody has touched in 6 months?"
"What's the transitive blast radius of changing this contract?"

License

MIT
