@intentweave/index
v0.8.0
Code-aware retrieval index — SQLite-based index for agents and CI
@intentweave/index
Code-Aware Retrieval Index (CARI) — a lightweight, precomputed SQLite index for ranked file retrieval, connection discovery, and CI drift detection.
What It Does
CARI combines three independent signals into a single queryable index:
| Signal | Source | What It Captures |
| ---------------------- | ---------------------------------------------------------- | --------------------------------------------------- |
| Code structure | AST extraction (tree-sitter) | Exported symbols, classes, functions, methods |
| Document semantics | Keyword extraction (headings, bold, code spans, body text) | Mentions of code symbols in docs |
| Temporal coupling | Git log analysis | Co-change frequency, hotspots, staleness, ownership |
The output is a single .iw/index.db SQLite file. No LLM calls, no external services, $0 cost.
Installation
```bash
pnpm add @intentweave/index
```
Or as part of the full IntentWeave CLI:
```bash
npm install -g @intentweave/cli
iw index build
```
SQLite Schema
| Table | Purpose |
| ---------------- | --------------------------------------------------------------------------------- |
| symbols | Code symbols from AST (name, kind, file, line, export, body_hash, structure_hash) |
| annotations | Document spans linked to code symbols (confidence, source, qualifier, IDF score) |
| co_occurrences | Entity pairs that appear together in docs or code |
| co_changes | File pairs that change together in git (Jaccard + recency) |
| files | Per-file metadata (last modified, churn, hotspot, owner, doc_group) |
| imports | Import relationships between files |
| todos | Inline TODO/FIXME/HACK/XXX markers with file, line, and kind |
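The Jaccard part of the `co_changes` score can be illustrated with a small standalone sketch. It is not the package's implementation: the `Commit` shape and the `coChangeScores` helper are hypothetical names introduced here, and the recency weighting mentioned in the table is deliberately left out because the README does not describe how it is computed.

```typescript
// Hypothetical input shape: each commit is the list of file paths it touched.
type Commit = string[];

interface CoChange {
  fileA: string;
  fileB: string;
  together: number; // commits touching both files
  jaccard: number; // together / commits touching either file
}

// Assumption: plain Jaccard over commit sets, with no recency weighting.
function coChangeScores(commits: Commit[]): CoChange[] {
  const perFile = new Map<string, number>();
  const perPair = new Map<string, number>();

  for (const commit of commits) {
    // Deduplicate and sort so every pair has one canonical key.
    const files = Array.from(new Set(commit)).sort();
    for (const f of files) perFile.set(f, (perFile.get(f) ?? 0) + 1);
    for (let i = 0; i < files.length; i++) {
      for (let j = i + 1; j < files.length; j++) {
        const key = `${files[i]}\u0000${files[j]}`;
        perPair.set(key, (perPair.get(key) ?? 0) + 1);
      }
    }
  }

  const result: CoChange[] = [];
  perPair.forEach((together, key) => {
    const [fileA, fileB] = key.split("\u0000");
    // |A ∪ B| = |A| + |B| - |A ∩ B|
    const union = (perFile.get(fileA) ?? 0) + (perFile.get(fileB) ?? 0) - together;
    result.push({ fileA, fileB, together, jaccard: together / union });
  });

  // Strongest coupling first.
  return result.sort((x, y) => y.jaccard - x.jaccard);
}
```

A pair that always changes together scores 1, while a pair that shares one commit out of many scores close to 0, which is why Jaccard is a common choice for ranking temporal coupling.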
API
Build an Index
```ts
import { buildIndex } from "@intentweave/index";

const stats = buildIndex(dbPath, {
  symbols, // AxOutput from ast-extractor
  annotations, // from annotate()
  coOccurrences, // from COX stage
  coChanges, // from TCG pipeline
  files, // file metadata
});
```
Annotate Documents
```ts
import { annotate } from "@intentweave/index";

const annotations = annotate(kwxOutputs, symbols, {
  minConfidence: 0.1,
  applyIdfPenalty: true, // enable IDF noise filtering for full-depth mode
  idfScores, // from computeIdf()
});
```
Compute IDF Scores
```ts
import { computeIdf } from "@intentweave/index";

const idfScores = computeIdf(kwxOutputs);
// High-value terms: ~0.85 confidence
// Filler words ("system", "data"): floored at 0.1
```
Query the Index
```ts
import { openIndex } from "@intentweave/index";

const db = openIndex(".iw/index.db");

// Ranked file retrieval
const files = db.retrieve({ query: "authentication", limit: 10 });

// Cross-layer connections
const conn = db.connections({ entity: "AuthService" });

// CI drift detection
const findings = db.check({ changed: ["src/auth.ts"] });

// Corpus-wide report
const report = db.report();

// Exact clone detection (identical body hash)
const exactClones = db.clones();

// Structural clone detection (same control flow, different identifiers)
const structClones = db.structuralClones();

// Circular import detection
const cycles = db.circularImports();

// Exported symbols never imported anywhere
const unused = db.unusedExports();

// High-churn, low-doc files ranked by urgency
const hotspots = db.hotspotPriority();

// TODO/FIXME/HACK/XXX inventory
const todoItems = db.todos();

// Documentation coverage percentage per directory
const coverage = db.moduleCoverage();

// Doc sections where all entity mentions are ungrounded
const orphans = db.orphanedSections();

// Per-doc completeness vs. referenced exports
const completeness = db.docCompleteness();

// Cross-group entity coverage conflicts
const drift = db.crossGroupDrift();
```
Depth Modes
| Mode | Flag | Sources | Use Case |
| ------------------------ | -------------------- | ----------------------------------------------- | ---------------------------------------- |
| Structured (default) | --depth structured | Headings, bold, code spans, identifiers | Fast, precise, low noise |
| Full | --depth full | + body text dictionary matching + IDF filtering | Higher recall, more grounded annotations |
Full-depth mode uses a symbol dictionary (from AX) to match terms in markdown body text, then applies IDF penalties to suppress high-frequency filler words. The IDF scorer includes a baseline of 50 common stopwords pre-seeded at a 0.15 confidence ceiling.
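The idea behind the IDF penalty can be sketched in a few lines. This is an illustration under stated assumptions, not the scorer shipped in the package: the `sketchIdf` helper, its input shape, and the log-normalised formula are all hypothetical. Only the 0.1 floor and the 0.15 stopword ceiling come from this README.

```typescript
const FLOOR = 0.1; // floor for filler words, as stated in the API section
const STOPWORD_CEILING = 0.15; // ceiling for pre-seeded stopwords

// Hypothetical helper: docs are lists of extracted terms, stopwords are lowercase.
function sketchIdf(docs: string[][], stopwords: Set<string>): Map<string, number> {
  // Count the number of documents each (lowercased) term appears in.
  const docFreq = new Map<string, number>();
  for (const doc of docs) {
    new Set(doc.map((t) => t.toLowerCase())).forEach((term) => {
      docFreq.set(term, (docFreq.get(term) ?? 0) + 1);
    });
  }

  const n = docs.length;
  const scores = new Map<string, number>();
  docFreq.forEach((df, term) => {
    // Assumed normalisation: 1 for a term found in a single document,
    // 0 for a term found in every document (a one-document corpus scores 1).
    const raw = n > 1 ? Math.log(n / df) / Math.log(n) : 1;
    let score = Math.max(FLOOR, raw);
    if (stopwords.has(term)) score = Math.min(score, STOPWORD_CEILING);
    scores.set(term, score);
  });
  return scores;
}
```

With this shape, a term such as "system" that appears in every document drops to the floor, while a symbol name mentioned in only one document keeps a high score, so dictionary matches on filler words contribute little to annotation confidence.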
Benchmark
Measured on the IntentWeave monorepo (264 code files, 7 docs, 5316 symbols):
| Metric | Structured | Full |
| ------------------- | ---------: | ------------: |
| Build time | 1.1 s | 2.8 s |
| Annotations | 6,721 | 11,533 (+72%) |
| Grounded | 2,548 | 7,360 (+189%) |
| Co-occurrence edges | 1,099 | 2,631 (+139%) |
Dependencies
better-sqlite3 ^11.7.0: synchronous SQLite for Node.js
License
Apache-2.0
