@boyanbalevdev/repomap

v0.4.0

Published

7 days ago

Local-first multi-repo code index for agents. Drop into any folder of git repos, index into SQLite + FTS5, search and traverse the code graph with file:line citations, serve it all over MCP. Optional deep mode adds a compiler-accurate TS/JS call graph.


boyanbalevdev

code-index code-search code-graph monorepo multi-repo mcp agents claude-code sqlite fts5 cli

repomap

Local-first multi-repo code index built for coding agents. Drop it into any folder that contains git repositories (work, personal, client, anything), index everything into a single SQLite file, then search it, traverse its code graph, and serve it all over MCP with repo/file:line citations.

No server, no daemon, no API keys, no runtime dependencies. One folder, one .repomap directory, one database. The consumers are agents (Claude Code, Codex, Pi, anything MCP-capable or CLI-capable); repomap gives them precise, cheap, reindexable context instead of repeated cross-repo grepping.

Requirements

Node.js 22.13 or newer (uses the built-in node:sqlite, so there are no native builds and nothing to compile).

Install

npm install -g @boyanbalevdev/repomap     # the package is @boyanbalevdev/repomap, the command is repomap

The npm package is named @boyanbalevdev/repomap (the unscoped repomap name was already taken by an unrelated project); the installed binary is repomap everywhere.

From a checkout:

cd repomap
npm install
npm run build
npm link        # or: npm install -g .

Usage

cd ~/work               # any folder containing git repos
repomap init            # creates ./.repomap (config + sqlite + exports)
repomap scan            # discover repos, record branch/commit/dirty state
repomap index           # incremental index: files, chunks, symbols, packages
repomap ask "DATABASE_URL"
repomap ask "restaurantId" --repo my-api --limit 10
repomap graph "table:users" --direction in   # who defines, reads, writes this table
repomap graph "file:my-api/src/db/client.ts" --direction in --depth 2   # impact analysis
repomap graph --export  # writes .repomap/exports/graph.json
repomap status
repomap map             # writes .repomap/exports/map.md
repomap wiki            # writes .repomap/exports/wiki/ (one context page per repo)
repomap mcp             # MCP stdio server exposing all of the above as tools
repomap version         # print the version

Every command works from any subdirectory of the workspace; the tool walks upward to find .repomap, the same way git finds .git.
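
The upward search can be pictured as a loop over parent directories. The sketch below is only an illustration, not repomap's source: findWorkspaceRoot and the injected exists predicate are hypothetical names, and paths are assumed to be absolute POSIX paths.

```typescript
// Illustrative sketch. Only the ".repomap" marker name comes from this README;
// the function name and the injected `exists` predicate are assumptions.
type Exists = (path: string) => boolean;

/** Walk from `startDir` toward "/" and return the first directory that contains `.repomap`. */
function findWorkspaceRoot(startDir: string, exists: Exists): string | null {
  // Split into segments by hand; a real implementation would likely use node:path.
  const segments = startDir.split("/").filter((s) => s.length > 0);
  for (let i = segments.length; i >= 0; i--) {
    const dir = "/" + segments.slice(0, i).join("/");
    const marker = dir === "/" ? "/.repomap" : `${dir}/.repomap`;
    if (exists(marker)) return dir;
  }
  return null; // reached the filesystem root without finding a workspace
}

// Usage with an in-memory stand-in for the filesystem (hypothetical paths).
const fakeFs = new Set(["/home/you/work/.repomap"]);
console.log(findWorkspaceRoot("/home/you/work/my-api/src", (p) => fakeFs.has(p)));
```

Injecting the existence check keeps the sketch free of filesystem access, so it can be exercised against a fake directory tree.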

What ask finds

Results are ranked exact-first: symbol definitions, declared packages, file paths, then FTS5 full-text matches scored by bm25. Every hit carries a citation.

specialist-mw-api/src/common/db/client.ts:1-33
    ... const connectionString = process.env.DATABASE_URL; ...
repo-api/src/server.ts:4  [endpoint] GET /users
repo-api/src/db/schema.sql:1  [table] users
specialist-mw-api/package.json  [npm dependencies 2.2.3] dataloader

Symbols cover functions, classes, types, enums, SQL tables and views, Prisma models, GraphQL types, Payload collection slugs, and HTTP endpoints (app.get, router.post, ...) across TypeScript, JavaScript, Python, Go, Rust, Java, Kotlin, Ruby, PHP, and C#. Packages come from package.json, requirements.txt, and go.mod.
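
The exact-first ordering described above amounts to a two-level sort: by tier first, then by bm25 within the full-text tier. A minimal sketch, assuming a hypothetical Hit shape (repomap's real result type is not documented here) and relying on SQLite's convention that a lower bm25() value means a better match:

```typescript
// Hypothetical result shape, for illustration only.
type HitKind = "symbol" | "package" | "path" | "text";

interface Hit {
  kind: HitKind;
  citation: string; // e.g. "repo-api/src/server.ts:4"
  bm25?: number;    // set for full-text hits; SQLite's bm25() is lower-is-better
}

// Tier order taken from the README: symbols, packages, file paths, full text.
const TIER: Record<HitKind, number> = { symbol: 0, package: 1, path: 2, text: 3 };

function rankHits(hits: Hit[]): Hit[] {
  // Copy before sorting so the caller's array is left untouched.
  return [...hits].sort((a, b) => {
    const byTier = TIER[a.kind] - TIER[b.kind];
    if (byTier !== 0) return byTier;
    // Same tier: fall back to bm25 (hits without a score compare as 0).
    return (a.bm25 ?? 0) - (b.bm25 ?? 0);
  });
}
```

Because Array.prototype.sort is stable, hits that tie on both keys keep their input order.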

The code graph

repomap index also builds an edges table: file imports file (relative imports are resolved on disk, including NodeNext ./x.js to x.ts), file defines table/endpoint/Payload collection, repo depends_on package, and file reads/writes SQL tables (SELECT ... FROM, JOIN, INSERT INTO, UPDATE ... SET, DELETE FROM). Node keys are plain strings:

file:<repo>/<path>   table:<name>   endpoint:<VERB /path>   collection:<slug>
package:<ecosystem>:<name>   module:<import specifier>   repo:<path>
function:<repo>/<path>#<name>   type:<repo>/<path>#<name>   (deep mode)
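
To make the reads/writes edges concrete, here is a rough sketch of how a regex layer could classify the SQL forms listed above. The pattern and the extractTableOps name are assumptions made for illustration; repomap's actual expressions may differ and will handle more edge cases.

```typescript
// Illustrative only: one possible regex for the SQL forms named in the README.
interface TableOp {
  op: "reads" | "writes";
  node: string; // graph node key, e.g. "table:users"
}

function extractTableOps(source: string): TableOp[] {
  // DELETE FROM is listed before FROM so that it is classified as a write, not a read.
  const re = /\b(DELETE\s+FROM|INSERT\s+INTO|UPDATE|FROM|JOIN)\s+([A-Za-z_][\w.]*)(\s+SET\b)?/gi;
  const ops: TableOp[] = [];
  const seen = new Set<string>();
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) {
    const keyword = m[1].toUpperCase().replace(/\s+/g, " ");
    // Only "UPDATE <table> SET" counts; this skips things like "ON UPDATE CASCADE".
    if (keyword === "UPDATE" && m[3] === undefined) continue;
    const op = keyword === "FROM" || keyword === "JOIN" ? "reads" : "writes";
    const node = `table:${m[2]}`;
    const key = `${op} ${node}`;
    if (!seen.has(key)) {
      seen.add(key);
      ops.push({ op, node });
    }
  }
  return ops;
}
```

Each returned pair would become an edge from the containing file: node to the table: node.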

repomap graph "query" matches nodes (exact key first, substring fallback) and walks edges breadth-first:

repomap graph "table:investors" --direction in        # who defines / reads / writes it
repomap graph "module:dataloader" --direction in      # every file importing dataloader
repomap graph "file:my-api/src/auth.ts" --direction in --depth 2 --json   # blast radius
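
The breadth-first walk with --direction and --depth can be sketched as follows. The Edge shape and the walk function are hypothetical; the real edges live in the SQLite edges table, and this version simply operates on an in-memory array.

```typescript
// Illustrative in-memory model of the edges table (names are assumptions).
interface Edge {
  from: string; // e.g. "file:my-api/src/routes.ts"
  to: string;   // e.g. "file:my-api/src/auth.ts"
  kind: string; // e.g. "imports", "reads", "writes"
}

/** Breadth-first walk; returns each reached node key with its distance from `start`. */
function walk(edges: Edge[], start: string, direction: "in" | "out", maxDepth: number): Map<string, number> {
  // Build an adjacency list. For "in" we follow edges backwards (who points at me).
  const neighbours = new Map<string, string[]>();
  for (const e of edges) {
    const [src, dst] = direction === "in" ? [e.to, e.from] : [e.from, e.to];
    const list = neighbours.get(src) ?? [];
    list.push(dst);
    neighbours.set(src, list);
  }

  const depthOf = new Map<string, number>([[start, 0]]);
  let frontier = [start];
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const upcoming: string[] = [];
    for (const node of frontier) {
      for (const next of neighbours.get(node) ?? []) {
        if (depthOf.has(next)) continue; // already reached at a shorter distance
        depthOf.set(next, depth);
        upcoming.push(next);
      }
    }
    frontier = upcoming;
  }
  return depthOf;
}
```

With --direction in --depth 2, the result holds the start node, its direct dependents, and their dependents, which is the "blast radius" reading used above.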

After upgrading from a pre-graph index, run repomap index --force once so edges exist for unchanged files.

Deep mode (optional)

repomap index --deep
repomap graph "function:my-api/src/auth.ts#validateLicence" --direction in   # who calls it
repomap graph "type:my-api/src/users.ts#User" --direction in                 # who extends/implements it

--deep additionally parses TypeScript and JavaScript files with the TypeScript compiler API (syntax only, no type checker, so no tsconfig needed and it stays fast) and adds edges the regex layer cannot see: file defines function:, file calls function:, and type: extends/implements type:. That turns "who calls this function" and "who implements this interface" into one graph query.

Resolution is deliberately conservative: only named and namespace imports from relative specifiers that resolve to a real file on disk produce call edges. Default imports and package imports are skipped, not guessed, so an absent call edge means "not provable", never "not called". Two known limits of the syntax-only approach: a name re-exported through a barrel file (export { x } from './x.js') attributes calls to the barrel, not the defining file, and a local declaration shadowing an imported name can still produce an edge. When an exact function: key returns nothing, query the substring (repomap graph "#myFunction" --direction in) to catch callers through barrels.
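
The conservative rule can be expressed as a small predicate: an import yields call edges only if it is named or namespace, relative, and resolves to a known file. This is a sketch under assumptions; resolveForCallEdges, ImportDecl, and the candidate extension list are hypothetical, and the set of indexed paths stands in for the on-disk check.

```typescript
// Illustrative only: type and function names are not taken from repomap's source.
type ImportKind = "named" | "namespace" | "default";

interface ImportDecl {
  kind: ImportKind;
  specifier: string; // e.g. "../auth.js" or "dataloader"
}

/** Returns the resolved file path, or null when no call edge should be produced. */
function resolveForCallEdges(importerPath: string, imp: ImportDecl, knownFiles: Set<string>): string | null {
  if (imp.kind === "default") return null; // skipped, not guessed
  const isRelative = imp.specifier.startsWith("./") || imp.specifier.startsWith("../");
  if (!isRelative) return null; // package import

  // Resolve the specifier against the importer's directory.
  const parts = importerPath.split("/").slice(0, -1);
  for (const part of imp.specifier.split("/")) {
    if (part === "." || part === "") continue;
    if (part === "..") {
      if (parts.length === 0) return null; // escapes the workspace
      parts.pop();
    } else {
      parts.push(part);
    }
  }
  const joined = parts.join("/");

  // Assumed candidate order; includes the NodeNext "./x.js" -> "x.ts" case.
  const candidates = [joined, joined.replace(/\.js$/, ".ts"), `${joined}.ts`, `${joined}.js`];
  for (const candidate of candidates) {
    if (knownFiles.has(candidate)) return candidate;
  }
  return null; // not provable, so no edge
}
```

A null result means "no edge recorded", which matches the reading that an absent call edge is "not provable" rather than "not called".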

The typescript package is an optional peer dependency, loaded lazily from the indexed repo's node_modules first, then from the global install. Without --deep it is never loaded, so the base install keeps zero runtime dependencies; with --deep and no typescript anywhere, the command fails loud with install instructions.

Deep state is incremental and one-way per content hash: running --deep upgrades unchanged TS/JS files once, and a later plain repomap index keeps the deep edges (the content has not changed, so they are still valid) without reindexing anything.

Agent integration

repomap is designed to be consumed by coding agents three ways; pick whichever the harness supports.

1. MCP server (best). repomap mcp speaks Model Context Protocol over stdio with zero dependencies and exposes repomap_ask, repomap_graph, repomap_status, repomap_index, and repomap_wiki. For Claude Code:

claude mcp add repomap -- repomap mcp --root ~/work

or in .mcp.json / any MCP client config:

{
  "mcpServers": {
    "repomap": { "command": "repomap", "args": ["mcp", "--root", "/Users/you/work"] }
  }
}
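
For harness authors talking to the server directly, an MCP tool invocation over stdio is a JSON-RPC 2.0 request written as a single line. The sketch below builds such a line for repomap_ask. The tools/call method and the name/arguments params come from the MCP specification; the query argument name is an assumption, since the tool's input schema is not documented in this README (a client can discover it with tools/list).

```typescript
// Illustrative sketch: builds one newline-delimited JSON-RPC message for an MCP stdio server.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: unknown;
}

function toolCall(id: number, tool: string, args: Record<string, unknown>): string {
  const request: JsonRpcRequest = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: tool, arguments: args },
  };
  // The stdio transport frames messages by newline, so the JSON must stay on one line.
  return JSON.stringify(request) + "\n";
}

// "query" is a hypothetical argument name; check the schema returned by tools/list.
const askLine = toolCall(1, "repomap_ask", { query: "DATABASE_URL" });
console.log(askLine);
```

A real client sends the initialize handshake before any tools/call request; MCP client libraries and agent harnesses normally take care of that.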

2. CLI with --json. Every query command emits machine-readable output, so a CLAUDE.md / AGENTS.md snippet is enough for harnesses without MCP:

This folder is a repomap workspace: a SQLite index of every repo here.
- At session start run `repomap index` (fast no-op when nothing changed).
- Before grepping across repos: `repomap ask "<identifiers or phrases>" --json`
  returns ranked matches with repo, file, and line citations.
- For impact analysis and data flow: `repomap graph "<node or substring>" --direction in --json`
  (node keys: file:<repo>/<path>, table:<name>, endpoint:<VERB /path>, package:npm:<name>).

3. Pre-rendered context. repomap wiki writes one deterministic markdown page per repo (stack, layout, endpoints, tables, data access, dependencies) plus an index into .repomap/exports/wiki/, each flagged stale when commits land after the last index. repomap map and repomap graph --export produce map.md and graph.json for anything that wants the whole picture at once.

There is also a ready-made agent skill: repomap in the agentique skills repo. It teaches an agent when to reach for repomap instead of grepping, the query patterns for search and graph traversal, and the citation discipline (never assert a cross-repo fact without a repo/file:line citation). Install it with:

npx skills add balevdev/agentique --skill repomap

Freshness model: the source of truth is always the working tree. repomap index rehashes every file and skips unchanged ones, so running it at session start costs around a second on thousands of files and guarantees the index matches disk.
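
The rehash-and-skip idea reduces to comparing two maps of path to content hash. A minimal sketch, assuming the hashes have already been computed (the "How it works" section names sha1); planIndex and the Action labels are hypothetical names, and the delete case follows the missing-file rule described in that section.

```typescript
// Illustrative only: decides what an incremental index run would do per file.
type Action = "skip" | "reindex" | "delete";

/**
 * @param indexed path -> content hash recorded by the previous index run
 * @param onDisk  path -> content hash of the current working tree
 */
function planIndex(indexed: Map<string, string>, onDisk: Map<string, string>): Map<string, Action> {
  const plan = new Map<string, Action>();
  for (const [path, hash] of onDisk) {
    // New files have no recorded hash, so they fall into "reindex" as well.
    plan.set(path, indexed.get(path) === hash ? "skip" : "reindex");
  }
  for (const path of indexed.keys()) {
    if (!onDisk.has(path)) plan.set(path, "delete"); // file no longer exists
  }
  return plan;
}
```

When nothing changed, every entry is "skip", which is why a session-start run stays cheap: the cost is hashing, not rebuilding derived data.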

How it works

repomap init creates .repomap/ with config.json and repomap.sqlite (WAL mode, FTS5).
repomap scan walks the workspace (default depth 4), treats any directory containing .git as a repo, and records git metadata. It never descends into a repo looking for more repos, so nesting stays predictable.
repomap index hashes every indexable file (sha1). Unchanged hash: skipped. Changed hash: derived data (chunks, symbols, packages, edges) is deleted and rebuilt. Missing file: marked deleted and removed from search and graph. Each repo is indexed in one transaction, so a crash never leaves a repo half indexed.
Content is chunked in fixed 100-line windows with exact line ranges, which is what makes file:12-111 citations possible. FTS5 sync is handled by database triggers, so application code cannot forget it.
Repo identity is the path relative to the workspace root. Moving or syncing the whole workspace folder keeps the index valid; the next scan refreshes absolute paths.
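
The fixed-window chunking mentioned above is what ties every search hit to a line range. A minimal sketch, assuming windows start at line 1 and that lines are split on "\n"; chunkByLines and cite are hypothetical helper names, not repomap's API.

```typescript
// Illustrative only: fixed-size line windows with 1-based, inclusive line ranges.
interface Chunk {
  startLine: number;
  endLine: number;
  text: string;
}

function chunkByLines(content: string, windowSize = 100): Chunk[] {
  const lines = content.split("\n");
  const chunks: Chunk[] = [];
  for (let i = 0; i < lines.length; i += windowSize) {
    const slice = lines.slice(i, i + windowSize);
    chunks.push({
      startLine: i + 1,            // 1-based first line of the window
      endLine: i + slice.length,   // last window may be shorter than windowSize
      text: slice.join("\n"),
    });
  }
  return chunks;
}

/** Formats a citation in the repo/file:start-end style used throughout this README. */
function cite(repo: string, path: string, chunk: Chunk): string {
  return `${repo}/${path}:${chunk.startLine}-${chunk.endLine}`;
}
```

Under these assumptions a 33-line file produces a single chunk cited as :1-33, the same shape as the client.ts hit in the ask example.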

Configuration

Edit .repomap/config.json. Everything has a sensible default: ignored directories (node_modules, dist, .next, ...), included extensions, excluded lockfiles and minified files, max file size (512 KB), scan depth, and chunk size. Binary files are detected and skipped automatically.

Design rules

Deep modules, thin interfaces: workspace, git, indexer, extract, deep, resolve, search, graph, wiki, mcp, report each own one concern end to end; the CLI only parses argv and prints.
Source files are the source of truth; the index is derived data and can always be rebuilt with repomap index --force.
Append-friendly and idempotent: every command can be re-run safely.
Fail loud: broken preconditions raise DependencyError with a human message and exit code 1.

Tests

npm test

Runs four suites: extraction unit tests (imports per language including multi-line and Go blocks, SQL table ops), deep extractor unit tests (defines, calls, heritage, conservative resolution), the workspace end-to-end suite (discovery, indexing, incremental skip, single-file reindex, deletion, search citations, repo filters, graph traversal, deep upgrade, exports, wiki, relocation), and an MCP suite that spawns the real server and speaks JSON-RPC to it like an agent harness would.

Intentionally not built

LLM answer composition and embeddings. The consumers are coding agents, which are themselves the LLM: they need exact, cheap, citable retrieval (search, graph, wiki, MCP), not a second model in the middle or an API key dependency. Everything here runs offline.