pi-codebase-memory

v1.0.5

Published

3 days ago

A fast, lightweight codebase indexing and search extension for pi-coding-agent.

r-dson

pi-coding-agent codebase-indexing search navigation symbol-navigation typescript extension

codebase-memory — pi-coding-agent extension

A minimal port of codebase-memory-mcp as a pi-coding-agent extension.

Instead of a Go binary + tree-sitter + SQLite, the extension runs entirely inside the Node.js process that already hosts pi:

| MCP component | Extension equivalent |
|---|---|
| Go binary + CGO | Node.js built-ins, zero native deps |
| tree-sitter AST | Per-language regex, line-by-line with quick-filter |
| Content-hash incremental index | MD5 per file, async stat batch, same skip-if-unchanged logic |
| SQLite WAL database | .pi-codebase.bin next to your project (v8 serialized) |
| 11 MCP tools via stdio | 5 pi tools + 1 slash-command |

Installation

# Install directly from GitHub:
pi install git:github.com/R-Dson/pi-codebase

# Or via npm:
pi install npm:pi-codebase-memory

Resources

GitHub: R-Dson/pi-codebase
npm: pi-codebase-memory

Tools

`codebase_index`

Full scan — walks the project, extracts symbols, writes .pi-codebase.bin.

codebase_index()
codebase_index({ root_path: "/my/app" })

Supported languages: TypeScript · JavaScript · Python · Go · Rust · Java · C# · PHP · C · C++ · Ruby · Swift · Kotlin · Shell · Perl · Dart · Lua · Scala · R

Ignored directories: node_modules, .git, dist, build, .next, __pycache__, target, .cache, vendor, .venv, venv, coverage, .nyc_output, out

`codebase_update` (incremental)

Re-parses only files whose MD5 content hash has changed since the last index run. Unchanged files are reused verbatim — identical to the MCP's incremental reindex strategy. Falls back to a full scan when no prior index exists.

codebase_update()                        # check everything since last run
codebase_update({ root_path: "/my/app" })

Output tells you how many files were +added, -removed, or ~changed.
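The skip-if-unchanged check can be sketched roughly as below. This is a minimal illustration, not the extension's actual code: `hashContent`, `diffIndex`, `HashTable`, and `IndexDiff` are hypothetical names invented for the example, and only the use of MD5 through Node's `crypto` module is taken from the description above.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape: file path -> MD5 hex digest of that file's content.
type HashTable = Record<string, string>;

interface IndexDiff {
  added: string[];     // present now, absent from the previous index
  removed: string[];   // present in the previous index, gone now
  changed: string[];   // present in both, content hash differs
  unchanged: string[]; // present in both, same hash: symbols can be reused
}

// MD5 of a file's content via Node's built-in crypto module.
function hashContent(content: string | Buffer): string {
  return createHash("md5").update(content).digest("hex");
}

// Compare the previous index with the current state of the tree.
function diffIndex(previous: HashTable, current: HashTable): IndexDiff {
  const diff: IndexDiff = { added: [], removed: [], changed: [], unchanged: [] };
  for (const path of Object.keys(current)) {
    const oldHash = previous[path];
    if (oldHash === undefined) diff.added.push(path);
    else if (oldHash !== current[path]) diff.changed.push(path);
    else diff.unchanged.push(path);
  }
  for (const path of Object.keys(previous)) {
    if (current[path] === undefined) diff.removed.push(path);
  }
  return diff;
}

const before: HashTable = {
  "a.ts": hashContent("export const a = 1;"),
  "b.ts": hashContent("export const b = 2;"),
};
const after: HashTable = {
  "a.ts": hashContent("export const a = 1;"), // same content, same hash
  "c.ts": hashContent("export const c = 3;"), // new file
};
const summary = diffIndex(before, after);
// prints "+1 -1 ~0"
console.log(`+${summary.added.length} -${summary.removed.length} ~${summary.changed.length}`);
```

Only the paths in `added` and `changed` would need re-parsing; everything in `unchanged` keeps its previously extracted symbols.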

`codebase_search`

Query the in-memory index — much faster than grep for structural questions. Equivalent to search_graph in the MCP.

codebase_search({ query: "Handler" })                     # name regex
codebase_search({ kind: "class" })                        # by kind
codebase_search({ query: "process", file_pattern: "api" })
codebase_search({ kind: "function", limit: 100 })

Supported kinds: function · method · class · interface · type · variable · struct · enum · trait · module · route · http_call · macro · protocol · extension · object
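To make the filter semantics concrete, here is a small sketch of how a query over an in-memory symbol list might be evaluated. The `SymbolEntry` shape, the `searchSymbols` function, the default limit of 50, and the treatment of `file_pattern` as a plain substring are all assumptions made for illustration; the extension's real types and matching rules may differ.

```typescript
// Hypothetical symbol record; the extension's real type may differ.
interface SymbolEntry {
  name: string;
  kind: string;
  file: string;
  line: number;
}

interface SearchOptions {
  query?: string;        // regex matched against the symbol name
  kind?: string;         // exact kind, e.g. "class"
  file_pattern?: string; // assumed here to be a substring of the file path
  limit?: number;        // maximum number of results
}

function searchSymbols(index: SymbolEntry[], opts: SearchOptions): SymbolEntry[] {
  // Case-sensitive on purpose so a query such as "^[A-Z]" stays meaningful.
  const nameRe = opts.query ? new RegExp(opts.query) : null;
  const limit = opts.limit ?? 50; // default is an assumption
  const results: SymbolEntry[] = [];
  for (const sym of index) {
    if (results.length >= limit) break;
    if (opts.kind && sym.kind !== opts.kind) continue;
    if (opts.file_pattern && sym.file.indexOf(opts.file_pattern) === -1) continue;
    if (nameRe && !nameRe.test(sym.name)) continue;
    results.push(sym);
  }
  return results;
}

const demoIndex: SymbolEntry[] = [
  { name: "OrderHandler", kind: "class", file: "src/api/orders.ts", line: 10 },
  { name: "processOrder", kind: "function", file: "src/api/orders.ts", line: 42 },
  { name: "processRefund", kind: "function", file: "src/billing/refunds.ts", line: 7 },
];

// Name regex combined with a path filter: only processOrder matches both.
console.log(searchSymbols(demoIndex, { query: "process", file_pattern: "api" }).map((s) => s.name));
```

Because the index is already in memory, a query like this is a single pass over an array with no file I/O, which is where the speed advantage over grep comes from for structural questions.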

`codebase_refs`

Find every usage of a symbol across the project. Equivalent to trace_call_path(direction="inbound") in the MCP.

Search back-end priority:

ripgrep (rg) — if installed; fastest, cross-platform including native Windows
grep — Unix (Linux / macOS / WSL)
Pure Node.js — always available; slower on large trees but works everywhere

codebase_refs({ symbol: "processOrder" })
codebase_refs({ symbol: "UserService", file_pattern: "*.ts" })
codebase_refs({ symbol: "main", limit: 200 })
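The back-end priority above could be implemented along these lines. This is a sketch under stated assumptions: `commandExists` and `pickBackend` are hypothetical helper names, and probing with `--version` is just one plausible way to detect a binary on PATH (some minimal grep builds may not support that flag). Skipping grep on native Windows follows the platform table in this README.

```typescript
import { spawnSync } from "node:child_process";

type Backend = "rg" | "grep" | "node";

// True if running `cmd --version` succeeds, i.e. the binary is on PATH.
function commandExists(cmd: string): boolean {
  const result = spawnSync(cmd, ["--version"], { stdio: "ignore" });
  return result.error === undefined && result.status === 0;
}

// Pick the fastest available reference-search back-end.
// `platform` and `exists` are injectable so the logic can be tested
// without depending on what is installed on the current machine.
function pickBackend(
  platform: string = process.platform,
  exists: (cmd: string) => boolean = commandExists,
): Backend {
  if (exists("rg")) return "rg";                               // 1. ripgrep
  if (platform !== "win32" && exists("grep")) return "grep";   // 2. Unix grep
  return "node";                                               // 3. pure Node.js scan
}

console.log("reference search back-end:", pickBackend());
```

Probing once at startup and caching the result would avoid spawning a process on every search.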

`codebase_schema`

High-level overview: file counts per language, symbol counts per kind, index age, root directory listing. Equivalent to get_graph_schema.

codebase_schema()

The output also reports the platform and which search back-ends are active (find, grep, rg), so you know exactly what the extension is using.

Command

/codebase   →  index status (root, file/symbol count, age, platform info)

What gets extracted

| Language | Kinds |
|---|---|
| TypeScript / TSX | function, arrow function, class, interface, type, enum, method, route, http_call |
| JavaScript / JSX | function, arrow function, class, method, route, http_call |
| Python | function, method, class, route |
| Go | function, method, struct, interface, type |
| Rust | function, struct, enum, trait, type, module |
| Java | class, interface, enum, method, route |
| C# | class, interface, enum, struct, function, route |
| PHP | function, class, interface, route |
| C | function, struct, enum, type, macro |
| C++ | class, struct, enum, function, method, type, macro |
| Ruby | class, module, method |
| Swift | class, struct, protocol, enum, function, method, type, extension |
| Kotlin | class, interface, function, method, type, enum, object |
| Shell | function |
| Perl | function, module, class |
| Dart | class, function, method, enum, type, mixin |
| Lua | function, module |
| Scala | class, object, trait, function, method, type, enum |
| R | function |

Signatures are captured up to 200 characters, enough to show full generic bounds in Rust (pub fn foo<T: Serialize + Clone>()) and long Java return types.
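As a rough illustration of line-by-line regex extraction with a quick-filter and a capped signature, consider the sketch below. The patterns are deliberately simplified and hypothetical (three kinds only, TypeScript-like syntax); they are not the extension's actual language specs. The 200-character cap is the one figure taken from this README.

```typescript
interface ExtractedSymbol {
  name: string;
  kind: string;
  line: number;      // 1-based line number
  signature: string; // trimmed source line, capped at MAX_SIGNATURE characters
}

const MAX_SIGNATURE = 200;

// Hypothetical, heavily simplified spec for TypeScript-like sources.
// The quick-filter is one cheap test that rejects lines with no keyword at all.
const quickFilter = /\b(function|class|interface)\b/;
const patterns: Array<{ kind: string; re: RegExp }> = [
  { kind: "function", re: /^\s*(?:export\s+)?(?:async\s+)?function\s+([A-Za-z_$][\w$]*)/ },
  { kind: "class", re: /^\s*(?:export\s+)?(?:abstract\s+)?class\s+([A-Za-z_$][\w$]*)/ },
  { kind: "interface", re: /^\s*(?:export\s+)?interface\s+([A-Za-z_$][\w$]*)/ },
];

function extractSymbols(source: string): ExtractedSymbol[] {
  const symbols: ExtractedSymbol[] = [];
  const lines = source.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    const text = lines[i];
    if (!quickFilter.test(text)) continue; // skip the detailed patterns entirely
    for (const { kind, re } of patterns) {
      const match = re.exec(text);
      if (match) {
        symbols.push({
          name: match[1],
          kind,
          line: i + 1,
          signature: text.trim().slice(0, MAX_SIGNATURE),
        });
        break; // first matching pattern wins
      }
    }
  }
  return symbols;
}

const sample = [
  "import { db } from './db';",
  "export async function loadUser(id: string): Promise<User> {",
  "  return db.get(id);",
  "}",
  "class UserService {",
].join("\n");

console.log(extractSymbols(sample));
```

Because each line is matched independently, a syntax error elsewhere in the file does not stop extraction, which is the trade-off against a full AST parser.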

Platform support

| Environment | Discovery | Symbol extraction | Reference search |
|---|---|---|---|
| Linux / macOS | find (fast) | Node.js regex | rg → grep |
| WSL | find (fast) | Node.js regex | rg → grep |
| Native Windows | Node.js walk | Node.js regex | rg → JS scan |

Install ripgrep (winget install ripgrep / brew install ripgrep / apt install ripgrep) to get the fastest reference search on all platforms.

Persistence & incremental workflow

# Day 1 — initial index
codebase_index()          →  writes .pi-codebase.bin

# Day 2, session start    →  index reloaded automatically from .pi-codebase.bin

# After editing a few files
codebase_update()         →  only changed files are re-parsed (hash diff)

# After a big refactor
codebase_index()          →  full re-scan (safe to run at any time)

Add .pi-codebase.bin to .gitignore if you prefer not to commit it:

echo ".pi-codebase.bin" >> .gitignore
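For readers curious what v8-serialized persistence looks like in practice, the following sketch round-trips an index object through Node's `v8.serialize` and `v8.deserialize`. The `StoredIndex` shape and the `saveIndex` / `loadIndex` helpers are hypothetical; the actual layout of .pi-codebase.bin is not documented here and is likely richer.

```typescript
import { serialize, deserialize } from "node:v8";
import { existsSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical on-disk shape, for illustration only.
interface StoredIndex {
  version: number;
  indexedAt: number;             // epoch milliseconds
  files: Record<string, string>; // path -> content hash
}

function saveIndex(path: string, index: StoredIndex): void {
  // v8.serialize returns a Buffer in V8's structured-clone wire format.
  writeFileSync(path, serialize(index));
}

function loadIndex(path: string): StoredIndex | null {
  if (!existsSync(path)) return null;
  try {
    return deserialize(readFileSync(path)) as StoredIndex;
  } catch {
    return null; // corrupt or incompatible file: caller can fall back to a full scan
  }
}

// Demo in a throwaway directory so no real project file is touched.
const demoDir = mkdtempSync(join(tmpdir(), "pi-codebase-demo-"));
const demoFile = join(demoDir, ".pi-codebase.bin");
saveIndex(demoFile, { version: 1, indexedAt: Date.now(), files: { "src/app.ts": "abc123" } });
console.log(loadIndex(demoFile));
rmSync(demoDir, { recursive: true, force: true });
```

Returning `null` on a failed load mirrors the documented behaviour that `codebase_update` falls back to a full scan when no usable prior index exists.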

Workflow examples

# Structural overview of an unfamiliar repo
You: "What does this codebase look like?"
  → codebase_index() then codebase_schema()

# Find all HTTP handlers
You: "Where are the route handlers?"
  → codebase_search({ query: "Handler|Route|Controller", kind: "function" })

# Call-site tracing
You: "What calls processPayment?"
  → codebase_refs({ symbol: "processPayment" })

# Dead-code hint
You: "Find all exported functions in the billing package"
  → codebase_search({ query: "^[A-Z]", kind: "function", file_pattern: "billing" })

# After editing
You: "I just moved some files around, update the index"
  → codebase_update()

Comparison with codebase-memory-mcp

| Feature | MCP | This extension |
|---|---|---|
| Requires Go + CGO | ✅ | ❌ (zero external deps) |
| tree-sitter AST accuracy | ✅ | ⚠️ regex (resilient to syntax errors) |
| Content-hash incremental index | ✅ | ✅ MD5, async stat batch, same strategy |
| Call-graph edges (multi-hop) | ✅ | ❌ (use codebase_refs for single-hop) |
| Cross-service HTTP linking | ✅ | ❌ |
| Cypher-like query language | ✅ | ❌ |
| Dead-code detection | ✅ | ❌ |
| Works inside pi without MCP | ❌ | ✅ |
| Modular, no build step | ❌ | ✅ |
| Persistent index | ✅ | ✅ |
| Reference search | ✅ | ✅ (rg / grep / JS) |
| Symbol search | ✅ | ✅ |
| Schema / overview | ✅ | ✅ |
| Windows support | ❌ (WSL only) | ✅ (native + WSL) |
| Resilient to broken syntax | ⚠️ | ✅ regex keeps working |

File structure

codebase-memory/
├── index.ts       # Entry point — state, events, registration (~140 lines)
├── types.ts       # Interfaces, constants, language specs with quickFilter (~420 lines)
├── indexing.ts    # File discovery, symbol extraction, full/incremental index (~340 lines)
├── search.ts      # ripgrep / grep / JS reference search (~90 lines)
└── tools.ts       # Helpers + 5 tool registrations with renderers (~480 lines)

Performance

The indexer is optimized for speed:

Concurrent I/O: semaphore-based worker pool keeps N files in flight simultaneously
Async stat batch: incrementalIndex stats all files in parallel, not sequentially
Per-language quick-filter: a single cheap regex skips ~85% of lines before running expensive pattern matches
Native crypto: MD5 via Node's built-in crypto module (native OpenSSL code rather than JavaScript)
v8 serialization: binary index persistence is 5–10× faster than JSON
Auto-tuned thread pool: UV_THREADPOOL_SIZE set to max(cpus × 2, 32) at startup
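The bounded-concurrency idea behind the first bullet can be sketched as follows. Note that this example uses a fixed pool of workers pulling from a shared cursor rather than a literal semaphore object; it enforces the same "at most N in flight" bound. `mapWithConcurrency` is a hypothetical name, and the timer-based task merely stands in for reading and parsing a file.

```typescript
// Run `task` over `items` with at most `limit` tasks in flight at once.
// Results are returned in input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // shared cursor into `items`

  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // no race: JavaScript runs this synchronously between awaits
      results[i] = await task(items[i], i);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  const workers: Promise<void>[] = [];
  for (let w = 0; w < workerCount; w++) workers.push(worker());
  await Promise.all(workers);
  return results;
}

// Demo: square six numbers, two at a time, and record the peak concurrency.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));
let inFlight = 0;
let peak = 0;

mapWithConcurrency([1, 2, 3, 4, 5, 6], 2, async (n) => {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await sleep(5); // stand-in for file I/O
  inFlight--;
  return n * n;
}).then((squares) => {
  console.log(squares, "peak in flight:", peak);
});
```

Keeping the limit well above the CPU count makes sense for I/O-bound work such as `stat` and file reads, since most of the time is spent waiting on the thread pool rather than on the JavaScript thread.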

Real-world result on the Linux kernel (64,770 files, 7M+ symbols): ~24 seconds on a modern machine.

License

MIT
