rlm-navigator
v2.0.0
RLM Navigator
Token-efficient codebase navigation for AI-assisted coding. Treats codebases as navigable hierarchical trees of AST skeletons — the AI sees structure first, drills into implementations only when needed. A file-watching daemon caches AST structures, a stateful REPL with dependency-aware staleness tracking provides targeted analysis, and automatic output truncation keeps every tool response within budget.
Problem
AI coding assistants treat source files as opaque text blobs. Every interaction starts the same way: read the whole file, scan for the relevant section, discard the rest. This is fundamentally wasteful because source code is structured — it has a hierarchy (modules → classes → methods → statements) that can be navigated without reading implementations.
The cost compounds quickly:
- Context Bloat: A 500-line file consumes ~2,000 tokens even when you only need one function. Across a multi-file task, the window fills with irrelevant code that the model must attend to on every generation step.
- Context Rot: LLM attention degrades over long contexts. Important instructions and earlier findings get diluted as the window fills with raw source. The model "forgets" what it already learned — not because the tokens are gone, but because attention is spread too thin.
- Exploration Loops: Without structural summaries, the AI has no compact representation of what a file contains. It re-reads files it already saw, or reads adjacent files speculatively, burning tokens on redundant I/O.
- Stale Data: Results stored in variables go stale when underlying files change. Without tracking, the AI operates on outdated information — a silent correctness problem that's worse than wasted tokens.
The root cause is a mismatch between how code is organized (hierarchical, structured) and how AI tools access it (flat, full-text). RLM Navigator closes this gap by exposing code structure as a first-class navigation primitive.
Solution
RLM Navigator provides 10 core MCP tools that enforce a surgical navigation workflow:
Navigation tools:
| Tool | Purpose |
|------|---------|
| get_status | Check daemon health |
| rlm_tree | See directory structure (replaces ls/find) |
| rlm_map | See file signatures only (replaces cat/read) |
| rlm_drill | Read specific symbol implementation |
| rlm_search | Find symbols across files |
REPL tools (stateful Python environment with pickle persistence):
| Tool | Purpose |
|------|---------|
| rlm_repl_init | Initialize the stateful REPL |
| rlm_repl_exec | Execute Python code (variables persist across calls) |
| rlm_repl_status | Check variables, buffers, execution count + staleness warnings |
| rlm_repl_reset | Clear all REPL state |
| rlm_repl_export | Export accumulated buffers |
Built-in REPL helpers: peek() (read lines), grep() (regex search), chunk_indices() / write_chunks() (file chunking), add_buffer() (accumulate findings). All helpers automatically track file dependencies — when source files change, stale variables and buffers are flagged in rlm_repl_status and rlm_repl_exec output.
The workflow: tree → map → drill → edit. For complex analysis: init → exec with helpers → export buffers. Each step loads only what's needed.
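The staleness tracking behind these helpers amounts to snapshotting file mtimes per stored result. The following is a minimal sketch of that idea, not the daemon's actual code; the `DependencyTracker` name and its methods are hypothetical:

```python
import os

class DependencyTracker:
    """Track file mtimes per stored result and flag stale ones (illustrative sketch)."""

    def __init__(self):
        self._deps: dict = {}  # result name -> {file_path: mtime at computation time}

    def record(self, name: str, files: list) -> None:
        # Snapshot each dependency's mtime when the result is computed
        self._deps[name] = {f: os.path.getmtime(f) for f in files}

    def stale(self, name: str) -> list:
        """Return the dependency files that changed (or vanished) since `name` was computed."""
        changed = []
        for path, mtime in self._deps.get(name, {}).items():
            if not os.path.exists(path) or os.path.getmtime(path) != mtime:
                changed.append(path)
        return changed
```

A status call would then surface `stale("matches")` as a warning whenever the underlying file was edited after the grep result was stored.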
Architecture
```mermaid
graph TB
    MCP["MCP Server<br/>(TypeScript)"]
    Client["Claude Code<br/>(AI Client)"]
    Daemon["Python Daemon"]
    FS["File System<br/>+ AST Cache"]
    MCP -->|stdio| Client
    MCP -->|TCP/JSON| Daemon
    Daemon -->|watchdog| FS
    Daemon -->|tree-sitter| FS
    Daemon -->|cache| FS
```

Quick Start

```shell
npx rlm-navigator@latest install
```

This copies the daemon and server into a local .rlm/ directory, installs dependencies, builds the MCP server, and registers with Claude Code. The daemon auto-starts when Claude Code connects — no separate terminal needed.
Other Commands
```shell
npx rlm-navigator@latest update      # Update to latest version
npx rlm-navigator status             # Check daemon health
npx rlm-navigator uninstall          # Remove from project
```

Manual / Development Setup
```shell
# 1. Install Python deps
pip install -r daemon/requirements.txt

# 2. Build MCP server
cd server && npm install && npm run build

# 3. Register with Claude Code
claude mcp add rlm-navigator -- node /path/to/server/build/index.js

# 4. Start the daemon (in a separate terminal)
python daemon/rlm_daemon.py --root /path/to/your/project
```

Legacy install scripts (install.sh, install.ps1) are still available for development.
Supported Languages
Tree-sitter powered parsing for: Python, JavaScript, TypeScript, Go, Rust, Java, C, C++
Unsupported file types get a graceful fallback (first 20 lines + line count).
Benchmarks
benchmark.py supports four modes that measure different aspects of token efficiency.
Workflow: Navigation Overhead
Compares "grep + full file reads" vs "tree → search → map → drill". Benchmarked against tiangolo/fastapi:
| Query | Approach | Files | Tokens | Reduction | Efficiency |
|---|---|---|---|---|---|
| authenticate | Traditional | 42 full reads | 47,131 | — | — |
| authenticate | RLM (full repo tree) | 9 maps | 19,109 | 59% | 2.5x |
| authenticate | RLM (targeted tree) | 9 maps | 8,364 | 82% | 5.6x |
| OAuth2PasswordBearer | Traditional | 20 full reads | 25,725 | — | — |
| OAuth2PasswordBearer | RLM (targeted tree) | 1 map | 3,267 | 87% | 7.9x |
Self-benchmark (this repo, query squeeze):
| Approach | Files | Tokens | Reduction |
|---|---|---|---|
| Traditional | 6 full reads | 22,139 | — |
| RLM | 5 maps + 5 drills | 3,358 | 85% (6.6x) |
Scoping rlm_tree to the relevant subdirectory (--tree-path fastapi/security) is critical for large repos — it reduces tree overhead from ~11K tokens to 205, making the difference between 2-3x and 6-8x savings.
REPL: Targeted Analysis
Compares full file reads vs REPL-assisted grep + peek windows. Self-benchmark (query handle_request):
| Approach | Tokens | Reduction |
|---|---|---|
| Traditional (4 full reads) | 15,594 | — |
| REPL (grep + peek) | 16 | ~100% (974x) |
The REPL's grep() returns only matching lines with file/line references — no need to read surrounding context unless you choose to peek() a specific range.
Truncation: Response Capping
Measures how much the 8,000-char truncation cap saves across all tool responses. For well-structured codebases where skeletons are concise, truncation rarely activates — but for large files or verbose tree outputs it prevents runaway token consumption.
Chunks: Skeleton vs Full-File vs Per-Chunk
Compares the cost of reading a file three ways: full text, skeleton only, and chunked windows. Self-benchmark (daemon/rlm_daemon.py, 397 lines):
| Approach | Tokens | Savings vs Full |
|---|---|---|
| Full file read | 3,332 | — |
| Skeleton (rlm_map) | 492 | 85% |
```shell
# Run benchmarks yourself
python benchmark.py --root /path/to/project --query "symbol"                    # workflow
python benchmark.py --root /path/to/project --query "symbol" --mode truncation  # truncation
python benchmark.py --root /path/to/project --query "symbol" --mode repl        # repl
python benchmark.py --root /path/to/project --file "src/file.py" --mode chunks  # chunks
```

Configuration
| Environment Variable | Default | Description |
|---------------------|---------|-------------|
| RLM_DAEMON_PORT | 9177 | TCP port for daemon communication |
| RLM_MAX_RESPONSE | 8000 | Max chars before output truncation |
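The RLM_MAX_RESPONSE cap can be illustrated with a small helper. This is a sketch of the idea only; the `truncate_response` name and marker text are assumptions, not the server's actual implementation:

```python
import os

# Default mirrors the table above; the helper itself is hypothetical.
RLM_MAX_RESPONSE = int(os.environ.get("RLM_MAX_RESPONSE", "8000"))

def truncate_response(text: str, limit: int = RLM_MAX_RESPONSE) -> str:
    """Cap a tool response at the configured character budget."""
    marker = "\n... [output truncated]"
    if len(text) <= limit:
        return text
    # Reserve room for the marker so the total stays within the budget
    return text[: limit - len(marker)] + marker
```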
Development
```shell
# Run tests
cd daemon && python -m pytest tests/ -v

# Start daemon in dev mode
python daemon/rlm_daemon.py --root .
```

How It Works
- Daemon watches your project with watchdog, parses files with tree-sitter, and caches AST skeletons. File change events propagate to both the skeleton cache and the REPL's dependency tracker.
- REPL provides a pickle-persisted Python environment with codebase helpers (peek, grep, chunking, buffers). It tracks file dependencies per variable/buffer via mtime snapshots — when files change, staleness warnings surface automatically.
- MCP Server bridges Claude Code to the daemon via TCP JSON protocol, with automatic output truncation and staleness warning formatting.
- Skill enforces the navigation workflow (tree → map → drill → edit) and the chunk-delegate-synthesize workflow for large analyses.
- Sub-agent (Haiku) analyzes file chunks with structured output — relevance rankings, missing items, and suggested next queries.
PageIndex Integration
RLM Navigator integrates PageIndex — an LLM-powered document indexing library — to bring the same "map before drill" navigation paradigm to documentation files (.md, .pdf, .txt, .rst). Where tree-sitter parses code into AST skeletons, PageIndex parses documents into hierarchical section trees with semantic summaries.
Architecture Overview
┌──────────────────────────────────────────────────────────┐
│ Document Navigation │
│ │
│ rlm_doc_map ──┐ rlm_doc_drill ──┐ rlm_assess │
│ │ │ │ │
│ ┌──────▼─────────────────────▼─────────▼──────┐ │
│ │ Python Daemon │ │
│ │ │ │
│ │ ┌───────────────────────────────────────┐ │ │
│ │ │ doc_indexer.py │ │ │
│ │ │ │ │ │
│ │ │ PageIndex available? │ │ │
│ │ │ ├─ YES → md_to_tree() / page_index()│ │ │
│ │ │ │ (GPT-4o via OpenAI API) │ │ │
│ │ │ └─ NO → index_markdown_local() │ │ │
│ │ │ (regex header parsing) │ │ │
│ │ └───────────────────────────────────────┘ │ │
│ │ │ │ │
│ │ Unified Node Tree │ │
│ └──────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────┘

Dual-Provider Configuration
The configuration layer (daemon/config.py) manages two independent API providers. Both are optional — core navigation works entirely offline.
| Provider | Purpose | API Key Env Var | SDK | Default Model |
|----------|---------|----------------|-----|---------------|
| OpenAI | Document indexing via PageIndex | CHATGPT_API_KEY | pageindex | gpt-4o-2024-11-20 |
| Anthropic | Code enrichment via Haiku | ANTHROPIC_API_KEY | anthropic | claude-haiku-4-5-20251001 |
Feature flags are computed properties that require both the API key AND the SDK to be installed:
```python
class RLMConfig:
    @property
    def doc_indexing_enabled(self) -> bool:
        return self.openai_api_key is not None and self.pageindex_available

    @property
    def enrichment_enabled(self) -> bool:
        return self.anthropic_api_key is not None and self.anthropic_available
```

The model can be overridden via the PAGEINDEX_MODEL environment variable. Both providers support .env files via python-dotenv.
Document Indexing Pipeline
When rlm_doc_map is called on a document file, the daemon routes through a fallback chain in doc_indexer.py:
1. PageIndex path (when CHATGPT_API_KEY is set and pageindex is installed):
For markdown files, calls pageindex.page_index_md.md_to_tree() with:
- md_path: file path
- model: configurable (default gpt-4o-2024-11-20)
- if_add_node_summary: "yes" — generates 1-line semantic summaries per section
- if_add_node_id: "yes" — assigns unique node identifiers
For PDF files, calls pageindex.page_index.page_index() with the same parameters.
Both are async functions wrapped with a manual event loop since the daemon is synchronous:
```python
loop = asyncio.new_event_loop()
try:
    result = loop.run_until_complete(md_to_tree(
        md_path=file_path,
        model=config.pageindex_model,
        if_add_node_summary="yes",
        if_add_node_id="yes",
    ))
finally:
    loop.close()
```

2. Local fallback (when PageIndex is unavailable or fails):
For markdown, a regex-based parser extracts headings (^#{1,6}\s+) while skipping headings inside code blocks. A stack algorithm builds the hierarchy:
- Tracks heading levels to nest children under parents
- Assigns line ranges (each section spans from its heading to the next heading)
- No API calls — works entirely offline
For plain text and RST files, a minimal indexer returns line count and a preview of the first 10 lines.
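The stack algorithm for markdown can be sketched as follows. This is a simplified illustration, not the actual doc_indexer.py code; the function name and node fields are assumptions loosely modeled on the described behavior:

```python
import re

def index_markdown_local(text: str) -> list:
    """Build a nested section tree from markdown headings (illustrative sketch).

    Matches ^#{1,6}\\s+ headings, skips headings inside fenced code blocks,
    and assigns 1-indexed line ranges ending at the next same-or-higher heading.
    """
    heading_re = re.compile(r"^(#{1,6})\s+(.*)")
    root = {"name": "document", "level": 0, "range": {"start": 1, "end": 0}, "children": []}
    stack = [root]
    in_code = False
    lines = text.splitlines()
    for lineno, line in enumerate(lines, start=1):
        if line.lstrip().startswith("`" * 3):  # fence toggle: "```"
            in_code = not in_code
            continue
        if in_code:
            continue
        m = heading_re.match(line)
        if not m:
            continue
        level = len(m.group(1))
        # Close any open sections at the same or deeper level
        while stack[-1]["level"] >= level:
            stack.pop()["range"]["end"] = lineno - 1
        node = {"name": m.group(2).strip(), "level": level,
                "range": {"start": lineno, "end": len(lines)}, "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)
    for node in stack:  # sections still open run to end-of-file
        node["range"]["end"] = len(lines)
    return root["children"]
```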
3. Error handling: If PageIndex raises any exception (network error, rate limit, invalid response), the indexer silently falls through to the local path. The user always gets a result.
Unified Node Tree Format
Both PageIndex and local indexing produce the same unified node schema, making downstream tools (MCP server, skill workflow) provider-agnostic:
```json
{
  "name": "Installation",
  "type": "section",
  "source": "pageindex_md",
  "summary": "Steps to install the project using pip and npm.",
  "metadata": {
    "node_id": "pi-abc-123",
    "text_preview": "Run pip install -r requirements.txt..."
  },
  "range": { "start": 15, "end": 28 },
  "children": [
    {
      "name": "Prerequisites",
      "type": "section",
      "source": "pageindex_md",
      "summary": "Required Python and Node.js versions.",
      "range": { "start": 20, "end": 25 },
      "children": []
    }
  ]
}
```

| Field | Description |
|-------|-------------|
| name | Section title (from heading or PageIndex) |
| type | "document" (root) or "section" (child) |
| source | "pageindex_md", "pageindex_pdf", "local_md", or "local_txt" |
| summary | LLM-generated 1-line summary (PageIndex only, null for local) |
| range | 1-indexed line range {start, end} for surgical extraction |
| metadata | PageIndex node IDs, text previews |
| children | Recursive array of child sections |
The source field lets consumers distinguish how the tree was built. When PageIndex is available, summaries provide semantic context that local parsing cannot — enabling richer navigation decisions.
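As a small illustration of provider-agnostic consumption, a renderer needs only the shared fields of the schema. The `render_doc_map` helper below is hypothetical:

```python
def render_doc_map(node: dict, depth: int = 0) -> list:
    """Render a unified node tree as an indented outline (illustrative sketch)."""
    r = node.get("range", {})
    line = "  " * depth + f"{node['name']}  (L{r.get('start')}-{r.get('end')})"
    if node.get("summary"):  # local trees carry summary=None; PageIndex trees fill it
        line += f" - {node['summary']}"
    lines = [line]
    for child in node.get("children", []):
        lines.extend(render_doc_map(child, depth + 1))
    return lines
```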
Document Navigation Workflow
The MCP tools mirror the code navigation workflow:
- rlm_doc_map → See section hierarchy (like rlm_map for code)
- rlm_doc_drill → Read specific section (like rlm_drill for code)
- rlm_assess → Check if gathered context answers the query

rlm_doc_drill uses the line ranges from the unified tree to extract only the requested section's content — the same surgical read pattern used for code symbols. A recursive _find_section() helper does case-insensitive title matching through the tree.
DSPy-Inspired Multi-Agent Navigation
RLM Navigator's multi-agent system draws directly from DSPy (Stanford NLP) — a framework for compiling declarative language model calls into self-improving pipelines. While the production implementation uses raw prompt templates rather than the DSPy library itself, the architecture faithfully follows DSPy's Signature/Module design patterns.
From DSPy Research to Production
The project's research/ directory contains the original DSPy prototypes — three dspy.Signature classes and a dspy.Module that used dspy.ChainOfThought() for each agent:
```python
# Original DSPy prototype (research/Navigator Prompts.py)
class ExplorerSignature(dspy.Signature):
    """Policy network — proposes which code symbols to investigate."""
    tree_skeleton = dspy.InputField(desc="AST skeleton of the codebase")
    session_state = dspy.InputField(desc="Current MCTS session state")
    selected_nodes = dspy.OutputField(desc="Ranked list of symbols to explore")

class MultiAgentNavigator(dspy.Module):
    def __init__(self):
        self.explorer = dspy.ChainOfThought(ExplorerSignature)
        self.validator = dspy.ChainOfThought(ValidatorSignature)
        self.orchestrator = dspy.ChainOfThought(OrchestratorSignature)
```

The production implementation replaces dspy.Signature with prompt templates and dspy.ChainOfThought with structured JSON output parsing, but preserves the same three-agent architecture and input/output contracts.
The Triad Architecture
The system uses an AlphaGo-inspired pattern: Policy Network (Explorer) + Value Network (Validator) + Search Control (Orchestrator), coordinated by MCTS session state.
┌─────────────────────────────────────────────────────────────────┐
│ MCTS Navigation Loop │
│ │
│ 1. EXPLORE 2. INVESTIGATE 3. VALIDATE │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Explorer │─────>│ Squeezer │───────>│ Validator│ │
│ │ (Policy) │ │ (Drill) │ │ (Value) │ │
│ └────┬─────┘ └──────────┘ └────┬─────┘ │
│ │ proposes │ critiques │
│ │ nodes │ relevance │
│ │ │ │
│ ┌────▼──────────────────────────────────────▼─────┐ │
│ │ Orchestrator (Control) │ │
│ │ • Reads session state (visited, blacklist) │ │
│ │ • Decides: drill / answer / backtrack │ │
│ │ • Updates blacklist on irrelevant branches │ │
│ │ • Forces answer at max depth │ │
│ └────┬────────────────────────────────────────────┘ │
│ │ │
│ ┌────▼────────────────────┐ │
│ │ MCTSSession State │ │
│ │ • visited: [nodes...] │ │
│ │ • blacklist: {nodes} │ │
│ │ • scores: {node: 0.9} │ │
│ │ • context_accumulated │ │
│ │ • depth / max_depth │ │
│ └─────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

Agent Details
Explorer (daemon/agents/explorer.py) — the Policy Network:
- Receives: AST skeleton + session state (visited nodes, blacklist, current depth)
- Produces: 1-3 ranked node proposals with relevance scores (0.0-1.0) and reasons
- Filters: never proposes blacklisted or already-visited nodes
- Actions: drill (investigate symbol), map (get skeleton), answer (sufficient context), pivot (change strategy)

```json
{
  "selected_nodes": [
    {"path": "auth.py", "symbol": "AuthManager", "score": 0.95, "reason": "Handles authentication"}
  ],
  "action": "drill"
}
```

Validator (daemon/agents/validator.py) — the Value Network:
- Receives: user query + symbol path + drilled code snippet
- Produces: relevance verdict (is_valid), confidence score, critique, and dependency list
- The dependencies field enables cascading exploration — if validating AuthManager reveals it depends on TokenStore, the Orchestrator can queue that for investigation

```json
{
  "is_valid": true,
  "confidence": 0.9,
  "critique": "Directly implements the authentication flow.",
  "dependencies": ["token.py::TokenStore"]
}
```

Orchestrator (daemon/agents/orchestrator.py) — the Search Controller:
- Receives: user query + full session state + last validation result
- Decision logic:
  - is_valid=true → accumulate context, check if sufficient to answer
  - is_valid=false → blacklist the branch, propose alternative via Explorer
  - depth >= max_depth → force answer with accumulated context
  - All branches exhausted → answer with best available context
- Produces: next action, target node, reasoning, and optional blacklist entry

```json
{
  "next_action": "drill",
  "target_node": "token.py::TokenStore",
  "reasoning": "Need to understand token storage to complete auth picture.",
  "should_blacklist": null
}
```

MCTS Session State
daemon/mcts.py manages navigation sessions with thread-safe state:
- MCTSSession: Per-query state container with UUID. Tracks visited (ordered exploration history), blacklist (rejected branches), scores (relevance per node), and context_accumulated (gathered code snippets). The at_max_depth property triggers forced answer generation.
- MCTSSessionManager: Thread-safe registry of concurrent sessions. Creates, retrieves, and cleans up sessions with a threading.Lock() for safe concurrent access.
The session state is serialized to JSON and injected into every agent prompt, giving each agent full visibility into the search history. This prevents circular exploration — the Explorer won't propose nodes that are already visited or blacklisted.
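The session-state shape described above can be sketched as follows. This is an illustrative reconstruction of mcts.py's described fields, not the actual code; method names like `candidate_ok` are hypothetical:

```python
import threading
import uuid
from dataclasses import dataclass, field

@dataclass
class MCTSSession:
    """Per-query navigation state (sketch of the fields described above)."""
    query: str
    max_depth: int = 5
    session_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    visited: list = field(default_factory=list)       # ordered exploration history
    blacklist: set = field(default_factory=set)       # rejected branches
    scores: dict = field(default_factory=dict)        # node -> relevance score
    context_accumulated: list = field(default_factory=list)
    depth: int = 0

    @property
    def at_max_depth(self) -> bool:
        return self.depth >= self.max_depth

    def candidate_ok(self, node: str) -> bool:
        # Explorer filter: never re-propose visited or blacklisted nodes
        return node not in self.visited and node not in self.blacklist

class MCTSSessionManager:
    """Thread-safe registry of concurrent sessions (sketch)."""
    def __init__(self):
        self._lock = threading.Lock()
        self._sessions: dict = {}

    def create(self, query: str) -> MCTSSession:
        session = MCTSSession(query=query)
        with self._lock:
            self._sessions[session.session_id] = session
        return session

    def get(self, session_id: str):
        with self._lock:
            return self._sessions.get(session_id)
```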
Node Enrichment (Haiku API)
daemon/node_enricher.py adds semantic annotations to AST skeletons, improving Explorer's proposal quality:
- parse_skeleton_symbols() extracts symbol definitions from skeleton text via regex
- build_enrichment_prompt() batches symbols and asks Haiku for 1-line summaries
- EnrichmentCache stores results keyed by (file_path, mtime) — invalidates when files change
- merge_enrichments() annotates skeleton lines with summaries: `def validate_token(self, token: str) -> bool:  # L25-30  # Checks if token starts with 'sk-' prefix.`
- EnrichmentWorker processes the queue in a background daemon thread, enriching files asynchronously without blocking navigation
The enrichment pipeline is entirely optional (requires ANTHROPIC_API_KEY). When available, it transforms raw signatures into semantically meaningful descriptions that help the Explorer make better-informed navigation proposals.
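The (file_path, mtime) cache keying can be sketched as follows; a minimal illustration of the invalidation rule, not the actual EnrichmentCache implementation:

```python
import os

class EnrichmentCache:
    """mtime-keyed cache of per-file enrichment results (illustrative sketch)."""

    def __init__(self):
        self._entries: dict = {}  # file_path -> (mtime at enrichment time, annotations)

    def get(self, file_path: str):
        entry = self._entries.get(file_path)
        if entry is None:
            return None
        cached_mtime, annotations = entry
        # Invalidate when the file changed on disk since enrichment ran
        if not os.path.exists(file_path) or os.path.getmtime(file_path) != cached_mtime:
            del self._entries[file_path]
            return None
        return annotations

    def put(self, file_path: str, annotations: dict) -> None:
        self._entries[file_path] = (os.path.getmtime(file_path), annotations)
```

A cache hit is what makes the second `rlm_map` call on an unchanged file return instantly (see Test 13d below).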
Why Not DSPy Directly?
The research phase prototyped with DSPy's dspy.ChainOfThought() modules. The production implementation moved to raw prompt templates for three reasons:
- Dependency minimization: DSPy pulls in a significant dependency tree. The prompt-based approach requires only the anthropic SDK (already needed for enrichment).
- Transparency: Raw prompts make the agent behavior fully inspectable and debuggable. Each agent's exact prompt template lives in a single file.
- Architecture preservation: The core insight from DSPy — structured Signatures with typed input/output fields coordinated by a Module — translates directly to prompt templates with JSON schemas. The Triad architecture, session state management, and backtracking logic are all preserved.
All three agents use identical JSON parsing with graceful fallback:
```python
import json
from typing import Optional

def parse_output(raw: str) -> Optional[dict]:
    text = raw.strip()
    try:
        if text.startswith("```"):
            # Strip a markdown fence: drop the opening ```-line and the closing ```
            text = text.split("\n", 1)[1].rsplit("```", 1)[0]
        return json.loads(text)
    except (IndexError, json.JSONDecodeError):
        return None
```

This handles both raw JSON and markdown-wrapped responses, returning None on parse failure rather than raising.
Manual Testing & Demonstration Guide
This section provides a comprehensive walkthrough for manually verifying RLM Navigator's functionality and demonstrating its token-saving capabilities. Each test builds on the previous one, following the core navigation workflow.
Prerequisites
- Install RLM Navigator in a target project: `cd /path/to/your/project`, then `npx rlm-navigator@latest install`
- Open the project in Claude Code. The daemon auto-starts when Claude connects.
- Verify the setup by asking Claude: "Check if the RLM daemon is running."
Expected: Claude calls get_status and reports the daemon is ALIVE with the correct project root, cached file count, and supported languages.
Tip: For the most compelling demo, use a medium-to-large codebase (100+ files) where token savings are dramatic. The FastAPI repo is a good benchmark target.
Test 1: Directory Exploration (rlm_tree)
Purpose: Verify Claude uses rlm_tree instead of ls/find/Glob for directory exploration.
Prompt: "Show me the project structure."
What to verify:
- Claude calls rlm_tree (not ls, find, or Glob)
- Output shows directories with item counts, files with sizes and detected languages
- Hidden directories (.git, node_modules, __pycache__) are excluded
- Response stays within the truncation budget
Follow-up: "What's inside the src/ directory?"
What to verify:
- Claude scopes the tree to a subdirectory (rlm_tree path="src/") rather than re-fetching the entire project
- Deeper nesting is visible within the focused subtree
Test 2: File Signatures (rlm_map)
Purpose: Verify Claude reads structural skeletons instead of full files.
Prompt: "What functions and classes are in <pick a Python/JS/TS file>?"
What to verify:
- Claude calls rlm_map (not cat, Read, or head)
- Output shows class/function/method signatures with line ranges
- Docstrings are preserved, but implementation bodies show ... (elided)
- No raw source code appears — only the structural skeleton
Key observation: Compare the skeleton length to the actual file size shown in rlm_tree. A 500-line file might produce a 30-line skeleton — that's the token saving in action.
Test 3: Surgical Drill (rlm_drill)
Purpose: Verify Claude can retrieve a single symbol's implementation without reading the entire file.
Prompt: "Show me the implementation of <function_name> in <file>."
What to verify:
- Claude calls rlm_map first (to confirm the symbol exists and get its location)
- Then calls rlm_drill with the exact symbol name
- Output shows only the targeted function/method with line numbers (e.g., L45-82)
- No surrounding functions or unrelated code appears
Edge case: Ask for a symbol that doesn't exist:
"Show me the nonexistent_function in <file>."
Expected: Claude reports an error cleanly rather than reading the full file to search.
Test 4: Cross-File Search (rlm_search)
Purpose: Verify symbol discovery across the entire codebase.
Prompt: "Find all files that reference <common_symbol> in this project."
What to verify:
- Claude calls rlm_search (not grep or Grep)
- Results show file paths with matching skeleton lines
- Multiple files are returned if the symbol appears across the codebase
- No full file contents are loaded — only skeleton excerpts
Follow-up: "Drill into the most relevant one."
What to verify: Claude picks one result and calls rlm_drill on the specific symbol, completing the search → map → drill workflow.
Test 5: Document Navigation (rlm_doc_map + rlm_doc_drill)
Purpose: Verify structured navigation of documentation files.
Prompt: "What sections are in the README?"
What to verify:
- Claude calls rlm_doc_map on the markdown file
- Output is a hierarchical section tree with titles and line ranges
- Nested headings (##, ###) appear as children of their parent sections
Follow-up: "Show me the Installation section."
What to verify:
- Claude calls rlm_doc_drill with the section title
- Only that section's content is returned, not the entire document
Test 6: Context Sufficiency (rlm_assess)
Purpose: Verify the assessment tool guides navigation decisions.
Prompt: "How does authentication work in this project?" (or any broad architectural question)
What to verify:
- After gathering some context (tree, map, drill), Claude calls rlm_assess to check whether it has enough information
- The assessment either confirms sufficiency or suggests specific areas to explore further
- Claude follows the assessment's guidance rather than speculatively reading more files
Test 7: REPL-Assisted Analysis (rlm_repl_*)
Purpose: Verify the stateful REPL for targeted analysis workflows.
Prompt: "Use the REPL to find all TODO comments across the codebase and summarize them."
What to verify:
- Claude calls rlm_repl_init to start a fresh session
- Uses rlm_repl_exec with grep("TODO") to search
- Uses peek() to read context around specific matches
- Uses add_buffer("todos", ...) to accumulate findings
- Calls rlm_repl_export to retrieve the collected results
- The entire analysis uses minimal tokens compared to reading every file
Staleness test (requires two terminals or a manual file edit):
- Ask Claude to grep for something and store the result
- Manually edit the file that was found
- Ask Claude to check REPL status
What to verify: rlm_repl_status shows a staleness warning for the modified file, flagging that the stored variable is based on outdated data.
Test 8: File Chunking (rlm_chunks + rlm_chunk)
Purpose: Verify large file handling via chunked reading.
Prompt: "How many chunks does <large_file> have? Show me the first chunk."
What to verify:
- Claude calls rlm_chunks to get metadata (total chunks, line count, chunk size, overlap)
- Calls rlm_chunk with index 0 to read the first chunk
- Chunk content includes a header with line range (e.g., "lines 1-200")
- Subsequent chunks can be read independently without re-reading earlier ones
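The chunk arithmetic can be illustrated with a small helper. The chunk size and overlap values are assumptions for illustration; the real defaults may differ:

```python
def chunk_indices(total_lines: int, chunk_size: int = 200, overlap: int = 20) -> list:
    """Split a file into overlapping (start, end) line ranges, 1-indexed inclusive (sketch)."""
    if total_lines <= 0:
        return []
    step = chunk_size - overlap  # each chunk starts `overlap` lines before the previous ended
    ranges = []
    start = 1
    while True:
        end = min(start + chunk_size - 1, total_lines)
        ranges.append((start, end))
        if end == total_lines:
            break
        start += step
    return ranges
```

For the 397-line daemon file mentioned in the benchmarks, these parameters would yield three overlapping chunks.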
Test 9: Full Navigation Workflow (End-to-End)
Purpose: Verify the complete tree → map → drill → edit workflow in a realistic task.
Prompt: "Find where HTTP request validation happens and add input length checking to the main handler."
What to observe (step by step):
- Tree: Claude explores the project structure to identify relevant directories
- Search/Map: Claude searches for validation-related symbols, then maps candidate files
- Drill: Claude drills into the specific handler function
- Edit: Claude makes a surgical edit using only the lines it drilled into
What to verify:
- At no point does Claude read an entire file with cat or Read
- Each navigation step loads only what's needed for the next decision
- The edit targets specific lines rather than rewriting the whole file
Test 10: Token Savings Verification
Purpose: Quantify the actual token reduction achieved during a session.
Prompt (after completing several of the tests above): "Show me the session statistics."
What to verify:
- Claude calls get_status
- Session stats show:
  - Tokens served: Total tokens delivered across all tool calls
  - Tokens avoided: Tokens that would have been consumed by full-file reads
  - Reduction percentage: The overall savings (typically 60-90% on real codebases)
  - Tool call breakdown: Per-tool usage counts and token contributions
Benchmark comparison: For a quantitative demo, run the built-in benchmark against your project:
```shell
python benchmark.py --root . --query "<common_symbol>"
python benchmark.py --root . --query "<common_symbol>" --mode repl
python benchmark.py --root . --file "<large_file>" --mode chunks
```

Test 11: File Watcher Integration
Purpose: Verify that code changes are reflected without restarting the daemon.
Steps:
- Ask Claude to map a file: "Show me the skeleton of <file>."
- In a separate editor, add a new function to that file and save
- Ask Claude to map the same file again
What to verify:
- The second rlm_map call shows the newly added function
- No daemon restart was needed — the file watcher detected the change and invalidated the cache automatically
Test 12: Multi-Language Support
Purpose: Verify AST parsing works across supported languages.
Prompt: "Map one Python file, one JavaScript file, and one TypeScript file."
What to verify:
- Each file produces a proper structural skeleton with language-appropriate constructs:
  - Python: class, def, async def, decorators
  - JavaScript: class, function, const/let arrow functions, export
  - TypeScript: interface, type, class, function, generics
- Unsupported file types (e.g., .toml, .yaml) get a graceful fallback showing the first 20 lines and total line count
Test 13: Haiku Enrichment Verbosity
Purpose: Verify that Haiku enrichment activity is visible — status flags, progress notifications, and annotated skeletons all surface correctly.
Prerequisites: Set ANTHROPIC_API_KEY in your environment or .env file, and ensure the anthropic SDK is installed (pip install anthropic).
13a: Status Reports Enrichment Availability
Prompt: "Check if RLM Navigator is running."
What to verify:
- Claude calls get_status
- The response includes enrichment_available: true (visible in the daemon's raw JSON response)
- If the API key or SDK is missing, it reports enrichment_available: false — enrichment degrades gracefully without errors
13b: Progress Notifications During Navigation
Prompt: "Find and explain the authentication flow in this project." (or any broad question that triggers multi-file exploration)
What to verify:
- As the sub-agent system dispatches work, rlm_progress emits [RLM]-prefixed messages:
  - [RLM] Chunking <file>... — file is being split for analysis
  - [RLM] Dispatching chunk 1/3 of <file> to rlm-enricher... — Haiku enrichment dispatched
  - [RLM] chunk 1/3 complete — N relevant symbols found — enrichment finished
- After the workflow completes, get_status shows updated Sub-agent Activity:
  - Dispatches: N (M chunk analysis, K enrichment)
  - Chunks analyzed: X | Answers found: Y
  - Last: [RLM] <most recent progress message>
13c: Enriched Skeleton Annotations
Prompt: "Map <file_with_many_functions>." (pick a file with 5+ functions/methods)
What to verify:
- Claude calls rlm_map and the skeleton includes Haiku-generated annotations after line ranges:
  ```
  def validate_token(self, token: str) -> bool:  # L25-30  # Checks if token starts with 'sk-' prefix.
  class AuthManager:  # L1-80  # Manages JWT-based authentication lifecycle.
  ```
- Each annotation is a concise 1-line semantic summary describing what the symbol does (not just its type)
- Without ANTHROPIC_API_KEY, the same rlm_map call returns a clean skeleton with no annotations — no errors, no placeholders
13d: Enrichment Cache Behavior
Steps:
- Map a file: "Show me the skeleton of <file>." — the first call triggers Haiku enrichment (may take 1-2 seconds)
- Map the same file again immediately
What to verify:
- The second call returns instantly with identical annotations (served from EnrichmentCache)
- Edit the file in a separate editor, then map again — annotations refresh because the cache invalidates on mtime change
13e: Background Enrichment Worker
Prompt: "Show me the project structure, then map the 3 largest Python files."
What to verify:
- The EnrichmentWorker processes files in the background — enrichment does not block the rlm_map response
- On the first call to a file, the skeleton may return without annotations (enrichment still queued)
- A subsequent call to the same file shows annotations once the worker has processed it
- get_status reflects enrichment worker activity in the session stats
Troubleshooting
| Symptom | Check |
|---------|-------|
| get_status shows OFFLINE | Run npx rlm-navigator status to verify daemon. Restart with python daemon/rlm_daemon.py --root . |
| rlm_map returns fallback (first 20 lines) for a supported language | Verify tree-sitter is installed: pip install tree-sitter tree-sitter-python tree-sitter-javascript tree-sitter-typescript |
| Claude uses Read/cat instead of RLM tools | Check that the skill is loaded: verify .claude/skills/rlm-navigator/SKILL.md exists in your project |
| Stale data after file edits | Verify the file watcher is active: get_status should show the daemon watching the correct root |
| Port conflict on startup | Set a custom port: RLM_DAEMON_PORT=9200 python daemon/rlm_daemon.py --root . |
| Haiku enrichment not appearing | Verify ANTHROPIC_API_KEY is set and the anthropic SDK is installed (pip install anthropic). Check get_status for enrichment_available: true |
| Enrichment annotations stale after edit | The EnrichmentCache invalidates on mtime change — re-map the file to trigger a fresh Haiku call |
Demo Script (5-Minute Walkthrough)
For a quick live demonstration, run these prompts in sequence:
- "Check if RLM Navigator is running." — verifies setup
- "Show me the project structure." — demonstrates rlm_tree
- "What's in <main_file>?" — demonstrates rlm_map (skeleton vs full file)
- "Show me the implementation of <key_function>." — demonstrates rlm_drill
- "Find all files that use <key_symbol>." — demonstrates rlm_search
- "Show me the session stats." — reveals token savings
Talking points at each step:
- Step 2: "Notice it shows structure and sizes without reading any file contents."
- Step 3: "This 400-line file was summarized in 25 lines — signatures and docstrings only."
- Step 4: "We loaded exactly 15 lines — just the function we needed."
- Step 5: "Found the symbol in 8 files without reading any of them."
- Step 6: "We served X tokens while avoiding Y tokens — that's a Z% reduction."
Inspired By
- brainqub3/claude_code_RLM — RLM for document navigation
- Tree-sitter — universal AST parsing
