# local-rag — Distributed Memory + Code RAG for Claude Code
Semantic memory and code intelligence as an MCP plugin for Claude Code agents. 9 tools that give Claude persistent memory, semantic code search, and import graph traversal — all running locally.
## What it does

| Tool | Description |
|------|-------------|
| `recall(query)` | Semantic search across stored memories |
| `remember(content)` | Store memory with type / scope / tags / importance |
| `search_code(query)` | Hybrid RAG over indexed codebase |
| `get_file_context(file_path)` | Read file + list indexed symbols |
| `get_dependencies(file_path)` | Import graph traversal (forward / reverse / transitive) |
| `project_overview()` | 3-level directory tree, entry points, top imports |
| `forget(memory_id)` | Delete a memory permanently |
| `consolidate()` | Merge semantically similar memories |
| `stats()` | Memory and index statistics |
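For instance, a `remember` call might carry arguments along these lines (an illustrative shape only: the parameter names follow the signatures above, the type values come from the Memory Types table below, and the content itself is hypothetical):

```
{
  "content": "Rate limiting is enforced in the API gateway, not per-service.",
  "type": "semantic",
  "tags": ["architecture", "rate-limiting"]
}
```

A later `recall("rate limiting")` would then surface this memory by semantic similarity rather than exact keywords.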
## Stack

- Qdrant — vector database (Rust, production-ready)
- Ollama — local embeddings (`embeddinggemma:300m`)
- tree-sitter — multi-language code parser (TypeScript, JavaScript, Go, Rust)
- MCP — Model Context Protocol (stdio transport)
## Prerequisites

### 1. Ollama (local embeddings)

Install: https://ollama.com/download
```
# Linux
curl -fsSL https://ollama.com/install.sh | sh

# macOS — download the app from:
# https://ollama.com/download/mac

# Windows — download the installer from:
# https://ollama.com/download/windows
```

Pull the embedding model:

```
ollama pull embeddinggemma:300m
```
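To sanity-check that embeddings are being served (assuming Ollama's default port 11434; the exact response shape may vary between Ollama versions):

```
# Ask Ollama for an embedding via its REST API
curl http://localhost:11434/api/embeddings \
  -d '{"model": "embeddinggemma:300m", "prompt": "hello"}'
```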
### 2. Qdrant (vector database)

Option A — Docker Compose (recommended)

A ready-to-use docker-compose.yml is included in this repo:

```
docker compose up -d
```

Exposes ports 6333 (REST) and 6334 (gRPC). Data persists in a named volume qdrant-data.
Option B — Docker run
docker run -d --name qdrant \
-p 6333:6333 -p 6334:6334 \
-v qdrant-data:/qdrant/storage \
qdrant/qdrantOption C — Qdrant Cloud
https://cloud.qdrant.io/ — set qdrant-url in .memory.json to your cluster endpoint.
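Whichever option you pick, a quick reachability check for a local instance (Qdrant's REST root responds with a small JSON payload including the version):

```
# Should return JSON with the running Qdrant version
curl http://localhost:6333
```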
### 3. Node.js 18+

## Installation

From npm (recommended):

```
npm install -g @13w/local-rag
```

From source:
```
git clone https://github.com/13W/local-rag.git
cd local-rag
npm install && npm run build
```

## Claude Code Plugin Setup
### Install local-rag

Option A — `claude mcp add` with npx (no global install needed)

Per-project (stored in `.mcp.json`, shared with the team):

```
claude mcp add memory -- npx -y @13w/local-rag serve --config .memory.json
```

Global — available in all projects on this machine:

```
claude mcp add memory -s user -- npx -y @13w/local-rag serve --config .memory.json
```

Option B — `.mcp.json` directly

```
{
"mcpServers": {
"memory": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@13w/local-rag", "serve", "--config", ".memory.json"]
}
}
}
```

Option C — After global `npm install -g`
```
claude mcp add memory -- local-rag serve --config .memory.json
```
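To confirm the server registered, you can list configured MCP servers (output format varies by Claude Code version):

```
# "memory" should appear among the configured servers
claude mcp list
```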
### Install Serena (recommended companion)

Serena provides filesystem access and precise symbolic code editing that complements local-rag: local-rag finds code by meaning, Serena reads and edits it surgically.
Repo: https://github.com/oraios/serena
Requirements: Python 3.10+, uv
```
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Register Serena as a Claude Code plugin (per-project)
claude mcp add serena -- uvx --from serena serena-mcp-server --context ide-assistant --project .
```

Or in `.mcp.json`:

```
{
"mcpServers": {
"serena": {
"type": "stdio",
"command": "uvx",
"args": ["--from", "serena", "serena-mcp-server", "--context", "ide-assistant", "--project", "."]
}
}
}
```

### Combined .mcp.json (both plugins)

```
{
"mcpServers": {
"memory": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@13w/local-rag", "serve", "--config", ".memory.json"]
},
"serena": {
"type": "stdio",
"command": "uvx",
"args": ["--from", "serena", "serena-mcp-server", "--context", "ide-assistant", "--project", "."]
}
}
}
```

## Agent workflow setup
Run `init` once in your project root after registering the MCP plugin.
It installs hooks that enforce the recall → search → remember protocol on every session and prompt, and writes reference guides into .claude/rules/ so Claude always has the tool conventions at hand.
```
npx @13w/local-rag init

# If installed globally
local-rag init
```

Output:
```
wrote .claude/hooks/session-start.sh
wrote .claude/hooks/prompt-reminder.sh
wrote .claude/settings.json
wrote .claude/settings.local.json
wrote .claude/rules/continuous-remember.md
wrote .claude/rules/memory-protocol-reference.md
wrote .claude/rules/serena-conventions.md
```

What each file does:
| File | Purpose |
|------|---------|
| `hooks/session-start.sh` | Injects the full protocol cheatsheet as a system-reminder at every session start and after context compaction |
| `hooks/prompt-reminder.sh` | Fires on every user prompt — reminds Claude to `recall()` before acting and `remember()` after |
| `rules/continuous-remember.md` | When and how to call `remember()` immediately (trigger events, format, anti-patterns) |
| `rules/memory-protocol-reference.md` | Full tool reference with parameter tables and call examples |
| `rules/serena-conventions.md` | Serena vs Memory MCP routing guide and end-to-end editing workflow |
| `settings.json` | Registers the hooks in Claude Code (commit this) |
| `settings.local.json` | Local hook overrides — add to .gitignore |
Commit .claude/hooks/, .claude/rules/, and .claude/settings.json to share the workflow with your team.
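Since settings.local.json holds machine-specific overrides, keep it out of version control:

```
# Ignore local hook overrides (plain git hygiene)
echo '.claude/settings.local.json' >> .gitignore
```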
## Configuration

Create `.memory.json` in your project root (auto-discovered if present):

```
{
"project-id": "my-project",
"project-root": ".",
"qdrant-url": "http://localhost:6333",
"embed-provider": "ollama",
"embed-model": "embeddinggemma:300m",
"ollama-url": "http://localhost:11434"
}
```

### Full config reference
| Key | Default | Description |
|-----|---------|-------------|
| project-id | "default" | Isolates memories and code index per project |
| project-root | config file directory | Root path for code indexing |
| qdrant-url | http://localhost:6333 | Qdrant REST API URL |
| embed-provider | "ollama" | Embedding provider: ollama, openai, voyage |
| embed-model | provider default¹ | Embedding model name |
| embed-dim | 1024 | Embedding vector dimension |
| embed-api-key | "" | API key for OpenAI / Voyage embed providers — falls back to OPENAI_API_KEY / VOYAGE_API_KEY env var |
| embed-url | "" | Custom embedding API endpoint |
| ollama-url | http://localhost:11434 | Ollama API URL |
| agent-id | "default" | Agent identifier (for multi-agent setups) |
| llm-provider | "ollama" | LLM provider: ollama, anthropic, openai |
| llm-model | provider default² | LLM model for reranking / description generation |
| llm-api-key | "" | API key for Anthropic / OpenAI LLM providers — falls back to ANTHROPIC_API_KEY / OPENAI_API_KEY env var |
| llm-url | "" | Custom LLM API endpoint |
| include-paths | [] | Glob patterns to limit indexing scope (monorepos) |
| generate-descriptions | false | Auto-generate LLM descriptions for code chunks (slow) |
| dashboard | true | Enable the live dashboard HTTP server |
| dashboard-port | 0 | Dashboard HTTP port; 0 lets the OS pick a random port |
| collection-prefix | "" | String prepended to all Qdrant collection names (useful on shared Qdrant instances) |
| no-watch | false | Disable automatic file re-indexing when files change (applies during serve) |
¹ `embed-model` defaults: ollama → embeddinggemma:300m, openai → text-embedding-3-small, voyage → voyage-code-3
² `llm-model` defaults: ollama → gemma3n:e2b, anthropic → claude-haiku-4-5-20251001, openai → gpt-4o-mini

Resolution order (highest to lowest priority): CLI flag → `.memory.json` value → environment variable → built-in default.

API key environment variables are provider-specific:

| Provider | `embed-api-key` env var | `llm-api-key` env var |
|----------|-------------------------|-----------------------|
| openai | OPENAI_API_KEY | OPENAI_API_KEY |
| voyage | VOYAGE_API_KEY | — |
| anthropic | — | ANTHROPIC_API_KEY |

All other keys can also be passed as CLI flags (e.g. `--project-id foo`). CLI flags override config file values. `include-paths` is config-file only.
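For illustration, a hosted-provider setup might look like this (every key is from the reference table above; `embed-dim` 1536 matches text-embedding-3-small, and API keys are left to the OPENAI_API_KEY / ANTHROPIC_API_KEY environment variables):

```
{
  "project-id": "my-project",
  "qdrant-url": "http://localhost:6333",
  "embed-provider": "openai",
  "embed-model": "text-embedding-3-small",
  "embed-dim": 1536,
  "llm-provider": "anthropic"
}
```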
## Indexing Your Codebase

Before the `search_code` and `get_file_context` tools can return results, index the project:
```
# Index once
npx @13w/local-rag index . --config .memory.json

# Watch mode — re-indexes on file changes
npx @13w/local-rag watch . --config .memory.json

# If installed globally
local-rag index . --config .memory.json
local-rag watch . --config .memory.json
```

Other indexer commands:
```
local-rag clear --config .memory.json   # remove all indexed chunks
local-rag stats --config .memory.json   # show collection statistics
local-rag file <abs-path> <root>        # index a single file
```
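In a monorepo, you can narrow what gets indexed with `include-paths` before running these commands (glob patterns, config-file only as noted above; the package paths here are hypothetical):

```
{
  "project-id": "my-project",
  "include-paths": ["packages/api/**", "packages/shared/**"]
}
```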
## Live Dashboard

`local-rag serve` automatically opens a browser dashboard on a local HTTP port.
It displays real-time tool call statistics (calls, bytes, latency, errors per tool),
a scrolling request log, a server info bar (project, branch, version, watch status),
and an interactive tool playground for testing calls manually.
The port is OS-assigned by default (printed to stderr as [dashboard] http://localhost:PORT).
To use a fixed port or disable the dashboard:

```
{ "dashboard-port": 4242 }
{ "dashboard": false }
```

## Memory Types
| Type | Use for | Decay |
|------|---------|-------|
| episodic | Events, bugs, incidents | Time-decayed |
| semantic | Facts, architecture, decisions | Long-lived |
| procedural | Patterns, conventions, how-to | Long-lived |
## Agent Protocol

Run `local-rag init` (see Agent workflow setup) to install the full
RECALL → SEARCH_CODE → THINK → ACT → REMEMBER protocol into your project.
The hooks fire automatically — no manual prompting required.
