# local-rag — Distributed Memory + Code RAG for Claude Code
Semantic memory and code intelligence as an MCP plugin for Claude Code agents. 9 tools that give Claude persistent memory, semantic code search, and import graph traversal — all running locally.
## What it does

| Tool | Description |
|------|-------------|
| `recall(query)` | Semantic search across stored memories |
| `remember(content)` | Store memory with type / scope / tags / importance |
| `search_code(query)` | Hybrid RAG over indexed codebase |
| `get_file_context(file_path)` | Read file + list indexed symbols |
| `get_dependencies(file_path)` | Import graph traversal (forward / reverse / transitive) |
| `project_overview()` | 3-level directory tree, entry points, top imports |
| `forget(memory_id)` | Delete a memory permanently |
| `consolidate()` | Merge semantically similar memories |
| `stats()` | Memory and index statistics |
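For instance, a `remember` call might carry arguments along these lines (an illustrative shape only: the parameter names follow the signatures above, the type values come from the Memory Types table below, and the content itself is hypothetical):

```
{
  "content": "Rate limiting is enforced in the API gateway, not per-service.",
  "type": "semantic",
  "tags": ["architecture", "rate-limiting"]
}
```

A later `recall("rate limiting")` would then surface this memory by semantic similarity rather than exact keywords.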
## Stack

- Qdrant — vector database (Rust, production-ready)
- Ollama — local embeddings (`embeddinggemma:300m`)
- tree-sitter — multi-language code parser (TypeScript, JavaScript, Go, Rust)
- MCP — Model Context Protocol (stdio transport)
## Prerequisites

### 1. Ollama (local embeddings)

Install: https://ollama.com/download
```
# Linux
curl -fsSL https://ollama.com/install.sh | sh

# macOS — download the app from:
# https://ollama.com/download/mac

# Windows — download the installer from:
# https://ollama.com/download/windows
```

Pull the embedding model:

```
ollama pull embeddinggemma:300m
```
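To sanity-check that embeddings are being served (assuming Ollama's default port 11434; the exact response shape may vary between Ollama versions):

```
# Ask Ollama for an embedding via its REST API
curl http://localhost:11434/api/embeddings \
  -d '{"model": "embeddinggemma:300m", "prompt": "hello"}'
```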
### 2. Qdrant (vector database)

Option A — Docker Compose (recommended)

A ready-to-use docker-compose.yml is included in this repo:

```
docker compose up -d
```

Exposes ports 6333 (REST) and 6334 (gRPC). Data persists in a named volume qdrant-data.
Option B — Docker run
docker run -d --name qdrant \
-p 6333:6333 -p 6334:6334 \
-v qdrant-data:/qdrant/storage \
qdrant/qdrantOption C — Qdrant Cloud
https://cloud.qdrant.io/ — set qdrant-url in .memory.json to your cluster endpoint.
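Whichever option you pick, a quick reachability check for a local instance (Qdrant's REST root responds with a small JSON payload including the version):

```
# Should return JSON with the running Qdrant version
curl http://localhost:6333
```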
### 3. Node.js 18+

## Installation

From npm (recommended):

```
npm install -g @13w/local-rag
```

From source:
```
git clone https://github.com/13W/local-rag.git
cd local-rag
npm install && npm run build
```

## Claude Code Plugin Setup
### Install local-rag

Option A — `claude mcp add` with npx (no global install needed)

Per-project (stored in `.mcp.json`, shared with the team):

```
claude mcp add memory -- npx -y @13w/local-rag serve --config .memory.json
```

Global — available in all projects on this machine:

```
claude mcp add memory -s user -- npx -y @13w/local-rag serve --config .memory.json
```

Option B — `.mcp.json` directly

```
{
"mcpServers": {
"memory": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@13w/local-rag", "serve", "--config", ".memory.json"]
}
}
}
```

Option C — After global `npm install -g`
```
claude mcp add memory -- local-rag serve --config .memory.json
```
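To confirm the server registered, you can list configured MCP servers (output format varies by Claude Code version):

```
# "memory" should appear among the configured servers
claude mcp list
```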
### Install Serena (recommended companion)

Serena provides filesystem access and precise symbolic code editing that complements local-rag: local-rag finds code by meaning, Serena reads and edits it surgically.
Repo: https://github.com/oraios/serena
Requirements: Python 3.10+, uv
```
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Register Serena as a Claude Code plugin (per-project)
claude mcp add serena -- uvx --from serena serena-mcp-server --context ide-assistant --project .
```

Or in `.mcp.json`:

```
{
"mcpServers": {
"serena": {
"type": "stdio",
"command": "uvx",
"args": ["--from", "serena", "serena-mcp-server", "--context", "ide-assistant", "--project", "."]
}
}
}
```

### Combined .mcp.json (both plugins)

```
{
"mcpServers": {
"memory": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@13w/local-rag", "serve", "--config", ".memory.json"]
},
"serena": {
"type": "stdio",
"command": "uvx",
"args": ["--from", "serena", "serena-mcp-server", "--context", "ide-assistant", "--project", "."]
}
}
}
```

## Agent workflow setup
Run `init` once in your project root after registering the MCP plugin.
It installs hooks that enforce the recall → search → remember protocol on every session and prompt, and writes reference guides into .claude/rules/ so Claude always has the tool conventions at hand.
```
npx @13w/local-rag init

# If installed globally
local-rag init
```

Output:
```
wrote .claude/hooks/session-start.sh
wrote .claude/hooks/prompt-reminder.sh
wrote .claude/settings.json
wrote .claude/settings.local.json
wrote .claude/rules/continuous-remember.md
wrote .claude/rules/memory-protocol-reference.md
wrote .claude/rules/serena-conventions.md
```

What each file does:
| File | Purpose |
|------|---------|
| `hooks/session-start.sh` | Injects the full protocol cheatsheet as a system-reminder at every session start and after context compaction |
| `hooks/prompt-reminder.sh` | Fires on every user prompt — reminds Claude to `recall()` before acting and `remember()` after |
| `rules/continuous-remember.md` | When and how to call `remember()` immediately (trigger events, format, anti-patterns) |
| `rules/memory-protocol-reference.md` | Full tool reference with parameter tables and call examples |
| `rules/serena-conventions.md` | Serena vs Memory MCP routing guide and end-to-end editing workflow |
| `settings.json` | Registers the hooks in Claude Code (commit this) |
| `settings.local.json` | Local hook overrides — add to .gitignore |
Commit .claude/hooks/, .claude/rules/, and .claude/settings.json to share the workflow with your team.
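Since settings.local.json holds machine-specific overrides, keep it out of version control:

```
# Ignore local hook overrides (plain git hygiene)
echo '.claude/settings.local.json' >> .gitignore
```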
## Configuration

Create `.memory.json` in your project root (auto-discovered if present):

```
{
"project-id": "my-project",
"project-root": ".",
"qdrant-url": "http://localhost:6333",
"embed-provider": "ollama",
"embed-model": "embeddinggemma:300m",
"ollama-url": "http://localhost:11434"
}
```

### Full config reference
| Key | Default | Description |
|-----|---------|-------------|
| project-id | "default" | Isolates memories and code index per project |
| project-root | config file directory | Root path for code indexing |
| qdrant-url | http://localhost:6333 | Qdrant REST API URL |
| embed-provider | "ollama" | Embedding provider: ollama, openai, voyage |
| embed-model | provider default¹ | Embedding model name |
| embed-dim | 1024 | Embedding vector dimension |
| embed-api-key | "" | API key for OpenAI / Voyage embed providers — falls back to OPENAI_API_KEY / VOYAGE_API_KEY env var |
| embed-url | "" | Custom embedding API endpoint |
| ollama-url | http://localhost:11434 | Ollama API URL |
| agent-id | "default" | Agent identifier (for multi-agent setups) |
| llm-provider | "ollama" | LLM provider: ollama, anthropic, openai |
| llm-model | provider default² | LLM model for reranking / description generation |
| llm-api-key | "" | API key for Anthropic / OpenAI LLM providers — falls back to ANTHROPIC_API_KEY / OPENAI_API_KEY env var |
| llm-url | "" | Custom LLM API endpoint |
| include-paths | [] | Glob patterns to limit indexing scope (monorepos) |
| generate-descriptions | false | Auto-generate LLM descriptions for code chunks (slow) |
| dashboard | true | Enable the live dashboard HTTP server |
| dashboard-port | 0 | Dashboard HTTP port; 0 lets the OS pick a random port |
| collection-prefix | "" | String prepended to all Qdrant collection names (useful on shared Qdrant instances) |
| no-watch | false | Disable automatic file re-indexing when files change (applies during serve) |
¹ `embed-model` defaults: ollama → embeddinggemma:300m, openai → text-embedding-3-small, voyage → voyage-code-3
² `llm-model` defaults: ollama → gemma3n:e2b, anthropic → claude-haiku-4-5-20251001, openai → gpt-4o-mini

Resolution order (highest to lowest priority): CLI flag → `.memory.json` value → environment variable → built-in default.

API key environment variables are provider-specific:

| Provider | `embed-api-key` env var | `llm-api-key` env var |
|----------|-------------------------|-----------------------|
| openai | OPENAI_API_KEY | OPENAI_API_KEY |
| voyage | VOYAGE_API_KEY | — |
| anthropic | — | ANTHROPIC_API_KEY |

All other keys can also be passed as CLI flags (e.g. `--project-id foo`). CLI flags override config file values. `include-paths` is config-file only.
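For illustration, a hosted-provider setup might look like this (every key is from the reference table above; `embed-dim` 1536 matches text-embedding-3-small, and API keys are left to the OPENAI_API_KEY / ANTHROPIC_API_KEY environment variables):

```
{
  "project-id": "my-project",
  "qdrant-url": "http://localhost:6333",
  "embed-provider": "openai",
  "embed-model": "text-embedding-3-small",
  "embed-dim": 1536,
  "llm-provider": "anthropic"
}
```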
## Indexing Your Codebase

Before the `search_code` and `get_file_context` tools can return results, index the project:
```
# Index once
npx @13w/local-rag index . --config .memory.json

# Watch mode — re-indexes on file changes
npx @13w/local-rag watch . --config .memory.json

# If installed globally
local-rag index . --config .memory.json
local-rag watch . --config .memory.json
```

Other indexer commands:
```
local-rag clear --config .memory.json   # remove all indexed chunks
local-rag stats --config .memory.json   # show collection statistics
local-rag file <abs-path> <root>        # index a single file
```
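In a monorepo, you can narrow what gets indexed with `include-paths` before running these commands (glob patterns, config-file only as noted above; the package paths here are hypothetical):

```
{
  "project-id": "my-project",
  "include-paths": ["packages/api/**", "packages/shared/**"]
}
```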
## Live Dashboard

`local-rag serve` automatically opens a browser dashboard on a local HTTP port.
It displays real-time tool call statistics (calls, bytes, latency, errors per tool),
a scrolling request log, a server info bar (project, branch, version, watch status),
and an interactive tool playground for testing calls manually.
The port is OS-assigned by default (printed to stderr as [dashboard] http://localhost:PORT).
To use a fixed port or disable the dashboard:

```
{ "dashboard-port": 4242 }
{ "dashboard": false }
```

## Memory Types
| Type | Use for | Decay |
|------|---------|-------|
| episodic | Events, bugs, incidents | Time-decayed |
| semantic | Facts, architecture, decisions | Long-lived |
| procedural | Patterns, conventions, how-to | Long-lived |
## Agent Protocol

Run `local-rag init` (see Agent workflow setup) to install the full
RECALL → SEARCH_CODE → THINK → ACT → REMEMBER protocol into your project.
The hooks fire automatically — no manual prompting required.
