memory-safe
MCP memory server for Claude Code. Knowledge graph in SQLite, local ONNX embeddings, token-budgeted retrieval.
Everything runs on your machine. No API keys, no network calls, no data leaves your system.
Why
Claude Code forgets everything between sessions. memory-safe gives it persistent memory that actually works:
- Semantic search — ask in natural language, get relevant results (94% Recall@1)
- Token budget — never blows up your context window (100% budget adherence)
- Knowledge graph — entities and relationships, not just flat text
- Memory decay — old memories compress automatically, no manual cleanup
- 1,019 tokens overhead — half the cost of @modelcontextprotocol/server-memory (2,045 tokens)
Install
npm install memory-safe
Requires Node.js 20+.
Setup with Claude Code
Add to your Claude Code MCP configuration (~/.claude/claude_desktop_config.json):
{
"mcpServers": {
"memory-safe": {
"command": "npx",
"args": ["-y", "memory-safe"]
}
}
}
Or with a custom database path:
{
"mcpServers": {
"memory-safe": {
"command": "npx",
"args": ["-y", "memory-safe", "--db=/path/to/my/memory.db"]
}
}
}
Default database location: ~/.memory-safe/memory.db
Tools
memory-safe exposes 11 MCP tools. Total overhead: ~1,019 tokens. A minimal client-side usage sketch follows the tables.
Core
| Tool | Description |
|------|-------------|
| remember | Store a memory with optional tags and entity links |
| recall | Semantic search with token budget. Returns most relevant memories within budget |
| forget | Delete or archive a memory by ID |
Knowledge Graph
| Tool | Description |
|------|-------------|
| relate_entities | Create typed relationships between entities (e.g. React --uses--> JSX) |
| query_graph | BFS traversal from an entity (depth 2, max 30 results) |
System
| Tool | Description |
|------|-------------|
| status | Memory count, DB size, entity/relation counts, embedding readiness |
| configure | Get/set token_budget, decay_threshold_days, safety_multiplier |
| decay_now | Trigger memory decay/compression cycle |
| export_memories | Export all data as JSON |
| import_memories | Import from JSON export |
| healthcheck | Version, DB path, model state, diagnostics |
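For scripting or testing outside Claude Code, the tools can be driven through the MCP TypeScript SDK. A minimal sketch; the tool argument names (content, tags, query, max_tokens) are illustrative guesses, not the published schemas:

```ts
// Hypothetical client-side usage via the MCP TypeScript SDK.
// The argument names passed to remember/recall are assumptions for illustration.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const client = new Client({ name: "example-client", version: "0.1.0" });
await client.connect(
  new StdioClientTransport({ command: "npx", args: ["-y", "memory-safe"] })
);

// Store a memory (hypothetical argument names).
await client.callTool({
  name: "remember",
  arguments: { content: "We chose sql.js to avoid native build issues", tags: ["decisions"] },
});

// Semantic recall within a token budget (hypothetical argument names).
const result = await client.callTool({
  name: "recall",
  arguments: { query: "why did we pick a WASM database?", max_tokens: 500 },
});
console.log(result.content);
```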
How It Works
Retrieval
Recall uses a multi-signal scoring formula:
score = 0.60 * similarity # cosine similarity via ONNX embeddings
+ 0.20 * recency # days since last access (0-30 day window)
+ 0.10 * frequency # access count (normalized)
+ 0.10 * connectivity # entity link count in knowledge graph
- decay_penalty # 0.1 per decay level
Results are packed greedily into the token budget with a 0.85 safety multiplier. 15% of the budget is reserved for knowledge graph context.
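A minimal TypeScript sketch of the scoring formula above, for illustration only; the normalization caps for frequency and connectivity are assumptions, not the actual implementation:

```ts
// Illustrative composite recall score; the caps used for normalization are assumptions.
interface ScoredMemory {
  similarity: number;      // cosine similarity in [0, 1]
  daysSinceAccess: number; // days since last access
  accessCount: number;     // how often the memory has been recalled
  entityLinks: number;     // links into the knowledge graph
  decayLevel: number;      // 0-3
}

function recallScore(m: ScoredMemory): number {
  const recency = Math.max(0, 1 - m.daysSinceAccess / 30); // 0-30 day window
  const frequency = Math.min(1, m.accessCount / 10);       // assumed cap of 10 accesses
  const connectivity = Math.min(1, m.entityLinks / 5);     // assumed cap of 5 links
  return (
    0.6 * m.similarity +
    0.2 * recency +
    0.1 * frequency +
    0.1 * connectivity -
    0.1 * m.decayLevel
  );
}
```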
Embeddings
Model: Xenova/all-MiniLM-L6-v2 (384 dimensions), running locally via ONNX through @huggingface/transformers. Lazily loaded on first use — tools like status and configure respond instantly without waiting for model download.
If the model fails to load, recall falls back to keyword similarity.
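A sketch of the lazy-load-with-fallback pattern, assuming the @huggingface/transformers pipeline API; the function shape and error handling are illustrative, not the shipped module:

```ts
// Lazy embedding pipeline with keyword fallback (illustrative shape only).
import { pipeline } from "@huggingface/transformers";

let extractor: any = null; // created on first use

async function embed(text: string): Promise<Float32Array | null> {
  try {
    // Download/initialize the model only when the first embedding is requested.
    extractor ??= await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");
    const output = await extractor(text, { pooling: "mean", normalize: true });
    return output.data as Float32Array; // 384-dimensional vector
  } catch {
    return null; // caller switches to keyword similarity
  }
}
```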
Memory Decay
Automatic compression based on access patterns. No LLM calls — pure extractive summarization:
| Level | Trigger | Action |
|-------|---------|--------|
| 0 → 1 | Low access, low importance | Keep top 30% of sentences by score |
| 1 → 2 | Continued low access | First sentence + entity names only |
| 2 → 3 | Long-term neglect | Archive (excluded from all recall) |
Decay pressure is compared against an importance score (access frequency + entity links + recency). Only memories with significantly more pressure than importance get compressed.
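A rough sketch of that comparison; the weights, caps, and the 1.5 margin below are assumptions chosen to illustrate the idea, not the shipped values:

```ts
// Illustrative pressure-vs-importance check; thresholds and weights are assumptions.
interface MemoryStats {
  daysSinceAccess: number;
  accessCount: number;
  entityLinks: number;
}

const DECAY_THRESHOLD_DAYS = 7; // mirrors the decay_threshold_days default

function shouldCompress(m: MemoryStats): boolean {
  // Pressure builds the longer a memory sits untouched past the threshold.
  const pressure =
    Math.max(0, m.daysSinceAccess - DECAY_THRESHOLD_DAYS) / DECAY_THRESHOLD_DAYS;
  // Importance combines access frequency, graph connectivity, and recency.
  const importance =
    Math.min(1, m.accessCount / 10) +
    Math.min(1, m.entityLinks / 5) +
    Math.max(0, 1 - m.daysSinceAccess / 30);
  // Compress only when pressure clearly outweighs importance.
  return pressure > importance * 1.5;
}
```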
Token Budget
Every recall respects the token budget (default: 2,000 tokens); a sketch of the packing loop follows the list:
- Nominal budget * 0.85 safety multiplier = effective budget
- 85% for memory content, 15% reserved for graph context
- Greedy packing: highest-scored memories first, stop when budget exhausted
- Always returns at least one memory
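A minimal sketch of the packing loop under these rules, assuming pre-computed token counts and illustrative field names:

```ts
// Greedy packing under the effective budget; field names are illustrative.
interface Candidate {
  text: string;
  score: number;  // composite recall score
  tokens: number; // cl100k_base token count of text
}

function packMemories(candidates: Candidate[], tokenBudget = 2000): Candidate[] {
  const effective = tokenBudget * 0.85;  // safety multiplier
  const memoryBudget = effective * 0.85; // remaining 15% reserved for graph context
  const picked: Candidate[] = [];
  let used = 0;
  for (const c of [...candidates].sort((a, b) => b.score - a.score)) {
    if (used + c.tokens > memoryBudget) {
      if (picked.length === 0) picked.push(c); // always return at least one memory
      break; // stop when the budget is exhausted
    }
    picked.push(c);
    used += c.tokens;
  }
  return picked;
}
```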
Benchmarks
Solo Performance
50 diverse test memories, 10 queries, token budget 2,000:
| Metric | Result |
|--------|--------|
| Top-1 relevance | 100% (10/10) |
| Budget utilization | 85% |
| Budget respected | 100% |
| Avg latency | 26ms |
| Avg tokens/memory | 34 |
Comparison vs @modelcontextprotocol/server-memory
50 memories, 20 queries x 5 runs. Two query types: Semantic (natural language) and Keyword (exact substring):
| Metric | memory-safe | server-memory |
|--------|-------------|---------------|
| Semantic Recall@1 | 94% | 0% |
| Semantic Recall@3 | 100% | 0% |
| Keyword Recall@1 | 75% | 95% |
| Keyword Recall@3 | 80% | 100% |
| Tool overhead | 1,019 tok | 2,045 tok |
| Cold start | 208ms | 1,438ms |
| Token budget control | 100% | N/A |
| Latency p50 | 8ms | 5ms |
Key takeaway: memory-safe finds the right memory 94% of the time from natural language queries. server-memory requires exact keyword matches — it returns empty results for natural language queries because it uses substring search, not semantic search.
server-memory is faster on keyword lookups because it does simple string matching. memory-safe's ONNX embedding model adds ~3ms overhead per query but enables understanding what you mean, not just what you typed.
Run Benchmarks
# Build first
npm run build
# Solo benchmark
npm run bench
# Multi-server comparison (all formats)
npm run bench:compare
npm run bench:compare -- --format=markdown
npm run bench:compare -- --format=json
# Single server, fewer runs
npm run bench:compare -- --servers=memory-safe --runs=1
Architecture
src/
index.ts CLI entry point (--db=, --telemetry)
server.ts 11 MCP tool registrations
db/
connection.ts sql.js WASM singleton, debounced persistence
schema.ts 5 tables, migrations, timestamp helpers
memory/
store.ts Store + auto-extract entities from content
retrieve.ts Token-budgeted semantic recall (6-step pipeline)
decay.ts Extractive summarization decay engine
embeddings/
onnx.ts Lazy ONNX pipeline (all-MiniLM-L6-v2, 384 dims)
graph/
entities.ts Entity CRUD (upsert, search, link)
relations.ts Relation CRUD (typed, weighted, bidirectional)
query.ts BFS traversal (depth 2, max 30 nodes)
tokenizer/
counter.ts cl100k_base token counting + truncation
utils/
config.ts Key-value config store
serialization.ts Float32Array ↔ Buffer, cosine similarity
telemetry.ts Opt-in local telemetry (no network)
bench/
corpus.ts 50 test memories + 3 query sets
suite.ts Solo benchmark (precision, tokens, latency)
comparison.ts Multi-server MCP comparison runner
results.ts ASCII / Markdown / JSON formatters
adapters/
base.ts StdioClientTransport lifecycle
memory-safe.ts Adapter for memory-safe
mcp-server-memory.ts Adapter for @modelcontextprotocol/server-memory
Database
sql.js (WASM) — no native dependencies, works everywhere. Trade-off: ~2-3x slower than better-sqlite3, but zero compilation issues.
Tables: memories, entities, relations, memory_entities (junction), config.
Persistence: in-memory WASM DB exported to disk via debounced writes (1s after last mutation).
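A minimal sketch of that debounced write path, assuming sql.js's bundled type definitions; file handling and names are illustrative:

```ts
// Debounced persistence for an in-memory sql.js database (illustrative, not the real module).
import initSqlJs, { type Database } from "sql.js";
import { existsSync, readFileSync, writeFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

const DB_PATH = join(homedir(), ".memory-safe", "memory.db");
let db: Database;
let flushTimer: NodeJS.Timeout | undefined;

async function openDb(): Promise<Database> {
  const SQL = await initSqlJs();
  const existing = existsSync(DB_PATH) ? readFileSync(DB_PATH) : undefined;
  db = new SQL.Database(existing); // load the previous snapshot if one exists
  return db;
}

// Call after every mutation; the actual write lands 1s after the last call.
function scheduleFlush(): void {
  if (flushTimer) clearTimeout(flushTimer);
  flushTimer = setTimeout(() => {
    writeFileSync(DB_PATH, Buffer.from(db.export()));
  }, 1000);
}
```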
Known sql.js Limitations
- No datetime('now') in DEFAULT — timestamps generated in code
- No json_patch() — manual JSON merge in code
- No FTS5 — LIKE queries for keyword search
Configuration
| Key | Default | Description |
|-----|---------|-------------|
| token_budget | 2000 | Max tokens per recall response |
| decay_threshold_days | 7 | Days of inactivity before decay pressure builds |
| safety_multiplier | 0.85 | Budget undershoot factor (0.85 = target 85% of nominal) |
Set via the configure tool or directly in the config table.
CLI Options
memory-safe [options]
--db=/path/to/db.db Custom database path (default: ~/.memory-safe/memory.db)
--telemetry Enable local telemetry logging (~/.memory-safe/telemetry.log)
Development
# Install dependencies
npm install
# Build
npm run build
# Watch mode
npm run dev
# Run tests
npm test
# Benchmarks
npm run bench # Solo
npm run bench:compare # Multi-server comparison
npm run bench:overhead # Tool token overhead measurement
License
MIT
