openclaw-localmem
v5.1.0
Local AI memory engine for OpenClaw — auto-capture, auto-recall, contradiction detection, 100% local, no cloud.
🧠 LocalMem
Give your AI a memory that actually works.
Local-first AI memory plugin for OpenClaw. Your conversations build persistent memory. 100% local. Zero cloud. Zero GPU.
Quick Start · How It Works · Benchmark · Docs · Chinese Docs (中文文档)
Why LocalMem?
LLMs are stateless — every conversation starts from scratch. RAG systems bolt on retrieval, but they're dumb pipes: they don't know what to remember, when to forget, or how much context is too much.
LocalMem solves this with three interconnected systems:
🧠 Memory Management — A complete memory lifecycle
Not a key-value store. Memories are born, verified, consolidated, aged, and archived — like human memory.
- Auto-capture: Extracts facts, preferences, and decisions from every conversation — quality-gated to skip noise
- Contradiction detection: "I use VS Code" → 30 days later "I switched to Cursor" → old memory automatically superseded
- Trust levels: Manual input = high trust. Auto-captured = medium. Web imports = low. Trust affects ranking
- Consolidation: Related episodes auto-merge into refined facts over time
- Auto-expiry: Temporary context expires after 7 days. Superseded memories purge after 30 days
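Here is a minimal sketch of the lifecycle rules in the list above, assuming a simple in-memory record. The 7-day and 30-day windows come from this README. The source labels, the trust weights, and the `lifecycle_state` / `ranked_score` helpers are hypothetical illustrations, not LocalMem's actual schema or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical mapping from capture source to trust level (README: manual > auto > web).
SOURCE_TRUST = {"manual": "high", "auto": "medium", "web": "low"}
# Hypothetical ranking multipliers per trust level.
TRUST_WEIGHT = {"high": 1.0, "medium": 0.8, "low": 0.6}

TEMP_TTL = timedelta(days=7)         # temporary context expires after 7 days
SUPERSEDED_TTL = timedelta(days=30)  # superseded memories are purged after 30 days


@dataclass
class Memory:
    text: str
    source: str                          # "manual" | "auto" | "web"
    created: datetime
    temporary: bool = False
    superseded_at: Optional[datetime] = None

    @property
    def trust(self) -> str:
        return SOURCE_TRUST[self.source]


def lifecycle_state(m: Memory, now: datetime) -> str:
    """Classify a memory as 'active', 'superseded', 'expired', or 'purge'."""
    if m.superseded_at is not None:
        # Superseded memories stay stored until the purge window has passed.
        return "purge" if now - m.superseded_at >= SUPERSEDED_TTL else "superseded"
    if m.temporary and now - m.created >= TEMP_TTL:
        return "expired"
    return "active"


def ranked_score(similarity: float, m: Memory) -> float:
    """Scale raw retrieval similarity by the memory's trust level."""
    return similarity * TRUST_WEIGHT[m.trust]


now = datetime(2026, 3, 1)
scratch = Memory("Debugging the login bug today", "auto", now - timedelta(days=8), temporary=True)
print(lifecycle_state(scratch, now))  # expired
```

Keeping trust as a multiplier (rather than a hard filter) matches the README's wording that trust *affects* ranking: a low-trust web import can still surface when it is the only strong match.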
🎯 Context Management — Every token on the attention budget counts
LLMs have finite attention. Anthropic calls it "context rot" — as tokens increase, recall accuracy degrades. LocalMem injects the minimal set of high-signal tokens:
- Profile + JIT hybrid: Static user profile pre-loaded, relevant memories retrieved just-in-time per query
- Three injection modes: full (profile + memories), light (profile only), query (relevant memories only)
- Budget-aware truncation: Stays within configurable limits to preserve attention quality
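The three modes and the budget cap can be sketched as follows. This assumes memories arrive already sorted by relevance and approximates tokens by whitespace-separated words; `build_context` and `approx_tokens` are hypothetical names, and the real engine (`engine.py context --mode ...`) may count tokens and order sections differently.

```python
from typing import List


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def build_context(profile: str, memories: List[str], mode: str, budget: int) -> str:
    """Assemble the text injected before a turn without exceeding `budget` tokens.

    full = profile + memories, light = profile only, query = memories only.
    `memories` must already be sorted best-first.
    """
    if mode not in ("full", "light", "query"):
        raise ValueError(f"unknown injection mode: {mode!r}")
    parts: List[str] = []
    used = 0
    if mode in ("full", "light") and profile:
        cost = approx_tokens(profile)
        if cost <= budget:
            parts.append(profile)
            used += cost
    if mode in ("full", "query"):
        for mem in memories:
            cost = approx_tokens(mem)
            if used + cost > budget:
                # Stop at the first memory that doesn't fit instead of
                # skipping ahead to smaller, less relevant ones.
                break
            parts.append(mem)
            used += cost
    return "\n".join(parts)


profile = "User is a Python developer"
memories = ["Prefers dark mode", "Uses Cursor as editor", "Lives in Berlin"]
# Profile (5) + 3 + 4 = 12 tokens; "Lives in Berlin" would exceed the budget.
print(build_context(profile, memories, "full", budget=12))
```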
👁️ Adaptive Attention — Smart injection, not dumb top-k
Traditional memory systems retrieve a fixed top-k and call it a day. LocalMem adapts what, how many, and which memories to inject based on conversation dynamics:
- Multi-turn awareness: Uses recent conversation history as retrieval signal — short replies like "OK" don't derail recall
- Dynamic threshold: Adapts to result quality — strong matches get strict filtering, weak matches get lenient. Inspired by sparse attention mechanisms
- Adaptive count: Injects 2 when only 2 are relevant, not 5 with 3 pieces of noise
- Recency decay: Fresh memories rank higher, while older ones are down-weighted rather than dropped
- Repeat suppression: Each turn surfaces different context — no redundant injections
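One way the selection steps above could fit together, as a sketch rather than LocalMem's implementation: every constant (half-life, recency floor, strong-match cutoff, the 0.8 / 0.5 threshold ratios) and the `select` function are hypothetical. Each score is similarity times a recency weight that decays toward a floor, the cutoff is set relative to the best score, and anything injected in recent turns is skipped.

```python
from typing import Iterable, List, Set, Tuple

HALF_LIFE_DAYS = 30.0  # hypothetical: the decaying part of the weight halves every 30 days
RECENCY_FLOOR = 0.5    # hypothetical: old memories keep at least half their score
STRONG_MATCH = 0.6     # hypothetical: a top score at or above this counts as strong
MAX_K = 5


def recency_weight(age_days: float) -> float:
    """Decays from 1.0 toward RECENCY_FLOOR, so old memories never reach zero."""
    decay = 0.5 ** (age_days / HALF_LIFE_DAYS)
    return RECENCY_FLOOR + (1.0 - RECENCY_FLOOR) * decay


def select(candidates: Iterable[Tuple[str, float, float]],
           recently_injected: Set[str], max_k: int = MAX_K) -> List[str]:
    """candidates: (memory_id, similarity in [0, 1], age in days)."""
    # Repeat suppression: drop memories already injected in recent turns.
    scored = sorted(
        ((mid, sim * recency_weight(age))
         for mid, sim, age in candidates if mid not in recently_injected),
        key=lambda pair: pair[1],
        reverse=True,
    )
    if not scored:
        return []
    top = scored[0][1]
    # Dynamic threshold relative to the best hit: strict when it is strong,
    # lenient when every match is weak.
    threshold = top * (0.8 if top >= STRONG_MATCH else 0.5)
    # Adaptive count: keep only what clears the threshold, capped at max_k.
    return [mid for mid, score in scored if score >= threshold][:max_k]


cands = [("a", 0.9, 0), ("b", 0.85, 3), ("c", 0.4, 1), ("d", 0.3, 200)]
print(select(cands, recently_injected=set()))  # ['a', 'b']
```

With a strong top match the 0.8 ratio prunes aggressively, so two relevant hits yield two injections rather than a padded top-5; when all matches are weak, the 0.5 ratio keeps more candidates.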
🔌 Harness-Agnostic — Pure plugin, zero host modification
Operates entirely through OpenClaw's plugin SDK hooks. No source changes to OpenClaw. Survives host upgrades, works with any model provider, and doubles as an MCP Server for Claude Desktop, Cursor, and other clients.
How It Works
```
User message
↓
┌─ RECALL ────────────────────────────────────────────────┐
│ Multi-turn query (recent conversation context) │
│ → Hybrid search (semantic embeddings + TF-IDF) │
│ → Optional cross-encoder reranker │
│ → Recency boost → Dynamic threshold → Dedup │
│ → Inject profile + adaptive-k memories │
└─────────────────────────────────────────────────────────┘
↓
AI responds with memory context
↓
┌─ CAPTURE ───────────────────────────────────────────────┐
│ LLM extracts facts → Quality gate → Dedup │
│ → Contradiction detection → Trust tagging → Store │
└─────────────────────────────────────────────────────────┘
↓
┌─ STORAGE ───────────────────────────────────────────────┐
│ SQLite (~/.localmem/data/) │
│ memories │ relations │ documents │ profile │ TF-IDF │
└─────────────────────────────────────────────────────────┘
```
Quick Start
Requirements: OpenClaw >= 2026.1.29, Python 3.9+
```bash
# One-line install (recommended)
openclaw plugins install github:wbavon/openclaw-localmem
openclaw gateway restart
```

```bash
# Or install from source
git clone https://github.com/wbavon/openclaw-localmem.git
cd openclaw-localmem && npm install
openclaw plugins install --link .
openclaw gateway restart
```

The installer automatically handles all dependencies (numpy, jieba, scikit-learn, fastembed), registers the plugin, and disables conflicting memory plugins.
Optional: configure an LLM API key for auto-capture:
```bash
export LOCALMEM_API_KEY="your-api-key"
```
Optional: enable PDF import:
```bash
pip3 install PyMuPDF
```
That's it. Chat normally. LocalMem remembers what matters.
What It Does
```
Day 1:  "I use VS Code for everything"
        → stores: [preference] User uses VS Code as primary editor

Day 30: "I switched to Cursor last week"
        → stores: [preference] User switched to Cursor
        → detects contradiction → marks old memory as superseded
        → next conversation, AI knows you use Cursor
```
No manual effort. No commands to run. Just chat.
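The capture path from the How It Works diagram (quality gate → dedup → contradiction detection → store) can be sketched for the VS Code → Cursor example. Here contradiction is approximated by a hypothetical `slot` key shared by facts about the same thing; the README does not describe LocalMem's real detector at this level, and it likely relies on the LLM and embeddings. The noise list, the three-word gate, and the `Store` / `Fact` types are illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

# Hypothetical filler phrases the quality gate rejects outright.
NOISE = {"ok", "okay", "thanks", "thank you", "sure", "lol"}


def _norm(text: str) -> str:
    """Normalize text for duplicate comparison."""
    return text.strip().lower().rstrip(".!")


@dataclass
class Fact:
    kind: str        # e.g. "preference"
    slot: str        # hypothetical key for what the fact is about, e.g. "editor"
    text: str
    created: datetime
    superseded_by: Optional["Fact"] = None


@dataclass
class Store:
    facts: List[Fact] = field(default_factory=list)

    def active(self) -> List[Fact]:
        return [f for f in self.facts if f.superseded_by is None]

    def capture(self, fact: Fact) -> str:
        norm = _norm(fact.text)
        # 1. Quality gate: drop fillers and fragments shorter than three words.
        if norm in NOISE or len(norm.split()) < 3:
            return "rejected"
        # 2. Dedup: identical content is already stored.
        if any(_norm(old.text) == norm for old in self.active()):
            return "duplicate"
        # 3. Contradiction: same kind and slot, different content -> supersede the old fact.
        status = "stored"
        for old in self.active():
            if old.kind == fact.kind and old.slot == fact.slot:
                old.superseded_by = fact
                status = "stored, superseded old"
        # 4. Store the new fact.
        self.facts.append(fact)
        return status


store = Store()
print(store.capture(Fact("preference", "editor", "User uses VS Code as primary editor", datetime(2026, 1, 1))))  # stored
print(store.capture(Fact("preference", "editor", "User switched to Cursor", datetime(2026, 1, 31))))  # stored, superseded old
print([f.text for f in store.active()])  # ['User switched to Cursor']
```

Superseding instead of deleting keeps the old fact around, which is what lets the 30-day purge rule apply later.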
Benchmark
Tested on LongMemEval — 500 multi-session questions requiring long-term memory retrieval:

| System | R@5 | R@1 | R@10 | NDCG@10 |
|--------|-----|-----|------|---------|
| LocalMem (reranked hybrid) | 98.4% | 90.8% | 99.4% | 0.946 |
| LocalMem (reranked raw) | 97.6% | 92.4% | 98.6% | 0.953 |
| LocalMem (hybrid, no reranker) | 98.2% | 88.6% | 98.8% | 0.934 |
| MemPalace hybrid_v4 (reference) | 98.4% | n/a | n/a | 0.889 |
| LocalMem TF-IDF only | 95.4% | n/a | 97.0% | 0.904 |

Breakdown by question category:

| Category | R@5 | Questions (n) |
|----------|-----|---------------|
| knowledge-update | 100% | 78 |
| multi-session | 100% | 133 |
| single-session-user | 100% | 70 |
| temporal-reasoning | 96.2% | 133 |
| single-session-assistant | 96.4% | 56 |
| single-session-preference | 83.3% | 30 |
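For reference, here is a sketch of the metrics in the tables above. It assumes binary relevance and counts a question as a hit at k when any gold evidence session appears in the top k; the README does not state which hit definition the benchmark uses. The `CATEGORY_HITS` counts are back-computed from the rounded per-category percentages (for example, 128/133 is the only count that rounds to 96.2%), and `overall_recall` shows how per-category R@5 rolls up into one question-weighted score.

```python
import math
from typing import Dict, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Per-question hit: 1.0 if any gold session is in the top k, else 0.0."""
    return 1.0 if any(doc in gold for doc in ranked[:k]) else 0.0


def ndcg_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Binary-relevance NDCG@k: discounted gain of hits vs. the ideal ordering."""
    dcg = sum(1.0 / math.log2(i + 2) for i, doc in enumerate(ranked[:k]) if doc in gold)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(gold), k)))
    return dcg / idcg if idcg > 0 else 0.0


# Hits per category implied by the rounded R@5 percentages in the category table.
CATEGORY_HITS: Dict[str, Tuple[int, int]] = {  # category: (hits, n)
    "knowledge-update": (78, 78),
    "multi-session": (133, 133),
    "single-session-user": (70, 70),
    "temporal-reasoning": (128, 133),         # 96.2%
    "single-session-assistant": (54, 56),     # 96.4%
    "single-session-preference": (25, 30),    # 83.3%
}


def overall_recall(hits: Dict[str, Tuple[int, int]]) -> float:
    """Question-weighted average: total hits / total questions."""
    return sum(h for h, _ in hits.values()) / sum(n for _, n in hits.values())


print(f"{overall_recall(CATEGORY_HITS):.1%}")  # 97.6%
```

Because the average is weighted by question count, the weakest category (single-session-preference, 30 questions) moves the overall figure much less than its 83.3% suggests.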
Key Capabilities

| Area | What it does |
|------|-------------|
| Retrieval | Hybrid semantic + keyword search, optional cross-encoder reranker, trust-boosted ranking |
| Memory types | Facts, preferences, episodes, procedural knowledge, derived insights |
| Lifecycle | Auto-expiry, consolidation, drift detection, archive & purge |
| Knowledge base | Import text, Markdown, URLs, and PDFs, searchable alongside memories |
| AI tools | 5 tools for the AI to directly store, search, forget, and query memories |
| MCP Server | Standard MCP protocol; works with Claude Desktop, Cursor, Windsurf, VS Code |
| User profile | Auto-generated summary of stable facts and recent context |
| Privacy | 100% local. SQLite single file. Data never leaves your machine |
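The Retrieval row combines a semantic ranking, a keyword (TF-IDF) ranking, and a trust boost. The README does not specify the fusion formula, so this sketch swaps in reciprocal rank fusion (RRF, with the commonly used k = 60) and then multiplies by a hypothetical per-trust-level boost. `hybrid_rank` and `TRUST_BOOST` are illustrative names, and the input lists stand in for the embedding and TF-IDF search results.

```python
from typing import Dict, List, Sequence

RRF_K = 60  # damping constant; 60 is the value used in the original RRF paper
# Hypothetical multipliers for the README's trust levels.
TRUST_BOOST = {"high": 1.2, "medium": 1.0, "low": 0.8}


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = RRF_K) -> Dict[str, float]:
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per document."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return scores


def hybrid_rank(semantic: Sequence[str], keyword: Sequence[str],
                trust: Dict[str, str]) -> List[str]:
    """Fuse semantic and keyword rankings, then apply a trust boost (default: medium)."""
    fused = rrf_fuse([semantic, keyword])
    boosted = {doc: s * TRUST_BOOST[trust.get(doc, "medium")] for doc, s in fused.items()}
    return sorted(boosted, key=lambda doc: boosted[doc], reverse=True)


# "b" (ranks 2 and 1) edges out "a" (ranks 1 and 3); a high-trust "c" jumps ahead.
print(hybrid_rank(["a", "b", "c"], ["b", "c", "a"], trust={}))             # ['b', 'a', 'c']
print(hybrid_rank(["a", "b", "c"], ["b", "c", "a"], trust={"c": "high"}))  # ['c', 'b', 'a']
```

RRF only needs ranks, which avoids normalizing cosine similarities against TF-IDF scores that live on different scales.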
CLI
```bash
# Store & search
python3 engine/engine.py store "Prefers dark mode" --type preference
python3 engine/engine.py recall "user preferences" --limit 10
python3 engine/engine.py search "deep learning"   # hybrid search (memories + docs)

# Context injection (used internally by the plugin, useful for debugging)
python3 engine/engine.py context "current topic"
python3 engine/engine.py context "current topic" --mode light   # profile only

# Documents
python3 engine/engine.py doc-add --file paper.pdf --title "Research Paper"
python3 engine/engine.py doc-add --url https://example.com --title "Web Page"
python3 engine/engine.py doc-list

# Maintenance
python3 engine/engine.py stats
python3 engine/engine.py maintain --profile --purge
python3 engine/engine.py rebuild   # rebuild all embeddings
python3 engine/engine.py export --file backup.json
```
Full command reference → docs/manual/cli.md
MCP Server
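Compatible with Claude Desktop, Cursor, Windsurf, VS Code, and any MCP client:
```json
{
  "mcpServers": {
    "localmem": {
      "command": "python3",
      "args": ["~/.openclaw/extensions/localmem/mcp_server.py"],
      "env": { "LOCALMEM_API_KEY": "your-api-key" }
    }
  }
}
```
Replace ~ with the absolute path to your home directory: MCP clients generally launch the server directly rather than through a shell, so ~ is not expanded.
Configuration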
The plugin works out of the box. For customization, see the full configuration reference:
→ Configuration Guide — all plugin options, environment variables, model selection, and tuning parameters.
Key options:
| Option | Default | Description |
|--------|---------|-------------|
| autoRecall | true | Inject memories before each AI turn |
| autoCapture | true | Extract memories after each turn |
| embeddingBackend | fastembed | fastembed (local), api (cloud), tfidf (keyword only) |
| rerankerEnabled | false | Cross-encoder reranker (LongMemEval R@5 +0.2pp, R@1 +2.2pp; about 0.3s added latency) |
| debug | false | Verbose logging |
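As a sketch, the options table maps onto a small typed structure with the defaults above. Field names and defaults come from the table; `LocalMemOptions`, `load_options`, and the validation rules are hypothetical (OpenClaw reads the real plugin config itself), and the camelCase names are kept only to match the option keys.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict

EMBEDDING_BACKENDS = ("fastembed", "api", "tfidf")


@dataclass(frozen=True)
class LocalMemOptions:
    # Names and defaults mirror the "Key options" table; camelCase kept to match the keys.
    autoRecall: bool = True           # inject memories before each AI turn
    autoCapture: bool = True          # extract memories after each turn
    embeddingBackend: str = "fastembed"
    rerankerEnabled: bool = False     # cross-encoder reranker
    debug: bool = False               # verbose logging

    def __post_init__(self) -> None:
        for name in ("autoRecall", "autoCapture", "rerankerEnabled", "debug"):
            if not isinstance(getattr(self, name), bool):
                raise TypeError(f"{name} must be a bool")
        if self.embeddingBackend not in EMBEDDING_BACKENDS:
            raise ValueError(
                f"embeddingBackend must be one of {EMBEDDING_BACKENDS}, "
                f"got {self.embeddingBackend!r}"
            )


def load_options(raw: Dict[str, Any]) -> LocalMemOptions:
    """Hypothetical loader: reject unknown keys, fill the rest from defaults."""
    known = {f.name for f in fields(LocalMemOptions)}
    unknown = sorted(set(raw) - known)
    if unknown:
        raise KeyError(f"unknown option(s): {unknown}")
    return LocalMemOptions(**raw)


print(load_options({"rerankerEnabled": True}))
```

Rejecting unknown keys catches typos such as `autorecall`, which would otherwise silently fall back to the default.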
Any OpenAI-compatible LLM works for extraction: Anthropic, OpenAI, DeepSeek, Kimi, GLM, OpenRouter, etc.
Documentation
- Quick Start — get running in 2 minutes
- Configuration — all options and environment variables
- How It Works — architecture and data flow
- Retrieval Engine — hybrid fusion, reranker, attention mechanisms
- Memory Types — facts, preferences, episodes, procedural, derived
- CLI Reference — full command documentation
- Benchmark — LongMemEval methodology and results
- Changelog — version history
License
MIT
