openclaw-localmem

v5.1.0

Published

3 months ago

Local AI memory engine for OpenClaw — auto-capture, auto-recall, contradiction detection, 100% local, no cloud.

wbtiger

openclaw memory local ai plugin

🧠 LocalMem

Give your AI a memory that actually works.

Local-first AI memory plugin for OpenClaw. Your conversations build persistent memory. 100% local. Zero cloud. Zero GPU.

Quick Start · How It Works · Benchmark · Docs · Chinese documentation (中文文档)

Why LocalMem?

LLMs are stateless — every conversation starts from scratch. RAG systems bolt on retrieval, but they're dumb pipes: they don't know what to remember, when to forget, or how much context is too much.

LocalMem solves this with three interconnected systems:

🧠 Memory Management — A complete memory lifecycle

Not a key-value store. Memories are born, verified, consolidated, aged, and archived — like human memory.

Auto-capture: Extracts facts, preferences, and decisions from every conversation — quality-gated to skip noise
Contradiction detection: "I use VS Code" → 30 days later "I switched to Cursor" → old memory automatically superseded
Trust levels: Manual input = high trust. Auto-captured = medium. Web imports = low. Trust affects ranking
Consolidation: Related episodes auto-merge into refined facts over time
Auto-expiry: Temporary context expires after 7 days. Superseded memories purge after 30 days
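The expiry and trust rules above can be sketched in a few lines of Python. This is an illustration only, not LocalMem's actual code: the `Memory` class, the function names, and the numeric trust weights are assumptions. Only the 7-day and 30-day windows and the ordering manual > auto-captured > web come from the list above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical weights; the README only states high > medium > low.
TRUST_WEIGHT = {"manual": 1.0, "auto": 0.7, "web": 0.4}

TEMP_TTL = timedelta(days=7)         # temporary context expires after 7 days
SUPERSEDED_TTL = timedelta(days=30)  # superseded memories purge after 30 days


@dataclass
class Memory:
    text: str
    source: str                      # "manual", "auto", or "web"
    created_at: datetime
    temporary: bool = False
    superseded_at: Optional[datetime] = None


def is_expired(mem: Memory, now: datetime) -> bool:
    """Return True when a memory should be dropped under the rules above."""
    if mem.superseded_at is not None:
        return now - mem.superseded_at > SUPERSEDED_TTL
    if mem.temporary:
        return now - mem.created_at > TEMP_TTL
    return False


def trust_adjusted(score: float, mem: Memory) -> float:
    """Scale a raw relevance score by the trust weight of the source."""
    return score * TRUST_WEIGHT[mem.source]
```

With these assumed weights, a manually entered memory outranks a web import that has the same raw relevance score.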

🎯 Context Management — Every token on the attention budget counts

LLMs have finite attention. Anthropic calls it "context rot" — as tokens increase, recall accuracy degrades. LocalMem injects the minimal set of high-signal tokens:

Profile + JIT hybrid: Static user profile pre-loaded, relevant memories retrieved just-in-time per query
Three injection modes: full (profile + memories), light (profile only), query (relevant memories only)
Budget-aware truncation: Stays within configurable limits to preserve attention quality
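As a rough sketch of how the three injection modes and a budget could fit together, consider the function below. The name `build_context`, the `max_chars` parameter, and the use of characters in place of tokens are all assumptions made for illustration; the plugin's real budget logic may differ.

```python
from typing import List


def build_context(profile: str, memories: List[str], mode: str = "full",
                  max_chars: int = 2000) -> str:
    """Assemble injected context for one turn.

    mode: "full" (profile + memories), "light" (profile only),
          "query" (relevant memories only).
    max_chars: hypothetical budget; characters stand in for tokens here.
    """
    if mode not in ("full", "light", "query"):
        raise ValueError(f"unknown mode: {mode}")

    parts: List[str] = []
    if mode in ("full", "light"):
        parts.append(profile)
    if mode in ("full", "query"):
        parts.extend(memories)  # assumed to be sorted best-first

    kept: List[str] = []
    used = 0
    for part in parts:
        cost = len(part) + (1 if kept else 0)  # +1 for the joining newline
        if used + cost > max_chars:
            break  # stop at the first item that would exceed the budget
        kept.append(part)
        used += cost
    return "\n".join(kept)
```

Because memories are assumed to arrive best-first, truncation drops the weakest items and keeps the profile whenever the mode includes it.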

👁️ Adaptive Attention — Smart injection, not dumb top-k

Traditional memory systems retrieve a fixed top-k and call it a day. LocalMem adapts what, how many, and which memories to inject based on conversation dynamics:

Multi-turn awareness: Uses recent conversation history as retrieval signal — short replies like "OK" don't derail recall
Dynamic threshold: Adapts to result quality — strong matches get strict filtering, weak matches get lenient. Inspired by sparse attention mechanisms
Adaptive count: Injects 2 when only 2 are relevant, not 5 with 3 pieces of noise
Recency decay: Fresh memories rank higher, but old ones don't disappear
Repeat suppression: Each turn surfaces different context — no redundant injections
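One plausible way to combine recency decay, a dynamic threshold, and an adaptive count is shown below. It is a sketch under stated assumptions, not the plugin's algorithm: the half-life, the floor, the `ratio` parameter, and both function names are hypothetical. Multi-turn querying and repeat suppression are left out.

```python
import math
from typing import List, Tuple


def recency_weight(age_days: float, half_life_days: float = 30.0,
                   floor: float = 0.5) -> float:
    """Exponential decay with a floor, so old memories are down-weighted
    but never drop to zero. half_life_days and floor are assumed values."""
    decay = math.pow(0.5, age_days / half_life_days)
    return floor + (1.0 - floor) * decay


def select_memories(candidates: List[Tuple[str, float, float]],
                    ratio: float = 0.75, max_k: int = 5) -> List[str]:
    """Pick between 0 and max_k memories.

    candidates: (text, similarity in [0, 1], age in days).
    The cut-off is ratio * best score, so it rises when the best match
    is strong and falls when every match is weak.
    """
    scored = [(text, sim * recency_weight(age)) for text, sim, age in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    if not scored:
        return []
    threshold = ratio * scored[0][1]
    return [text for text, score in scored[:max_k] if score >= threshold]
```

Given five candidates of which only two score close to the best match, `select_memories` returns those two rather than padding the result to `max_k`.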

🔌 Harness-Agnostic — Pure plugin, zero host modification

Operates entirely through OpenClaw's plugin SDK hooks. No source changes to OpenClaw. Survives host upgrades, works with any model provider, and doubles as an MCP Server for Claude Desktop, Cursor, and other clients.

How It Works

User message
  ↓
┌─ RECALL ────────────────────────────────────────────────┐
│ Multi-turn query (recent conversation context)          │
│   → Hybrid search (semantic embeddings + TF-IDF)        │
│     → Optional cross-encoder reranker                   │
│       → Recency boost → Dynamic threshold → Dedup       │
│         → Inject profile + adaptive-k memories          │
└─────────────────────────────────────────────────────────┘
  ↓
AI responds with memory context
  ↓
┌─ CAPTURE ───────────────────────────────────────────────┐
│ LLM extracts facts → Quality gate → Dedup              │
│   → Contradiction detection → Trust tagging → Store     │
└─────────────────────────────────────────────────────────┘
  ↓
┌─ STORAGE ───────────────────────────────────────────────┐
│ SQLite (~/.localmem/data/)                              │
│ memories │ relations │ documents │ profile │ TF-IDF     │
└─────────────────────────────────────────────────────────┘
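The hybrid search step in the RECALL box merges a semantic score and a TF-IDF score for each memory. The README does not say how the two are fused, so the weighted sum below is an assumption, as are the `alpha` weight, the min-max normalization, and the function names.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so the two signals are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def hybrid_rank(semantic: Dict[str, float], keyword: Dict[str, float],
                alpha: float = 0.7) -> List[str]:
    """Rank memory ids by alpha * semantic + (1 - alpha) * keyword.

    alpha is a hypothetical weight. An id missing from one signal
    contributes 0 for that signal.
    """
    sem, kw = min_max(semantic), min_max(keyword)
    ids = set(sem) | set(kw)
    fused = {i: alpha * sem.get(i, 0.0) + (1 - alpha) * kw.get(i, 0.0) for i in ids}
    # Sort by fused score, breaking ties by id for a stable order.
    return sorted(ids, key=lambda i: (-fused[i], i))
```

A memory that scores reasonably on both signals can outrank one that tops a single signal, which is the usual motivation for hybrid retrieval.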

Quick Start

Requirements: OpenClaw >= 2026.1.29, Python 3.9+

# One-line install (recommended)
openclaw plugins install github:wbavon/openclaw-localmem
openclaw gateway restart

# Or install from source
git clone https://github.com/wbavon/openclaw-localmem.git
cd openclaw-localmem && npm install
openclaw plugins install --link .
openclaw gateway restart

The installer automatically handles all dependencies (numpy, jieba, scikit-learn, fastembed), registers the plugin, and disables conflicting memory plugins.

Optional — configure an LLM API key for auto-capture:

export LOCALMEM_API_KEY="your-api-key"

Optional — enable PDF import:

pip3 install PyMuPDF

That's it. Chat normally. LocalMem remembers what matters.

What It Does

Day 1: "I use VS Code for everything"
  → stores: [preference] User uses VS Code as primary editor

Day 30: "I switched to Cursor last week"
  → stores: [preference] User switched to Cursor
  → detects contradiction → marks old memory as superseded
  → next conversation, AI knows you use Cursor

No manual effort. No commands to run. Just chat.
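The storage side of that VS Code to Cursor example can be sketched with Python's built-in `sqlite3` module. The schema, the `topic` column, and the function names are hypothetical, and matching on an explicit topic key only stands in for real contradiction detection, which needs semantic comparison of the two statements.

```python
import sqlite3
from datetime import datetime, timezone


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE memories ("
        " id INTEGER PRIMARY KEY, topic TEXT, text TEXT,"
        " status TEXT DEFAULT 'active', superseded_at TEXT)"
    )
    return conn


def store_preference(conn: sqlite3.Connection, topic: str, text: str) -> int:
    """Insert a memory and supersede any active memory on the same topic."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # one transaction: supersede old rows, then insert the new one
        conn.execute(
            "UPDATE memories SET status = 'superseded', superseded_at = ?"
            " WHERE topic = ? AND status = 'active'",
            (now, topic),
        )
        cur = conn.execute(
            "INSERT INTO memories (topic, text) VALUES (?, ?)", (topic, text)
        )
    return cur.lastrowid


def active(conn: sqlite3.Connection, topic: str) -> list:
    """Return the texts of all active memories for a topic."""
    rows = conn.execute(
        "SELECT text FROM memories WHERE topic = ? AND status = 'active'", (topic,)
    )
    return [row[0] for row in rows]
```

Storing the Day 1 statement and then the Day 30 statement under the same topic leaves only the Cursor preference active, while the VS Code row stays in the table marked as superseded until it is purged.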

Benchmark

Tested on LongMemEval — 500 multi-session questions requiring long-term memory retrieval:

| System | R@5 | R@1 | R@10 | NDCG@10 |
|--------|-----|-----|------|---------|
| LocalMem (reranked hybrid) | 98.4% | 90.8% | 99.4% | 0.946 |
| LocalMem (reranked raw) | 97.6% | 92.4% | 98.6% | 0.953 |
| LocalMem (hybrid, no reranker) | 98.2% | 88.6% | 98.8% | 0.934 |
| MemPalace hybrid_v4 (reference) | 98.4% | — | — | 0.889 |
| LocalMem TF-IDF only | 95.4% | — | 97.0% | 0.904 |

| Category | R@5 | n |
|----------|-----|---|
| knowledge-update | 100% | 78 |
| multi-session | 100% | 133 |
| single-session-user | 100% | 70 |
| temporal-reasoning | 96.2% | 133 |
| single-session-assistant | 96.4% | 56 |
| single-session-preference | 83.3% | 30 |
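For readers unfamiliar with the column headers, here is a minimal sketch of how R@k and NDCG@k are commonly computed for a single question. It assumes binary relevance and treats R@k as "at least one relevant item in the top k"; the exact definitions used in this benchmark run are not stated here, so treat both functions as illustrative.

```python
import math
from typing import List, Set


def hit_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """1.0 if any relevant id appears in the top k, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def ndcg_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """NDCG@k with binary relevance."""
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, doc in enumerate(ranked[:k]) if doc in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Averaging each function over all questions gives the table-level figure. NDCG rewards placing the relevant item near the top, which is why two systems with similar R@5 can differ noticeably in NDCG@10.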

Key Capabilities

| Area | What it does |
|------|-------------|
| Retrieval | Hybrid semantic + keyword search, optional cross-encoder reranker, trust-boosted ranking |
| Memory types | Facts, preferences, episodes, procedural knowledge, derived insights |
| Lifecycle | Auto-expiry, consolidation, drift detection, archive & purge |
| Knowledge base | Import text, markdown, URLs, PDFs — searchable alongside memories |
| AI tools | 5 tools for the AI to directly store, search, forget, and query memories |
| MCP Server | Standard MCP protocol — works with Claude Desktop, Cursor, Windsurf, VS Code |
| User profile | Auto-generated summary of stable facts and recent context |
| Privacy | 100% local. SQLite single file. Data never leaves your machine |

CLI

# Store & search
python3 engine/engine.py store "Prefers dark mode" --type preference
python3 engine/engine.py recall "user preferences" --limit 10
python3 engine/engine.py search "deep learning"               # hybrid search (memories + docs)

# Context injection (used internally by plugin, useful for debugging)
python3 engine/engine.py context "current topic"
python3 engine/engine.py context "current topic" --mode light  # profile only

# Documents
python3 engine/engine.py doc-add --file paper.pdf --title "Research Paper"
python3 engine/engine.py doc-add --url https://example.com --title "Web Page"
python3 engine/engine.py doc-list

# Maintenance
python3 engine/engine.py stats
python3 engine/engine.py maintain --profile --purge
python3 engine/engine.py rebuild                  # rebuild all embeddings
python3 engine/engine.py export --file backup.json

Full command reference → docs/manual/cli.md
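If you want to call the engine from a script rather than a shell, a thin wrapper around `subprocess` is one option. The sketch below uses only the `store` and `recall` subcommands and the `--type` and `--limit` flags shown above. The helper names are hypothetical, and the format of the engine's output is not documented here, so `run` simply returns stdout as text.

```python
import subprocess
import sys
from typing import List, Optional

ENGINE = "engine/engine.py"  # path as used in the commands above


def build_command(subcommand: str, text: str, *, mem_type: Optional[str] = None,
                  limit: Optional[int] = None) -> List[str]:
    """Build an argv list for the store and recall commands shown above."""
    argv = [sys.executable, ENGINE, subcommand, text]
    if mem_type is not None:
        argv += ["--type", mem_type]
    if limit is not None:
        argv += ["--limit", str(limit)]
    return argv


def run(argv: List[str]) -> str:
    """Run the engine and return its stdout; raises if the exit code is non-zero."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run(build_command("recall", "user preferences", limit=10))` mirrors the second CLI line above. Passing an argv list instead of a shell string avoids quoting problems when the memory text contains spaces or quotes.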

MCP Server

Compatible with Claude Desktop, Cursor, Windsurf, VS Code, and any MCP client:

{
  "mcpServers": {
    "localmem": {
      "command": "python3",
      "args": ["~/.openclaw/extensions/localmem/mcp_server.py"],
      "env": { "LOCALMEM_API_KEY": "your-api-key" }
    }
  }
}
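Some MCP clients launch the server process directly rather than through a shell, in which case the `~` in `args` may not be expanded. If the server fails to start, an absolute path is worth trying. The helper below is a hypothetical convenience, not part of LocalMem, that prints the same configuration with the home directory resolved.

```python
import json
import os


def mcp_config(script: str = "~/.openclaw/extensions/localmem/mcp_server.py",
               api_key: str = "your-api-key") -> str:
    """Return the mcpServers JSON with `~` expanded to an absolute path."""
    config = {
        "mcpServers": {
            "localmem": {
                "command": "python3",
                "args": [os.path.expanduser(script)],
                "env": {"LOCALMEM_API_KEY": api_key},
            }
        }
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(mcp_config())
```

Paste the printed JSON into your MCP client's configuration file in place of the block above.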

Configuration

The plugin works out of the box. For customization, see the full configuration reference:

→ Configuration Guide — all plugin options, environment variables, model selection, and tuning parameters.

Key options:

| Option | Default | Description |
|--------|---------|-------------|
| autoRecall | true | Inject memories before each AI turn |
| autoCapture | true | Extract memories after each turn |
| embeddingBackend | fastembed | fastembed (local), api (cloud), tfidf (keyword only) |
| rerankerEnabled | false | Cross-encoder reranker (precision +0.2pp, latency +0.3s) |
| debug | false | Verbose logging |

Any LLM reachable through an OpenAI-compatible API works for extraction: Anthropic, OpenAI, DeepSeek, Kimi, GLM, OpenRouter, etc.

Documentation

Quick Start — get running in 2 minutes
Configuration — all options and environment variables
How It Works — architecture and data flow
Retrieval Engine — hybrid fusion, reranker, attention mechanisms
Memory Types — facts, preferences, episodes, procedural, derived
CLI Reference — full command documentation
Benchmark — LongMemEval methodology and results
Changelog — version history

License

MIT