@kavinbmittal/lia-memory-engine
v1.0.0
Lia-style context engine for OpenClaw — structured compaction, auto-flush, auto-retrieval
Lia Memory Engine
Lia Memory Engine gives OpenClaw agents the kind of memory that actually works in practice — decisions made three sessions ago surface automatically, context doesn’t silently disappear when conversations get long, and nothing is ever lost.
Two Parts
1. Compaction Upgrade: Structured Memory
OpenClaw’s built-in compaction throws away a lot once the context gets full, so your agents sometimes run around like headless chickens. Lia’s Memory Engine replaces it with a structured approach:
- When the context window is genuinely near capacity (default 80%), the engine compresses the older half of messages into a summary that explicitly preserves decisions, commitments, open questions, Q&A pairs, and preferences
- Token usage is estimated from the live conversation snapshot passed by OpenClaw each turn, so the threshold check is always accurate and compaction only fires when the context is actually full
- This structured summarization is generated via Claude Haiku and is kept in context. You keep everything that matters
- The engine also introduces auto-flush: every message is written to disk immediately, giving you a full transcript to search against, so nothing is truly gone even after compaction
2. QMD Memory Retrieval
QMD is a retrieval framework built by Tobi Lütke. It isn’t a search box, it’s a full retrieval pipeline running entirely on-device:
- BM25 keyword search
- Vector semantic search
- LLM reranking

The impact of this in practice is significant: basic search finds the memory that matches your words; QMD finds the memory that matches your intent. A query like “auth decision” will surface the conversation where you chose Supabase Auth because of RLS, not just any file that mentions authentication.
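Conceptually, combining a keyword ranking and a semantic ranking can be done with something as simple as reciprocal rank fusion. The sketch below is illustrative only; it is not QMD’s actual fusion algorithm, and the constant `k = 60` is a conventional default, not a QMD setting:

```typescript
// Illustrative sketch of hybrid retrieval fusion (not QMD's actual code).
// Combines BM25 and vector rankings with reciprocal rank fusion (RRF);
// the fused top candidates would then go to an LLM reranker.

type Ranked = { id: string; rank: number };

// RRF: score(d) = sum over rankings of 1 / (k + rank(d))
function reciprocalRankFusion(rankings: Ranked[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    for (const { id, rank } of ranking) {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    }
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Doc "c" appears in both rankings, so fusion promotes it above
// documents that only one ranking found.
const bm25: Ranked[] = [{ id: "a", rank: 1 }, { id: "c", rank: 2 }];
const vec: Ranked[] = [{ id: "b", rank: 1 }, { id: "c", rank: 2 }];
const fused = reciprocalRankFusion([bm25, vec]);
```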
Combined, the two make a powerful upgrade: OpenClaw never forgets anything. This is the engine powering Lia, the world’s first AI Chief of Staff.
How it works
Key design principle: OpenClaw owns the conversation. The engine never stores or replaces OpenClaw’s messages. OpenClaw loads conversations from its JSONL session files and passes them to the engine on every turn. The engine reads those messages but never maintains its own copy — it uses a lightweight counter to track what’s been flushed to transcript.
Compaction via Haiku — when context genuinely reaches the threshold (default 80%), the engine takes OpenClaw’s messages, splits at the midpoint, summarizes the older half, and returns the compacted result. OpenClaw replaces its messages with the compacted version. Preserves Q&A pairs, decisions, commitments, open questions, preferences, and emotional context.
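A rough sketch of that flow, with an assumed message shape and a stand-in for the Haiku call (not the plugin’s real API):

```typescript
// Rough sketch of midpoint-split compaction (assumed types, not the real API).
type Msg = { role: "user" | "assistant" | "system"; content: string };

// summarize() stands in for the Claude Haiku call that produces the
// structured summary (decisions, commitments, open questions, etc.).
function compact(messages: Msg[], summarize: (older: Msg[]) => string): Msg[] {
  const mid = Math.floor(messages.length / 2);
  const older = messages.slice(0, mid);  // compressed into the summary
  const recent = messages.slice(mid);    // kept verbatim
  const summary: Msg = {
    role: "system",
    content: `[Compacted context]\n${summarize(older)}`,
  };
  return [summary, ...recent];
}
```

OpenClaw then replaces its message array with the returned value; the engine itself keeps no copy.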
Auto-flush every turn — after each turn, the engine identifies new messages (using a counter, not by diffing arrays) and writes them to memory/daily/YYYY-MM-DD.md. Nothing is ever lost.

Auto-retrieval — before every model run, QMD runs a hybrid search (BM25 + vector + LLM reranking) using the last user message as the query. Relevant past context is injected silently into the system prompt. A 500ms timeout ensures it never blocks.
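The counter-based approach can be sketched in a few lines (assumed types, not the plugin’s real code):

```typescript
// Minimal sketch of counter-based auto-flush tracking (assumed shapes).
type Msg = { role: string; content: string };

class FlushTracker {
  private flushedCount = 0;

  // Returns only the messages that arrived since the last flush.
  // No array diffing: OpenClaw only appends, so comparing lengths is enough.
  takeNew(conversation: Msg[]): Msg[] {
    const fresh = conversation.slice(this.flushedCount);
    this.flushedCount = conversation.length;
    return fresh;
  }
}
```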
After each message is written to the transcript, the index is updated in the background. This means messages are searchable immediately within the same session — not just from the next session onward.
memory_search tool — agents can explicitly search conversation history. Uses full hybrid search with HyDE reranking for maximum quality.
The plugin connects to a local QMD HTTP daemon at localhost:8181. On bootstrap, it checks if the daemon is running — if not, it spawns qmd mcp --http --daemon in the background. The daemon stays alive between sessions, keeping embedding models warm in memory. If the daemon isn’t available (QMD not installed, model not downloaded yet), the plugin falls back to QMD’s CLI BM25 search. If that’s also unavailable, auto-retrieval is silently skipped — the agent still works, just without memory context.
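The degradation order can be expressed as a simple ordered choice; this is a sketch of the decision, with hypothetical names, not the plugin’s actual code:

```typescript
// Sketch of the retrieval fallback chain (hypothetical names).
type Backend = "daemon-hybrid" | "cli-bm25" | "none";

function pickBackend(daemonUp: boolean, cliAvailable: boolean): Backend {
  if (daemonUp) return "daemon-hybrid"; // full BM25 + vector + LLM reranking
  if (cliAvailable) return "cli-bm25";  // keyword-only CLI fallback
  return "none";                        // auto-retrieval silently skipped
}
```

The key property is that the agent never errors out: the worst case is a turn with no memory context.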
Requirements
- Node.js 18+
- OpenClaw v2026.3.x+
- QMD — the on-device search engine that powers memory retrieval
Linux / Railway
QMD installs node-llama-cpp as a dependency, which compiles llama.cpp from C++ source at runtime. On macOS, Xcode command line tools include everything needed and this is invisible. On a fresh Linux container (including Railway), you need three things:
```
apt-get update && apt-get install -y git cmake build-essential
```

- `git` — node-llama-cpp uses `git clone` to pull the llama.cpp source. Without it, the build enters an infinite retry loop with no error message.
- `cmake` — required to compile llama.cpp from source.
- `build-essential` — C/C++ compiler toolchain.
Add this to your Railway Dockerfile or nixpacks.toml before the npm install step.
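For reference, a minimal Dockerfile might look like the following. The base image, file layout, and entry point here are assumptions; adapt them to your stack:

```dockerfile
# Assumed Debian-based Node image; adjust to your setup.
FROM node:20-slim

# git: node-llama-cpp clones the llama.cpp source
# cmake + build-essential: compile llama.cpp at runtime
RUN apt-get update \
    && apt-get install -y git cmake build-essential \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
CMD ["node", "index.js"]
```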
Bun is not supported
QMD must run under Node.js, not Bun. node-llama-cpp ships pre-built native binaries for Node.js only. Under Bun, the native addon crashes silently on model load — no error, no warning, search just returns empty. Everything else (HTTP server, BM25 indexing, SQLite) works fine under Bun, so it looks healthy until you test an actual search query.
If your stack uses Bun, run QMD in a separate process under Node.js.
GPU acceleration
On servers without a GPU (including Railway), set NODE_LLAMA_CPP_GPU=false before starting QMD. Without this, node-llama-cpp tries to compile a Vulkan variant at runtime, which fails and falls back to CPU — but wastes 5+ minutes of build time on every container start.
Cold start
The first search after a fresh deploy takes 10-20s on CPU while QMD loads embedding and reranking models into memory (~2.5GB total). Subsequent searches are fast (<1s). If you're wrapping QMD in a search endpoint, set your timeout to at least 30s to handle the cold start. Don't assume search is broken because the first call is slow.
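If you are fronting QMD with your own endpoint, the generous first-call timeout can be wrapped generically. This is a sketch; `qmdSearch` in the usage line is a placeholder for your actual search call:

```typescript
// Sketch: race a promise against a timeout, resolving to a fallback
// value if the work doesn't finish in time.
function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  const timer = new Promise<T>((resolve) =>
    setTimeout(() => resolve(fallback), ms),
  );
  return Promise.race([work, timer]);
}
```

Usage might look like `await withTimeout(qmdSearch(query), 30_000, [])`: 30s accommodates the cold start, while steady-state calls finish well under a second.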
Setup
1. Install QMD
```
npm install -g @tobilu/qmd
```

First run downloads the GGUF embedding model (~400MB). This only happens once.
2. Install the plugin
From npm (recommended):
```
openclaw plugins install @kavinbmittal/lia-memory-engine
```

From source:

```
cd ~/.openclaw/extensions
git clone <this-repo> lia-memory-engine
cd lia-memory-engine
npm install
npm run build
```

When installing from npm, OpenClaw discovers the plugin automatically — skip step 5 (the plugins.load.paths config).
3. Register your memory collection
Point QMD at the directory where Lia writes transcripts. By default this is memory/ inside your agent’s workspace:
```
qmd collection add /path/to/your/workspace/memory --name lia-memory
```

Run this once per workspace. If you’re not sure where your workspace is, check your OpenClaw config — the agent’s working directory is the workspace.
4. Index existing transcripts
```
qmd embed -c lia-memory
```

If you’re starting fresh with no prior transcripts, skip this — the plugin will handle it on bootstrap.
5. Add the Engine to OpenClaw Config
In ~/.openclaw/openclaw.json, add the minimum viable config:
```json
{
  "plugins": {
    "load": {
      "paths": ["~/.openclaw/extensions/lia-memory-engine"]
    },
    "slots": {
      "contextEngine": "lia-memory-engine"
    },
    "entries": {
      "lia-memory-engine": {
        "enabled": true
      }
    }
  }
}
```

The plugins.slots.contextEngine line is required. Without it, the plugin installs and shows enabled: true, but OpenClaw silently falls back to its built-in safeguard compaction. The memory_search tool registers, but none of the engine lifecycle methods fire — no assemble(), no ingest(), no compact(), no auto-flush. There’s no error or warning in older versions (v1.1+ logs a warning).
On first session start, the plugin starts the QMD daemon automatically. Models stay warm across sessions — no loading penalty after the first one.
See Configuration for all available options and recommended session settings.
6. Set Engine Parameters in OpenClaw Config
All options go under plugins.entries.lia-memory-engine.config in openclaw.json:
```json
"lia-memory-engine": {
  "enabled": true,
  "config": {
    "compactionThreshold": 0.80,
    "compactionModel": "anthropic/claude-haiku-4-5",
    "autoRetrieval": false,
    "autoRetrievalTimeoutMs": 500,
    "transcriptRetentionDays": 180
  }
}
```

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| enabled | boolean | true | Enable/disable the entire plugin |
| compactionThreshold | number | 0.80 | Fraction of context window that triggers compaction (0.1–1.0). Measured against the live conversation snapshot each turn. At this threshold, the engine splits messages at the midpoint and summarizes the older half |
| compactionModel | string | anthropic/claude-haiku-4-5 | Model used for compaction summarization. Must be a fast model — it runs synchronously during compaction |
| autoRetrieval | boolean | false | Automatically search memory files and inject relevant context before every model turn. Uses the last user message as the search query. Disabled by default — breaks prompt cache when enabled |
| autoRetrievalTimeoutMs | number | 500 | Maximum time in ms to wait for auto-retrieval results. Keeps the agent responsive — if QMD doesn’t respond in time, the turn proceeds without memory context |
| transcriptRetentionDays | number | 180 | Days to keep daily transcript files before cleanup. Set higher if you want longer memory recall |
| qmdHost | string | localhost | QMD HTTP daemon hostname |
| qmdPort | number | 8181 | QMD HTTP daemon port |
| qmdCollectionName | string | lia-memory | QMD collection name. Change this if you run multiple agents with separate memory pools |
| enableVectorSearch | boolean | true | Enable vector semantic search + LLM reranking. Requires a ~400MB GGUF model download on first run. When false, only BM25 keyword search is used |
To disable vector search and use BM25 only (no model download required):
```json
{ "enableVectorSearch": false }
```

7. Set OpenClaw Session Reset Config
The plugin handles compaction (in-place summarization, no reset), but OpenClaw’s session reset policy is separate. Without configuring it, sessions may reset unexpectedly and lose context that the plugin has been carefully preserving.
```json
{
  "agents": {
    "defaults": {
      "session": {
        "reset": {
          "mode": "idle",
          "idleMinutes": 10080
        }
      }
    }
  }
}
```

This sets sessions to reset only after 7 days of inactivity (10080 minutes). Since the plugin’s compaction keeps context usable indefinitely, you don’t need aggressive session resets.
Verify it’s working
After setup, confirm the plugin is actually active as the context engine:
1. Check gateway logs on startup. Look for:

   ```
   [lia-memory-engine] Registered as context engine
   ```

   If you see `WARNING: Plugin loaded but not assigned as context engine` instead, the slot assignment is missing — go back to step 5.

2. Send a message, then check the transcript. After your first message in a session, verify that memory/daily/YYYY-MM-DD.md exists in your workspace and contains conversation entries (format: `## HH:MM` with **User:** and **Agent:** sections). If the file doesn’t exist, the engine’s afterTurn() is not being called.

3. Check /status output. Compaction events should show lia-memory-engine as the source. If you see “safeguard mode” or no compaction source, the plugin isn’t slotted.

4. Verify QMD search returns results. After a few messages, ask your agent to use the memory_search tool (e.g. “search your memory for [something you just discussed]”). If it returns actual matches with snippets, retrieval is working end-to-end. If it returns “No results found” despite having transcripts on disk, QMD’s search pipeline is broken — check for the issues below. This is the most important step: without it, you won’t know if memory retrieval silently failed — auto-flush and registration can succeed while search returns nothing.

5. Check for the infinite clone loop (Linux/Docker). If your logs show this repeating endlessly:

   ```
   [node-llama-cpp] Cloning ggml-org/llama.cpp (local bundle) 0%
   [node-llama-cpp] Cloning ggml-org/llama.cpp (GitHub) 0%
   ```

   cmake is not installed. node-llama-cpp needs it to compile llama.cpp from source, and without it enters an infinite retry loop with no error message. See the Linux / Railway section. This won’t happen on macOS (Xcode includes cmake), only on Linux containers.
Architecture
The engine implements OpenClaw's ContextEngine interface with ownsCompaction: true. It never stores messages — OpenClaw's JSONL session files are the source of truth.
| Hook | When | What it does |
|------|------|-------------|
| bootstrap() | Session start | Creates memory dirs, starts QMD daemon |
| assemble() | Before each model run | Passes through OpenClaw's messages, adds QMD auto-retrieval context to system prompt |
| afterTurn() | After each turn | Flushes new messages to transcript (counter-based), checks compaction threshold |
| compact() | When threshold hit | Takes OpenClaw's messages, summarizes older half via Haiku, returns compacted result |
| search() | memory_search tool | Full hybrid QMD search |
| dispose() | Shutdown | Clears session trackers (no message data to lose) |
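Based on the hook table, the surface the engine implements looks roughly like this. This is a sketch inferred from the table, not OpenClaw’s authoritative type definitions; the exact signatures are assumptions:

```typescript
// Sketch of the ContextEngine surface as described by the hook table.
// Hook names come from the table; signatures are assumptions.
interface Msg { role: string; content: string }

interface ContextEngine {
  ownsCompaction: boolean;
  bootstrap(): Promise<void>;                       // session start
  assemble(                                         // before each model run
    messages: Msg[],
    systemPrompt: string,
  ): Promise<{ messages: Msg[]; systemPrompt: string }>;
  afterTurn(messages: Msg[]): Promise<void>;        // flush + threshold check
  compact(messages: Msg[]): Promise<Msg[]>;         // summarize older half
  search(query: string): Promise<string[]>;         // memory_search tool
  dispose(): void;                                  // shutdown
}

// Minimal no-op stub showing the shape:
const stub: ContextEngine = {
  ownsCompaction: true,
  async bootstrap() {},
  async assemble(messages, systemPrompt) { return { messages, systemPrompt }; },
  async afterTurn() {},
  async compact(messages) { return messages; },
  async search() { return []; },
  dispose() {},
};
```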
```
index.ts          Plugin entry point — register(), configSchema, tool registration
src/
  engine.ts       LiaContextEngine — implements ContextEngine interface (stateless, no message storage)
  compact.ts      Compaction logic — midpoint split, Haiku summarization
  auto-flush.ts   Transcript formatting and daily file writes
  search.ts       Search functions — auto-retrieval and memory_search
  qmd-client.ts   QMD HTTP daemon client — hybrid search, daemon lifecycle
  types.ts        Type definitions and config defaults
```

LLM Access
The plugin needs LLM access for compaction. It tries three methods in order:
1. `api.completeSimple()` — if exposed by OpenClaw’s plugin API
2. `@mariozechner/pi-ai` — dynamic import (OpenClaw’s internal LLM router)
3. `@anthropic-ai/sdk` — direct Anthropic SDK (requires `ANTHROPIC_API_KEY` env var)
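The ordered fallback amounts to a generic first-success helper. The sketch below shows the pattern, not the plugin’s actual code; in practice the three provider factories would wrap the methods listed above:

```typescript
// Sketch: try each provider factory in order, return the first that works.
async function firstAvailable<T>(
  providers: Array<() => Promise<T>>,
): Promise<T> {
  let lastError: unknown;
  for (const tryProvider of providers) {
    try {
      return await tryProvider();
    } catch (err) {
      lastError = err; // e.g. API not exposed, module not installed
    }
  }
  throw lastError ?? new Error("no LLM provider available");
}
```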
