mnemonic-memory
v1.0.0
Agentic memory engine for AI coding agents. Auto-learns from tool outcomes, pairs errors with fixes, surfaces patterns via semantic search. MCP server.
MCP server that silently learns from your AI agent's tool calls. Features memory decay, pinning, entity resolution, consolidation, temporal queries, and reflection — giving every new session full continuity with zero manual effort.
Works with Claude Code, Cursor, Windsurf, or any MCP-compatible client.
Quick Start
```shell
# 1. Install globally
npm install -g mnemonic-memory

# 2. Initialize in your project
cd your-project
mnemonic init
```

You'll need a free Gemini API key from aistudio.google.com/apikey.
That's it. Every tool call is now auto-captured. No agent configuration needed.
What It Learns
| Type | Example |
|------|---------|
| Gotchas | `flush_and_resume()` breaks GPU command batching — 10x slowdown |
| Patterns | `cargo test` needs `--test-threads=1` for DB tests |
| Error fixes | `error[E0308]` mismatched types → use `.parse().unwrap()` |
| File hotspots | `src/gpu/gpu_context.cpp` touched 158 times |
| Entities | Links facts to specific files, modules, services, libraries |
| Insights | Synthesized analysis of patterns and repeated mistakes |
Key Features
Memory Decay + Strengthening
Facts naturally fade if unused (`strength *= 0.95^days`). Frequently searched facts get stronger. No more drowning in stale grep patterns — important knowledge rises, noise sinks.
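The decay rule can be sketched directly. The constants mirror the `[decay]` keys shown in the Configuration section; the helper names here are illustrative, not mnemonic's actual internals:

```typescript
const DECAY_RATE = 0.95;    // [decay].rate — daily multiplier
const STRENGTH_BOOST = 0.1; // [decay].strength_boost — per search hit
const MAX_STRENGTH = 2.0;   // [decay].max_strength — cap

// Strength after `days` idle days: strength *= rate^days
function applyDecay(strength: number, days: number, rate = DECAY_RATE): number {
  return strength * Math.pow(rate, days);
}

// Each search hit nudges a fact back up, capped at max_strength.
function strengthen(strength: number, boost = STRENGTH_BOOST): number {
  return Math.min(strength + boost, MAX_STRENGTH);
}
```

With the defaults, an untouched fact loses about 5% of its strength per day, so it drops below the 0.05 archive threshold after roughly two months unless a search hit boosts it.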
Pinned Memories
Critical knowledge (architecture decisions, dangerous gotchas) can be pinned — exempt from decay, always shown in context. Use `save_memory` with `pinned: true` or `pin_memory` to pin existing facts.
Entity Resolution
Entities (files, modules, services) are auto-extracted from facts via Gemini and linked. Call `search_entity("gpu_context")` to find every memory mentioning that entity.
Sleep-like Consolidation
Run `consolidate_memory` to:
- Prune weak facts (strength < 0.05) to archive
- Merge near-duplicate facts via Gemini (cosine > 0.92)
- Boost recently accessed facts
- Dedup redundant commands
Also runs automatically every 24 hours in the background.
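The prune-and-merge steps above can be sketched as follows. The types and function names are illustrative, and the Gemini-driven merge itself is out of scope — this only flags candidate pairs above the similarity threshold:

```typescript
const ARCHIVE_THRESHOLD = 0.05; // prune below this strength
const MERGE_THRESHOLD = 0.92;   // merge near-duplicates above this cosine

interface Fact { id: number; strength: number; embedding: number[]; }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Partition facts into keep/archive and flag near-duplicate pairs for merging.
function consolidate(facts: Fact[]) {
  const keep = facts.filter(f => f.strength >= ARCHIVE_THRESHOLD);
  const archive = facts.filter(f => f.strength < ARCHIVE_THRESHOLD);
  const mergeCandidates: [number, number][] = [];
  for (let i = 0; i < keep.length; i++)
    for (let j = i + 1; j < keep.length; j++)
      if (cosine(keep[i].embedding, keep[j].embedding) > MERGE_THRESHOLD)
        mergeCandidates.push([keep[i].id, keep[j].id]);
  return { keep, archive, mergeCandidates };
}
```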
Temporal Queries
Search with natural language time expressions:
```
search_memory("errors last 3 days")
search_memory("GPU changes this week")
search_memory("what happened yesterday")
```

Reflect
Synthesize insights from related memories:
```
reflect("GPU performance")
→ Patterns, repeated mistakes, architectural insights, recommendations
```

Optionally saves insights as new facts for future sessions.
Context Budget
`get_context` respects a token budget (default 25,000). Outputs in priority order: pinned facts → handoff → gotchas → key files → patterns → debug → commands. Never blows your context window.
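A minimal sketch of budgeted, priority-ordered packing. The token estimator here is a crude characters-per-token heuristic, not mnemonic's real tokenizer, and the section names are only illustrative:

```typescript
const MAX_CONTEXT_TOKENS = 25_000; // [context].max_context_tokens

interface Section { name: string; text: string; }

// Rough heuristic (~4 chars/token); the real implementation may count differently.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Sections arrive pre-sorted by priority: pinned → handoff → gotchas →
// key files → patterns → debug → commands. Stop once the budget is spent,
// so low-priority sections are the first to be dropped.
function packContext(sections: Section[], budget = MAX_CONTEXT_TOKENS): string[] {
  const included: string[] = [];
  let used = 0;
  for (const s of sections) {
    const cost = estimateTokens(s.text);
    if (used + cost > budget) break;
    included.push(s.name);
    used += cost;
  }
  return included;
}
```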
Multi-Stage Search
Parallel FTS5 + vector search with weighted scoring:
- Vector similarity (0.7) + FTS rank (0.15) + strength (0.15)
- Pinned facts get 1.5x ranking boost
- Accessed facts auto-strengthen on search hits
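The weighting above reduces to a small scoring function. This sketch assumes the vector similarity, FTS rank, and strength inputs are already normalized to [0, 1]:

```typescript
const W_VECTOR = 0.7;   // vector similarity weight
const W_FTS = 0.15;     // FTS5 rank weight
const W_STRENGTH = 0.15; // fact strength weight
const PIN_BOOST = 1.5;  // ranking boost for pinned facts

function score(
  vectorSim: number,
  ftsRank: number,
  strength: number,
  pinned: boolean
): number {
  const base = W_VECTOR * vectorSim + W_FTS * ftsRank + W_STRENGTH * strength;
  return pinned ? base * PIN_BOOST : base;
}
```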
MCP Tools (9)
| Tool | Description |
|:-----|:------------|
| get_context | Full memory dump with budget — call at session start |
| search_memory | Multi-stage search with temporal queries |
| save_memory | Store a fact with category, confidence, and optional pinning |
| observe_tool_call | Auto-learn from tool outcomes |
| pin_memory | Pin a fact — never decays, always in context |
| unpin_memory | Unpin a fact — resumes normal decay |
| search_entity | Find all facts linked to an entity |
| consolidate_memory | Run sleep-like memory consolidation |
| reflect | Synthesize insights from related memories via Gemini |
Agent Setup
Add this to your project's CLAUDE.md so the agent uses memory proactively:
```markdown
## Memory

You have a persistent memory server (mnemonic) via MCP with 9 tools.

- At session start, call `get_context` to load previous session knowledge
- When you hit an error, call `search_memory` with the error message to check for known fixes
- After learning something critical, call `save_memory` with `pinned: true`
- Use `search_entity` to find all memories about a specific file or module
- Use `reflect` to analyze patterns in a topic area before starting complex work
- Periodically call `consolidate_memory` to clean up stale memories

Categories: gotcha · pattern · preference · convention · environment · failure · insight
```
Configuration
3-layer config with merge semantics — each layer only overrides what it specifies:
Defaults → `~/.mnemonic/config.toml` → `.mnemonic.toml` → Environment variables

```toml
# .mnemonic.toml
[search]
min_similarity = 0.4
default_limit = 5

[decay]
rate = 0.95              # daily decay multiplier
archive_threshold = 0.05 # prune below this strength
strength_boost = 0.1     # boost per search hit
max_strength = 2.0       # strength cap

[context]
max_context_tokens = 25000
recent_commands = 20
recent_debug = 20

[consolidation]
interval_hours = 24
merge_similarity_threshold = 0.92

[limits]
max_commands = 200
max_debug = 100
fact_expiry_days = 30

[observe]
extract_facts = true

[sync]
enabled = true
bucket = "my-bucket"
region = "us-east-1"
sync_interval_secs = 300
```

Supported Platforms
| Platform | Architecture | Status |
|:---------|:-------------|:-------|
| macOS | Apple Silicon (arm64) | Available |
| Linux | arm64 | Available (with S3 sync) |
| Linux | x64 | Available |
| macOS | Intel (x64) | CI |
| Windows | x64 | CI |
How It Compares
| Feature | mnemonic | Memorix | Engram | Hindsight | OpenMemory |
|---------|:--------:|:-------:|:------:|:---------:|:----------:|
| Auto-capture (zero config) | Yes | No | No | No | No |
| Memory decay + strengthening | Yes | No | Yes | No | No |
| Pinned/permanent memories | Yes | Yes | No | No | No |
| Entity resolution | Yes | No | No | Yes | No |
| Sleep-like consolidation | Yes | No | Yes | No | No |
| Temporal queries | Yes | No | No | Yes | No |
| Reflect/synthesize | Yes | No | No | Yes | No |
| Context budget management | Yes | No | No | No | No |
| Multi-stage retrieval | Yes | Yes | No | Yes | No |
| S3 cloud sync | Yes | No | No | No | No |
| Error→fix pairing | Yes | No | No | No | No |
| npm installable | Yes | No | Yes | No | No |
Storage
Per-project SQLite database at `~/.mnemonic/<project-hash>/memory.db`. Each git repo gets isolated memory. All data stays local — no telemetry, no cloud unless you opt into S3 sync.
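One way to derive a per-project path like the one above. Note the hash function and truncation length here are assumptions for illustration — the README does not specify how mnemonic computes `<project-hash>`:

```typescript
import { createHash } from "node:crypto";
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical: hash the repo root so each git repo maps to a stable,
// filesystem-safe directory name under ~/.mnemonic.
function memoryDbPath(repoRoot: string): string {
  const hash = createHash("sha256").update(repoRoot).digest("hex").slice(0, 16);
  return join(homedir(), ".mnemonic", hash, "memory.db");
}
```

The same repo path always yields the same database, and different repos cannot collide in practice.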
License
Apache-2.0
