agent-working-memory
v0.5.0
Cognitive memory layer for AI agents — activation-based retrieval, salience filtering, associative connections
AgentWorkingMemory (AWM)
Persistent working memory for AI agents.
AWM helps agents retain important project knowledge across conversations and sessions. Instead of storing everything and retrieving by similarity alone, it filters for salience, builds associative links between related memories, and periodically consolidates useful knowledge while letting noise fade.
Use it through Claude Code via MCP or as a local HTTP service for custom agents. Everything runs locally: SQLite + ONNX models + Node.js. No cloud, no API keys.
Without AWM
- Agent forgets earlier architecture decision
- Suggests Redux after project standardized on Zustand
- Repeats discussion already settled three days ago
- Every new conversation starts from scratch
With AWM
- Recalls prior state-management decision and rationale
- Surfaces related implementation patterns from past sessions
- Continues work without re-asking for context
- Gets more consistent the longer you use it
Quick Start
Node.js 20+ required — check with node --version.
npm install -g agent-working-memory
awm setup --global
Restart Claude Code. That's it — 13 memory tools appear automatically.
First conversation will be ~30 seconds slower while ML models download (~124MB, cached locally). After that, everything runs on your machine.
For isolated memory per folder, see Separate Memory Pools. For team onboarding, see docs/quickstart.md.
Who this is for
- Long-running coding agents that need cross-session project knowledge
- Multi-agent workflows where specialized agents share a common memory
- Local-first setups where cloud memory is not acceptable
- Teams using Claude Code who want persistent context without manual notes
What this is not
- Not a chatbot UI
- Not a hosted SaaS
- Not a generic vector database
- Not a replacement for your source of truth (code, docs, tickets)
Why it's different
Most "memory for AI" projects are vector databases with a retrieval wrapper. AWM goes further:
| | Typical RAG / Vector Store | AWM |
|---|---|---|
| Storage | Everything | Only novel, salient events (77% filtered at write time) |
| Retrieval | Cosine similarity | 10-phase pipeline: BM25 + vectors + reranking + graph walk + decay |
| Connections | None | Hebbian edges that strengthen when memories co-activate |
| Over time | Grows forever, gets noisier | Consolidation: strengthens clusters, prunes noise, builds bridges |
| Forgetting | Manual cleanup | Cognitive forgetting: unused memories fade, confirmed knowledge persists |
| Feedback | None | Useful/not-useful signals tune confidence and retrieval rank |
| Correction | Delete and re-insert | Retraction: wrong memories invalidated, corrections linked, penalties propagate |
The design is based on cognitive science — ACT-R activation decay, Hebbian learning, complementary learning systems, and synaptic homeostasis — rather than ad-hoc heuristics. See How It Works and docs/cognitive-model.md for details.
Benchmarks
| Eval | Score | What it tests |
|------|-------|---------------|
| Edge Cases | 100% (34/34) | 9 failure modes: hub toxicity, flashbulb distortion, narcissistic interference, identity collision, noise forgetting benefit |
| Stress Test | 92.3% (48/52) | 500 memories, 100 sleep cycles, catastrophic forgetting, adversarial spam |
| A/B Test | AWM 100% vs Baseline 83% | 100 project events, 24 recall questions |
| Self-Test | 97.4% | 31 pipeline component checks |
| Workday | 86.7% | 43 memories across 4 simulated work sessions |
| Real-World | 93.1% | 300 code chunks from a 71K-line production monorepo |
| Token Savings | 64.5% | Memory-guided context vs full conversation history |
All evals are reproducible: npm run test:self, npm run test:edge, npm run test:stress, etc. See Testing & Evaluation and docs/benchmarks.md for full details.
Features
Memory Tools (13)
| Tool | Purpose |
|------|---------|
| memory_write | Store a memory (salience filter decides disposition) |
| memory_recall | Retrieve relevant memories by context |
| memory_feedback | Report whether a recalled memory was useful |
| memory_retract | Invalidate a wrong memory with optional correction |
| memory_stats | View memory health metrics and activity |
| memory_checkpoint | Save execution state (survives context compaction) |
| memory_restore | Recover state + relevant context at session start |
| memory_task_add | Create a prioritized task |
| memory_task_update | Change task status/priority |
| memory_task_list | List tasks by status |
| memory_task_next | Get the highest-priority actionable task |
| memory_task_begin | Start a task — auto-checkpoints and recalls context |
| memory_task_end | End a task — writes summary and checkpoints |
Separate Memory Pools
By default, all projects share one memory pool. For isolated pools per folder, place a .mcp.json in each parent folder with a different AWM_AGENT_ID:
C:\Users\you\work\.mcp.json → AWM_AGENT_ID: "work"
C:\Users\you\personal\.mcp.json → AWM_AGENT_ID: "personal"
Claude Code uses the closest .mcp.json ancestor. Same database, isolation by agent ID.
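A minimal .mcp.json for one pool might look like the sketch below. The command and args here are placeholders — reuse whatever server entry awm setup --global generated for you and change only the AWM_AGENT_ID value:

```json
{
  "mcpServers": {
    "agent-working-memory": {
      "command": "awm",
      "args": ["serve"],
      "env": {
        "AWM_AGENT_ID": "work"
      }
    }
  }
}
```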
Incognito Mode
AWM_INCOGNITO=1 claude
Registers zero tools — Claude doesn't see memory at all. All other tools and MCP servers work normally.
Auto-Checkpoint Hooks
Installed by awm setup --global:
- Stop — reminds Claude to write/recall after each response
- PreCompact — auto-checkpoints before context compression
- SessionEnd — auto-checkpoints and consolidates on close
- 15-min timer — silent auto-checkpoint while session is active
Activity Log
tail -f "$(npm root -g)/agent-working-memory/data/awm.log"
Real-time: writes, recalls, checkpoints, consolidation, hook events.
Activity Stats
curl http://127.0.0.1:8401/stats
Returns daily counts: {"writes": 8, "recalls": 9, "hooks": 3, "total": 25}
Memory Invocation Strategy
AWM combines deterministic hooks for guaranteed memory operations at lifecycle transitions with agent-directed usage during active work.
Deterministic triggers (always happen)
| Event | Action |
|-------|--------|
| Session start | memory_restore — recover state + recall context |
| Pre-compaction | Auto-checkpoint via hook sidecar |
| Session end | Auto-checkpoint + full consolidation |
| Every 15 min | Silent auto-checkpoint (if active) |
| Task start | memory_task_begin — checkpoint + recall |
| Task end | memory_task_end — summary + checkpoint |
Agent-directed triggers (when these situations occur)
Write memory when:
- A project decision is made or changed
- A root cause is discovered
- A reusable implementation pattern is established
- A preference, constraint, or requirement is clarified
- A prior assumption is found to be wrong
Recall memory when:
- Starting work on a new task or subsystem
- Re-entering code you haven't touched recently
- After context compaction
- After a failed attempt (check if there's prior knowledge)
- Before refactoring or making architectural changes
Retract when:
- A stored memory turns out to be wrong or outdated
Feedback when:
- A recalled memory was used (useful) or irrelevant (not useful)
HTTP API
For custom agents, scripts, or non-Claude-Code workflows:
awm serve              # From npm install
npx tsx src/index.ts   # From source
Write a memory:
curl -X POST http://localhost:8400/memory/write \
-H "Content-Type: application/json" \
-d '{
"agentId": "my-agent",
"concept": "Express error handling",
"content": "Use centralized error middleware as the last app.use()",
"eventType": "causal",
"surprise": 0.5,
"causalDepth": 0.7
}'
Recall:
curl -X POST http://localhost:8400/memory/activate \
-H "Content-Type: application/json" \
-d '{
"agentId": "my-agent",
"context": "How should I handle errors in my Express API?"
}'
How It Works
The Memory Lifecycle
Write — Salience scoring evaluates novelty, surprise, causal depth, and effort. High-salience memories go active; borderline ones enter staging; noise is discarded.
Connect — Vector embedding (MiniLM-L6-v2, 384d). Temporal edges link to recent memories. Hebbian edges form between co-retrieved memories.
Retrieve — 10-phase pipeline: BM25 + semantic search + cross-encoder reranking + temporal decay (ACT-R) + graph walks + confidence gating.
Consolidate — 7-phase sleep cycle: replay clusters, strengthen edges, bridge cross-topic, decay unused, normalize hubs, forget noise, sweep staging.
Feedback — Useful/not-useful signals adjust confidence, affecting retrieval rank and forgetting resistance.
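The write-path decision above can be sketched as a small scoring function. This is an illustrative model only — the field names follow the HTTP API (surprise, causalDepth), but the weights and thresholds are made-up defaults, not AWM's actual tuning:

```typescript
// Hypothetical sketch of the write-path salience gate described above.
// Weights and thresholds are illustrative, not AWM's real values.
type Disposition = "active" | "staging" | "discard";

interface WriteEvent {
  novelty: number;     // 0..1, how unlike existing memories the event is
  surprise: number;    // 0..1, deviation from expectation
  causalDepth: number; // 0..1, how much downstream reasoning depends on it
  effort: number;      // 0..1, work invested in reaching the insight
}

function salience(e: WriteEvent): number {
  // Simple weighted blend of the four signals named in the lifecycle.
  return 0.35 * e.novelty + 0.25 * e.surprise + 0.25 * e.causalDepth + 0.15 * e.effort;
}

function disposition(e: WriteEvent): Disposition {
  const s = salience(e);
  if (s >= 0.6) return "active";  // high salience: stored immediately
  if (s >= 0.3) return "staging"; // borderline: buffered, promoted or dropped later
  return "discard";               // noise: never stored
}
```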
Cognitive Foundations
- ACT-R activation decay (Anderson 1993) — memories decay with time, strengthen with use
- Hebbian learning — co-retrieved memories form stronger associative edges
- Complementary Learning Systems — fast capture (salience + staging) + slow consolidation (sleep cycle)
- Synaptic homeostasis — edge weight normalization prevents hub domination
- Forgetting as feature — noise removal improves signal-to-noise for connected memories
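Two of these mechanisms are compact enough to sketch directly. The decay exponent and learning rate below are textbook-style defaults, not AWM's tuning:

```typescript
// ACT-R base-level activation: B = ln(Σ t_j^-d), where t_j is the time
// since each past retrieval and d is the decay rate (0.5 in the literature).
// Recently or frequently used memories score higher.
function baseLevelActivation(usesAgoHours: number[], d = 0.5): number {
  const sum = usesAgoHours.reduce((acc, t) => acc + Math.pow(t, -d), 0);
  return Math.log(sum);
}

// Hebbian edge update: strengthen on co-activation, decay otherwise,
// clamped to [0, 1] so no single hub can dominate (homeostasis).
function hebbianUpdate(weight: number, coActivated: boolean, rate = 0.1): number {
  const next = coActivated ? weight + rate * (1 - weight) : weight * (1 - rate);
  return Math.min(1, Math.max(0, next));
}
```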
Architecture
src/
core/ # Cognitive primitives
embeddings.ts - Local vector embeddings (MiniLM-L6-v2, 384d)
reranker.ts - Cross-encoder passage scoring (ms-marco-MiniLM)
query-expander.ts - Synonym expansion (flan-t5-small)
salience.ts - Write-time importance scoring (novelty, surprise, causal depth, effort)
decay.ts - ACT-R temporal activation decay
hebbian.ts - Association strengthening/weakening
logger.ts - Append-only activity log (data/awm.log)
engine/ # Processing pipelines
activation.ts - 10-phase retrieval pipeline
consolidation.ts - 7-phase sleep cycle consolidation
connections.ts - Discover links between memories
staging.ts - Weak signal buffer (promote or discard)
retraction.ts - Negative memory / corrections
eviction.ts - Capacity enforcement
hooks/
sidecar.ts - Hook HTTP server (auto-checkpoint, stats, timer)
storage/
sqlite.ts - SQLite + FTS5 persistence layer
api/
routes.ts - HTTP endpoints (memory + task + system)
mcp.ts - MCP server (13 tools, incognito support)
cli.ts - CLI (setup, serve, hook config)
index.ts - HTTP server entry point
For detailed architecture including pipeline phases, database schema, and system diagrams, see docs/architecture.md. For an implementation plan to improve memory precision and stale-context suppression, see docs/memory-quality-hardening-rfc.md.
Testing & Evaluation
Unit Tests
npx vitest run   # 68 tests
Eval Suites
| Command | What it tests | Score |
|---------|--------------|-------|
| npm run test:self | 31 pipeline checks: embeddings, BM25, reranker, decay, confidence, Hebbian, graph walks, staging | 97.4% |
| npm run test:edge | 9 adversarial failure modes: context collapse, hub toxicity, flashbulb distortion, narcissistic interference, identity collision, contradiction, bridge overshoot, noise benefit | 100% |
| npm run test:stress | 500 memories, 100 sleep cycles, catastrophic forgetting, adversarial spam, recovery | 92.3% |
| npm run test:workday | 43 memories across 4 projects, 14 recall challenges | 86.7% |
| npm run test:ab | AWM vs keyword baseline, 100 events, 24 questions | AWM 100% vs 83% |
| npm run test:tokens | Token savings vs full conversation history | 64.5% |
| npm run test:realworld | 300 chunks from 71K-line monorepo, 16 challenges | 93.1% |
Environment Variables
| Variable | Default | Purpose |
|----------|---------|---------|
| AWM_PORT | 8400 | HTTP server port |
| AWM_DB_PATH | memory.db | SQLite database path |
| AWM_AGENT_ID | claude-code | Agent ID (memory namespace) |
| AWM_EMBED_MODEL | Xenova/all-MiniLM-L6-v2 | Embedding model |
| AWM_EMBED_DIMS | 384 | Embedding dimensions |
| AWM_RERANKER_MODEL | Xenova/ms-marco-MiniLM-L-6-v2 | Reranker model |
| AWM_HOOK_PORT | 8401 | Hook sidecar port |
| AWM_HOOK_SECRET | (none) | Bearer token for hook auth |
| AWM_INCOGNITO | (unset) | Set to 1 to disable all tools |
Tech Stack
| Component | Technology |
|-----------|-----------|
| Language | TypeScript (ES2022, strict) |
| Database | SQLite via better-sqlite3 + FTS5 |
| HTTP | Fastify 5 |
| MCP | @modelcontextprotocol/sdk |
| ML Runtime | @huggingface/transformers (local ONNX) |
| Tests | Vitest 4 |
| Validation | Zod 4 |
All three ML models run locally via ONNX. No external API calls for retrieval. The entire system is a single SQLite file + a Node.js process.
Project Status
AWM is in active development (v0.5.x). The core memory pipeline, consolidation system, and MCP integration are stable and used daily in production coding workflows.
- Core retrieval and consolidation: stable
- MCP tools and Claude Code integration: stable
- Task management: stable
- Hook sidecar and auto-checkpoint: stable
- HTTP API: stable (for custom agents)
See CHANGELOG.md for version history.
