agent-working-memory
v0.7.17
Cognitive memory layer for AI agents — activation-based retrieval, salience filtering, associative connections
AgentWorkingMemory (AWM)
Persistent working memory for AI agents.
AWM helps agents retain important project knowledge across conversations and sessions. Instead of storing everything and retrieving by similarity alone, it filters for salience, builds associative links between related memories, and periodically consolidates useful knowledge while letting noise fade.
Use it through Claude Code via MCP or as a local HTTP service for custom agents. Everything runs locally: SQLite + ONNX models + Node.js. No cloud, no API keys.
Without AWM
- Agent forgets earlier architecture decision
- Suggests Redux after project standardized on Zustand
- Repeats discussion already settled three days ago
- Every new conversation starts from scratch
With AWM
- Recalls prior state-management decision and rationale
- Surfaces related implementation patterns from past sessions
- Continues work without re-asking for context
- Gets more consistent the longer you use it
Quick Start
Node.js 20+ required — check with node --version.
npm install -g agent-working-memory
awm setup --global
Restart Claude Code. That's it: 14 memory tools appear automatically.
Upgrading
npm install -g agent-working-memory@latest
awm setup --global # Updates MCP config, CLAUDE.md instructions, and hooks
Restart Claude Code after upgrading. Your existing memory database is preserved; all upgrades are backward compatible. New features (metadata tags, workspace recall, synthesis) are opt-in.
From v0.6.x → v0.7.x: The memory_write tool now accepts optional metadata parameters (project, topic, session_id, etc.) that improve recall quality. Re-running awm setup --global updates your CLAUDE.md with instructions for the agent to use them.
First conversation will be ~30 seconds slower while ML models download (~200MB total, cached locally). After that, everything runs on your machine.
For isolated memory per folder, see Separate Memory Pools. For team onboarding, see docs/quickstart.md.
Who this is for
- Long-running coding agents that need cross-session project knowledge
- Multi-agent workflows where specialized agents share a common memory
- Local-first setups where cloud memory is not acceptable
- Teams using Claude Code who want persistent context without manual notes
What this is not
- Not a chatbot UI
- Not a hosted SaaS
- Not a generic vector database
- Not a replacement for your source of truth (code, docs, tickets)
Why it's different
Most "memory for AI" projects are vector databases with a retrieval wrapper. AWM goes further:
| | Typical RAG / Vector Store | AWM |
|---|---|---|
| Storage | Everything | Salience-filtered with low-confidence fallback (novel events go active, borderline enter staging, low-salience stored at reduced confidence) |
| Retrieval | Cosine similarity | 10-phase pipeline: dual BM25 (keyword + expanded) + vectors + reranking + graph walk + decay + coref expansion |
| Connections | None | Hebbian edges that strengthen when memories co-activate |
| Over time | Grows forever, gets noisier | Consolidation: diameter-enforced clustering, cross-topic bridges, synaptic-tagged decay |
| Forgetting | Manual cleanup | Cognitive forgetting: unused memories fade, reinforced knowledge persists (access-count modulated) |
| Feedback | None | Useful/not-useful signals tune confidence and retrieval rank |
| Correction | Delete and re-insert | Retraction: wrong memories invalidated, corrections linked, penalties propagate (depth 2, decaying) |
| Graph | None or single graph | Multi-graph: semantic, temporal, causal, entity; independent traversal with fused scoring |
| Learning | Unconditional co-activation | Validation-gated: edges strengthen only on positive feedback (Kairos-inspired) |
| Noise rejection | None | Multi-channel agreement gate: requires 2+ retrieval channels to agree before returning results |
| Duplicates | Stored repeatedly | Reinforce-on-duplicate: near-exact matches boost existing memory instead of creating copies |
The design is based on cognitive science — ACT-R activation decay, Hebbian learning, complementary learning systems, synaptic homeostasis, and synaptic tagging — rather than ad-hoc heuristics. See How It Works and docs/cognitive-model.md for details.
Benchmarks (v0.6.0)
Eval Harness (new in v0.6.0)
| Suite | Score | Threshold | What it tests |
|-------|-------|-----------|---------------|
| Retrieval | Recall@5 = 0.800 | >= 0.80 | 200 facts, 50 queries: BM25 + vector + reranker pipeline precision |
| Associative | success@10 = 1.000 | >= 0.70 | 20 multi-hop causal chains: graph walk finds non-obvious connections |
| Redundancy | dedup F1 = 0.966 | >= 0.80 | 50 clusters × 4 paraphrases: consolidation removes duplicates correctly |
| Temporal | Spearman = 0.932 | >= 0.75 | 25 facts with controlled age/access: ACT-R decay ranking accuracy |
Key finding: consolidation improves retrieval recall by 30 percentage points. Post-consolidation recall is 0.950 versus 0.650 before consolidation, roughly a 46% relative gain. Removing redundant noise helps ranking.
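For readers unfamiliar with the metric, the Recall@5 score in the table above can be read as the fraction of queries whose expected fact appears among the top five results. The sketch below assumes exactly one expected fact per query; the function name `recallAtK` and that single-answer assumption are illustrative and may not match how the eval harness scores results.

```typescript
// Illustrative Recall@k: share of queries whose expected id is in the top k.
// Assumes one expected id per query (an assumption, not the harness's definition).
function recallAtK(
  rankedIdsPerQuery: string[][],
  expectedIdPerQuery: string[],
  k = 5,
): number {
  if (rankedIdsPerQuery.length === 0) return 0;
  let hits = 0;
  rankedIdsPerQuery.forEach((ranked, i) => {
    const expected = expectedIdPerQuery[i];
    if (expected !== undefined && ranked.slice(0, k).includes(expected)) {
      hits++;
    }
  });
  return hits / rankedIdsPerQuery.length;
}
```

With this definition, a score of 0.800 over 50 queries would correspond to 40 queries answered within the top five.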
Full Test Suite
| Command | Score | What it tests |
|---------|-------|---------------|
| npm run eval | 4/4 suites pass | Retrieval, associative, redundancy, temporal benchmarks with ablation support |
| npm run test:run | 77/77 tests | Unit tests: salience, decay, hebbian, supersession, coordination |
| npm run test:mcp | 5/5 pass | MCP protocol: write, recall, feedback, retract, stats |
| npm run test:self | 94.1% EXCELLENT | Pipeline component checks across all cognitive subsystems |
| npm run test:edge | All pass | 9 failure modes: narcissistic interference, identity collision, contradiction trapping, bridge overshoot, noise forgetting |
| npm run test:stress | 96.2% (50/52) | 500 memories, 100 sleep cycles, catastrophic forgetting, adversarial spam, recovery |
| npm run test:workday | 93.3% EXCELLENT | 43 memories across 4 projects, cross-cutting queries, noise filtering |
| npm run test:ab | AWM 20/22 vs Baseline 18/22 | AWM outperforms keyword baseline on architecture + testing topics |
| npm run test:sleep | 71.4% | 60 memories, 4 topic clusters, consolidation impact across 3 cycles |
| npm run test:tokens | 56.3% savings, 2.3x efficiency | Memory-guided context vs full history, keyword accuracy 72.5% |
| npm run test:pilot | 14/15 pass | Production-like queries with noise rejection (5/5 noise rejected) |
| npm run test:locomo | 28.2% | Industry-standard LoCoMo conversational memory benchmark (1,986 QA pairs) |
Consolidation Health (v0.6.0)
| Metric | Value |
|--------|-------|
| Topic clusters formed | 10 per consolidation cycle |
| Cross-topic bridges | 20 in first cycle |
| Edges strengthened | 135 per cycle (access-weighted) |
| Graph size at scale | 3,000-4,500 edges (500 memories) |
| Recall after 100 cycles | 90% stable |
| Catastrophic forgetting survival | 5/5 (100%) |
| Post-dedup retrieval | 0.950 (consolidation improves recall) |
All evals are reproducible. See Testing & Evaluation.
Features
Memory Tools (14)
| Tool | Purpose |
|------|---------|
| memory_write | Store a memory (salience filter + reinforce-on-duplicate) |
| memory_recall | Retrieve relevant memories by context (dual BM25 + coref expansion) |
| memory_feedback | Report whether a recalled memory was useful |
| memory_retract | Invalidate a wrong memory with optional correction |
| memory_supersede | Replace outdated memory with current version |
| memory_stats | View memory health metrics and activity |
| memory_checkpoint | Save execution state (survives context compaction) |
| memory_restore | Recover state + relevant context at session start |
| memory_task_add | Create a prioritized task |
| memory_task_update | Change task status/priority |
| memory_task_list | List tasks by status |
| memory_task_next | Get the highest-priority actionable task |
| memory_task_begin | Start a task — auto-checkpoints and recalls context |
| memory_task_end | End a task — writes summary and checkpoints |
Separate Memory Pools
By default, all projects share one memory pool. For isolated pools per folder, place a .mcp.json in each parent folder with a different AWM_AGENT_ID:
C:\Users\you\work\.mcp.json -> AWM_AGENT_ID: "work"
C:\Users\you\personal\.mcp.json -> AWM_AGENT_ID: "personal"
Claude Code uses the closest .mcp.json ancestor. All pools share the same database; isolation is by agent ID.
Incognito Mode
AWM_INCOGNITO=1 claude
Registers zero tools, so Claude doesn't see memory at all. All other tools and MCP servers work normally.
Auto-Checkpoint Hooks
Installed by awm setup --global:
- Stop — reminds Claude to write/recall after each response
- PreCompact — auto-checkpoints before context compression
- SessionEnd — auto-checkpoints and consolidates on close
- 15-min timer — silent auto-checkpoint while session is active
Auto-Backup
The HTTP server automatically copies the database to a backups/ directory on startup with a timestamp. Cheap insurance against data loss.
Activity Log
tail -f "$(npm root -g)/agent-working-memory/data/awm.log"
Real-time: writes, recalls, reinforcements, checkpoints, consolidation, hook events.
Activity Stats
curl http://127.0.0.1:8401/stats
Returns daily counts: {"writes": 8, "recalls": 9, "hooks": 3, "total": 25}
Memory Invocation Strategy
AWM combines deterministic hooks for guaranteed memory operations at lifecycle transitions with agent-directed usage during active work.
Deterministic triggers (always happen)
| Event | Action |
|-------|--------|
| Session start | memory_restore — recover state + recall context |
| Pre-compaction | Auto-checkpoint via hook sidecar |
| Session end | Auto-checkpoint + full consolidation |
| Every 15 min | Silent auto-checkpoint (if active) |
| Task start | memory_task_begin — checkpoint + recall |
| Task end | memory_task_end — summary + checkpoint |
Agent-directed triggers (when these situations occur)
Write memory when:
- A project decision is made or changed
- A root cause is discovered
- A reusable implementation pattern is established
- A preference, constraint, or requirement is clarified
- A prior assumption is found to be wrong
Recall memory when:
- Starting work on a new task or subsystem
- Re-entering code you haven't touched recently
- After context compaction
- After a failed attempt (check if there's prior knowledge)
- Before refactoring or making architectural changes
Retract when:
- A stored memory turns out to be wrong or outdated
Feedback when:
- A recalled memory was used (useful) or irrelevant (not useful)
HTTP API
For custom agents, scripts, or non-Claude-Code workflows:
awm serve # From npm install
npx tsx src/index.ts # From source
Write a memory:
curl -X POST http://localhost:8400/memory/write \
-H "Content-Type: application/json" \
-d '{
"agentId": "my-agent",
"concept": "Express error handling",
"content": "Use centralized error middleware as the last app.use()",
"eventType": "causal",
"surprise": 0.5,
"causalDepth": 0.7
}'
Recall:
curl -X POST http://localhost:8400/memory/activate \
-H "Content-Type: application/json" \
-d '{
"agentId": "my-agent",
"context": "How should I handle errors in my Express API?"
}'
How It Works
The Memory Lifecycle
Write — Salience scoring evaluates novelty, surprise, causal depth, and effort. High-salience memories go active; borderline ones enter staging; low-salience stored at reduced confidence for recall fallback. Near-duplicates reinforce existing memories instead of creating copies.
Connect — Vector embedding (BGE-small-en-v1.5, 384d). Temporal edges link to recent memories. Hebbian edges form between co-retrieved memories. Coref expansion resolves pronouns to entity names.
Retrieve — 10-phase pipeline: coref expansion + query expansion + dual BM25 (keyword-stripped + expanded) + semantic vectors + Rocchio pseudo-relevance feedback + ACT-R temporal decay (synaptic-tagged) + Hebbian boost + entity-bridge boost + graph walk + cross-encoder reranking + multi-channel agreement gate.
Consolidate — 7-phase sleep cycle: diameter-enforced clustering (prevents chaining), edge strengthening (access-weighted), cross-topic bridge formation (direct closest-pair), confidence-modulated decay (synaptic tagging extends half-life), synaptic homeostasis, cognitive forgetting, staging sweep. Embedding backfill ensures all memories are clusterable.
Feedback — Useful/not-useful signals adjust confidence, affecting retrieval rank and forgetting resistance.
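The write step can be pictured as a scoring function followed by two thresholds. The sketch below is illustrative only: the equal weighting, the 0.4 and 0.2 cut-offs, and the names `scoreSalience` and `routeMemory` are assumptions made for the example, not AWM's actual values or API.

```typescript
// Hypothetical sketch of write-time salience routing.
// Weights and thresholds are assumed for illustration.
type SalienceSignals = {
  novelty: number; // 0..1
  surprise: number; // 0..1
  causalDepth: number; // 0..1
  effort: number; // 0..1
};

type Disposition = "active" | "staging" | "low-confidence";

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Equal-weight average of the four signals (assumed weighting).
function scoreSalience(s: SalienceSignals): number {
  const sum =
    clamp01(s.novelty) +
    clamp01(s.surprise) +
    clamp01(s.causalDepth) +
    clamp01(s.effort);
  return sum / 4;
}

// High scores go active, borderline scores enter staging,
// and the rest are kept at reduced confidence for recall fallback.
function routeMemory(score: number, activeAt = 0.4, stagingAt = 0.2): Disposition {
  if (score >= activeAt) return "active";
  if (score >= stagingAt) return "staging";
  return "low-confidence";
}
```

Note that nothing is dropped in this sketch: the lowest tier is still stored, which mirrors the low-confidence fallback described above.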
Cognitive Foundations
- ACT-R activation decay (Anderson 1993) — memories decay with time, strengthen with use. Synaptic tagging: heavily-accessed memories decay slower (log-scaled).
- Hebbian learning — co-retrieved memories form stronger associative edges
- Complementary Learning Systems — fast capture (salience + staging) + slow consolidation (sleep cycle)
- Synaptic homeostasis — edge weight normalization prevents hub domination
- Forgetting as feature — noise removal improves signal-to-noise for connected memories
- Diameter-enforced clustering — prevents semantic chaining (e.g., physics->biophysics->cooking = 1 cluster)
- Multi-channel agreement — OOD detection requires multiple retrieval channels to agree
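As a concrete reference for the first item, the standard ACT-R base-level learning equation is B = ln(Σ t_j^(-d)), where t_j is the time since the j-th access and d is the decay rate (0.5 is the conventional default). The sketch below implements that textbook form only. The hour-based time unit and the function name are assumptions, and AWM's synaptic-tagging adjustment is not reproduced here.

```typescript
// Textbook ACT-R base-level activation: B = ln( sum_j t_j^(-d) ).
// accessAgesHours: time since each past access, in hours (unit is assumed).
function baseLevelActivation(accessAgesHours: number[], decayRate = 0.5): number {
  if (accessAgesHours.length === 0) return -Infinity; // never accessed
  const total = accessAgesHours
    .map((age) => Math.pow(Math.max(age, 1e-3), -decayRate)) // guard against age 0
    .reduce((a, b) => a + b, 0);
  return Math.log(total);
}

// A memory used several times recently outranks one touched once a month ago.
const usedOften = baseLevelActivation([1, 5, 24, 48]);
const usedOnceLongAgo = baseLevelActivation([720]);
```

Each access adds a term to the sum, so activation rises with use, and each term shrinks as its age grows, so activation falls with time.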
Architecture
src/
core/ # Cognitive primitives
embeddings.ts - Local vector embeddings (BGE-small-en-v1.5, 384d)
reranker.ts - Cross-encoder passage scoring (ms-marco-MiniLM)
query-expander.ts - Synonym expansion (flan-t5-small)
salience.ts - Write-time importance scoring (novelty + salience + reinforce-on-duplicate)
decay.ts - ACT-R temporal activation decay
hebbian.ts - Association strengthening/weakening
logger.ts - Append-only activity log (data/awm.log)
engine/ # Processing pipelines
activation.ts - 10-phase retrieval pipeline (dual BM25, coref, agreement gate)
consolidation.ts - 7-phase sleep cycle (diameter clustering, direct bridging, synaptic tagging)
connections.ts - Discover links between memories
staging.ts - Weak signal buffer (promote or discard)
retraction.ts - Negative memory / corrections
eviction.ts - Capacity enforcement
hooks/
sidecar.ts - Hook HTTP server (auto-checkpoint, stats, timer)
storage/
sqlite.ts - SQLite + FTS5 persistence layer
api/
routes.ts - HTTP endpoints (memory + task + system)
mcp.ts - MCP server (14 tools, incognito support)
cli.ts - CLI (setup, serve, hook config)
index.ts - HTTP server entry point (auto-backup on startup)
For detailed architecture including pipeline phases, database schema, and system diagrams, see docs/architecture.md.
Testing & Evaluation
Unit Tests
npx vitest run # 77 tests (salience, decay, hebbian, supersession)
Eval Harness (v0.6.0)
npm run eval # All 4 benchmark suites
npm run eval -- --suite=retrieval # Single suite
npm run eval -- --bm25-only # Ablation: BM25 only
npm run eval -- --no-graph-walk # Ablation: disable graph walk
Suites: retrieval (Recall@5), associative (multi-hop), redundancy (dedup F1), temporal (Spearman vs ACT-R). Ablation flags isolate each pipeline component's contribution.
Full Test Suite
npm run test:mcp # MCP protocol smoke test (5/5)
npm run test:self # Pipeline component checks (94.1%)
npm run test:edge # 9 adversarial failure modes
npm run test:stress # 500 memories, 100 consolidation cycles (96.2%)
npm run test:workday # 4-session production simulation (93.3%)
npm run test:ab # AWM vs baseline comparison
npm run test:sleep # Consolidation impact measurement
npm run test:tokens # Token savings analysis (56.3% savings)
npm run test:pilot # Production-like query validation (14/15)
npm run test:locomo # LoCoMo industry benchmark (28.2%)
Environment Variables
| Variable | Default | Purpose |
|----------|---------|---------|
| AWM_PORT | 8400 | HTTP server port |
| AWM_DB_PATH | memory.db | SQLite database path |
| AWM_AGENT_ID | claude-code | Agent ID (memory namespace) |
| AWM_EMBED_MODEL | Xenova/bge-small-en-v1.5 | Embedding model (retrieval-optimized) |
| AWM_EMBED_DIMS | 384 | Embedding dimensions |
| AWM_RERANKER_MODEL | Xenova/ms-marco-MiniLM-L-6-v2 | Reranker model |
| AWM_HOOK_PORT | 8401 | Hook sidecar port |
| AWM_HOOK_SECRET | (none) | Bearer token for hook auth |
| AWM_API_KEY | (none) | Bearer token for HTTP API auth |
| AWM_INCOGNITO | (unset) | Set to 1 to disable all tools |
| AWM_COORDINATION | (unset) | Set to true to enable hive coordination endpoints |
| AWM_DISABLE_POOL_FILTER | (unset) | Set to 1 to disable the candidate pool reduction (0.7.7+). Reverts recall to scoring all active candidates — slower but useful for A/B testing if a recall regression appears |
| AWM_DISABLE_SLIM_CACHE | (unset) | Set to 1 to disable the in-memory slim cache (0.7.10+). Reverts to per-recall SQL fetch — slower but useful if cache invariants are suspected of drift |
| AWM_DISABLE_RERANK_SKIP | (unset) | Set to 1 to disable the reranker skip on clear-winner queries (0.7.10+). Forces every recall through the cross-encoder |
| AWM_DISABLE_EXPANSION_CACHE | (unset) | Set to 1 to disable the query expansion skip heuristic + LRU cache (0.7.11+). Forces every recall through the flan-t5-small expander |
| AWM_WORKSPACE | (unset) | Default workspace for cross-agent recall in hive setups |
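For a custom agent that needs to find the same ports and namespace as AWM, the documented defaults can be mirrored in a small resolver. This is a sketch under stated assumptions: `resolveAwmConfig` and the `AwmConfigSketch` shape are hypothetical names, and how AWM itself parses these variables is not shown in this README.

```typescript
// Hypothetical helper that mirrors the documented defaults from the table above.
type AwmEnv = Record<string, string | undefined>;

interface AwmConfigSketch {
  port: number;
  hookPort: number;
  dbPath: string;
  agentId: string;
  incognito: boolean;
}

function resolveAwmConfig(env: AwmEnv): AwmConfigSketch {
  return {
    port: Number(env["AWM_PORT"] ?? "8400"),
    hookPort: Number(env["AWM_HOOK_PORT"] ?? "8401"),
    dbPath: env["AWM_DB_PATH"] ?? "memory.db",
    agentId: env["AWM_AGENT_ID"] ?? "claude-code",
    incognito: env["AWM_INCOGNITO"] === "1", // documented: set to 1 to disable all tools
  };
}
```

In a Node.js script this would typically be called as `resolveAwmConfig(process.env)`; taking the environment as a parameter keeps the function easy to test.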
Tech Stack
| Component | Technology |
|-----------|-----------|
| Language | TypeScript (ES2022, strict) |
| Database | SQLite via better-sqlite3 + FTS5 |
| HTTP | Fastify 5 |
| MCP | @modelcontextprotocol/sdk |
| ML Runtime | @huggingface/transformers (local ONNX) |
| Embeddings | BGE-small-en-v1.5 (BAAI, retrieval-optimized, 384d) |
| Reranker | ms-marco-MiniLM-L-6-v2 (cross-encoder) |
| Query Expansion | flan-t5-small (synonym generation) |
| Tests | Vitest 4 |
| Validation | Zod 4 |
All three ML models run locally via ONNX. No external API calls for retrieval. The entire system is a single SQLite file + a Node.js process.
What's New in v0.7.16
awm setup --global template now teaches write quality. Two new sections in AWM_INSTRUCTION_CONTENT:
- Writing for recall: explicit guidance that recall quality is determined at write time. Lead with the rule/fact, pick the most specific topic, include 2+ retrievable identifiers (file paths, function names, IDs), write in the vocabulary of the future query, reserve canonical for stable invariants, include the why for feedback memories.
- Recall strategy: formalizes the multi-query reformulation pattern observed in practice. When one query returns nothing, agents reformulate (synonyms, more specific nouns, exact identifiers). Recall is ~300ms, so two or three reformulations cost less than one filesystem search. Cap at three to prevent loops.
These document the writer + reader behaviors AWM was always designed around but that were previously implicit. No retriever change, only a system-prompt improvement. Run npm install -g agent-working-memory@latest && awm setup --global to apply.
LongMemEval headline number updated. Re-running the benchmark on 0.7.16 (single-session-user, 50 questions, same adapter as the original 0.7.1 baseline): 68% accuracy with gpt-4o-mini, up from the original 40-50%. Recall latency 0.12s avg (was 7-11s on 0.7.2). Multi-tier reader sweep on the same memory inputs:
- gpt-4o-mini (cheap, non-thinking): 68%
- gpt-4o (strong, non-thinking): 68%
- o4-mini (cheap, thinking): 78%
- gpt-5-mini (mid, thinking): 80%
Non-thinking models cap at 68% on this category — the bottleneck is reasoning over recalled context, not raw scale. Thinking models add 10-12pp. Memory quality is fixed; reader determines the ceiling.
What's New in v0.7.15
- Documentation refresh: awm setup --global now writes a CLAUDE.md template that documents all four perf env-var escape hatches (AWM_DISABLE_POOL_FILTER, AWM_DISABLE_SLIM_CACHE, AWM_DISABLE_RERANK_SKIP, AWM_DISABLE_EXPANSION_CACHE) instead of just the first one. Troubleshooting / quickstart / user-guide docs updated to reflect the current ~300ms recall floor. No code change; version bumped solely so the new template ships via npm install -g agent-working-memory@latest.
What's New in v0.7.14
Recall latency 0.4-0.8s → 0.3-0.6s (~25-50% on top of 0.7.13) — three fixes:
- Batched cross-encoder inference — reranker now tokenizes + runs all query-passage pairs in one batched forward pass. 15-passage rerank: 210ms → 27ms (~7×).
- Truncate passages to 400 chars before rerank — cross-encoder has 512-token max anyway and pads to the longest passage; full content (5000+ chars) meant everything padded to max length. Truncation drops tokenization + inference 3-4× on long memory pools.
- Eager slim-cache populate at startup — first user recall no longer pays the ~600ms cache populate cost.
Recall quality A/B: 8/8 top-1, 4.50/5 top-5. Cumulative since 0.7.4 baseline: 11s → 0.3-0.6s (~25-37× faster).
What's New in v0.7.13
- Reranker pool size reduction: cross-encoder pool dropped from max(limit*3, 30) to max(limit*2, 15). For typical agent queries (limit=5 or 10), that's 15-20 candidates reranked instead of 30, halving the cross-encoder cost. Top-K quality preserved (8/8 top-1, identical top-5/top-10 overlap); reranking the 21st-30th candidates was wasted when the user only wants top-5 anyway.
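The two pool-size formulas quoted above can be checked directly for the typical limits. The function names below are made up for this example; only the two expressions come from the release note.

```typescript
// Pool-size formulas from the release note; function names are illustrative.
const oldRerankPoolSize = (limit: number): number => Math.max(limit * 3, 30);
const newRerankPoolSize = (limit: number): number => Math.max(limit * 2, 15);

// limit=5  -> old 30, new 15
// limit=10 -> old 30, new 20
const poolSizeComparison = [5, 10].map((limit) => ({
  limit,
  before: oldRerankPoolSize(limit),
  after: newRerankPoolSize(limit),
}));
```

For both typical limits the old formula is pinned at its floor of 30, which is why the change amounts to reranking 15-20 candidates instead of 30.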
What's New in v0.7.12
- Recall latency 0.9s → 0.4-0.8s (~40-60% on top of 0.7.11): phase-breakdown showed getAssociationsForBatch over ~300 survivors was 222ms (25% of remaining floor) but the scoring loop only reads count + sumWeight from each engram's edges. New getAssociationStatsForBatch returns scalar stats via a single GROUP BY aggregate. Graph walk still uses full associations, but only on top-N (~30) so its lookups are cheap. Recall quality A/B: 8/8 top-1, 4.50/5 top-5. Cumulative since 0.7.4 baseline: 11-23s → 0.4-0.8s (~25× faster median).
What's New in v0.7.11
- Query expansion skip + LRU cache: flan-t5-small was 164ms per recall (18% of post-0.7.10 floor). Two fixes in core/query-expander.ts: (1) skip heuristic for long/specific queries (>50 chars OR ≥5 distinct meaningful tokens), and (2) 500-entry LRU cache for repeated queries. ~30% of typical agent recalls hit the skip; repeated recalls hit the cache. Avg savings: ~100-150ms per recall. Recall quality A/B: 8/8 top-1, 4.63/5 top-5. Disable via AWM_DISABLE_EXPANSION_CACHE=1.
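A minimal sketch of the two ideas, assuming a simple stopword list to decide which tokens count as "meaningful". The stopword list, the tokenizer, and the names `shouldSkipExpansion` and `LruCache` are assumptions for illustration; the actual code in core/query-expander.ts may differ.

```typescript
// Assumed stopword list; AWM's definition of a "meaningful token" is not given here.
const STOPWORDS = new Set([
  "the", "a", "an", "of", "to", "in", "for", "and", "or", "is", "how", "do", "i",
]);

// Skip expansion for long or already-specific queries
// (>50 chars OR >=5 distinct meaningful tokens, per the release note).
function shouldSkipExpansion(query: string): boolean {
  const tokens = query.toLowerCase().match(/[a-z0-9_]+/g) ?? [];
  const meaningful = new Set(tokens.filter((t) => !STOPWORDS.has(t)));
  return query.length > 50 || meaningful.size >= 5;
}

// Minimal LRU cache relying on Map insertion order.
class LruCache<V> {
  private entries = new Map<string, V>();
  private capacity: number;

  constructor(capacity = 500) {
    this.capacity = capacity;
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    this.entries.delete(key); // re-insert to mark as most recently used
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}
```

A caller would check `shouldSkipExpansion` first, then the cache, and only run the expander model on a miss.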
What's New in v0.7.10
Recall latency 1.4s → 0.9s median (~35% on top of 0.7.9) — two more fixes after phase-breakdown showed the slim fetch was still 310ms (Buffer→Float32Array on every recall) and the reranker was 354ms (40% of remaining cost):
- In-memory slim cache: Map<id, SlimCacheEntry> populated once per process, mutated in lock-step with engram writes/updates/retracts. Slim fetch 306ms → 5ms with warm cache (~60×). Disable via AWM_DISABLE_SLIM_CACHE=1. Memory cost ~15MB at 10K engrams.
- Reranker skip on clear winners: when BM25 has a clear top-1 (textMatch ≥ 0.8, ≥1.5× the next score, small pool), skip the cross-encoder. Saves ~300ms on confident queries. Disable via AWM_DISABLE_RERANK_SKIP=1.
Quality preserved: 8/8 top-1, 4.63/5 top-5, 9.75/10 top-10 on the A/B suite. Cumulative since 0.7.4 baseline: 11s → 0.9s (~12-15× faster).
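The clear-winner rule for skipping the reranker can be expressed as a small predicate. The 0.8 and 1.5× thresholds come from the note above; the "small pool" cut-off of 10 and the function name `canSkipRerank` are assumptions, since the release note does not state them.

```typescript
// Sketch of the clear-winner check. maxPool = 10 is an assumed value for "small pool".
function canSkipRerank(textMatchScores: number[], maxPool = 10): boolean {
  if (textMatchScores.length === 0 || textMatchScores.length > maxPool) {
    return false;
  }
  const sorted = [...textMatchScores].sort((a, b) => b - a);
  const [top = 0, next = 0] = sorted;
  // Top score must be strong on its own and well ahead of the runner-up.
  return top >= 0.8 && top >= 1.5 * next;
}
```

For example, scores of 0.9 and 0.5 qualify (0.9 ≥ 0.75), while 0.9 and 0.7 do not (0.9 < 1.05), so the second case would still go through the cross-encoder.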
What's New in v0.7.9
- Recall latency 1.6s → 1.0s end-to-end (~30% on top of 0.7.7): phase-breakdown showed the full SELECT over 10K engrams was the new bottleneck (440ms / 40% of recall) due to row materialization of content/tags/JSON for rows the pre-filter doesn't read. Two-pass fetch: slim (id, concept, embedding) for the cosine + filter pass, then hydrate only the survivors via getEngramsByIds. Recall quality A/B verified 8/8 top-1 identical, top-K overlap slightly improved (4.75/5 vs 4.50). Cumulative since 0.7.4 baseline: 11-23s → 1.0-1.6s (~10-20× faster).
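The two-pass idea can be sketched without a database: score lightweight rows first, then load full rows only for the survivors. Everything below is a hypothetical illustration; the row shapes, the `minCosine` cut-off, and the `hydrate` callback (standing in for a lookup such as getEngramsByIds) are assumptions.

```typescript
// Hypothetical row shapes for illustration.
interface SlimRow {
  id: string;
  concept: string;
  embedding: number[];
}

interface FullRow extends SlimRow {
  content: string;
  tags: string[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA > 0 && normB > 0 ? dot / Math.sqrt(normA * normB) : 0;
}

// Pass 1: filter on slim rows. Pass 2: hydrate only the surviving ids.
function twoPassFetch(
  slimRows: SlimRow[],
  hydrate: (ids: string[]) => FullRow[],
  queryEmbedding: number[],
  minCosine = 0.5, // assumed threshold
): FullRow[] {
  const survivorIds = slimRows
    .filter((row) => cosineSimilarity(row.embedding, queryEmbedding) >= minCosine)
    .map((row) => row.id);
  return hydrate(survivorIds);
}
```

The saving comes from never materializing content, tags, or JSON for rows that fail the first pass.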
What's New in v0.7.8
- Install template updated for the 0.7.5/0.7.6/0.7.7 behaviors: awm setup now writes a richer CLAUDE.md that teaches agents about memory classes (canonical | working | ephemeral), salience auto-promotion patterns (detectUserFeedback for stakeholder quotes, detectVerifiedFinding for operational records with action-verb + concrete IDs), and the new env-var escape hatches. Existing installs upgrade via npm install -g agent-working-memory@latest && awm setup --global then restart Claude Code. No functional code change in this release; version bumped solely so the new template ships.
What's New in v0.7.7
- Recall latency 2.5s → 1.0s end-to-end (~50% on top of 0.7.6): phase-breakdown spike showed that after the 0.7.6 BM25 fix, the new bottleneck was getAssociationsForBatch over all ~10K candidates (68% of recall latency). Added a cheap pre-filter before deep scoring: candidates survive only if they have a BM25 hit, a cosine z-score above the gate, or concept-token overlap with the query. From ~10K candidates → typically 100-300 survivors. Graph-walk correctness preserved (it only boosts neighbors with textMatch >= 0.05, which would also pass this filter). Recall quality A/B verified: 8/8 top-1 matches, 90% top-5 overlap, 94% top-10 overlap on diverse queries. Set AWM_DISABLE_POOL_FILTER=1 to revert. Cumulative since 0.7.4 baseline: 11-23s → 0.9-1.6s (~10-15× faster).
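The three survival conditions of the pre-filter can be sketched as a single pass over the candidate pool. This is a hypothetical illustration: the `CandidateSignals` shape, the z-score gate of 1.0, and the way the z-score is computed over the pool are assumptions, not AWM's implementation.

```typescript
// Hypothetical candidate shape for illustration.
interface CandidateSignals {
  id: string;
  bm25Hit: boolean;
  cosine: number;
  conceptTokens: string[];
}

// A candidate survives if it has a BM25 hit, a cosine z-score above the gate,
// or at least one concept token in common with the query.
function prefilterCandidates(
  candidates: CandidateSignals[],
  queryTokens: string[],
  zGate = 1.0, // assumed gate value
): CandidateSignals[] {
  const n = candidates.length;
  if (n === 0) return [];
  const mean = candidates.reduce((sum, c) => sum + c.cosine, 0) / n;
  const variance = candidates.reduce((sum, c) => sum + (c.cosine - mean) ** 2, 0) / n;
  const std = Math.sqrt(variance);
  const querySet = new Set(queryTokens.map((t) => t.toLowerCase()));
  return candidates.filter((c) => {
    const z = std > 0 ? (c.cosine - mean) / std : 0;
    const overlaps = c.conceptTokens.some((t) => querySet.has(t.toLowerCase()));
    return c.bm25Hit || z > zGate || overlaps;
  });
}
```

Because each condition is cheap to evaluate, the expensive association lookups only run on the candidates that pass.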
What's New in v0.7.6
- Recall latency 11-23s → 2.5s end-to-end (~5× faster): a measurement spike found the slow path was a SQLite query-plan trap, not vector search. The BM25 query JOIN engrams_fts ON e.rowid + WHERE MATCH + ORDER BY rank LIMIT N materialized all matching rows (with 1.5KB embedding blobs) before the LIMIT applied. A CTE prefilter forces the FTS5 LIMIT first, then joins only the top-K rowids. Same SQLite, same data, same results; 567× faster for wide OR queries (3682ms → 6.5ms verified). Also added getAssociationsForBatch to replace the per-candidate N+1 in the activation scoring loop. Top-K results are byte-identical to the old query (verified by the equivalence test in spike/).
- Salience filter, auto-promote verified operational records: operational batch summaries (e.g., "Submitted 6 events 2026-05-07 — IDs 18969, 18971…") were being discarded at salience 0.14 because BM25 novelty couldn't distinguish "useful new operational record" from "duplicate observation" when topic terminology repeated. New detectVerifiedFinding() pattern detector parallel to detectUserFeedback(): requires action-verb header (Submitted/Finalized/Completed/Reconciled/Triaged/etc.) plus ≥2 concrete identifiers (ISO date or contextual numeric ID). Matched memories get a 0.45 salience floor (active disposition, not canonical). 7 new tests, 23 salience tests pass.
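A rough sketch of the kind of pattern check described for verified operational records. It is not the actual detectVerifiedFinding() code: the verb list covers only the five verbs named above, and treating any number of four or more digits as a "contextual numeric ID" is an assumption made for the example.

```typescript
// Only the verbs named in the release note; the real list is longer ("etc.").
const ACTION_VERB_HEADER = /^(Submitted|Finalized|Completed|Reconciled|Triaged)\b/;
const ISO_DATE = /\b\d{4}-\d{2}-\d{2}\b/g;
const NUMERIC_ID = /\b\d{4,}\b/g; // assumed heuristic for "contextual numeric ID"

function looksLikeVerifiedFinding(content: string): boolean {
  if (!ACTION_VERB_HEADER.test(content.trim())) return false;
  const dates = content.match(ISO_DATE) ?? [];
  // Remove dates first so their year digits are not counted again as IDs.
  const withoutDates = content.replace(ISO_DATE, " ");
  const ids = withoutDates.match(NUMERIC_ID) ?? [];
  return dates.length + ids.length >= 2;
}

// Matched memories are lifted to the 0.45 floor; others keep their score.
function applySalienceFloor(salience: number, content: string): number {
  return looksLikeVerifiedFinding(content) ? Math.max(salience, 0.45) : salience;
}
```

Under these assumptions, a record such as "Submitted 6 events 2026-05-07, IDs 18969, 18971" scored at 0.14 would be raised to 0.45, while free-form text keeps its original score.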
What's New in v0.7.4
- Channel push telemetry: new GET /telemetry/channels JSON endpoint and Prometheus counters (coord_channel_push_attempts_total, ..._delivered_total, ..._failed_total{reason}, ..._no_session_total, ..._fallback_mailbox_total, ..._session_disconnects_total). Surfaces real delivery rate so coordination reliability can be measured rather than guessed.
- Role-based /channel/push addressing: accepts {role, workspace, message} as alternative to {agentId, message}. Server resolves role+workspace to most-recently-seen alive agent. Lets workers notify the coordinator without hardcoding its UUID (which changes across coordinator restarts). Enables event-driven worker → coordinator hand-off in place of fragile coordinator self-polling.
- /checkin writes role on every call: previously the UPDATE on existing rows preserved a stale role from initial registration; now agents can correct their own role via re-checkin.
- /workers JOINs channel sessions: alive field is now recent_pulse OR connected_channel_session. Stops false-dead duplicate-spawn loops where a busy worker's /pulse went stale during long tool sequences while their channel-server stayed reachable.
- cleanupStale runs on a 5-minute schedule: was only invoked manually; now zombie agents get marked dead automatically with a 600s threshold (forgiving for long edits).
- user_feedback salience event type: new event type with bonus 0.3 (highest of any). Auto-detect heuristic on memory_write content matching ^(Robert|Katherine|Nancy|...) (said|verbatim|directed|decided|...) forces memoryClass='canonical' so user-stated decisions can't be discarded by the BM25 novelty floor in populated DBs.
v0.7.3
- Salience filter production tuning: fixed BM25 novelty floor that was discarding ~17% salience for most writes in 10K+ engram DBs. Quadratic dampening curve (max(0.05, 1 - topScore²)); concept-match penalty scoped to last 30 days; floor lowered 0.10 → 0.05.
- Maintenance scripts for backup pruning + lme/bench database cleanup.
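The dampening curve quoted above is easy to evaluate by hand. The sketch below assumes topScore is a normalized BM25 match score in the range 0 to 1, which the note does not state explicitly; the function name is illustrative.

```typescript
// Quadratic novelty dampening with a floor: max(floor, 1 - topScore^2).
// Assumes topScore is normalized to [0, 1].
function noveltyFromTopScore(topScore: number, floor = 0.05): number {
  return Math.max(floor, 1 - topScore * topScore);
}

// topScore 0.0 -> 1.00 (nothing similar stored)
// topScore 0.5 -> 0.75 (moderate overlap is penalized only lightly)
// topScore 1.0 -> 0.05 (near-duplicate, clamped to the floor)
const noveltySamples = [0, 0.5, 1].map((s) => noveltyFromTopScore(s));
```

Because the penalty grows with the square of the score, moderate overlap costs less than it would under a linear 1 - topScore curve, where a score of 0.5 would yield 0.5 instead of 0.75.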
v0.7.2
- Workspace recall fix (was returning UUIDs not names in v0.7.1 release).
v0.7.1
- Agent-provided metadata tags: memory_write accepts project, topic, source, confidence_level, session_id, intent. Stored as searchable prefixed tags (proj=X, sid=Z). Session ID tags alone improved LongMemEval recall 3x.
- Dual synthesis: consolidation creates two types of summary memories: session summaries (tag-based, for perfect recall) and pattern syntheses (cross-session, for novel recall/creative connections).
- Bulk write + supersession: POST /memory/write-batch for batch ingestion with POST /memory/supersede for knowledge updates.
- LongMemEval benchmark: adapter built, baseline established at 40-50% with gpt-4o-mini.
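To make the prefixed-tag idea concrete, here is a small sketch that turns metadata into tags. Only the proj= and sid= prefixes appear in the note above, so the example handles just those two fields; the function name and the metadata shape are hypothetical, and the prefixes used for the other fields are not guessed at.

```typescript
// Hypothetical metadata shape limited to the two fields whose prefixes are documented.
interface WriteMetadataSketch {
  project?: string;
  session_id?: string;
}

// Produces searchable tags such as "proj=awm" and "sid=2026-05-07-a".
function toPrefixedTags(meta: WriteMetadataSketch): string[] {
  const tags: string[] = [];
  if (meta.project) tags.push(`proj=${meta.project}`);
  if (meta.session_id) tags.push(`sid=${meta.session_id}`);
  return tags;
}
```

Storing metadata as plain tags means a session-scoped lookup can reuse the same keyword search path as ordinary recall, for example by including `sid=...` in the query.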
v0.7.0
- Workspace-scoped recall, validation-gated Hebbian (Kairos), multi-graph traversal (MAGMA), power-law edge decay (DASH).
v0.6.1
- Embedding version tracking, batch backfill, deeper retraction propagation, retrieval timeouts, channel push delivery.
v0.6.0
- Memory taxonomy: memories classified as episodic, semantic, procedural, or unclassified. Auto-classified on write. Filter by type on recall.
- Query-adaptive retrieval: pipeline adapts to query type: targeted | exploratory | balanced | auto.
- Decision propagation: decisions broadcast to coordination layer for cross-agent discovery.
- Eval harness: npm run eval benchmarks retrieval, associative, redundancy, and temporal performance.
- DB hardening: busy_timeout, integrity check on startup, hot backups every 10 min, WAL checkpoint on shutdown.
See CHANGELOG.md for full details.
Project Status
AWM is in active development (v0.7.17). The core memory pipeline, consolidation system, multi-agent coordination, and MCP integration are stable and used daily in production coding workflows.
- Core retrieval and consolidation: stable
- MCP tools and Claude Code integration: stable
- Multi-agent coordination: stable (v0.6.0)
- Task management: stable
- Hook sidecar and auto-checkpoint: stable
- HTTP API: stable (for custom agents)
- Eval harness: stable (v0.6.0)
See CHANGELOG.md for version history.
