agent-working-memory

v0.7.17

Cognitive memory layer for AI agents — activation-based retrieval, salience filtering, associative connections

AgentWorkingMemory (AWM)

Persistent working memory for AI agents.

AWM helps agents retain important project knowledge across conversations and sessions. Instead of storing everything and retrieving by similarity alone, it filters for salience, builds associative links between related memories, and periodically consolidates useful knowledge while letting noise fade.

Use it through Claude Code via MCP or as a local HTTP service for custom agents. Everything runs locally: SQLite + ONNX models + Node.js. No cloud, no API keys.

Without AWM

  • Agent forgets earlier architecture decision
  • Suggests Redux after project standardized on Zustand
  • Repeats discussion already settled three days ago
  • Every new conversation starts from scratch

With AWM

  • Recalls prior state-management decision and rationale
  • Surfaces related implementation patterns from past sessions
  • Continues work without re-asking for context
  • Gets more consistent the longer you use it

Quick Start

Node.js 20+ required — check with node --version.

npm install -g agent-working-memory
awm setup --global

Restart Claude Code. That's it — 14 memory tools appear automatically.

Upgrading

npm install -g agent-working-memory@latest
awm setup --global          # Updates MCP config, CLAUDE.md instructions, and hooks

Restart Claude Code after upgrading. Your existing memory database is preserved — all upgrades are backward compatible. New features (metadata tags, workspace recall, synthesis) are opt-in.

From v0.6.x → v0.7.x: The memory_write tool now accepts optional metadata parameters (project, topic, session_id, etc.) that improve recall quality. Re-running awm setup --global updates your CLAUDE.md with instructions for the agent to use them.

First conversation will be ~30 seconds slower while ML models download (~200MB total, cached locally). After that, everything runs on your machine.

For isolated memory per folder, see Separate Memory Pools. For team onboarding, see docs/quickstart.md.


Who this is for

  • Long-running coding agents that need cross-session project knowledge
  • Multi-agent workflows where specialized agents share a common memory
  • Local-first setups where cloud memory is not acceptable
  • Teams using Claude Code who want persistent context without manual notes

What this is not

  • Not a chatbot UI
  • Not a hosted SaaS
  • Not a generic vector database
  • Not a replacement for your source of truth (code, docs, tickets)

Why it's different

Most "memory for AI" projects are vector databases with a retrieval wrapper. AWM goes further:

| | Typical RAG / Vector Store | AWM |
|---|---|---|
| Storage | Everything | Salience-filtered with low-confidence fallback (novel events go active, borderline enter staging, low-salience stored at reduced confidence) |
| Retrieval | Cosine similarity | 10-phase pipeline: dual BM25 (keyword + expanded) + vectors + reranking + graph walk + decay + coref expansion |
| Connections | None | Hebbian edges that strengthen when memories co-activate |
| Over time | Grows forever, gets noisier | Consolidation: diameter-enforced clustering, cross-topic bridges, synaptic-tagged decay |
| Forgetting | Manual cleanup | Cognitive forgetting: unused memories fade, reinforced knowledge persists (access-count modulated) |
| Feedback | None | Useful/not-useful signals tune confidence and retrieval rank |
| Correction | Delete and re-insert | Retraction: wrong memories invalidated, corrections linked, penalties propagate (depth 2, decaying) |
| Graph | None or single graph | Multi-graph: semantic, temporal, causal, entity; independent traversal with fused scoring |
| Learning | Unconditional co-activation | Validation-gated: edges strengthen only on positive feedback (Kairos-inspired) |
| Noise rejection | None | Multi-channel agreement gate: requires 2+ retrieval channels to agree before returning results |
| Duplicates | Stored repeatedly | Reinforce-on-duplicate: near-exact matches boost existing memory instead of creating copies |

The design is based on cognitive science — ACT-R activation decay, Hebbian learning, complementary learning systems, synaptic homeostasis, and synaptic tagging — rather than ad-hoc heuristics. See How It Works and docs/cognitive-model.md for details.
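As a rough illustration of the validation-gated Hebbian rule described above, the sketch below strengthens an edge only when the co-activated memories received positive feedback. The `Edge` type, the learning rate, the penalty, and the saturating update form are all assumptions made for this example; the README does not state AWM's actual parameters.

```typescript
// Hypothetical edge shape and constants, chosen for illustration only.
interface Edge {
  from: string;
  to: string;
  weight: number; // kept in [0, 1]
}

const LEARNING_RATE = 0.1; // assumed
const PENALTY = 0.05; // assumed

type Feedback = "useful" | "not_useful" | "none";

/** Strengthen an edge only on positive feedback; co-activation alone changes nothing. */
function updateEdge(edge: Edge, feedback: Feedback): Edge {
  if (feedback === "useful") {
    // Saturating step: the closer the weight is to 1, the smaller the increase.
    return { ...edge, weight: edge.weight + LEARNING_RATE * (1 - edge.weight) };
  }
  if (feedback === "not_useful") {
    return { ...edge, weight: Math.max(0, edge.weight - PENALTY) };
  }
  return edge;
}
```

With these assumed constants, an edge at weight 0.5 moves to about 0.55 after one useful signal and stays at 0.5 when no feedback arrives.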


Benchmarks (v0.6.0)

Eval Harness (new in v0.6.0)

| Suite | Score | Threshold | What it tests |
|-------|-------|-----------|---------------|
| Retrieval | Recall@5 = 0.800 | >= 0.80 | 200 facts, 50 queries: BM25 + vector + reranker pipeline precision |
| Associative | success@10 = 1.000 | >= 0.70 | 20 multi-hop causal chains: graph walk finds non-obvious connections |
| Redundancy | dedup F1 = 0.966 | >= 0.80 | 50 clusters × 4 paraphrases: consolidation removes duplicates correctly |
| Temporal | Spearman = 0.932 | >= 0.75 | 25 facts with controlled age/access: ACT-R decay ranking accuracy |

Key finding: consolidation raises retrieval recall by 30 percentage points, from 0.650 before consolidation to 0.950 after. Removing redundant noise helps ranking.

Full Test Suite

| Command | Score | What it tests |
|---------|-------|---------------|
| npm run eval | 4/4 suites pass | Retrieval, associative, redundancy, temporal benchmarks with ablation support |
| npm run test:run | 77/77 tests | Unit tests: salience, decay, hebbian, supersession, coordination |
| npm run test:mcp | 5/5 pass | MCP protocol: write, recall, feedback, retract, stats |
| npm run test:self | 94.1% EXCELLENT | Pipeline component checks across all cognitive subsystems |
| npm run test:edge | All pass | 9 failure modes: narcissistic interference, identity collision, contradiction trapping, bridge overshoot, noise forgetting |
| npm run test:stress | 96.2% (50/52) | 500 memories, 100 sleep cycles, catastrophic forgetting, adversarial spam, recovery |
| npm run test:workday | 93.3% EXCELLENT | 43 memories across 4 projects, cross-cutting queries, noise filtering |
| npm run test:ab | AWM 20/22 vs Baseline 18/22 | AWM outperforms keyword baseline on architecture + testing topics |
| npm run test:sleep | 71.4% | 60 memories, 4 topic clusters, consolidation impact across 3 cycles |
| npm run test:tokens | 56.3% savings, 2.3x efficiency | Memory-guided context vs full history, keyword accuracy 72.5% |
| npm run test:pilot | 14/15 pass | Production-like queries with noise rejection (5/5 noise rejected) |
| npm run test:locomo | 28.2% | Industry-standard LoCoMo conversational memory benchmark (1,986 QA pairs) |

Consolidation Health (v0.6.0)

| Metric | Value |
|--------|-------|
| Topic clusters formed | 10 per consolidation cycle |
| Cross-topic bridges | 20 in first cycle |
| Edges strengthened | 135 per cycle (access-weighted) |
| Graph size at scale | 3,000-4,500 edges (500 memories) |
| Recall after 100 cycles | 90% stable |
| Catastrophic forgetting survival | 5/5 (100%) |
| Post-dedup retrieval | 0.950 (consolidation improves recall) |

All evals are reproducible. See Testing & Evaluation.


Features

Memory Tools (14)

| Tool | Purpose |
|------|---------|
| memory_write | Store a memory (salience filter + reinforce-on-duplicate) |
| memory_recall | Retrieve relevant memories by context (dual BM25 + coref expansion) |
| memory_feedback | Report whether a recalled memory was useful |
| memory_retract | Invalidate a wrong memory with optional correction |
| memory_supersede | Replace outdated memory with current version |
| memory_stats | View memory health metrics and activity |
| memory_checkpoint | Save execution state (survives context compaction) |
| memory_restore | Recover state + relevant context at session start |
| memory_task_add | Create a prioritized task |
| memory_task_update | Change task status/priority |
| memory_task_list | List tasks by status |
| memory_task_next | Get the highest-priority actionable task |
| memory_task_begin | Start a task: auto-checkpoints and recalls context |
| memory_task_end | End a task: writes summary and checkpoints |

Separate Memory Pools

By default, all projects share one memory pool. For isolated pools per folder, place a .mcp.json in each parent folder with a different AWM_AGENT_ID:

C:\Users\you\work\.mcp.json          -> AWM_AGENT_ID: "work"
C:\Users\you\personal\.mcp.json      -> AWM_AGENT_ID: "personal"

Claude Code uses the closest .mcp.json ancestor. Same database, isolation by agent ID.
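To make the "closest ancestor" rule concrete, here is a small path-only sketch. It is not Claude Code's actual lookup code; it simply assumes the behavior described above (the nearest enclosing folder that holds a .mcp.json wins) and uses POSIX-style paths for brevity. The function name `closestConfigDir` is hypothetical.

```typescript
/**
 * Given the working directory and the folders known to contain a .mcp.json,
 * return the folder that is the closest ancestor of (or equal to) cwd.
 */
function closestConfigDir(cwd: string, configDirs: string[]): string | undefined {
  const norm = (p: string): string => p.replace(/\/+$/, "");
  const here = norm(cwd);
  return configDirs
    .map(norm)
    .filter((dir) => here === dir || here.startsWith(dir + "/"))
    .sort((a, b) => b.length - a.length)[0]; // longest matching prefix is the closest ancestor
}
```

For example, a session started in `/home/you/work/app/src` would pick up the config in `/home/you/work` rather than one in `/home/you/personal`, and therefore the `AWM_AGENT_ID` set there.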

Incognito Mode

AWM_INCOGNITO=1 claude

Registers zero tools — Claude doesn't see memory at all. All other tools and MCP servers work normally.

Auto-Checkpoint Hooks

Installed by awm setup --global:

  • Stop — reminds Claude to write/recall after each response
  • PreCompact — auto-checkpoints before context compression
  • SessionEnd — auto-checkpoints and consolidates on close
  • 15-min timer — silent auto-checkpoint while session is active

Auto-Backup

The HTTP server automatically copies the database to a backups/ directory on startup with a timestamp. Cheap insurance against data loss.

Activity Log

tail -f "$(npm root -g)/agent-working-memory/data/awm.log"

Real-time: writes, recalls, reinforcements, checkpoints, consolidation, hook events.

Activity Stats

curl http://127.0.0.1:8401/stats

Returns daily counts: {"writes": 8, "recalls": 9, "hooks": 3, "total": 25}


Memory Invocation Strategy

AWM combines deterministic hooks for guaranteed memory operations at lifecycle transitions with agent-directed usage during active work.

Deterministic triggers (always happen)

| Event | Action |
|-------|--------|
| Session start | memory_restore: recover state + recall context |
| Pre-compaction | Auto-checkpoint via hook sidecar |
| Session end | Auto-checkpoint + full consolidation |
| Every 15 min | Silent auto-checkpoint (if active) |
| Task start | memory_task_begin: checkpoint + recall |
| Task end | memory_task_end: summary + checkpoint |

Agent-directed triggers (when these situations occur)

Write memory when:

  • A project decision is made or changed
  • A root cause is discovered
  • A reusable implementation pattern is established
  • A preference, constraint, or requirement is clarified
  • A prior assumption is found to be wrong

Recall memory when:

  • Starting work on a new task or subsystem
  • Re-entering code you haven't touched recently
  • After context compaction
  • After a failed attempt (check if there's prior knowledge)
  • Before refactoring or making architectural changes

Retract when:

  • A stored memory turns out to be wrong or outdated

Feedback when:

  • A recalled memory was used (useful) or irrelevant (not useful)

HTTP API

For custom agents, scripts, or non-Claude-Code workflows:

awm serve                    # From npm install
npx tsx src/index.ts         # From source

Write a memory:

curl -X POST http://localhost:8400/memory/write \
  -H "Content-Type: application/json" \
  -d '{
    "agentId": "my-agent",
    "concept": "Express error handling",
    "content": "Use centralized error middleware as the last app.use()",
    "eventType": "causal",
    "surprise": 0.5,
    "causalDepth": 0.7
  }'

Recall:

curl -X POST http://localhost:8400/memory/activate \
  -H "Content-Type: application/json" \
  -d '{
    "agentId": "my-agent",
    "context": "How should I handle errors in my Express API?"
  }'
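The same two calls can be prepared from TypeScript. The sketch below only builds the request objects, using the endpoints and body fields from the curl examples above; the `buildRequest` helper and the `Authorization: Bearer` header for `AWM_API_KEY` are assumptions for illustration, not a documented client.

```typescript
interface HttpRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

interface RequestOptions {
  baseUrl?: string; // defaults to the documented port 8400
  apiKey?: string; // assumed to be sent as a bearer token
}

function buildRequest(path: string, body: object, opts: RequestOptions = {}): HttpRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (opts.apiKey) headers["Authorization"] = `Bearer ${opts.apiKey}`;
  return {
    url: `${opts.baseUrl ?? "http://localhost:8400"}${path}`,
    init: { method: "POST", headers, body: JSON.stringify(body) },
  };
}

const writeReq = buildRequest("/memory/write", {
  agentId: "my-agent",
  concept: "Express error handling",
  content: "Use centralized error middleware as the last app.use()",
  eventType: "causal",
  surprise: 0.5,
  causalDepth: 0.7,
});

const recallReq = buildRequest("/memory/activate", {
  agentId: "my-agent",
  context: "How should I handle errors in my Express API?",
});

// To send: const res = await fetch(writeReq.url, writeReq.init);
```

Keeping request construction separate from the network call makes the payloads easy to inspect and test without a running server.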

How It Works

The Memory Lifecycle

  1. Write — Salience scoring evaluates novelty, surprise, causal depth, and effort. High-salience memories go active; borderline ones enter staging; low-salience stored at reduced confidence for recall fallback. Near-duplicates reinforce existing memories instead of creating copies.

  2. Connect — Vector embedding (BGE-small-en-v1.5, 384d). Temporal edges link to recent memories. Hebbian edges form between co-retrieved memories. Coref expansion resolves pronouns to entity names.

  3. Retrieve — 10-phase pipeline: coref expansion + query expansion + dual BM25 (keyword-stripped + expanded) + semantic vectors + Rocchio pseudo-relevance feedback + ACT-R temporal decay (synaptic-tagged) + Hebbian boost + entity-bridge boost + graph walk + cross-encoder reranking + multi-channel agreement gate.

  4. Consolidate — 7-phase sleep cycle: diameter-enforced clustering (prevents chaining), edge strengthening (access-weighted), cross-topic bridge formation (direct closest-pair), confidence-modulated decay (synaptic tagging extends half-life), synaptic homeostasis, cognitive forgetting, staging sweep. Embedding backfill ensures all memories are clusterable.

  5. Feedback — Useful/not-useful signals adjust confidence, affecting retrieval rank and forgetting resistance.
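The write step of the lifecycle can be pictured as a score followed by a three-way routing decision. The sketch below assumes equal weights for the four signals and two made-up thresholds; the real weights and cutoffs in salience.ts are not given in this README.

```typescript
type Disposition = "active" | "staging" | "low_confidence";

interface SalienceSignals {
  novelty: number; // each signal is assumed to be in [0, 1]
  surprise: number;
  causalDepth: number;
  effort: number;
}

// Hypothetical thresholds, for illustration only.
const ACTIVE_THRESHOLD = 0.5;
const STAGING_THRESHOLD = 0.3;

function salienceScore(s: SalienceSignals): number {
  return (s.novelty + s.surprise + s.causalDepth + s.effort) / 4;
}

/** Route a memory: high salience goes active, borderline goes to staging, the rest is kept at reduced confidence. */
function classify(s: SalienceSignals): { score: number; disposition: Disposition } {
  const score = salienceScore(s);
  if (score >= ACTIVE_THRESHOLD) return { score, disposition: "active" };
  if (score >= STAGING_THRESHOLD) return { score, disposition: "staging" };
  return { score, disposition: "low_confidence" };
}
```

Note that nothing is dropped outright in this sketch, which mirrors the low-confidence fallback described in step 1.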

Cognitive Foundations

  • ACT-R activation decay (Anderson 1993) — memories decay with time, strengthen with use. Synaptic tagging: heavily-accessed memories decay slower (log-scaled).
  • Hebbian learning — co-retrieved memories form stronger associative edges
  • Complementary Learning Systems — fast capture (salience + staging) + slow consolidation (sleep cycle)
  • Synaptic homeostasis — edge weight normalization prevents hub domination
  • Forgetting as feature — noise removal improves signal-to-noise for connected memories
  • Diameter-enforced clustering — prevents semantic chaining (e.g., physics->biophysics->cooking = 1 cluster)
  • Multi-channel agreement — OOD detection requires multiple retrieval channels to agree
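For readers unfamiliar with ACT-R, the base-level activation of a memory is B = ln(Σ t_j^(-d)), where t_j is the time since the j-th access and d is the decay rate (0.5 by convention). The sketch below implements that equation directly. The `taggedDecay` helper is only a guess at what "log-scaled" synaptic tagging might look like, and the hour unit and the 0.1 factor are assumptions.

```typescript
/**
 * ACT-R base-level activation: B = ln( sum_j t_j^(-d) ).
 * agesInHours holds the time since each past access (unit assumed).
 * Returns -Infinity for a memory that was never accessed.
 */
function baseLevelActivation(agesInHours: number[], decay = 0.5): number {
  const sum = agesInHours.reduce(
    (acc, t) => acc + Math.pow(Math.max(t, 1e-6), -decay),
    0
  );
  return Math.log(sum);
}

/**
 * Assumed form of synaptic tagging: the decay rate shrinks with the
 * log of the access count, so heavily used memories fade more slowly.
 */
function taggedDecay(accessCount: number, baseDecay = 0.5): number {
  return baseDecay / (1 + 0.1 * Math.log1p(accessCount));
}
```

Two properties follow from the equation: a recent access contributes more than an old one, and every additional access raises activation, which is the "decay with time, strengthen with use" behavior described above.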

Architecture

src/
  core/             # Cognitive primitives
    embeddings.ts     - Local vector embeddings (BGE-small-en-v1.5, 384d)
    reranker.ts       - Cross-encoder passage scoring (ms-marco-MiniLM)
    query-expander.ts - Synonym expansion (flan-t5-small)
    salience.ts       - Write-time importance scoring (novelty + salience + reinforce-on-duplicate)
    decay.ts          - ACT-R temporal activation decay
    hebbian.ts        - Association strengthening/weakening
    logger.ts         - Append-only activity log (data/awm.log)
  engine/           # Processing pipelines
    activation.ts     - 10-phase retrieval pipeline (dual BM25, coref, agreement gate)
    consolidation.ts  - 7-phase sleep cycle (diameter clustering, direct bridging, synaptic tagging)
    connections.ts    - Discover links between memories
    staging.ts        - Weak signal buffer (promote or discard)
    retraction.ts     - Negative memory / corrections
    eviction.ts       - Capacity enforcement
  hooks/
    sidecar.ts        - Hook HTTP server (auto-checkpoint, stats, timer)
  storage/
    sqlite.ts         - SQLite + FTS5 persistence layer
  api/
    routes.ts         - HTTP endpoints (memory + task + system)
  mcp.ts            - MCP server (14 tools, incognito support)
  cli.ts            - CLI (setup, serve, hook config)
  index.ts          - HTTP server entry point (auto-backup on startup)

For detailed architecture including pipeline phases, database schema, and system diagrams, see docs/architecture.md.


Testing & Evaluation

Unit Tests

npx vitest run    # 77 tests (salience, decay, hebbian, supersession)

Eval Harness (v0.6.0)

npm run eval                        # All 4 benchmark suites
npm run eval -- --suite=retrieval   # Single suite
npm run eval -- --bm25-only         # Ablation: BM25 only
npm run eval -- --no-graph-walk     # Ablation: disable graph walk

Suites: retrieval (Recall@5), associative (multi-hop), redundancy (dedup F1), temporal (Spearman vs ACT-R). Ablation flags isolate each pipeline component's contribution.
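As a reference for reading the retrieval score, here is the usual definition of Recall@k. The eval harness may compute it differently (the README does not show its formula), so treat this as the textbook version, with result ids assumed to be unique within a ranking.

```typescript
/** Recall@k for one query: fraction of the relevant ids that appear in the top-k results. */
function recallAtK(ranked: string[], relevant: Set<string>, k: number): number {
  if (relevant.size === 0) return 0;
  const hits = ranked.slice(0, k).filter((id) => relevant.has(id)).length;
  return hits / relevant.size;
}

/** Mean Recall@k over a set of queries. */
function meanRecallAtK(
  runs: { ranked: string[]; relevant: Set<string> }[],
  k: number
): number {
  if (runs.length === 0) return 0;
  const total = runs.reduce((acc, r) => acc + recallAtK(r.ranked, r.relevant, k), 0);
  return total / runs.length;
}
```

Running the same metric with and without an ablation flag is what lets a single pipeline component's contribution be isolated.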

Full Test Suite

npm run test:mcp      # MCP protocol smoke test (5/5)
npm run test:self     # Pipeline component checks (94.1%)
npm run test:edge     # 9 adversarial failure modes
npm run test:stress   # 500 memories, 100 consolidation cycles (96.2%)
npm run test:workday  # 4-session production simulation (93.3%)
npm run test:ab       # AWM vs baseline comparison
npm run test:sleep    # Consolidation impact measurement
npm run test:tokens   # Token savings analysis (56.3% savings)
npm run test:pilot    # Production-like query validation (14/15)
npm run test:locomo   # LoCoMo industry benchmark (28.2%)

Environment Variables

| Variable | Default | Purpose |
|----------|---------|---------|
| AWM_PORT | 8400 | HTTP server port |
| AWM_DB_PATH | memory.db | SQLite database path |
| AWM_AGENT_ID | claude-code | Agent ID (memory namespace) |
| AWM_EMBED_MODEL | Xenova/bge-small-en-v1.5 | Embedding model (retrieval-optimized) |
| AWM_EMBED_DIMS | 384 | Embedding dimensions |
| AWM_RERANKER_MODEL | Xenova/ms-marco-MiniLM-L-6-v2 | Reranker model |
| AWM_HOOK_PORT | 8401 | Hook sidecar port |
| AWM_HOOK_SECRET | (none) | Bearer token for hook auth |
| AWM_API_KEY | (none) | Bearer token for HTTP API auth |
| AWM_INCOGNITO | (unset) | Set to 1 to disable all tools |
| AWM_COORDINATION | (unset) | Set to true to enable hive coordination endpoints |
| AWM_DISABLE_POOL_FILTER | (unset) | Set to 1 to disable the candidate pool reduction (0.7.7+). Reverts recall to scoring all active candidates; slower but useful for A/B testing if a recall regression appears |
| AWM_DISABLE_SLIM_CACHE | (unset) | Set to 1 to disable the in-memory slim cache (0.7.10+). Reverts to per-recall SQL fetch; slower but useful if cache invariants are suspected of drift |
| AWM_DISABLE_RERANK_SKIP | (unset) | Set to 1 to disable the reranker skip on clear-winner queries (0.7.10+). Forces every recall through the cross-encoder |
| AWM_DISABLE_EXPANSION_CACHE | (unset) | Set to 1 to disable the query expansion skip heuristic + LRU cache (0.7.11+). Forces every recall through the flan-t5-small expander |
| AWM_WORKSPACE | (unset) | Default workspace for cross-agent recall in hive setups |
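A custom agent that wraps AWM may want the same defaults in one typed object. The sketch below reads a handful of the variables above with their documented defaults; the `AwmConfig` shape and `loadConfig` name are hypothetical and are not part of the package.

```typescript
interface AwmConfig {
  port: number;
  dbPath: string;
  agentId: string;
  hookPort: number;
  incognito: boolean;
  apiKey: string | undefined;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): AwmConfig {
  // Fall back to the documented default when a value is missing or not numeric.
  const int = (value: string | undefined, fallback: number): number => {
    const n = Number.parseInt(value ?? "", 10);
    return Number.isNaN(n) ? fallback : n;
  };
  return {
    port: int(env["AWM_PORT"], 8400),
    dbPath: env["AWM_DB_PATH"] ?? "memory.db",
    agentId: env["AWM_AGENT_ID"] ?? "claude-code",
    hookPort: int(env["AWM_HOOK_PORT"], 8401),
    incognito: env["AWM_INCOGNITO"] === "1",
    apiKey: env["AWM_API_KEY"],
  };
}

// In Node.js this would typically be called as loadConfig(process.env).
```

Passing the environment in as a parameter keeps the function easy to test and avoids reading global state at import time.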

Tech Stack

| Component | Technology |
|-----------|-----------|
| Language | TypeScript (ES2022, strict) |
| Database | SQLite via better-sqlite3 + FTS5 |
| HTTP | Fastify 5 |
| MCP | @modelcontextprotocol/sdk |
| ML Runtime | @huggingface/transformers (local ONNX) |
| Embeddings | BGE-small-en-v1.5 (BAAI, retrieval-optimized, 384d) |
| Reranker | ms-marco-MiniLM-L-6-v2 (cross-encoder) |
| Query Expansion | flan-t5-small (synonym generation) |
| Tests | Vitest 4 |
| Validation | Zod 4 |

All three ML models run locally via ONNX. No external API calls for retrieval. The entire system is a single SQLite file + a Node.js process.

What's New in v0.7.16

  • awm setup --global template now teaches write quality. Two new sections in AWM_INSTRUCTION_CONTENT:

    • Writing for recall — explicit guidance that recall quality is determined at write time. Lead with the rule/fact, pick the most specific topic, include 2+ retrievable identifiers (file paths, function names, IDs), write in the vocabulary of the future query, reserve canonical for stable invariants, include the why for feedback memories.
    • Recall strategy — formalizes the multi-query reformulation pattern observed in practice. When one query returns nothing, agents reformulate (synonyms, more specific nouns, exact identifiers). Recall is ~300ms — two-three reformulations cost less than one filesystem search. Cap at three to prevent loops.

    These document the writer + reader behaviors AWM was always designed around but were previously implicit. No retriever change — pure system-prompt improvement. Run npm install -g agent-working-memory@latest && awm setup --global to apply.

  • LongMemEval headline number updated. Re-running the benchmark on 0.7.16 (single-session-user, 50 questions, same adapter as the original 0.7.1 baseline): 68% accuracy with gpt-4o-mini, up from the original 40-50%. Recall latency 0.12s avg (was 7-11s on 0.7.2). Multi-tier reader sweep on the same memory inputs:

    • gpt-4o-mini (cheap, non-thinking): 68%
    • gpt-4o (strong, non-thinking): 68%
    • o4-mini (cheap, thinking): 78%
    • gpt-5-mini (mid, thinking): 80%

    Non-thinking models cap at 68% on this category — the bottleneck is reasoning over recalled context, not raw scale. Thinking models add 10-12pp. Memory quality is fixed; reader determines the ceiling.

What's New in v0.7.15

  • Documentation refresh: awm setup --global now writes a CLAUDE.md template that documents all four perf env-var escape hatches (AWM_DISABLE_POOL_FILTER, AWM_DISABLE_SLIM_CACHE, AWM_DISABLE_RERANK_SKIP, AWM_DISABLE_EXPANSION_CACHE) instead of just the first one. Troubleshooting / quickstart / user-guide docs updated to reflect the current ~300ms recall floor. No code change; the version was bumped solely so the new template ships via npm install -g agent-working-memory@latest.

What's New in v0.7.14

  • Recall latency 0.4-0.8s → 0.3-0.6s (~25-50% on top of 0.7.13) — three fixes:

    1. Batched cross-encoder inference — reranker now tokenizes + runs all query-passage pairs in one batched forward pass. 15-passage rerank: 210ms → 27ms (~7×).
    2. Truncate passages to 400 chars before rerank — cross-encoder has 512-token max anyway and pads to the longest passage; full content (5000+ chars) meant everything padded to max length. Truncation drops tokenization + inference 3-4× on long memory pools.
    3. Eager slim-cache populate at startup — first user recall no longer pays the ~600ms cache populate cost.

    Recall quality A/B: 8/8 top-1, 4.50/5 top-5. Cumulative since 0.7.4 baseline: 11s → 0.3-0.6s (~25-37× faster).

What's New in v0.7.13

  • Reranker pool size reduction — cross-encoder pool dropped from max(limit*3, 30) to max(limit*2, 15). For typical agent queries (limit=5 or 10), that's 15-20 candidates reranked instead of 30, halving the cross-encoder cost. Top-K quality preserved (8/8 top-1, identical top-5/top-10 overlap) — reranking the 21st-30th candidates was wasted when the user only wants top-5 anyway.
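The two pool-size formulas quoted above are easy to compare side by side. The function names below are made up for the example; only the `max(...)` expressions come from the release note.

```typescript
// Pool size before 0.7.13: max(limit*3, 30)
const oldPoolSize = (limit: number): number => Math.max(limit * 3, 30);

// Pool size from 0.7.13 on: max(limit*2, 15)
const newPoolSize = (limit: number): number => Math.max(limit * 2, 15);

// Typical agent queries use limit=5 or limit=10.
const comparison = [5, 10].map((limit) => ({
  limit,
  before: oldPoolSize(limit),
  after: newPoolSize(limit),
}));
```

For limit=5 the pool drops from 30 to 15 candidates, and for limit=10 from 30 to 20, which matches the "15-20 candidates instead of 30" figure in the note.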

What's New in v0.7.12

  • Recall latency 0.9s → 0.4-0.8s (~40-60% on top of 0.7.11) — phase-breakdown showed getAssociationsForBatch over ~300 survivors was 222ms (25% of remaining floor) but the scoring loop only reads count + sumWeight from each engram's edges. New getAssociationStatsForBatch returns scalar stats via a single GROUP BY aggregate. Graph walk still uses full associations, but only on top-N (~30) so its lookups are cheap. Recall quality A/B: 8/8 top-1, 4.50/5 top-5. Cumulative since 0.7.4 baseline: 11-23s → 0.4-0.8s (~25× faster median).

What's New in v0.7.11

  • Query expansion skip + LRU cache — flan-t5-small was 164ms per recall (18% of post-0.7.10 floor). Two fixes in core/query-expander.ts: (1) skip heuristic for long/specific queries (>50 chars OR ≥5 distinct meaningful tokens), and (2) 500-entry LRU cache for repeated queries. ~30% of typical agent recalls hit the skip; repeated recalls hit the cache. Avg savings: ~100-150ms per recall. Recall quality A/B: 8/8 top-1, 4.63/5 top-5. Disable via AWM_DISABLE_EXPANSION_CACHE=1.
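The two ideas in this release, a skip heuristic and a bounded LRU cache, can be sketched in a few lines. The thresholds (more than 50 characters, at least 5 distinct meaningful tokens, 500 entries) come from the note above; the stopword list, the tokenizer, and the class shape are assumptions, since core/query-expander.ts is not shown here.

```typescript
// Assumed stopword list; the README does not define "meaningful tokens".
const STOPWORDS = new Set(["the", "a", "an", "of", "to", "in", "is", "for", "and", "how", "do", "i"]);

/** Skip expansion for long or already-specific queries. */
function shouldSkipExpansion(query: string): boolean {
  if (query.length > 50) return true;
  const tokens = query
    .toLowerCase()
    .split(/[^a-z0-9_]+/)
    .filter((t) => t.length > 0 && !STOPWORDS.has(t));
  return new Set(tokens).size >= 5;
}

/** Minimal LRU cache built on Map insertion order. */
class LruCache<V> {
  private readonly map = new Map<string, V>();
  private readonly capacity: number;

  constructor(capacity = 500) {
    this.capacity = capacity;
  }

  get(key: string): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key) as V;
    this.map.delete(key); // re-insert so the key becomes most recently used
    this.map.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      const oldest = this.map.keys().next().value; // first key is least recently used
      if (oldest !== undefined) this.map.delete(oldest);
    }
  }
}
```

A caller would check `shouldSkipExpansion` first, then the cache, and only run the expander model on a miss.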

What's New in v0.7.10

  • Recall latency 1.4s → 0.9s median (~35% on top of 0.7.9) — two more fixes after phase-breakdown showed the slim fetch was still 310ms (Buffer→Float32Array on every recall) and the reranker was 354ms (40% of remaining cost):

    1. In-memory slim cache: Map<id, SlimCacheEntry> populated once per process, mutated in lock-step with engram writes/updates/retracts. Slim fetch 306ms → 5ms with warm cache (~60×). Disable via AWM_DISABLE_SLIM_CACHE=1. Memory cost ~15MB at 10K engrams.
    2. Reranker skip on clear winners — when BM25 has a clear top-1 (textMatch ≥ 0.8, ≥1.5× the next score, small pool), skip the cross-encoder. Saves ~300ms on confident queries. Disable via AWM_DISABLE_RERANK_SKIP=1.

    Quality preserved: 8/8 top-1, 4.63/5 top-5, 9.75/10 top-10 on the A/B suite. Cumulative since 0.7.4 baseline: 11s → 0.9s (~12-15× faster).
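The clear-winner check for skipping the reranker can be expressed as a small predicate. The 0.8 and 1.5× conditions are taken from the note above; the pool-size cutoff is an assumption, because the note only says "small pool", and the function name is hypothetical.

```typescript
// Assumed cutoff for what counts as a "small pool".
const SMALL_POOL = 20;

/**
 * Returns true when the best text-match score is high (>= 0.8) and at least
 * 1.5x the runner-up, so cross-encoder reranking is unlikely to change top-1.
 */
function canSkipRerank(textMatchScores: number[]): boolean {
  if (textMatchScores.length === 0 || textMatchScores.length > SMALL_POOL) return false;
  const sorted = [...textMatchScores].sort((a, b) => b - a);
  const [top = 0, next = 0] = sorted;
  return top >= 0.8 && top >= 1.5 * next;
}
```

With scores of 0.9 and 0.5 the reranker would be skipped, while 0.9 and 0.7 would still go through the cross-encoder because the margin is too small.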

What's New in v0.7.9

  • Recall latency 1.6s → 1.0s end-to-end (~30% on top of 0.7.7): phase-breakdown showed the full SELECT over 10K engrams was the new bottleneck (440ms / 40% of recall) due to row materialization of content/tags/JSON for rows the pre-filter doesn't read. Two-pass fetch: slim (id, concept, embedding) for the cosine + filter pass, then hydrate only the survivors via getEngramsByIds. Recall quality A/B verified 8/8 top-1 identical, top-K overlap slightly improved (4.75/5 vs 4.50). Cumulative since 0.7.4 baseline: 11-23s → 1.0-1.6s (~10-20× faster).

What's New in v0.7.8

  • Install template updated for the 0.7.5/0.7.6/0.7.7 behaviors: awm setup now writes a richer CLAUDE.md that teaches agents about memory classes (canonical | working | ephemeral), salience auto-promotion patterns (detectUserFeedback for stakeholder quotes, detectVerifiedFinding for operational records with action-verb + concrete IDs), and the new env-var escape hatches. Existing installs upgrade via npm install -g agent-working-memory@latest && awm setup --global then restart Claude Code. No functional code change in this release; the version was bumped solely so the new template ships.

What's New in v0.7.7

  • Recall latency 2.5s → 1.0s end-to-end (~50% on top of 0.7.6) — phase-breakdown spike showed that after the 0.7.6 BM25 fix, the new bottleneck was getAssociationsForBatch over all ~10K candidates (68% of recall latency). Added a cheap pre-filter before deep scoring: candidates survive only if they have a BM25 hit, a cosine z-score above the gate, or concept-token overlap with the query. From ~10K candidates → typically 100-300 survivors. Graph-walk correctness preserved (it only boosts neighbors with textMatch >= 0.05, which would also pass this filter). Recall quality A/B verified: 8/8 top-1 matches, 90% top-5 overlap, 94% top-10 overlap on diverse queries. Set AWM_DISABLE_POOL_FILTER=1 to revert. Cumulative since 0.7.4 baseline: 11-23s → 0.9-1.6s (~10-15× faster).
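The pre-filter described above keeps a candidate if any one of three cheap signals fires. The sketch below shows that logic on an in-memory list. The `Candidate` shape, the z-score gate value, and the use of the population standard deviation are assumptions; the README does not state the gate.

```typescript
interface Candidate {
  id: string;
  bm25Hit: boolean;
  cosine: number; // similarity to the query embedding
  conceptTokens: string[];
}

// Assumed gate value, for illustration only.
const Z_GATE = 1.0;

/** Keep candidates with a BM25 hit, a high cosine z-score, or concept-token overlap. */
function preFilter(candidates: Candidate[], queryTokens: Set<string>): Candidate[] {
  const n = candidates.length;
  if (n === 0) return [];
  const mean = candidates.reduce((s, c) => s + c.cosine, 0) / n;
  const variance = candidates.reduce((s, c) => s + (c.cosine - mean) ** 2, 0) / n;
  const std = Math.sqrt(variance) || 1; // avoid dividing by zero when all scores are equal
  return candidates.filter(
    (c) =>
      c.bm25Hit ||
      (c.cosine - mean) / std > Z_GATE ||
      c.conceptTokens.some((t) => queryTokens.has(t))
  );
}
```

Because each test is cheap, the filter can run over the whole candidate set before the more expensive association lookups are done for the survivors.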

What's New in v0.7.6

  • Recall latency 11-23s → 2.5s end-to-end (~5× faster) — measurement spike found the slow path was a SQLite query-plan trap, not vector search. The BM25 query JOIN engrams_fts ON e.rowid + WHERE MATCH + ORDER BY rank LIMIT N materialized all matching rows (with 1.5KB embedding blobs) before the LIMIT applied. CTE prefilter forces FTS5 LIMIT first, then joins only the top-K rowids. Same SQLite, same data, same results — 567× faster for wide OR queries (3682ms → 6.5ms verified). Also added getAssociationsForBatch to replace the per-candidate N+1 in the activation scoring loop. Top-K results are byte-identical to the old query (verified by the equivalence test in spike/).
  • Salience filter — auto-promote verified operational records — operational batch summaries (e.g., "Submitted 6 events 2026-05-07 — IDs 18969, 18971…") were being discarded at salience 0.14 because BM25 novelty couldn't distinguish "useful new operational record" from "duplicate observation" when topic terminology repeated. New detectVerifiedFinding() pattern detector parallel to detectUserFeedback(): requires action-verb header (Submitted/Finalized/Completed/Reconciled/Triaged/etc.) plus ≥2 concrete identifiers (ISO date or contextual numeric ID). Matched memories get a 0.45 salience floor (active disposition, not canonical). 7 new tests, 23 salience tests pass.
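The verified-finding rule (action-verb header plus at least two concrete identifiers, then a 0.45 floor) can be approximated with two regular expressions. This is a sketch, not the package's detectVerifiedFinding(): the verb list is limited to the five verbs named above (the real list is longer), and counting standalone numbers of four or more digits as "contextual numeric IDs" is an assumption.

```typescript
// Verbs named in the release note; the real detector accepts more.
const ACTION_HEADER = /^(Submitted|Finalized|Completed|Reconciled|Triaged)\b/;
const ISO_DATE = /\b\d{4}-\d{2}-\d{2}\b/g;
// Assumption: a standalone number with 4+ digits counts as a numeric ID.
const NUMERIC_ID = /\b\d{4,}\b/g;

const SALIENCE_FLOOR = 0.45;

function looksLikeVerifiedFinding(content: string): boolean {
  if (!ACTION_HEADER.test(content)) return false;
  const dates = content.match(ISO_DATE) ?? [];
  // Remove dates first so their year part is not counted again as an ID.
  const withoutDates = content.replace(ISO_DATE, " ");
  const ids = withoutDates.match(NUMERIC_ID) ?? [];
  return dates.length + ids.length >= 2;
}

/** Raise salience to the floor for matching records; leave everything else alone. */
function applyFloor(salience: number, content: string): number {
  return looksLikeVerifiedFinding(content) ? Math.max(salience, SALIENCE_FLOOR) : salience;
}
```

Under these assumptions, a record like "Submitted 6 events 2026-05-07, IDs 18969, 18971" scored at 0.14 would be lifted to 0.45, while a plain "Submitted the form" would stay at its original score.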

What's New in v0.7.4

  • Channel push telemetry — new GET /telemetry/channels JSON endpoint and Prometheus counters (coord_channel_push_attempts_total, ..._delivered_total, ..._failed_total{reason}, ..._no_session_total, ..._fallback_mailbox_total, ..._session_disconnects_total). Surfaces real delivery rate so coordination reliability can be measured rather than guessed.
  • Role-based /channel/push addressing — accepts {role, workspace, message} as alternative to {agentId, message}. Server resolves role+workspace to most-recently-seen alive agent. Lets workers notify the coordinator without hardcoding its UUID (which changes across coordinator restarts). Enables event-driven worker → coordinator hand-off in place of fragile coordinator self-polling.
  • /checkin writes role on every call — previously the UPDATE on existing rows preserved a stale role from initial registration; now agents can correct their own role via re-checkin.
  • /workers JOINs channel sessions: the alive field is now recent_pulse OR connected_channel_session. Stops false-dead duplicate-spawn loops where a busy worker's /pulse went stale during long tool sequences while their channel-server stayed reachable.
  • cleanupStale runs on a 5-minute schedule — was only invoked manually; now zombie agents get marked dead automatically with a 600s threshold (forgiving for long edits).
  • user_feedback salience event type — new event type with bonus 0.3 (highest of any). Auto-detect heuristic on memory_write content matching ^(Robert|Katherine|Nancy|...) (said|verbatim|directed|decided|...) forces memoryClass='canonical' so user-stated decisions can't be discarded by the BM25 novelty floor in populated DBs.

v0.7.3

  • Salience filter production tuning — fixed BM25 novelty floor that was discarding ~17% salience for most writes in 10K+ engram DBs. Quadratic dampening curve (max(0.05, 1 - topScore²)); concept-match penalty scoped to last 30 days; floor lowered 0.10 → 0.05.
  • Maintenance scripts for backup pruning + lme/bench database cleanup.
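The quadratic dampening curve from the salience tuning note is small enough to write out. The function name is hypothetical, and the sketch assumes `topScore` is the best BM25 match normalized to [0, 1], which the note implies but does not state.

```typescript
/** Novelty from the best existing match: max(0.05, 1 - topScore^2). */
function noveltyFromTopScore(topScore: number): number {
  return Math.max(0.05, 1 - topScore * topScore);
}

// Sample points along the curve.
const noveltySamples = [0, 0.5, 0.9, 1].map((s) => ({
  topScore: s,
  novelty: noveltyFromTopScore(s),
}));
```

A moderate match (0.5) still leaves novelty at 0.75, a strong match (0.9) brings it down to roughly 0.19, and only a near-exact match reaches the 0.05 floor. Compared with a linear 1 - topScore, the quadratic form penalizes partial overlap much less.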

v0.7.2

  • Workspace recall fix (was returning UUIDs not names in v0.7.1 release).

v0.7.1

  • Agent-provided metadata tags: memory_write accepts project, topic, source, confidence_level, session_id, intent. Stored as searchable prefixed tags (proj=X, sid=Z). Session ID tags alone improved LongMemEval recall 3x.
  • Dual synthesis — consolidation creates two types of summary memories: session summaries (tag-based, for perfect recall) and pattern syntheses (cross-session, for novel recall/creative connections).
  • Bulk write + supersession: POST /memory/write-batch for batch ingestion with POST /memory/supersede for knowledge updates.
  • LongMemEval benchmark — adapter built, baseline established at 40-50% with gpt-4o-mini.

v0.7.0

  • Workspace-scoped recall, validation-gated Hebbian (Kairos), multi-graph traversal (MAGMA), power-law edge decay (DASH).

v0.6.1

  • Embedding version tracking, batch backfill, deeper retraction propagation, retrieval timeouts, channel push delivery.

v0.6.0

  • Memory taxonomy — memories classified as episodic, semantic, procedural, or unclassified. Auto-classified on write. Filter by type on recall.
  • Query-adaptive retrieval — pipeline adapts to query type: targeted | exploratory | balanced | auto.
  • Decision propagation — decisions broadcast to coordination layer for cross-agent discovery.
  • Eval harness: npm run eval benchmarks retrieval, associative, redundancy, and temporal performance.
  • DB hardening — busy_timeout, integrity check on startup, hot backups every 10 min, WAL checkpoint on shutdown.

See CHANGELOG.md for full details.

Project Status

AWM is in active development (v0.7.17). The core memory pipeline, consolidation system, multi-agent coordination, and MCP integration are stable and used daily in production coding workflows.

  • Core retrieval and consolidation: stable
  • MCP tools and Claude Code integration: stable
  • Multi-agent coordination: stable (v0.6.0)
  • Task management: stable
  • Hook sidecar and auto-checkpoint: stable
  • HTTP API: stable (for custom agents)
  • Eval harness: stable (v0.6.0)

See CHANGELOG.md for version history.


License

Apache 2.0 — see LICENSE and NOTICE.