agent-working-memory

v0.11.0

Published

16 days ago

Cognitive memory layer for AI agents â€” activation-based retrieval, salience filtering, associative connections

0High
0Medium
0Low

completeideas

AgentWorkingMemory (AWM)

Persistent working memory for AI agents.

AWM helps agents retain important project knowledge across conversations and sessions. Instead of storing everything and retrieving by similarity alone, it filters for salience, builds associative links between related memories, and periodically consolidates useful knowledge while letting noise fade.

Use it through Claude Code via MCP or as a local HTTP service for custom agents. Everything runs locally: SQLite + ONNX models + Node.js. No cloud, no API keys.

Without AWM

Agent forgets earlier architecture decision
Suggests Redux after project standardized on Zustand
Repeats discussion already settled three days ago
Every new conversation starts from scratch

With AWM

Recalls prior state-management decision and rationale
Surfaces related implementation patterns from past sessions
Continues work without re-asking for context
Gets more consistent the longer you use it

Quick Start

Node.js 20+ required — check with node --version.

npm install -g agent-working-memory
awm setup --global

Restart Claude Code. That's it — 16 memory tools appear automatically.

Upgrading

npm install -g agent-working-memory@latest
awm setup --global          # Updates MCP config, CLAUDE.md instructions, and hooks

Restart Claude Code after upgrading. Your existing memory database is preserved — all upgrades are backward compatible. New features (metadata tags, workspace recall, synthesis) are opt-in.

From v0.6.x → v0.7.x: The memory_write tool now accepts optional metadata parameters (project, topic, session_id, etc.) that improve recall quality. Re-running awm setup --global updates your CLAUDE.md with instructions for the agent to use them.

First conversation will be ~30 seconds slower while ML models download (~200MB total, cached locally). After that, everything runs on your machine.

For isolated memory per folder, see Separate Memory Pools. For team onboarding, see docs/quickstart.md.

Starting on an existing project? Warm-start the store from its own docs so recall is useful immediately: awm onboard ./docs --repo . --project <name> → review the pack → awm import <pack> --db <path> --dedupe. See What's New in v0.11.0.

Who this is for

Long-running coding agents that need cross-session project knowledge
Multi-agent workflows where specialized agents share a common memory
Local-first setups where cloud memory is not acceptable
Teams using Claude Code who want persistent context without manual notes

What this is not

Not a chatbot UI
Not a hosted SaaS
Not a generic vector database
Not a replacement for your source of truth (code, docs, tickets)

Why it's different

Most "memory for AI" projects are vector databases with a retrieval wrapper. AWM goes further:

| | Typical RAG / Vector Store | AWM | |---|---|---| | Storage | Everything | Salience-filtered with low-confidence fallback (novel events go active, borderline enter staging, low-salience stored at reduced confidence) | | Retrieval | Cosine similarity | 10-phase pipeline: dual BM25 (keyword + expanded) + vectors + reranking + graph walk + decay + coref expansion | | Connections | None | Hebbian edges that strengthen when memories co-activate | | Over time | Grows forever, gets noisier | Consolidation: diameter-enforced clustering, cross-topic bridges, synaptic-tagged decay | | Forgetting | Manual cleanup | Cognitive forgetting: unused memories fade, reinforced knowledge persists (access-count modulated) | | Feedback | None | Useful/not-useful signals tune confidence and retrieval rank | | Correction | Delete and re-insert | Retraction: wrong memories invalidated, corrections linked, penalties propagate (depth 2, decaying) | | Graph | None or single graph | Multi-graph: semantic, temporal, causal, entity — independent traversal with fused scoring | | Learning | Unconditional co-activation | Validation-gated: edges strengthen only on positive feedback (Kairos-inspired) | | Noise rejection | None | Multi-channel agreement gate: requires 2+ retrieval channels to agree before returning results | | Duplicates | Stored repeatedly | Reinforce-on-duplicate: near-exact matches boost existing memory instead of creating copies |

The design is based on cognitive science — ACT-R activation decay, Hebbian learning, complementary learning systems, synaptic homeostasis, and synaptic tagging — rather than ad-hoc heuristics. See How It Works and docs/cognitive-model.md for details.

New to AWM? docs/pipeline-walkthrough.html is a visual, plain-language walkthrough (no background required) — what happens when AWM learns and recalls a fact, why it's built this way, and how it differs from a plain vector store. Open it in a browser.

Build an agent on it: the AWM-Native Agent Harness pattern shows how to use AWM as an always-on cognitive substrate (not a tool the model calls) so the agent learns automatically by working — letting a cheap model perform at a high level and get cheaper + better over time. Measured: gpt-5.4-mini + AWM beat a frontier model on a domain workload at ~1/40th the cost.

For builders & researchers: docs/awm-for-agents.html is the agent playbook — why AWM exists (the context-window wall), the PRIME→ACT→VERIFY→LEARN harness, the full agent feature surface (workspace, session IDs, bearer-token hooks, supersede/feedback), how multi-hop is solved in the harness, and the honest gauntlet findings (where AWM wins, ties, and what isn't measured yet). Open it in a browser.

Why it matters at scale

The reason AWM exists: past roughly half a million tokens, you can no longer keep a large project's context alive by carrying it. The codebase, the docs, the decision history, and the meeting/work transcripts outgrow every model's window — and summarizing to fit silently drops the fact you needed next.

These figures come from real-world use on a large software platform project, where a single work agent has accumulated 20,000+ memories over a multi-million-token codebase and documentation set:

| To answer one question, carry… | tokens | AWM scoped recall | |---|---|---| | the accumulated memory (~20K memories) | ~1.3M | ~630, flat | | the project's notes & transcript docs | ~2M | ~630, flat | | the whole system (code + docs) | ~29M — fits in no window, any tier | ~630, flat |

A scoped recall answers from the relevant slice, independent of how large the store grows. Measured consequences on real questions against the real project:

~2,000× fewer tokens per query than carrying the memory store — and ~5× fewer than opening the single best-matching documentation file (a floor; agents usually open several and still miss cross-file facts).
At scale, "carry everything" isn't an option. At ~20K memories no context window holds it, so retrieval is not an optimization — it is the only door. A static notes file or long-context approach is forced to truncate, which silently drops facts.

Two structural advantages a file or a flat vector store cannot match:

Staleness is tracked. When a fact changes, memory_supersede retires the old value and recall stops returning it — the system knows what changed. A notes file or repo goes stale silently; you would re-scan everything to find out. (This work agent has superseded and retracted dozens of facts as the project moved.)
Dead weight costs nothing. ~90% of accumulated memories are never recalled for a given task — a notes file pays for all of them in every prompt; recall pays for ≈zero.

Honest about the trade-offs

AWM does not win on small, one-shot tasks — write/recall overhead exceeds the savings until knowledge is reused or the corpus grows past what fits in context.
Recall is not free: a few seconds of latency per query buys the token reduction.
Recall accuracy is bounded by what was written — write quality matters (lead with the fact; tag with identifiers like file, table, ticket).
It does not replace your source of truth. The intended pattern is: recall first, read/grep the code for ground truth on a miss, and supersede when reality differs.

Benchmarks

Two kinds of tests, both reproducible (see Testing & Evaluation). First, recall quality — does the pipeline return the right memory? Second, behavior under stress — does it stay honest, filter noise, and hold up as the store grows and ages? Numbers below were re-run on the 0.9-staged line (2026-06-17).

1 · Recall quality (eval harness)

Each suite has a pass threshold; all four pass.

| Suite | Score | Threshold | What it measures | |-------|-------|-----------|------------------| | Retrieval | Recall@5 = 0.980 | ≥ 0.80 | 200 facts, 50 queries — does the BM25 + vector + reranker pipeline surface the right fact in the top 5? | | Associative | success@10 = 1.000 | ≥ 0.70 | 20 multi-hop causal chains — does the graph walk find non-obvious connections? | | Redundancy | dedup F1 = 0.966 | ≥ 0.80 | 50 clusters × 4 paraphrases — does consolidation merge duplicates without losing the original? | | Temporal | Spearman = 0.932 | ≥ 0.75 | 25 facts with controlled age/access — does ACT-R decay rank recent/used memories ahead of stale ones? |

2 · Behavior under stress & adversarial conditions

These are graded suites (not pass/fail). The headline risk they guard against is a memory system that confidently returns the wrong thing — so the weakest area is called out, not hidden.

| Suite | Score | What it measures | |-------|-------|------------------| | test:run (unit) | 569 / 569 | Salience, decay, Hebbian, supersession, coordination, scheduler | | test:self | 93.9% (EXCELLENT) | Every cognitive subsystem end-to-end; weakest = exact-topic retrieval | | test:workday | 85.4% (GOOD) | A realistic mixed day — 43 memories across 4 projects, cross-cutting queries; weakest = noise filtering | | test:edge | ~32 / 34 | Named failure modes: identity collision, contradiction trapping, bridge overshoot, false generalization | | test:ab | AWM 10 / 11 vs keyword baseline 8 / 11 | Where the cognitive pipeline beats plain keyword search | | test:pilot | 14 / 15 (5/5 noise rejected) | Production-like queries that must reject planted distractors | | test:locomo | 25.7% | LoCoMo conversational-memory benchmark (a chatbot benchmark — see note) | | test:mcp | 5 / 5 | MCP protocol smoke: write, recall, feedback, retract, stats |

On LoCoMo (25.7%): LoCoMo measures chatbot recall ("what did we say about X" across long conversations). It is not the workload AWM is tuned for (productivity / engineering, staying on topic, rejecting noise), and ~66% of AWM's misses there are retriever-coverage (the gold turn isn't in the top-10), not extraction. The 0.9 recall work lifted it from 22.7% with every category up. We report it for comparability, not as the headline.

3 · The sleep cycle (consolidation)

The sleep cycle is AWM's offline maintenance pass (the term is borrowed from how human memory consolidates during sleep). On each cycle it clusters related memories, builds cross-topic bridges, strengthens co-used edges, decays unused ones, and prunes duplicates. You run it so the association graph stays healthy and navigable as the store grows — without it, edges accumulate into noise.

Reading the score: test:sleep = 78.6% is a consolidation-quality score — it asks "after the maintenance pass, is recall at least as good and is the structure better?" It is not recall falling to 78.6%. In this fixture recall is held flat across three cycles (78.6% before = 78.6% after) while the graph reorganizes. The scaling picture is the real proof:

| Under a 100-cycle stress run | Observed | |---|---| | Recall across cycles | holds 90–100% (no catastrophic forgetting) | | Cross-topic recall | ~80%, stable | | Graph self-pruning | edges grow to ~2,300 then prune back to ~1,500 as unused links decay | | Clusters / bridges per cycle | ~10 clusters, bridges formed early then settle |

So consolidation protects recall over the long run — the per-cycle score measures the health of the maintenance, and the stress run shows recall doesn't degrade.

4 · Token economics — honest

The win that matters is structural: at the scale AWM targets you can't carry the project at all (see Why it matters at scale). On real coding sessions, scoped recall costs 9.8× less in aggregate than the Read/Grep/Glob rediscovery it replaces (scripts/measure-claude-vs-awm.ts).

The per-turn micro-benchmark (test:tokens) reports against two baselines, because the baseline you pick is the result:

vs carrying the full history (what a memoryless agent must actually do — it can't know which past turn matters): +67% savings at 97.5% recall accuracy. This is the honest, apples-to-apples number.
vs an oracle that pre-scoped context to the exactly-relevant task: ≈ −13%. A deliberately brutal bar — it gives the baseline the very scoping that retrieval exists to do — and on a tiny 6–8-turn task a fixed top-5 recall is break-even-to-negative by construction.

An earlier build reported ~56% on the oracle bar, but that was an artifact: pre-v0.8.5, reinforce-on-duplicate silently discarded memory content, so recalls were artificially tiny. v0.8.5 fixed the data loss (accuracy ~72% → 97.5%); better recall now fills all five slots, which lowers the oracle-bar number while raising correctness. Net: the at-scale structural win above is the real story; the oracle bar shows AWM roughly matches perfect manual scoping even on a corpus far too small to play to its strengths.

Features

Memory Tools (16)

| Tool | Purpose | |------|---------| | memory_write | Store a memory (salience filter + reinforce-on-duplicate) | | memory_recall | Retrieve relevant memories by context (dual BM25 + coref expansion) | | memory_feedback | Report whether a recalled memory was useful | | memory_retract | Invalidate a wrong memory with optional correction | | memory_supersede | Replace outdated memory with current version | | memory_stats | View memory health metrics and activity | | memory_checkpoint | Save execution state (survives context compaction) | | memory_restore | Recover state + relevant context at session start | | memory_task_add | Create a prioritized task | | memory_task_update | Change task status/priority | | memory_task_list | List tasks by status | | memory_task_next | Get the highest-priority actionable task | | memory_task_begin | Start a task — auto-checkpoints and recalls context | | memory_task_end | End a task — writes summary and checkpoints | | compress_output | Encode a structured tool output as TOON — ~50-65% fewer tokens, lossless, output-only | | retrieve_original | Get the verbatim source back for a compress_output ref |

Separate Memory Pools

By default, all projects share one memory pool. For isolated pools per folder, place a .mcp.json in each parent folder with a different AWM_AGENT_ID:

C:\Users\you\work\.mcp.json          -> AWM_AGENT_ID: "work"
C:\Users\you\personal\.mcp.json      -> AWM_AGENT_ID: "personal"

Claude Code uses the closest .mcp.json ancestor. Same database, isolation by agent ID.

Incognito Mode

AWM_INCOGNITO=1 claude

Registers zero tools — Claude doesn't see memory at all. All other tools and MCP servers work normally.

Auto-Checkpoint Hooks

Installed by awm setup --global:

Stop — reminds Claude to write/recall after each response
PreCompact — auto-checkpoints before context compression
SessionEnd — auto-checkpoints and consolidates on close
15-min timer — silent auto-checkpoint while session is active

Auto-Backup

The HTTP server automatically copies the database to a backups/ directory on startup with a timestamp. Cheap insurance against data loss.

Activity Log

tail -f "$(npm root -g)/agent-working-memory/data/awm.log"

Real-time: writes, recalls, reinforcements, checkpoints, consolidation, hook events.

Activity Stats

curl http://127.0.0.1:8401/stats

Returns daily counts: {"writes": 8, "recalls": 9, "hooks": 3, "total": 25}

Memory Invocation Strategy

AWM combines deterministic hooks for guaranteed memory operations at lifecycle transitions with agent-directed usage during active work.

Deterministic triggers (always happen)

| Event | Action | |-------|--------| | Session start | memory_restore — recover state + recall context | | Pre-compaction | Auto-checkpoint via hook sidecar | | Session end | Auto-checkpoint + full consolidation | | Every 15 min | Silent auto-checkpoint (if active) | | Task start | memory_task_begin — checkpoint + recall | | Task end | memory_task_end — summary + checkpoint |

Agent-directed triggers (when these situations occur)

Write memory when:

A project decision is made or changed
A root cause is discovered
A reusable implementation pattern is established
A preference, constraint, or requirement is clarified
A prior assumption is found to be wrong

Recall memory when:

Starting work on a new task or subsystem
Re-entering code you haven't touched recently
After context compaction
After a failed attempt (check if there's prior knowledge)
Before refactoring or making architectural changes

Retract when:

A stored memory turns out to be wrong or outdated

Feedback when:

A recalled memory was used (useful) or irrelevant (not useful)

HTTP API

For custom agents, scripts, or non-Claude-Code workflows:

awm serve                    # From npm install
npx tsx src/index.ts         # From source

Write a memory:

curl -X POST http://localhost:8400/memory/write \
  -H "Content-Type: application/json" \
  -d '{
    "agentId": "my-agent",
    "concept": "Express error handling",
    "content": "Use centralized error middleware as the last app.use()",
    "eventType": "causal",
    "surprise": 0.5,
    "causalDepth": 0.7
  }'

Recall:

curl -X POST http://localhost:8400/memory/activate \
  -H "Content-Type: application/json" \
  -d '{
    "agentId": "my-agent",
    "context": "How should I handle errors in my Express API?"
  }'

Substrate primitives (new in 0.8)

For long-running structured projects — novels, codebases, investigations, design docs — where the agent needs to track typed state across hundreds of writes without polluting cognitive retrieval. Full reference at docs/reference.md.

# "Latest emotional state per character" — one round trip
curl -X POST http://localhost:8400/memory/latest-by-tag -d '{
  "agentId": "novel-x", "tagKey": "character=",
  "scopeTagsAll": ["topic=emotional-state"], "sortBy": "sequence"
}'

# "Top 40 active promises by weight, excluding resolved" — filter + sort native
curl -X POST http://localhost:8400/memory/top-by -d '{
  "agentId": "novel-x", "sortField": "weight=", "order": "desc",
  "filterTagsAll": ["topic=promise", "state=active"],
  "filterTagsNone": ["kind=advancement"], "limit": 40
}'

# Atomic write-and-supersede by concept match (Form B)
curl -X POST http://localhost:8400/memory/supersede -d '{
  "agentId": "novel-x",
  "matchConcept": "Mara's deferred disclosure",
  "newEngram": {
    "concept": "Mara's disclosure — RESOLVED in Ch 3",
    "content": "...", "memory_class": "structural"
  }
}'

# Race-free chronology
curl http://localhost:8400/memory/sequence/novel-x/next

New memory_class: "structural" keeps high-volume system-written records (chapter analyses, promise advancements, commit logs) out of cognitive /activate while preserving them with canonical-level salience. See the CHANGELOG entry for 0.8.0 for the full design.

How It Works

The Memory Lifecycle

Write — Salience scoring evaluates novelty, surprise, causal depth, and effort. High-salience memories go active; borderline ones enter staging; low-salience stored at reduced confidence for recall fallback. Near-duplicates reinforce existing memories instead of creating copies.
Connect — Vector embedding (BGE-small-en-v1.5, 384d). Temporal edges link to recent memories. Hebbian edges form between co-retrieved memories. Coref expansion resolves pronouns to entity names.
Retrieve — 10-phase pipeline: coref expansion + query expansion + dual BM25 (keyword-stripped + expanded) + semantic vectors + Rocchio pseudo-relevance feedback + ACT-R temporal decay (synaptic-tagged) + Hebbian boost + entity-bridge boost + graph walk + cross-encoder reranking + multi-channel agreement gate.
Consolidate — 7-phase sleep cycle: diameter-enforced clustering (prevents chaining), edge strengthening (access-weighted), cross-topic bridge formation (direct closest-pair), confidence-modulated decay (synaptic tagging extends half-life), synaptic homeostasis, cognitive forgetting, staging sweep. Embedding backfill ensures all memories are clusterable.
Feedback — Useful/not-useful signals adjust confidence, affecting retrieval rank and forgetting resistance.

Cognitive Foundations

ACT-R activation decay (Anderson 1993) — memories decay with time, strengthen with use. Synaptic tagging: heavily-accessed memories decay slower (log-scaled).
Hebbian learning — co-retrieved memories form stronger associative edges
Complementary Learning Systems — fast capture (salience + staging) + slow consolidation (sleep cycle)
Synaptic homeostasis — edge weight normalization prevents hub domination
Forgetting as feature — noise removal improves signal-to-noise for connected memories
Diameter-enforced clustering — prevents semantic chaining (e.g., physics->biophysics->cooking = 1 cluster)
Multi-channel agreement — OOD detection requires multiple retrieval channels to agree

Architecture

src/
  core/             # Cognitive primitives
    embeddings.ts     - Local vector embeddings (BGE-small-en-v1.5, 384d)
    reranker.ts       - Cross-encoder passage scoring (ms-marco-MiniLM)
    query-expander.ts - Synonym expansion (flan-t5-small)
    salience.ts       - Write-time importance scoring (novelty + salience + reinforce-on-duplicate)
    decay.ts          - ACT-R temporal activation decay
    hebbian.ts        - Association strengthening/weakening
    logger.ts         - Append-only activity log (data/awm.log)
  engine/           # Processing pipelines
    activation.ts     - 10-phase retrieval pipeline (dual BM25, coref, agreement gate)
    consolidation.ts  - 7-phase sleep cycle (diameter clustering, direct bridging, synaptic tagging)
    connections.ts    - Discover links between memories
    staging.ts        - Weak signal buffer (promote or discard)
    retraction.ts     - Negative memory / corrections
    eviction.ts       - Capacity enforcement
  hooks/
    sidecar.ts        - Hook HTTP server (auto-checkpoint, stats, timer)
  storage/
    sqlite.ts         - SQLite + FTS5 persistence layer
  api/
    routes.ts         - HTTP endpoints (memory + task + system)
  mcp.ts            - MCP server (14 tools, incognito support)
  cli.ts            - CLI (setup, serve, hook config)
  index.ts          - HTTP server entry point (auto-backup on startup)

For detailed architecture including pipeline phases, database schema, and system diagrams, see docs/architecture.md.

Testing & Evaluation

Unit Tests

npx vitest run    # 77 tests (salience, decay, hebbian, supersession)

Eval Harness (v0.6.0)

npm run eval                        # All 4 benchmark suites
npm run eval -- --suite=retrieval   # Single suite
npm run eval -- --bm25-only         # Ablation: BM25 only
npm run eval -- --no-graph-walk     # Ablation: disable graph walk

Suites: retrieval (Recall@5), associative (multi-hop), redundancy (dedup F1), temporal (Spearman vs ACT-R). Ablation flags isolate each pipeline component's contribution.

Full Test Suite

npm run test:mcp      # MCP protocol smoke test (5/5)
npm run test:self     # Pipeline component checks (94.1%)
npm run test:edge     # 9 adversarial failure modes
npm run test:stress   # 500 memories, 100 consolidation cycles (96.2%)
npm run test:workday  # 4-session production simulation (93.3%)
npm run test:ab       # AWM vs baseline comparison
npm run test:sleep    # Consolidation impact measurement
npm run test:tokens   # Token savings analysis (56.3% savings)
npm run test:pilot    # Production-like query validation (14/15)
npm run test:locomo   # LoCoMo industry benchmark (28.2%)

Environment Variables

| Variable | Default | Purpose | |----------|---------|---------| | AWM_PORT | 8400 | HTTP server port | | AWM_DB_PATH | memory.db | SQLite database path | | AWM_AGENT_ID | claude-code | Agent ID (memory namespace) | | AWM_EMBED_MODEL | Xenova/bge-small-en-v1.5 | Embedding model (retrieval-optimized) | | AWM_EMBED_DIMS | 384 | Embedding dimensions | | AWM_RERANKER_MODEL | Xenova/ms-marco-MiniLM-L-6-v2 | Reranker model | | AWM_HOOK_PORT | 8401 | Hook sidecar port | | AWM_HOOK_SECRET | (none) | Bearer token for hook auth | | AWM_API_KEY | (none) | Bearer token for HTTP API auth | | AWM_INCOGNITO | (unset) | Set to 1 to disable all tools | | AWM_COORDINATION | (unset) | Set to true to enable hive coordination endpoints | | AWM_DISABLE_POOL_FILTER | (unset) | Set to 1 to disable the candidate pool reduction (0.7.7+). Reverts recall to scoring all active candidates — slower but useful for A/B testing if a recall regression appears | | AWM_DISABLE_SLIM_CACHE | (unset) | Set to 1 to disable the in-memory slim cache (0.7.10+). Reverts to per-recall SQL fetch — slower but useful if cache invariants are suspected of drift | | AWM_DISABLE_RERANK_SKIP | (unset) | Set to 1 to disable the reranker skip on clear-winner queries (0.7.10+). Forces every recall through the cross-encoder | | AWM_DISABLE_EXPANSION_CACHE | (unset) | Set to 1 to disable the query expansion skip heuristic + LRU cache (0.7.11+). Forces every recall through the flan-t5-small expander | | AWM_WORKSPACE | (unset) | Default workspace for cross-agent recall in hive setups | | AWM_STORE_BACKEND | sqlite | sqlite (better-sqlite3 + FTS5), pglite (PGlite + pgvector + pgroonga), or postgres (node-postgres + pgvector, networked/multi-connection — experimental, 0.10.0). | | AWM_DB_PATH | memory.db (SQLite) / ./memory-pglite (PGlite) | Storage path. Directory for PGlite, file for SQLite. Ignored for postgres (uses AWM_DATABASE_URL). | | AWM_DATABASE_URL | (unset) | Postgres connection string when AWM_STORE_BACKEND=postgres (0.10.0). | | AWM_CONF_SHARPNESS_W | 0.4 | Weight of top1 / mean(top5) in recall confidence (PR-1, v0.8.5) | | AWM_CONF_CLIFF_W | 0.3 | Weight of (top1 - top10) / top1 in recall confidence (PR-1, v0.8.5) | | AWM_CONF_FLOOR_W | 0.3 | Weight of top1 absolute score in recall confidence (PR-1, v0.8.5) | | AWM_FADE_DAYS_SINCE_ACCESS | 45 | Days without access before a stale active engram fades (v0.8.5) | | AWM_FADE_KEEP_CHARS | 150 | Chars retained in faded engram content (v0.8.5) | | AWM_FADE_MIN_CONTENT_LEN | 250 | Don't fade engrams shorter than this — nothing to trim (v0.8.5) | | AWM_FADE_MAX_PER_CYCLE | 25 | Max engrams faded per consolidation cycle — gradual, not sudden (v0.8.5) | | AWM_GRANULARITY_COMPACT_LEN | 200 | Char cap for granularity: 'compact' summaries (v0.8.5) | | AWM_GRANULARITY_FULL_LEN | 1000 | Char cap for top result under granularity: 'auto' when confidence ≥ threshold (v0.8.5) | | AWM_GRANULARITY_AUTO_THRESHOLD | 0.4 | Recall-confidence threshold above which 'auto' granularity gives the top result a long-form summary (v0.8.5) |

Tech Stack

| Component | Technology | |-----------|-----------| | Language | TypeScript (ES2022, strict) | | Database | SQLite via better-sqlite3 + FTS5 | | HTTP | Fastify 5 | | MCP | @modelcontextprotocol/sdk | | ML Runtime | @huggingface/transformers (local ONNX) | | Embeddings | BGE-small-en-v1.5 (BAAI, retrieval-optimized, 384d) | | Reranker | ms-marco-MiniLM-L-6-v2 (cross-encoder) | | Query Expansion | flan-t5-small (synonym generation) | | Tests | Vitest 4 | | Validation | Zod 4 |

All three ML models run locally via ONNX. No external API calls for retrieval. The entire system is a single SQLite file + a Node.js process.

What's New in v0.11.0

awm onboard — warm-start a cold store from a project's own knowledge. A fresh store knows nothing, so recall returns nothing until interactions accumulate; onboarding seeds it up front so an agent is useful from the first turn.

Scan: awm onboard <docs> --repo <path> --project <name> extracts atomic, recall-shaped memories from docs (one per heading) + the repo (stack, layout) → an awm import-compatible pack + a human review file. Edit, then awm import --dedupe. Model-free, no API keys.
Agent-driven interview (no AWM→LLM calls): two MCP tools — onboard_scan (candidate memories to refine) and onboard_questions (anchored on "what is the goal of this memory system?"). The host agent (Codex, Claude Code) refines, confirms with you, and writes them. Works air-gapped.
The skill is a memory: awm setup seeds a canonical "onboard a new project" skill the agent can recall; a cold-store nudge in memory_restore reminds it to warm-start.

Also folds in the 0.10.1 version-reporting fix (below) and fixes the npm README links (absolute GitHub / Pages URLs + the repository field).

What's New in v0.10.1

A patch — no functional or recall-behavior change.

Reported version is now the truth. Hand-maintained version strings had drifted (a 0.10.0 build announced 0.8.5/0.8.8/0.9.2 on the banner, /health, the MCP server, and the export payload). A new src/version.ts reads version from package.json at runtime, so the reported number can never diverge from the build.
README now documents the v0.10.0 Postgres backend (below) + AWM_STORE_BACKEND=postgres / AWM_DATABASE_URL.

What's New in v0.10.0

A networked Postgres backend + backend-agnostic memory portability, plus correctness/stability fixes from a deep audit. No API changes — existing SQLite callers keep working unmodified.

New postgres backend (AWM_STORE_BACKEND=postgres, AWM_DATABASE_URL=…). A real-server adapter over node-postgres (pg) + pgvector — the first backend that is multi-connection safe (vs SQLite's single-machine WAL and PGlite's single-process WASM), for cloud / multi-replica deployments. Sub-1.0 on purpose: the backend is experimental until the remaining SQLite-only paths (coordination, hot backups, integrity check) are ported and cross-backend recall-quality parity is confirmed.
Backend-agnostic awm import / awm export. Both route through openStore(), so you can export from any backend and import INTO Postgres or PGlite (import was previously SQLite-only). Export now includes embedding vectors — a faithful, recall-ready port with no re-embed when source/target embedding models match (--no-embeddings to skip for a model mismatch). New flags: --all-stages, --include-retracted; associations + supersession links are preserved.
awm merge no longer silently drops data. The old hand-rolled merge used a truncated schema (no embedding, memory_class, task/supersession columns) and never populated FTS — killing both vector and BM25 recall for merged rows. Rewritten to route every source engram through the authoritative EngramStore/createEngram path (+ stage/retracted/supersession restore), wrapped in try/finally.
awm import preserves stage + retracted — previously --include-retracted resurrected retracted memories and flattened all stages to active.
Cross-agent contamination guard (write pipeline) — a workspace/hive recall can surface another agent's same-concept engram; the reinforce/supersede branch now refuses to mutate an engram whose agentId differs from the writer's.
Hive degradation is visible — getWorkspaceAgentIds on Postgres/PGlite emits a one-time warning (workspace coordination is SQLite-only) instead of silently returning self-only.

What's New in v0.9.0

A recall-quality default + new tuning knobs + a builder/researcher doc set. Every change is an env-revertible default with no API changes — existing callers keep working unmodified. Validated: official LoCoMo 22.7% → 25.7% (every category up) and adversarial precision 73.4 → 74.9 (strictly better on each axis); recall latency ~35 → ~77ms (sub-100ms, tunable); zero regression across the standard suite (eval 4-suite identical, 569/569 unit, edge 32/34, workday = old config).

Wide rerank pool + top-K abstention (the win). A pipeline-attribution study (new tracer, tests/locomo-eval/trace.ts) found the dominant recall loss wasn't candidate generation or the reranker — it was the stage between: ~50% of answerable queries had gold that cleared the candidate floor but was squeezed out of the rerank pool by the decay-compressed composite before the high-lift (+3.29) reranker saw it. Fix: the composite becomes a cheap wide pre-filter (AWM_TOPN_MULT=8, was 3×), the reranker discriminates on a wider pool (AWM_RERANK_POOL=max(limit*4,40), was max(limit*2,15)), and the out-of-domain abstention gate judges only the post-rerank top-K (AWM_ABSTAIN_GATE_K=5) so widening for recall doesn't inflate the in-domain signal. Reverses the v0.7.13 "pool reduction" change. See reference.md → Recall tuning.
Tunable similarity floors. AWM_SIM_FLOOR_TARGETED / _EXPLORATORY (defaults 0.50 / 0.35, unchanged) and the candidate-entry floors are now env overrides for retuning against a different embedder.
Opt-in / experimental flags (default-off). AWM_QUERY_BRIDGE (query-named-entity boost — lifts attribution "what does X think" 36% → 92% on a controlled eval; small adversarial cost, so opt-in), AWM_AUTOTAG (write-time entity:/cat: meta-tags), AWM_BROAD_EDGES. AWM_SPREAD (in-engine spreading activation) is parked — it regressed recall by displacing gold; multi-hop is solved harness-side instead (see the playbook).
New docs for builders & researchers. docs/awm-for-agents.html — the agent playbook (why AWM exists, the PRIME→ACT→VERIFY→LEARN harness, the full agent feature surface, how multi-hop is solved, and the honest gauntlet findings). docs/pipeline-walkthrough.html redesigned for devs/researchers. Both are published on GitHub Pages. A new Storage Backends + Postgres roadmap section in architecture.md documents SQLite (default) vs PGlite and the path to a networked-Postgres backend (v1 target).

What's New in v0.8.5

A research-grounded hardening pass on recall quality, retraction propagation, and lifecycle management. Every change is fully additive — existing callers keep working without modification. Full validation at the milestone: vitest run 549/549 pass; test:self 97.6% EXCELLENT (was 91.4% on 0.8.0); test:ab AWM 89.3% vs Baseline 83.0% (+6.4 points); test:perf 4/4 PASS.

Recall confidence as data (PR-1). Every ActivationResult now carries a confidence field in [0, 1] — a score-distribution-aware signal (sharpness + cliff + floor blended via weighted geometric mean) that tells the caller how trustworthy the recall set is. Same value on every result in a recall — it describes the set, not the individual. Research grounding: Geifman & El-Yaniv (NeurIPS 2017), Roitero et al (SIGIR 2022). Default behavior unchanged; this is data, not a gate.
Opt-in confidence-based abstention (PR-2). Callers can pass requireConfidence (typical values: 0.10 strict, 0.25 balanced, 0.40 aggressive). When set, the engine returns [] on recalls whose distribution shape falls below the threshold — defeats the "best-of-bad-bunch" leak where a noisy recall returns a weak top result that the agent then trusts.
Coherence-weighted retraction (#18). Retraction penalty propagation is no longer uniform. Multiplier scales with local neighborhood cohesion: dense topically-coherent clusters (a narrative) get heavier penalties when the seed is wrong; hub structures (popular node with heterogeneous edges) get lighter penalties. Implements Carrillo et al, "Continued Influence Effect" (ICCM 2025).
Counter-narrative replacement on supersede/correction (#19). When retraction creates a counter-content correction, the new engram inherits the original's 'connection' edges (scaled 0.7×, capped at 10 inheritances, with invalidation / causal / temporal skipped). The corrected fact takes over the graph role of the wrong fact rather than leaving the corrected fact disconnected.
Content fade stage (#20) — Paper 1. New intermediate 'fading' lifecycle stage between 'active' and 'archived'. Engrams accessed before but stale (no access in 45+ days, content > 250 chars) get content trimmed to 150 chars + … [faded] marker. Concept, tags, and embedding preserved — the engram still participates in BM25 + vector recall, just with less body to score against. Models human memory's loss of surface detail while retaining cue-association pathways (PLOS Comp Biology on storage degradation). Heavily-used (accessCount >= 10), canonical, structural, and retracted engrams excluded.
Adaptive output granularity (#21) — Paper 3. New granularity: 'full' | 'compact' | 'auto' on ActivationQuery. 'compact' attaches a 200-char summary to every result. 'auto' is confidence-adaptive: when recall confidence ≥ 0.4, the top result gets a long-form summary and the rest are compact; when confidence is low, everything is compact so the agent can scan a diverse set without drowning in content. Engram body never modified — just the response shape. Models cognitive teaming (Brill 2018 ACT-R collaboration).

What's New in v0.8.1

Coordination control layer — FailureMode classifier + mutation-hint retry on cleanupStale, per-worker CircuitBreaker, and voluntary POST /assignment/:id/fail endpoint. Designed to reduce the 11.5% failure-with-no-retry rate observed in production hive runs (81/703). Two new schema columns on coord_assignments plus a coord_circuit_state table — both additive CREATE IF NOT EXISTS migrations.

What's New in v0.8.0

Substrate primitives for long-running structured projects — four new HTTP endpoints (/memory/latest-by-tag, /memory/top-by, /memory/resolve, /memory/supersede Form B), three query operators (tagsAll, tagsAny, tagsNone), and a fourth memory_class value (structural). Optional engram columns sequence + references_json enable race-free chronology and typed cross-record links. Designed against the NovelForge 36,000-word "Drawdown" test bed.

What's New in v0.7.16

awm setup --global template now teaches write quality. Two new sections in AWM_INSTRUCTION_CONTENT:
- Writing for recall — explicit guidance that recall quality is determined at write time. Lead with the rule/fact, pick the most specific topic, include 2+ retrievable identifiers (file paths, function names, IDs), write in the vocabulary of the future query, reserve canonical for stable invariants, include the why for feedback memories.
- Recall strategy — formalizes the multi-query reformulation pattern observed in practice. When one query returns nothing, agents reformulate (synonyms, more specific nouns, exact identifiers). Recall is ~300ms — two-three reformulations cost less than one filesystem search. Cap at three to prevent loops.
These document the writer + reader behaviors AWM was always designed around but were previously implicit. No retriever change — pure system-prompt improvement. Run npm install -g agent-working-memory@latest && awm setup --global to apply.
LongMemEval headline number updated. Re-running the benchmark on 0.7.16 (single-session-user, 50 questions, same adapter as the original 0.7.1 baseline): 68% accuracy with gpt-4o-mini, up from the original 40-50%. Recall latency 0.12s avg (was 7-11s on 0.7.2). Multi-tier reader sweep on the same memory inputs:
- gpt-4o-mini (cheap, non-thinking): 68%
- gpt-4o (strong, non-thinking): 68%
- o4-mini (cheap, thinking): 78%
- gpt-5-mini (mid, thinking): 80%
Non-thinking models cap at 68% on this category — the bottleneck is reasoning over recalled context, not raw scale. Thinking models add 10-12pp. Memory quality is fixed; reader determines the ceiling.

What's New in v0.7.15

Documentation refresh — awm setup --global now writes a CLAUDE.md template that documents all four perf env-var escape hatches (AWM_DISABLE_POOL_FILTER, AWM_DISABLE_SLIM_CACHE, AWM_DISABLE_RERANK_SKIP, AWM_DISABLE_EXPANSION_CACHE) instead of just the first one. Troubleshooting / quickstart / user-guide docs updated to reflect the current ~300ms recall floor. No code change — version bumped solely so the new template ships via npm install -g agent-working-memory@latest.

What's New in v0.7.14

Recall latency 0.4-0.8s → 0.3-0.6s (~25-50% on top of 0.7.13) — three fixes:
1. Batched cross-encoder inference — reranker now tokenizes + runs all query-passage pairs in one batched forward pass. 15-passage rerank: 210ms → 27ms (~7×).
2. Truncate passages to 400 chars before rerank — cross-encoder has 512-token max anyway and pads to the longest passage; full content (5000+ chars) meant everything padded to max length. Truncation drops tokenization + inference 3-4× on long memory pools.
3. Eager slim-cache populate at startup — first user recall no longer pays the ~600ms cache populate cost.
Recall quality A/B: 8/8 top-1, 4.50/5 top-5. Cumulative since 0.7.4 baseline: 11s → 0.3-0.6s (~25-37× faster).

What's New in v0.7.13

Reranker pool size reduction — cross-encoder pool dropped from max(limit*3, 30) to max(limit*2, 15). For typical agent queries (limit=5 or 10), that's 15-20 candidates reranked instead of 30, halving the cross-encoder cost. Top-K quality preserved (8/8 top-1, identical top-5/top-10 overlap) — reranking the 21st-30th candidates was wasted when the user only wants top-5 anyway.
⚠️ Superseded in 0.9.0 — this reduction was reversed. A pipeline-attribution study found that "wasted" tail was actually where ~50% of retrievable answers were being squeezed out before the reranker saw them (the small 8-query A/B above missed it). 0.9.0 widens the pool back to max(limit*4, 40) and adds a top-K abstention gate — lifting LoCoMo recall 22.7%→25.7% and adversarial precision 73.4→74.9, with no regression. See the CHANGELOG and docs/reference.md → "Recall tuning."

What's New in v0.7.12

Recall latency 0.9s → 0.4-0.8s (~40-60% on top of 0.7.11) — phase-breakdown showed getAssociationsForBatch over ~300 survivors was 222ms (25% of remaining floor) but the scoring loop only reads count + sumWeight from each engram's edges. New getAssociationStatsForBatch returns scalar stats via a single GROUP BY aggregate. Graph walk still uses full associations, but only on top-N (~30) so its lookups are cheap. Recall quality A/B: 8/8 top-1, 4.50/5 top-5. Cumulative since 0.7.4 baseline: 11-23s → 0.4-0.8s (~25× faster median).

What's New in v0.7.11

Query expansion skip + LRU cache — flan-t5-small was 164ms per recall (18% of post-0.7.10 floor). Two fixes in core/query-expander.ts: (1) skip heuristic for long/specific queries (>50 chars OR ≥5 distinct meaningful tokens), and (2) 500-entry LRU cache for repeated queries. ~30% of typical agent recalls hit the skip; repeated recalls hit the cache. Avg savings: ~100-150ms per recall. Recall quality A/B: 8/8 top-1, 4.63/5 top-5. Disable via AWM_DISABLE_EXPANSION_CACHE=1.

What's New in v0.7.10

Recall latency 1.4s → 0.9s median (~35% on top of 0.7.9) — two more fixes after phase-breakdown showed the slim fetch was still 310ms (Buffer→Float32Array on every recall) and the reranker was 354ms (40% of remaining cost):
1. In-memory slim cache — Map<id, SlimCacheEntry> populated once per process, mutated in lock-step with engram writes/updates/retracts. Slim fetch 306ms → 5ms with warm cache (~60×). Disable via AWM_DISABLE_SLIM_CACHE=1. Memory cost ~15MB at 10K engrams.
2. Reranker skip on clear winners — when BM25 has a clear top-1 (textMatch ≥ 0.8, ≥1.5× the next score, small pool), skip the cross-encoder. Saves ~300ms on confident queries. Disable via AWM_DISABLE_RERANK_SKIP=1.
Quality preserved: 8/8 top-1, 4.63/5 top-5, 9.75/10 top-10 on the A/B suite. Cumulative since 0.7.4 baseline: 11s → 0.9s (~12-15× faster).

What's New in v0.7.9

Recall latency 1.6s → 1.0s end-to-end (~30% on top of 0.7.7) — phase-breakdown showed the fullSELECT-over-10K-engrams was the new bottleneck (440ms / 40% of recall) due to row materialization of content/tags/JSON for rows the pre-filter doesn't read. Two-pass fetch: slim (id, concept, embedding) for the cosine + filter pass, then hydrate only the survivors via getEngramsByIds. Recall quality A/B verified 8/8 top-1 identical, top-K overlap slightly improved (4.75/5 vs 4.50). Cumulative since 0.7.4 baseline: 11-23s → 1.0-1.6s (~10-20× faster).

What's New in v0.7.8

Install template updated for the 0.7.5/0.7.6/0.7.7 behaviors — awm setup now writes a richer CLAUDE.md that teaches agents about memory classes (canonical | working | ephemeral), salience auto-promotion patterns (detectUserFeedback for stakeholder quotes, detectVerifiedFinding for operational records with action-verb + concrete IDs), and the new env-var escape hatches. Existing installs upgrade via npm install -g agent-working-memory@latest && awm setup --global then restart Claude Code. No functional code change in this release — version bumped solely so the new template ships.

What's New in v0.7.7

Recall latency 2.5s → 1.0s end-to-end (~50% on top of 0.7.6) — phase-breakdown spike showed that after the 0.7.6 BM25 fix, the new bottleneck was getAssociationsForBatch over all ~10K candidates (68% of recall latency). Added a cheap pre-filter before deep scoring: candidates survive only if they have a BM25 hit, a cosine z-score above the gate, or concept-token overlap with the query. From ~10K candidates → typically 100-300 survivors. Graph-walk correctness preserved (it only boosts neighbors with textMatch >= 0.05, which would also pass this filter). Recall quality A/B verified: 8/8 top-1 matches, 90% top-5 overlap, 94% top-10 overlap on diverse queries. Set AWM_DISABLE_POOL_FILTER=1 to revert. Cumulative since 0.7.4 baseline: 11-23s → 0.9-1.6s (~10-15× faster).

What's New in v0.7.6

Recall latency 11-23s → 2.5s end-to-end (~5× faster) — measurement spike found the slow path was a SQLite query-plan trap, not vector search. The BM25 query JOIN engrams_fts ON e.rowid + WHERE MATCH + ORDER BY rank LIMIT N materialized all matching rows (with 1.5KB embedding blobs) before the LIMIT applied. CTE prefilter forces FTS5 LIMIT first, then joins only the top-K rowids. Same SQLite, same data, same results — 567× faster for wide OR queries (3682ms → 6.5ms verified). Also added getAssociationsForBatch to replace the per-candidate N+1 in the activation scoring loop. Top-K results are byte-identical to the old query (verified by the equivalence test in spike/).
Salience filter — auto-promote verified operational records — operational batch summaries (e.g., "Submitted 6 events 2026-05-07 — IDs 18969, 18971…") were being discarded at salience 0.14 because BM25 novelty couldn't distinguish "useful new operational record" from "duplicate observation" when topic terminology repeated. New detectVerifiedFinding() pattern detector parallel to detectUserFeedback(): requires action-verb header (Submitted/Finalized/Completed/Reconciled/Triaged/etc.) plus ≥2 concrete identifiers (ISO date or contextual numeric ID). Matched memories get a 0.45 salience floor (active disposition, not canonical). 7 new tests, 23 salience tests pass.

What's New in v0.7.4

Channel push telemetry — new GET /telemetry/channels JSON endpoint and Prometheus counters (coord_channel_push_attempts_total, ..._delivered_total, ..._failed_total{reason}, ..._no_session_total, ..._fallback_mailbox_total, ..._session_disconnects_total). Surfaces real delivery rate so coordination reliability can be measured rather than guessed.
Role-based /channel/push addressing — accepts {role, workspace, message} as alternative to {agentId, message}. Server resolves role+workspace to most-recently-seen alive agent. Lets workers notify the coordinator without hardcoding its UUID (which changes across coordinator restarts). Enables event-driven worker → coordinator hand-off in place of fragile coordinator self-polling.
/checkin writes role on every call — previously the UPDATE on existing rows preserved a stale role from initial registration; now agents can correct their own role via re-checkin.
/workers JOINs channel sessions — alive field is now recent_pulse OR connected_channel_session. Stops false-dead duplicate-spawn loops where a busy worker's /pulse went stale during long tool sequences while their channel-server stayed reachable.
cleanupStale runs on a 5-minute schedule — was only invoked manually; now zombie agents get marked dead automatically with a 600s threshold (forgiving for long edits).
user_feedback salience event type — new event type with bonus 0.3 (highest of any). Auto-detect heuristic on memory_write content matching ^(Robert|Katherine|Nancy|...) (said|verbatim|directed|decided|...) forces memoryClass='canonical' so user-stated decisions can't be discarded by the BM25 novelty floor in populated DBs.

v0.7.3

Salience filter production tuning — fixed BM25 novelty floor that was discarding ~17% salience for most writes in 10K+ engram DBs. Quadratic dampening curve (max(0.05, 1 - topScore²)); concept-match penalty scoped to last 30 days; floor lowered 0.10 → 0.05.
Maintenance scripts for backup pruning + lme/bench database cleanup.

v0.7.2

Workspace recall fix (was returning UUIDs not names in v0.7.1 release).

v0.7.1

Agent-provided metadata tags — memory_write accepts project, topic, source, confidence_level, session_id, intent. Stored as searchable prefixed tags (proj=X, sid=Z). Session ID tags alone improved LongMemEval recall 3x.
Dual synthesis — consolidation creates two types of summary memories: session summaries (tag-based, for perfect recall) and pattern syntheses (cross-session, for novel recall/creative connections).
Bulk write + supersession — POST /memory/write-batch for batch ingestion with POST /memory/supersede for knowledge updates.
LongMemEval benchmark — adapter built, baseline established at 40-50% with gpt-4o-mini.

v0.7.0

Workspace-scoped recall, validation-gated Hebbian (Kairos), multi-graph traversal (MAGMA), power-law edge decay (DASH).

v0.6.1

Embedding version tracking, batch backfill, deeper retraction propagation, retrieval timeouts, channel push delivery.

v0.6.0

Memory taxonomy — memories classified as episodic, semantic, procedural, or unclassified. Auto-classified on write. Filter by type on recall.
Query-adaptive retrieval — pipeline adapts to query type: targeted | exploratory | balanced | auto.
Decision propagation — decisions broadcast to coordination layer for cross-agent discovery.
Eval harness — npm run eval benchmarks retrieval, associative, redundancy, and temporal performance.
DB hardening — busy_timeout, integrity check on startup, hot backups every 10 min, WAL checkpoint on shutdown.

See CHANGELOG.md for full details.

Integrations

AWM is a standard MCP server, so it plugs into any MCP-capable agent host with no adapter code — the same server Claude Code uses. Point two hosts at the same AWM_DB_PATH (with a shared AWM_AGENT_ID/AWM_WORKSPACE) and they share one cognitive memory.

Hermes Agent (Nous Research)

Make AWM available where Hermes runs (e.g. a derived Docker image — the Hermes image already bundles Node):

FROM hermes-agent:local
USER root
RUN npm install -g agent-working-memory@latest
ENV HF_HOME=/opt/data/.cache/huggingface

mcp_servers:
  awm:
    command: node
    args: ["/usr/local/lib/node_modules/agent-working-memory/dist/mcp.js"]
    env:
      AWM_AGENT_ID: hermes
      AWM_DB_PATH: /opt/data/awm/hermes.db    # on a persistent volume
      HF_HOME: /opt/data/.cache/huggingface
    timeout: 600                              # first call downloads the embedder

AWM's tools appear to the agent as mcp_awm_memory_write, mcp_awm_memory_recall, etc. Works with any Hermes model provider (verified on Anthropic and Azure gpt-5-4-mini).

Full recipe — model-provider examples, the Azure GPT-5.x /openai/v1 note, and gotchas (incl. the Windows CRLF/s6 clone fix) — is in docs/integrations/hermes.md.

Project Status

AWM is in active development (v0.8.5). The core memory pipeline, consolidation system, multi-agent coordination, and MCP integration are stable and used daily in production coding workflows.

Core retrieval and consolidation: stable
MCP tools and Claude Code integration: stable
Other MCP hosts (e.g. Hermes Agent): supported — AWM drops in as an MCP memory server with no adapter code
Multi-agent coordination: stable (v0.8.1 hardening)
Task management: stable
Hook sidecar and auto-checkpoint: stable
HTTP API: stable (for custom agents)
Eval harness: stable (v0.6.0, extended through 0.8.x)
Recall confidence + opt-in abstention (PR-1, PR-2): stable (v0.8.5)
Coherence-weighted retraction + counter-narrative inheritance: stable (v0.8.5)
Content fade stage + adaptive output granularity: stable (v0.8.5)
PGlite backend (alternative to SQLite, with pgvector + ivfflat): stable (v0.8.x)
Networked Postgres backend (pg + pgvector, multi-connection): experimental (v0.10.0)
Backend-agnostic import/export (embeddings included, cross-backend port): stable (v0.10.0)

See CHANGELOG.md for version history.

License

Apache 2.0 — see LICENSE and NOTICE.