@barbozaa/memory-token
v0.2.0
memory-token
MCP server plus Cursor hooks for governed workspace memory with token-budget packs: agents propose facts, you (or policy) confirm, then a bounded summary loads at session start—with preCompact nudges, post-tool reminders, and optional user-message triggers so long sessions still persist durable context.
The problem this solves
Coding agents (Cursor / Claude / Copilot) forget everything between sessions. Every new chat means:
- The agent re-reads the same files to understand your project (5–20k wasted tokens per session).
- You re-explain the same architectural decisions, conventions, build commands, and known bugs.
- Long-running sessions silently lose context to summarization and the agent starts repeating mistakes you already fixed.
- "Just put it in `CLAUDE.md`/`AGENTS.md`" doesn't scale: the file grows unbounded, eats context every turn, and has no provenance; you can't tell which lines are stale, who proposed them, or what actually got applied.
The shortcut of dumping everything into a markdown rules file works until it doesn't; you end up paying token rent on context the agent doesn't even need this turn.
How memory-token solves it
A two-tier memory with a strict token budget and a human-in-the-loop gate:
- Confirmed store (`store.json`) — small, curated, typed facts (decisions, fixes, APIs, build commands, conventions). Loaded at session start as a bounded markdown pack (default ≤1500 tokens). The agent proposes, you (or policy) confirm; nothing gets in by accident.
- RAG layer (`rag.sqlite`) — large chunks (chat distills, session capsules, design docs) stored compressed with embeddings. Pulled on demand via `rag_query`, ranked by hybrid score (embeddings + BM25-style lexical), returned already compressed within a token budget. Verbatim text only when explicitly fetched via `rag_get_full`.
Both layers respect a token budget on every read — you never pay for context the agent isn't using right now.
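The budget check itself is simple. A minimal sketch of a token-budget pack builder, assuming the rough chars/4 token estimate the session hook uses (illustrative names and shapes, not the package's actual code):

```typescript
// Hypothetical sketch of a token-budget pack builder (not the real implementation).
// Assumes the rough chars/4 token estimate the session hook is documented to use.
interface Fact { text: string; importance: number; }

const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

function buildPack(facts: Fact[], maxTokens: number): string[] {
  // Deterministic ordering: importance first, then text for stable ties.
  const ordered = [...facts].sort(
    (a, b) => b.importance - a.importance || a.text.localeCompare(b.text)
  );
  const pack: string[] = [];
  let used = 0;
  for (const f of ordered) {
    const cost = estimateTokens(f.text);
    if (used + cost > maxTokens) continue; // hard cap: skip what doesn't fit
    pack.push(f.text);
    used += cost;
  }
  return pack;
}
```

The hard cap is what makes session-start cost bounded: an oversized fact is skipped rather than blowing the budget.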
Benefits (measured on this repo)
| Benefit | Mechanism | Example impact |
|---|---|---|
| Cross-session memory | sessionStart hook injects confirmed pack automatically | Critical facts (build cmd, security fix, race condition root cause) survive across all future Cursor windows. |
| Big token savings on context loads | RAG returns compressed, ranked snippets instead of file reads | A query that would cost ~3000–5000 tokens in Read calls cost 364 tokens of compressed RAG hits in this session (~85% reduction). |
| Bounded session-start cost | Pack hard-capped at max_tokens, deterministic ordering | This repo's pack: 397 tok / 1500 budget = 26%, fits in any sessionStart without crowding out user prompt. |
| Compression for free | compress_candidate shrinks bodies to ~1.6× ratio | 37.8% size reduction on stored bodies; same retrieval quality. |
| No silent drift | propose → confirm gate, dedupe by hash + similarity, audit log | Every fact has a status, timestamp, and reason; you can prune with confidence. |
| Causal graph | Typed links (SOLVES, CAUSES, BUILDS_ON, …) + traverse BFS | Ask "what fixed X?" and get back the commit that solved it and the build that depends on it, in one hop. |
| Local-first, private | Embeddings via Transformers.js (ONNX, ~25 MB model, downloaded once) | No API keys required. Optional: Ollama or OpenAI for embeddings. Your facts never leave the repo. |
| Workspace-scoped | Each project gets its own .memory-token/ directory | No bleeding of facts between unrelated repos; gitignored by default but can be versioned. |
| Agent steering, not vibes | Hooks inject skill + policy + pack; skill is a decision tree the agent follows | Agents actually call propose/confirm/rag_query consistently instead of "maybe sometimes if they remember". |
| Prevents the CLAUDE.md bloat trap | Token-budget pack + RAG fallback | Old facts move to RAG (compressed, on-demand) instead of permanently inflating the rules file. |
When to use it
- ✅ Multi-week projects where you reopen Cursor often and don't want to re-explain context.
- ✅ Codebases big enough that re-reading "to understand structure" costs noticeable tokens per session.
- ✅ Teams that want provenance and control over what the agent "remembers".
- ✅ Long debugging sessions where you want the root cause + fix to survive into next week's session.
Skip if your project is a one-off script or you don't run more than 1–2 chat sessions on the same codebase.
What it does
| Layer | Role |
| --- | --- |
| Store (store.json) | Typed memories (decision, fix, api, …), statuses (proposed → confirmed / rejected), links between memories, audit log. |
| RAG (rag.sqlite) | Chunked text with hybrid retrieval (embeddings + lexical). Compressed snippets in query results; verbatim text only via memory_token_rag_get_full. |
| MCP | Eighteen tools: pack, propose/confirm/reject, search, prune, link graph, export/import, RAG ingest/query/delete, stats, audit. |
| Hooks | Inject skill path + policy + pack on sessionStart; hints on beforeSubmitPrompt / preCompact / postToolUse. Hooks do not call the MCP (shell → Node CLIs only). Flow: hook → skill → MCP. |
Workspace root comes from `MEMORY_TOKEN_WORKSPACE`, `CURSOR_PROJECT_DIR`, or `CLAUDE_PROJECT_DIR`, else `process.cwd()` (see `src/workspace.ts`).
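That fallback chain can be sketched in a few lines (an illustration of the behavior described above; the real logic lives in `src/workspace.ts`):

```typescript
// Sketch of the workspace-root fallback chain: explicit env var first,
// then editor-provided project dirs, then the process working directory.
function resolveWorkspaceRoot(
  env: Record<string, string | undefined>,
  cwd: string
): string {
  return (
    env.MEMORY_TOKEN_WORKSPACE ||
    env.CURSOR_PROJECT_DIR ||
    env.CLAUDE_PROJECT_DIR ||
    cwd
  );
}
```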
Requirements
- Node.js ≥ 22.5.0 (uses the experimental `node:sqlite` API)
Install
Option A — npx (zero-install, recommended)
No clone, no global install. Cursor pulls the package on demand:
```jsonc
// ~/.cursor/mcp.json
{
  "mcpServers": {
    "memory-token": {
      "command": "npx",
      "args": ["-y", "@barbozaa/memory-token"],
      "env": { "MEMORY_TOKEN_WORKSPACE": "${workspaceFolder}" }
    }
  }
}
```
Restart Cursor. The first call downloads the package (~52 KB tarball). The first semantic embedding call additionally downloads the local ONNX model (~25 MB) into `<workspace>/.memory-token/transformers-cache/`.
To use the hooks or the skill with this install method, copy them once from the installed package into your ~/.cursor/:
```sh
PKG=$(npm root -g 2>/dev/null)/@barbozaa/memory-token  # or use `npm pack` to extract locally
cp -r "$PKG/.cursor/hooks" ~/.cursor/
cp "$PKG/.cursor/hooks.json" ~/.cursor/
cp -r "$PKG/.cursor/skills" ~/.cursor/
chmod +x ~/.cursor/hooks/*.sh
```
Option B — Global install
```sh
npm install -g @barbozaa/memory-token
```
```json
{
  "mcpServers": {
    "memory-token": {
      "command": "memory-token-mcp",
      "env": { "MEMORY_TOKEN_WORKSPACE": "${workspaceFolder}" }
    }
  }
}
```
Option C — From source (development)
```sh
git clone https://github.com/barbozaa/memory-token.git
cd memory-token
npm install
npm run build
chmod +x .cursor/hooks/*.sh
npm run smoke
```
Point Cursor at `dist/mcp/index.js` with `MEMORY_TOKEN_WORKSPACE=${workspaceFolder}` (see Cursor MCP). Merge `.cursor/hooks.json` into your project if you use hooks elsewhere.
Global Cursor install (all new windows / workspaces)
Use user-level config so you do not copy hooks into every repo:
- `~/.cursor/mcp.json` — add the `memory-token` server with `MEMORY_TOKEN_WORKSPACE: ${workspaceFolder}` (each project still gets its own `.memory-token/`).
- `~/.cursor/hooks.json` + `~/.cursor/hooks/memory-token/*.sh` — wrappers that set `MEMORY_TOKEN_CLI_ROOT` to your clone and call `dist/hook-*.js`. See `~/.cursor/hooks/memory-token/README.md` after install.
- `~/.cursor/skills/memory-token/SKILL.md` — global skill; the session hook sets `MEMORY_TOKEN_SKILL_PATH` so the banner points here.
- User Rules — Cursor reads them from Settings → Rules, not from `~/.cursor/rules/`. Use `~/.cursor/rules/memory-token.mdc` only as a reference to paste into User Rules, or add the same rule under each repo's `.cursor/rules/`.
Restart Cursor after changing MCP or hooks. Re-run npm run build in the memory-token clone when you pull updates.
Data locations
All under <workspace>/.memory-token/ (typically gitignored):
| Path | Contents |
| --- | --- |
| store.json | Memories, links[], audit[] |
| rag.sqlite | RAG chunks + vectors |
| transformers-cache/ | Downloaded ONNX model (first semantic embed run) |
Remove .memory-token/ from .gitignore if you want the store versioned.
Repository layout
```text
src/
  mcp/index.ts          # MCP server + tool handlers
  store.ts              # JSON store + lockfile (O_EXCL) for concurrent writes
  pack.ts               # Token-budget pack builder (used by MCP + session hook)
  types.ts              # Memory types, links, audit shapes
  session-policy.ts     # Injected policy text for hooks
  compress-lite.ts      # Lightweight compression helpers
  memory-dedupe.ts      # Near-duplicate detection on propose
  workspace.ts          # Resolve workspace root from env
  hook-session-start.ts # sessionStart payload
  hook-post-tool-nudge.ts
  hook-user-message.ts
  hook-git-commit.ts    # Optional git post-commit integration
  mcp-nudge-messages.ts
  rag/                  # db, query, embed, embed-local, lexical, paths, compress-default
scripts/
  smoke-mcp.mjs         # MCP smoke (listTools, pack, propose, RAG, …)
  metrics.mjs           # Store/pack/link/RAG health report
  install-git-hook.sh   # Install post-commit hook in another repo
.cursor/
  hooks.json            # Cursor hook wiring
  hooks/*.sh            # Shell wrappers → dist/*.js
  skills/memory-token/SKILL.md # Agent workflow (read at session start)
  rules/memory-token.mdc       # Optional Cursor rules
```
Cursor MCP
For npx or global installs see Install. The block below is for the from-source workflow:
```jsonc
"memory-token": {
  "command": "node",
  "args": [
    "--disable-warning=ExperimentalWarning",
    "/ABS/PATH/TO/memory-token/dist/mcp/index.js"
  ],
  "env": {
    "MEMORY_TOKEN_WORKSPACE": "${workspaceFolder}"
  }
}
```
Replace `/ABS/PATH/TO/memory-token` with this repo’s path. Restart Cursor after edits.
CLI / other clients: npm run mcp runs the server with cwd as workspace unless env overrides. memory-token-mcp is the package bin entry (same script).
Cursor hooks
Shipped in .cursor/hooks.json:
| Event | Script | Purpose |
| --- | --- | --- |
| sessionStart | .cursor/hooks/session-memory.sh | Skill banner (MEMORY_TOKEN_SKILL_PATH or .cursor/skills/memory-token/SKILL.md), policy cheat sheet, token-capped confirmed pack. |
| preCompact | .cursor/hooks/precompact-memory-hint.sh | Reminder to rag_ingest / propose / confirm before context compaction. |
| postToolUse | .cursor/hooks/post-tool-mcp-nudge.sh | Every MEMORY_TOKEN_NUDGE_EVERY tool calls (default 10), short [memory-token] nudge (skips tools already named like memory_token_*). |
| beforeSubmitPrompt | .cursor/hooks/user-message-triggers.sh (matcher: UserPromptSubmit) | Same as legacy “user send” nudge: phrases like “remember this”, “root cause”, long paste → additional_context for propose / rag_ingest. |
Hook env (optional)
| Variable | Default | Meaning |
| --- | --- | --- |
| MEMORY_TOKEN_SESSION_MAX_TOKENS | 2400 | Total rough budget for policy + pack in session hook (char/4 estimate). |
| MEMORY_TOKEN_NUDGE_EVERY | 10 | Post-tool nudge interval. |
| MEMORY_TOKEN_CLI_ROOT | auto | Absolute path to this repo when hooks live in another project (so dist/hook-*.js resolve). |
| MEMORY_TOKEN_SKILL_PATH | — | Custom SKILL.md path. |
MCP tools (full list)
| Tool | Purpose |
| --- | --- |
| memory_token_get_context_pack | Confirmed memories within max_tokens; optional query / tags rerank. |
| memory_token_propose | Create proposed memory (type, importance, optional body_compressed, force to bypass dedupe). |
| memory_token_confirm / memory_token_reject | Promote or drop proposals. |
| memory_token_compress_candidate | Deterministic squeeze for body_compressed. |
| memory_token_search | Substring search over memories. |
| memory_token_list_audit | Recent audit entries. |
| memory_token_prune | Remove old/matching proposed (and optionally confirmed) rows; dry_run preview. |
| memory_token_link / memory_token_unlink | Directed edges between memories (relation_type, optional bidirectional). |
| memory_token_traverse | BFS from a memory id over the link graph. |
| memory_token_stats | Workspace counts / health-style summary. |
| memory_token_export / memory_token_import | JSON backup / import (include_rag on export). |
| memory_token_rag_ingest | Store chunk (text_raw, optional text_compressed, source); dedupe by hash of raw. |
| memory_token_rag_query | Ranked compressed chunks within token budget (hybrid score). |
| memory_token_rag_get_full | Verbatim text_raw for an id (use before quoting or patching from RAG hits). |
| memory_token_rag_delete | Remove a chunk. |
Recommended link relation strings include SOLVES, CAUSES, BUILDS_ON, CONTRADICTS, SUPERSEDES, BLOCKS, REQUIRES, ALTERNATIVE_TO, RELATED_TO (see src/types.ts).
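The traverse tool's BFS over typed links can be sketched as follows (a hypothetical illustration in the spirit of `memory_token_traverse`; the data shapes here are made up, not the tool's schema):

```typescript
// Illustrative BFS over a directed, typed link graph: collect every memory
// reachable from `start` within `maxHops` hops.
interface Link { from: string; to: string; relation: string; }

function traverse(links: Link[], start: string, maxHops: number): string[] {
  const visited = new Set<string>([start]);
  let frontier = [start];
  for (let hop = 0; hop < maxHops && frontier.length > 0; hop++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const l of links) {
        if (l.from === node && !visited.has(l.to)) {
          visited.add(l.to);
          next.push(l.to);
        }
      }
    }
    frontier = next;
  }
  visited.delete(start);
  return [...visited];
}
```

With links like `fix-1 SOLVES bug-7` and `bug-7 RELATED_TO doc-3`, a one-hop traverse from `fix-1` returns `bug-7`; two hops also reach `doc-3`.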
RAG embeddings (local first)
Default: local embeddings via @xenova/transformers + SQLite vectors (model Xenova/all-MiniLM-L6-v2). First run downloads into .memory-token/transformers-cache/ (needs network once).
| Env | Effect |
| --- | --- |
| MEMORY_TOKEN_LOCAL_EMBED_MODEL | Override Hugging Face model id (ONNX / Transformers.js). |
| MEMORY_TOKEN_LOCAL_EMBED_THREADS | ONNX WASM threads (default 2). |
| MEMORY_TOKEN_NO_LOCAL_EMBED=1 | Skip local embedder → lexical unless Ollama/OpenAI configured. |
| MEMORY_TOKEN_OLLAMA_EMBED_MODEL | Ollama embeddings. |
| MEMORY_TOKEN_OLLAMA_URL | Ollama base URL (default http://127.0.0.1:11434). |
| OPENAI_API_KEY / MEMORY_TOKEN_OPENAI_API_KEY | Optional cloud embeddings. |
This is not ChromaDB; the DB format is project-specific.
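Hybrid ranking of the kind described above (embeddings + lexical) blends two signals per chunk. A rough sketch, with made-up blend weights — the package's actual weighting and lexical scorer are internal:

```typescript
// Illustrative hybrid ranking: blend embedding cosine similarity with a
// simple lexical-overlap score. Weights 0.7/0.3 are an assumption for the sketch.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

function lexicalOverlap(query: string, doc: string): number {
  const q = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  const d = new Set(doc.toLowerCase().split(/\W+/).filter(Boolean));
  let hits = 0;
  for (const t of q) if (d.has(t)) hits++;
  return q.size ? hits / q.size : 0;
}

// Semantic similarity dominates; lexical match breaks near-ties.
const hybridScore = (sem: number, lex: number): number => 0.7 * sem + 0.3 * lex;
```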
npm scripts
| Script | Command | Purpose |
| --- | --- | --- |
| build | tsc | Compile src/ → dist/. |
| mcp | node … dist/mcp/index.js | Run MCP stdio server. |
| smoke | node … scripts/smoke-mcp.mjs | End-to-end MCP smoke against repo root. |
| metrics | node … scripts/metrics.mjs | Human or --json report: store, pack usage, links, RAG, audit. Optional --root /path/to/workspace. |
Optional git post-commit hook
Install into another git repo:
```sh
./scripts/install-git-hook.sh /path/to/target/repo
```
Uses `MEMORY_TOKEN_CLI_ROOT` if the memory-token repo is not next to the script. On each commit, runs `dist/hook-git-commit.js` with `MEMORY_TOKEN_WORKSPACE` set to the target repo root (see `scripts/install-git-hook.sh`).
VS Code and Kiro
Stock VS Code has no Cursor-style sessionStart / postToolUse hooks. Options: run `node dist/hook-session-start.js` with `MEMORY_TOKEN_WORKSPACE` set, wire it into a task, or use an extension that exposes similar lifecycle events.
Kiro IDE: full install (MCP, steering, hooks, spec-task ideas, Cursor parity table) is in KIRO_SETUP.md. Example hook YAML lives under .kiro/hooks/.
Agent workflow
- Read `.cursor/skills/memory-token/SKILL.md` (injected path on session start).
- Call `memory_token_get_context_pack` early; use `memory_token_rag_query` before reading many files or huge pasted context.
- Treat RAG snippets as non-verbatim until `memory_token_rag_get_full`.
- Persist stable facts with propose → confirm; link related memories when useful; prune junk or stale proposals.
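The propose → confirm gate with hash-based dedupe can be sketched as a tiny in-memory store (illustrative shapes only; the real store also does similarity dedupe, audit logging, and file locking):

```typescript
import { createHash } from "node:crypto";

type Status = "proposed" | "confirmed" | "rejected";
interface Memory { id: string; text: string; status: Status; }

// Minimal sketch of the gate: identical facts dedupe by content hash,
// and only confirmed facts ever reach the session-start pack.
class MiniStore {
  private byId = new Map<string, Memory>();

  propose(text: string): Memory {
    const id = createHash("sha256").update(text).digest("hex").slice(0, 12);
    const existing = this.byId.get(id);
    if (existing) return existing; // dedupe: same fact is not re-proposed
    const mem: Memory = { id, text, status: "proposed" };
    this.byId.set(id, mem);
    return mem;
  }

  confirm(id: string): void {
    const mem = this.byId.get(id);
    if (mem && mem.status === "proposed") mem.status = "confirmed";
  }

  pack(): string[] {
    return [...this.byId.values()]
      .filter((m) => m.status === "confirmed")
      .map((m) => m.text);
  }
}
```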
License
MIT © 2026 barbozaa
