@barbozaa/memory-token
v0.2.0
memory-token
MCP server plus Cursor hooks for governed workspace memory with token-budget packs: agents propose facts, you (or policy) confirm, then a bounded summary loads at session start—with preCompact nudges, post-tool reminders, and optional user-message triggers so long sessions still persist durable context.
The problem this solves
Coding agents (Cursor / Claude / Copilot) forget everything between sessions. Every new chat means:
- The agent re-reads the same files to understand your project (5–20k wasted tokens per session).
- You re-explain the same architectural decisions, conventions, build commands, and known bugs.
- Long-running sessions silently lose context to summarization and the agent starts repeating mistakes you already fixed.
- "Just put it in `CLAUDE.md`/`AGENTS.md`" doesn't scale: the file grows unbounded, eats context every turn, and has no provenance; you can't tell which lines are stale, who proposed them, or what actually got applied.
The shortcut of dumping everything into a markdown rules file works until it doesn't; you end up paying token rent on context the agent doesn't even need this turn.
How memory-token solves it
A two-tier memory with a strict token budget and a human-in-the-loop gate:
- Confirmed store (`store.json`) — small, curated, typed facts (decisions, fixes, APIs, build commands, conventions). Loaded at session start as a bounded markdown pack (default ≤1500 tokens). The agent proposes, you (or policy) confirm; nothing gets in by accident.
- RAG layer (`rag.sqlite`) — large chunks (chat distills, session capsules, design docs) stored compressed with embeddings. Pulled on demand via `rag_query`, ranked by hybrid score (embeddings + BM25-style lexical), returned already compressed within a token budget. Verbatim text only when explicitly fetched via `rag_get_full`.
Both layers respect a token budget on every read — you never pay for context the agent isn't using right now.
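The budget check itself is simple. A minimal sketch of a token-budget pack builder, assuming the rough chars/4 token estimate the session hook uses (illustrative names and shapes, not the package's actual code):

```typescript
// Hypothetical sketch of a token-budget pack builder (not the real implementation).
// Assumes the rough chars/4 token estimate the session hook is documented to use.
interface Fact { text: string; importance: number; }

const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

function buildPack(facts: Fact[], maxTokens: number): string[] {
  // Deterministic ordering: importance first, then text for stable ties.
  const ordered = [...facts].sort(
    (a, b) => b.importance - a.importance || a.text.localeCompare(b.text)
  );
  const pack: string[] = [];
  let used = 0;
  for (const f of ordered) {
    const cost = estimateTokens(f.text);
    if (used + cost > maxTokens) continue; // hard cap: skip what doesn't fit
    pack.push(f.text);
    used += cost;
  }
  return pack;
}
```

The hard cap is what makes session-start cost bounded: an oversized fact is skipped rather than blowing the budget.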
Benefits (measured on this repo)
| Benefit | Mechanism | Example impact |
|---|---|---|
| Cross-session memory | sessionStart hook injects confirmed pack automatically | Critical facts (build cmd, security fix, race condition root cause) survive across all future Cursor windows. |
| Big token savings on context loads | RAG returns compressed, ranked snippets instead of file reads | A query that would cost ~3000–5000 tokens in Read calls cost 364 tokens of compressed RAG hits in this session (~85% reduction). |
| Bounded session-start cost | Pack hard-capped at max_tokens, deterministic ordering | This repo's pack: 397 tok / 1500 budget = 26%, fits in any sessionStart without crowding out user prompt. |
| Compression for free | compress_candidate shrinks bodies to ~1.6× ratio | 37.8% size reduction on stored bodies; same retrieval quality. |
| No silent drift | propose → confirm gate, dedupe by hash + similarity, audit log | Every fact has a status, timestamp, and reason; you can prune with confidence. |
| Causal graph | Typed links (SOLVES, CAUSES, BUILDS_ON, …) + traverse BFS | Ask "what fixed X?" and get back the commit that solved it and the build that depends on it, in one hop. |
| Local-first, private | Embeddings via Transformers.js (ONNX, ~25 MB model, downloaded once) | No API keys required. Optional: Ollama or OpenAI for embeddings. Your facts never leave the repo. |
| Workspace-scoped | Each project gets its own .memory-token/ directory | No bleeding of facts between unrelated repos; gitignored by default but can be versioned. |
| Agent steering, not vibes | Hooks inject skill + policy + pack; skill is a decision tree the agent follows | Agents actually call propose/confirm/rag_query consistently instead of "maybe sometimes if they remember". |
| Prevents the CLAUDE.md bloat trap | Token-budget pack + RAG fallback | Old facts move to RAG (compressed, on-demand) instead of permanently inflating the rules file. |
When to use it
- ✅ Multi-week projects where you reopen Cursor often and don't want to re-explain context.
- ✅ Codebases big enough that re-reading "to understand structure" costs noticeable tokens per session.
- ✅ Teams that want provenance and control over what the agent "remembers".
- ✅ Long debugging sessions where you want the root cause + fix to survive into next week's session.
Skip if your project is a one-off script or you don't run more than 1–2 chat sessions on the same codebase.
What it does
| Layer | Role |
| --- | --- |
| Store (store.json) | Typed memories (decision, fix, api, …), statuses (proposed → confirmed / rejected), links between memories, audit log. |
| RAG (rag.sqlite) | Chunked text with hybrid retrieval (embeddings + lexical). Compressed snippets in query results; verbatim text only via memory_token_rag_get_full. |
| MCP | Eighteen tools: pack, propose/confirm/reject, search, prune, link graph, export/import, RAG ingest/query/delete, stats, audit. |
| Hooks | Inject skill path + policy + pack on sessionStart; hints on beforeSubmitPrompt / preCompact / postToolUse. Hooks do not call the MCP (shell → Node CLIs only). Flow: hook → skill → MCP. |
Workspace root comes from `MEMORY_TOKEN_WORKSPACE`, `CURSOR_PROJECT_DIR`, or `CLAUDE_PROJECT_DIR`, else `process.cwd()` (see `src/workspace.ts`).
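That fallback chain can be sketched in a few lines (an illustration of the behavior described above; the real logic lives in `src/workspace.ts`):

```typescript
// Sketch of the workspace-root fallback chain: explicit env var first,
// then editor-provided project dirs, then the process working directory.
function resolveWorkspaceRoot(
  env: Record<string, string | undefined>,
  cwd: string
): string {
  return (
    env.MEMORY_TOKEN_WORKSPACE ||
    env.CURSOR_PROJECT_DIR ||
    env.CLAUDE_PROJECT_DIR ||
    cwd
  );
}
```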
Requirements
- Node.js ≥ 22.5.0 (uses the experimental `node:sqlite` API)
Install
Option A — npx (zero-install, recommended)
No clone, no global install. Cursor pulls the package on demand:
```jsonc
// ~/.cursor/mcp.json
{
  "mcpServers": {
    "memory-token": {
      "command": "npx",
      "args": ["-y", "@barbozaa/memory-token"],
      "env": { "MEMORY_TOKEN_WORKSPACE": "${workspaceFolder}" }
    }
  }
}
```
Restart Cursor. The first call downloads the package (~52 KB tarball). The first semantic embedding call additionally downloads the local ONNX model (~25 MB) into `<workspace>/.memory-token/transformers-cache/`.
To use the hooks or the skill with this install method, copy them once from the installed package into your ~/.cursor/:
```sh
PKG=$(npm root -g 2>/dev/null)/@barbozaa/memory-token  # or use `npm pack` to extract locally
cp -r "$PKG/.cursor/hooks" ~/.cursor/
cp "$PKG/.cursor/hooks.json" ~/.cursor/
cp -r "$PKG/.cursor/skills" ~/.cursor/
chmod +x ~/.cursor/hooks/*.sh
```
Option B — Global install
```sh
npm install -g @barbozaa/memory-token
```
```json
{
  "mcpServers": {
    "memory-token": {
      "command": "memory-token-mcp",
      "env": { "MEMORY_TOKEN_WORKSPACE": "${workspaceFolder}" }
    }
  }
}
```
Option C — From source (development)
```sh
git clone https://github.com/barbozaa/memory-token.git
cd memory-token
npm install
npm run build
chmod +x .cursor/hooks/*.sh
npm run smoke
```
Point Cursor at `dist/mcp/index.js` with `MEMORY_TOKEN_WORKSPACE=${workspaceFolder}` (see Cursor MCP). Merge `.cursor/hooks.json` into your project if you use hooks elsewhere.
Global Cursor install (all new windows / workspaces)
Use user-level config so you do not copy hooks into every repo:
- `~/.cursor/mcp.json` — add the `memory-token` server with `MEMORY_TOKEN_WORKSPACE: ${workspaceFolder}` (each project still gets its own `.memory-token/`).
- `~/.cursor/hooks.json` + `~/.cursor/hooks/memory-token/*.sh` — wrappers that set `MEMORY_TOKEN_CLI_ROOT` to your clone and call `dist/hook-*.js`. See `~/.cursor/hooks/memory-token/README.md` after install.
- `~/.cursor/skills/memory-token/SKILL.md` — global skill; the session hook sets `MEMORY_TOKEN_SKILL_PATH` so the banner points here.
- User Rules — Cursor reads them from Settings → Rules, not from `~/.cursor/rules/`. Use `~/.cursor/rules/memory-token.mdc` only as a reference to paste into User Rules, or add the same rule under each repo's `.cursor/rules/`.
Restart Cursor after changing MCP or hooks. Re-run npm run build in the memory-token clone when you pull updates.
Data locations
All under <workspace>/.memory-token/ (typically gitignored):
| Path | Contents |
| --- | --- |
| store.json | Memories, links[], audit[] |
| rag.sqlite | RAG chunks + vectors |
| transformers-cache/ | Downloaded ONNX model (first semantic embed run) |
Remove .memory-token/ from .gitignore if you want the store versioned.
Repository layout
```text
src/
  mcp/index.ts          # MCP server + tool handlers
  store.ts              # JSON store + lockfile (O_EXCL) for concurrent writes
  pack.ts               # Token-budget pack builder (used by MCP + session hook)
  types.ts              # Memory types, links, audit shapes
  session-policy.ts     # Injected policy text for hooks
  compress-lite.ts      # Lightweight compression helpers
  memory-dedupe.ts      # Near-duplicate detection on propose
  workspace.ts          # Resolve workspace root from env
  hook-session-start.ts # sessionStart payload
  hook-post-tool-nudge.ts
  hook-user-message.ts
  hook-git-commit.ts    # Optional git post-commit integration
  mcp-nudge-messages.ts
  rag/                  # db, query, embed, embed-local, lexical, paths, compress-default
scripts/
  smoke-mcp.mjs         # MCP smoke (listTools, pack, propose, RAG, …)
  metrics.mjs           # Store/pack/link/RAG health report
  install-git-hook.sh   # Install post-commit hook in another repo
.cursor/
  hooks.json            # Cursor hook wiring
  hooks/*.sh            # Shell wrappers → dist/*.js
  skills/memory-token/SKILL.md # Agent workflow (read at session start)
  rules/memory-token.mdc       # Optional Cursor rules
```
Cursor MCP
For npx or global installs see Install. The block below is for the from-source workflow:
```jsonc
"memory-token": {
  "command": "node",
  "args": [
    "--disable-warning=ExperimentalWarning",
    "/ABS/PATH/TO/memory-token/dist/mcp/index.js"
  ],
  "env": {
    "MEMORY_TOKEN_WORKSPACE": "${workspaceFolder}"
  }
}
```
Replace `/ABS/PATH/TO/memory-token` with this repo’s path. Restart Cursor after edits.
CLI / other clients: npm run mcp runs the server with cwd as workspace unless env overrides. memory-token-mcp is the package bin entry (same script).
Cursor hooks
Shipped in .cursor/hooks.json:
| Event | Script | Purpose |
| --- | --- | --- |
| sessionStart | .cursor/hooks/session-memory.sh | Skill banner (MEMORY_TOKEN_SKILL_PATH or .cursor/skills/memory-token/SKILL.md), policy cheat sheet, token-capped confirmed pack. |
| preCompact | .cursor/hooks/precompact-memory-hint.sh | Reminder to rag_ingest / propose / confirm before context compaction. |
| postToolUse | .cursor/hooks/post-tool-mcp-nudge.sh | Every MEMORY_TOKEN_NUDGE_EVERY tool calls (default 10), short [memory-token] nudge (skips tools already named like memory_token_*). |
| beforeSubmitPrompt | .cursor/hooks/user-message-triggers.sh (matcher: UserPromptSubmit) | Same as legacy “user send” nudge: phrases like “remember this”, “root cause”, long paste → additional_context for propose / rag_ingest. |
Hook env (optional)
| Variable | Default | Meaning |
| --- | --- | --- |
| MEMORY_TOKEN_SESSION_MAX_TOKENS | 2400 | Total rough budget for policy + pack in session hook (char/4 estimate). |
| MEMORY_TOKEN_NUDGE_EVERY | 10 | Post-tool nudge interval. |
| MEMORY_TOKEN_CLI_ROOT | auto | Absolute path to this repo when hooks live in another project (so dist/hook-*.js resolve). |
| MEMORY_TOKEN_SKILL_PATH | — | Custom SKILL.md path. |
MCP tools (full list)
| Tool | Purpose |
| --- | --- |
| memory_token_get_context_pack | Confirmed memories within max_tokens; optional query / tags rerank. |
| memory_token_propose | Create proposed memory (type, importance, optional body_compressed, force to bypass dedupe). |
| memory_token_confirm / memory_token_reject | Promote or drop proposals. |
| memory_token_compress_candidate | Deterministic squeeze for body_compressed. |
| memory_token_search | Substring search over memories. |
| memory_token_list_audit | Recent audit entries. |
| memory_token_prune | Remove old/matching proposed (and optionally confirmed) rows; dry_run preview. |
| memory_token_link / memory_token_unlink | Directed edges between memories (relation_type, optional bidirectional). |
| memory_token_traverse | BFS from a memory id over the link graph. |
| memory_token_stats | Workspace counts / health-style summary. |
| memory_token_export / memory_token_import | JSON backup / import (include_rag on export). |
| memory_token_rag_ingest | Store chunk (text_raw, optional text_compressed, source); dedupe by hash of raw. |
| memory_token_rag_query | Ranked compressed chunks within token budget (hybrid score). |
| memory_token_rag_get_full | Verbatim text_raw for an id (use before quoting or patching from RAG hits). |
| memory_token_rag_delete | Remove a chunk. |
Recommended link relation strings include SOLVES, CAUSES, BUILDS_ON, CONTRADICTS, SUPERSEDES, BLOCKS, REQUIRES, ALTERNATIVE_TO, RELATED_TO (see src/types.ts).
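The traverse tool's BFS over typed links can be sketched as follows (a hypothetical illustration in the spirit of `memory_token_traverse`; the data shapes here are made up, not the tool's schema):

```typescript
// Illustrative BFS over a directed, typed link graph: collect every memory
// reachable from `start` within `maxHops` hops.
interface Link { from: string; to: string; relation: string; }

function traverse(links: Link[], start: string, maxHops: number): string[] {
  const visited = new Set<string>([start]);
  let frontier = [start];
  for (let hop = 0; hop < maxHops && frontier.length > 0; hop++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const l of links) {
        if (l.from === node && !visited.has(l.to)) {
          visited.add(l.to);
          next.push(l.to);
        }
      }
    }
    frontier = next;
  }
  visited.delete(start);
  return [...visited];
}
```

With links like `fix-1 SOLVES bug-7` and `bug-7 RELATED_TO doc-3`, a one-hop traverse from `fix-1` returns `bug-7`; two hops also reach `doc-3`.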
RAG embeddings (local first)
Default: local embeddings via @xenova/transformers + SQLite vectors (model Xenova/all-MiniLM-L6-v2). First run downloads into .memory-token/transformers-cache/ (needs network once).
| Env | Effect |
| --- | --- |
| MEMORY_TOKEN_LOCAL_EMBED_MODEL | Override Hugging Face model id (ONNX / Transformers.js). |
| MEMORY_TOKEN_LOCAL_EMBED_THREADS | ONNX WASM threads (default 2). |
| MEMORY_TOKEN_NO_LOCAL_EMBED=1 | Skip local embedder → lexical unless Ollama/OpenAI configured. |
| MEMORY_TOKEN_OLLAMA_EMBED_MODEL | Ollama embeddings. |
| MEMORY_TOKEN_OLLAMA_URL | Ollama base URL (default http://127.0.0.1:11434). |
| OPENAI_API_KEY / MEMORY_TOKEN_OPENAI_API_KEY | Optional cloud embeddings. |
This is not ChromaDB; the DB format is project-specific.
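Hybrid ranking of the kind described above (embeddings + lexical) blends two signals per chunk. A rough sketch, with made-up blend weights — the package's actual weighting and lexical scorer are internal:

```typescript
// Illustrative hybrid ranking: blend embedding cosine similarity with a
// simple lexical-overlap score. Weights 0.7/0.3 are an assumption for the sketch.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

function lexicalOverlap(query: string, doc: string): number {
  const q = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  const d = new Set(doc.toLowerCase().split(/\W+/).filter(Boolean));
  let hits = 0;
  for (const t of q) if (d.has(t)) hits++;
  return q.size ? hits / q.size : 0;
}

// Semantic similarity dominates; lexical match breaks near-ties.
const hybridScore = (sem: number, lex: number): number => 0.7 * sem + 0.3 * lex;
```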
npm scripts
| Script | Command | Purpose |
| --- | --- | --- |
| build | tsc | Compile src/ → dist/. |
| mcp | node … dist/mcp/index.js | Run MCP stdio server. |
| smoke | node … scripts/smoke-mcp.mjs | End-to-end MCP smoke against repo root. |
| metrics | node … scripts/metrics.mjs | Human or --json report: store, pack usage, links, RAG, audit. Optional --root /path/to/workspace. |
Optional git post-commit hook
Install into another git repo:
```sh
./scripts/install-git-hook.sh /path/to/target/repo
```
Uses `MEMORY_TOKEN_CLI_ROOT` if the memory-token repo is not next to the script. On each commit, runs `dist/hook-git-commit.js` with `MEMORY_TOKEN_WORKSPACE` set to the target repo root (see `scripts/install-git-hook.sh`).
VS Code and Kiro
Stock VS Code has no Cursor-style sessionStart / postToolUse hooks. Options: run `node dist/hook-session-start.js` with `MEMORY_TOKEN_WORKSPACE` set, wire it into a task, or use an extension that exposes similar lifecycle events.
Kiro IDE: full install (MCP, steering, hooks, spec-task ideas, Cursor parity table) is in KIRO_SETUP.md. Example hook YAML lives under .kiro/hooks/.
Agent workflow
- Read `.cursor/skills/memory-token/SKILL.md` (injected path on session start).
- Call `memory_token_get_context_pack` early; use `memory_token_rag_query` before reading many files or huge pasted context.
- Treat RAG snippets as non-verbatim until `memory_token_rag_get_full`.
- Persist stable facts with propose → confirm; link related memories when useful; prune junk or stale proposals.
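The propose → confirm gate with hash-based dedupe can be sketched as a tiny in-memory store (illustrative shapes only; the real store also does similarity dedupe, audit logging, and file locking):

```typescript
import { createHash } from "node:crypto";

type Status = "proposed" | "confirmed" | "rejected";
interface Memory { id: string; text: string; status: Status; }

// Minimal sketch of the gate: identical facts dedupe by content hash,
// and only confirmed facts ever reach the session-start pack.
class MiniStore {
  private byId = new Map<string, Memory>();

  propose(text: string): Memory {
    const id = createHash("sha256").update(text).digest("hex").slice(0, 12);
    const existing = this.byId.get(id);
    if (existing) return existing; // dedupe: same fact is not re-proposed
    const mem: Memory = { id, text, status: "proposed" };
    this.byId.set(id, mem);
    return mem;
  }

  confirm(id: string): void {
    const mem = this.byId.get(id);
    if (mem && mem.status === "proposed") mem.status = "confirmed";
  }

  pack(): string[] {
    return [...this.byId.values()]
      .filter((m) => m.status === "confirmed")
      .map((m) => m.text);
  }
}
```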
License
MIT © 2026 barbozaa
