@iflow-mcp/mimi180306-claude-persistent-memory
v1.1.1
Published
Persistent memory system for Claude Code — hybrid BM25 + vector search, LLM-driven structuring, automatic clustering
Readme
Features
Hybrid Search — BM25 full-text (FTS5) + vector semantic similarity (sqlite-vec), combined ranking (0.7 vector + 0.3 BM25)
4-Channel Retrieval — Pull (MCP tools on demand) + Push (auto-inject via hooks on user prompt, pre-tool, post-tool)
LLM Structuring — Memories auto-structured into <what>/<when>/<do>/<warn> XML format via Azure OpenAI
Multi-Project Isolation — Single shared embedding server routes requests by dataDir. Each project has its own database, no cross-contamination.
Automatic Clustering — Similar memories grouped, mature clusters merged into high-confidence consolidated memories
Confidence Scoring — Memories gain/lose confidence through validation feedback and usage patterns
Local-First — All data stored locally in SQLite. Your memories never leave your machine.
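The combined ranking above can be sketched as a weighted sum. A minimal sketch, assuming BM25 scores have already been normalized into [0, 1] (the package's actual normalization scheme is not shown here):

```javascript
// Hybrid ranking sketch: 0.7 * vector similarity + 0.3 * normalized BM25.
// Assumes normalizedBm25 is already scaled into [0, 1].
function hybridScore(vectorSimilarity, normalizedBm25) {
  return 0.7 * vectorSimilarity + 0.3 * normalizedBm25;
}

// With these weights, a strong semantic match outranks a strong keyword-only match:
const semantic = hybridScore(0.9, 0.2); // ≈ 0.69
const keyword = hybridScore(0.3, 0.9);  // ≈ 0.48
console.log(semantic > keyword); // true
```

The 0.7/0.3 split biases results toward semantic similarity while still letting exact keyword hits (e.g. a function name) surface relevant memories.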
Quick Start
Install
# Set Azure OpenAI credentials (required for LLM structuring)
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"
export AZURE_OPENAI_KEY="your-api-key"
# Install in any project
npm install @alex900530/claude-persistent-memory

The postinstall script automatically:
- Generates .claude-memory.config.js (project config)
- Configures .mcp.json (MCP server registration)
- Configures .claude/settings.json (5 lifecycle hooks)
- Downloads and verifies the embedding model (bge-m3, ~2GB)
- Registers background services via launchd/systemd
- Updates .gitignore
Open Claude Code in the project directory — memory is ready.
Note: The embedding model (~2GB) is downloaded and verified during install. If the download is interrupted or the model is corrupt, the install will fail. Simply re-run npm install to retry.
Configure later
If you skipped Azure credentials during install:
npx claude-persistent-memory

Install from source
git clone https://github.com/MIMI180306/claude-persistent-memory.git
cd claude-persistent-memory
npm install
cp config.default.js config.js
# Edit config.js with your Azure credentials
# Start services
npm run embedding-server # Terminal 1
npm run llm-server # Terminal 2

Then manually configure .mcp.json and .claude/settings.json — see Configuration.
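For the manual route, a typical .mcp.json entry looks like the sketch below. The server key memory and the entry point services/memory-mcp-server.js are taken from this README's uninstall notes and project structure; the exact path and args for your checkout may differ:

```json
{
  "mcpServers": {
    "memory": {
      "command": "node",
      "args": ["./claude-persistent-memory/services/memory-mcp-server.js"]
    }
  }
}
```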
Architecture
┌─────────────────────────────────────────────────────────────┐
│ Claude Code Session │
├─────────────────────────────────────────────────────────────┤
│ │
│ Pull Channel (on demand) Push Channels (auto) │
│ ┌───────────────────┐ ┌──────────────────────────────┐ │
│ │ MCP Server │ │ UserPromptSubmit Hook │ │
│ │ memory_search │ │ PreToolUse Hook │ │
│ │ memory_save │ │ PostToolUse Hook │ │
│ │ memory_validate │ │ PreCompact Hook (analysis) │ │
│ │ memory_stats │ │ SessionEnd Hook (clustering) │ │
│ └────────┬──────────┘ └──────────────┬───────────────┘ │
│ │ │ │
│ └──────────┬───────────────────┘ │
│ │ dataDir routing │
│ ▼ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ Shared Embedding Server (TCP :23811) │ │
│ │ bge-m3 model (shared across projects) │ │
│ │ Database pool (per-project by dataDir) │ │
│ └───────────────────────────────────────────────────────┘ │
│ │ │
│ ┌──────────────┼──────────────┐ │
│ ▼ ▼ ▼ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │Project A│ │Project B│ │Project C│ │
│ │memory.db│ │memory.db│ │memory.db│ │
│ └─────────┘ └─────────┘ └─────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ LLM Server (TCP :23812) │ │
│ │ Azure OpenAI GPT-4.1 │ │
│ └───────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘

Multi-Project Support
The embedding server is shared across all projects. Each request carries a dataDir parameter that routes to the correct project's database:
- Embedding model — loaded once, shared across all projects (~2GB RAM)
- Database connections — pooled per dataDir, created on first access (~5ms)
- No cross-contamination — searching in Project A never returns Project B's memories
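The per-dataDir pooling can be pictured as a map from data directory to database handle. A simplified sketch — the real pool lives in lib/memory-db.js and opens actual SQLite connections; openDatabase here is a placeholder:

```javascript
// Simplified sketch of per-project database pooling keyed by dataDir.
// openDatabase stands in for the real SQLite open in lib/memory-db.js.
const pool = new Map();

function openDatabase(dataDir) {
  return { dataDir }; // placeholder handle for illustration
}

function getDatabase(dataDir) {
  if (!pool.has(dataDir)) {
    pool.set(dataDir, openDatabase(dataDir)); // created on first access
  }
  return pool.get(dataDir); // same handle on every later request
}

// Requests from different projects never share a handle:
const a = getDatabase('/projects/a/.claude-memory');
const b = getDatabase('/projects/b/.claude-memory');
console.log(a === getDatabase('/projects/a/.claude-memory')); // true
console.log(a === b); // false
```

Because isolation is keyed on dataDir rather than on separate server processes, the ~2GB model stays loaded once while each project keeps its own memory.db.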
MCP Tools
| Tool | Description |
|------|-------------|
| memory_search | Hybrid BM25 + vector search. Params: query, limit?, type?, domain? |
| memory_save | Save a new memory. Params: content, type?, domain?, confidence? |
| memory_validate | Feedback loop — helpful (+0.1) or unhelpful (-0.05). Params: memory_id, is_valid |
| memory_stats | System stats: total memories, type/domain distribution, cluster status |
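The memory_validate feedback loop amounts to a small confidence update. A minimal sketch — the clamp to [0, 1] is an assumption for illustration, not documented above:

```javascript
// Confidence update sketch: +0.1 for helpful, -0.05 for unhelpful.
// The [0, 1] clamp is an assumed bound, not confirmed by the README.
function applyValidation(confidence, isValid) {
  const updated = confidence + (isValid ? 0.1 : -0.05);
  return Math.min(1, Math.max(0, updated));
}

console.log(applyValidation(0.5, true));  // ≈ 0.6
console.log(applyValidation(0.5, false)); // ≈ 0.45
console.log(applyValidation(0.98, true)); // 1 (clamped)
```

The asymmetry (+0.1 vs. -0.05) means a memory needs roughly twice as many negative signals as positive ones to lose the confidence it has gained.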
Hooks
| Hook | Event | Timeout | What it does |
|------|-------|---------|-------------|
| user-prompt-hook.js | UserPromptSubmit | 1500ms | Embeds user query, searches, injects top memories via stdout |
| pre-tool-memory-hook.js | PreToolUse | 300ms | Embeds tool context, searches, injects via additionalContext |
| post-tool-memory-hook.js | PostToolUse | 300ms | Embeds tool context + result, searches, injects via additionalContext |
| pre-compact-hook.js | PreCompact | async | Spawns LLM analysis of full transcript, extracts memories |
| session-end-hook.js | SessionEnd | async | Incremental transcript analysis + clustering + mature cluster merging |
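Conceptually, each push-channel hook reads the event payload from stdin, embeds it, searches, and prints injected context. A rough sketch of the output-shaping step — the hookSpecificOutput / additionalContext shape follows Claude Code's hooks JSON protocol, and the memory tag format is illustrative, not copied from this package's hooks/ scripts:

```javascript
// Sketch: format retrieved memories as hook output for Claude Code.
// The JSON shape (hookSpecificOutput.additionalContext) follows the
// Claude Code hooks protocol; verify against the actual hooks/ scripts.
function buildHookOutput(eventName, memories) {
  const context = memories
    .map((m) => `<memory confidence="${m.confidence}">${m.content}</memory>`)
    .join('\n');
  return JSON.stringify({
    hookSpecificOutput: {
      hookEventName: eventName,
      additionalContext: context,
    },
  });
}

const out = buildHookOutput('PreToolUse', [
  { content: 'Run npm run embedding-server before testing.', confidence: 0.8 },
]);
console.log(out);
```

The tight timeouts in the table (300ms for tool hooks) are why the embedding server stays resident: the hook only pays a TCP round-trip, not model load time.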
Memory Types
| Type | Use case |
|------|----------|
| fact | Stable facts about the codebase |
| decision | Architectural decisions and rationale |
| bug | Bug fixes and root causes |
| pattern | Recurring code patterns |
| context | Session-specific context |
| preference | User workflow preferences |
| skill | Promoted from mature clusters |
Memory Lifecycle
Save → memory_save or auto-extract from transcript
Structure → LLM converts to <what>/<when>/<do>/<warn> XML
Embed → bge-m3 generates 1024-dim vector
Dedupe → Jaccard similarity >= 0.95 → update existing
Search → 0.7 * vectorSimilarity + 0.3 * normalizedBM25
Validate → memory_validate adjusts confidence ±
Cluster → similar memories auto-grouped
Merge → mature clusters consolidated into a single memory

Uninstall
npx claude-persistent-memory-uninstall

Or manually: remove memory from .mcp.json, remove the memory hooks from .claude/settings.json, then npm uninstall @alex900530/claude-persistent-memory. The .claude-memory/ data directory is preserved — delete it manually if no longer needed.
Configuration
All settings in config.default.js (override via .claude-memory.config.js):
module.exports = {
embeddingPort: 23811, // TCP port for embedding server
llmPort: 23812, // TCP port for LLM server
dataDir: './data', // memory.db location (per-project)
azure: {
endpoint: process.env.AZURE_OPENAI_ENDPOINT,
apiKey: process.env.AZURE_OPENAI_KEY,
deployment: 'gpt-4-1',
},
embedding: {
model: 'Xenova/bge-m3', // 1024 dimensions, 8192 token context
dimensions: 1024,
},
search: {
maxResults: 3, // top-K results per query
minSimilarity: 0.6, // vector similarity threshold
},
cluster: {
similarityThreshold: 0.70, // min similarity to join a cluster
maturityCount: 5, // memories needed for mature cluster
},
};

Project Structure
claude-persistent-memory/
├── bin/
│ ├── setup.js # postinstall + interactive setup
│ └── uninstall.js # cleanup script
├── hooks/
│ ├── user-prompt-hook.js # UserPromptSubmit → memory injection
│ ├── pre-tool-memory-hook.js # PreToolUse → memory injection
│ ├── post-tool-memory-hook.js # PostToolUse → memory injection
│ ├── pre-compact-hook.js # PreCompact → transcript analysis
│ └── session-end-hook.js # SessionEnd → clustering + merging
├── lib/
│ ├── memory-db.js # SQLite + FTS5 + sqlite-vec + connection pool
│ ├── embedding-client.js # TCP client for embedding server
│ ├── llm-client.js # TCP client for LLM server
│ ├── compact-analyzer.js # Transcript → memory extraction
│ └── utils.js
├── services/
│ ├── embedding-server.js # Shared embedding service (bge-m3)
│ ├── llm-server.js # LLM proxy (Azure OpenAI)
│ └── memory-mcp-server.js # MCP server (stdio, per-project)
├── config.default.js
└── package.json

Requirements
- Node.js >= 18
- macOS or Linux
- ~2GB RAM for embedding model (bge-m3)
- ~2GB disk for model cache (~/.cache/huggingface/transformers-js/)
- Azure OpenAI API access (for LLM structuring)
Notes
- LLM provider: Currently supports Azure OpenAI only. Modify services/llm-server.js for other providers.
- Ports: Embedding and LLM servers default to TCP 23811 / 23812. Change them in the config if they conflict.
- Multi-project: All projects share one embedding server process. The model is loaded once; databases are pooled by dataDir.
- Data: The .claude-memory/ directory (containing memory.db and logs) is auto-created and gitignored per project.
Contributing
Contributions welcome! Please read the Contributing Guide before submitting a PR.
