@iflow-mcp/mimi180306-claude-persistent-memory
v1.1.1
Published
Persistent memory system for Claude Code — hybrid BM25 + vector search, LLM-driven structuring, automatic clustering
Readme
Features
Hybrid Search — BM25 full-text (FTS5) + vector semantic similarity (sqlite-vec), combined ranking (0.7 vector + 0.3 BM25)
4-Channel Retrieval — Pull (MCP tools on demand) + Push (auto-inject via hooks on user prompt, pre-tool, post-tool)
LLM Structuring — Memories auto-structured into <what>/<when>/<do>/<warn> XML format via Azure OpenAI
Multi-Project Isolation — Single shared embedding server routes requests by dataDir. Each project has its own database, no cross-contamination.
Automatic Clustering — Similar memories grouped, mature clusters merged into high-confidence consolidated memories
Confidence Scoring — Memories gain/lose confidence through validation feedback and usage patterns
Local-First — All data stored locally in SQLite. Your memories never leave your machine.
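The combined ranking above can be sketched as a weighted sum. A minimal sketch, assuming BM25 scores have already been normalized into [0, 1] (the package's actual normalization scheme is not shown here):

```javascript
// Hybrid ranking sketch: 0.7 * vector similarity + 0.3 * normalized BM25.
// Assumes normalizedBm25 is already scaled into [0, 1].
function hybridScore(vectorSimilarity, normalizedBm25) {
  return 0.7 * vectorSimilarity + 0.3 * normalizedBm25;
}

// With these weights, a strong semantic match outranks a strong keyword-only match:
const semantic = hybridScore(0.9, 0.2); // ≈ 0.69
const keyword = hybridScore(0.3, 0.9);  // ≈ 0.48
console.log(semantic > keyword); // true
```

The 0.7/0.3 split biases results toward semantic similarity while still letting exact keyword hits (e.g. a function name) surface relevant memories.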
Quick Start
Install
# Set Azure OpenAI credentials (required for LLM structuring)
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"
export AZURE_OPENAI_KEY="your-api-key"
# Install in any project
npm install @alex900530/claude-persistent-memory

The postinstall script automatically:
- Generates .claude-memory.config.js (project config)
- Configures .mcp.json (MCP server registration)
- Configures .claude/settings.json (5 lifecycle hooks)
- Downloads and verifies the embedding model (bge-m3, ~2GB)
- Registers background services via launchd/systemd
- Updates .gitignore
Open Claude Code in the project directory — memory is ready.
Note: The embedding model (~2GB) is downloaded and verified during install. If the download is interrupted or the model is corrupt, the install will fail. Simply re-run npm install to retry.
Configure later
If you skipped Azure credentials during install:
npx claude-persistent-memory

Install from source
git clone https://github.com/MIMI180306/claude-persistent-memory.git
cd claude-persistent-memory
npm install
cp config.default.js config.js
# Edit config.js with your Azure credentials
# Start services
npm run embedding-server # Terminal 1
npm run llm-server # Terminal 2

Then manually configure .mcp.json and .claude/settings.json — see Configuration.
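For the manual route, a typical .mcp.json entry looks like the sketch below. The server key memory and the entry point services/memory-mcp-server.js are taken from this README's uninstall notes and project structure; the exact path and args for your checkout may differ:

```json
{
  "mcpServers": {
    "memory": {
      "command": "node",
      "args": ["./claude-persistent-memory/services/memory-mcp-server.js"]
    }
  }
}
```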
Architecture
┌─────────────────────────────────────────────────────────────┐
│ Claude Code Session │
├─────────────────────────────────────────────────────────────┤
│ │
│ Pull Channel (on demand) Push Channels (auto) │
│ ┌───────────────────┐ ┌──────────────────────────────┐ │
│ │ MCP Server │ │ UserPromptSubmit Hook │ │
│ │ memory_search │ │ PreToolUse Hook │ │
│ │ memory_save │ │ PostToolUse Hook │ │
│ │ memory_validate │ │ PreCompact Hook (analysis) │ │
│ │ memory_stats │ │ SessionEnd Hook (clustering) │ │
│ └────────┬──────────┘ └──────────────┬───────────────┘ │
│ │ │ │
│ └──────────┬───────────────────┘ │
│ │ dataDir routing │
│ ▼ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ Shared Embedding Server (TCP :23811) │ │
│ │ bge-m3 model (shared across projects) │ │
│ │ Database pool (per-project by dataDir) │ │
│ └───────────────────────────────────────────────────────┘ │
│ │ │
│ ┌──────────────┼──────────────┐ │
│ ▼ ▼ ▼ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │Project A│ │Project B│ │Project C│ │
│ │memory.db│ │memory.db│ │memory.db│ │
│ └─────────┘ └─────────┘ └─────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ LLM Server (TCP :23812) │ │
│ │ Azure OpenAI GPT-4.1 │ │
│ └───────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘

Multi-Project Support
The embedding server is shared across all projects. Each request carries a dataDir parameter that routes to the correct project's database:
- Embedding model — loaded once, shared across all projects (~2GB RAM)
- Database connections — pooled per dataDir, created on first access (~5ms)
- No cross-contamination — searching in Project A never returns Project B's memories
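The per-dataDir pooling can be pictured as a map from data directory to database handle. A simplified sketch — the real pool lives in lib/memory-db.js and opens actual SQLite connections; openDatabase here is a placeholder:

```javascript
// Simplified sketch of per-project database pooling keyed by dataDir.
// openDatabase stands in for the real SQLite open in lib/memory-db.js.
const pool = new Map();

function openDatabase(dataDir) {
  return { dataDir }; // placeholder handle for illustration
}

function getDatabase(dataDir) {
  if (!pool.has(dataDir)) {
    pool.set(dataDir, openDatabase(dataDir)); // created on first access
  }
  return pool.get(dataDir); // same handle on every later request
}

// Requests from different projects never share a handle:
const a = getDatabase('/projects/a/.claude-memory');
const b = getDatabase('/projects/b/.claude-memory');
console.log(a === getDatabase('/projects/a/.claude-memory')); // true
console.log(a === b); // false
```

Because isolation is keyed on dataDir rather than on separate server processes, the ~2GB model stays loaded once while each project keeps its own memory.db.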
MCP Tools
| Tool | Description |
|------|-------------|
| memory_search | Hybrid BM25 + vector search. Params: query, limit?, type?, domain? |
| memory_save | Save a new memory. Params: content, type?, domain?, confidence? |
| memory_validate | Feedback loop — helpful (+0.1) or unhelpful (-0.05). Params: memory_id, is_valid |
| memory_stats | System stats: total memories, type/domain distribution, cluster status |
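The memory_validate feedback loop amounts to a small confidence update. A minimal sketch — the clamp to [0, 1] is an assumption for illustration, not documented above:

```javascript
// Confidence update sketch: +0.1 for helpful, -0.05 for unhelpful.
// The [0, 1] clamp is an assumed bound, not confirmed by the README.
function applyValidation(confidence, isValid) {
  const updated = confidence + (isValid ? 0.1 : -0.05);
  return Math.min(1, Math.max(0, updated));
}

console.log(applyValidation(0.5, true));  // ≈ 0.6
console.log(applyValidation(0.5, false)); // ≈ 0.45
console.log(applyValidation(0.98, true)); // 1 (clamped)
```

The asymmetry (+0.1 vs. -0.05) means a memory needs roughly twice as many negative signals as positive ones to lose the confidence it has gained.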
Hooks
| Hook | Event | Timeout | What it does |
|------|-------|---------|-------------|
| user-prompt-hook.js | UserPromptSubmit | 1500ms | Embeds user query, searches, injects top memories via stdout |
| pre-tool-memory-hook.js | PreToolUse | 300ms | Embeds tool context, searches, injects via additionalContext |
| post-tool-memory-hook.js | PostToolUse | 300ms | Embeds tool context + result, searches, injects via additionalContext |
| pre-compact-hook.js | PreCompact | async | Spawns LLM analysis of full transcript, extracts memories |
| session-end-hook.js | SessionEnd | async | Incremental transcript analysis + clustering + mature cluster merging |
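Conceptually, each push-channel hook reads the event payload from stdin, embeds it, searches, and prints injected context. A rough sketch of the output-shaping step — the hookSpecificOutput / additionalContext shape follows Claude Code's hooks JSON protocol, and the memory tag format is illustrative, not copied from this package's hooks/ scripts:

```javascript
// Sketch: format retrieved memories as hook output for Claude Code.
// The JSON shape (hookSpecificOutput.additionalContext) follows the
// Claude Code hooks protocol; verify against the actual hooks/ scripts.
function buildHookOutput(eventName, memories) {
  const context = memories
    .map((m) => `<memory confidence="${m.confidence}">${m.content}</memory>`)
    .join('\n');
  return JSON.stringify({
    hookSpecificOutput: {
      hookEventName: eventName,
      additionalContext: context,
    },
  });
}

const out = buildHookOutput('PreToolUse', [
  { content: 'Run npm run embedding-server before testing.', confidence: 0.8 },
]);
console.log(out);
```

The tight timeouts in the table (300ms for tool hooks) are why the embedding server stays resident: the hook only pays a TCP round-trip, not model load time.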
Memory Types
| Type | Use case |
|------|----------|
| fact | Stable facts about the codebase |
| decision | Architectural decisions and rationale |
| bug | Bug fixes and root causes |
| pattern | Recurring code patterns |
| context | Session-specific context |
| preference | User workflow preferences |
| skill | Promoted from mature clusters |
Memory Lifecycle
Save → memory_save or auto-extract from transcript
Structure → LLM converts to <what>/<when>/<do>/<warn> XML
Embed → bge-m3 generates 1024-dim vector
Dedupe → Jaccard similarity >= 0.95 → update existing
Search → 0.7 * vectorSimilarity + 0.3 * normalizedBM25
Validate → memory_validate adjusts confidence ±
Cluster → similar memories auto-grouped
Merge → mature clusters consolidated into a single memory

Uninstall
npx claude-persistent-memory-uninstall

Or manually: remove memory from .mcp.json, remove the memory hooks from .claude/settings.json, then npm uninstall @alex900530/claude-persistent-memory. The .claude-memory/ data directory is preserved — delete it manually if no longer needed.
Configuration
All settings in config.default.js (override via .claude-memory.config.js):
module.exports = {
embeddingPort: 23811, // TCP port for embedding server
llmPort: 23812, // TCP port for LLM server
dataDir: './data', // memory.db location (per-project)
azure: {
endpoint: process.env.AZURE_OPENAI_ENDPOINT,
apiKey: process.env.AZURE_OPENAI_KEY,
deployment: 'gpt-4-1',
},
embedding: {
model: 'Xenova/bge-m3', // 1024 dimensions, 8192 token context
dimensions: 1024,
},
search: {
maxResults: 3, // top-K results per query
minSimilarity: 0.6, // vector similarity threshold
},
cluster: {
similarityThreshold: 0.70, // min similarity to join a cluster
maturityCount: 5, // memories needed for mature cluster
},
};

Project Structure
claude-persistent-memory/
├── bin/
│ ├── setup.js # postinstall + interactive setup
│ └── uninstall.js # cleanup script
├── hooks/
│ ├── user-prompt-hook.js # UserPromptSubmit → memory injection
│ ├── pre-tool-memory-hook.js # PreToolUse → memory injection
│ ├── post-tool-memory-hook.js # PostToolUse → memory injection
│ ├── pre-compact-hook.js # PreCompact → transcript analysis
│ └── session-end-hook.js # SessionEnd → clustering + merging
├── lib/
│ ├── memory-db.js # SQLite + FTS5 + sqlite-vec + connection pool
│ ├── embedding-client.js # TCP client for embedding server
│ ├── llm-client.js # TCP client for LLM server
│ ├── compact-analyzer.js # Transcript → memory extraction
│ └── utils.js
├── services/
│ ├── embedding-server.js # Shared embedding service (bge-m3)
│ ├── llm-server.js # LLM proxy (Azure OpenAI)
│ └── memory-mcp-server.js # MCP server (stdio, per-project)
├── config.default.js
└── package.json

Requirements
- Node.js >= 18
- macOS or Linux
- ~2GB RAM for embedding model (bge-m3)
- ~2GB disk for model cache (~/.cache/huggingface/transformers-js/)
- Azure OpenAI API access (for LLM structuring)
Notes
- LLM provider: Currently supports Azure OpenAI only. Modify services/llm-server.js for other providers.
- Ports: Embedding and LLM servers default to TCP 23811 / 23812. Change them in the config if they conflict.
- Multi-project: All projects share one embedding server process. The model is loaded once; databases are pooled by dataDir.
- Data: The .claude-memory/ directory (containing memory.db and logs) is auto-created and gitignored per project.
Contributing
Contributions welcome! Please read the Contributing Guide before submitting a PR.
