# @alex900530/claude-persistent-memory

v1.1.1

Persistent memory system for Claude Code — hybrid BM25 + vector search, LLM-driven structuring, automatic clustering.
## Features

- 🧠 **Hybrid Search** — BM25 full-text (FTS5) + vector semantic similarity (sqlite-vec), with combined ranking
- 📡 **4-Channel Retrieval** — Pull (MCP tools) + Push (auto-inject via hooks on user prompt, pre-tool, post-tool)
- 🏗️ **LLM Structuring** — Memories are auto-structured into `<what>/<when>/<do>/<warn>` XML format
- 📦 **Automatic Clustering** — Similar memories are grouped; mature clusters are promoted to reusable skills
- 📊 **Confidence Scoring** — Memories gain or lose confidence through validation feedback and usage
- ⏳ **Time Decay** — Configurable half-lives per memory type (facts: 90d, context: 30d, skills: never)
- 🔒 **Local-First** — All data is stored locally in SQLite; your memories never leave your machine
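The combined ranking of the hybrid search can be sketched roughly as follows. This is a minimal illustration, not the package's actual scoring code: the normalization scheme and the 50/50 weighting are assumptions.

```javascript
// Hypothetical sketch of hybrid BM25 + vector ranking (not the package's
// actual implementation): normalize each score map to [0, 1], then blend.
function hybridRank(bm25Scores, vectorScores, alpha = 0.5) {
  const normalize = (scores) => {
    const max = Math.max(...Object.values(scores), 1e-9);
    return Object.fromEntries(
      Object.entries(scores).map(([id, s]) => [id, s / max])
    );
  };
  const bm25 = normalize(bm25Scores);
  const vec = normalize(vectorScores);
  const ids = new Set([...Object.keys(bm25), ...Object.keys(vec)]);
  return [...ids]
    .map((id) => ({
      id,
      score: alpha * (bm25[id] ?? 0) + (1 - alpha) * (vec[id] ?? 0),
    }))
    .sort((a, b) => b.score - a.score);
}

// Memory 'm2' scores well on both channels, so it ranks first.
const ranked = hybridRank({ m1: 8.2, m2: 6.5 }, { m1: 0.55, m2: 0.91 });
```

Blending normalized keyword and semantic scores lets an exact identifier match (BM25) and a paraphrased match (vectors) both surface.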
## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                     Claude Code Session                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Pull Channel (on demand) Push Channels (auto)              │
│  ┌───────────────────┐    ┌──────────────────────────────┐  │
│  │ MCP Server        │    │ UserPromptSubmit Hook        │  │
│  │  memory_search    │    │ PreToolUse Hook              │  │
│  │  memory_save      │    │ PostToolUse Hook             │  │
│  │  memory_validate  │    │ PreCompact Hook (analysis)   │  │
│  │  memory_stats     │    │ SessionEnd Hook (clustering) │  │
│  └────────┬──────────┘    └──────────────┬───────────────┘  │
│           │                              │                  │
│           └──────────┬───────────────────┘                  │
│                      ▼                                      │
│ ┌───────────────────────────────────────────────────────┐   │
│ │               SQLite + FTS5 + sqlite-vec              │   │
│ │                      (memory.db)                      │   │
│ └───────────────────────────────────────────────────────┘   │
│                      ▲                                      │
│           ┌──────────┴───────────────────┐                  │
│           │                              │                  │
│  ┌────────┴──────────┐    ┌──────────────┴───────────────┐  │
│  │ Embedding Server  │    │ LLM Server                   │  │
│  │ TCP :23811        │    │ TCP :23812                   │  │
│  │ bge-m3 (1024d)    │    │ Azure OpenAI GPT-4.1         │  │
│  └───────────────────┘    └──────────────────────────────┘  │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

## Quick Start
### One-command install

```bash
# Set Azure OpenAI credentials (optional — skip to configure later)
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"
export AZURE_OPENAI_KEY="your-api-key"

# Install in any project
npm install @alex900530/claude-persistent-memory
```

That's it. The postinstall script automatically:

- Configures `.mcp.json` (MCP server)
- Configures `.claude/settings.json` (5 hooks)
- Generates `.claude-memory.config.js` (config)
- Registers background services (embedding + LLM)
- Updates `.gitignore`

Open Claude Code in the project directory — memory is ready.
### Configure later

If you skipped Azure credentials during install:

```bash
npx claude-persistent-memory
```

This runs an interactive setup to configure credentials and start services.
### Manual install (from source)

```bash
git clone https://github.com/MIMI180306/claude-persistent-memory.git
cd claude-persistent-memory
npm install
cp config.default.js config.js
# Edit config.js with your Azure credentials

# Start services
npm run embedding-server   # Terminal 1
npm run llm-server         # Terminal 2
```

Then manually configure `.mcp.json` and `.claude/settings.json` — see Configuration.
## MCP Tools

| Tool | Description |
|------|-------------|
| `memory_search` | Hybrid BM25 + vector search. Params: `query`, `limit?`, `type?`, `domain?` |
| `memory_save` | Save a new memory. Params: `content`, `type?`, `domain?`, `confidence?` |
| `memory_validate` | Feedback loop — helpful (+0.1) or unhelpful (−0.05). Params: `memory_id`, `is_valid` |
| `memory_stats` | System stats: total memories, type/domain distribution, cluster status |
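The `memory_validate` feedback loop amounts to a small confidence update. A sketch of the arithmetic: the +0.1 / −0.05 deltas come from the tool description above, while clamping to [0, 1] is an assumption.

```javascript
// Sketch of the memory_validate confidence update (not the package's
// actual code). Deltas are from the tool table; the [0, 1] clamp is assumed.
function applyValidation(confidence, isValid) {
  const delta = isValid ? 0.1 : -0.05;
  return Math.min(1, Math.max(0, confidence + delta));
}
```

The asymmetry (a miss costs half as much as a hit gains) means a memory that helps even a third of the time still trends upward.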
## Hooks

| Hook | Event | Timeout | What it does |
|------|-------|---------|--------------|
| `user-prompt-hook.js` | UserPromptSubmit | 1500 ms | Embeds the user query → searches → injects top memories via stdout |
| `pre-tool-memory-hook.js` | PreToolUse | 300 ms | Embeds tool context → searches → injects via `additionalContext` |
| `post-tool-memory-hook.js` | PostToolUse | 300 ms | Embeds tool context + result → searches → injects via `additionalContext` |
| `pre-compact-hook.js` | PreCompact | async | Spawns LLM analysis of the full transcript → extracts memories |
| `session-end-hook.js` | SessionEnd | async | Incremental transcript analysis + clustering + mature-cluster merging |
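The injection step of a push-channel hook can be sketched like this. It is illustrative only, not the package's actual hook code: the memory field names mirror the `<what>` structuring, but everything else is an assumption.

```javascript
// Rough sketch of a push-channel hook's injection step: format retrieved
// memories as a context block. The `type`/`what` field names echo the
// structured-memory format; the output wording is illustrative.
function formatInjectedContext(memories) {
  if (memories.length === 0) return '';
  const items = memories.map((m) => `- [${m.type}] ${m.what}`).join('\n');
  return `Relevant memories:\n${items}`;
}

// A UserPromptSubmit hook would print this to stdout within its 1500 ms budget.
process.stdout.write(
  formatInjectedContext([{ type: 'bug', what: 'FTS5 queries need quoting' }])
);
```

Returning an empty string when nothing matches matters: injecting a header with no memories would waste context on every prompt.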
## Memory Types

| Type | Half-life | Use case |
|------|-----------|----------|
| `fact` | 90 days | Stable facts about the codebase |
| `decision` | 90 days | Architectural decisions and rationale |
| `bug` | 60 days | Bug fixes and root causes |
| `pattern` | 90 days | Recurring code patterns |
| `context` | 30 days | Session-specific context |
| `preference` | 60 days | User workflow preferences |
| `skill` | never | Promoted from mature clusters |
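Half-life decay is standard exponential decay. A sketch consistent with the table above; the exact decay law the package uses is an assumption:

```javascript
// Exponential decay with a per-type half-life. The half-lives come from the
// table above; the decay law itself is an assumed (standard) formulation.
// With an infinite half-life (skills), confidence never decays.
function decayedConfidence(confidence, ageDays, halfLifeDays) {
  if (!Number.isFinite(halfLifeDays)) return confidence; // skills: never
  return confidence * Math.pow(0.5, ageDays / halfLifeDays);
}

// A 90-day-old fact (half-life 90 d) retains exactly half its confidence.
```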
## Memory Lifecycle

1. **Save** → `memory_save` or auto-extraction from the transcript
2. **Structure** → LLM converts it to `<what>/<when>/<do>/<warn>` XML
3. **Embed** → bge-m3 generates a 1024-dim vector
4. **Search** → BM25 + vector similarity, combined ranking
5. **Validate** → `memory_validate` adjusts confidence up or down
6. **Cluster** → similar memories are auto-grouped
7. **Promote** → mature clusters become skill memories
8. **Decay** → low-confidence memories fade over time

## Uninstall

```bash
npx claude-persistent-memory-uninstall
```

Or manually: remove the memory entry from `.mcp.json`, remove the memory hooks from `.claude/settings.json`, then run `npm uninstall @alex900530/claude-persistent-memory`. The `.claude-memory/` data directory is preserved — delete it manually if no longer needed.
## Configuration

All settings live in `config.default.js` (auto-loaded; override via `.claude-memory.config.js`):

```js
module.exports = {
  embeddingPort: 23811,        // TCP port for embedding server
  llmPort: 23812,              // TCP port for LLM server
  dataDir: './data',           // memory.db lives here

  azure: {
    endpoint: process.env.AZURE_OPENAI_ENDPOINT || '',
    apiKey: process.env.AZURE_OPENAI_KEY || '',
    deployment: process.env.AZURE_OPENAI_DEPLOYMENT || 'gpt-4.1',
  },

  embedding: {
    model: 'Xenova/bge-m3',    // 1024 dimensions, 8192-token context
    dimensions: 1024,
  },

  search: {
    maxResults: 3,             // top-K results per query
    minSimilarity: 0.6,        // vector similarity threshold
  },

  cluster: {
    similarityThreshold: 0.70, // min similarity to join a cluster
    maturityCount: 5,          // memories needed for a mature cluster
  },
};
```

## Project Structure
```
claude-persistent-memory/
├── bin/                          # CLI scripts
│   ├── setup.js                  # postinstall + interactive setup
│   └── uninstall.js              # cleanup script
├── hooks/                        # Claude Code lifecycle hooks
│   ├── user-prompt-hook.js       # UserPromptSubmit → memory injection
│   ├── pre-tool-memory-hook.js   # PreToolUse → memory injection
│   ├── post-tool-memory-hook.js  # PostToolUse → memory injection
│   ├── pre-compact-hook.js       # PreCompact → transcript analysis
│   └── session-end-hook.js       # SessionEnd → clustering
├── lib/                          # Core libraries
│   ├── memory-db.js              # SQLite + FTS5 + sqlite-vec
│   ├── embedding-client.js       # TCP client for embedding server
│   ├── llm-client.js             # TCP client for LLM server
│   ├── compact-analyzer.js       # Transcript → memory extraction
│   └── utils.js                  # Minimal utilities
├── services/                     # Background services
│   ├── embedding-server.js       # TCP embedding service (bge-m3)
│   ├── llm-server.js             # TCP LLM proxy (Azure OpenAI)
│   └── memory-mcp-server.js      # MCP server for Claude Code
├── tools/
│   └── rebuild-vectors.js        # Rebuild all embeddings
├── config.default.js             # Configuration template
├── CLAUDE.md                     # Claude Code project instructions
└── package.json
```

## Requirements

- Node.js >= 18
- macOS or Linux
- ~2 GB RAM for the embedding model (bge-m3)
- Azure OpenAI API access (for LLM structuring)
## Notes

- **LLM provider**: Currently supports Azure OpenAI only. For standard OpenAI or other providers, modify `services/llm-server.js`.
- **Ports**: The embedding and LLM servers default to TCP 23811 / 23812. Change them in `config.js` if needed.
- **Data**: The `.claude-memory/` directory (containing `memory.db` and logs) is created automatically and gitignored.
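As a starting point for such a provider swap, the main difference is how the chat-completions request is addressed. A hypothetical sketch: the function and config names are illustrative, not the actual code in `services/llm-server.js`; only the URL and header shapes follow the two providers' public APIs.

```javascript
// Hypothetical sketch of the Azure vs. standard-OpenAI request difference.
// Azure addresses a deployment under the resource endpoint with an `api-key`
// header; standard OpenAI uses a fixed URL with a Bearer token.
function buildChatRequest(provider, cfg) {
  if (provider === 'azure') {
    return {
      url: `${cfg.endpoint}/openai/deployments/${cfg.deployment}/chat/completions?api-version=2024-02-01`,
      headers: { 'api-key': cfg.apiKey },
    };
  }
  return {
    url: 'https://api.openai.com/v1/chat/completions',
    headers: { Authorization: `Bearer ${cfg.apiKey}` },
  };
}
```

The request body (`messages`, `temperature`, etc.) is the same shape for both, which is why the swap is largely a matter of URL and auth header.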
## Contributing
Contributions are welcome! Please read the Contributing Guide before submitting a PR.
## License
MIT © MIMI180306
