Memory MCP Server

A persistent memory server for LLM agents using the Model Context Protocol. Provides semantic search, optional LLM summarization, and automatic maintenance — all backed by a single SQLite file.

Why Another Memory Server?

Most memory MCP servers fall into two camps: minimal and infrastructure-heavy. The official @modelcontextprotocol/server-memory stores a knowledge graph in a JSON file with substring search only — useful, but limited as memories grow. Mem0, Hindsight, and doobidoo offer semantic search and smart features, but require Python runtimes, Docker containers, or external databases (PostgreSQL, Qdrant, Neo4j).

This server fills a specific gap: a TypeScript MCP server with semantic search, LLM summarization, and automatic maintenance that runs from a single SQLite file with zero external dependencies.

  • Single command, single file — npx to start, one .db file to back up. SQLite + sqlite-vec + FTS5 handle storage, vector search, and keyword search in-process.
  • Works without an LLM, improves with one — Extractive retrieval, cosine-only dedup, and bullet-point formatting work out of the box. Adding a cheap LLM endpoint (Haiku, Ollama, etc.) enables abstractive summarization, direct question answering, contradiction detection, and smarter consolidation.
  • Token-budget-aware responses — Instead of dumping raw memories into your context window, the server summarizes and packs results to fit a specified token budget. The memory_context tool provides a structured session-start briefing — a capability unique to this server.

For a detailed competitive analysis covering 11 memory systems, see docs/designs/memory-server-comparison.md.

Features

  • Score-based hybrid fusion — vector similarity (BGE-base) + FTS5 BM25 keyword search, merged via Weaviate-style relativeScoreFusion with min-max normalized scores blended by alpha weighting. Unlike pure rank-based fusion (used by Letta, Zep, mind-mem, LangChain), score-based fusion retains the discriminating power of both retrieval signals — an approach validated by production search engines (Weaviate, Elasticsearch, Qdrant) and research (Bruch et al. 2023)
  • Cross-encoder reranking — ms-marco-MiniLM re-ranks candidates using raw logits with relative gap filtering, improving precision without aggressive cutoffs
  • Composite scoring — fusion relevance (incorporating vector + BM25 magnitudes) blended with recency, importance, and access pattern signals
  • Token-budget-aware retrieval — returns pre-summarized context blocks sized to fit your context window
  • Direct question answering — format="answer" synthesizes across retrieved memories to answer questions directly, without a separate reader LLM
  • SQLite-native, zero external dependencies — embeddings and reranking run locally in-process; no Postgres, Neo4j, Redis, or Docker required. One .db file to back up
  • Works without an LLM, improves with one — extractive retrieval and bullet-point formatting work out of the box; adding a cheap LLM (Haiku, Ollama) enables abstractive summarization, direct answers, and contradiction detection
  • Automatic maintenance — unused memories decay over time; related memories compact into summaries via three-phase pipeline (consolidation, decay, compaction)
  • Namespace isolation — multiple agents or projects share one database without cross-contamination

Quick Start

Configure in Claude Desktop / MCP client

{
  "mcpServers": {
    "memory": {
      "command": "npx",
      "args": ["-y", "@provos/memory-mcp-server"],
      "env": {
        "MEMORY_DB_PATH": "~/.local/share/memory-mcp/default.db"
      }
    }
  }
}

With an LLM (recommended)

Adding an LLM enables summarization, direct question answering (format="answer"), and contradiction detection. The LLM client uses the OpenAI SDK, so MEMORY_LLM_BASE_URL must point to an OpenAI-compatible endpoint.

{
  "mcpServers": {
    "memory": {
      "command": "npx",
      "args": ["-y", "@provos/memory-mcp-server"],
      "env": {
        "MEMORY_DB_PATH": "~/.local/share/memory-mcp/default.db",
        "MEMORY_LLM_API_KEY": "your-api-key",
        "MEMORY_LLM_BASE_URL": "https://openrouter.ai/api/v1",
        "MEMORY_LLM_MODEL": "anthropic/claude-haiku-4-5-20251001"
      }
    }
  }
}

With Ollama (fully local)

{
  "mcpServers": {
    "memory": {
      "command": "npx",
      "args": ["-y", "@provos/memory-mcp-server"],
      "env": {
        "MEMORY_DB_PATH": "~/.local/share/memory-mcp/default.db",
        "MEMORY_LLM_BASE_URL": "http://localhost:11434/v1",
        "MEMORY_LLM_MODEL": "llama3.2:3b"
      }
    }
  }
}

Tools

memory_store

Store a memory for later retrieval. Memories are automatically embedded for semantic search.

content    (string, required)   — The memory content. Store one fact per call.
tags       (string[], optional) — For filtering, e.g. ["preference", "project:foo"]
importance (number, optional)   — 0-1 scale, controls decay resistance. Default 0.5.

memory_recall

Retrieve memories relevant to a query. Returns formatted results within a token budget.

query        (string, required)   — Natural language query
token_budget (integer, optional)  — Max tokens in response. Default 500.
tags         (string[], optional) — Filter to memories with ALL these tags
format       (string, optional)   — "summary" (default), "list", "raw", or "answer"

Format modes:

  • summary — LLM-generated briefing (falls back to extractive clusters without LLM)
  • list — Bullet list with dates and importance scores
  • raw — Full JSON with all metadata
  • answer — LLM answers the query directly by synthesizing across retrieved memories (falls back to list without LLM)

memory_context

Get a session briefing of relevant memories. Call at the start of each conversation.

task         (string, optional)   — Brief description of the current task
token_budget (integer, optional)  — Max tokens for briefing. Default 800.

memory_forget

Remove memories by ID, tag, query match, or timestamp.

ids     (string[], optional)  — Specific memory IDs to forget
tags    (string[], optional)  — Forget all memories with ALL these tags
query   (string, optional)    — Forget top-10 matches
before  (string, optional)    — ISO 8601 timestamp cutoff
confirm (boolean, optional)   — Required for query-based or bulk deletion
dry_run (boolean, optional)   — Preview what would be deleted. Default false.

memory_inspect

View memory statistics or inspect specific memories.

view  (string, optional)    — "stats" (default), "recent", "important", "tags", "export"
ids   (string[], optional)  — Inspect specific memories by ID
limit (integer, optional)   — Max items returned. Default 20.
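
A minimal end-to-end sketch of calling these tools programmatically, assuming the stock @modelcontextprotocol/sdk client API (tool names and arguments come from the reference above; the database path and values are illustrative):

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import {
  StdioClientTransport,
  getDefaultEnvironment,
} from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the server over stdio, exactly as an MCP client would.
const transport = new StdioClientTransport({
  command: "npx",
  args: ["-y", "@provos/memory-mcp-server"],
  env: { ...getDefaultEnvironment(), MEMORY_DB_PATH: "/tmp/memory-demo.db" },
});

const client = new Client({ name: "memory-demo", version: "0.0.1" });
await client.connect(transport);

// Store one fact per call, tagged for later filtering.
await client.callTool({
  name: "memory_store",
  arguments: {
    content: "User prefers TypeScript strict mode in all projects",
    tags: ["preference", "project:foo"],
    importance: 0.8,
  },
});

// Recall within a token budget; "list" works without an LLM configured.
const recalled = await client.callTool({
  name: "memory_recall",
  arguments: { query: "coding style preferences", token_budget: 300, format: "list" },
});
console.log(recalled.content);

// Preview a bulk deletion before committing it.
await client.callTool({
  name: "memory_forget",
  arguments: { tags: ["project:foo"], dry_run: true },
});

await client.close();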

Architecture

┌──────────────────────────────────────────────────┐
│                  MCP Client                      │
│            (Claude, Cursor, etc.)                │
└──────────────┬───────────────────────────────────┘
               │ stdio (JSON-RPC)
┌──────────────▼───────────────────────────────────┐
│  MCP Server (server.ts)                          │
│  Tool handlers: store, recall, context,          │
│                 forget, inspect                  │
└──────────────┬───────────────────────────────────┘
               │
┌──────────────▼───────────────────────────────────┐
│  Engine (engine-impl.ts)                         │
│  Wires together all subsystems                   │
├──────────────────────────────────────────────────┤
│                                                  │
│  ┌─────────────┐  ┌──────────────────────────┐   │
│  │  Embedder   │  │   Retrieval Pipeline     │   │
│  │ BGE-base    │  │  Vector KNN + FTS5 BM25  │   │
│  │  (local)    │  │  → Score-based fusion    │   │
│  └─────────────┘  │  → Composite scoring     │   │
│  ┌─────────────┐  │  → Cross-encoder rerank  │   │
│  │  Reranker   │  │  → Dedup by embedding    │   │
│  │ ms-marco    │  │  → Token budget packing  │   │
│  │  (local)    │  │  → Format (answer/       │   │
│  └─────────────┘  │     summary/list/raw)    │   │
│  ┌─────────────┐  └──────────────────────────┘   │
│  │  LLM Client │                                 │
│  │  (optional) │  ┌──────────────────────────┐   │
│  │  Haiku etc. │  │   Maintenance            │   │
│  └─────────────┘  │  Phase 0: Consolidation  │   │
│                   │    (batch LLM dedup)     │   │
│                   │  Phase 1: Vitality decay │   │
│                   │  Phase 2: Compaction     │   │
│                   └──────────────────────────┘   │
├──────────────────────────────────────────────────┤
│  SQLite (WAL mode)                               │
│  ┌────────────┬──────────────┬────────────────┐  │
│  │  memories  │ vec_memories │  memories_fts  │  │
│  │   (rows)   │  (768-dim)   │  (BM25 index)  │  │
│  └────────────┴──────────────┴────────────────┘  │
└──────────────────────────────────────────────────┘

Store path

  1. Embed content locally (~5ms)
  2. Check for near-exact duplicates (cosine distance < 0.05) — auto-merge, no LLM (see the sketch after this list)
  3. Insert with consolidated = false
  4. Every N stores (default 50), run maintenance, which includes batch consolidation
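
The duplicate gate in step 2 is plain cosine distance over the stored embeddings. A minimal sketch with hypothetical names (the 0.05 threshold comes from the list above):

// Cosine distance between two embedding vectors (1 - cosine similarity).
function cosineDistance(a: Float32Array, b: Float32Array): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Near-exact duplicates (distance < 0.05) are merged instead of inserted;
// everything else lands as a new row with consolidated = false.
const DUPLICATE_THRESHOLD = 0.05;

function isNearDuplicate(candidate: Float32Array, existing: Float32Array[]): boolean {
  return existing.some((e) => cosineDistance(candidate, e) < DUPLICATE_THRESHOLD);
}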

Retrieval pipeline

  1. Candidate generation — parallel vector KNN (50 candidates) + FTS5 BM25 search (50 candidates, Porter stemming, bigram phrases)
  2. Score-based fusion — Weaviate-style relativeScoreFusion: min-max normalize vector similarity and BM25 scores independently to [0,1], then blend with alpha weighting (default 0.5). Candidates from only one source get only that source's weighted contribution (sketched after this list)
  3. Tag filter — optional intersection filter
  4. Composite scoring — fusion relevance (0.65) + recency (0.15) + importance (0.1) + access patterns (0.1). The fusion score already encodes vector + BM25 magnitudes
  5. Relevance gating — drop candidates with fusion score < 5% of max
  6. Cross-encoder reranking — ms-marco-MiniLM-L-6-v2 re-scores candidates; relative gap filter (5 logit points from best)
  7. Deduplication — remove near-duplicates by embedding cosine similarity
  8. Token budget packing — greedily select memories by score until budget is filled (skip, not break)
  9. Formatting — answer (LLM synthesis), summary (LLM briefing), list (bullets), or raw (JSON)
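
Steps 2 and 4 are small enough to sketch in TypeScript. Names and shapes here are illustrative, not the server's internals; the alpha default and the 0.65/0.15/0.1/0.1 weights come from the list above:

interface Scored {
  id: string;
  score: number;
}

// Min-max normalize a result list's scores to [0, 1]; a degenerate list
// (all scores equal) maps everything to 1.
function normalize(results: Scored[]): Map<string, number> {
  const scores = results.map((r) => r.score);
  const min = Math.min(...scores);
  const range = Math.max(...scores) - min;
  return new Map(
    results.map((r): [string, number] => [r.id, range > 0 ? (r.score - min) / range : 1]),
  );
}

// Step 2: relativeScoreFusion blends normalized vector and BM25 scores by
// alpha. A candidate found by only one source keeps just that source's
// weighted contribution.
function relativeScoreFusion(vector: Scored[], bm25: Scored[], alpha = 0.5): Scored[] {
  const v = normalize(vector);
  const k = normalize(bm25);
  const ids = new Set([...v.keys(), ...k.keys()]);
  return [...ids]
    .map((id) => ({ id, score: alpha * (v.get(id) ?? 0) + (1 - alpha) * (k.get(id) ?? 0) }))
    .sort((a, b) => b.score - a.score);
}

// Step 4: composite scoring blends fusion relevance with recency,
// importance, and access-pattern signals (all assumed to be in [0, 1]).
function compositeScore(fusion: number, recency: number, importance: number, access: number): number {
  return 0.65 * fusion + 0.15 * recency + 0.1 * importance + 0.1 * access;
}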

Maintenance (amortized, no background processes)

  • Consolidation — finds unconsolidated memories, groups close candidates, makes one batched LLM call to classify duplicates/contradictions/distinct. Also runs on server startup to handle leftovers from previous sessions.
  • Decay — samples memories and checks vitality (half-life proportional to importance, boosted by access frequency); memories whose vitality falls below the threshold are marked as decayed.
  • Compaction — clusters decayed memories and summarizes them via LLM, preserving originals as soft-deleted references.
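
The vitality formula isn't spelled out here, so the following is a hypothetical model of the stated behavior (exponential decay with a half-life that scales with importance and access frequency), not the server's actual math; the constants are assumptions, apart from the default decay threshold from the configuration table below:

// Hypothetical vitality curve, NOT the server's actual formula. Half-life
// grows with importance and with how often the memory has been accessed.
function vitality(ageDays: number, importance: number, accessCount: number): number {
  const baseHalfLifeDays = 30; // assumed constant
  const halfLife = baseHalfLifeDays * (0.5 + importance) * (1 + Math.log1p(accessCount));
  return Math.pow(2, -ageDays / halfLife);
}

// Under this model, a low-importance, never-accessed memory crosses the
// default MEMORY_DECAY_THRESHOLD (0.05) after roughly 4-5 half-lives.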

Graceful LLM degradation

Every LLM-enhanced feature has an extractive fallback:

| Feature                 | With LLM                             | Without LLM                                |
|-------------------------|--------------------------------------|--------------------------------------------|
| Recall formatting       | Abstractive summary or direct answer | Extractive bullet list                     |
| Dedup on store          | Exact-match heuristic only           | Same (LLM dedup deferred to consolidation) |
| Consolidation           | Batch LLM judgment                   | Mark all as consolidated                   |
| Compaction              | LLM-generated summaries              | Skip compaction                            |
| Contradiction detection | LLM classification                   | Only exact-match merging                   |

Configuration

All settings are controlled via environment variables:

| Variable                        | Default                              | Description                                                    |
|---------------------------------|--------------------------------------|----------------------------------------------------------------|
| MEMORY_DB_PATH                  | ~/.local/share/memory-mcp/default.db | SQLite database path                                           |
| MEMORY_NAMESPACE                | default                              | Namespace for memory isolation                                 |
| MEMORY_EMBEDDING_MODEL          | Xenova/bge-base-en-v1.5              | HuggingFace embedding model                                    |
| MEMORY_EMBEDDING_DTYPE          | q8                                   | Model quantization (q8, fp16, fp32)                            |
| MEMORY_RERANKER_ENABLED         | true                                 | Enable cross-encoder reranking                                 |
| MEMORY_RERANKER_MODEL           | Xenova/ms-marco-MiniLM-L-6-v2        | HuggingFace reranker model                                     |
| MEMORY_LLM_API_KEY              | (none)                               | API key for LLM (enables enhanced features)                    |
| MEMORY_LLM_BASE_URL             | (none)                               | OpenAI-compatible endpoint (must support /v1/chat/completions) |
| MEMORY_LLM_MODEL                | claude-haiku-4-5-20251001            | LLM model name                                                 |
| MEMORY_DEFAULT_TOKEN_BUDGET     | 500                                  | Default token budget for recall                                |
| MEMORY_MAINTENANCE_INTERVAL     | 50                                   | Stores between maintenance passes                              |
| MEMORY_DECAY_THRESHOLD          | 0.05                                 | Vitality below which memories decay                            |
| MEMORY_COMPACTION_MIN_GROUP     | 10                                   | Min decayed memories before compaction                         |
| MEMORY_CONSOLIDATION_BATCH_SIZE | 50                                   | Max memories per consolidation pass                            |

Agent Integration

The package exports system prompts and tool descriptions for integrating the memory server with LLM agents:

import { MEMORY_SYSTEM_PROMPT } from '@provos/memory-mcp-server/prompts';

// Append to your agent's system prompt
const systemPrompt = basePrompt + '\n\n' + MEMORY_SYSTEM_PROMPT;

For custom setups, use the configurable builder:

import { buildMemorySystemPrompt } from '@provos/memory-mcp-server/prompts';

const prompt = buildMemorySystemPrompt({
  persona: 'Research Assistant',
  additionalInstructions: 'Always tag paper references with "paper:<title>".',
});

The TOOL_DESCRIPTIONS export provides enhanced descriptions used in tool registration:

import { TOOL_DESCRIPTIONS } from '@provos/memory-mcp-server/prompts';
// TOOL_DESCRIPTIONS.memory_store, .memory_recall, .memory_context, etc.
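
For example, when declaring the tools to an LLM provider directly (outside MCP), the descriptions can be reused as-is. A sketch, assuming TOOL_DESCRIPTIONS maps tool names to plain description strings:

import { TOOL_DESCRIPTIONS } from '@provos/memory-mcp-server/prompts';

// Build provider-agnostic tool declarations from the exported descriptions.
const toolDeclarations = Object.entries(TOOL_DESCRIPTIONS).map(
  ([name, description]) => ({ name, description }),
);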

Development

npm install
npm run build          # TypeScript compilation
npm test               # Run unit tests
npm run test:e2e       # Run end-to-end tests (spawns real server)
npm run lint           # ESLint
npm run format         # Prettier
npm run benchmark      # Run retrieval quality benchmark

License

Apache-2.0