mindkeg-mcp v0.7.2 (published 2 months ago)

A persistent memory MCP server for AI coding agents — stores, searches, and retrieves atomic learnings per repository.

Publisher: carloluisito

Keywords: mcp, ai, memory, agent, claude, cursor, learning, knowledge

Mind Keg MCP

A persistent memory MCP server for AI coding agents. Stores atomic learnings — debugging insights, architectural decisions, codebase conventions — so every agent session starts with relevant institutional knowledge.

Problem

AI coding agents (Claude Code, Cursor, Windsurf) lose context between sessions. Hard-won insights are forgotten the moment a conversation ends. Developers repeatedly re-explain the same things; agents repeatedly make the same mistakes.

Mind Keg solves this with a centralized, persistent brain that any MCP-compatible agent can query and contribute to.

How It Works

Mind Keg implements a RAG (Retrieval-Augmented Generation) pattern for AI coding agents:

Retrieval — Agent searches the brain for relevant learnings using semantic or keyword search
Augmentation — Retrieved learnings are injected into the agent's conversation context
Generation — The agent responds with awareness of past discoveries and decisions

Unlike traditional RAG systems that chunk large documents, Mind Keg stores pre-curated atomic learnings (max 500 chars each). No chunking strategy needed — each learning IS the retrieval unit. The agent controls both retrieval and storage, creating a feedback loop where knowledge improves over time.
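The retrieval and augmentation steps can be pictured with a small TypeScript sketch. It is not Mind Keg's implementation: the `RetrievedLearning` shape and the `isAtomic` and `buildContextBlock` names are hypothetical, and the only rule taken from this README is the 500-character limit per learning.

```typescript
// Hypothetical shape of a retrieved learning; only the 500-char limit comes from the README.
interface RetrievedLearning {
  content: string;
  category: string;
  score: number; // relevance score from retrieval, higher is more relevant
}

const MAX_LEARNING_CHARS = 500;

// Each learning is its own retrieval unit, so "chunking" reduces to a length check.
function isAtomic(content: string): boolean {
  const trimmed = content.trim();
  return trimmed.length > 0 && trimmed.length <= MAX_LEARNING_CHARS;
}

// Augmentation step: turn the top-k retrieved learnings into a text block
// that an agent could place in its conversation context.
function buildContextBlock(learnings: RetrievedLearning[], topK = 5): string {
  const lines = learnings
    .filter((l) => isAtomic(l.content))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((l) => `- [${l.category}] ${l.content.trim()}`);
  return lines.length === 0 ? "" : `Relevant learnings:\n${lines.join("\n")}`;
}
```

Because learnings are already curated and short, the augmentation step is just ranking and formatting; there is no chunk-overlap or window-size tuning to do.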

Features

Store and retrieve atomic learnings (max 500 chars, one insight per entry)
Semantic search with three provider options:
- FastEmbed (free, local, ONNX-based — BAAI/bge-small-en-v1.5, 384 dims)
- OpenAI (paid, best quality — text-embedding-3-small, 1536 dims)
- None (FTS5 keyword fallback — zero external dependencies)
Six categories: architecture, conventions, debugging, gotchas, dependencies, decisions
Free-form tags and group linking
Three scoping levels: repository-specific, workspace-wide, and global learnings
Dual transport: stdio (local) + HTTP+SSE (remote)
Auth-free stdio for local use; API key authentication with per-repository access control for HTTP
SQLite storage (zero dependencies, zero config)
Import/export for backup and migration
Smarter knowledge management: auto-categorization (KNN voting), conflict detection, smart staleness scoring, access tracking with relevance decay, near-duplicate merging, typed learning relationships
Enterprise security: encryption at rest, audit logging, TTL/data retention, Prometheus monitoring, rate limiting, content integrity verification
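The feature list names KNN voting as the auto-categorization method but does not give its parameters. A minimal sketch, assuming cosine similarity over stored embeddings, k = 5, and similarity-weighted votes (all three are assumptions): a new learning gets the category that its nearest already-categorized neighbours vote for.

```typescript
type Category = "architecture" | "conventions" | "debugging" | "gotchas" | "dependencies" | "decisions";

interface Labeled {
  embedding: number[];
  category: Category;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Assumed: each of the k nearest neighbours votes with weight equal to its similarity.
function suggestCategory(query: number[], corpus: Labeled[], k = 5): Category | null {
  const neighbours = corpus
    .map((item) => ({ item, sim: cosine(query, item.embedding) }))
    .sort((x, y) => y.sim - x.sim)
    .slice(0, k);
  const votes = new Map<Category, number>();
  for (const { item, sim } of neighbours) {
    if (sim <= 0) continue; // unrelated or opposite neighbours do not vote
    votes.set(item.category, (votes.get(item.category) ?? 0) + sim);
  }
  let best: Category | null = null;
  let bestScore = 0;
  for (const [category, score] of votes) {
    if (score > bestScore) {
      best = category;
      bestScore = score;
    }
  }
  return best; // null when nothing similar exists yet
}
```

Weighting by similarity means one very close neighbour can outvote several distant ones, which suits a small personal corpus where k neighbours may span unrelated topics.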

Quick Start

npx mindkeg-mcp init

That's it. This configures Mind Keg globally for your AI agent (Claude Code, Cursor, or Windsurf). Open any project and your agent has persistent memory, with no API keys and no per-project setup.

For Claude Code, init also installs a SessionStart hook, so your agent loads prior knowledge automatically at the start of every session.

Options:

npx mindkeg-mcp init --agent cursor    # Target a specific agent
npx mindkeg-mcp init --project         # Per-project setup instead of global

init is idempotent, so it is safe to run multiple times. It merges into existing configs and never overwrites them.
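The "merge, never overwrite" rule can be sketched for an `mcpServers` object like the ones in the manual setup below. This is an illustration, not the code `init` runs; `mergeMcpServer` and the config types are hypothetical.

```typescript
interface McpServerEntry {
  command?: string;
  args?: string[];
  [key: string]: unknown;
}

interface McpConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Adds the named server only when it is missing. Every other key is preserved,
// and an existing entry with the same name is left untouched, so a second run is a no-op.
function mergeMcpServer(existing: McpConfig, name: string, entry: McpServerEntry): McpConfig {
  const servers = existing.mcpServers ?? {};
  if (name in servers) return existing;
  return { ...existing, mcpServers: { ...servers, [name]: entry } };
}

const mindkegEntry: McpServerEntry = { command: "mindkeg", args: ["serve", "--stdio"] };
```

Returning the input unchanged when the entry already exists is what makes repeated runs safe: a user's hand-edited `mindkeg` entry survives every later `init`.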

Manual setup

If you prefer to configure manually, or need HTTP mode:

Install

npm install -g mindkeg-mcp

Create an API key (only needed for HTTP mode)

mindkeg api-key create --name "My Laptop"
# Displays the key ONCE — save it securely
# mk_abc123...

API keys are only required for HTTP transport. stdio transport (used by Claude Code, Cursor, Windsurf local setups) is auth-free.

Connect your AI agent

Mind Keg works with any MCP-compatible AI coding agent. Choose your setup:

Claude Code — Add to ~/.claude.json or your project's .claude/mcp.json:

{
  "mcpServers": {
    "mindkeg": {
      "command": "mindkeg",
      "args": ["serve", "--stdio"]
    }
  }
}

Cursor — Add to .cursor/mcp.json or global settings:

{
  "mcpServers": {
    "mindkeg": {
      "command": "mindkeg",
      "args": ["serve", "--stdio"]
    }
  }
}

Windsurf — Add to ~/.codeium/windsurf/mcp_config.json:

{
  "mcpServers": {
    "mindkeg": {
      "command": "mindkeg",
      "args": ["serve", "--stdio"]
    }
  }
}

HTTP mode (any MCP client):

MINDKEG_API_KEY=mk_your_key mindkeg serve --http
# Listening on http://127.0.0.1:52100/mcp

{
  "mcpServers": {
    "mindkeg": {
      "type": "http",
      "url": "http://127.0.0.1:52100/mcp",
      "headers": {
        "Authorization": "Bearer mk_your_key_here"
      }
    }
  }
}
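For a client that is not configured through an `mcpServers` file, the same connection can be made by hand. A minimal sketch, assuming the endpoint accepts MCP's Streamable HTTP JSON-RPC framing (a POST whose `Accept` header lists both `application/json` and `text/event-stream`); the protocol version string, client name, and `buildInitializeRequest` are assumptions, and the request is only built here, not sent.

```typescript
// Builds (but does not send) a JSON-RPC "initialize" request for an MCP HTTP endpoint.
// Assumptions: Streamable HTTP framing and protocol version "2025-03-26".
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: Record<string, unknown>;
}

function buildInitializeRequest(apiKey: string): { headers: Record<string, string>; body: string } {
  const payload: JsonRpcRequest = {
    jsonrpc: "2.0",
    id: 1,
    method: "initialize",
    params: {
      protocolVersion: "2025-03-26",
      capabilities: {},
      clientInfo: { name: "example-client", version: "0.0.1" }, // hypothetical client identity
    },
  };
  return {
    headers: {
      "Content-Type": "application/json",
      Accept: "application/json, text/event-stream",
      Authorization: `Bearer ${apiKey}`, // same header as in the JSON config above
    },
    body: JSON.stringify(payload),
  };
}

// Usage (Node 18+ global fetch, with `mindkeg serve --http` running):
// const { headers, body } = buildInitializeRequest(process.env.MINDKEG_API_KEY ?? "");
// const res = await fetch("http://127.0.0.1:52100/mcp", { method: "POST", headers, body });
```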

Other MCP-compatible agents: Mind Keg works with any agent that supports the Model Context Protocol, including Codex CLI, Gemini CLI, GitHub Copilot, and more. Adapt the stdio config above to your agent's MCP settings format.

Add Mind Keg instructions to your repository

Copy templates/AGENTS.md to the root of any repository where you want agents to use Mind Keg.

AGENTS.md is the industry standard supported by 20+ AI tools (Cursor, Windsurf, Codex, Gemini CLI, GitHub Copilot, etc.).

Claude Code only: Claude Code doesn't auto-load AGENTS.md natively. Add @AGENTS.md to your CLAUDE.md to bridge it.

MCP Tools

8 consolidated tools (primary API):

| Tool | Description |
|---|---|
| get_context | Retrieve relevant knowledge: session primer, task-scoped context, or semantic search (extends the original get_context; replaces get_relevant_context and search_learnings) |
| store | Save knowledge: learning, decision, finding, or gotcha (replaces store_learning, store_decision, store_finding, store_gotcha) |
| update | Modify or manage knowledge: update, deprecate, flag_stale, delete, or merge (replaces update_learning, deprecate_learning, flag_stale, delete_learning, merge_learnings) |
| resolve | Close out a decision or finding (replaces supersede_decision, resolve_finding) |
| complete_run | Record a completed work session |
| query | List knowledge by type: decisions, findings, gotchas, or runs (replaces get_decisions, get_open_findings, get_gotchas, get_run_history) |
| list_scopes | List repositories and workspaces with counts (replaces list_repositories, list_workspaces) |
| relate_learnings | Create typed relationships between learnings |

Backwards-compatible aliases: All 19 old tool names (store_learning, search_learnings, update_learning, deprecate_learning, flag_stale, delete_learning, merge_learnings, store_decision, get_decisions, supersede_decision, store_finding, resolve_finding, get_open_findings, store_gotcha, get_gotchas, get_run_history, get_relevant_context, list_repositories, list_workspaces) are registered as aliases that delegate to the same service methods. They will be removed in the next major version.
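One way to picture alias delegation is a lookup table from each old name to a consolidated tool plus a fixed argument. The old-name to new-tool mapping follows the tool table and alias list above; the argument names (`mode`, `type`, `action`, `kind`) and their values are hypothetical, since the README does not document the consolidated tools' parameters.

```typescript
type ConsolidatedTool = "get_context" | "store" | "update" | "resolve" | "query" | "list_scopes";

// Hypothetical argument names and values; only the tool mapping comes from the README.
const ALIASES: Record<string, { tool: ConsolidatedTool; fixed: Record<string, string> }> = {
  search_learnings: { tool: "get_context", fixed: { mode: "search" } },
  get_relevant_context: { tool: "get_context", fixed: { mode: "task" } },
  store_learning: { tool: "store", fixed: { type: "learning" } },
  store_decision: { tool: "store", fixed: { type: "decision" } },
  store_finding: { tool: "store", fixed: { type: "finding" } },
  store_gotcha: { tool: "store", fixed: { type: "gotcha" } },
  update_learning: { tool: "update", fixed: { action: "update" } },
  deprecate_learning: { tool: "update", fixed: { action: "deprecate" } },
  flag_stale: { tool: "update", fixed: { action: "flag_stale" } },
  delete_learning: { tool: "update", fixed: { action: "delete" } },
  merge_learnings: { tool: "update", fixed: { action: "merge" } },
  supersede_decision: { tool: "resolve", fixed: { kind: "decision" } },
  resolve_finding: { tool: "resolve", fixed: { kind: "finding" } },
  get_decisions: { tool: "query", fixed: { type: "decisions" } },
  get_open_findings: { tool: "query", fixed: { type: "findings" } },
  get_gotchas: { tool: "query", fixed: { type: "gotchas" } },
  get_run_history: { tool: "query", fixed: { type: "runs" } },
  list_repositories: { tool: "list_scopes", fixed: { kind: "repositories" } },
  list_workspaces: { tool: "list_scopes", fixed: { kind: "workspaces" } },
};

// Rewrites an old-style call into a consolidated one. The fixed argument is spread last,
// so a caller cannot change what an alias means by passing a conflicting value.
function resolveAlias(name: string, args: Record<string, unknown>): { tool: string; args: Record<string, unknown> } {
  const alias = ALIASES[name];
  if (!alias) return { tool: name, args };
  return { tool: alias.tool, args: { ...args, ...alias.fixed } };
}
```

Keeping the aliases as data rather than 19 separate handlers makes removing them in a major version a one-table deletion.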

CLI Commands

# Global setup (one-time) — writes MCP config, SessionStart hook, runs migrations
mindkeg init
mindkeg init --agent cursor    # Target a specific agent (default: claude-code)
mindkeg init --project         # Per-project setup instead of global (optional)

# Database statistics
mindkeg stats
mindkeg stats --json

# Start in stdio mode (for local agent connections)
mindkeg serve --stdio

# Start in HTTP mode (for remote connections)
mindkeg serve --http

# API key management
mindkeg api-key create --name "My Key"
mindkeg api-key create --name "Team Key" --repositories /repo/a /repo/b
mindkeg api-key list
mindkeg api-key revoke <prefix>

# Database
mindkeg migrate

# Near-duplicate detection (backfill existing learnings)
mindkeg dedup-scan
mindkeg dedup-scan --dry-run

# Backup and restore
mindkeg export --output backup.json
mindkeg import backup.json --regenerate-embeddings

# Data retention
mindkeg purge --older-than 90          # Purge learnings older than 90 days
mindkeg purge --repository /path/repo  # Purge all learnings for a repo
mindkeg purge --all --confirm          # Purge everything (requires --confirm)

# Encryption at rest
mindkeg encrypt-db   # Encrypt existing database (requires MINDKEG_ENCRYPTION_KEY)
mindkeg decrypt-db   # Decrypt existing database (requires MINDKEG_ENCRYPTION_KEY)

# Integrity backfill
mindkeg backfill-integrity  # Compute SHA-256 hashes for legacy learnings
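`mindkeg dedup-scan` backfills near-duplicate detection over existing learnings. The README does not state the similarity measure or threshold; the sketch below assumes cosine similarity over stored embeddings with a hypothetical 0.92 cutoff, and only reports candidate pairs rather than merging them.

```typescript
interface StoredLearning {
  id: string;
  embedding: number[];
}

interface DuplicatePair {
  a: string;
  b: string;
  similarity: number;
}

function cosineSimilarity(x: number[], y: number[]): number {
  let dot = 0, nx = 0, ny = 0;
  for (let i = 0; i < x.length; i++) {
    dot += x[i] * y[i];
    nx += x[i] * x[i];
    ny += y[i] * y[i];
  }
  return nx === 0 || ny === 0 ? 0 : dot / Math.sqrt(nx * ny);
}

// O(n^2) pairwise scan, acceptable for a one-off backfill over a personal database.
// The 0.92 threshold is an assumption, not Mind Keg's setting.
function findNearDuplicates(items: StoredLearning[], threshold = 0.92): DuplicatePair[] {
  const pairs: DuplicatePair[] = [];
  for (let i = 0; i < items.length; i++) {
    for (let j = i + 1; j < items.length; j++) {
      const similarity = cosineSimilarity(items[i].embedding, items[j].embedding);
      if (similarity >= threshold) pairs.push({ a: items[i].id, b: items[j].id, similarity });
    }
  }
  return pairs.sort((p, q) => q.similarity - p.similarity); // most similar first
}
```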

Configuration

| Environment Variable | Default | Description |
|---|---|---|
| MINDKEG_SQLITE_PATH | ~/.mindkeg/brain.db | SQLite database file |
| MINDKEG_EMBEDDING_PROVIDER | fastembed | fastembed, openai, or none |
| OPENAI_API_KEY | (none) | OpenAI API key (when provider=openai) |
| MINDKEG_HOST | 127.0.0.1 | HTTP server bind address |
| MINDKEG_PORT | 52100 | HTTP server port |
| MINDKEG_LOG_LEVEL | info | debug, info, warn, error |
| MINDKEG_API_KEY | (none) | API key for HTTP transport (stdio is auth-free) |
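The Architecture section describes `src/config.ts` as loading env vars over defaults. A sketch of that precedence using the defaults from the table; the `MindKegConfig` shape, the function names, and the validation behavior (throwing on an unknown provider, log level, or invalid port) are assumptions.

```typescript
type EmbeddingProvider = "fastembed" | "openai" | "none";
type LogLevel = "debug" | "info" | "warn" | "error";

interface MindKegConfig {
  sqlitePath: string;
  embeddingProvider: EmbeddingProvider;
  openaiApiKey?: string;
  host: string;
  port: number;
  logLevel: LogLevel;
  apiKey?: string;
}

// Returns the env value if it is one of the allowed strings, the fallback if unset, else throws.
function pickOne<T extends string>(value: string | undefined, allowed: readonly T[], fallback: T, name: string): T {
  if (value === undefined || value === "") return fallback;
  if ((allowed as readonly string[]).includes(value)) return value as T;
  throw new Error(`${name} must be one of ${allowed.join(", ")}, got "${value}"`);
}

// Env vars win; anything unset falls back to the documented default.
function loadConfig(env: Record<string, string | undefined>, homeDir: string): MindKegConfig {
  const port = Number(env.MINDKEG_PORT ?? "52100");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`MINDKEG_PORT must be a port number, got "${env.MINDKEG_PORT}"`);
  }
  const embeddingProvider = pickOne(env.MINDKEG_EMBEDDING_PROVIDER, ["fastembed", "openai", "none"] as const, "fastembed", "MINDKEG_EMBEDDING_PROVIDER");
  if (embeddingProvider === "openai" && !env.OPENAI_API_KEY) {
    throw new Error("OPENAI_API_KEY is required when MINDKEG_EMBEDDING_PROVIDER=openai");
  }
  return {
    sqlitePath: env.MINDKEG_SQLITE_PATH ?? `${homeDir}/.mindkeg/brain.db`,
    embeddingProvider,
    openaiApiKey: env.OPENAI_API_KEY,
    host: env.MINDKEG_HOST ?? "127.0.0.1",
    port,
    logLevel: pickOne(env.MINDKEG_LOG_LEVEL, ["debug", "info", "warn", "error"] as const, "info", "MINDKEG_LOG_LEVEL"),
    apiKey: env.MINDKEG_API_KEY,
  };
}

// Usage: import { homedir } from "node:os"; const config = loadConfig(process.env, homedir());
```

Taking `env` and `homeDir` as parameters instead of reading `process.env` directly keeps the function pure, so defaults and error cases can be checked without touching the real environment.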

Embedding providers

FastEmbed (default, free, local)

Semantic search works out of the box using FastEmbed — no API key needed, no network calls. Uses BAAI/bge-small-en-v1.5 (384 dimensions) via local ONNX Runtime. Model files are downloaded once on first use (~50MB).

OpenAI (paid, best quality)

export MINDKEG_EMBEDDING_PROVIDER=openai
export OPENAI_API_KEY=sk-...

Uses text-embedding-3-small (1536 dimensions). Best semantic search quality but requires an API key and incurs per-request costs.

None (keyword search only)

export MINDKEG_EMBEDDING_PROVIDER=none

Disables semantic search and falls back to SQLite FTS5 full-text search — all other features work identically.

Enterprise Security

Mind Keg ships a suite of security features suitable for corporate and regulated environments.

Encryption at Rest

Encrypt content and embedding fields using AES-256-GCM. All other fields (category, tags, timestamps) remain plaintext.

# Generate a 256-bit key
node -e "console.log(require('crypto').randomBytes(32).toString('base64'))"

export MINDKEG_ENCRYPTION_KEY=<your-base64-key>
mindkeg serve --stdio
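A minimal sketch of per-field AES-256-GCM encryption with Node's built-in `crypto` module. It assumes a hypothetical storage format of base64(IV + auth tag + ciphertext); Mind Keg's actual layout in `src/crypto/` is not documented here. It also checks that the key decodes to exactly 32 bytes, which is what the key-generation command above produces.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical field format: base64(iv || authTag || ciphertext).
const IV_BYTES = 12; // 96-bit nonce, the recommended size for GCM
const TAG_BYTES = 16;

function loadKey(base64Key: string): Buffer {
  const key = Buffer.from(base64Key, "base64");
  if (key.length !== 32) throw new Error("MINDKEG_ENCRYPTION_KEY must decode to 32 bytes");
  return key;
}

function encryptField(plaintext: string, key: Buffer): string {
  const iv = randomBytes(IV_BYTES); // fresh nonce per field; never reuse one with the same key
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

function decryptField(encoded: string, key: Buffer): string {
  const raw = Buffer.from(encoded, "base64");
  const iv = raw.subarray(0, IV_BYTES);
  const tag = raw.subarray(IV_BYTES, IV_BYTES + TAG_BYTES);
  const ciphertext = raw.subarray(IV_BYTES + TAG_BYTES);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // final() throws if the ciphertext or tag was modified
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Because each call draws a random IV, encrypting the same text twice yields different ciphertext. That is also one reason an FTS5 index over encrypted content cannot match keywords, as the note below explains.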

To encrypt an existing database in-place:

MINDKEG_ENCRYPTION_KEY=<key> mindkeg encrypt-db
# Creates a backup automatically before operating

Note: FTS5 keyword search does not work when encryption is enabled. Use FastEmbed or OpenAI embedding providers for search.

Audit Logging

All MCP tool invocations are written to a structured JSON lines audit log (SIEM-compatible).

export MINDKEG_AUDIT_LOG=~/.mindkeg/audit.jsonl  # default
# Or: MINDKEG_AUDIT_LOG=stderr  (write to stderr alongside app logs)
# Or: MINDKEG_AUDIT_LOG=none    (disable)

Each audit entry contains: timestamp (ISO 8601), action, actor (API key prefix), resource_id, result, client transport metadata. Sensitive fields (content, embedding) are never logged.
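A sketch of producing one audit line with the fields listed above. The key name for transport metadata (`client.transport`), the result values (`"success"`/`"error"`), and the `ToolCall` shape are assumptions. The point is that the entry is built from an allow-list, so `content` and `embedding` cannot leak even when they are present in the tool arguments.

```typescript
interface ToolCall {
  action: string; // MCP tool name, e.g. "store"
  apiKeyPrefix: string | null; // null for auth-free stdio (assumed)
  resourceId: string | null;
  ok: boolean;
  transport: "stdio" | "http";
  args: Record<string, unknown>; // raw tool arguments; deliberately never copied into the entry
}

function auditLine(call: ToolCall, now: Date = new Date()): string {
  // Allow-list: only these keys ever reach the log.
  const entry = {
    timestamp: now.toISOString(), // ISO 8601
    action: call.action,
    actor: call.apiKeyPrefix,
    resource_id: call.resourceId,
    result: call.ok ? "success" : "error",
    client: { transport: call.transport },
  };
  return JSON.stringify(entry) + "\n"; // one JSON object per line (JSON Lines)
}
```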

TTL and Data Retention

Set a global default TTL or a per-learning TTL to automatically expire old entries.

export MINDKEG_DEFAULT_TTL_DAYS=365    # Expire all learnings after 1 year by default
export MINDKEG_PURGE_INTERVAL_HOURS=24 # Run purge every 24 hours (default)

Per-learning TTL overrides the global default:

{ "content": "...", "ttl_days": 30 }

Manual purge:

mindkeg purge --older-than 180 --confirm
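The Data Model table below notes that TTL expiry anchors to `updated_at` and that a per-learning `ttl_days` overrides `MINDKEG_DEFAULT_TTL_DAYS`. A sketch of that rule; treating "no TTL at all" as "never expires", using fixed 24-hour days, and the function names are assumptions.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

interface TtlFields {
  ttl_days: number | null;
  updated_at: string; // ISO 8601
}

// Per-learning TTL first, then the global default; null means "keep forever" (assumed).
function effectiveTtlDays(learning: TtlFields, defaultTtlDays: number | null): number | null {
  return learning.ttl_days ?? defaultTtlDays;
}

// Expiry is measured from the last modification, not from creation.
function isExpired(learning: TtlFields, defaultTtlDays: number | null, now: Date = new Date()): boolean {
  const ttl = effectiveTtlDays(learning, defaultTtlDays);
  if (ttl === null) return false;
  const expiresAt = Date.parse(learning.updated_at) + ttl * MS_PER_DAY;
  return now.getTime() >= expiresAt;
}
```

Anchoring to `updated_at` means that editing a learning effectively renews it, so actively maintained knowledge is not purged on its original creation schedule.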

Monitoring

HTTP transport exposes a JSON health check and a Prometheus-compatible metrics endpoint:

GET /health   → JSON: { status, version, uptime, database }
GET /metrics  → Prometheus text format

Both endpoints are unauthenticated by default. Set MINDKEG_METRICS_AUTH=true to require API key auth.

Metrics exposed: mindkeg_learnings_total, mindkeg_tool_invocations_total, mindkeg_tool_duration_seconds, mindkeg_errors_total, mindkeg_uptime_seconds, mindkeg_search_latency_seconds.
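For reference, `/metrics` uses the Prometheus text exposition format, which the sketch below renders for a counter. Only the metric name `mindkeg_tool_invocations_total` comes from the list above; the `tool` label, help text, and sample values are assumptions.

```typescript
// Escapes a label value per the Prometheus text format: backslash, newline, double quote.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/\n/g, "\\n").replace(/"/g, '\\"');
}

// Minimal renderer for a counter with a single label.
function renderCounter(name: string, help: string, series: Map<string, number>, label: string): string {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} counter`];
  for (const [labelValue, value] of series) {
    lines.push(`${name}{${label}="${escapeLabelValue(labelValue)}"} ${value}`);
  }
  return lines.join("\n") + "\n";
}

const invocations = new Map<string, number>([
  ["get_context", 42],
  ["store", 7],
]);
// Prints:
// # HELP mindkeg_tool_invocations_total Total MCP tool invocations.
// # TYPE mindkeg_tool_invocations_total counter
// mindkeg_tool_invocations_total{tool="get_context"} 42
// mindkeg_tool_invocations_total{tool="store"} 7
console.log(renderCounter("mindkeg_tool_invocations_total", "Total MCP tool invocations.", invocations, "tool"));
```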

Rate Limiting

HTTP transport enforces per-API-key token bucket rate limits with separate write and read buckets.

export MINDKEG_RATE_LIMIT_WRITE_RPM=100  # default: 100 write req/min per key
export MINDKEG_RATE_LIMIT_READ_RPM=300   # default: 300 read req/min per key

Returns HTTP 429 with Retry-After header when exceeded. stdio transport is not rate-limited.
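A minimal token bucket matching the description above: one bucket per API key per direction, with capacity equal to the per-minute limit. Starting each bucket full, refilling continuously (rather than resetting once a minute), and the class names are assumptions; the clock is passed in so the behavior is easy to test.

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(private readonly capacity: number, private readonly refillPerMs: number, now: number) {
    this.tokens = capacity; // start full (assumed)
    this.lastRefill = now;
  }

  private refill(now: number): void {
    const elapsed = Math.max(0, now - this.lastRefill);
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
    this.lastRefill = now;
  }

  // Returns 0 if the request may proceed, otherwise whole seconds to wait (for Retry-After).
  take(now: number): number {
    this.refill(now);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return 0;
    }
    return Math.ceil((1 - this.tokens) / this.refillPerMs / 1000);
  }
}

class RateLimiter {
  private buckets = new Map<string, TokenBucket>();

  constructor(private readonly writeRpm = 100, private readonly readRpm = 300) {}

  // Returns 0 when allowed, else a Retry-After value in seconds to send with HTTP 429.
  check(apiKeyPrefix: string, kind: "read" | "write", now: number = Date.now()): number {
    const id = `${apiKeyPrefix}:${kind}`; // separate read and write buckets per key
    let bucket = this.buckets.get(id);
    if (!bucket) {
      const rpm = kind === "write" ? this.writeRpm : this.readRpm;
      bucket = new TokenBucket(rpm, rpm / 60_000, now);
      this.buckets.set(id, bucket);
    }
    return bucket.take(now);
  }
}
```

A token bucket allows short bursts up to the per-minute limit while still enforcing the average rate, which suits agents that issue several reads at session start and then go quiet.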

Supply Chain Security

npm packages published with --provenance (Sigstore attestation via GitHub Actions)
CycloneDX SBOM generated and uploaded as a release asset on every GitHub release
Cosign signatures for npm tarballs uploaded as release assets

Content Integrity

SHA-256 integrity hashes are computed and stored for every learning on write. Verify on demand:

{ "query": "...", "verify_integrity": true }

Each result includes integrity_valid: true | false | null (null for legacy learnings without a stored hash).

Backfill integrity hashes for existing learnings:

mindkeg backfill-integrity
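The Data Model describes the hash as covering "canonical fields" without listing them. A sketch with `node:crypto`, assuming the canonical form is a JSON array of content, category, sorted tags, repository, and workspace (a hypothetical choice), and showing how the three-valued `integrity_valid` result arises.

```typescript
import { createHash } from "node:crypto";

interface HashableLearning {
  content: string;
  category: string;
  tags: string[];
  repository: string | null;
  workspace: string | null;
  integrity_hash: string | null;
}

// Fixed field order plus sorted tags, so logically equal learnings hash identically.
function canonicalize(l: HashableLearning): string {
  return JSON.stringify([l.content, l.category, [...l.tags].sort(), l.repository, l.workspace]);
}

function computeIntegrityHash(l: HashableLearning): string {
  return createHash("sha256").update(canonicalize(l), "utf8").digest("hex");
}

// null = legacy learning with no stored hash (fixed by `mindkeg backfill-integrity`).
function integrityValid(l: HashableLearning): boolean | null {
  if (l.integrity_hash === null) return null;
  return computeIntegrityHash(l) === l.integrity_hash;
}
```

Serializing an array with `JSON.stringify` avoids ambiguities that plain string concatenation would have (for example, "ab" + "c" and "a" + "bc" producing the same input).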

Data Model

Each learning contains:

| Field | Type | Notes |
|---|---|---|
| id | UUID | Auto-generated |
| content | string (max 500) | The atomic learning text (sanitized on write) |
| category | enum | One of 6 categories |
| tags | string[] | Free-form labels |
| repository | string or null | Repo path; null = workspace or global |
| workspace | string or null | Workspace path; null = repo-specific or global |
| group_id | UUID or null | Link related learnings |
| source | string | Who created this (e.g., "claude-code") |
| status | enum | active or deprecated |
| stale_flag | boolean | Agent-flagged as potentially outdated |
| ttl_days | integer or null | Per-learning TTL; overrides global MINDKEG_DEFAULT_TTL_DAYS |
| source_agent | string or null | Agent name for provenance tracking |
| integrity_hash | string or null | SHA-256 hash of canonical fields for tamper detection |
| access_count | integer | Times returned by search/get_context (feeds ranking) |
| last_accessed_at | ISO 8601 or null | Last time returned by search/get_context |
| staleness_score | float 0.0–1.0 | Auto-computed from age, access recency, and conflicts |
| created_at | ISO 8601 | Auto-set on creation |
| updated_at | ISO 8601 | Auto-updated on modification; TTL expiry anchors to this |
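The table translates directly into a TypeScript type. The interface below is a transcription for illustration (the real definitions are Zod schemas in `src/models/`), and `validateNewLearning` is a hypothetical helper that checks only constraints stated in the table: content length, the six categories, and an integer-or-null TTL.

```typescript
type LearningCategory = "architecture" | "conventions" | "debugging" | "gotchas" | "dependencies" | "decisions";

interface Learning {
  id: string; // UUID
  content: string; // max 500 chars
  category: LearningCategory;
  tags: string[];
  repository: string | null;
  workspace: string | null;
  group_id: string | null;
  source: string;
  status: "active" | "deprecated";
  stale_flag: boolean;
  ttl_days: number | null;
  source_agent: string | null;
  integrity_hash: string | null;
  access_count: number;
  last_accessed_at: string | null; // ISO 8601
  staleness_score: number; // 0.0 to 1.0
  created_at: string; // ISO 8601
  updated_at: string; // ISO 8601
}

const CATEGORIES: readonly LearningCategory[] = ["architecture", "conventions", "debugging", "gotchas", "dependencies", "decisions"];

// Returns a list of problems; an empty list means the table's constraints hold.
function validateNewLearning(input: Pick<Learning, "content" | "category" | "ttl_days">): string[] {
  const errors: string[] = [];
  if (input.content.length > 500) errors.push("content must be at most 500 characters");
  if (!CATEGORIES.includes(input.category)) errors.push(`unknown category: ${input.category}`);
  if (input.ttl_days !== null && !Number.isInteger(input.ttl_days)) errors.push("ttl_days must be an integer or null");
  return errors;
}
```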

Scoping

Learnings have three scope levels:

| Scope | repository | workspace | Visible where |
|---|---|---|---|
| Repo-specific | set | null | Only that repo |
| Workspace-wide | null | set | All repos in the same parent folder |
| Global | null | null | Everywhere |

Workspaces are auto-detected from the parent folder of a repository path. For example, if your repos are organized as:

repositories/
  personal/     ← workspace
    app-a/
    app-b/
  work/          ← workspace
    project-x/

A workspace learning stored under repositories/personal/ is shared across app-a and app-b but not project-x.

When searching, results include all three scopes: repo-specific + workspace + global. Each result has a scope field indicating its level.
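The scoping rules can be expressed as a few small functions: one that classifies a learning from its `repository`/`workspace` pair, one that derives a workspace from a repository path, and one that decides visibility from the repository an agent is working in. Using `path.dirname` mirrors the "parent folder" rule above; the function names and the POSIX-path assumption are illustrative, not Mind Keg's code.

```typescript
import { posix as path } from "node:path";

type Scope = "repo" | "workspace" | "global";

interface Scoped {
  repository: string | null;
  workspace: string | null;
}

// Normalizes a path and strips trailing slashes (except for the root "/").
function clean(p: string): string {
  const n = path.normalize(p);
  return n.length > 1 ? n.replace(/\/+$/, "") : n;
}

function scopeOf(l: Scoped): Scope {
  if (l.repository !== null) return "repo";
  if (l.workspace !== null) return "workspace";
  return "global";
}

// Workspace = parent folder of the repository path.
function workspaceOf(repoPath: string): string {
  return path.dirname(clean(repoPath));
}

// A search from currentRepo sees its own repo's learnings, its workspace's, and all global ones.
function isVisible(l: Scoped, currentRepo: string): boolean {
  const repo = clean(currentRepo);
  if (l.repository !== null) return clean(l.repository) === repo;
  if (l.workspace !== null) return clean(l.workspace) === workspaceOf(repo);
  return true;
}
```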

What Makes a Good Learning?

Atomic: One insight per entry. Max 500 characters.
Actionable: What to DO or AVOID, not just what exists.
Specific: Mentions the concrete context (library, pattern, file).

Good: "Always wrap Prisma queries in try/catch: it throws on constraint violations instead of returning null."

Bad: "Be careful with the database." (too vague)

Development

# Clone and install
git clone ...
npm install

# Run tests
npm test

# Build
npm run build

# Development mode (rebuilds on change)
npm run dev

# Type check
npm run typecheck

Running without external APIs

Mind Keg works fully offline by default. FastEmbed provides free, local semantic search using ONNX Runtime — no API keys or network calls required. All CRUD operations and search work out of the box.

Architecture

CLI (Commander.js)
  └── init / stats / serve / api-key / migrate / export / import / dedup-scan
      purge / encrypt-db / decrypt-db / backfill-integrity

src/
  index.ts          Entry point, stdio + HTTP transports
  server.ts         MCP server + tool registration
  config.ts         Config loading (env vars → defaults)
  audit/            Structured JSON lines audit logger
  auth/             API key generation + validation middleware
  crypto/           AES-256-GCM field encryption
  hooks/            Hook script generation (SessionStart auto-retrieval)
  monitoring/       Prometheus metrics + /health endpoint
  security/         Content sanitization, integrity hashing, rate limiter
  tools/            MCP tool handlers (8 consolidated + 19 backwards-compatible aliases)
  services/         LearningService + EmbeddingService + PurgeService + ConflictDetector + StalenessEngine
  storage/          StorageAdapter interface + SQLite impl
  models/           Zod schemas + TypeScript types
  utils/            Logger (pino → stderr) + error classes

templates/
  AGENTS.md         Template for instructing agents to use Mind Keg

See CLAUDE.md for detailed development conventions.

License

MIT