@nacho-labs/mcp-semantic-search

v0.2.0, published a day ago

MCP server for local semantic search — give Claude Code, Cursor, and other AI tools persistent memory powered by nachos-embeddings

naterchrdsn

mcp model-context-protocol semantic-search embeddings vector-search claude claude-code ai-memory local privacy

MCP server that gives AI coding tools persistent semantic memory. Index decisions, patterns, and project context — recall them by meaning, not keywords.

Prerequisites

Node.js 18+
Internet access on the first run to download the embedding model (~25MB; it is cached locally, so later runs work offline)

Quick start

Claude Code

claude mcp add --transport stdio semantic-search -- npx @nacho-labs/mcp-semantic-search

Cursor / VS Code / any MCP client

Add to your MCP config (.mcp.json, mcp.json, or client-specific config):

{
  "mcpServers": {
    "semantic-search": {
      "type": "stdio",
      "command": "npx",
      "args": ["@nacho-labs/mcp-semantic-search"]
    }
  }
}

That's it. Your AI tool now has six semantic memory tools.

Tools

| Tool | Description |
| ---- | ----------- |
| semantic_search | Search indexed documents by meaning |
| semantic_index | Add a document to the index |
| semantic_index_batch | Add multiple documents at once |
| semantic_remove | Remove a document by ID |
| semantic_stats | Get index size, store location, and config |
| semantic_clear | Remove all documents (requires confirmation) |

What it does

You ask Claude: "How do we handle rate limiting?"
                 |
Claude calls:    semantic_search("rate limiting")
                 |
Server embeds:   query -> 384-dimension vector
                 |
Cosine search:   against all indexed vectors
                 |
Returns:         "We throttle API requests using sliding windows..."
                 (matched by meaning, not keywords)

The embedding model matches on meaning, so a query can find text that shares no keywords with it:

| Query | Finds |
| ----- | ----- |
| "rate limiting" | "We throttle API requests using sliding windows" |
| "how to deploy" | "Production runs via docker compose up with..." |
| "error handling" | "We use Result types instead of try/catch for..." |
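
The cosine search step in the flow above can be sketched in a few lines. This is a minimal illustration, not the package's actual code: the toy 3-dimension vectors stand in for the 384-dimension embeddings the model produces, and the names `IndexedDoc`, `SearchHit`, `cosineSimilarity`, and `search` are hypothetical.

```typescript
// Hypothetical shapes for illustration; the real store format may differ.
interface IndexedDoc {
  id: string;
  text: string;
  vector: number[];
}

interface SearchHit {
  id: string;
  text: string;
  score: number;
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every document against the query, drop anything below the
// threshold, and return the best matches first.
function search(
  query: number[],
  docs: IndexedDoc[],
  minSimilarity = 0.6,
  limit = 5,
): SearchHit[] {
  return docs
    .map((doc) => ({ id: doc.id, text: doc.text, score: cosineSimilarity(query, doc.vector) }))
    .filter((hit) => hit.score >= minSimilarity)
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

// Toy 3-dimension vectors standing in for real 384-dimension embeddings.
const docs: IndexedDoc[] = [
  { id: "throttle", text: "We throttle API requests using sliding windows", vector: [0.9, 0.1, 0.0] },
  { id: "deploy", text: "Production runs via docker compose up", vector: [0.0, 0.2, 0.9] },
];

const hits = search([1, 0, 0], docs);
console.log(hits.map((hit) => hit.id));
```

Only the "throttle" document clears the 0.6 threshold here, because the "deploy" vector is orthogonal to the query. A linear scan like this touches every stored vector on each query, which is why it suits small indexes.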

What to index

High-value content for project memory:

Architecture decisions — "We chose PostgreSQL over DynamoDB because we need complex joins for the reporting module."

Code patterns — "Authentication middleware is in src/middleware/auth.ts. Uses JWT with RS256, tokens expire after 1 hour, refresh tokens after 30 days."

Conventions — "All API endpoints return { data, error, meta } shape. Errors use RFC 7807 problem details format."

Debugging insights — "If the worker queue backs up, check Redis memory. The default maxmemory-policy is noeviction which causes write failures."

Configuration

CLI arguments

npx @nacho-labs/mcp-semantic-search \
  --store /path/to/store.json \
  --similarity 0.5 \
  --model Xenova/all-mpnet-base-v2 \
  --cache-dir /tmp/models

Environment variables

| Variable | Description | Default |
| -------- | ----------- | ------- |
| MCP_SEMANTIC_STORE | Path to persistence file | .semantic-store.json |
| MCP_SEMANTIC_SIMILARITY | Min similarity threshold (0-1) | 0.6 |
| MCP_SEMANTIC_MODEL | Embedding model | Xenova/all-MiniLM-L6-v2 |
| MCP_SEMANTIC_CACHE_DIR | Model cache directory | .cache/transformers |
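
The flags and variables above describe the same four settings. The sketch below shows one way they could be merged into a single config object. It is not the server's actual code: `ServerConfig`, `flagValue`, and `resolveConfig` are hypothetical names, and the precedence (CLI flag, then environment variable, then default) is an assumption, since the order is not stated here.

```typescript
interface ServerConfig {
  store: string;
  similarity: number;
  model: string;
  cacheDir: string;
}

// Defaults as listed in the environment variable table.
const DEFAULTS: ServerConfig = {
  store: ".semantic-store.json",
  similarity: 0.6,
  model: "Xenova/all-MiniLM-L6-v2",
  cacheDir: ".cache/transformers",
};

// Read the value following a flag such as `--store`, if present.
function flagValue(argv: string[], flag: string): string | undefined {
  const index = argv.indexOf(flag);
  return index >= 0 && index + 1 < argv.length ? argv[index + 1] : undefined;
}

// ASSUMPTION: CLI flag > environment variable > default. The real
// precedence is not documented here, so this ordering is illustrative.
function resolveConfig(
  argv: string[],
  env: Record<string, string | undefined>,
): ServerConfig {
  const pick = (flag: string, variable: string): string | undefined =>
    flagValue(argv, flag) ?? env[variable];

  const rawSimilarity = pick("--similarity", "MCP_SEMANTIC_SIMILARITY");
  const similarity = rawSimilarity === undefined ? DEFAULTS.similarity : Number(rawSimilarity);
  if (Number.isNaN(similarity) || similarity < 0 || similarity > 1) {
    throw new Error(`Similarity must be between 0 and 1, got "${rawSimilarity}"`);
  }

  return {
    store: pick("--store", "MCP_SEMANTIC_STORE") ?? DEFAULTS.store,
    similarity,
    model: pick("--model", "MCP_SEMANTIC_MODEL") ?? DEFAULTS.model,
    cacheDir: pick("--cache-dir", "MCP_SEMANTIC_CACHE_DIR") ?? DEFAULTS.cacheDir,
  };
}

const config = resolveConfig(
  ["--similarity", "0.5"],
  { MCP_SEMANTIC_STORE: "/home/user/.semantic-memory/project.json" },
);
console.log(config);
```

In a real entry point the two arguments would typically be `process.argv.slice(2)` and `process.env`; passing them in explicitly keeps the function easy to test.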

With environment variables in MCP config

{
  "mcpServers": {
    "semantic-search": {
      "type": "stdio",
      "command": "npx",
      "args": ["@nacho-labs/mcp-semantic-search"],
      "env": {
        "MCP_SEMANTIC_STORE": "/home/user/.semantic-memory/project.json",
        "MCP_SEMANTIC_SIMILARITY": "0.5"
      }
    }
  }
}

Performance

| Operation | Time |
| --------- | ---- |
| Server startup (model cached) | ~500ms |
| Server startup (first run) | ~2-5s |
| Index a document | ~10-50ms |
| Search 1000 documents | ~5-10ms |

Memory: ~100MB for model + ~1.5KB per document.

The in-memory store works well up to ~10K documents. Beyond that, consider a dedicated vector database.
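
As a rough worked example of the memory figures above, the estimate below adds the model footprint to the per-document cost. The function name `estimateMemoryMB` is hypothetical, and the note about 4-byte floats is an assumption about how vectors are held in memory, not something stated here.

```typescript
const MODEL_BYTES = 100 * 1024 * 1024; // ~100MB for the loaded model
const BYTES_PER_DOCUMENT = 1.5 * 1024; // ~1.5KB per indexed document

// ASSUMPTION: one 4-byte float per dimension. 384 * 4 = 1536 bytes = 1.5KB,
// which would account for the per-document figure on its own.
const vectorBytes = 384 * 4;

// Estimated resident memory in MB for a given number of indexed documents.
function estimateMemoryMB(documentCount: number): number {
  return (MODEL_BYTES + documentCount * BYTES_PER_DOCUMENT) / (1024 * 1024);
}

console.log(vectorBytes);
console.log(estimateMemoryMB(1000).toFixed(1));
console.log(estimateMemoryMB(10000).toFixed(1));
```

At the suggested ~10K document ceiling the vectors add roughly 15MB on top of the model, so the model itself dominates memory use at that scale.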

Persistence

The index is saved to disk automatically after every write operation (index, remove, clear). On startup, the server loads the existing store if present.

Default location: .semantic-store.json in the working directory.
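
The save-after-every-write behaviour described above can be sketched as follows. This is an illustration under assumptions, not the server's implementation: `JsonStore` and `StoredDoc` are hypothetical names, the on-disk layout (a JSON array of documents) is invented for the example, and the write-then-rename step is a design choice of this sketch.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical document shape; the real store file layout is not specified here.
interface StoredDoc {
  id: string;
  text: string;
}

class JsonStore {
  private filePath: string;
  private docs = new Map<string, StoredDoc>();

  constructor(filePath: string) {
    this.filePath = filePath;
    // On startup, load the existing store if present.
    if (fs.existsSync(filePath)) {
      const saved = JSON.parse(fs.readFileSync(filePath, "utf8")) as StoredDoc[];
      for (const doc of saved) this.docs.set(doc.id, doc);
    }
  }

  get size(): number {
    return this.docs.size;
  }

  // Every write operation (index, remove, clear) persists immediately.
  index(doc: StoredDoc): void {
    this.docs.set(doc.id, doc);
    this.save();
  }

  remove(id: string): boolean {
    const removed = this.docs.delete(id);
    if (removed) this.save();
    return removed;
  }

  clear(): void {
    this.docs.clear();
    this.save();
  }

  // Write to a temporary file, then rename it over the store, so an
  // interrupted write is less likely to leave a truncated file behind.
  private save(): void {
    const tmpPath = `${this.filePath}.tmp`;
    fs.writeFileSync(tmpPath, JSON.stringify(Array.from(this.docs.values())));
    fs.renameSync(tmpPath, this.filePath);
  }
}

// Usage: index a document, then reopen the same file as a fresh process would.
const storePath = path.join(fs.mkdtempSync(path.join(os.tmpdir(), "semantic-")), "store.json");
const first = new JsonStore(storePath);
first.index({ id: "db-choice", text: "We chose PostgreSQL over DynamoDB" });

const reloaded = new JsonStore(storePath);
console.log(reloaded.size); // 1
```

Rewriting the whole file on each write is simple and fine for small indexes, but the cost grows with the store, which is one more reason to move to a dedicated database for large collections.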

How it's built

This MCP server is a thin wrapper around two packages:

@nacho-labs/nachos-embeddings — Local vector embeddings and semantic search
@modelcontextprotocol/sdk — Official MCP TypeScript SDK

The embeddings package can also be used directly in your own code. See its README for the standalone API.

Development

git clone https://github.com/nacho-labs-llc/mcp-semantic-search.git
cd mcp-semantic-search
npm install
npm run build
npm start

License

MIT
