deepmatch-mcp

v0.1.0

Published

a month ago

Semantic code search server for MCP, powered by vector embeddings and Qdrant

kuyermqi

mcp semantic-search embeddings code-search qdrant

DeepMatch MCP

A Model Context Protocol (MCP) server for semantic code search using vector embeddings. Index your codebase and search with natural language queries.

Features

Semantic Code Search: Find code by meaning, not just keywords
Multiple Embedding Providers: OpenAI, Ollama, Gemini, OpenAI-compatible APIs
Real-time File Watching: Automatically re-index on file changes
Multi-repository Support: Index multiple directories simultaneously
Smart Filtering: Respects .gitignore, skips binary files and common build directories
MCP Protocol: Works with any MCP-compatible client (Claude Desktop, etc.)

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        deepmatch-mcp                            │
├─────────────────────────────────────────────────────────────────┤
│  CLI Entrypoint (src/cli.ts)                                    │
│  - Parses config from CLI flags and environment variables       │
│  - Orchestrates startup: scan → index → watch → serve           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │   Config    │  │  Providers  │  │     Vector Store        │  │
│  │  (Zod)      │  │ (Embedders) │  │      (Qdrant)           │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
│                                                                 │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │  Scanner    │  │  Chunker    │  │     Index Manager       │  │
│  │ (Directory) │  │ (Line-based)│  │   (Batch Processing)    │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
│                                                                 │
│  ┌─────────────┐  ┌───────────────────────────────────────────┐ │
│  │  Watcher    │  │              MCP Server                   │ │
│  │ (Chokidar)  │  │  (stdio transport, 'search' tool)         │ │
│  └─────────────┘  └───────────────────────────────────────────┘ │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Module Overview

| Module | Path | Description |
|--------|------|-------------|
| Config | src/config/ | CLI/ENV parsing with Zod validation |
| Providers | src/providers/ | Embedding providers (OpenAI, Ollama, Gemini, OpenAI-compatible) |
| Store | src/store/ | Qdrant vector database wrapper |
| Chunker | src/chunker/ | Line-based text chunking with configurable limits |
| Scanner | src/scanner/ | Directory traversal with .gitignore support |
| Indexer | src/indexer/ | Batch embedding and vector upsert orchestration |
| Watcher | src/watcher/ | File change detection with debouncing |
| MCP | src/mcp/ | MCP stdio server with search tool |
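
The Indexer module is described as batching embeddings before upserting vectors. A minimal sketch of that grouping step, assuming a hypothetical `toBatches` helper (the name and signature are illustrative, not taken from the deepmatch-mcp source) and the documented default batch size of 60:

```typescript
// Hypothetical helper: split items into consecutive batches of at most `size`.
// The default of 60 mirrors the documented --batch-size default.
function toBatches<T>(items: readonly T[], size = 60): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("batch size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Example: 130 chunks become batches of 60, 60 and 10.
const chunkTexts = Array.from({ length: 130 }, (_, i) => `chunk-${i}`);
const batchSizes = toBatches(chunkTexts).map((batch) => batch.length);
console.log(batchSizes); // [ 60, 60, 10 ]
```

Each batch would then go to the embedding provider in a single request, which keeps the number of API calls proportional to the chunk count divided by the batch size.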

Installation

# Install dependencies
npm install

# Build
npm run build

Usage

Prerequisites

Qdrant - Vector database (default: http://localhost:6333)
```
# Using Docker
docker run -p 6333:6333 qdrant/qdrant
```
Embedding Provider - One of:
- OpenAI API key
- Ollama running locally
- Gemini API key
- Any OpenAI-compatible API

CLI Options

npx deepmatch-mcp [options]

Options:
  --path <path>              Repository path to index (repeatable)
  --provider <provider>      Embedding provider: openai|ollama|gemini|openai-compatible
  --model <model>            Embedding model name
  --embedding-dim <dim>      Embedding dimension (auto-detected if not set)
  --batch-size <size>        Batch size for embeddings (default: 60)
  --max-files <count>        Maximum files to index (default: 50000)
  --qdrant-url <url>         Qdrant server URL (default: http://localhost:6333)
  --qdrant-key <key>         Qdrant API key
  --openai-key <key>         OpenAI API key
  --ollama-url <url>         Ollama server URL
  --gemini-key <key>         Gemini API key
  --openai-compat-base-url <url>  OpenAI-compatible base URL
  --openai-compat-key <key>       OpenAI-compatible API key

Environment Variables

All CLI options can be set via environment variables:

| Variable | CLI Flag |
|----------|----------|
| DEEPMATCH_PATHS | --path (comma-separated) |
| DEEPMATCH_PROVIDER | --provider |
| DEEPMATCH_MODEL | --model |
| DEEPMATCH_EMBEDDING_DIM | --embedding-dim |
| DEEPMATCH_BATCH_SIZE | --batch-size |
| DEEPMATCH_MAX_FILES | --max-files |
| DEEPMATCH_QDRANT_URL | --qdrant-url |
| DEEPMATCH_QDRANT_API_KEY | --qdrant-key |
| DEEPMATCH_OPENAI_API_KEY | --openai-key |
| DEEPMATCH_OLLAMA_URL | --ollama-url |
| DEEPMATCH_GEMINI_API_KEY | --gemini-key |
| DEEPMATCH_OPENAI_COMPAT_BASE_URL | --openai-compat-base-url |
| DEEPMATCH_OPENAI_COMPAT_API_KEY | --openai-compat-key |

CLI flags take precedence over environment variables.
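
The precedence rule can be sketched as a small resolver. This is an illustration only: `resolveOption` and `resolvePaths` are hypothetical names, and the choice to let `--path` flags replace (rather than merge with) `DEEPMATCH_PATHS` is an assumption, not documented behavior.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical resolver: CLI flag wins, then environment variable, then default.
function resolveOption(
  flagValue: string | undefined,
  envName: string,
  env: Env,
  fallback?: string,
): string | undefined {
  if (flagValue !== undefined) return flagValue;
  const fromEnv = env[envName];
  if (fromEnv !== undefined && fromEnv !== "") return fromEnv;
  return fallback;
}

// DEEPMATCH_PATHS is comma-separated. In this sketch, any --path flag
// replaces the environment value entirely (an assumption).
function resolvePaths(flagPaths: string[], env: Env): string[] {
  if (flagPaths.length > 0) return flagPaths;
  return (env["DEEPMATCH_PATHS"] ?? "")
    .split(",")
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
}

const exampleEnv: Env = {
  DEEPMATCH_PROVIDER: "openai",
  DEEPMATCH_PATHS: "/path/to/repo1, /path/to/repo2",
};

// The flag value overrides DEEPMATCH_PROVIDER.
const provider = resolveOption("ollama", "DEEPMATCH_PROVIDER", exampleEnv);
// Neither flag nor variable is set, so the documented default applies.
const qdrantUrl = resolveOption(undefined, "DEEPMATCH_QDRANT_URL", exampleEnv, "http://localhost:6333");
// No --path flags, so the comma-separated variable is split and trimmed.
const repoPaths = resolvePaths([], exampleEnv);
console.log(provider, qdrantUrl, repoPaths);
```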

Examples

With OpenAI:

npx deepmatch-mcp \
  --path /path/to/your/repo \
  --provider openai \
  --openai-key sk-xxx

With Ollama:

npx deepmatch-mcp \
  --path /path/to/repo1 \
  --path /path/to/repo2 \
  --provider ollama \
  --ollama-url http://localhost:11434 \
  --model nomic-embed-text

With environment variables:

export DEEPMATCH_PATHS="/path/to/repo"
export DEEPMATCH_PROVIDER="openai"
export DEEPMATCH_OPENAI_API_KEY="sk-xxx"
npx deepmatch-mcp

MCP Client Configuration

For Claude Desktop, add to your MCP settings:

{
  "mcpServers": {
    "deepmatch": {
      "command": "npx",
      "args": ["deepmatch-mcp", "--path", "/path/to/repo", "--provider", "openai"],
      "env": {
        "DEEPMATCH_OPENAI_API_KEY": "sk-xxx"
      }
    }
  }
}

MCP Tools

`search`

Search for code using semantic similarity.

Input Schema:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| query | string | Yes | Natural language search query |
| limit | number | No | Max results (1-50, default: 10) |
| paths | string[] | No | Filter to specific repository paths |
| minScore | number | No | Minimum similarity score (0-1) |

Output:

{
  "total_count": 5,
  "items": [
    {
      "filePath": "/repo/src/auth.ts",
      "repoPath": "/repo",
      "startLine": 10,
      "endLine": 25,
      "codeChunk": "function authenticate(token: string) {...}",
      "score": 0.92
    }
  ]
}
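
For clients that build requests by hand, the input and output shapes above can be written as TypeScript types. The sketch below is not the server's own code: `normalizeSearchArgs`, `buildSearchRequest`, and `topHit` are hypothetical helpers, and the empty-query and integer-limit checks are assumptions layered on the documented ranges. The `tools/call` envelope follows the MCP JSON-RPC convention.

```typescript
// Shapes taken from the input table and output example above.
interface SearchArgs {
  query: string;
  limit?: number;
  paths?: string[];
  minScore?: number;
}

interface SearchItem {
  filePath: string;
  repoPath: string;
  startLine: number;
  endLine: number;
  codeChunk: string;
  score: number;
}

interface SearchResult {
  total_count: number;
  items: SearchItem[];
}

// Hypothetical client-side check mirroring the documented ranges.
function normalizeSearchArgs(args: SearchArgs): SearchArgs & { limit: number } {
  if (args.query.trim() === "") {
    throw new Error("query must not be empty"); // assumption, not documented
  }
  const limit = args.limit ?? 10; // documented default
  if (!Number.isInteger(limit) || limit < 1 || limit > 50) {
    throw new RangeError("limit must be an integer between 1 and 50");
  }
  if (args.minScore !== undefined && !(args.minScore >= 0 && args.minScore <= 1)) {
    throw new RangeError("minScore must be between 0 and 1");
  }
  return { ...args, limit };
}

// JSON-RPC 2.0 envelope for an MCP `tools/call` request to the `search` tool.
function buildSearchRequest(id: number, args: SearchArgs) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: "search", arguments: normalizeSearchArgs(args) },
  };
}

// Pick the highest-scoring item from a parsed result, if any.
function topHit(result: SearchResult): SearchItem | undefined {
  return [...result.items].sort((a, b) => b.score - a.score)[0];
}

const searchRequest = buildSearchRequest(1, {
  query: "where are tokens validated?",
  minScore: 0.5,
});
console.log(JSON.stringify(searchRequest, null, 2));
```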

Local Development

Setup

# Clone and install
git clone https://github.com/657KB/deepmatch-mcp
cd deepmatch-mcp
npm install

Development Workflow

# Run tests (TDD)
npm test

# Run tests in watch mode
npx vitest

# Build TypeScript
npm run build

# Test the CLI
node dist/cli.js --help

Project Structure

src/
├── cli.ts                 # Main entry point
├── config/
│   ├── schema.ts          # Zod schemas and defaults
│   ├── index.ts           # CLI/ENV parsing
│   └── config.test.ts
├── providers/
│   ├── types.ts           # IEmbedder interface
│   ├── embedders.ts       # Provider implementations
│   ├── index.ts
│   └── embedders.test.ts
├── store/
│   ├── types.ts           # IVectorStore interface
│   ├── qdrant.ts          # Qdrant implementation
│   ├── index.ts
│   └── qdrant.test.ts
├── chunker/
│   ├── extensions.ts      # Supported file extensions
│   ├── chunker.ts         # Line-based chunking
│   ├── index.ts
│   └── chunker.test.ts
├── scanner/
│   ├── scanner.ts         # Directory traversal
│   ├── index.ts
│   └── scanner.test.ts
├── indexer/
│   ├── index-manager.ts   # Batch indexing orchestration
│   ├── index.ts
│   └── index-manager.test.ts
├── watcher/
│   ├── file-watcher.ts    # Chokidar file watching
│   ├── index.ts
│   └── file-watcher.test.ts
└── mcp/
    ├── server.ts          # MCP server + search tool
    ├── index.ts
    └── server.test.ts

Running Tests

# Run all tests
npm test

# Run specific test file
npx vitest src/chunker/chunker.test.ts

# Run with coverage
npx vitest --coverage

Configuration Defaults (Roo-Code Aligned)

| Parameter | Default | Description |
|-----------|---------|-------------|
| batchSize | 60 | Embedding batch size |
| maxFiles | 50,000 | Maximum files to index |
| chunkMin | 50 | Minimum chunk size (chars) |
| chunkMax | 1,000 | Maximum chunk size (chars) |
| chunkMaxTolerance | 1.15 | Tolerance factor for max size |
| chunkRebalanceMin | 200 | Minimum remainder to trigger rebalance |
| qdrantUrl | http://localhost:6333 | Qdrant server URL |
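
To make the chunk-size parameters concrete, here is a rough sketch of how a line-based chunker might apply `chunkMin`, `chunkMax`, and `chunkMaxTolerance`. It is an assumption about how these limits interact, not the algorithm in src/chunker/chunker.ts: `chunkLines` is a hypothetical function, the rebalancing step governed by `chunkRebalanceMin` is left out, and a single line longer than the limit is passed through as one oversized chunk.

```typescript
interface ChunkOptions {
  chunkMin: number;
  chunkMax: number;
  chunkMaxTolerance: number;
}

interface Chunk {
  startLine: number; // 1-based, inclusive
  endLine: number;   // 1-based, inclusive
  text: string;
}

const CHUNK_DEFAULTS: ChunkOptions = { chunkMin: 50, chunkMax: 1000, chunkMaxTolerance: 1.15 };

// Hypothetical line-based chunker: accumulate whole lines until the next one
// would push the chunk past chunkMax * chunkMaxTolerance characters.
function chunkLines(source: string, opts: ChunkOptions = CHUNK_DEFAULTS): Chunk[] {
  const hardMax = opts.chunkMax * opts.chunkMaxTolerance;
  const lines = source.split("\n");
  const chunks: Chunk[] = [];
  let buffer: string[] = [];
  let size = 0;
  let startLine = 1;

  // Emit the buffered lines as a chunk unless they are shorter than chunkMin.
  const flush = (endLine: number): void => {
    const text = buffer.join("\n");
    if (text.trim().length >= opts.chunkMin) {
      chunks.push({ startLine, endLine, text });
    }
    buffer = [];
    size = 0;
  };

  lines.forEach((line, index) => {
    const lineNo = index + 1;
    const added = line.length + 1; // +1 accounts for the newline
    if (buffer.length > 0 && size + added > hardMax) {
      flush(lineNo - 1);
    }
    if (buffer.length === 0) startLine = lineNo;
    buffer.push(line);
    size += added;
  });
  if (buffer.length > 0) flush(lines.length);
  return chunks;
}

// Example: 100 lines of 39 characters each (40 with the newline).
const sampleSource = Array.from({ length: 100 }, () => "x".repeat(39)).join("\n");
const sampleChunks = chunkLines(sampleSource);
console.log(sampleChunks.map((c) => `${c.startLine}-${c.endLine}`));
// [ '1-28', '29-56', '57-84', '85-100' ]
```

With the defaults, a chunk can grow to roughly 1,150 characters before it is closed, and fragments under 50 characters are dropped rather than indexed.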

File Filtering

Supported Extensions

TypeScript, JavaScript, Python, Java, C/C++, C#, Go, Rust, Ruby, PHP, Swift, Kotlin, Scala, Lua, R, Perl, Shell, SQL, HTML, CSS, JSON, YAML, XML, Markdown, Vue, Svelte

Excluded Directories

node_modules, dist, build, target, .git, hidden directories, __pycache__, venv, .next, .nuxt, coverage, vendor, Pods, .gradle, .idea, .vscode

Additional Filters

Files larger than 1MB are skipped
.gitignore rules are respected (stacked for nested directories)
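
The directory and size rules above can be expressed as a single predicate. This is a sketch under stated assumptions: `shouldIndex` is a hypothetical function, paths are assumed to be normalized, repository-relative, and `/`-separated, "1MB" is read as 1024 * 1024 bytes, and both `.gitignore` handling and the extension check are omitted.

```typescript
// Non-hidden directories from the exclusion list. Hidden ones (.git, .next,
// .idea, ...) are covered by the leading-dot check below.
const EXCLUDED_DIRS = new Set([
  "node_modules", "dist", "build", "target", "__pycache__",
  "venv", "coverage", "vendor", "Pods",
]);

const MAX_FILE_BYTES = 1024 * 1024; // assumption: 1MB means 1 MiB

// Hypothetical predicate: true when a file passes the directory and size rules.
function shouldIndex(relativePath: string, sizeBytes: number): boolean {
  if (sizeBytes > MAX_FILE_BYTES) return false;
  const dirs = relativePath.split("/").slice(0, -1); // drop the file name
  return !dirs.some((dir) => dir.startsWith(".") || EXCLUDED_DIRS.has(dir));
}

console.log(shouldIndex("src/auth.ts", 2048));                      // true
console.log(shouldIndex("node_modules/zod/index.js", 2048));        // false
console.log(shouldIndex(".github/workflows/ci.yml", 512));          // false
console.log(shouldIndex("assets/big.json", 2 * 1024 * 1024));       // false
```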

License

MIT