mcp-code-search
v1.2.1
MCP server for semantic code search across all neulandAI repositories
Codebase Search MCP Server
A Model Context Protocol (MCP) server for semantic code search across your repositories. Integrates with AI tools like Claude Desktop and Cursor.
Features
Search Capabilities
- Hybrid Search: Combines semantic (vector) and keyword (BM25) search with RRF fusion
- Multiple Search Modes: hybrid, semantic, keyword, regex
- Query Filters: Filter by repo, language, type, path, category
  - Example: auth repo:portal lang:ts -path:test
- Synonym Expansion: auth finds authentication, login, session
- Intent Detection: Adjusts search weights based on query type (identifier, concept, file_path, etc.)
- Fuzzy Matching: Typo tolerance - getUesr matches getUser
- Code-Aware Tokenization: Splits camelCase, snake_case, PascalCase
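To illustrate the RRF fusion step in hybrid search, here is a minimal sketch of Reciprocal Rank Fusion over two ranked result lists. The function name rrfFuse, the sample file names, and the constant k = 60 are assumptions for illustration; the project's actual ranker (src/search/ranker.ts) may weight or filter results differently.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over result lists of 1 / (k + rank of d).
// k = 60 is a commonly used default, assumed here; the project's value is not documented.
function rrfFuse(rankings: string[][], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical result lists from the semantic and keyword searches.
const semanticHits = ["auth.ts", "session.ts", "login.ts"];
const keywordHits = ["login.ts", "auth.ts", "token.ts"];
const fused = rrfFuse([semanticHits, keywordHits]);
console.log(fused.map((hit) => hit.id));
```

Documents that appear near the top of both lists accumulate the highest fused score, which is why auth.ts ranks first here even though the keyword search preferred login.ts.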
Code Analysis
- Complexity Metrics: Cyclomatic, cognitive complexity, maintainability index
- Security Detection: Hardcoded secrets, SQL injection, command injection, eval usage
Performance
- Embedding Cache: SQLite-based cache for embeddings (~10x faster re-indexing)
- BM25 Persistence: Cached keyword index for instant startup
- Index State Tracking: Only re-index changed files
Languages
- TypeScript, JavaScript, Python
Quick Start (for developers)
Prerequisites
- Node.js 18+
- Ollama with mxbai-embed-large model
- Qdrant credentials (ask your admin or check the team wiki)
One-Command Setup
npx @neulandai/mcp-code-search-setup
The setup wizard will ask for your Qdrant URL and API key, then configure Claude Code (~/.claude/mcp.json). No local Qdrant or ingestion needed.
Prerequisites for Queries
Ollama must be running locally to embed your search queries:
# Install Ollama: https://ollama.com
ollama pull mxbai-embed-large
ollama serve # keep running in background
Manual Configuration
If you prefer to configure manually, add to ~/.claude/mcp.json:
{
"mcpServers": {
"code-search": {
"command": "npx",
"args": ["-y", "@neulandai/mcp-code-search"],
"env": {
"QDRANT_URL": "<your-qdrant-url>",
"QDRANT_API_KEY": "<your-qdrant-api-key>"
}
}
}
}
For Claude Desktop, use the same structure in ~/Library/Application Support/Claude/claude_desktop_config.json.
Development Setup
For contributing to this project or running local ingestion:
npm install
npm run build
# Local Qdrant + ingestion
docker compose up qdrant -d
export REPOS_BASE_PATH=/path/to/your/repos
npm run ingest
# Cloud ingestion (requires .env with credentials)
npm run ingest:cloud
Search Query Syntax
Basic Search
getUserById # Search for identifier
authentication flow # Search for concept
With Filters
auth repo:portal # Only search client-portal
fetch lang:python # Only Python files
validate type:function # Only function chunks
path:services # Only files in services/
auth -path:test # Exclude test files
cat:error-handling # Only error handling code
Filter Reference
| Filter | Example | Effect |
|--------|---------|--------|
| repo: | repo:portal | Only this repo |
| lang: | lang:ts | Only TypeScript (aliases: ts, js, py) |
| type: | type:function | Only functions/classes/methods |
| path: | path:services | Only files in path |
| -path: | -path:test | Exclude path |
| cat: | cat:auth | Only category (auth, db-access, api-call, etc.) |
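The filter syntax in the table can be parsed with a small tokenizer. The sketch below is an assumption about how such parsing might look, not the project's filters.ts; the ParsedQuery shape and the parseQuery name are hypothetical.

```typescript
// Hypothetical parsed form of a query such as "auth repo:portal lang:ts -path:test".
interface ParsedQuery {
  text: string; // free-text part of the query
  repo?: string;
  lang?: string;
  type?: string;
  cat?: string;
  includePaths: string[];
  excludePaths: string[];
}

function parseQuery(raw: string): ParsedQuery {
  const parsed: ParsedQuery = { text: "", includePaths: [], excludePaths: [] };
  const terms: string[] = [];
  for (const token of raw.trim().split(/\s+/)) {
    const match = /^(-?)(repo|lang|type|path|cat):(.+)$/.exec(token);
    if (!match) {
      if (token) terms.push(token); // plain search term
      continue;
    }
    const [, negated, key, value] = match;
    if (key === "path") {
      (negated ? parsed.excludePaths : parsed.includePaths).push(value);
    } else if (!negated) {
      parsed[key as "repo" | "lang" | "type" | "cat"] = value;
    } else {
      terms.push(token); // only -path: is documented, so other negations stay as text
    }
  }
  parsed.text = terms.join(" ");
  return parsed;
}

const parsedExample = parseQuery("auth repo:portal lang:ts -path:test");
console.log(parsedExample);
```

Keeping the free-text part separate from the filters lets the search run on the text alone while the filters narrow the candidate set.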
Docker Usage
# Start everything (Qdrant + ingestion + server)
docker compose up --build
# Or run services individually
docker compose up qdrant -d # Start vector database
docker compose run ingest # Run ingestion
docker compose up mcp-server # Start MCP server
Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| QDRANT_URL | http://localhost:6333 | Qdrant server URL |
| REPOS_BASE_PATH | .. | Directory containing repos (or single repo path if using --single-repo) |
| REPO_LIMIT | 0 (no limit) | Max repos to index |
| SINGLE_REPO | false | Treat REPOS_BASE_PATH as a single repository instead of scanning for subdirectories |
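As a rough sketch of how these variables and their defaults could be resolved, the function below reads them from an environment map. The IngestConfig shape and the loadConfig name are hypothetical, and treating an invalid REPO_LIMIT as 0 is an assumption; only the variable names, the defaults, and the --single-repo flag come from this README.

```typescript
interface IngestConfig {
  qdrantUrl: string;
  reposBasePath: string;
  repoLimit: number; // 0 means no limit
  singleRepo: boolean;
}

function loadConfig(
  env: Record<string, string | undefined>,
  argv: string[] = [],
): IngestConfig {
  const limit = Number.parseInt(env.REPO_LIMIT ?? "0", 10);
  return {
    qdrantUrl: env.QDRANT_URL ?? "http://localhost:6333",
    reposBasePath: env.REPOS_BASE_PATH ?? "..",
    // Assumption: an unparsable or negative limit falls back to "no limit".
    repoLimit: Number.isNaN(limit) || limit < 0 ? 0 : limit,
    // Either the environment variable or the command-line flag enables single-repo mode.
    singleRepo: env.SINGLE_REPO === "true" || argv.includes("--single-repo"),
  };
}

const defaults = loadConfig({});
console.log(defaults);
```

In a Node.js entry point this would typically be called as loadConfig(process.env, process.argv.slice(2)).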
Single Repository Mode
To index a directory directly as a single repository (instead of scanning for subdirectories), use the --single-repo flag or set SINGLE_REPO=true:
# Using command-line flag
export REPOS_BASE_PATH=/path/to/your/repo
npm run ingest -- --single-repo
# Using environment variable
export REPOS_BASE_PATH=/path/to/your/repo
export SINGLE_REPO=true
npm run ingest
Development
npm run dev # Run server in dev mode
npm test # Run tests (700+ tests)
npm run test:coverage # Run tests with coverage
npm run lint # Lint code
npm run format # Format code
npm run build # Build TypeScript
Project Structure
src/
├── index.ts # MCP server entry point
├── ingest.ts # Ingestion pipeline
├── analysis/
│ ├── categorize.ts # Pattern categorization
│ ├── complexity.ts # Complexity metrics (NEW)
│ ├── security.ts # Security detection (NEW)
│ └── structure.ts # Code structure analysis
├── search/
│ ├── hybrid.ts # Hybrid search orchestrator
│ ├── semantic.ts # Vector search
│ ├── keyword.ts # BM25 search
│ ├── regex.ts # Regex search
│ ├── ranker.ts # RRF result fusion + filters
│ ├── tokenizer.ts # Code-aware tokenization (NEW)
│ ├── synonyms.ts # Synonym expansion (NEW)
│ ├── intent.ts # Query intent detection (NEW)
│ ├── filters.ts # Query filter parsing (NEW)
│ ├── fuzzy.ts # Fuzzy matching (NEW)
│ ├── bm25-persistence.ts # BM25 caching (NEW)
│ └── embeddings.ts # Embedding generation + cache
├── ingestion/
│ ├── crawler.ts # Repo discovery
│ ├── parser.ts # Code parsing (AST)
│ ├── chunker.ts # Chunk generation
│ ├── cache.ts # SQLite embedding cache (NEW)
│ └── state.ts # Index state tracking (NEW)
├── db/
│ └── qdrant.ts # Qdrant client
└── types/
  └── index.ts # Type definitions
New Modules Reference
Search Enhancements
| Module | Purpose |
|--------|---------|
| tokenizer.ts | Splits camelCase/snake_case for better matching |
| synonyms.ts | Expands queries with related terms |
| intent.ts | Detects query type, adjusts search weights |
| filters.ts | Parses repo:x lang:ts -path:test syntax |
| fuzzy.ts | Levenshtein-based typo tolerance |
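Two of these techniques are easy to show in isolation. The sketch below splits identifiers on case and separator boundaries and computes a Levenshtein edit distance; the function names are hypothetical, and the real tokenizer.ts and fuzzy.ts may apply different rules and thresholds.

```typescript
// Split an identifier into lowercase word tokens.
function tokenizeIdentifier(identifier: string): string[] {
  return identifier
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2") // camelCase boundary: getUser -> get User
    .replace(/([A-Z]+)([A-Z][a-z])/g, "$1 $2") // acronym boundary: HTTPServer -> HTTP Server
    .split(/[^A-Za-z0-9]+/) // snake_case, kebab-case, whitespace
    .filter((part) => part.length > 0)
    .map((part) => part.toLowerCase());
}

// Classic dynamic-programming Levenshtein distance, keeping only two rows.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(tokenizeIdentifier("getUserById")); // [ 'get', 'user', 'by', 'id' ]
console.log(levenshtein("getUesr", "getUser")); // 2
```

Plain Levenshtein counts a swapped pair of letters as two edits, so a typo such as getUesr needs a tolerance of at least 2 to match getUser under this measure.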
Performance
| Module | Purpose |
|--------|---------|
| cache.ts | SQLite cache for embeddings |
| state.ts | Tracks file changes for incremental indexing |
| bm25-persistence.ts | Persists BM25 index to disk |
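Incremental indexing can be sketched as comparing a stored fingerprint per file against the current one. The example below is an assumption about the general approach, not the contents of state.ts: the IndexState shape and function names are hypothetical, and the small FNV-1a-style hash is used only to keep the sketch dependency-free (a real indexer would more likely use SHA-256 or file modification times).

```typescript
type IndexState = Record<string, string>; // file path -> content fingerprint

// 32-bit FNV-1a over UTF-16 code units; non-cryptographic, for illustration only.
function fingerprint(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Compare the previous state with the current file contents.
function diffState(previous: IndexState, files: Record<string, string>) {
  const next: IndexState = {};
  const changed: string[] = [];
  for (const [path, content] of Object.entries(files)) {
    const hash = fingerprint(content);
    next[path] = hash;
    if (previous[path] !== hash) changed.push(path); // new or modified file
  }
  const removed = Object.keys(previous).filter((path) => !(path in next));
  return { next, changed, removed };
}

const firstRun = diffState({}, { "a.ts": "export const a = 1;", "b.ts": "export const b = 2;" });
const secondRun = diffState(firstRun.next, { "a.ts": "export const a = 1;", "b.ts": "export const b = 3;" });
console.log(secondRun.changed); // [ 'b.ts' ]
```

Only the paths in changed need to be re-chunked and re-embedded, and the paths in removed can be deleted from the index.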
Analysis
| Module | Purpose |
|--------|---------|
| complexity.ts | Cyclomatic, cognitive, maintainability metrics |
| security.ts | Detects hardcoded secrets, injection risks |
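For readers unfamiliar with cyclomatic complexity, it is one plus the number of decision points in a piece of code. The sketch below is a deliberately rough, text-based approximation under stated assumptions: it counts a fixed set of keywords and boolean operators, ignores ternaries, and would also count matches inside strings and comments. The function name is hypothetical, and complexity.ts may use AST-based counting instead.

```typescript
// Rough approximation: 1 + number of branching keywords and short-circuit operators.
function approximateCyclomaticComplexity(source: string): number {
  const decisionPoints = /\b(?:if|for|while|case|catch)\b|&&|\|\|/g;
  const matches = source.match(decisionPoints);
  return 1 + (matches ? matches.length : 0);
}

const sampleSource = `
function classify(n: number): string {
  if (n < 0 || Number.isNaN(n)) return "invalid";
  for (let i = 0; i < n; i++) {
    if (i % 2 === 0 && i > 10) return "big-even";
  }
  return "ok";
}`;

// Decision points: if, ||, for, if, && -> 5, so the result is 6.
console.log(approximateCyclomaticComplexity(sampleSource)); // 6
```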
Recent Additions (v2.1.0)
- [x] Enhanced Dependency Analysis (cycle detection, hub detection, unused exports)
- [x] Output Formatters (markdown, JSON, plain text output formats)
- [x] Query Suggestions ("did you mean..." for typos)
- [x] Enhanced Tool Responses (confidence scores, timing, explanations)
Future Work (Deferred)
- [ ] Git Insights (change frequency, hotspots)
Search Tool
The MCP server exposes a search_codebase tool:
{
query: string, // Search query with optional filters
mode?: 'hybrid' | 'semantic' | 'keyword' | 'regex',
limit?: number, // Max results (1-50, default: 10)
repo?: string // Filter to specific repo (legacy, use query filters)
}
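A caller or server might normalize these arguments before running a search. The sketch below is an assumption about one reasonable approach: the type and function names are hypothetical, and folding the legacy repo argument into a repo: query filter is an illustrative choice, not documented behavior. The limit bounds (1-50, default 10) come from the schema above.

```typescript
type SearchMode = "hybrid" | "semantic" | "keyword" | "regex";

interface SearchCodebaseArgs {
  query: string;
  mode?: SearchMode;
  limit?: number;
  repo?: string; // legacy; query filters are preferred
}

function normalizeArgs(args: SearchCodebaseArgs): { query: string; mode?: SearchMode; limit: number } {
  // Fall back to the documented default of 10, then clamp to the documented 1-50 range.
  const requested =
    typeof args.limit === "number" && Number.isFinite(args.limit) ? args.limit : 10;
  const limit = Math.min(50, Math.max(1, Math.trunc(requested)));
  // Assumption: a legacy repo argument is appended as a repo: filter unless one is already present.
  const hasRepoFilter = /(^|\s)repo:\S+/.test(args.query);
  const query = args.repo && !hasRepoFilter ? `${args.query} repo:${args.repo}` : args.query;
  return { query, mode: args.mode, limit };
}

console.log(normalizeArgs({ query: "auth", repo: "portal", limit: 500 }));
```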