mcp-codebase-index
v0.4.0
MCP server for semantic code search using Tree-sitter, embeddings, and Qdrant vector database
MCP Codebase Index
A powerful Model Context Protocol (MCP) server that enables AI assistants to search and understand your codebase using semantic search. Find code using natural language queries like "authentication logic" or "database connection handling" instead of exact text matching.
Features
- 🔍 Semantic Code Search - Search using natural language, not just keywords
- 🌐 Multi-Language Support - TypeScript, JavaScript, Python, Java, Go, Rust, C/C++, C#, Ruby, PHP, and more
- 🎯 Smart Code Parsing - Understands functions, classes, and code structure using Tree-sitter
- 🔄 Real-time Updates - Automatically reindexes when files change or branches switch
- 🚀 Multiple Embedding Providers - Use Google Gemini (free), OpenAI, or local Ollama models
- ☁️ Flexible Storage - Works with Qdrant Cloud (free tier) or self-hosted instances
Installation
npm install -g mcp-codebase-index
Or use with npx (no installation required):
npx mcp-codebase-index
Quick Start
1. Set Up Qdrant Vector Database
Choose one option:
Option A: Qdrant Cloud (Recommended for beginners)
- Sign up for free at Qdrant Cloud
- Create a cluster (free tier available)
- Get your API URL and key
Option B: Local Docker
docker run -p 6333:6333 qdrant/qdrant
2. Get an Embedding Provider API Key
Choose one option:
Option A: Google Gemini (Free)
- Get a free API key at Google AI Studio
Option B: OpenAI
- Get an API key at OpenAI Platform
Option C: Ollama (Local, Free)
- Install Ollama
- Pull a model:
ollama pull nomic-embed-text
3. Configure Your IDE/Editor
Choose your IDE or editor and follow the setup instructions:
Add this to your Claude Desktop config file:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%/Claude/claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "codebase-index": {
      "command": "npx",
      "args": ["-y", "mcp-codebase-index"],
      "env": {
        "CODEBASE_PATH": "/absolute/path/to/your/repository",
        "EMBEDDING_PROVIDER": "gemini",
        "GEMINI_API_KEY": "your-api-key-here",
        "QDRANT_URL": "http://localhost:6333"
      }
    }
  }
}
For Qdrant Cloud:
{
  "mcpServers": {
    "codebase-index": {
      "command": "npx",
      "args": ["-y", "mcp-codebase-index"],
      "env": {
        "CODEBASE_PATH": "/absolute/path/to/your/repository",
        "EMBEDDING_PROVIDER": "gemini",
        "GEMINI_API_KEY": "your-api-key-here",
        "QDRANT_URL": "https://your-cluster.cloud.qdrant.io",
        "QDRANT_API_KEY": "your-qdrant-api-key"
      }
    }
  }
}
Restart Claude Desktop after saving the configuration.
- Open VS Code Settings (Cmd/Ctrl + ,)
- Search for "Claude Code: MCP Servers"
- Click "Edit in settings.json"
- Add this configuration:
{
  "claude-code.mcpServers": {
    "codebase-index": {
      "command": "npx",
      "args": ["-y", "mcp-codebase-index"],
      "env": {
        "CODEBASE_PATH": "/absolute/path/to/your/repository",
        "EMBEDDING_PROVIDER": "gemini",
        "GEMINI_API_KEY": "your-api-key-here",
        "QDRANT_URL": "http://localhost:6333"
      }
    }
  }
}
Reload VS Code after saving.
Cursor supports MCP servers through its AI settings:
- Open Cursor Settings (Cmd/Ctrl + ,)
- Navigate to "Features" → "AI"
- Scroll to "Model Context Protocol"
- Click "Edit Config" or locate the config file:
  - macOS: ~/Library/Application Support/Cursor/User/globalStorage/mcp.json
  - Windows: %APPDATA%/Cursor/User/globalStorage/mcp.json
  - Linux: ~/.config/Cursor/User/globalStorage/mcp.json
- Add this configuration:
{
  "mcpServers": {
    "codebase-index": {
      "command": "npx",
      "args": ["-y", "mcp-codebase-index"],
      "env": {
        "CODEBASE_PATH": "/absolute/path/to/your/repository",
        "EMBEDDING_PROVIDER": "gemini",
        "GEMINI_API_KEY": "your-api-key-here",
        "QDRANT_URL": "http://localhost:6333"
      }
    }
  }
}
Restart Cursor after saving.
If you're using the Continue extension:
- Open the Continue configuration file:
  - macOS/Linux: ~/.continue/config.json
  - Windows: %USERPROFILE%\.continue\config.json
- Add to the experimental.modelContextProtocolServers section:
{
  "experimental": {
    "modelContextProtocolServers": [
      {
        "name": "codebase-index",
        "command": "npx",
        "args": ["-y", "mcp-codebase-index"],
        "env": {
          "CODEBASE_PATH": "/absolute/path/to/your/repository",
          "EMBEDDING_PROVIDER": "gemini",
          "GEMINI_API_KEY": "your-api-key-here",
          "QDRANT_URL": "http://localhost:6333"
        }
      }
    ]
  }
}
Reload VS Code after saving.
Windsurf supports MCP servers natively:
- Open Windsurf Settings
- Navigate to MCP Settings or locate the config file:
  - macOS: ~/Library/Application Support/Windsurf/mcp_config.json
  - Windows: %APPDATA%/Windsurf/mcp_config.json
  - Linux: ~/.config/Windsurf/mcp_config.json
- Add this configuration:
{
  "mcpServers": {
    "codebase-index": {
      "command": "npx",
      "args": ["-y", "mcp-codebase-index"],
      "env": {
        "CODEBASE_PATH": "/absolute/path/to/your/repository",
        "EMBEDDING_PROVIDER": "gemini",
        "GEMINI_API_KEY": "your-api-key-here",
        "QDRANT_URL": "http://localhost:6333"
      }
    }
  }
}
Restart Windsurf after saving.
Zed has experimental MCP support:
- Open Zed Settings (Cmd/Ctrl + ,)
- Add to your settings.json:
{
  "experimental": {
    "mcp_servers": {
      "codebase-index": {
        "command": "npx",
        "args": ["-y", "mcp-codebase-index"],
        "env": {
          "CODEBASE_PATH": "/absolute/path/to/your/repository",
          "EMBEDDING_PROVIDER": "gemini",
          "GEMINI_API_KEY": "your-api-key-here",
          "QDRANT_URL": "http://localhost:6333"
        }
      }
    }
  }
}
Restart Zed after saving.
For any MCP-compatible client, use this standard configuration:
{
  "mcpServers": {
    "codebase-index": {
      "command": "npx",
      "args": ["-y", "mcp-codebase-index"],
      "env": {
        "CODEBASE_PATH": "/absolute/path/to/your/repository",
        "EMBEDDING_PROVIDER": "gemini",
        "GEMINI_API_KEY": "your-api-key-here",
        "QDRANT_URL": "http://localhost:6333"
      }
    }
  }
}
Consult your client's documentation for the exact config file location.
4. Restart Your IDE/Editor
After saving the configuration, restart your IDE/editor. The server will automatically start indexing your codebase on first run.
Usage
Once configured, you can ask your AI assistant to search your codebase:
Example queries:
- "Find the authentication middleware"
- "Show me database connection code"
- "Where is the user validation logic?"
- "Find API endpoint handlers"
- "Show me error handling utilities"
Advanced usage:
- "Search for 'rate limiting' in TypeScript files"
- "Find functions related to payment processing in the /src/api directory"
- "Show me recent changes (reindex first)"
Available Tools
The MCP server provides these tools to your AI assistant:
codebase_search
Search your codebase using natural language or code queries.
Parameters:
- query (required) - Your search query
- limit - Number of results (default: 10)
- threshold - Similarity threshold 0-1 (default: 0.7)
- fileTypes - Filter by extensions (e.g., [".ts", ".js"])
- paths - Filter by paths (e.g., ["src/api"])
- includeContext - Include surrounding code (default: true)
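As a rough illustration of how these parameters fit together, the sketch below types the arguments and wraps them in an MCP `tools/call` request. The parameter names and defaults come from the list above; `CodebaseSearchArgs` and `buildSearchRequest` are hypothetical names introduced here, and the range check on `threshold` is an assumption rather than documented server behavior.

```typescript
// Argument shape for codebase_search, taken from the parameter list above.
interface CodebaseSearchArgs {
  query: string;            // required
  limit?: number;           // default: 10
  threshold?: number;       // similarity threshold 0-1, default: 0.7
  fileTypes?: string[];     // e.g. [".ts", ".js"]
  paths?: string[];         // e.g. ["src/api"]
  includeContext?: boolean; // default: true
}

// Hypothetical helper: fills in the documented defaults and wraps the
// arguments in an MCP "tools/call" JSON-RPC request.
function buildSearchRequest(id: number, args: CodebaseSearchArgs) {
  const threshold = args.threshold ?? 0.7;
  if (threshold < 0 || threshold > 1) {
    // Assumption: out-of-range thresholds are rejected on the client side.
    throw new RangeError("threshold must be between 0 and 1");
  }
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: {
      name: "codebase_search",
      arguments: {
        limit: 10,
        includeContext: true,
        ...args,
        threshold,
      },
    },
  };
}

// Search for error handlers, restricted to one directory.
const request = buildSearchRequest(1, {
  query: "error handlers",
  paths: ["src/api"],
});
```

In normal use the AI assistant builds this request for you; the sketch is only meant to show which fields are optional and what they default to.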
indexing_status
Check indexing progress and statistics.
reindex
Re-index your codebase (useful after pulling changes).
Parameters:
- mode - 'full', 'incremental', or 'file'
- paths - Specific files to reindex (optional)
- force - Force reindex even if unchanged
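The reindex arguments can be typed in the same spirit. This is a minimal sketch: `ReindexArgs` and `checkReindexArgs` are hypothetical names, the README does not say whether `mode` is required (it is left optional here), and the rule that mode 'file' needs at least one path is an assumption made for illustration.

```typescript
// Modes listed in the parameter description above.
type ReindexMode = "full" | "incremental" | "file";

interface ReindexArgs {
  mode?: ReindexMode; // the README does not state whether this is required
  paths?: string[];   // specific files to reindex (optional)
  force?: boolean;    // force reindex even if unchanged
}

// Hypothetical client-side check. Requiring `paths` for mode "file" is an
// assumption; the server may behave differently.
function checkReindexArgs(args: ReindexArgs): string[] {
  const problems: string[] = [];
  if (args.mode === "file" && (args.paths === undefined || args.paths.length === 0)) {
    problems.push('mode "file" was given without any paths');
  }
  return problems;
}

// Typical argument sets: after a git pull, and for a single changed file.
const afterGitPull: ReindexArgs = { mode: "incremental" };
const singleFile: ReindexArgs = { mode: "file", paths: ["src/api/users.ts"] };
```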
configure_indexer
Update indexer settings at runtime.
validate_config
Test your configuration and connections.
clear_index
Clear all indexed data and start fresh.
Configuration Reference
Required Settings
| Variable | Description | Example |
|----------|-------------|---------|
| CODEBASE_PATH | Absolute path to your repository | /Users/you/projects/myapp |
| EMBEDDING_PROVIDER | Provider to use | gemini, openai, or ollama |
| QDRANT_URL | Qdrant instance URL | http://localhost:6333 |
Provider-Specific Settings
For Gemini:
GEMINI_API_KEY=your-api-key
For OpenAI:
OPENAI_API_KEY=your-api-key
OPENAI_MODEL=text-embedding-3-small # or text-embedding-3-large
For Ollama:
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=nomic-embed-text
Optional Settings
| Variable | Description | Default |
|----------|-------------|---------|
| COLLECTION_NAME | Qdrant collection name | codebase_index |
| INDEX_BATCH_SIZE | Files per batch | 50 |
| INDEX_CONCURRENCY | Parallel processing limit | 5 |
| LOG_LEVEL | Logging detail | info |
| QDRANT_API_KEY | For Qdrant Cloud | - |
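To make the required settings and defaults concrete, here is a minimal sketch of how an environment could be checked before starting the server. The variable names and default values come from the tables above; `loadConfig`, `IndexerConfig`, and the error messages are hypothetical, and treating the Gemini and OpenAI keys as mandatory for their providers is an assumption, not a description of what the server actually validates.

```typescript
type Env = Record<string, string | undefined>;
type Provider = "gemini" | "openai" | "ollama";

interface IndexerConfig {
  codebasePath: string;
  embeddingProvider: Provider;
  qdrantUrl: string;
  qdrantApiKey: string | undefined;
  collectionName: string;
  indexBatchSize: number;
  indexConcurrency: number;
  logLevel: string;
}

// Hypothetical loader: collects every missing setting before failing,
// then applies the defaults from the Optional Settings table.
function loadConfig(env: Env): IndexerConfig {
  const missing: string[] = [];
  const need = (name: string): string => {
    const value = env[name];
    if (value === undefined || value === "") missing.push(name);
    return value ?? "";
  };
  const intSetting = (name: string, fallback: number): number => {
    const raw = env[name];
    if (raw === undefined || raw === "") return fallback;
    const parsed = parseInt(raw, 10);
    if (isNaN(parsed) || parsed < 1) throw new Error(`${name} must be a positive integer`);
    return parsed;
  };

  const codebasePath = need("CODEBASE_PATH");
  const provider = need("EMBEDDING_PROVIDER");
  const qdrantUrl = need("QDRANT_URL");

  if (provider !== "" && provider !== "gemini" && provider !== "openai" && provider !== "ollama") {
    throw new Error(`EMBEDDING_PROVIDER must be gemini, openai, or ollama (got "${provider}")`);
  }
  // Assumption: hosted providers need their API key; Ollama settings are optional.
  if (provider === "gemini") need("GEMINI_API_KEY");
  if (provider === "openai") need("OPENAI_API_KEY");

  if (missing.length > 0) throw new Error(`Missing settings: ${missing.join(", ")}`);

  return {
    codebasePath,
    embeddingProvider: provider as Provider,
    qdrantUrl,
    qdrantApiKey: env["QDRANT_API_KEY"],
    collectionName: env["COLLECTION_NAME"] ?? "codebase_index",
    indexBatchSize: intSetting("INDEX_BATCH_SIZE", 50),
    indexConcurrency: intSetting("INDEX_CONCURRENCY", 5),
    logLevel: env["LOG_LEVEL"] ?? "info",
  };
}
```

Passing the `env` object from one of the IDE configurations above should produce a config with every optional field set to its default. The bundled validate_config tool remains the authoritative check, since it also tests the connections.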
Supported Languages
- TypeScript/JavaScript
- Python
- Java
- Go
- Rust
- C/C++
- C#
- Ruby
- PHP
- Markdown
Unsupported languages fall back to intelligent text chunking.
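The README does not describe how the text-chunking fallback works. One common approach, shown here purely as an assumption, is to cut a file into fixed-size windows of lines that overlap slightly so that context is not lost at chunk boundaries. `chunkByLines`, `TextChunk`, and the default sizes are hypothetical.

```typescript
interface TextChunk {
  startLine: number; // 1-based, inclusive
  endLine: number;   // 1-based, inclusive
  text: string;
}

// Hypothetical fallback chunker: fixed-size windows of lines, where each
// window repeats the last `overlap` lines of the previous one.
function chunkByLines(source: string, maxLines = 40, overlap = 5): TextChunk[] {
  if (maxLines < 1 || overlap < 0 || overlap >= maxLines) {
    throw new RangeError("need maxLines >= 1 and 0 <= overlap < maxLines");
  }
  const lines = source.split("\n");
  const chunks: TextChunk[] = [];
  const step = maxLines - overlap;
  for (let start = 0; start < lines.length; start += step) {
    const end = Math.min(start + maxLines, lines.length);
    chunks.push({
      startLine: start + 1,
      endLine: end,
      text: lines.slice(start, end).join("\n"),
    });
    if (end === lines.length) break; // the last window reached the end of the file
  }
  return chunks;
}
```

Tree-sitter-based parsing avoids arbitrary cuts like these by splitting at function and class boundaries, which is why supported languages tend to give more precise search results.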
Troubleshooting
"Cannot find module" errors
npm install -g mcp-codebase-index
Indexing is slow
Reduce INDEX_CONCURRENCY or increase INDEX_BATCH_SIZE in your configuration.
"Connection refused" to Qdrant
- Ensure Qdrant is running: run docker ps, or check Qdrant Cloud status
- Verify QDRANT_URL is correct
- For cloud: ensure QDRANT_API_KEY is set
No search results
- Check indexing status using the indexing_status tool
- Try reindexing with the reindex tool
- Lower the similarity threshold in your search
Rate limiting errors
If using a free API tier:
- Reduce INDEX_CONCURRENCY to 1-2
- Increase INDEX_BATCH_SIZE to reduce API calls
How It Works
- Code Parsing: Uses Tree-sitter to parse code into meaningful chunks (functions, classes, etc.)
- Embedding Generation: Converts code into vector embeddings using your chosen AI provider
- Vector Storage: Stores embeddings in Qdrant for fast similarity search
- Semantic Search: Finds relevant code by comparing query embeddings to stored code embeddings
- Real-time Updates: Watches for file changes and automatically reindexes
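The semantic search step can be illustrated with a brute-force version of what a vector database does. This is a sketch under stated assumptions: cosine similarity is assumed as the distance metric, the three-dimensional vectors are toy stand-ins for real embeddings, and `search`, `cosineSimilarity`, and the file names are hypothetical. In the real server Qdrant performs the similarity search, so nothing like this loop runs in your editor.

```typescript
interface IndexedChunk {
  file: string;
  embedding: number[];
}

// Cosine similarity: dot product divided by the product of vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors match nothing
  return dot / Math.sqrt(normA * normB);
}

// Score every chunk, drop those below the threshold, return the best `limit`.
function search(query: number[], index: IndexedChunk[], limit = 10, threshold = 0.7) {
  return index
    .map((chunk) => ({ file: chunk.file, score: cosineSimilarity(query, chunk.embedding) }))
    .filter((hit) => hit.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

// Toy 3-dimensional vectors standing in for real embeddings.
const toyIndex: IndexedChunk[] = [
  { file: "src/auth/middleware.ts", embedding: [0.9, 0.1, 0.0] },
  { file: "src/db/connection.ts", embedding: [0.1, 0.9, 0.1] },
  { file: "src/auth/session.ts", embedding: [0.7, 0.3, 0.1] },
];
const hits = search([1, 0, 0], toyIndex);
```

With the default threshold of 0.7, only the two auth files pass the filter. Lowering the threshold admits weaker matches, which is why the troubleshooting section suggests it when a search returns nothing.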
Performance Tips
- For large codebases (1000+ files): Use INDEX_BATCH_SIZE=100 and INDEX_CONCURRENCY=3
- For fast iteration: Use Ollama with local models (no API calls)
- For best quality: Use OpenAI's text-embedding-3-large model
- For free usage: Use Google Gemini (generous free tier)
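To see why a larger batch size means fewer batches, the sketch below splits a file list into groups of INDEX_BATCH_SIZE. It assumes one batch corresponds to one unit of work, which the README implies ("Files per batch") but does not spell out; `toBatches` is a hypothetical helper.

```typescript
// Hypothetical helper: splits a list into consecutive groups of `batchSize`.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// A made-up repository with 1200 files.
const files = Array.from({ length: 1200 }, (_unused, i) => `src/file${i}.ts`);

const withDefault = toBatches(files, 50).length;     // default INDEX_BATCH_SIZE
const withLargeBatch = toBatches(files, 100).length; // value from the tip above
```

For this made-up repository, the default setting yields 24 batches and the larger setting yields 12.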
Privacy & Security
- All code indexing happens locally or in your chosen infrastructure
- API keys are only used for embedding generation
- Code is sent to a third party only for embedding generation, and only when you use a hosted provider (Gemini or OpenAI)
- Self-hosted Ollama option keeps everything completely local
Examples
Example 1: Find Authentication Code
You: "Find all authentication middleware in the codebase"
AI Assistant: [Uses codebase_search tool with query "authentication middleware"]
Example 2: Reindex After Git Pull
You: "I just pulled new changes, please reindex"
AI Assistant: [Uses reindex tool with mode "incremental"]
Example 3: Search Specific Directory
You: "Search for error handlers in the src/api directory"
AI Assistant: [Uses codebase_search with query "error handlers" and paths ["src/api"]]
Contributing
Contributions are welcome! Please visit our GitHub repository to:
- Report issues
- Submit pull requests
- Request features
- Ask questions
License
MIT License - see LICENSE file for details
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Acknowledgments
Built with:
- Model Context Protocol by Anthropic
- Tree-sitter for code parsing
- Qdrant for vector storage
