@knath2000/codebase-indexing-mcp

v1.0.0

MCP server for codebase indexing with Voyage AI embeddings and Qdrant vector storage

MCP Codebase Indexing Server

A Model Context Protocol (MCP) server that provides intelligent codebase indexing and semantic search capabilities for AI assistants like Cursor. This server uses Voyage AI for embeddings and Qdrant for vector storage to enable powerful semantic code search across your entire codebase.

✨ Features

  • 🧠 Intelligent Code Parsing: Uses tree-sitter to parse code into meaningful chunks (functions, classes, modules, etc.)
  • 🔍 Semantic Search: Leverages Voyage AI embeddings for semantic code search beyond keyword matching
  • 📊 Vector Storage: Uses Qdrant for efficient vector storage and lightning-fast similarity search
  • 🌐 Multiple Language Support: Supports JavaScript, TypeScript, Python, and more
  • Incremental Indexing: Tracks file changes and only re-indexes when necessary
  • 🎯 Flexible Search: Search by language, chunk type, file path, or semantic similarity
  • 🔗 Context-Aware: Provides code context and related chunks for better understanding
  • 🚀 MCP Compatible: Works seamlessly with Cursor and other MCP-compatible AI assistants
  • 🛠️ 12 Powerful Tools: Complete set of indexing and search tools for comprehensive codebase management
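The incremental indexing feature can be illustrated with a small change tracker. This is only a sketch under assumptions: the README does not say how the server detects changes, so the content-hash approach and the names `contentHash` and `ChangeTracker` are hypothetical.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units.
// A stand-in for whatever change signal the real server uses (assumption).
function contentHash(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

class ChangeTracker {
  private readonly seen = new Map<string, string>();

  // Returns true when the file is new or its content differs from the last
  // recorded version, and records the new hash as a side effect.
  needsReindex(filePath: string, content: string): boolean {
    const hash = contentHash(content);
    if (this.seen.get(filePath) === hash) return false;
    this.seen.set(filePath, hash);
    return true;
  }

  // Drop a file from tracking, for example after it is removed from the index.
  forget(filePath: string): void {
    this.seen.delete(filePath);
  }
}

const tracker = new ChangeTracker();
```

Calling `tracker.needsReindex(path, content)` before embedding a file lets unchanged files skip the embedding API call entirely.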

🏗️ Architecture

graph TB
    subgraph "AI Assistant"
        A[Cursor/Claude]
    end
    
    subgraph "MCP Server"
        B[HTTP Server<br/>Custom SSE + JSON-RPC]
        C[IndexingService]
        D[SearchService]
        E[Code Parser<br/>Tree-sitter]
    end
    
    subgraph "External Services"
        F[Voyage AI<br/>Embeddings]
        G[Qdrant<br/>Vector DB]
    end
    
    A <--> B
    B --> C
    B --> D
    C --> E
    C --> F
    C --> G
    D --> F
    D --> G
    
    style A fill:#e1f5fe
    style B fill:#f3e5f5
    style F fill:#fff3e0
    style G fill:#e8f5e8

The server consists of several key components:

  1. Code Parser: Tree-sitter based parser that extracts semantic chunks from code
  2. Voyage Client: Handles embedding generation via Voyage AI API
  3. Qdrant Client: Manages vector storage and similarity search
  4. Indexing Service: Orchestrates the indexing process
  5. Search Service: Provides semantic search capabilities
  6. MCP Server: Exposes tools via the Model Context Protocol
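
As a rough sketch of how these components might fit together during indexing, the TypeScript below wires a parser, an embedder, and a vector store behind small interfaces. All names here (`Chunk`, `Parser`, `Embedder`, `VectorStore`, `indexSource`) are hypothetical and not taken from the package; the in-memory fakes stand in for Tree-sitter, Voyage AI, and Qdrant.

```typescript
// Hypothetical shapes; the package's real types may differ.
interface Chunk {
  id: string;
  filePath: string;
  content: string;
}

interface Parser {
  parse(filePath: string, source: string): Chunk[];
}

interface Embedder {
  embed(texts: string[]): Promise<number[][]>;
}

interface StoredPoint {
  id: string;
  vector: number[];
  payload: Chunk;
}

interface VectorStore {
  upsert(points: StoredPoint[]): Promise<void>;
}

// Parse a file into chunks, embed the chunk text, and store one vector per chunk.
async function indexSource(
  filePath: string,
  source: string,
  parser: Parser,
  embedder: Embedder,
  store: VectorStore
): Promise<number> {
  const chunks = parser.parse(filePath, source);
  if (chunks.length === 0) return 0;

  const vectors = await embedder.embed(chunks.map((c) => c.content));
  if (vectors.length !== chunks.length) {
    throw new Error("embedder returned a different number of vectors than chunks");
  }

  await store.upsert(
    chunks.map((chunk, i) => ({ id: chunk.id, vector: vectors[i], payload: chunk }))
  );
  return chunks.length;
}

// In-memory stand-ins so the sketch runs without external services.
const fakeParser: Parser = {
  parse: (filePath, source) =>
    source
      .split("\n\n")
      .filter((block) => block.trim().length > 0)
      .map((content, i) => ({ id: `${filePath}#${i}`, filePath, content })),
};

const fakeEmbedder: Embedder = {
  embed: async (texts) => texts.map((t) => [t.length, 1]),
};

const stored: StoredPoint[] = [];
const fakeStore: VectorStore = {
  upsert: async (points) => {
    stored.push(...points);
  },
};
```

Keeping each external service behind an interface like this makes it possible to exercise the orchestration logic without network access.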

📦 Installation

NPM Package (Recommended)

# Install globally
npm install -g @knath2000/codebase-indexing-mcp

# Or run directly with npx
npx @knath2000/codebase-indexing-mcp

Docker

# Pull and run
docker run -p 3001:3001 ghcr.io/your-org/mcp-codebase-indexing-server:latest

From Source

git clone <repository-url>
cd mcp-codebase-indexing-server
npm install
npm run build
npm start

🚀 Quick Start

Prerequisites

  • Node.js 18+
  • Voyage AI API key (available from https://www.voyageai.com/)
  • Qdrant instance (local or cloud)
  • AI assistant that supports MCP (like Cursor)

5-Minute Setup

  1. Get your services ready:

    # Start local Qdrant
    docker run -d -p 6333:6333 --name qdrant qdrant/qdrant

    # Get Voyage AI API key from https://www.voyageai.com/
  2. Deploy the server:

    git clone <repository-url>
    cd mcp-codebase-indexing-server
    npm install && npm run build
    VOYAGE_API_KEY=your_key_here npm start
  3. Connect to Cursor:

    • Add MCP server in Cursor settings
    • Use server URL: http://localhost:3001
    • You should see a green circle with 12 tools available
  4. Test it out:

    • Index your codebase: "Index the current directory"
    • Search your code: "Find authentication functions in TypeScript"

Prerequisites (Detailed)

  • Node.js 18+
  • Voyage AI API key
  • Qdrant instance (local or cloud)

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd mcp-codebase-indexing-server
  2. Install dependencies:

    npm install
  3. Build the server:

    npm run build

🎯 Cursor Integration Guide

Setting Up MCP Server in Cursor

  1. Open Cursor Settings:

    • Go to Settings → Features → Model Context Protocol
  2. Add MCP Server:

    {
      "name": "codebase-indexing",
      "command": "node",
      "args": ["path/to/your/mcp-codebase-indexing-server/dist/index.js"],
      "env": {
        "VOYAGE_API_KEY": "your_voyage_api_key_here",
        "QDRANT_URL": "http://localhost:6333"
      }
    }
  3. Verify Connection:

    • Look for green circle indicator in Cursor
    • Should show "12 tools" when connected
    • If red circle: check logs and troubleshooting section

Using with Cursor

Indexing Your Codebase

"Index the current directory for semantic search"
"Index the src/ folder in my project"
"Re-index the modified files in my codebase"

Searching Your Code

"Find authentication functions in TypeScript"
"Search for error handling patterns"
"Look for database query functions"
"Find classes that handle user data"
"Show me similar functions to the one I'm looking at"

Getting Code Context

"Get context around the login function"
"Show me similar code to this authentication logic"
"Find related functions in this file"

Troubleshooting Cursor Connection

| Issue | Solution |
|-------|----------|
| Red circle (0 tools) | Check VOYAGE_API_KEY is set correctly |
| "No server info found" | Restart Cursor completely |
| Connection timeout | Ensure Qdrant is running on correct port |
| Tools not responding | Check server logs for errors |

⚙️ Customization Guide

For Different Project Types

Large Enterprise Codebases

# Handle large codebases efficiently
BATCH_SIZE=50
MAX_FILE_SIZE=2097152
CHUNK_SIZE=1000            # Upper end of the documented 100-1000 range
EXCLUDE_PATTERNS=node_modules,dist,build,.git,coverage,logs

AI/ML Projects

# Optimize for Python-heavy codebases
SUPPORTED_EXTENSIONS=.py,.ipynb,.md,.yaml,.yml
EMBEDDING_MODEL=voyage-code-2
CHUNK_SIZE=1000            # Upper end of the documented 100-1000 range

Frontend Projects

# Focus on web technologies
SUPPORTED_EXTENSIONS=.js,.jsx,.ts,.tsx,.vue,.svelte,.css,.scss
EXCLUDE_PATTERNS=node_modules,dist,build,.next,coverage
CHUNK_SIZE=800

Microservices Architecture

# Index multiple service repositories
COLLECTION_NAME=microservices-org
BATCH_SIZE=100
# Consider separate instances per service

Advanced Configuration Options

Performance Tuning

# Memory optimization
BATCH_SIZE=25              # Smaller batches for memory-constrained environments
CHUNK_OVERLAP=100          # Reduce overlap to save storage
MAX_FILE_SIZE=1048576      # Limit file size (1MB default)

# Speed optimization  
BATCH_SIZE=200             # Larger batches for faster processing
EMBEDDING_MODEL=voyage-code-2  # Optimized model for code

Custom File Filtering

# Include only specific file types
SUPPORTED_EXTENSIONS=.py,.js,.ts,.go,.rs

# Exclude testing and generated files
EXCLUDE_PATTERNS=*test*,*spec*,generated,vendor,node_modules

# Include documentation
SUPPORTED_EXTENSIONS=.md,.rst,.txt,.py,.js,.ts
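
To make the effect of these two variables concrete, here is a minimal sketch of how comma-separated extensions and exclude patterns could be applied to a file path. It assumes that each pattern is a simple `*` wildcard matched against individual path segments; the server's actual matching rules are not documented here and may differ. The helper names `parseList`, `wildcardToRegExp`, and `shouldIndex` are hypothetical.

```typescript
// Split a comma-separated environment value into trimmed, non-empty entries.
function parseList(value: string): string[] {
  return value
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Turn a simple wildcard pattern such as "*test*" into an anchored RegExp.
function wildcardToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`^${escaped.replace(/\*/g, ".*")}$`);
}

// A file is indexed when its extension is supported and no path segment
// matches an exclude pattern (assumed semantics).
function shouldIndex(path: string, extensions: string[], excludePatterns: string[]): boolean {
  const segments = path.split("/");
  const fileName = segments[segments.length - 1];
  if (!extensions.some((ext) => fileName.endsWith(ext))) return false;

  const matchers = excludePatterns.map(wildcardToRegExp);
  return !segments.some((segment) => matchers.some((re) => re.test(segment)));
}

const extensions = parseList(".py,.js,.ts,.go,.rs");
const excludes = parseList("*test*,*spec*,generated,vendor,node_modules");
```

With these settings, `shouldIndex("src/auth/login.ts", extensions, excludes)` is `true`, while `src/auth/login.test.ts` and anything under `node_modules` are skipped.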

Multi-Environment Setup

# Development
COLLECTION_NAME=dev-codebase
QDRANT_URL=http://localhost:6333

# Staging  
COLLECTION_NAME=staging-codebase
QDRANT_URL=https://staging-qdrant.company.com

# Production
COLLECTION_NAME=prod-codebase
QDRANT_URL=https://qdrant.company.com
QDRANT_API_KEY=prod_api_key

🔒 Privacy & Security

Your Code Stays Private

The MCP server is designed with privacy as a core principle:

Small Code Chunks Only

  • Chunk Size: Only small code segments (100-1000 characters) are sent for embedding
  • Default: 800 characters maximum per chunk (configurable)
  • Enforcement: Automatic truncation of larger chunks with logging
  • No Full Files: Complete files are never sent to external services
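
The size limits above could be enforced along these lines. This is a sketch under assumptions: the README states the 100-1000 range, the 800-character default, and truncation with logging, but not whether out-of-range values are clamped or rejected. Clamping is assumed here, and the function names are hypothetical.

```typescript
const MIN_CHUNK_SIZE = 100;
const MAX_CHUNK_SIZE = 1000;
const DEFAULT_CHUNK_SIZE = 800;

// Resolve CHUNK_SIZE from its raw string value, clamping to the documented range.
function resolveChunkSize(raw: string | undefined): number {
  const parsed = raw === undefined ? NaN : Number.parseInt(raw, 10);
  if (Number.isNaN(parsed)) return DEFAULT_CHUNK_SIZE;
  return Math.min(MAX_CHUNK_SIZE, Math.max(MIN_CHUNK_SIZE, parsed));
}

// Truncate an oversized chunk before it is sent for embedding, logging the event.
function enforceChunkSize(
  content: string,
  maxChars: number,
  log: (message: string) => void = console.warn
): string {
  if (content.length <= maxChars) return content;
  log(`Truncating chunk from ${content.length} to ${maxChars} characters`);
  return content.slice(0, maxChars);
}
```

Under this clamping assumption, a setting such as `CHUNK_SIZE=1500` would behave the same as `CHUNK_SIZE=1000`.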

One-Way Mathematical Representations

  • Embeddings: Code chunks are converted to mathematical vectors (embeddings)
  • Hard to Reverse: Embeddings are not designed to be converted back into the original code
  • Semantic Only: Vectors capture meaning, not exact text
  • No Full-File Transfer: Only the small chunks described above are sent to Voyage AI for embedding; complete files stay in your environment

Local Processing

  • Parsing: All code parsing happens locally using Tree-sitter
  • Chunking: Code segmentation occurs on your machine
  • Storage: Only vector embeddings stored in your Qdrant instance
  • Search: Semantic search runs on your infrastructure

Network Security

  • HTTPS: All external API calls use TLS encryption
  • API Keys: Securely stored in environment variables
  • No Logging: Code content is never logged to external services
  • Minimal Data: Only code chunks (to Voyage AI) and the resulting vectors (to your Qdrant instance) are transmitted

Privacy Configuration

# Privacy-optimized settings
CHUNK_SIZE=800                    # Max 800 chars per chunk (100-1000 range)
CHUNK_OVERLAP=100                 # Reduced overlap for privacy
MAX_FILE_SIZE=1048576             # 1MB file size limit
EXCLUDE_PATTERNS=*.git*,node_modules/**,dist/**  # Skip sensitive directories

📝 Configuration

The server is configured via environment variables:

Required Environment Variables

  • VOYAGE_API_KEY: Your Voyage AI API key

Optional Environment Variables

  • QDRANT_URL: Qdrant server URL (default: http://localhost:6333)
  • QDRANT_API_KEY: Qdrant API key (if using cloud instance)
  • COLLECTION_NAME: Name of the Qdrant collection (default: codebase)
  • EMBEDDING_MODEL: Voyage AI model to use (default: voyage-code-3)
  • BATCH_SIZE: Batch size for embedding generation (default: 100)
  • CHUNK_SIZE: Maximum chunk size in characters (default: 800, range: 100-1000)
  • CHUNK_OVERLAP: Overlap between chunks (default: 100)
  • MAX_FILE_SIZE: Maximum file size to index in bytes (default: 1048576)
  • EXCLUDE_PATTERNS: Comma-separated patterns to exclude (default: see config)
  • SUPPORTED_EXTENSIONS: Comma-separated file extensions to support (default: see config)
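
A minimal sketch of how these variables and their documented defaults could be read into a typed configuration object follows. The `ServerConfig` shape and the `loadConfig` and `intFromEnv` helpers are hypothetical and are not the package's actual code; only the variable names and default values come from the list above.

```typescript
interface ServerConfig {
  voyageApiKey: string;
  qdrantUrl: string;
  qdrantApiKey?: string;
  collectionName: string;
  embeddingModel: string;
  batchSize: number;
  chunkSize: number;
  chunkOverlap: number;
  maxFileSize: number;
}

type Env = Record<string, string | undefined>;

// Read a non-negative integer from the environment, or use the fallback when unset.
function intFromEnv(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  if (!Number.isInteger(parsed) || parsed < 0) {
    throw new Error(`${key} must be a non-negative integer, got "${raw}"`);
  }
  return parsed;
}

function loadConfig(env: Env): ServerConfig {
  const voyageApiKey = env["VOYAGE_API_KEY"];
  if (!voyageApiKey) throw new Error("VOYAGE_API_KEY is required");

  return {
    voyageApiKey,
    qdrantUrl: env["QDRANT_URL"] ?? "http://localhost:6333",
    qdrantApiKey: env["QDRANT_API_KEY"],
    collectionName: env["COLLECTION_NAME"] ?? "codebase",
    embeddingModel: env["EMBEDDING_MODEL"] ?? "voyage-code-3",
    batchSize: intFromEnv(env, "BATCH_SIZE", 100),
    chunkSize: intFromEnv(env, "CHUNK_SIZE", 800),
    chunkOverlap: intFromEnv(env, "CHUNK_OVERLAP", 100),
    maxFileSize: intFromEnv(env, "MAX_FILE_SIZE", 1048576),
  };
}
```

In a Node.js process this would typically be called once at startup as `loadConfig(process.env)`.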

Example Configuration

Create a .env file in the project root:

VOYAGE_API_KEY=your_voyage_api_key_here
QDRANT_URL=http://localhost:6333
COLLECTION_NAME=my_codebase
EMBEDDING_MODEL=voyage-code-2
BATCH_SIZE=50
MAX_FILE_SIZE=2097152

Usage

Running the Server

npm start

Or in development mode:

npm run dev

Setting up Qdrant

Local Qdrant (Docker)

docker run -p 6333:6333 qdrant/qdrant

Qdrant Cloud

Sign up at Qdrant Cloud and get your API key and URL.

🛠️ MCP Tools

The server provides the following 17 tools:

  • index_directory: Index all files in a directory recursively
  • index_file: Index a single file
  • reindex_file: Re-index a file (force update)
  • remove_file: Remove a file from the index
  • clear_index: Clear the entire search index
  • codebase_search: 🌟 Natural language search for codebase understanding (e.g., "How is user authentication handled?", "Database connection setup", "Error handling patterns")
  • search_code: Search for code chunks using semantic similarity
  • search_functions: Search for functions by name or description
  • search_classes: Search for classes by name or description
  • find_similar: Find code chunks similar to a given chunk
  • get_code_context: Get code context around a specific chunk
  • get_indexing_stats: Get statistics about the indexed codebase
  • get_search_stats: Get statistics about the search index
  • get_enhanced_stats: Get enhanced statistics including cache and hybrid search metrics
  • get_health_status: Get comprehensive health status of all services
  • clear_search_cache: Clear search cache for fresh results
  • invalidate_file_cache: Invalidate cache for a specific file

Example Usage

  1. Index a directory:

    {
      "tool": "index_directory",
      "arguments": {
        "directory_path": "/path/to/your/codebase"
      }
    }
  2. 🌟 Natural language codebase search:

    {
      "tool": "codebase_search",
      "arguments": {
        "query": "How is user authentication handled?",
        "limit": 5,
        "enable_hybrid": true,
        "enable_reranking": true
      }
    }
  3. Search for authentication functions:

    {
      "tool": "search_functions",
      "arguments": {
        "query": "authentication login user",
        "language": "typescript",
        "limit": 5
      }
    }
  4. Search for error handling patterns:

    {
      "tool": "search_code",
      "arguments": {
        "query": "error handling exception try catch",
        "chunk_type": "function",
        "threshold": 0.7
      }
    }
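
The MCP specification frames tool invocations as JSON-RPC 2.0 requests with the method `tools/call`, carrying the tool name and its arguments in `params`. Assuming the server follows that framing on its MCP endpoint, a client-side request for one of the examples above could be built as sketched here. The `buildToolCall` helper is hypothetical.

```typescript
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

let nextRequestId = 1;

// Wrap a tool name and its arguments in a JSON-RPC 2.0 "tools/call" envelope.
function buildToolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const searchRequest = buildToolCall("search_functions", {
  query: "authentication login user",
  language: "typescript",
  limit: 5,
});

// The serialized form is what would travel over the transport.
const wireBody = JSON.stringify(searchRequest);
```

An MCP client such as Cursor normally builds this envelope itself, so the sketch is mainly useful when scripting or debugging the server directly.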

🌟 Natural Language Search Examples

The codebase_search tool understands natural language queries and provides:

  • Relevant code snippets with syntax highlighting
  • File paths with line numbers for direct navigation
  • Similarity scores as percentages
  • Clickable navigation links to jump to specific locations
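
For intuition about the similarity scores, the sketch below computes cosine similarity between two vectors and formats it as a percentage. Cosine is an assumption: the README does not state which distance metric the Qdrant collection uses, and the function names are hypothetical.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / Math.sqrt(normA * normB);
}

// Format a 0..1 score the way the sample output shows it, e.g. "94.2%".
function asPercent(score: number): string {
  return `${(score * 100).toFixed(1)}%`;
}
```

Vectors pointing in the same direction score 1 (shown as 100.0%), and orthogonal vectors score 0.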

Example queries that work great:

  • "How is user authentication handled?"
  • "Database connection setup"
  • "Error handling patterns"
  • "API endpoint definitions"
  • "Component state management"
  • "Configuration loading"
  • "Logging implementation"

Sample output format:

# 🔍 Natural Language Codebase Search

**Query:** "How is user authentication handled?"

## 📊 Search Results
- **Found:** 8 relevant code references
- **Search Time:** 45ms
- **Hybrid Search:** ✅ (Dense + Sparse)
- **LLM Re-ranked:** ✅ (Relevance optimized)

## 📝 Code References with Navigation Links

### 1. [📂 src/auth/auth-service.ts:15](file://src/auth/auth-service.ts#L15)
**Lines 15-28** | **function** | **typescript** | **Similarity: 94.2%**

```typescript
async authenticateUser(token: string): Promise<User | null> {
  try {
    const decoded = jwt.verify(token, this.secretKey);
    return await this.userRepository.findById(decoded.userId);
  } catch (error) {
    logger.error('Authentication failed:', error);
    return null;
  }
}
```

🌐 Supported Languages

| Language | File Extensions | Status |
|----------|----------------|--------|
| **JavaScript** | `.js`, `.jsx` | ✅ Full Support |
| **TypeScript** | `.ts`, `.tsx` | ✅ Full Support |
| **Python** | `.py` | ✅ Full Support |
| **Go** | `.go` | 🔄 Coming Soon |
| **Rust** | `.rs` | 🔄 Coming Soon |
| **Java** | `.java` | 🔄 Coming Soon |

💡 Extensible: Additional languages can be added by installing the corresponding tree-sitter grammars and updating the configuration.

API Reference

Indexing Service

The IndexingService class provides:

```typescript
// Initialize the service
await indexingService.initialize();

// Index a directory
const stats = await indexingService.indexDirectory('/path/to/code');

// Index a single file
const chunks = await indexingService.indexFile('/path/to/file.ts');

// Remove a file from index
await indexingService.removeFile('/path/to/file.ts');

// Clear entire index
await indexingService.clearIndex();
```

Search Service

The SearchService class provides:

// Initialize the service
await searchService.initialize();

// Basic search
const results = await searchService.search({
  query: 'authentication',
  language: 'typescript',
  limit: 10
});

// Search functions
const functions = await searchService.searchFunctions('login', 'typescript');

// Find similar chunks
const similar = await searchService.findSimilar('chunk_id', 5);

// Get code context
const context = await searchService.getCodeContext('chunk_id', 5);
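
Options such as `language` presumably end up as a filter on the vector search. As a sketch, the code below translates search options into a Qdrant filter of the form `{ must: [{ key, match: { value } }] }`. The payload field names (`language`, `chunk_type`) and the `buildFilter` helper are assumptions, since the README does not describe how chunks are stored.

```typescript
interface SearchOptions {
  query: string;
  language?: string;
  chunkType?: string;
  limit?: number;
}

interface FieldCondition {
  key: string;
  match: { value: string };
}

interface QdrantFilter {
  must: FieldCondition[];
}

// Build a filter that requires every provided option to match exactly.
// Returns undefined when there is nothing to filter on.
function buildFilter(options: SearchOptions): QdrantFilter | undefined {
  const must: FieldCondition[] = [];
  if (options.language) {
    must.push({ key: "language", match: { value: options.language } });
  }
  if (options.chunkType) {
    must.push({ key: "chunk_type", match: { value: options.chunkType } });
  }
  return must.length > 0 ? { must } : undefined;
}

const typescriptOnly = buildFilter({ query: "authentication", language: "typescript", limit: 10 });
```

The query text itself is not part of the filter; it would be embedded and used as the search vector, with the filter narrowing which stored points are compared.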

Performance Considerations

  • Batch Processing: The server processes files in batches to avoid memory issues
  • Incremental Updates: Only re-indexes files that have changed
  • Embedding Caching: Consider caching embeddings to reduce API calls
  • Vector Storage: Qdrant provides efficient vector storage and retrieval
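
Two of these points, batch processing and embedding caching, can be sketched in a few lines. The helpers `toBatches` and `splitCached` are hypothetical, and the cache is keyed by the chunk text itself for simplicity; a real cache would more likely key on a content hash.

```typescript
// Split items into consecutive batches of at most batchSize elements.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Separate texts that already have a cached embedding from those that
// still need an API call.
function splitCached(
  texts: string[],
  cache: Map<string, number[]>
): { cached: string[]; missing: string[] } {
  const cached: string[] = [];
  const missing: string[] = [];
  for (const text of texts) {
    (cache.has(text) ? cached : missing).push(text);
  }
  return { cached, missing };
}
```

Only the `missing` texts would be passed through `toBatches` with `BATCH_SIZE` and sent to the embedding API, which reduces both memory pressure and the number of API calls.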

🔧 Troubleshooting

Common Issues

MCP Connection Issues

| Problem | Symptoms | Solution |
|---------|----------|----------|
| Server won't start | Error: EADDRINUSE | Port 3001 already in use. Change PORT env var or kill existing process |
| Connection timeout | Cursor shows "connecting..." forever | Check VOYAGE_API_KEY is valid and Qdrant is running |
| Red circle in Cursor | 0 tools shown | Restart Cursor completely, verify server is running |
| "Not connected" error | Tools fail with connection error | Server restarted automatically, wait 30 seconds |

Service Connection Issues

  1. Connection to Qdrant fails:

    # Check if Qdrant is running
    curl http://localhost:6333/collections
       
    # Start Qdrant if not running
    docker run -d -p 6333:6333 --name qdrant qdrant/qdrant
       
    # Check that something is listening on port 6333
    netstat -tulpn | grep 6333
  2. Voyage AI API errors:

    # Test API key
    curl -H "Authorization: Bearer YOUR_API_KEY" \
         -H "Content-Type: application/json" \
         -d '{"input": ["test"], "model": "voyage-code-2"}' \
         https://api.voyageai.com/v1/embeddings
       
    # Check quota at https://www.voyageai.com/dashboard

Performance Issues

  1. Out of memory during indexing:

    # Reduce memory usage
    BATCH_SIZE=25
    MAX_FILE_SIZE=524288
    CHUNK_SIZE=500
       
    # Exclude large directories
    EXCLUDE_PATTERNS=node_modules,dist,build,.git,logs,coverage,vendor
  2. Slow indexing performance:

    # Optimize for speed
    BATCH_SIZE=100
    CHUNK_OVERLAP=100
       
    # Use faster embedding model if available
    EMBEDDING_MODEL=voyage-code-2

Code Parsing Issues

  1. Tree-sitter parsing errors:
    • Error: Language not supported
      • Solution: Add tree-sitter grammar for your language
    • Error: Failed to parse file
      • Solution: Check file encoding (must be UTF-8)
    • Error: File too large
      • Solution: Increase MAX_FILE_SIZE or exclude the file

Diagnostic Commands

Check Server Health

# Test server is running
curl http://localhost:3001/health

# Test MCP endpoint
curl http://localhost:3001/sse

# Check server logs
npm start 2>&1 | tee server.log

Check Services

# Test Qdrant
curl http://localhost:6333/collections

# Test Voyage AI
curl -H "Authorization: Bearer $VOYAGE_API_KEY" \
     -H "Content-Type: application/json" \
     https://api.voyageai.com/v1/embeddings \
     -d '{"input":["test"],"model":"voyage-code-2"}'

Debug Indexing

# Enable debug mode
DEBUG=1 npm start

# Test specific directory
curl -X POST http://localhost:3001/tools/call \
  -H "Content-Type: application/json" \
  -d '{"tool":"index_directory","arguments":{"directory_path":"./test"}}'

Debug Mode

Enable comprehensive logging:

# Full debug output
DEBUG=1 npm start

# Service-specific debugging
DEBUG=indexing npm start
DEBUG=search npm start
DEBUG=mcp npm start

Log Analysis

Look for these patterns in logs:

| Log Pattern | Meaning | Action |
|-------------|---------|--------|
| Error: VOYAGE_API_KEY is required | Missing API key | Set VOYAGE_API_KEY environment variable |
| Failed to connect to Qdrant | Vector DB unavailable | Check Qdrant is running and accessible |
| Rate limit exceeded | API quota reached | Wait or upgrade Voyage AI plan |
| Memory usage warning | High memory usage | Reduce BATCH_SIZE or exclude more files |
| Lazy initialization completed | Services ready | Normal startup, server ready for requests |
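
When a captured log such as `server.log` is long, the patterns in this table can be checked mechanically. The sketch below encodes the table as data and returns the suggested actions for a log. The `LogRule`, `diagnose`, and `suggestActions` names are hypothetical, and the exact log wording is assumed to match the table.

```typescript
interface LogRule {
  pattern: RegExp;
  meaning: string;
  action: string;
}

// One rule per row of the log analysis table.
const LOG_RULES: LogRule[] = [
  { pattern: /VOYAGE_API_KEY is required/, meaning: "Missing API key", action: "Set VOYAGE_API_KEY environment variable" },
  { pattern: /Failed to connect to Qdrant/, meaning: "Vector DB unavailable", action: "Check Qdrant is running and accessible" },
  { pattern: /Rate limit exceeded/, meaning: "API quota reached", action: "Wait or upgrade Voyage AI plan" },
  { pattern: /Memory usage warning/, meaning: "High memory usage", action: "Reduce BATCH_SIZE or exclude more files" },
  { pattern: /Lazy initialization completed/, meaning: "Services ready", action: "Normal startup, server ready for requests" },
];

// Find the first rule that matches a single log line.
function diagnose(line: string): LogRule | undefined {
  return LOG_RULES.find((rule) => rule.pattern.test(line));
}

// Collect the distinct suggested actions for a whole log, in order of appearance.
function suggestActions(log: string): string[] {
  const actions: string[] = [];
  for (const line of log.split("\n")) {
    const rule = diagnose(line);
    if (rule && actions.indexOf(rule.action) === -1) actions.push(rule.action);
  }
  return actions;
}
```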

Getting Help

  1. Check server logs for specific error messages
  2. Test each service individually using diagnostic commands
  3. Verify environment variables are set correctly
  4. Restart services in order: Qdrant → MCP Server → Cursor
  5. Create minimal reproduction with a small test directory

If issues persist, create a GitHub issue with:

  • Complete error logs
  • Environment configuration (without API keys)
  • Steps to reproduce
  • System information (OS, Node.js version, etc.)

Development

Project Structure

src/
├── types.ts              # Type definitions
├── index.ts              # Main MCP server
├── clients/
│   ├── voyage-client.ts  # Voyage AI client
│   └── qdrant-client.ts  # Qdrant client
├── parsers/
│   └── code-parser.ts    # Tree-sitter based parser
└── services/
    ├── indexing-service.ts # Indexing orchestration
    └── search-service.ts   # Search functionality

Adding New Languages

  1. Install the tree-sitter grammar:

    npm install tree-sitter-rust
  2. Update the loadLanguage function in code-parser.ts
  3. Add language configuration in initializeLanguageConfigs
  4. Update the file extension mapping
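
The last step, updating the file extension mapping, might look roughly like the sketch below. The real structure inside `code-parser.ts` is not shown in this README, so the `extensionToLanguage` map and the `registerLanguage` and `languageForFile` helpers are hypothetical, and the grammar loading itself is left out.

```typescript
// Extensions for the languages listed as fully supported.
const extensionToLanguage = new Map<string, string>([
  [".js", "javascript"],
  [".jsx", "javascript"],
  [".ts", "typescript"],
  [".tsx", "typescript"],
  [".py", "python"],
]);

// Register the extensions that belong to a newly added language.
function registerLanguage(language: string, extensions: string[]): void {
  for (const ext of extensions) {
    extensionToLanguage.set(ext.toLowerCase(), language);
  }
}

// Resolve a file path to a language name, or undefined when unsupported.
function languageForFile(filePath: string): string | undefined {
  const dot = filePath.lastIndexOf(".");
  if (dot === -1) return undefined;
  return extensionToLanguage.get(filePath.slice(dot).toLowerCase());
}

// Example: after installing tree-sitter-rust, map its extension.
registerLanguage("rust", [".rs"]);
```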

Testing

Run tests with:

npm test

Linting

Check code style with:

npm run lint

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

License

MIT License

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Development Setup

# Clone the repository
git clone <repository-url>
cd mcp-codebase-indexing-server

# Install dependencies
npm install

# Build the project
npm run build

# Run in development mode
npm run dev

# Run tests
npm test

Adding New Languages

  1. Install the tree-sitter grammar:

    npm install tree-sitter-rust
  2. Update the loadLanguage function in src/parsers/code-parser.ts

  3. Add language configuration in initializeLanguageConfigs

  4. Update the file extension mapping

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📈 Changelog

v1.0.0 - Production Release

  • ✅ Complete MCP protocol implementation with 12 tools
  • ✅ Lazy initialization to prevent connection timeouts
  • ✅ Custom SSE implementation for Cursor compatibility
  • ✅ Support for JavaScript, TypeScript, Python
  • ✅ Voyage AI integration for semantic embeddings
  • ✅ Qdrant integration for vector storage
  • ✅ Incremental indexing with file change tracking
  • ✅ Automated Fly.io deployment with GitHub Actions
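
The lazy initialization mentioned in this release can be pictured as deferring the slow service setup until the first tool call, so the MCP handshake completes quickly. The sketch below shows one common way to do this with a memoized promise. It is an assumption about the general technique, not the server's actual code, and `lazy` is a hypothetical helper.

```typescript
// Wrap an async factory so it runs at most once, on first use.
// Concurrent callers share the same in-flight promise.
function lazy<T>(factory: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      pending = factory().catch((error) => {
        pending = undefined; // allow a retry after a failed initialization
        throw error;
      });
    }
    return pending;
  };
}

// Example: setup is deferred until getServices() is first called.
let initCount = 0;
const getServices = lazy(async () => {
  initCount++;
  return { ready: true };
});
```

Each tool handler would then start with `await getServices()`, paying the setup cost only once.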
