mongodocs-mcp

v11.0.0

Transform any GitHub repository into searchable vector embeddings. MCP server with smart indexing, voyage-context-3 embeddings, and semantic search for Claude/Cursor IDEs.

mongodocs-mcp

A Model Context Protocol (MCP) server that transforms any GitHub repository into searchable vector embeddings, enabling semantic search across codebases and documentation through IDE integration.

Architecture

The system implements a three-phase indexing pipeline with smart change detection:

Repository → Git Clone → Smart Chunking → Vector Embeddings → MongoDB Atlas
                ↓              ↓                ↓                    ↓
           Hash Tracking   Semantic Split   voyage-context-3    Vector Search

Core Components

  • Indexer (src/core/indexer.ts): Git-based change detection using commit hashes
  • Semantic Chunker (src/core/semantic-chunker.ts): Multi-strategy content splitting
  • Embedding Service (src/core/embeddings.ts): Voyage AI integration with batching
  • Storage Service (src/core/storage.ts): MongoDB Atlas vector operations
  • Search Service (src/core/search.ts): Vector, hybrid RRF, and MMR algorithms
  • MCP Server (src/index.ts): Protocol implementation for IDE integration

Installation

Global Package

npm install -g mongodocs-mcp

From Source

git clone https://github.com/yourusername/mongodocs-mcp.git
cd mongodocs-mcp
npm install
npm run build
npm link

Setup

1. MongoDB Atlas

Create a free M0 cluster at cloud.mongodb.com:

# Database structure
Database: mongodb_semantic_docs
Collection: documents

# Connection string format
mongodb+srv://username:<password>@<cluster-host>/?retryWrites=true&w=majority

Network Access Configuration:

  • Navigate to Network Access → Add IP Address
  • Add 0.0.0.0/0 for development (restrict in production)

Vector Search Index Creation:

  1. Go to Atlas Search → Create Index
  2. Select "JSON Editor"
  3. Paste configuration:
{
  "mappings": {
    "dynamic": true,
    "fields": {
      "embedding": {
        "type": "knnVector",
        "dimensions": 1024,
        "similarity": "cosine"
      }
    }
  }
}

Name: vector_index

2. Voyage AI

Get an API key from voyageai.com:

  • Model: voyage-context-3
  • Dimensions: 1024
  • Context window: 32,000 tokens
  • Rate limit: 2000 RPM

3. Environment Configuration

Create a .env file:

# Required
MONGODB_URI=mongodb+srv://username:<password>@<cluster-host>/?retryWrites=true&w=majority
VOYAGE_API_KEY=pa-your-api-key

# Optional
GITHUB_TOKEN=ghp_your_token  # For private repos
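
Failing fast on a missing variable is usually easier to debug than a connection error later. The following is a minimal sketch of such a check; `loadEnvConfig` and `EnvConfig` are hypothetical names for illustration, not part of the package's documented API.

```typescript
// Hypothetical helper (not from mongodocs-mcp): validates the variables
// listed in the .env example above.
interface EnvConfig {
  mongodbUri: string;
  voyageApiKey: string;
  githubToken: string | undefined; // optional, only needed for private repos
}

function loadEnvConfig(env: Record<string, string | undefined>): EnvConfig {
  const required = ["MONGODB_URI", "VOYAGE_API_KEY"];
  const missing = required.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return {
    mongodbUri: env["MONGODB_URI"] as string,
    voyageApiKey: env["VOYAGE_API_KEY"] as string,
    githubToken: env["GITHUB_TOKEN"] || undefined,
  };
}
```

In Node.js this would typically be called as `loadEnvConfig(process.env)` after the .env file has been loaded.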

Usage

Web Interface

# Start web UI
npm run web

# Opens http://localhost:3000
# 4-step wizard:
# 1. Configure APIs
# 2. Select repositories
# 3. Review MCP setup
# 4. Start processing

Command Line

# Index repositories (smart mode - only changed files)
npm run index

# Force complete rebuild
npm run rebuild

# Monitor indexing progress
npm run progress

# Database statistics
npm run stats

# Clean database
npm run clean

Programmatic API

import { Indexer } from 'mongodocs-mcp';

const config = {
  repositories: [{
    name: 'My Documentation',
    repo: 'owner/repository',
    branch: 'main',
    product: 'custom-my-docs'
  }],
  embedding: {
    model: 'voyage-context-3',
    dimensions: 1024,
    chunkSize: 1000,
    chunkOverlap: 200
  }
};

const indexer = new Indexer(config);
indexer.onProgress((progress) => {
  console.log(`${progress.phase}: ${progress.current}/${progress.total}`);
});
await indexer.index();

MCP Integration

Claude Desktop

File (macOS): ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "mongodocs": {
      "command": "npx",
      "args": ["mongodocs-mcp"],
      "env": {
        "MONGODB_URI": "your-connection-string",
        "VOYAGE_API_KEY": "your-api-key"
      }
    }
  }
}

Cursor IDE

File: .cursor/mcp.json

{
  "mcpServers": {
    "mongodocs": {
      "command": "npx",
      "args": ["mongodocs-mcp"],
      "env": {
        "MONGODB_URI": "your-connection-string",
        "VOYAGE_API_KEY": "your-api-key"
      }
    }
  }
}

Restart the IDE after configuration.

Search Methods

1. Hybrid RRF Search (Primary)

Reciprocal Rank Fusion combining vector and keyword search:

// Weight configuration
vectorWeight: 0.7
keywordWeight: 0.3

// Ranking formula
score = Σ weight / (k + rank), where k = 60   // summed over the vector and keyword rankings
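
As an illustration of the fusion step, here is a minimal sketch of weighted RRF over two ranked lists of document ids. It assumes each list contributes `weight / (k + rank)` with ranks starting at 1; `fuseRrf` is a hypothetical name, and the package's actual implementation in src/core/search.ts may differ.

```typescript
// Weighted Reciprocal Rank Fusion (sketch, not the package's code).
// Assumption: rank is 1-based and each list's contribution is scaled by its weight.
function fuseRrf(
  vectorRanked: string[],
  keywordRanked: string[],
  vectorWeight: number = 0.7,
  keywordWeight: number = 0.3,
  k: number = 60
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  const addList = (ids: string[], weight: number): void => {
    ids.forEach((id, index) => {
      const contribution = weight / (k + index + 1);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  };
  addList(vectorRanked, vectorWeight);
  addList(keywordRanked, keywordWeight);
  // Highest fused score first.
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((x, y) => y.score - x.score);
}
```

A document that appears in both lists accumulates two contributions, which is why a result ranked second by vector search and first by keyword search can outrank the top vector-only hit.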

2. MMR Search (Diversity)

Maximum Marginal Relevance for result diversity:

// Parameters
fetchK: 20        // Initial candidates
lambdaMult: 0.7   // Relevance vs diversity
limit: 5          // Final results

// Algorithm
MMR = λ * Sim(Di, Q) - (1-λ) * max Sim(Di, Dj)   // Dj: results already selected; λ = lambdaMult
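
The greedy selection loop behind this formula can be sketched as follows. This is an illustrative version that assumes the `fetchK` candidates have already been retrieved and carry their embeddings; `mmrSelect`, `cosineSim`, and `Candidate` are hypothetical names, not the package's API.

```typescript
interface Candidate {
  id: string;
  embedding: number[];
}

// Cosine similarity of two vectors (0 when either vector has zero length).
function cosineSim(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
  const norm = (v: number[]): number => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Greedy MMR: repeatedly pick the candidate with the best trade-off between
// relevance to the query and similarity to what has already been picked.
function mmrSelect(
  query: number[],
  candidates: Candidate[],
  lambdaMult: number = 0.7,
  limit: number = 5
): Candidate[] {
  const selected: Candidate[] = [];
  const remaining = candidates.slice();
  while (selected.length < limit && remaining.length > 0) {
    let bestIndex = 0;
    let bestScore = -Infinity;
    remaining.forEach((candidate, index) => {
      const relevance = cosineSim(candidate.embedding, query);
      const redundancy =
        selected.length === 0
          ? 0
          : Math.max(...selected.map((s) => cosineSim(candidate.embedding, s.embedding)));
      const score = lambdaMult * relevance - (1 - lambdaMult) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestIndex = index;
      }
    });
    selected.push(...remaining.splice(bestIndex, 1));
  }
  return selected;
}
```

With `lambdaMult = 1` the loop reduces to plain relevance ranking; lower values penalize near-duplicates of results that were already chosen.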

3. Pure Vector Search

Cosine similarity search:

// Configuration
numCandidates: 40  // 7.5x fewer candidates than the default 300 (faster, may lower recall)
limit: 10

Technical Implementation

Semantic Chunking

Three-strategy approach with statistical analysis:

1. Interquartile Method

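// Cosine distance between each pair of adjacent sentence embeddings
embeddings = sentences.map(embed)
distances = embeddings.slice(1).map((e, i) => cosineDistance(embeddings[i], e))
// Break where a distance exceeds the upper outlier fence
[Q1, Q3] = quartiles(distances)
threshold = Q3 + 1.5 * (Q3 - Q1)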
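
A runnable sketch of the thresholding step, assuming the adjacent-sentence distances have already been computed. The quartiles use linear interpolation, which is one common convention and an assumption here; `quantile` and `iqrBreakpoints` are hypothetical helper names.

```typescript
// Linearly interpolated quantile of an ascending-sorted array (q in [0, 1]).
function quantile(sorted: number[], q: number): number {
  if (sorted.length === 0) return NaN;
  const pos = (sorted.length - 1) * q;
  const lower = Math.floor(pos);
  const upper = Math.ceil(pos);
  const lowValue = sorted[lower] ?? 0;
  const highValue = sorted[upper] ?? 0;
  return lowValue + (highValue - lowValue) * (pos - lower);
}

// Returns the indices i where distances[i] exceeds Q3 + 1.5 * IQR.
// distances[i] is the distance between sentence i and sentence i + 1,
// so each returned index marks a break after sentence i.
function iqrBreakpoints(distances: number[]): number[] {
  const sorted = distances.slice().sort((x, y) => x - y);
  const q1 = quantile(sorted, 0.25);
  const q3 = quantile(sorted, 0.75);
  const threshold = q3 + 1.5 * (q3 - q1);
  const breakpoints: number[] = [];
  distances.forEach((d, i) => {
    if (d > threshold) breakpoints.push(i);
  });
  return breakpoints;
}
```

For the distances `[0.1, 0.12, 0.11, 0.9, 0.1, 0.13, 0.12]`, Q1 is 0.105 and Q3 is 0.125, so the threshold is 0.155 and only the gap at index 3 is flagged.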

2. Gradient Method

// Identify semantic transitions from the change between consecutive distances
gradients = distances.slice(1).map((d, i) => d - distances[i])
breakpoints = gradients.flatMap((g, i) => g > threshold ? [i + 1] : [])

3. Hybrid Scoring

score = 0.6 * interquartile + 0.4 * gradient
// Adaptive to content type
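
The README does not say how the two signals are scaled before they are combined, so the following is only one possible reading. It min-max normalizes both signals to [0, 1], and it uses the normalized distance itself as a stand-in for the interquartile signal; both choices, and all function names, are assumptions of this sketch.

```typescript
// First difference of the distance series: how sharply the distance rises at each gap.
function distanceGradients(distances: number[]): number[] {
  return distances.slice(1).map((d, i) => d - (distances[i] ?? 0));
}

// Scale values into [0, 1]; a constant or empty series maps to all zeros.
function minMaxNormalize(values: number[]): number[] {
  const min = Math.min(...values);
  const max = Math.max(...values);
  if (values.length === 0 || max === min) return values.map(() => 0);
  return values.map((v) => (v - min) / (max - min));
}

// Per-gap score: 0.6 * distance signal + 0.4 * gradient signal.
function hybridScores(
  distances: number[],
  distanceWeight: number = 0.6,
  gradientWeight: number = 0.4
): number[] {
  const distanceSignal = minMaxNormalize(distances);
  // Pad with a leading 0 so gradient i lines up with distance i; keep only rises.
  const rises = [0, ...distanceGradients(distances)].map((g) => Math.max(g, 0));
  const gradientSignal = minMaxNormalize(rises);
  return distanceSignal.map(
    (s, i) => distanceWeight * s + gradientWeight * (gradientSignal[i] ?? 0)
  );
}
```

The gap with the highest score is the strongest candidate for a chunk boundary.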

Chunk Optimization

const CHUNK_CONFIG = {
  base: 1000,      // Target size
  min: 100,        // Prevent empty
  max: 2500,       // Respect limits
  overlap: 200,    // Context preservation
  
  // Token validation
  maxTokens: 6000,  // voyage-context-3 safety
  tokenizer: 'cl100k_base'
};
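
To show how `base` and `overlap` interact, here is a minimal sliding-window splitter. It measures sizes in characters, which is an assumption: the README does not state whether CHUNK_CONFIG counts characters or tokens. `splitWithOverlap` is a hypothetical name, and the sketch ignores the semantic breakpoints described above.

```typescript
// Sliding-window splitter (sketch). Each chunk starts `size - overlap`
// characters after the previous one, so neighbours share `overlap` characters.
function splitWithOverlap(text: string, size: number = 1000, overlap: number = 200): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("size must be positive and overlap must satisfy 0 <= overlap < size");
  }
  const step = size - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    // Stop once a chunk has reached the end of the text.
    if (start + size >= text.length) break;
  }
  return chunks;
}
```

With the defaults, a 2,500-character input yields three chunks starting at offsets 0, 800, and 1600.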

Smart Indexing

Repository state tracking:

// Check existing hash
const existingHash = await storage.getRepositoryHash(repo.name);
const currentHash = await git.getLatestCommit();

if (existingHash === currentHash) {
  console.log('✅ Repository up to date, skipping...');
  return;
}

// Process only changed files
const changedFiles = await git.diff(existingHash, currentHash);
await processFiles(changedFiles);
await storage.updateRepositoryHash(repo.name, currentHash);

Error Handling

// Exponential backoff with jitter
const delay = Math.min(1000 * Math.pow(2, attempt), 30000);
const jitter = Math.random() * 1000;
await sleep(delay + jitter);

// Token limit handling
if (error.message.includes('32000 tokens')) {
  // Split chunk and retry
  const subChunks = emergencySplit(chunk);
  return processSubChunks(subChunks);
}
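
The backoff fragment above can be wrapped into a reusable retry helper. This is a sketch under stated assumptions: `withRetry` and `backoffDelay` are hypothetical names, the maximum of 5 attempts is illustrative, and the random source and sleep function are injectable only to make the behaviour easy to test.

```typescript
// Delay for a given attempt: exponential growth capped at 30 s, plus up to 1 s of jitter.
function backoffDelay(attempt: number, random: () => number = Math.random): number {
  const base = Math.min(1000 * Math.pow(2, attempt), 30000);
  return base + random() * 1000;
}

// Runs `operation`, retrying on failure with backoff; rethrows the last error.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts: number = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => {
      setTimeout(resolve, ms);
    })
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // No need to wait after the final failed attempt.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelay(attempt));
      }
    }
  }
  throw lastError;
}
```

The jitter spreads out retries from concurrent workers so they do not all hit the API again at the same instant.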

Performance Characteristics

Indexing Metrics

  • Processing rate: 100-150 docs/hour (Voyage API limited)
  • Batch size: 32 documents optimal
  • Memory usage: <500MB peak
  • Network bandwidth: ~10MB/hour
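
Grouping chunks into batches of 32 before calling the embedding API can be sketched with a small helper. `toBatches` is a hypothetical name; the package's own batching in src/core/embeddings.ts may also account for token limits, which this sketch does not.

```typescript
// Split an array into consecutive batches of at most `batchSize` items.
function toBatches<T>(items: T[], batchSize: number = 32): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```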

Search Performance

  • Latency: <100ms p99
  • Throughput: 1000+ QPS
  • Index size: ~1.5KB per chunk
  • Cache TTL: 5 minutes
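
A 5-minute cache for repeated queries can be as small as a map with expiry timestamps. The sketch below is illustrative only: `TtlCache` is a hypothetical class, and the clock is injectable so that expiry can be tested without waiting.

```typescript
// Minimal in-memory cache whose entries expire after `ttlMs` milliseconds.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number = 5 * 60 * 1000, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      // Expired: drop the entry so the map does not grow without bound.
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```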

Storage Efficiency

// Document structure (avg ~1.5KB of text fields, excluding the embedding)
{
  _id: ObjectId,
  title: string,           // 50 bytes
  content: string,         // 1000 bytes
  embedding: float[1024],  // ~4KB as 32-bit floats (1024 × 4 bytes)
  metadata: {              // 200 bytes
    file: string,
    repo: string,
    product: string,
    indexedAt: Date
  }
}
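
The per-field sizes in the comments can be added up to get a rough per-document estimate. The sketch assumes 4 bytes per embedding value (32-bit floats) and ignores the `_id` and BSON overhead; an array of 64-bit doubles would need 8 bytes per value instead. `estimateDocumentBytes` is a hypothetical helper.

```typescript
interface SizeEstimate {
  textBytes: number;
  embeddingBytes: number;
  totalBytes: number;
}

// Rough size estimate from the field sizes quoted in the comments above.
function estimateDocumentBytes(dimensions: number = 1024, bytesPerValue: number = 4): SizeEstimate {
  const textBytes = 50 + 1000 + 200; // title + content + metadata
  const embeddingBytes = dimensions * bytesPerValue;
  return { textBytes, embeddingBytes, totalBytes: textBytes + embeddingBytes };
}
```

With the defaults this gives 1,250 bytes of text fields and 4,096 bytes for the embedding, so the vector accounts for most of each document.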

Repository Configuration

Default Repositories

const repositories = [
  {
    name: 'MongoDB Documentation',
    repo: 'mongodb/docs',
    branch: 'master',
    product: 'mongodb-docs',
    priority: 10
  },
  // Add custom repositories...
];

Custom Repository

{
  name: 'Your Documentation',
  repo: 'owner/repository',
  branch: 'main',
  product: 'custom-your-docs',
  
  // Optional filters
  include: ['docs/**/*.md'],
  exclude: ['**/node_modules/**'],
  
  // Processing options
  chunkSize: 1500,
  chunkOverlap: 300
}
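
How `include` and `exclude` might combine is sketched below: a file is processed when it matches at least one include pattern and no exclude pattern. The glob conversion is deliberately minimal (it supports only `**/`, `**`, and `*`), and which glob library the package really uses is not stated, so treat `globToRegExp` and `isIncluded` as hypothetical.

```typescript
// Convert a simple glob to a RegExp. Supported: `**/` (any directories,
// including none), `**` (anything), `*` (anything except `/`).
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  let i = 0;
  while (i < glob.length) {
    if (glob.startsWith("**/", i)) {
      pattern += "(?:.*/)?";
      i += 3;
    } else if (glob.startsWith("**", i)) {
      pattern += ".*";
      i += 2;
    } else if (glob.charAt(i) === "*") {
      pattern += "[^/]*";
      i += 1;
    } else {
      // Escape characters that are special in regular expressions.
      pattern += glob.charAt(i).replace(/[.+?^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp(`^${pattern}$`);
}

// A path is kept when it matches any include glob and no exclude glob.
function isIncluded(path: string, include: string[], exclude: string[]): boolean {
  const matches = (globs: string[]): boolean => globs.some((g) => globToRegExp(g).test(path));
  return matches(include) && !matches(exclude);
}
```

For example, with the filters shown above, `docs/guide/intro.md` is kept, while `docs/node_modules/pkg/readme.md` is dropped by the exclude pattern.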

Development

Build Pipeline

# Development with watch
npm run dev

# Production build
npm run build

# Type checking
npm run typecheck

# Linting
npm run lint

# Testing
npm test

Project Structure

src/
├── core/
│   ├── indexer.ts           # Orchestration
│   ├── semantic-chunker.ts  # Content splitting
│   ├── embeddings.ts        # Vector generation
│   ├── storage.ts           # Database operations
│   └── search.ts            # Query algorithms
├── config/
│   └── index.ts             # Repository definitions
├── web/
│   ├── server.ts            # Express server
│   ├── coordinator.ts       # Web orchestration
│   └── templates/           # HTML interfaces
└── index.ts                 # MCP server

dist/                        # Compiled output
.repos/                      # Cloned repositories

Key Dependencies

{
  "mongodb": "^6.10.0",           // Native driver
  "voyageai": "^0.0.1-5",         // Embeddings
  "@modelcontextprotocol/sdk": "^1.0.0",  // MCP
  "js-tiktoken": "^1.0.15",       // Tokenization
  "simple-git": "^3.27.0"         // Repository ops
}

Troubleshooting

Connection Issues

# Test MongoDB connection (closes the client so the process can exit)
node -e "
  const { MongoClient } = require('mongodb');
  MongoClient.connect(process.env.MONGODB_URI)
    .then(client => { console.log('✅ Connected'); return client.close(); })
    .catch(err => console.error('❌', err.message));
"

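# Test Voyage AI (voyage-context-3 is served by the contextualized embeddings endpoint)
curl -X POST https://api.voyageai.com/v1/contextualizedembeddings \
  -H "Authorization: Bearer $VOYAGE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"inputs": [["test"]], "input_type": "document", "model": "voyage-context-3"}'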

Index Issues

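# Verify vector index (the URI names no default database, so select it explicitly)
mongosh "$MONGODB_URI" --eval "
  db.getSiblingDB('mongodb_semantic_docs').documents.getSearchIndexes()
"

# Check document structure
mongosh "$MONGODB_URI" --eval "
  db.getSiblingDB('mongodb_semantic_docs').documents.findOne()
"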

Performance Tuning

// Adjust for your use case
const tuning = {
  // Smaller batches for memory constraints
  batchSize: 16,
  
  // More candidates for precision
  numCandidates: 100,
  
  // Larger chunks for context
  chunkSize: 2000,
  
  // Disable to skip change detection and reprocess every file
  smartIndexing: false
};

Best Practices

Security

  • Store credentials in environment variables
  • Use least-privilege MongoDB user
  • Rotate API keys regularly
  • Enable MongoDB audit logging

Optimization

  • Index during off-peak hours
  • Use incremental updates
  • Monitor token usage
  • Cache frequent queries

Scaling

  • Horizontal sharding for large corpora
  • Read replicas for search traffic
  • CDN for static assets
  • Queue system for processing

Contributing

Pull requests welcome. Please ensure:

  • TypeScript strict mode compliance
  • Test coverage >80%
  • Conventional commits
  • Documentation updates

License

MIT

Built with MongoDB Atlas vector search and Voyage AI embeddings.