@devpuccino/mcp-git-codebase

v1.0.1

MCP server providing semantic code search and indexing for git repositories

An MCP (Model Context Protocol) server that provides semantic code search and intelligent indexing for git repositories. Enables AI-powered semantic search across codebases using vector embeddings to find relevant code snippets by intent, not just keywords.

Features

🧠 Semantic Search - Find code by meaning, not just keywords
🔍 Multi-Language Support - TypeScript, JavaScript, Python, Go, Java, Rust, and more
📊 Multiple Vector Databases - Qdrant, Pinecone, Chroma, Milvus, PostgreSQL with pgvector
🚀 Scalable Indexing - Handle repositories with 1M+ files and 100GB+ of code
⚙️ Background Processing - Queue indexing jobs via Redis/Bull
🌿 Branch-Aware - Search across specific branches or track changes over time
🎯 Precise Code Retrieval - Get exact code snippets with line-level precision

Installation

Prerequisites

  • Node.js ≥ 18.0.0
  • Git (for repository operations)
  • One of the supported vector databases (Qdrant, Pinecone, Chroma, Milvus, or PostgreSQL)

Install Package

npm install @devpuccino/mcp-git-codebase

Quick Start

1. Configure Vector Database

Set your preferred vector database and its connection details:

# Qdrant (recommended for local development)
export VECTOR_DB_PROVIDER=qdrant
export QDRANT_URL=http://localhost:6333

# Or Pinecone
export VECTOR_DB_PROVIDER=pinecone
export PINECONE_API_KEY=your-api-key
export PINECONE_ENVIRONMENT=your-environment
export PINECONE_INDEX=your-index

# Or PostgreSQL with pgvector
export VECTOR_DB_PROVIDER=postgres
export DATABASE_URL=postgresql://user:password@localhost:5432/codebase

# Or Chroma
export VECTOR_DB_PROVIDER=chroma
export CHROMA_URL=http://localhost
export CHROMA_PORT=8000

# Or Milvus
export VECTOR_DB_PROVIDER=milvus
export MILVUS_HOST=localhost
export MILVUS_PORT=19530

2. Configure Embedding Model

# Ollama (default, local)
export EMBEDDING_PROVIDER=ollama
export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_EMBEDDING_MODEL=bge-base-en-v1.5

# Or OpenAI (cloud)
export EMBEDDING_PROVIDER=openai
export OPENAI_API_KEY=sk-...
export OPENAI_EMBEDDING_MODEL=text-embedding-3-small

3. Use with Claude Code

Add to your Claude Code configuration (settings.json or settings.local.json):

Minimal Configuration (Qdrant + Ollama):

{
  "mcpServers": {
    "mcp-git-codebase": {
      "command": "npx",
      "args": ["@devpuccino/mcp-git-codebase"],
      "env": {
        "VECTOR_DB_PROVIDER": "qdrant",
        "QDRANT_URL": "http://localhost:6333",
        "EMBEDDING_PROVIDER": "ollama",
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_EMBEDDING_MODEL": "nomic-embed-text"
      }
    }
  }
}

Full Configuration Example:

{
  "mcpServers": {
    "git-codebase": {
      "command": "npx",
      "args": ["--legacy-peer-deps", "@devpuccino/mcp-git-codebase"],
      "env": {
        "VECTOR_DB_PROVIDER": "qdrant",
        "QDRANT_URL": "http://your-qdrant-host:6333",
        "QDRANT_API_KEY": "your-api-key-if-needed",
        "VECTOR_DB_COLLECTION_PREFIX": "codebase_",
        
        "EMBEDDING_PROVIDER": "ollama",
        "OLLAMA_BASE_URL": "http://your-ollama-host:11434",
        "OLLAMA_EMBEDDING_MODEL": "bge-base-en-v1.5",
        "EMBEDDING_TIMEOUT": "30000",
        
        "LLM_PROVIDER": "ollama",
        "OLLAMA_MODEL": "qwen2.5-coder:7b",
        "OLLAMA_TIMEOUT": "30000",
        "OLLAMA_MAX_RETRIES": "3",
        "INDEXING_LLM_ENABLED": "true",
        
        "REDIS_HOST": "your-redis-host",
        "REDIS_PORT": "6379",
        "REDIS_PASSWORD": "your-redis-password",
        "REDIS_DB": "0",
        
        "ENABLE_RERANKING": "true",
        "RERANKER_TYPE": "bm25",
        
        "CONSUMER_CONCURRENCY": "2",
        "STARTUP_BATCH_ENABLED": "true",
        "STARTUP_BATCH_LIMIT": "50",
        
        "LOG_LEVEL": "info"
      }
    }
  }
}

Production Configuration (Pinecone + OpenAI):

{
  "mcpServers": {
    "mcp-git-codebase": {
      "command": "npx",
      "args": ["@devpuccino/mcp-git-codebase"],
      "env": {
        "VECTOR_DB_PROVIDER": "pinecone",
        "PINECONE_API_KEY": "your-pinecone-api-key",
        "PINECONE_ENVIRONMENT": "us-east-1",
        "PINECONE_INDEX": "your-index-name",
        
        "EMBEDDING_PROVIDER": "openai",
        "OPENAI_API_KEY": "your-openai-api-key",
        "OPENAI_EMBEDDING_MODEL": "text-embedding-3-small",
        
        "LLM_PROVIDER": "openai",
        "OPENAI_LLM_MODEL": "gpt-4o-mini",
        
        "LOG_LEVEL": "warn"
      }
    }
  }
}

Tools

query_codebase

Perform semantic search across a git repository to find relevant code snippets by meaning.

Parameters:

  • query_sentence (required): Natural language search query or code snippet
  • project_path (required): Root directory of the git repository
  • branch (optional): Specific branch to search (default: current branch)
  • limit (optional): Max results to return, 1-20 (default: 5)
  • similarity_threshold (optional): Minimum similarity score, 0-1 (default: 0.6)
  • file_extensions (optional): Filter by file extensions (e.g., [".ts", ".tsx"])

Example:

{
  "query_sentence": "function to authenticate users with JWT tokens",
  "project_path": "/workspace/myapp",
  "limit": 5,
  "file_extensions": [".ts", ".tsx"]
}

get_code_snippet

Retrieve a specific code snippet from a file with line-level precision.

Parameters:

  • project_path (required): Root directory of the git repository
  • filepath (required): Relative path to the file
  • start_line (optional): Starting line number (1-indexed)
  • end_line (optional): Ending line number
  • include_line_numbers (optional): Show line numbers (default: true)

Example:

{
  "project_path": "/workspace/myapp",
  "filepath": "src/auth/index.ts",
  "start_line": 10,
  "end_line": 45,
  "include_line_numbers": true
}

sync_codebase

Index or re-index a git repository into the vector database.

Parameters:

  • project_path (required): Root directory of the git repository
  • branch (optional): Branch to sync (default: current branch)
  • file_extensions (optional): Only sync specific file types
  • background (optional): Queue as background job (default: false)
  • force (optional): Force full re-index from scratch (default: false)

Example:

{
  "project_path": "/workspace/myapp",
  "force": false,
  "background": true
}

update_codebase

Trigger indexing after code changes, optionally creating a git commit.

Parameters:

  • project_path (required): Root directory of the git repository
  • commit_message (required): Message summarizing changes
  • changed_files (required): Array of changed files with change type
  • trigger_type (required): One of manual, post_generation, post_merge
  • skip_git_commit (optional): Skip git commit (default: false)
  • background (optional): Queue as background job (default: false)

Example:

{
  "project_path": "/workspace/myapp",
  "commit_message": "Update authentication module",
  "changed_files": [
    { "path": "src/auth/index.ts", "change_type": "modified" },
    { "path": "src/auth/jwt.ts", "change_type": "added" }
  ],
  "trigger_type": "manual",
  "background": false
}

Environment Variables

General Vector Database Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| VECTOR_DB_PROVIDER | qdrant | Vector database type: qdrant, pinecone, chroma, milvus, postgres |
| EMBEDDING_DIMENSION | 1536 | Dimension of embedding vectors (auto-detected from model, rarely needed) |
| VECTOR_DB_COLLECTION_PREFIX | - | Optional prefix for collection names (useful for multi-tenant setups) |

Qdrant Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| QDRANT_URL | http://localhost:6333 | Qdrant server URL |
| QDRANT_API_KEY | - | Qdrant API key (for cloud/managed instances) |
| QDRANT_COLLECTION | code_snippets | Collection name for storing embeddings |

Pinecone Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| PINECONE_API_KEY | - | Pinecone API key (required) |
| PINECONE_ENVIRONMENT | - | Pinecone environment/region (required) |
| PINECONE_INDEX | code-snippets | Pinecone index name |

Chroma Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| CHROMA_URL | http://localhost | Chroma server URL |
| CHROMA_PORT | 8000 | Chroma server port |
| CHROMA_COLLECTION | code_snippets | Collection name for storing embeddings |

Milvus Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| MILVUS_HOST | localhost | Milvus server host |
| MILVUS_PORT | 19530 | Milvus server port |
| MILVUS_COLLECTION | code_snippets | Collection name for storing embeddings |

PostgreSQL (pgvector) Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| DATABASE_URL | - | PostgreSQL connection string (required) |
| POSTGRES_VECTOR_TABLE | code_snippets_vectors | Table name for storing vectors |
| POSTGRES_EMBEDDING_COLUMN | embedding | Column name for embedding vectors |

Embedding Model Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| EMBEDDING_PROVIDER | ollama | Embedding provider: openai, ollama |
| EMBEDDING_DIMENSION | 1536 | Dimension of embedding vectors (auto-detected from model if not set) |
| EMBEDDING_TIMEOUT | 30000 | Timeout for embedding API requests (milliseconds) |
| EMBEDDING_BATCH_SIZE | 10 | Number of items to embed per batch |
| EMBEDDING_MAX_RETRIES | 3 | Maximum retry attempts for failed embedding requests |

OpenAI Embedding Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| OPENAI_API_KEY | - | OpenAI API key (required for OpenAI provider) |
| OPENAI_EMBEDDING_MODEL | text-embedding-3-small | OpenAI embedding model to use |
| OPENAI_BASE_URL | https://api.openai.com | OpenAI API base URL (for custom endpoints) |

Ollama Embedding Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| OLLAMA_BASE_URL | http://localhost:11434 | Ollama server URL |
| OLLAMA_EMBEDDING_MODEL | bge-base-en-v1.5 | Ollama embedding model to use |

Common Ollama embedding models:

  • bge-base-en-v1.5 (768 dimensions) - default, good balance
  • bge-large-en-v1.5 (1024 dimensions) - higher quality
  • nomic-embed-text (768 dimensions) - fast and efficient
  • mxbai-embed-large (1024 dimensions) - high quality
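If you run Ollama locally, the models above can be fetched ahead of time with `ollama pull` so the first indexing run does not stall on a download. A minimal sketch, assuming the Ollama CLI is installed and that the model name is available under the same tag in your Ollama registry:

```shell
# Fetch the default embedding model listed above
ollama pull bge-base-en-v1.5

# Confirm it is available to the server
ollama list
```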

LLM Provider Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| LLM_PROVIDER | ollama | LLM provider for code analysis: openai, ollama |
| LLM_TIMEOUT | 8000 | Timeout for LLM API requests (milliseconds) |
| LLM_MAX_RETRIES | 2 | Maximum retry attempts for failed LLM requests |
| INDEXING_LLM_ENABLED | true | Enable LLM-based metadata generation during indexing |

OpenAI LLM Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| OPENAI_LLM_MODEL | gpt-4o-mini | OpenAI model for code analysis and summaries |

Ollama LLM Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| OLLAMA_MODEL | qwen2.5-coder:7b | Ollama model for code analysis and summaries |
| OLLAMA_TIMEOUT | 30000 | Timeout for Ollama API requests (milliseconds) |
| OLLAMA_MAX_RETRIES | 3 | Maximum retry attempts for failed Ollama requests |

Common Ollama LLM models:

  • qwen2.5-coder:7b - default, excellent for code analysis
  • mistral - fast and capable, good for quick tasks
  • llama3 - Meta's Llama 3, general purpose
  • codellama - Meta's Code Llama, specialized for code generation

Reranker Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| ENABLE_RERANKING | false | Enable composite reranking for improved search results |
| RERANKER_TYPE | bm25 | Reranker type: bm25 (keyword-based) or qwen3 (semantic) |
| RERANK_API_URL | - | Reranker API endpoint (required if RERANKER_TYPE=qwen3) |
| RERANK_TIMEOUT_MS | 5000 | Request timeout in milliseconds |

Reranker Types:

  • bm25 - Keyword-based reranking (fast, no external API needed)
  • qwen3 - Semantic reranking using Qwen3 model (requires RERANK_API_URL)

Redis Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| REDIS_URL | - | Full Redis connection URL (e.g., redis://localhost:6379). Takes precedence over individual settings |
| REDIS_HOST | localhost | Redis server host (used if REDIS_URL not set) |
| REDIS_PORT | 6379 | Redis server port (used if REDIS_URL not set) |
| REDIS_PASSWORD | - | Redis password for authentication (optional) |
| REDIS_DB | 0 | Redis database number (0-15) |

Background Processing (Bull Queue)

| Variable | Default | Description |
|----------|---------|-------------|
| CONSUMER_CONCURRENCY | 1 | Number of concurrent jobs to process |
| PROCESSING_TIMEOUT | 300000 | Job timeout in milliseconds (default: 5 minutes) |
| STARTUP_BATCH_ENABLED | true | Enable batch processing of queued jobs on startup |
| STARTUP_BATCH_LIMIT | 50 | Maximum jobs to process in startup batch |

Note: Background processing requires a running Redis server. Use REDIS_URL for simple setups or individual settings (REDIS_HOST, REDIS_PORT, REDIS_PASSWORD, REDIS_DB) for more control.
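For local testing, a throwaway Redis instance can be started with Docker and wired up via `REDIS_URL`. A sketch using the defaults from the table above (container name and image tag are arbitrary choices, not requirements of this package):

```shell
# Start Redis on the default port; data is not persisted across restarts
docker run -d --name mcp-redis -p 6379:6379 redis:7

# Point the server at it with a single connection URL
export REDIS_URL=redis://localhost:6379
```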

Logging

| Variable | Default | Description |
|----------|---------|-------------|
| LOG_LEVEL | info | Log level: debug, info, warn, error |
| LOG_FORMAT | json | Log format: json or text |

Architecture

┌─────────────────────────────────────────────────────────┐
│              Claude Code / MCP Client                    │
└──────────────────────┬──────────────────────────────────┘
                       │ MCP Protocol (JSON-RPC)
                       │
┌──────────────────────▼──────────────────────────────────┐
│          MCP Git Codebase Server                        │
│  ┌─────────────┬──────────────┬────────────────────┐  │
│  │   Tools     │   Indexing   │   Background Jobs  │  │
│  │  (4 tools)  │  Pipeline    │   (Bull + Redis)   │  │
│  └─────────────┴──────────────┴────────────────────┘  │
└──────────────────────┬──────────────────────────────────┘
                       │
        ┌──────────────┼──────────────┐
        │              │              │
        ▼              ▼              ▼
    Git Repo    Vector Database  Embedding Model
    (local)     (Qdrant/etc)      (OpenAI/Ollama)

Data Flow

  1. Indexing Pipeline

    • Extract code units (functions, classes, etc.) using tree-sitter parsers
    • Generate embeddings via selected provider
    • Store in vector database with metadata
    • Track indexing state and checkpoints
  2. Query Pipeline

    • Convert query to embedding
    • Perform vector similarity search
    • Re-rank results with BM25/custom rerankers
    • Return top matches with context
  3. Background Processing

    • Bull job queue backed by Redis
    • Async job processing with retry logic
    • Failed job persistence and recovery

Supported Languages

  • TypeScript / JavaScript
  • Python
  • Go
  • Java
  • Rust
  • C/C++ (via tree-sitter)
  • Ruby
  • PHP
  • Kotlin
  • Scala
  • Swift
  • Bash
  • Robot Framework

Configuration Examples

Local Development with Qdrant and Ollama

# Start Qdrant (requires Docker)
docker run -p 6333:6333 qdrant/qdrant

# Set environment variables
export VECTOR_DB_PROVIDER=qdrant
export QDRANT_URL=http://localhost:6333
export EMBEDDING_PROVIDER=ollama
export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_EMBEDDING_MODEL=nomic-embed-text

# Start server
npx @devpuccino/mcp-git-codebase

Production with Pinecone

# Set environment variables
export VECTOR_DB_PROVIDER=pinecone
export PINECONE_API_KEY=your-production-key
export PINECONE_ENVIRONMENT=your-environment
export PINECONE_INDEX=prod-codebase
export EMBEDDING_PROVIDER=openai
export OPENAI_API_KEY=sk-...
export OPENAI_EMBEDDING_MODEL=text-embedding-3-small

# Start server
npx @devpuccino/mcp-git-codebase

PostgreSQL with pgvector

# Set environment variables
export VECTOR_DB_PROVIDER=postgres
export DATABASE_URL=postgresql://user:password@host:5432/codebase_db
export EMBEDDING_PROVIDER=openai
export OPENAI_API_KEY=sk-...
export OPENAI_EMBEDDING_MODEL=text-embedding-3-small

# Start server
npx @devpuccino/mcp-git-codebase

Performance Considerations

  • Embedding Generation: Largest cost factor (~100ms per code unit)
  • Vector Search: Sub-100ms for typical queries
  • Code Extraction: ~50-200ms per file depending on size
  • Indexing Speed: ~1000-2000 code units per minute

Optimization Tips:

  • Use background=true for large codebases
  • Set appropriate CONSUMER_CONCURRENCY based on resources
  • Implement incremental indexing via update_codebase
  • Filter by file_extensions to reduce scope
  • Use higher similarity_threshold if too many results

Troubleshooting

Connection Issues

# Verify vector database is running
curl http://localhost:6333/health  # Qdrant
curl http://localhost:8000/api/v1/heartbeat  # Chroma

# Check logs
export LOG_LEVEL=debug
npx @devpuccino/mcp-git-codebase

Embedding Model Issues

# Verify Ollama is running
curl http://localhost:11434/api/tags

# Or verify OpenAI API key
echo $OPENAI_API_KEY

Out of Memory

  • Reduce CONSUMER_CONCURRENCY
  • Process smaller repositories first
  • Enable background=true for large syncs
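If it is the Node.js process itself that exhausts its heap (rather than the vector database), raising V8's heap limit before starting the server may also help. This is a general Node.js setting, not something specific to this package:

```shell
# Allow the Node process up to 4 GB of heap (adjust to available RAM)
export NODE_OPTIONS="--max-old-space-size=4096"
npx @devpuccino/mcp-git-codebase
```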

Development

# Clone repository
git clone https://github.com/devpuccino/mcp-git-codebase.git
cd mcp-git-codebase

# Install dependencies
npm install

# Build
npm run build

# Run tests
npm test

# Start in development mode
npm run dev

License

MIT

Support

For issues, questions, or feature requests, please visit the GitHub repository.