@devpuccino/mcp-git-codebase
v1.0.1
MCP server providing semantic code search and indexing for git repositories
An MCP (Model Context Protocol) server that provides semantic code search and intelligent indexing for git repositories. Enables AI-powered semantic search across codebases using vector embeddings to find relevant code snippets by intent, not just keywords.
Features
✨ Semantic Search - Find code by meaning, not just keywords
🔍 Multi-Language Support - TypeScript, JavaScript, Python, Go, Java, Rust, and more
📊 Multiple Vector Databases - Qdrant, Pinecone, Chroma, Milvus, PostgreSQL with pgvector
🚀 Scalable Indexing - Handle repositories with 1M+ files and 100GB+ of code
⚙️ Background Processing - Queue indexing jobs via Redis/Bull
🌿 Branch-Aware - Search across specific branches or track changes over time
🎯 Precise Code Retrieval - Get exact code snippets with line-level precision
Installation
Prerequisites
- Node.js ≥ 18.0.0
- Git (for repository operations)
- One of the supported vector databases (Qdrant, Pinecone, Chroma, Milvus, or PostgreSQL)
Install Package
npm install @devpuccino/mcp-git-codebase
Quick Start
1. Configure Vector Database
Set your preferred vector database and its connection details:
# Qdrant (recommended for local development)
export VECTOR_DB_PROVIDER=qdrant
export QDRANT_URL=http://localhost:6333
# Or Pinecone
export VECTOR_DB_PROVIDER=pinecone
export PINECONE_API_KEY=your-api-key
export PINECONE_ENVIRONMENT=your-environment
export PINECONE_INDEX=your-index
# Or PostgreSQL with pgvector
export VECTOR_DB_PROVIDER=postgres
export DATABASE_URL=postgresql://user:password@localhost:5432/codebase
# Or Chroma
export VECTOR_DB_PROVIDER=chroma
export CHROMA_URL=http://localhost
export CHROMA_PORT=8000
# Or Milvus
export VECTOR_DB_PROVIDER=milvus
export MILVUS_HOST=localhost
export MILVUS_PORT=19530
2. Configure Embedding Model
# Ollama (default, local)
export EMBEDDING_PROVIDER=ollama
export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_EMBEDDING_MODEL=bge-base-en-v1.5
# Or OpenAI (cloud)
export EMBEDDING_PROVIDER=openai
export OPENAI_API_KEY=sk-...
export OPENAI_EMBEDDING_MODEL=text-embedding-3-small
3. Use with Claude Code
Add to your Claude Code configuration (settings.json or settings.local.json):
Minimal Configuration (Qdrant + Ollama):
{
"mcpServers": {
"mcp-git-codebase": {
"command": "npx",
"args": ["@devpuccino/mcp-git-codebase"],
"env": {
"VECTOR_DB_PROVIDER": "qdrant",
"QDRANT_URL": "http://localhost:6333",
"EMBEDDING_PROVIDER": "ollama",
"OLLAMA_BASE_URL": "http://localhost:11434",
"OLLAMA_EMBEDDING_MODEL": "nomic-embed-text"
}
}
}
}
Full Configuration Example:
{
"mcpServers": {
"git-codebase": {
"command": "npx",
"args": ["--legacy-peer-deps", "@devpuccino/mcp-git-codebase"],
"env": {
"VECTOR_DB_PROVIDER": "qdrant",
"QDRANT_URL": "http://your-qdrant-host:6333",
"QDRANT_API_KEY": "your-api-key-if-needed",
"VECTOR_DB_COLLECTION_PREFIX": "codebase_",
"EMBEDDING_PROVIDER": "ollama",
"OLLAMA_BASE_URL": "http://your-ollama-host:11434",
"OLLAMA_EMBEDDING_MODEL": "bge-base-en-v1.5",
"EMBEDDING_TIMEOUT": "30000",
"LLM_PROVIDER": "ollama",
"OLLAMA_MODEL": "qwen2.5-coder:7b",
"OLLAMA_TIMEOUT": "30000",
"OLLAMA_MAX_RETRIES": "3",
"INDEXING_LLM_ENABLED": "true",
"REDIS_HOST": "your-redis-host",
"REDIS_PORT": "6379",
"REDIS_PASSWORD": "your-redis-password",
"REDIS_DB": "0",
"ENABLE_RERANKING": "true",
"RERANKER_TYPE": "bm25",
"CONSUMER_CONCURRENCY": "2",
"STARTUP_BATCH_ENABLED": "true",
"STARTUP_BATCH_LIMIT": "50",
"LOG_LEVEL": "info"
}
}
}
}
Production Configuration (Pinecone + OpenAI):
{
"mcpServers": {
"mcp-git-codebase": {
"command": "npx",
"args": ["@devpuccino/mcp-git-codebase"],
"env": {
"VECTOR_DB_PROVIDER": "pinecone",
"PINECONE_API_KEY": "your-pinecone-api-key",
"PINECONE_ENVIRONMENT": "us-east-1",
"PINECONE_INDEX": "your-index-name",
"EMBEDDING_PROVIDER": "openai",
"OPENAI_API_KEY": "your-openai-api-key",
"OPENAI_EMBEDDING_MODEL": "text-embedding-3-small",
"LLM_PROVIDER": "openai",
"OPENAI_LLM_MODEL": "gpt-4o-mini",
"LOG_LEVEL": "warn"
}
}
}
}
Tools
query_codebase
Perform semantic search across a git repository to find relevant code snippets by meaning.
Parameters:
- query_sentence (required): Natural language search query or code snippet
- project_path (required): Root directory of the git repository
- branch (optional): Specific branch to search (default: current branch)
- limit (optional): Max results to return, 1-20 (default: 5)
- similarity_threshold (optional): Minimum similarity score, 0-1 (default: 0.6)
- file_extensions (optional): Filter by file extensions (e.g., [".ts", ".tsx"])
Example:
{
"query_sentence": "function to authenticate users with JWT tokens",
"project_path": "/workspace/myapp",
"limit": 5,
"file_extensions": [".ts", ".tsx"]
}
get_code_snippet
Retrieve a specific code snippet from a file with line-level precision.
Parameters:
- project_path (required): Root directory of the git repository
- filepath (required): Relative path to the file
- start_line (optional): Starting line number (1-indexed)
- end_line (optional): Ending line number
- include_line_numbers (optional): Show line numbers (default: true)
Example:
{
"project_path": "/workspace/myapp",
"filepath": "src/auth/index.ts",
"start_line": 10,
"end_line": 45,
"include_line_numbers": true
}
sync_codebase
Index or re-index a git repository into the vector database.
Parameters:
- project_path (required): Root directory of the git repository
- branch (optional): Branch to sync (default: current branch)
- file_extensions (optional): Only sync specific file types
- background (optional): Queue as background job (default: false)
- force (optional): Force full re-index from scratch (default: false)
Example:
{
"project_path": "/workspace/myapp",
"force": false,
"background": true
}
update_codebase
Trigger indexing after code changes. Optionally commits to git.
Parameters:
- project_path (required): Root directory of the git repository
- commit_message (required): Message summarizing changes
- changed_files (required): Array of changed files with change type
- trigger_type (required): One of manual, post_generation, post_merge
- skip_git_commit (optional): Skip git commit (default: false)
- background (optional): Queue as background job (default: false)
Example:
{
"project_path": "/workspace/myapp",
"commit_message": "Update authentication module",
"changed_files": [
{ "path": "src/auth/index.ts", "change_type": "modified" },
{ "path": "src/auth/jwt.ts", "change_type": "added" }
],
"trigger_type": "manual",
"background": false
}
Environment Variables
General Vector Database Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| VECTOR_DB_PROVIDER | qdrant | Vector database type: qdrant, pinecone, chroma, milvus, postgres |
| EMBEDDING_DIMENSION | 1536 | Dimension of embedding vectors (auto-detected from model, rarely needed) |
| VECTOR_DB_COLLECTION_PREFIX | - | Optional prefix for collection names (useful for multi-tenant setups) |
Qdrant Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| QDRANT_URL | http://localhost:6333 | Qdrant server URL |
| QDRANT_API_KEY | - | Qdrant API key (for cloud/managed instances) |
| QDRANT_COLLECTION | code_snippets | Collection name for storing embeddings |
Pinecone Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| PINECONE_API_KEY | - | Pinecone API key (required) |
| PINECONE_ENVIRONMENT | - | Pinecone environment/region (required) |
| PINECONE_INDEX | code-snippets | Pinecone index name |
Chroma Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| CHROMA_URL | http://localhost | Chroma server URL |
| CHROMA_PORT | 8000 | Chroma server port |
| CHROMA_COLLECTION | code_snippets | Collection name for storing embeddings |
Milvus Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| MILVUS_HOST | localhost | Milvus server host |
| MILVUS_PORT | 19530 | Milvus server port |
| MILVUS_COLLECTION | code_snippets | Collection name for storing embeddings |
PostgreSQL (pgvector) Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| DATABASE_URL | - | PostgreSQL connection string (required) |
| POSTGRES_VECTOR_TABLE | code_snippets_vectors | Table name for storing vectors |
| POSTGRES_EMBEDDING_COLUMN | embedding | Column name for embedding vectors |
Embedding Model Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| EMBEDDING_PROVIDER | ollama | Embedding provider: openai, ollama |
| EMBEDDING_DIMENSION | 1536 | Dimension of embedding vectors (auto-detected from model if not set) |
| EMBEDDING_TIMEOUT | 30000 | Timeout for embedding API requests (milliseconds) |
| EMBEDDING_BATCH_SIZE | 10 | Number of items to embed per batch |
| EMBEDDING_MAX_RETRIES | 3 | Maximum retry attempts for failed embedding requests |
OpenAI Embedding Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| OPENAI_API_KEY | - | OpenAI API key (required for OpenAI provider) |
| OPENAI_EMBEDDING_MODEL | text-embedding-3-small | OpenAI embedding model to use |
| OPENAI_BASE_URL | https://api.openai.com | OpenAI API base URL (for custom endpoints) |
Ollama Embedding Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| OLLAMA_BASE_URL | http://localhost:11434 | Ollama server URL |
| OLLAMA_EMBEDDING_MODEL | bge-base-en-v1.5 | Ollama embedding model to use |
Common Ollama embedding models:
- bge-base-en-v1.5 (768 dimensions) - default, good balance
- bge-large-en-v1.5 (1024 dimensions) - higher quality
- nomic-embed-text (768 dimensions) - fast and efficient
- mxbai-embed-large (1024 dimensions) - high quality
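When `EMBEDDING_DIMENSION` is unset, the server auto-detects it from the model. A minimal sketch of what such a lookup might look like, built from the dimensions listed above (`resolveDimension` and `KNOWN_DIMENSIONS` are hypothetical names for illustration, not part of the package, whose actual auto-detection may work differently):

```typescript
// Hypothetical dimension lookup based on the model table above.
const KNOWN_DIMENSIONS: Record<string, number> = {
  "bge-base-en-v1.5": 768,
  "bge-large-en-v1.5": 1024,
  "nomic-embed-text": 768,
  "mxbai-embed-large": 1024,
};

// An explicit EMBEDDING_DIMENSION value wins; otherwise fall back to the
// known model dimension, then to the documented default of 1536.
function resolveDimension(model: string, explicit?: string): number {
  if (explicit) return Number(explicit);
  return KNOWN_DIMENSIONS[model] ?? 1536;
}
```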
LLM Provider Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| LLM_PROVIDER | ollama | LLM provider for code analysis: openai, ollama |
| LLM_TIMEOUT | 8000 | Timeout for LLM API requests (milliseconds) |
| LLM_MAX_RETRIES | 2 | Maximum retry attempts for failed LLM requests |
| INDEXING_LLM_ENABLED | true | Enable LLM-based metadata generation during indexing |
OpenAI LLM Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| OPENAI_LLM_MODEL | gpt-4o-mini | OpenAI model for code analysis and summaries |
Ollama LLM Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| OLLAMA_MODEL | qwen2.5-coder:7b | Ollama model for code analysis and summaries |
| OLLAMA_TIMEOUT | 30000 | Timeout for Ollama API requests (milliseconds) |
| OLLAMA_MAX_RETRIES | 3 | Maximum retry attempts for failed Ollama requests |
Common Ollama LLM models:
- qwen2.5-coder:7b - default, excellent for code analysis
- mistral - fast and capable, good for quick tasks
- llama3 - Meta's Llama 3, general purpose
- codellama - Meta's Code Llama, specialized for code generation
Reranker Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| ENABLE_RERANKING | false | Enable composite reranking for improved search results |
| RERANKER_TYPE | bm25 | Reranker type: bm25 (keyword-based) or qwen3 (semantic) |
| RERANK_API_URL | - | Reranker API endpoint (required if RERANKER_TYPE=qwen3) |
| RERANK_TIMEOUT_MS | 5000 | Request timeout in milliseconds |
Reranker Types:
- bm25 - Keyword-based reranking (fast, no external API needed)
- qwen3 - Semantic reranking using Qwen3 model (requires RERANK_API_URL)
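As a rough illustration of what the `bm25` option does, here is a minimal BM25 scoring sketch. The package does not document its tokenizer or parameters, so everything below (the regex tokenizer, `k1 = 1.2`, `b = 0.75`) is an assumption, not the package's implementation:

```typescript
// Illustrative BM25 reranker over candidate code snippets.
type Doc = { id: string; text: string };

// Naive tokenizer: lowercase, split on non-identifier characters.
const tokenize = (s: string): string[] =>
  s.toLowerCase().split(/[^a-z0-9_]+/).filter(Boolean);

function bm25Rerank(query: string, docs: Doc[], k1 = 1.2, b = 0.75): Doc[] {
  const tokens = docs.map((d) => tokenize(d.text));
  const avgdl = tokens.reduce((sum, t) => sum + t.length, 0) / docs.length;
  const qTerms = tokenize(query);
  // Document frequency per query term.
  const df = new Map<string, number>();
  for (const t of qTerms) {
    df.set(t, tokens.filter((tok) => tok.includes(t)).length);
  }
  const scored = docs.map((doc, i) => {
    let score = 0;
    for (const t of qTerms) {
      const tf = tokens[i].filter((w) => w === t).length;
      if (tf === 0) continue;
      const idf = Math.log(1 + (docs.length - df.get(t)! + 0.5) / (df.get(t)! + 0.5));
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * tokens[i].length) / avgdl));
    }
    return { doc, score };
  });
  return scored.sort((x, y) => y.score - x.score).map((s) => s.doc);
}
```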
Redis Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| REDIS_URL | - | Full Redis connection URL (e.g., redis://localhost:6379). Takes precedence over individual settings |
| REDIS_HOST | localhost | Redis server host (used if REDIS_URL not set) |
| REDIS_PORT | 6379 | Redis server port (used if REDIS_URL not set) |
| REDIS_PASSWORD | - | Redis password for authentication (optional) |
| REDIS_DB | 0 | Redis database number (0-15) |
Background Processing (Bull Queue)
| Variable | Default | Description |
|----------|---------|-------------|
| CONSUMER_CONCURRENCY | 1 | Number of concurrent jobs to process |
| PROCESSING_TIMEOUT | 300000 | Job timeout in milliseconds (default: 5 minutes) |
| STARTUP_BATCH_ENABLED | true | Enable batch processing of queued jobs on startup |
| STARTUP_BATCH_LIMIT | 50 | Maximum jobs to process in startup batch |
Note: Background processing requires a running Redis server. Use REDIS_URL for simple setups or individual settings (REDIS_HOST, REDIS_PORT, REDIS_PASSWORD, REDIS_DB) for more control.
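The precedence rule in the note above can be sketched as a small resolver (hypothetical code; the package's actual resolution logic is not documented and may differ):

```typescript
// Sketch: REDIS_URL, when present, wins over the individual settings.
type RedisConfig =
  | { url: string }
  | { host: string; port: number; password?: string; db: number };

function resolveRedisConfig(env: Record<string, string | undefined>): RedisConfig {
  if (env.REDIS_URL) return { url: env.REDIS_URL };
  // Fall back to individual settings with the documented defaults.
  return {
    host: env.REDIS_HOST ?? "localhost",
    port: Number(env.REDIS_PORT ?? 6379),
    password: env.REDIS_PASSWORD,
    db: Number(env.REDIS_DB ?? 0),
  };
}
```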
Logging
| Variable | Default | Description |
|----------|---------|-------------|
| LOG_LEVEL | info | Log level: debug, info, warn, error |
| LOG_FORMAT | json | Log format: json or text |
Architecture
┌─────────────────────────────────────────────────────────┐
│                Claude Code / MCP Client                 │
└──────────────────────┬──────────────────────────────────┘
                       │ MCP Protocol (JSON-RPC)
                       │
┌──────────────────────▼──────────────────────────────────┐
│                 MCP Git Codebase Server                 │
│  ┌─────────────┬──────────────┬────────────────────┐    │
│  │    Tools    │   Indexing   │  Background Jobs   │    │
│  │  (4 tools)  │   Pipeline   │   (Bull + Redis)   │    │
│  └─────────────┴──────────────┴────────────────────┘    │
└──────────────────────┬──────────────────────────────────┘
                       │
        ┌──────────────┼──────────────┐
        │              │              │
        ▼              ▼              ▼
    Git Repo    Vector Database   Embedding Model
     (local)     (Qdrant/etc)     (OpenAI/Ollama)
Data Flow
Indexing Pipeline
- Extract code units (functions, classes, etc.) using tree-sitter parsers
- Generate embeddings via selected provider
- Store in vector database with metadata
- Track indexing state and checkpoints
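The extraction step above relies on tree-sitter parsers in the real pipeline. To illustrate only the shape of the output (a code unit with a 1-indexed line range, matching what `get_code_snippet` consumes), here is a deliberately naive brace-counting stand-in; `extractFunctions` is hypothetical and not how the package actually parses code:

```typescript
// A code unit as the indexing pipeline might represent it.
type CodeUnit = { name: string; startLine: number; endLine: number; text: string };

// Naive extractor: find top-level `function name(...)` declarations in a
// JS/TS source string and track brace depth to find where each one ends.
function extractFunctions(source: string): CodeUnit[] {
  const lines = source.split("\n");
  const units: CodeUnit[] = [];
  let current: { name: string; start: number } | null = null;
  let depth = 0;
  lines.forEach((line, i) => {
    const m = line.match(/^(?:export\s+)?(?:async\s+)?function\s+(\w+)/);
    if (m && depth === 0) current = { name: m[1], start: i };
    depth += (line.match(/{/g) ?? []).length - (line.match(/}/g) ?? []).length;
    if (current && depth === 0 && line.includes("}")) {
      units.push({
        name: current.name,
        startLine: current.start + 1, // 1-indexed line numbers
        endLine: i + 1,
        text: lines.slice(current.start, i + 1).join("\n"),
      });
      current = null;
    }
  });
  return units;
}
```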
Query Pipeline
- Convert query to embedding
- Perform vector similarity search
- Re-rank results with BM25/custom rerankers
- Return top matches with context
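At its core, the query pipeline above reduces to comparing embeddings. A minimal cosine-similarity scan using the documented `similarity_threshold` (0.6) and `limit` (5) defaults might look like this; in practice the server delegates this search to the configured vector database rather than scanning in memory:

```typescript
// A stored snippet embedding, as the vector database might hold it.
type Stored = { filepath: string; embedding: number[] };

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score every stored embedding, drop those below the threshold,
// and return the top `limit` matches.
function search(queryEmbedding: number[], index: Stored[], threshold = 0.6, limit = 5) {
  return index
    .map((s) => ({ filepath: s.filepath, score: cosine(queryEmbedding, s.embedding) }))
    .filter((r) => r.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```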
Background Processing
- Bull job queue backed by Redis
- Async job processing with retry logic
- Failed job persistence and recovery
Supported Languages
- TypeScript / JavaScript
- Python
- Go
- Java
- Rust
- C/C++ (via tree-sitter)
- Ruby
- PHP
- Kotlin
- Scala
- Swift
- Bash
- Robot Framework
Configuration Examples
Local Development with Qdrant
# Start Qdrant (requires Docker)
docker run -p 6333:6333 qdrant/qdrant
# Set environment variables
export VECTOR_DB_PROVIDER=qdrant
export QDRANT_URL=http://localhost:6333
export EMBEDDING_PROVIDER=ollama
export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_EMBEDDING_MODEL=nomic-embed-text
# Start server
npx @devpuccino/mcp-git-codebase
Production with Pinecone
# Set environment variables
export VECTOR_DB_PROVIDER=pinecone
export PINECONE_API_KEY=your-production-key
export PINECONE_ENVIRONMENT=your-environment
export PINECONE_INDEX=prod-codebase
export EMBEDDING_PROVIDER=openai
export OPENAI_API_KEY=sk-...
export OPENAI_EMBEDDING_MODEL=text-embedding-3-small
# Start server
npx @devpuccino/mcp-git-codebase
PostgreSQL with pgvector
# Set environment variables
export VECTOR_DB_PROVIDER=postgres
export DATABASE_URL=postgresql://user:password@host:5432/codebase_db
export EMBEDDING_PROVIDER=openai
export OPENAI_API_KEY=sk-...
export OPENAI_EMBEDDING_MODEL=text-embedding-3-small
# Start server
npx @devpuccino/mcp-git-codebase
Performance Considerations
- Embedding Generation: Largest cost factor (~100ms per code unit)
- Vector Search: Sub-100ms for typical queries
- Code Extraction: ~50-200ms per file depending on size
- Indexing Speed: ~1000-2000 code units per minute
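From the figures above, a back-of-envelope estimate of full-index time. The midpoint throughput of 1,500 units per minute is my assumption from the stated 1000-2000 range, not a documented number:

```typescript
// Rough full-index time estimate from the documented throughput range.
function estimateIndexMinutes(codeUnits: number, unitsPerMinute = 1500): number {
  return Math.ceil(codeUnits / unitsPerMinute);
}
```

For example, a repository yielding 150,000 code units would take on the order of 100 minutes to index at that rate, which is why `background=true` is recommended for large codebases.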
Optimization Tips:
- Use background=true for large codebases
- Set appropriate CONSUMER_CONCURRENCY based on resources
- Implement incremental indexing via update_codebase
- Filter by file_extensions to reduce scope
- Use a higher similarity_threshold if too many results are returned
Troubleshooting
Connection Issues
# Verify vector database is running
curl http://localhost:6333/health # Qdrant
curl http://localhost:8000/api/v1/heartbeat # Chroma
# Check logs
export LOG_LEVEL=debug
npx @devpuccino/mcp-git-codebase
Embedding Model Issues
# Verify Ollama is running
curl http://localhost:11434/api/tags
# Or verify OpenAI API key
echo $OPENAI_API_KEY
Out of Memory
- Reduce CONSUMER_CONCURRENCY
- Process smaller repositories first
- Enable background=true for large syncs
Development
# Clone repository
git clone https://github.com/devpuccino/mcp-git-codebase.git
cd mcp-git-codebase
# Install dependencies
npm install
# Build
npm run build
# Run tests
npm test
# Start in development mode
npm run dev
License
MIT
Support
For issues, questions, or feature requests, please visit the GitHub repository.
