@devpuccino/mcp-git-codebase
v1.0.1
MCP server providing semantic code search and indexing for git repositories
An MCP (Model Context Protocol) server that provides semantic code search and intelligent indexing for git repositories. Enables AI-powered semantic search across codebases using vector embeddings to find relevant code snippets by intent, not just keywords.
Features
✨ Semantic Search - Find code by meaning, not just keywords
🔍 Multi-Language Support - TypeScript, JavaScript, Python, Go, Java, Rust, and more
📊 Multiple Vector Databases - Qdrant, Pinecone, Chroma, Milvus, PostgreSQL with pgvector
🚀 Scalable Indexing - Handle repositories with 1M+ files and 100GB+ of code
⚙️ Background Processing - Queue indexing jobs via Redis/Bull
🌿 Branch-Aware - Search across specific branches or track changes over time
🎯 Precise Code Retrieval - Get exact code snippets with line-level precision
Installation
Prerequisites
- Node.js ≥ 18.0.0
- Git (for repository operations)
- One of the supported vector databases (Qdrant, Pinecone, Chroma, Milvus, or PostgreSQL)
Install Package
npm install @devpuccino/mcp-git-codebase
Quick Start
1. Configure Vector Database
Set your preferred vector database and its connection details:
# Qdrant (recommended for local development)
export VECTOR_DB_PROVIDER=qdrant
export QDRANT_URL=http://localhost:6333
# Or Pinecone
export VECTOR_DB_PROVIDER=pinecone
export PINECONE_API_KEY=your-api-key
export PINECONE_ENVIRONMENT=your-environment
export PINECONE_INDEX=your-index
# Or PostgreSQL with pgvector
export VECTOR_DB_PROVIDER=postgres
export DATABASE_URL=postgresql://user:password@localhost:5432/codebase
# Or Chroma
export VECTOR_DB_PROVIDER=chroma
export CHROMA_URL=http://localhost
export CHROMA_PORT=8000
# Or Milvus
export VECTOR_DB_PROVIDER=milvus
export MILVUS_HOST=localhost
export MILVUS_PORT=19530
2. Configure Embedding Model
# Ollama (default, local)
export EMBEDDING_PROVIDER=ollama
export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_EMBEDDING_MODEL=bge-base-en-v1.5
# Or OpenAI (cloud)
export EMBEDDING_PROVIDER=openai
export OPENAI_API_KEY=sk-...
export OPENAI_EMBEDDING_MODEL=text-embedding-3-small
3. Use with Claude Code
Add to your Claude Code configuration (settings.json or settings.local.json):
Minimal Configuration (Qdrant + Ollama):
{
"mcpServers": {
"mcp-git-codebase": {
"command": "npx",
"args": ["@devpuccino/mcp-git-codebase"],
"env": {
"VECTOR_DB_PROVIDER": "qdrant",
"QDRANT_URL": "http://localhost:6333",
"EMBEDDING_PROVIDER": "ollama",
"OLLAMA_BASE_URL": "http://localhost:11434",
"OLLAMA_EMBEDDING_MODEL": "nomic-embed-text"
}
}
}
}
Full Configuration Example:
{
"mcpServers": {
"git-codebase": {
"command": "npx",
"args": ["--legacy-peer-deps", "@devpuccino/mcp-git-codebase"],
"env": {
"VECTOR_DB_PROVIDER": "qdrant",
"QDRANT_URL": "http://your-qdrant-host:6333",
"QDRANT_API_KEY": "your-api-key-if-needed",
"VECTOR_DB_COLLECTION_PREFIX": "codebase_",
"EMBEDDING_PROVIDER": "ollama",
"OLLAMA_BASE_URL": "http://your-ollama-host:11434",
"OLLAMA_EMBEDDING_MODEL": "bge-base-en-v1.5",
"EMBEDDING_TIMEOUT": "30000",
"LLM_PROVIDER": "ollama",
"OLLAMA_MODEL": "qwen2.5-coder:7b",
"OLLAMA_TIMEOUT": "30000",
"OLLAMA_MAX_RETRIES": "3",
"INDEXING_LLM_ENABLED": "true",
"REDIS_HOST": "your-redis-host",
"REDIS_PORT": "6379",
"REDIS_PASSWORD": "your-redis-password",
"REDIS_DB": "0",
"ENABLE_RERANKING": "true",
"RERANKER_TYPE": "bm25",
"CONSUMER_CONCURRENCY": "2",
"STARTUP_BATCH_ENABLED": "true",
"STARTUP_BATCH_LIMIT": "50",
"LOG_LEVEL": "info"
}
}
}
}
Production Configuration (Pinecone + OpenAI):
{
"mcpServers": {
"mcp-git-codebase": {
"command": "npx",
"args": ["@devpuccino/mcp-git-codebase"],
"env": {
"VECTOR_DB_PROVIDER": "pinecone",
"PINECONE_API_KEY": "your-pinecone-api-key",
"PINECONE_ENVIRONMENT": "us-east-1",
"PINECONE_INDEX": "your-index-name",
"EMBEDDING_PROVIDER": "openai",
"OPENAI_API_KEY": "your-openai-api-key",
"OPENAI_EMBEDDING_MODEL": "text-embedding-3-small",
"LLM_PROVIDER": "openai",
"OPENAI_LLM_MODEL": "gpt-4o-mini",
"LOG_LEVEL": "warn"
}
}
}
}
Tools
query_codebase
Perform semantic search across a git repository to find relevant code snippets by meaning.
Parameters:
- query_sentence (required): Natural language search query or code snippet
- project_path (required): Root directory of the git repository
- branch (optional): Specific branch to search (default: current branch)
- limit (optional): Max results to return, 1-20 (default: 5)
- similarity_threshold (optional): Minimum similarity score, 0-1 (default: 0.6)
- file_extensions (optional): Filter by file extensions (e.g., [".ts", ".tsx"])
Example:
{
"query_sentence": "function to authenticate users with JWT tokens",
"project_path": "/workspace/myapp",
"limit": 5,
"file_extensions": [".ts", ".tsx"]
}
get_code_snippet
Retrieve a specific code snippet from a file with line-level precision.
Parameters:
- project_path (required): Root directory of the git repository
- filepath (required): Relative path to the file
- start_line (optional): Starting line number (1-indexed)
- end_line (optional): Ending line number
- include_line_numbers (optional): Show line numbers (default: true)
Example:
{
"project_path": "/workspace/myapp",
"filepath": "src/auth/index.ts",
"start_line": 10,
"end_line": 45,
"include_line_numbers": true
}
sync_codebase
Index or re-index a git repository into the vector database.
Parameters:
- project_path (required): Root directory of the git repository
- branch (optional): Branch to sync (default: current branch)
- file_extensions (optional): Only sync specific file types
- background (optional): Queue as background job (default: false)
- force (optional): Force full re-index from scratch (default: false)
Example:
{
"project_path": "/workspace/myapp",
"force": false,
"background": true
}
update_codebase
Trigger indexing after code changes. Optionally commits to git.
Parameters:
- project_path (required): Root directory of the git repository
- commit_message (required): Message summarizing changes
- changed_files (required): Array of changed files with change type
- trigger_type (required): One of manual, post_generation, post_merge
- skip_git_commit (optional): Skip git commit (default: false)
- background (optional): Queue as background job (default: false)
Example:
{
"project_path": "/workspace/myapp",
"commit_message": "Update authentication module",
"changed_files": [
{ "path": "src/auth/index.ts", "change_type": "modified" },
{ "path": "src/auth/jwt.ts", "change_type": "added" }
],
"trigger_type": "manual",
"background": false
}
Environment Variables
General Vector Database Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| VECTOR_DB_PROVIDER | qdrant | Vector database type: qdrant, pinecone, chroma, milvus, postgres |
| EMBEDDING_DIMENSION | 1536 | Dimension of embedding vectors (auto-detected from model, rarely needed) |
| VECTOR_DB_COLLECTION_PREFIX | - | Optional prefix for collection names (useful for multi-tenant setups) |
Qdrant Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| QDRANT_URL | http://localhost:6333 | Qdrant server URL |
| QDRANT_API_KEY | - | Qdrant API key (for cloud/managed instances) |
| QDRANT_COLLECTION | code_snippets | Collection name for storing embeddings |
Pinecone Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| PINECONE_API_KEY | - | Pinecone API key (required) |
| PINECONE_ENVIRONMENT | - | Pinecone environment/region (required) |
| PINECONE_INDEX | code-snippets | Pinecone index name |
Chroma Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| CHROMA_URL | http://localhost | Chroma server URL |
| CHROMA_PORT | 8000 | Chroma server port |
| CHROMA_COLLECTION | code_snippets | Collection name for storing embeddings |
Milvus Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| MILVUS_HOST | localhost | Milvus server host |
| MILVUS_PORT | 19530 | Milvus server port |
| MILVUS_COLLECTION | code_snippets | Collection name for storing embeddings |
PostgreSQL (pgvector) Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| DATABASE_URL | - | PostgreSQL connection string (required) |
| POSTGRES_VECTOR_TABLE | code_snippets_vectors | Table name for storing vectors |
| POSTGRES_EMBEDDING_COLUMN | embedding | Column name for embedding vectors |
Embedding Model Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| EMBEDDING_PROVIDER | ollama | Embedding provider: openai, ollama |
| EMBEDDING_DIMENSION | 1536 | Dimension of embedding vectors (auto-detected from model if not set) |
| EMBEDDING_TIMEOUT | 30000 | Timeout for embedding API requests (milliseconds) |
| EMBEDDING_BATCH_SIZE | 10 | Number of items to embed per batch |
| EMBEDDING_MAX_RETRIES | 3 | Maximum retry attempts for failed embedding requests |
OpenAI Embedding Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| OPENAI_API_KEY | - | OpenAI API key (required for OpenAI provider) |
| OPENAI_EMBEDDING_MODEL | text-embedding-3-small | OpenAI embedding model to use |
| OPENAI_BASE_URL | https://api.openai.com | OpenAI API base URL (for custom endpoints) |
Ollama Embedding Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| OLLAMA_BASE_URL | http://localhost:11434 | Ollama server URL |
| OLLAMA_EMBEDDING_MODEL | bge-base-en-v1.5 | Ollama embedding model to use |
Common Ollama embedding models:
- bge-base-en-v1.5 (768 dimensions) - default, good balance
- bge-large-en-v1.5 (1024 dimensions) - higher quality
- nomic-embed-text (768 dimensions) - fast and efficient
- mxbai-embed-large (1024 dimensions) - high quality
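When `EMBEDDING_DIMENSION` is unset, the server auto-detects it from the model. A minimal sketch of what such a lookup might look like, built from the dimensions listed above (`resolveDimension` and `KNOWN_DIMENSIONS` are hypothetical names for illustration, not part of the package, whose actual auto-detection may work differently):

```typescript
// Hypothetical dimension lookup based on the model table above.
const KNOWN_DIMENSIONS: Record<string, number> = {
  "bge-base-en-v1.5": 768,
  "bge-large-en-v1.5": 1024,
  "nomic-embed-text": 768,
  "mxbai-embed-large": 1024,
};

// An explicit EMBEDDING_DIMENSION value wins; otherwise fall back to the
// known model dimension, then to the documented default of 1536.
function resolveDimension(model: string, explicit?: string): number {
  if (explicit) return Number(explicit);
  return KNOWN_DIMENSIONS[model] ?? 1536;
}
```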
LLM Provider Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| LLM_PROVIDER | ollama | LLM provider for code analysis: openai, ollama |
| LLM_TIMEOUT | 8000 | Timeout for LLM API requests (milliseconds) |
| LLM_MAX_RETRIES | 2 | Maximum retry attempts for failed LLM requests |
| INDEXING_LLM_ENABLED | true | Enable LLM-based metadata generation during indexing |
OpenAI LLM Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| OPENAI_LLM_MODEL | gpt-4o-mini | OpenAI model for code analysis and summaries |
Ollama LLM Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| OLLAMA_MODEL | qwen2.5-coder:7b | Ollama model for code analysis and summaries |
| OLLAMA_TIMEOUT | 30000 | Timeout for Ollama API requests (milliseconds) |
| OLLAMA_MAX_RETRIES | 3 | Maximum retry attempts for failed Ollama requests |
Common Ollama LLM models:
- qwen2.5-coder:7b - default, excellent for code analysis
- mistral - fast and capable, good for quick tasks
- llama3 - Meta's Llama 3, general purpose
- codellama - Meta's Code Llama, specialized for code generation
Reranker Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| ENABLE_RERANKING | false | Enable composite reranking for improved search results |
| RERANKER_TYPE | bm25 | Reranker type: bm25 (keyword-based) or qwen3 (semantic) |
| RERANK_API_URL | - | Reranker API endpoint (required if RERANKER_TYPE=qwen3) |
| RERANK_TIMEOUT_MS | 5000 | Request timeout in milliseconds |
Reranker Types:
- bm25 - Keyword-based reranking (fast, no external API needed)
- qwen3 - Semantic reranking using Qwen3 model (requires RERANK_API_URL)
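As a rough illustration of what the `bm25` option does, here is a minimal BM25 scoring sketch. The package does not document its tokenizer or parameters, so everything below (the regex tokenizer, `k1 = 1.2`, `b = 0.75`) is an assumption, not the package's implementation:

```typescript
// Illustrative BM25 reranker over candidate code snippets.
type Doc = { id: string; text: string };

// Naive tokenizer: lowercase, split on non-identifier characters.
const tokenize = (s: string): string[] =>
  s.toLowerCase().split(/[^a-z0-9_]+/).filter(Boolean);

function bm25Rerank(query: string, docs: Doc[], k1 = 1.2, b = 0.75): Doc[] {
  const tokens = docs.map((d) => tokenize(d.text));
  const avgdl = tokens.reduce((sum, t) => sum + t.length, 0) / docs.length;
  const qTerms = tokenize(query);
  // Document frequency per query term.
  const df = new Map<string, number>();
  for (const t of qTerms) {
    df.set(t, tokens.filter((tok) => tok.includes(t)).length);
  }
  const scored = docs.map((doc, i) => {
    let score = 0;
    for (const t of qTerms) {
      const tf = tokens[i].filter((w) => w === t).length;
      if (tf === 0) continue;
      const idf = Math.log(1 + (docs.length - df.get(t)! + 0.5) / (df.get(t)! + 0.5));
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * tokens[i].length) / avgdl));
    }
    return { doc, score };
  });
  return scored.sort((x, y) => y.score - x.score).map((s) => s.doc);
}
```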
Redis Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| REDIS_URL | - | Full Redis connection URL (e.g., redis://localhost:6379). Takes precedence over individual settings |
| REDIS_HOST | localhost | Redis server host (used if REDIS_URL not set) |
| REDIS_PORT | 6379 | Redis server port (used if REDIS_URL not set) |
| REDIS_PASSWORD | - | Redis password for authentication (optional) |
| REDIS_DB | 0 | Redis database number (0-15) |
Background Processing (Bull Queue)
| Variable | Default | Description |
|----------|---------|-------------|
| CONSUMER_CONCURRENCY | 1 | Number of concurrent jobs to process |
| PROCESSING_TIMEOUT | 300000 | Job timeout in milliseconds (default: 5 minutes) |
| STARTUP_BATCH_ENABLED | true | Enable batch processing of queued jobs on startup |
| STARTUP_BATCH_LIMIT | 50 | Maximum jobs to process in startup batch |
Note: Background processing requires a running Redis server. Use REDIS_URL for simple setups or individual settings (REDIS_HOST, REDIS_PORT, REDIS_PASSWORD, REDIS_DB) for more control.
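The precedence rule in the note above can be sketched as a small resolver (hypothetical code; the package's actual resolution logic is not documented and may differ):

```typescript
// Sketch: REDIS_URL, when present, wins over the individual settings.
type RedisConfig =
  | { url: string }
  | { host: string; port: number; password?: string; db: number };

function resolveRedisConfig(env: Record<string, string | undefined>): RedisConfig {
  if (env.REDIS_URL) return { url: env.REDIS_URL };
  // Fall back to individual settings with the documented defaults.
  return {
    host: env.REDIS_HOST ?? "localhost",
    port: Number(env.REDIS_PORT ?? 6379),
    password: env.REDIS_PASSWORD,
    db: Number(env.REDIS_DB ?? 0),
  };
}
```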
Logging
| Variable | Default | Description |
|----------|---------|-------------|
| LOG_LEVEL | info | Log level: debug, info, warn, error |
| LOG_FORMAT | json | Log format: json or text |
Architecture
┌─────────────────────────────────────────────────────────┐
│                Claude Code / MCP Client                 │
└──────────────────────┬──────────────────────────────────┘
                       │ MCP Protocol (JSON-RPC)
                       │
┌──────────────────────▼──────────────────────────────────┐
│                 MCP Git Codebase Server                 │
│  ┌─────────────┬──────────────┬────────────────────┐    │
│  │    Tools    │   Indexing   │  Background Jobs   │    │
│  │  (4 tools)  │   Pipeline   │   (Bull + Redis)   │    │
│  └─────────────┴──────────────┴────────────────────┘    │
└──────────────────────┬──────────────────────────────────┘
                       │
        ┌──────────────┼──────────────┐
        │              │              │
        ▼              ▼              ▼
    Git Repo    Vector Database   Embedding Model
     (local)     (Qdrant/etc)     (OpenAI/Ollama)
Data Flow
Indexing Pipeline
- Extract code units (functions, classes, etc.) using tree-sitter parsers
- Generate embeddings via selected provider
- Store in vector database with metadata
- Track indexing state and checkpoints
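The extraction step above relies on tree-sitter parsers in the real pipeline. To illustrate only the shape of the output (a code unit with a 1-indexed line range, matching what `get_code_snippet` consumes), here is a deliberately naive brace-counting stand-in; `extractFunctions` is hypothetical and not how the package actually parses code:

```typescript
// A code unit as the indexing pipeline might represent it.
type CodeUnit = { name: string; startLine: number; endLine: number; text: string };

// Naive extractor: find top-level `function name(...)` declarations in a
// JS/TS source string and track brace depth to find where each one ends.
function extractFunctions(source: string): CodeUnit[] {
  const lines = source.split("\n");
  const units: CodeUnit[] = [];
  let current: { name: string; start: number } | null = null;
  let depth = 0;
  lines.forEach((line, i) => {
    const m = line.match(/^(?:export\s+)?(?:async\s+)?function\s+(\w+)/);
    if (m && depth === 0) current = { name: m[1], start: i };
    depth += (line.match(/{/g) ?? []).length - (line.match(/}/g) ?? []).length;
    if (current && depth === 0 && line.includes("}")) {
      units.push({
        name: current.name,
        startLine: current.start + 1, // 1-indexed line numbers
        endLine: i + 1,
        text: lines.slice(current.start, i + 1).join("\n"),
      });
      current = null;
    }
  });
  return units;
}
```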
Query Pipeline
- Convert query to embedding
- Perform vector similarity search
- Re-rank results with BM25/custom rerankers
- Return top matches with context
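At its core, the query pipeline above reduces to comparing embeddings. A minimal cosine-similarity scan using the documented `similarity_threshold` (0.6) and `limit` (5) defaults might look like this; in practice the server delegates this search to the configured vector database rather than scanning in memory:

```typescript
// A stored snippet embedding, as the vector database might hold it.
type Stored = { filepath: string; embedding: number[] };

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score every stored embedding, drop those below the threshold,
// and return the top `limit` matches.
function search(queryEmbedding: number[], index: Stored[], threshold = 0.6, limit = 5) {
  return index
    .map((s) => ({ filepath: s.filepath, score: cosine(queryEmbedding, s.embedding) }))
    .filter((r) => r.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```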
Background Processing
- Bull job queue backed by Redis
- Async job processing with retry logic
- Failed job persistence and recovery
Supported Languages
- TypeScript / JavaScript
- Python
- Go
- Java
- Rust
- C/C++ (via tree-sitter)
- Ruby
- PHP
- Kotlin
- Scala
- Swift
- Bash
- Robot Framework
Configuration Examples
Local Development with Qdrant
# Start Qdrant (requires Docker)
docker run -p 6333:6333 qdrant/qdrant
# Set environment variables
export VECTOR_DB_PROVIDER=qdrant
export QDRANT_URL=http://localhost:6333
export EMBEDDING_PROVIDER=ollama
export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_EMBEDDING_MODEL=nomic-embed-text
# Start server
npx @devpuccino/mcp-git-codebase
Production with Pinecone
# Set environment variables
export VECTOR_DB_PROVIDER=pinecone
export PINECONE_API_KEY=your-production-key
export PINECONE_ENVIRONMENT=your-environment
export PINECONE_INDEX=prod-codebase
export EMBEDDING_PROVIDER=openai
export OPENAI_API_KEY=sk-...
export OPENAI_EMBEDDING_MODEL=text-embedding-3-small
# Start server
npx @devpuccino/mcp-git-codebase
PostgreSQL with pgvector
# Set environment variables
export VECTOR_DB_PROVIDER=postgres
export DATABASE_URL=postgresql://user:password@host:5432/codebase_db
export EMBEDDING_PROVIDER=openai
export OPENAI_API_KEY=sk-...
export OPENAI_EMBEDDING_MODEL=text-embedding-3-small
# Start server
npx @devpuccino/mcp-git-codebase
Performance Considerations
- Embedding Generation: Largest cost factor (~100ms per code unit)
- Vector Search: Sub-100ms for typical queries
- Code Extraction: ~50-200ms per file depending on size
- Indexing Speed: ~1000-2000 code units per minute
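From the figures above, a back-of-envelope estimate of full-index time. The midpoint throughput of 1,500 units per minute is my assumption from the stated 1000-2000 range, not a documented number:

```typescript
// Rough full-index time estimate from the documented throughput range.
function estimateIndexMinutes(codeUnits: number, unitsPerMinute = 1500): number {
  return Math.ceil(codeUnits / unitsPerMinute);
}
```

For example, a repository yielding 150,000 code units would take on the order of 100 minutes to index at that rate, which is why `background=true` is recommended for large codebases.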
Optimization Tips:
- Use background=true for large codebases
- Set appropriate CONSUMER_CONCURRENCY based on resources
- Implement incremental indexing via update_codebase
- Filter by file_extensions to reduce scope
- Use a higher similarity_threshold if too many results are returned
Troubleshooting
Connection Issues
# Verify vector database is running
curl http://localhost:6333/health # Qdrant
curl http://localhost:8000/api/v1/heartbeat # Chroma
# Check logs
export LOG_LEVEL=debug
npx @devpuccino/mcp-git-codebase
Embedding Model Issues
# Verify Ollama is running
curl http://localhost:11434/api/tags
# Or verify OpenAI API key
echo $OPENAI_API_KEY
Out of Memory
- Reduce CONSUMER_CONCURRENCY
- Process smaller repositories first
- Enable background=true for large syncs
Development
# Clone repository
git clone https://github.com/devpuccino/mcp-git-codebase.git
cd mcp-git-codebase
# Install dependencies
npm install
# Build
npm run build
# Run tests
npm test
# Start in development mode
npm run dev
License
MIT
Support
For issues, questions, or feature requests, please visit the GitHub repository.
