@wildcard-ai/deepcodex

v0.0.8

Published

8 months ago

Advanced codebase indexing and semantic search MCP server

kaushik-wildcard

smahorker

mcp claude-code semantic-search codebase-indexing ast-parsing vector-search jina-ai turbopuffer

Intelligent Context MCP

A Model Context Protocol (MCP) server by Wildcard Corporation that provides codebase indexing and semantic search for Claude Code through slash commands, natural language queries, and direct MCP tool calls.

🎯 Features

🚀 Multiple Interface Options

Slash Commands: /index, /search, /status, /clear, /context, /deps, /help
Natural Language: "Find authentication functions", "Index my codebase"
Traditional MCP Tools: Direct tool calls for programmatic access

🧠 Advanced Intelligence

Tree-sitter semantic chunking - AST-based code chunking creating meaningful units (complete functions, classes, interfaces)
Smart token limit handling - Automatic content truncation with intelligent boundary detection for API limits
Content quality filtering - Excludes test files, generated code, and low-quality content
Dependency graph analysis - Cross-file relationship mapping and context expansion
Incremental indexing - Only re-indexes changed files and dependencies
Multi-strategy search - Semantic, hybrid, BM25, and structural search options
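
The incremental indexing idea above can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's code: `planReindex`, the per-file hash maps, and the `dependents` map are hypothetical names, and "dependencies" is interpreted here as the files that directly import a changed file.

```typescript
// Hypothetical sketch: decide which files to re-index after a change.
// `previous` and `current` map a file path to a content hash;
// `dependents` maps a file path to the files that import it.
// None of these names come from the package itself.
type HashMap = Map<string, string>;

function planReindex(
  previous: HashMap,
  current: HashMap,
  dependents: Map<string, string[]>
): string[] {
  const toIndex = new Set<string>();

  // A file counts as changed if it is new or its hash differs from the last run.
  current.forEach((hash, path) => {
    if (previous.get(path) !== hash) toIndex.add(path);
  });

  // Also re-index direct dependents of changed files (if they still exist),
  // because their cross-file context may now be stale.
  // Array.from takes a snapshot, so only one level of dependents is added.
  Array.from(toIndex).forEach((path) => {
    for (const dependent of dependents.get(path) ?? []) {
      if (current.has(dependent)) toIndex.add(dependent);
    }
  });

  return Array.from(toIndex).sort();
}
```

Unchanged files with no changed imports are skipped entirely, which is where the savings on large codebases would come from.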

🤖 AI-Powered Features

Result Reranking - Jina reranker-v2 optimizes relevance scoring
Local BM25 Search - SQLite full-text search for exact keyword matching
Hybrid Fusion - Combines vector similarity with BM25 for best results
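
How the vector and BM25 result lists are fused is not documented here. One common technique for combining ranked lists is reciprocal rank fusion (RRF), sketched below purely as an illustration; the function name, the chunk IDs, and the constant `k = 60` are assumptions, not the package's implementation.

```typescript
// Reciprocal rank fusion (RRF): score(d) = sum over lists of 1 / (k + rank(d)).
// Assumption: this is one possible fusion method, not necessarily the one
// the package uses.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.keys()).sort(
    (a, b) => (scores.get(b) as number) - (scores.get(a) as number)
  );
}

// Hypothetical chunk IDs returned by each search strategy.
const vectorHits = ["auth.ts#login", "user.ts#create", "db.ts#connect"];
const bm25Hits = ["token.ts#verify", "auth.ts#login", "user.ts#create"];
const fused = reciprocalRankFusion([vectorHits, bm25Hits]);
```

Chunks that appear in both lists accumulate score from each, so `auth.ts#login` ranks first here even though BM25 alone placed it second.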

🔧 Production Ready

Real API integration - Jina AI embeddings + Turbopuffer vector storage
Multi-language support - 30+ programming languages with Tree-sitter parsing
Error handling - Graceful degradation and comprehensive error reporting
Performance optimized - Batch processing and intelligent caching

🚀 Installation & Setup

1. Install Dependencies

npm install
npm run build

2. Get API Keys

Required APIs:

Jina AI: Get your API key from Jina AI - For embeddings and reranking
Turbopuffer: Get your API key from Turbopuffer - For vector storage

3. Add to Claude Code

# Install via npm
npm install -g @wildcard-corp/intelligent-context-mcp

# Add to Claude Code
claude mcp add intelligent-context \
  -e JINA_API_KEY=your-jina-key \
  -e TURBOPUFFER_API_KEY=your-turbopuffer-key \
  -- npx @wildcard-corp/intelligent-context-mcp

🎮 Usage

Slash Commands (Recommended)

Execute commands using slash syntax:

# Index your codebase with intelligent chunking
/index /path/to/your/project

# Search with semantic understanding
/search authentication implementation
/search user registration flow  
/search database connection setup

# Check indexing status
/status
/status /specific/project/path

# Get focused context for specific files or symbols
/context src/auth.js --with-deps
/context UserService --window=10

# Analyze dependencies
/deps src/user.js --reverse
/deps AuthController --graph

# Clear index data
/clear --confirm
/clear /path/to/project --confirm

# Get help
/help
/help search
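
The command shapes above (a command name, positional arguments, and `--flag` or `--flag=value` options) could be parsed along these lines. This is a minimal sketch, not the package's Command Parser: `parseSlashCommand` and `ParsedCommand` are hypothetical names, and quoted arguments are deliberately not handled.

```typescript
// Hypothetical parser for inputs such as "/context UserService --window=10".
interface ParsedCommand {
  name: string;
  args: string[];
  flags: Record<string, string | boolean>;
}

function parseSlashCommand(input: string): ParsedCommand | null {
  const tokens = input.trim().split(/\s+/);
  const head = tokens[0] ?? "";
  // Only the first token decides whether this is a slash command,
  // so path arguments like "/path/to/project" are still accepted.
  if (!head.startsWith("/") || head.length < 2) return null;

  const parsed: ParsedCommand = { name: head.slice(1), args: [], flags: {} };
  for (const token of tokens.slice(1)) {
    if (token.startsWith("--")) {
      const eq = token.indexOf("=");
      if (eq === -1) {
        parsed.flags[token.slice(2)] = true; // bare flag, e.g. --confirm
      } else {
        parsed.flags[token.slice(2, eq)] = token.slice(eq + 1); // --window=10
      }
    } else {
      parsed.args.push(token);
    }
  }
  return parsed;
}
```

Flag values stay as strings in this sketch; a real command registry would presumably validate and convert them per command.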

Natural Language Interface

Use conversational queries:

"Find all authentication functions"
"Show me the user registration flow"
"Index my codebase at /path/to/project"
"What's the status of my index?"
"Get context for the login function"

Traditional MCP Tools

Direct tool calls for programmatic access:

execute_slash_command - Execute any slash command
natural_language_query - Process natural language queries
index_codebase_intelligent - Direct indexing (legacy)
search_with_intelligence - Direct search (legacy)
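
For programmatic access, an MCP client invokes a tool with a JSON-RPC `tools/call` request. The sketch below builds such a request for `execute_slash_command`. The envelope follows the MCP specification, but the `command` argument key is an assumption: the tool's input schema is not documented here, so check the schema the server reports via `tools/list` before relying on it.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0 envelope).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

function buildToolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// "command" is an assumed argument name, used for illustration only.
const request = buildToolCall("execute_slash_command", {
  command: "/search authentication implementation",
});
const wire = JSON.stringify(request);
```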

⚙️ Configuration

Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| JINA_API_KEY | ✅ | Your Jina AI API key for embeddings |
| TURBOPUFFER_API_KEY | ✅ | Your Turbopuffer API key for vector storage |
| LOG_LEVEL | ❌ | Logging level: debug, info, warn, error (default: info) |
| CODEX_CONTEXT_DATA_DIR | ❌ | Data storage directory (default: ~/.codex-context) |
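
A loader that applies the defaults and required checks from the table above might look like this. It is a sketch only: `loadConfig` and `ServerConfig` are hypothetical names, and the environment is passed in as a plain object so the function is easy to test.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

// Hypothetical config shape; field names are illustrative.
interface ServerConfig {
  jinaApiKey: string;
  turbopufferApiKey: string;
  logLevel: LogLevel;
  dataDir: string;
}

const LOG_LEVELS: LogLevel[] = ["debug", "info", "warn", "error"];

function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  // Both API keys are required; report all missing ones at once.
  const missing = ["JINA_API_KEY", "TURBOPUFFER_API_KEY"].filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error("Missing required environment variables: " + missing.join(", "));
  }

  const level = (env["LOG_LEVEL"] ?? "info") as LogLevel;
  if (LOG_LEVELS.indexOf(level) === -1) {
    throw new Error("Invalid LOG_LEVEL: " + level);
  }

  return {
    jinaApiKey: env["JINA_API_KEY"] as string,
    turbopufferApiKey: env["TURBOPUFFER_API_KEY"] as string,
    logLevel: level,
    // The "~" is kept literal here; expanding it is left to the caller.
    dataDir: env["CODEX_CONTEXT_DATA_DIR"] ?? "~/.codex-context",
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)`.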

API Integration Details

Jina AI Embeddings: Uses jina-embeddings-v3 model with 1024 dimensions
Turbopuffer Storage: Vector storage with cosine distance similarity
Tree-sitter Parsing: AST parsing for accurate symbol extraction
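
For intuition about the distance metric named above, cosine distance is conventionally defined as one minus cosine similarity, so identical directions score 0 and opposite directions score 2. The function below is a plain local illustration of that definition; in the real pipeline the comparison happens inside the vector store, and `cosineDistance` is a hypothetical helper name.

```typescript
// Cosine distance = 1 - (a · b) / (|a| * |b|).
// Illustrative only; real embeddings would have 1024 components.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine distance is undefined for zero vectors");
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Lower values mean more similar chunks, which is why search results are ordered by ascending distance or, equivalently, descending similarity.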

🏗️ Architecture

Enhanced Two-Layer Design

graph TD
    A[Claude Code CLI] --> B[Enhanced MCP Interface]
    B --> C{Interface Type}
    C -->|Slash Commands| D[Command Parser]
    C -->|Natural Language| E[NL Interpreter] 
    C -->|Direct Tools| F[Tool Handler]
    D --> G[Integration Bridge]
    E --> G
    F --> G
    G --> H[Core Components]
    H --> I[IndexingOrchestrator]
    H --> J[SemanticSearchEngine]
    H --> K[TreeSitterSymbolExtractor]
    I --> L[Real APIs]
    J --> L
    K --> L

Core Components

Enhanced MCP Interface (src/enhanced-mcp.ts)
- Primary interface with slash commands and natural language
- MCP protocol handling and state management
- Command registry with extensible architecture
Integration Bridge (src/standalone-mcp-integration.ts)
- Connects interface layer with core components
- Real API integration (Jina AI + Turbopuffer)
- Data format conversion and error handling
Core Intelligence Components (src/core/)
- IndexingOrchestrator: Advanced codebase indexing
- SemanticSearchEngine: Multi-strategy intelligent search
- TreeSitterSymbolExtractor: AST-based symbol extraction
- ContentFilterProvider: Quality filtering and noise reduction
- IncrementalIndexer: Efficient change detection and updates

📊 Performance & Quality

Indexing Performance

Symbol-boundary chunking: No arbitrary line splits
Content filtering: Excludes ~40% of noise (tests, configs, generated code)
Incremental updates: Only processes changed files
Batch processing: Optimized API calls and vector uploads
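
Batch processing, as listed above, generally means grouping chunks so that each embedding or upload request carries many items instead of one. A minimal sketch follows, with hypothetical helper names and an arbitrary caller-supplied batch size; the package's actual batch sizes and request logic are not documented here.

```typescript
// Split items into consecutive batches of at most `batchSize` elements.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Run one handler call per batch, in order, and flatten the results.
// `handler` stands in for an API call such as "embed these texts".
async function processInBatches<T, R>(
  items: T[],
  batchSize: number,
  handler: (batch: T[]) => Promise<R[]>
): Promise<R[]> {
  const results: R[] = [];
  for (const batch of toBatches(items, batchSize)) {
    const batchResults = await handler(batch);
    for (const result of batchResults) results.push(result);
  }
  return results;
}
```

With 3,821 chunks and a batch size of 100, for example, this makes 39 requests rather than 3,821.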

Search Quality

Multi-stage ranking: Vector similarity + optional reranking
Dependency expansion: Finds related code across file boundaries
Context windows: Configurable context around matches
Symbol awareness: Understands code structure and relationships

🔍 Examples

Indexing a React Project

/index /path/to/react-app --force
# ✅ Successfully indexed 1,247 files into 3,821 intelligent chunks
# 🔍 Ready for intelligent search with `/search <query>`

Finding Authentication Code

/search user authentication login
# 🔍 Found 15 results (234ms):
# 
# **src/auth/AuthService.ts:45-67** (0.923)
# ```typescript
# async authenticateUser(credentials: LoginCredentials): Promise<AuthResult> {
#   const user = await this.userRepository.findByEmail(credentials.email);
#   if (!user || !await this.verifyPassword(credentials.password, user.hashedPassword)) {
#     throw new UnauthorizedError('Invalid credentials');
#   }
#   return this.generateAuthTokens(user);
# }
# ```

Getting Context with Dependencies

/context src/auth/AuthService.ts --with-deps
# 📋 Context for AuthService with dependencies:
# - Depends on: UserRepository, TokenService, PasswordHasher
# - Used by: LoginController, SignupController, AuthMiddleware
# - Related symbols: authenticateUser, verifyPassword, generateTokens

🎉 Benefits

For Developers

Intuitive interface with familiar slash commands
Natural language queries for non-technical stakeholders
Comprehensive context with dependency awareness
Fast search with semantic understanding

For Teams

Consistent indexing with quality filtering
Cross-file understanding via dependency analysis
Incremental updates for active development
Production ready with robust error handling

🚀 Advanced Usage

Custom Search Strategies

# Semantic search for concepts
/search "how does authentication work" --type=semantic

# Structural search for patterns  
/search "function.*login.*password" --type=structural

# Hybrid search combining both
/search authentication --type=hybrid

Dependency Analysis

# Find all files that depend on AuthService
/deps AuthService --reverse

# Show dependency graph for authentication module
/deps src/auth/ --graph --depth=3
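
A reverse dependency lookup with a depth limit, in the spirit of `--reverse` and `--depth`, can be sketched as a breadth-first walk over an inverted import graph. The graph shape and the `reverseDependencies` function are hypothetical; how the package actually stores and traverses its dependency data is not documented here.

```typescript
// file -> files it imports (hypothetical in-memory representation)
type DependencyGraph = Map<string, string[]>;

function reverseDependencies(graph: DependencyGraph, target: string, maxDepth = 1): string[] {
  // Invert the graph: file -> files that import it.
  const importedBy = new Map<string, string[]>();
  graph.forEach((imports, file) => {
    for (const imported of imports) {
      const list = importedBy.get(imported) ?? [];
      list.push(file);
      importedBy.set(imported, list);
    }
  });

  // Breadth-first walk, one level of importers per depth step.
  const seen = new Set<string>([target]);
  let frontier = [target];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const parent of importedBy.get(node) ?? []) {
        if (!seen.has(parent)) {
          seen.add(parent);
          next.push(parent);
        }
      }
    }
    frontier = next;
  }

  seen.delete(target); // the target is not its own dependent
  return Array.from(seen).sort();
}
```

The `seen` set keeps the walk from looping forever when the import graph contains cycles.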

Context Windows

# Get 20 lines of context around matches
/search database connection --window=20

# Focus on specific symbol types
/context UserService --focus=functions --with-deps

🛠️ Development

Project Structure

├── src/                    # Source code
│   ├── core/              # Core indexing and search logic
│   ├── services/          # Search and utility services  
│   ├── utils/             # Utilities (Logger, FileUtils, etc.)
│   └── types/             # TypeScript type definitions
├── tests/                 # Essential test files
├── scripts/               # Utility scripts
└── dist/                  # Compiled JavaScript output

Running Tests

# Set environment variables
export JINA_API_KEY="your_jina_api_key"
export TURBOPUFFER_API_KEY="your_turbopuffer_key"

# Test semantic chunking quality
node tests/final-chunking-validation.mjs

# Test search integration end-to-end
node tests/test-improved-search-quality.mjs

# Test search result quality directly
node tests/test-search-results-direct.mjs

# Test MCP server functionality  
node tests/test-mcp-tools-directly.mjs

Key Components

TreeSitterChunkExtractor - AST-based semantic code chunking
HybridSearchService - Vector + BM25 search fusion
StandaloneCodexMcp - Main MCP server integration
IndexingOrchestrator - Coordinates the indexing pipeline

🎯 What Makes This Special

Unlike simple text search tools, this MCP provides:

True code understanding via AST parsing and symbol extraction
Quality-focused indexing that filters out noise automatically
Dependency-aware search that finds related code across files
Multiple interfaces to suit different user preferences
Production deployment with real API integrations
Incremental efficiency for large, active codebases

Perfect for teams using Claude Code who need intelligent codebase exploration and context-aware development assistance.

Ready to enhance your codebase exploration with Claude Code! 🚀