cindex

Semantic code search and context retrieval for large codebases

A Model Context Protocol (MCP) server that provides intelligent code search and context retrieval for Claude Code. Handles 1M+ lines of code with accuracy-first design.

Features

  • Semantic Search - Vector embeddings for intelligent code discovery
  • Hybrid Search - Combines vector similarity with PostgreSQL full-text search for better natural language query handling
  • 9-Stage Retrieval Pipeline - Scope filtering → query → files → chunks → symbols → imports → APIs → dedup → assembly
  • Multi-Project Support - Monorepo, microservices, and reference repository indexing
  • Scope Filtering - Global, repository, service, and boundary-aware search modes
  • API Contract Search - Semantic search for REST/GraphQL/gRPC endpoints
  • Query Caching - LRU cache with 80%+ hit rate (cached queries ~50ms)
  • Progress Notifications - Real-time 9-stage pipeline tracking
  • Incremental Indexing - Only re-index changed files
  • Import Chain Analysis - Automatic dependency resolution
  • Deduplication - Remove duplicate utility functions
  • Large Codebase Support - Efficiently handles 1M+ LoC
  • Claude Code Integration - Native MCP server with 17 tools
  • Accuracy-First - Default settings optimized for relevance
  • Configurable Models - Swap embedding/LLM models via env vars

Performance

  • Indexing Speed: 300-600 files/min (with LLM summaries)
  • Query Speed: First query ~800ms, cached queries ~50ms
  • Cache Hit Rate: 80%+ for repeated queries
  • Codebase Scale: Efficiently handles 1M+ lines of code
  • Memory Efficient: LRU caching with configurable limits
  • Real-Time Progress: 9-stage pipeline notifications

Supported Languages

12 languages with full tree-sitter parsing: TypeScript, JavaScript, Python, Java, Go, Rust, C, C++, C#, PHP, Ruby, Kotlin. Swift and other languages use regex fallback parsing.

Prerequisites

Before installing cindex, you need:

1. PostgreSQL with pgvector

PostgreSQL 16+ with pgvector extension for vector similarity search:

# Ubuntu/Debian
sudo apt install postgresql-16 postgresql-16-pgvector

# macOS
brew install postgresql@16 pgvector

# Start PostgreSQL
sudo systemctl start postgresql  # Linux
brew services start postgresql@16  # macOS
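
To confirm the pgvector extension is actually available to your server, you can query the extension catalog (a quick check, assuming the default postgres superuser):

psql -U postgres -c "SELECT name, default_version FROM pg_available_extensions WHERE name = 'vector';"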

2. Ollama with Models

cindex uses Ollama for local LLM inference and requires two models:

Embedding Model (for vector generation):

# Install Ollama
curl https://ollama.ai/install.sh | sh

# Pull embedding model (bge-m3:567m recommended)
ollama pull bge-m3:567m

Coding Model (for file summaries and analysis):

# Pull coding model (qwen2.5-coder:7b recommended)
ollama pull qwen2.5-coder:7b

# Alternative for faster indexing (lower quality):
# ollama pull qwen2.5-coder:1.5b
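
Once both models are pulled, you can confirm Ollama is running and serving them (assuming the default endpoint on port 11434):

# List locally available models
ollama list

# Or query the HTTP API directly
curl http://localhost:11434/api/tags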

Model Options:

  • Embedding: bge-m3:567m (1024 dims, 8K context) - Best accuracy
  • Summary: qwen2.5-coder:7b (32K context) - High quality, RTX 4060+ recommended
  • Summary: qwen2.5-coder:3b (32K context) - Balanced
  • Summary: qwen2.5-coder:1.5b (32K context) - Fast indexing, lower quality

Installation

Database Setup

Create and initialize the cindex database:

# Create database
createdb cindex_rag_codebase

# Initialize schema (after installing cindex - see next section)
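
If the schema file does not enable pgvector itself, you can enable the extension in the new database manually (a sketch, assuming your role has the CREATE EXTENSION privilege):

psql cindex_rag_codebase -c "CREATE EXTENSION IF NOT EXISTS vector;"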

Install MCP Server

Add cindex to Claude Code using the CLI. You can install for personal use (user scope) or share with your team (project scope).

Quick Install (Personal Use)

Install for all your projects:

claude mcp add cindex --scope user --transport stdio \
  --env POSTGRES_PASSWORD="your_password" \
  -- npx -y @gianged/cindex

Team Install (Shared via Git)

Install for the current project (creates .mcp.json in project root):

claude mcp add cindex --scope project --transport stdio \
  --env POSTGRES_PASSWORD="your_password" \
  -- npx -y @gianged/cindex

Note: For project scope, set POSTGRES_PASSWORD as an environment variable on your system and reference it in the command. Never commit actual secrets to version control.
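
For example, one way to keep the real password out of the shared .mcp.json is to store only the ${POSTGRES_PASSWORD} placeholder in the config and set the value in your local shell (a sketch; it assumes Claude Code expands ${VAR} references at runtime, as in the manual project-scope config shown below):

export POSTGRES_PASSWORD="your_password"   # set locally, never committed
claude mcp add cindex --scope project --transport stdio \
  --env POSTGRES_PASSWORD='${POSTGRES_PASSWORD}' \
  -- npx -y @gianged/cindex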

Custom Configuration

Add additional environment variables using multiple --env flags:

claude mcp add cindex --scope user --transport stdio \
  --env POSTGRES_PASSWORD="your_password" \
  --env POSTGRES_HOST="localhost" \
  --env POSTGRES_DB="cindex_rag_codebase" \
  --env EMBEDDING_MODEL="bge-m3:567m" \
  --env SUMMARY_MODEL="qwen2.5-coder:7b" \
  -- npx -y @gianged/cindex

See Environment Variables section below for all available configuration options.

Manual Configuration (Alternative)

If you prefer to manually edit configuration files, you can add cindex to:

User Scope (~/.claude.json):

{
  "mcpServers": {
    "cindex": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@gianged/cindex"],
      "env": {
        "POSTGRES_PASSWORD": "your_password"
      }
    }
  }
}

Project Scope (.mcp.json in project root):

{
  "mcpServers": {
    "cindex": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@gianged/cindex"],
      "env": {
        "POSTGRES_HOST": "${POSTGRES_HOST:-localhost}",
        "POSTGRES_PORT": "${POSTGRES_PORT:-5432}",
        "POSTGRES_DB": "${POSTGRES_DB:-cindex_rag_codebase}",
        "POSTGRES_USER": "${POSTGRES_USER:-postgres}",
        "POSTGRES_PASSWORD": "${POSTGRES_PASSWORD}"
      }
    }
  }
}

Initialize Database Schema

After configuring MCP, initialize the database schema:

# Download schema file
curl -o database.sql https://raw.githubusercontent.com/gianged/cindex/main/database.sql

# Apply schema
psql cindex_rag_codebase < database.sql
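
To sanity-check the setup, list the objects the schema created (exact table names depend on the schema version):

psql cindex_rag_codebase -c "\dt"
psql cindex_rag_codebase -c "\dx"  # the vector extension should appear here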

Start Using

  1. Open Claude Code
  2. Use the index_repository tool to index your codebase
  3. Use search_codebase to find relevant code

Environment Variables

All configuration is done through environment variables in your MCP config file.

Model Configuration

| Variable                 | Default                | Range       | Description                                   |
| ------------------------ | ---------------------- | ----------- | --------------------------------------------- |
| EMBEDDING_MODEL          | bge-m3:567m            | -           | Ollama embedding model for vector generation  |
| EMBEDDING_DIMENSIONS     | 1024                   | 1-4096      | Vector dimensions (must match model output)   |
| EMBEDDING_CONTEXT_WINDOW | 4096                   | 512-131072  | Token limit for embedding model               |
| SUMMARY_MODEL            | qwen2.5-coder:7b       | -           | Ollama model for file summaries               |
| SUMMARY_CONTEXT_WINDOW   | 4096                   | 512-131072  | Token limit for summary model                 |
| OLLAMA_HOST              | http://localhost:11434 | -           | Ollama API endpoint                           |
| OLLAMA_TIMEOUT           | 30000                  | 1000-300000 | Request timeout in milliseconds               |

Context Window Notes:

  • Default 4096 matches Ollama's default and is sufficient (cindex uses first 100 lines per file)
  • Higher values = more VRAM usage + slower inference
  • qwen2.5-coder:7b supports up to 32K tokens
  • bge-m3:567m supports up to 8K tokens
  • Increase only if you encounter issues with large files

Database Configuration

| Variable                 | Default             | Range   | Description                     |
| ------------------------ | ------------------- | ------- | ------------------------------- |
| POSTGRES_HOST            | localhost           | -       | PostgreSQL server hostname      |
| POSTGRES_PORT            | 5432                | 1-65535 | PostgreSQL server port          |
| POSTGRES_DB              | cindex_rag_codebase | -       | Database name                   |
| POSTGRES_USER            | postgres            | -       | Database user                   |
| POSTGRES_PASSWORD        | required            | -       | Database password (must be set) |
| POSTGRES_MAX_CONNECTIONS | 10                  | 1-100   | Maximum connection pool size    |

Performance Tuning

| Variable                   | Default | Range   | Description                                          |
| -------------------------- | ------- | ------- | ---------------------------------------------------- |
| HNSW_EF_SEARCH             | 300     | 10-1000 | HNSW search quality (higher = more accurate, slower) |
| HNSW_EF_CONSTRUCTION       | 200     | 10-1000 | HNSW index quality (higher = better index)           |
| SIMILARITY_THRESHOLD       | 0.3     | 0.0-1.0 | Minimum similarity for file-level retrieval          |
| CHUNK_SIMILARITY_THRESHOLD | 0.2     | 0.0-1.0 | Minimum similarity for chunk-level retrieval         |
| DEDUP_THRESHOLD            | 0.92    | 0.0-1.0 | Similarity threshold for deduplication               |
| HYBRID_VECTOR_WEIGHT       | 0.7     | 0.0-1.0 | Weight for vector similarity in hybrid search        |
| HYBRID_KEYWORD_WEIGHT      | 0.3     | 0.0-1.0 | Weight for keyword (BM25) score in hybrid search     |
| IMPORT_DEPTH               | 3       | 1-10    | Maximum import chain traversal depth                 |
| WORKSPACE_DEPTH            | 2       | 1-10    | Maximum workspace dependency depth                   |
| SERVICE_DEPTH              | 1       | 1-10    | Maximum service dependency depth                     |

Indexing Configuration

| Variable         | Default | Range      | Description                        |
| ---------------- | ------- | ---------- | ---------------------------------- |
| MAX_FILE_SIZE    | 5000    | 100-100000 | Maximum file size in lines         |
| INCLUDE_MARKDOWN | false   | true/false | Include markdown files in indexing |

Feature Flags

| Variable                      | Default | Range      | Description                             |
| ----------------------------- | ------- | ---------- | --------------------------------------- |
| ENABLE_WORKSPACE_DETECTION    | true    | true/false | Detect monorepo workspaces              |
| ENABLE_SERVICE_DETECTION      | true    | true/false | Detect microservices                    |
| ENABLE_MULTI_REPO             | false   | true/false | Enable multi-repository support         |
| ENABLE_API_ENDPOINT_DETECTION | true    | true/false | Parse API contracts (REST/GraphQL/gRPC) |
| ENABLE_HYBRID_SEARCH          | true    | true/false | Combine vector + full-text search       |

Example Configurations

Minimal Configuration

Only the required password:

{
  "mcpServers": {
    "cindex": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@gianged/cindex"],
      "env": {
        "POSTGRES_PASSWORD": "your_password"
      }
    }
  }
}

Full Configuration

All available settings with defaults shown:

{
  "mcpServers": {
    "cindex": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@gianged/cindex"],
      "env": {
        "EMBEDDING_MODEL": "bge-m3:567m",
        "EMBEDDING_DIMENSIONS": "1024",
        "EMBEDDING_CONTEXT_WINDOW": "4096",
        "SUMMARY_MODEL": "qwen2.5-coder:7b",
        "SUMMARY_CONTEXT_WINDOW": "4096",
        "OLLAMA_HOST": "http://localhost:11434",
        "POSTGRES_HOST": "localhost",
        "POSTGRES_PORT": "5432",
        "POSTGRES_DB": "cindex_rag_codebase",
        "POSTGRES_USER": "postgres",
        "POSTGRES_PASSWORD": "your_password",
        "HNSW_EF_SEARCH": "300",
        "HNSW_EF_CONSTRUCTION": "200",
        "SIMILARITY_THRESHOLD": "0.3",
        "CHUNK_SIMILARITY_THRESHOLD": "0.2",
        "DEDUP_THRESHOLD": "0.92"
      }
    }
  }
}

Speed-First Configuration

For faster indexing with lower quality:

{
  "mcpServers": {
    "cindex": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@gianged/cindex"],
      "env": {
        "POSTGRES_PASSWORD": "your_password",
        "SUMMARY_MODEL": "qwen2.5-coder:1.5b",
        "SUMMARY_CONTEXT_WINDOW": "4096",
        "HNSW_EF_SEARCH": "100",
        "HNSW_EF_CONSTRUCTION": "64",
        "SIMILARITY_THRESHOLD": "0.4",
        "CHUNK_SIMILARITY_THRESHOLD": "0.25",
        "DEDUP_THRESHOLD": "0.95"
      }
    }
  }
}

Performance:

  • Indexing: 500-1000 files/min (vs 300-600 files/min default)
  • Query Time: <500ms (vs <800ms default)
  • Relevance: >85% in top 10 results (vs >92% default)

Recommended Settings

RTX 4060 / 8GB VRAM (Tested Configuration)

| Setting                    | Value            | Notes                              |
| -------------------------- | ---------------- | ---------------------------------- |
| EMBEDDING_MODEL            | bge-m3:567m      | Best accuracy/speed balance        |
| SUMMARY_MODEL              | qwen2.5-coder:7b | Good summaries, fits in VRAM       |
| EMBEDDING_CONTEXT_WINDOW   | 4096             | Default, sufficient for most files |
| HNSW_EF_SEARCH             | 300              | High accuracy retrieval            |
| SIMILARITY_THRESHOLD       | 0.3              | File-level retrieval threshold     |
| CHUNK_SIMILARITY_THRESHOLD | 0.2              | Chunk-level retrieval threshold    |
| DEDUP_THRESHOLD            | 0.92             | Prevent duplicate results          |

Performance Expectations

  • Indexing: ~30 files/min (~70 chunks/min)
  • Search: <1 second per query
  • Codebase: Tested with 40k LoC (112 files)

Managing Configuration

Verify Installation

List all installed MCP servers:

claude mcp list

View cindex configuration:

claude mcp get cindex

Update Configuration

To update environment variables, remove and re-add with new settings:

claude mcp remove cindex
claude mcp add cindex --scope user --transport stdio \
  --env POSTGRES_PASSWORD="your_password" \
  --env SUMMARY_MODEL="qwen2.5-coder:3b" \
  -- npx -y @gianged/cindex

Switch to Speed-First Mode

For faster indexing with lower quality, use these settings:

claude mcp remove cindex
claude mcp add cindex --scope user --transport stdio \
  --env POSTGRES_PASSWORD="your_password" \
  --env SUMMARY_MODEL="qwen2.5-coder:1.5b" \
  --env HNSW_EF_SEARCH="100" \
  --env HNSW_EF_CONSTRUCTION="64" \
  --env SIMILARITY_THRESHOLD="0.4" \
  --env CHUNK_SIMILARITY_THRESHOLD="0.25" \
  --env DEDUP_THRESHOLD="0.95" \
  -- npx -y @gianged/cindex

Performance:

  • Indexing: 500-1000 files/min (vs 300-600 files/min default)
  • Query Time: <500ms (vs <800ms default)
  • Relevance: >85% in top 10 results (vs >92% default)

Remove Server

claude mcp remove cindex

MCP Tools

Status: 17 of 17 tools implemented

All tools provide structured output with syntax highlighting and comprehensive metadata.

Core Search Tools

search_codebase

Semantic code search with multi-stage retrieval and dependency analysis.

Parameters:

  • query (required) - Natural language search query
  • scope - Search scope: 'global', 'repository', 'service', or 'workspace'
  • repo_id - Filter by repository ID
  • service_id - Filter by service ID
  • workspace_id - Filter by workspace ID
  • max_results - Maximum results (1-100, default: 20)
  • similarity_threshold - Minimum similarity (0.0-1.0, default: 0.75)
  • include_dependencies - Include imported dependencies (default: false)

Returns: Markdown-formatted results with file paths, line numbers, code snippets, and relevance scores.

get_file_context

Get complete context for a specific file including callers, callees, and import chain.

Parameters:

  • file_path (required) - Absolute or relative file path
  • repo_id - Repository ID (optional if file path is unique)
  • include_callers - Include functions that call this file (default: true)
  • include_callees - Include functions called by this file (default: true)
  • include_imports - Include import chain (default: true)
  • max_depth - Import chain depth (1-5, default: 2)

Returns: File summary, symbols, dependencies, and related code context.

find_symbol_definition

Locate symbol definitions and optionally show usages across the codebase.

Parameters:

  • symbol_name (required) - Function, class, or variable name
  • repo_id - Filter by repository ID
  • file_path - Filter by file path
  • symbol_type - Filter by type: 'function', 'class', 'variable', 'interface', etc.
  • include_usages - Show where symbol is used (default: false)
  • max_usages - Maximum usage results (1-100, default: 50)

Returns: Symbol definitions with file paths, line numbers, signatures, and optional usage locations.

Repository Management Tools

index_repository

Index or re-index a repository with progress notifications and multi-project support.

Parameters:

  • repo_path (required) - Absolute path to repository root
  • repo_id - Repository identifier (default: directory name)
  • repo_type - Repository type: 'monolithic', 'microservice', 'monorepo', 'library', 'reference', or 'documentation'
  • force_reindex - Force full re-index (default: false, uses incremental indexing)
  • detect_workspaces - Detect monorepo workspaces (default: true)
  • detect_services - Detect microservices (default: true)
  • detect_api_endpoints - Parse API contracts (default: true)
  • service_config - Manual service configuration (optional)
  • version - Repository version for reference repos (e.g., 'v10.3.0')
  • metadata - Additional metadata (e.g., { upstream_url: '...' })

Returns: Indexing statistics including files indexed, chunks created, symbols extracted, workspaces/services detected, and timing information.

delete_repository

Delete one or more indexed repositories and all associated data.

Parameters:

  • repo_ids (required) - Array of repository IDs to delete

Returns: Deletion confirmation with statistics (files, chunks, symbols, workspaces, services removed).

list_indexed_repos

List all indexed repositories with optional metadata, workspace counts, and service counts.

Parameters:

  • include_metadata - Include repository metadata (default: true)
  • include_workspace_count - Include workspace count for monorepos (default: true)
  • include_service_count - Include service count for microservices (default: true)
  • repo_type_filter - Filter by repository type

Returns: List of repositories with IDs, types, file counts, last indexed time, and optional metadata.

Monorepo Tools

list_workspaces

List all workspaces in indexed repositories for monorepo support.

Parameters:

  • repo_id - Filter by repository ID (optional)
  • include_dependencies - Include dependency information (default: false)
  • include_metadata - Include package.json metadata (default: false)

Returns: List of workspaces with package names, paths, file counts, and optional dependencies.

get_workspace_context

Get full context for a workspace including dependencies and dependents.

Parameters:

  • workspace_id - Workspace ID (use list_workspaces to find)
  • package_name - Package name (alternative to workspace_id)
  • repo_id - Repository ID (required if using package_name)
  • include_dependencies - Include workspace dependencies (default: true)
  • include_dependents - Include workspaces that depend on this one (default: true)
  • dependency_depth - Dependency tree depth (1-5, default: 2)

Returns: Workspace metadata, dependency tree, dependent workspaces, and file list.

find_cross_workspace_usages

Find workspace package usages across the monorepo.

Parameters:

  • workspace_id - Source workspace ID
  • package_name - Source package name (alternative to workspace_id)
  • symbol_name - Specific symbol to track (optional)
  • include_indirect - Include indirect usages (default: false)
  • max_depth - Dependency chain depth (1-5, default: 2)

Returns: List of workspaces using the target package/symbol with file locations.

Microservice Tools

list_services

List all services across indexed repositories for microservice support.

Parameters:

  • repo_id - Filter by repository ID (optional)
  • service_type - Filter by type: 'docker', 'serverless', 'mobile' (optional)
  • include_dependencies - Include service dependencies (default: false)
  • include_api_endpoints - Include API endpoint counts (default: false)

Returns: List of services with IDs, names, types, file counts, and optional API information.

get_service_context

Get full context for a service including API contracts and dependencies.

Parameters:

  • service_id - Service ID (use list_services to find)
  • service_name - Service name (alternative to service_id)
  • repo_id - Repository ID (required if using service_name)
  • include_dependencies - Include service dependencies (default: true)
  • include_dependents - Include services that depend on this one (default: true)
  • include_api_contracts - Include API endpoint definitions (default: true)
  • dependency_depth - Dependency tree depth (1-5, default: 1)

Returns: Service metadata, API contracts (REST/GraphQL/gRPC), dependency graph, and file list.

find_cross_service_calls

Find inter-service API calls across microservices.

Parameters:

  • source_service_id - Source service ID (optional)
  • target_service_id - Target service ID (optional)
  • endpoint_pattern - Endpoint regex pattern (e.g., /api/users/.*, optional)
  • include_reverse - Also show calls in reverse direction (default: false)

Returns: List of inter-service API calls with endpoints, HTTP methods, and call counts.

API Contract Tools

search_api_contracts

Search API endpoints across services with semantic understanding.

Parameters:

  • query (required) - API search query (e.g., "user authentication endpoint")
  • api_types - Filter by type: ['rest', 'graphql', 'grpc'] (default: all)
  • service_filter - Filter by service IDs (optional)
  • repo_filter - Filter by repository IDs (optional)
  • include_deprecated - Include deprecated endpoints (default: false)
  • max_results - Maximum results (1-100, default: 20)
  • similarity_threshold - Minimum similarity (0.0-1.0, default: 0.70)

Returns: API endpoints with paths, HTTP methods, service names, implementation files, and similarity scores.

Reference & Documentation Tools

Tools for searching reference materials including markdown documentation (syntax references, Context7-fetched docs) AND reference repository code (indexed frameworks/libraries).

index_documentation

Index markdown files for documentation search. Works with explicit paths only.

Parameters:

  • paths (required) - Array of file or directory paths to index (e.g., ['syntax.md', '/docs/libraries/'])
  • doc_id - Document identifier (default: derived from path)
  • tags - Tags for filtering (e.g., ['typescript', 'react'])
  • force_reindex - Force re-index even if unchanged (default: false)

Returns: Indexing statistics including files indexed, sections created, code blocks extracted, and timing.

Workflow:

  1. Fetch documentation (e.g., from Context7)
  2. Save to markdown file
  3. Index with index_documentation
  4. Search with search_references

search_references

Search reference materials including markdown documentation AND reference repository code. Combines both sources for comprehensive reference search.

Parameters:

  • query (required) - Natural language search query
  • doc_ids - Filter by document IDs (optional)
  • tags - Filter by documentation tags (optional)
  • include_docs - Include markdown documentation results (default: true)
  • include_code - Include reference repository code results (default: true)
  • max_results - Maximum results per source (1-50, default: 10)
  • include_code_blocks - Include code blocks from documentation (default: true)
  • similarity_threshold - Minimum similarity (0.0-1.0, default: 0.65)

Returns: Combined results from both documentation chunks and reference repository code, with heading breadcrumbs, content snippets, code blocks, file paths, and relevance scores.

Note: Reference repositories are indexed using index_repository with repo_type: 'reference'. They are excluded from search_codebase by default and only searchable via search_references.

list_documentation

List all indexed documentation with metadata.

Parameters:

  • doc_ids - Filter by document IDs (optional)
  • tags - Filter by tags (optional)

Returns: List of indexed documents with file counts, section counts, code block counts, and indexed timestamps.

delete_documentation

Delete indexed documentation by document ID.

Parameters:

  • doc_ids (required) - Array of document IDs to delete

Returns: Deletion confirmation with chunks and files removed.


See docs/overview.md for complete tool documentation including multi-project/monorepo/microservice architecture details.

Architecture

Hybrid Search

Combines vector similarity search with PostgreSQL full-text search (tsvector/ts_rank_cd) for improved natural language query handling:

hybrid_score = (0.7 * vector_similarity) + (0.3 * keyword_score)

  • Vector search - Semantic understanding via embeddings
  • Keyword search - Exact term matching via PostgreSQL full-text search
  • Configurable weights via HYBRID_VECTOR_WEIGHT and HYBRID_KEYWORD_WEIGHT
  • Disable with ENABLE_HYBRID_SEARCH=false to use vector-only search
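
As a quick worked example with the default weights (illustrative numbers, not measured output): a chunk with vector similarity 0.80 and a normalized keyword score of 0.50 is ranked by

hybrid_score = (0.7 * 0.80) + (0.3 * 0.50) = 0.71

so semantic matches dominate the ranking while exact keyword hits act as a tie-breaker.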

Multi-Stage Retrieval

  1. File-Level - Find relevant files via summary embeddings + full-text search
  2. Chunk-Level - Locate specific code chunks (functions/classes)
  3. Symbol Resolution - Resolve imported symbols and dependencies
  4. Import Expansion - Build dependency graph (max 3 levels)
  5. Deduplication - Remove redundant code from results

Indexing Pipeline

  1. File discovery (respects .gitignore)
  2. Tree-sitter parsing (with regex fallback)
  3. Semantic chunking (functions, classes, blocks)
  4. LLM-based file summaries (configurable model)
  5. Embedding generation (configurable model)
  6. Full-text search vector generation (tsvector)
  7. PostgreSQL + pgvector storage

Performance Characteristics

Accuracy-First Mode (Default)

  • Indexing: 300-600 files/min
  • Query Time: <800ms
  • Relevance: >92% in top 10 results
  • Context Noise: <2%

Speed-First Mode

  • Indexing: 500-1000 files/min
  • Query Time: <500ms
  • Relevance: >85% in top 10 results

System Requirements

  • Node.js 22+ (for MCP server)
  • PostgreSQL 16+ with pgvector extension
  • Ollama with models installed
  • Disk Space: ~1GB per 100k LoC indexed
  • RAM: 8GB minimum (16GB+ recommended for large codebases)
  • GPU: Optional but recommended (RTX 3060+ for qwen2.5-coder:7b)

Troubleshooting

"Vector dimension mismatch"

Update EMBEDDING_DIMENSIONS in MCP config to match your model, then update vector dimensions in database.sql.
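
If you are unsure what dimension your embedding model produces, Ollama usually reports it in the model details (look for the embedding length field):

ollama show bge-m3:567m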

"Connection refused" to PostgreSQL

Check POSTGRES_HOST and POSTGRES_PORT in MCP config. Verify PostgreSQL is running:

sudo systemctl status postgresql  # Linux
brew services list  # macOS
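
A direct connection test using the same settings cindex will use can help isolate the problem (substitute your own host, port, and user):

psql -h localhost -p 5432 -U postgres -d cindex_rag_codebase -c "SELECT 1;"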

"Model not found" in Ollama

Pull the required models:

ollama pull bge-m3:567m
ollama pull qwen2.5-coder:7b

Verify models are available:

ollama list

Slow indexing

  • Use smaller summary model: qwen2.5-coder:1.5b instead of 7b
  • Reduce HNSW_EF_CONSTRUCTION to 64
  • Enable incremental indexing (default)

Low accuracy results

  • Increase HNSW_EF_SEARCH to 300-400
  • Raise SIMILARITY_THRESHOLD to 0.4-0.5 for stricter file matching
  • Raise CHUNK_SIMILARITY_THRESHOLD to 0.3-0.4 for stricter chunk matching
  • Use better summary model: qwen2.5-coder:3b or 7b
  • Lower DEDUP_THRESHOLD to 0.90-0.92

Documentation

See docs/overview.md for detailed documentation including:

  • Complete architecture details
  • Database schema
  • Configuration reference
  • Implementation guide
  • Performance tuning

Development

git clone https://github.com/gianged/cindex.git
cd cindex
npm install
npm run build
npm test

Implementation Status

  • Phase 1 (100%) - Database schema & type system
  • Phase 2 (100%) - File discovery, parsing, chunking, workspace/service detection
  • Phase 3 (100%) - Embeddings, summaries, API parsing, 12-language support, Docker/serverless/mobile detection
  • Phase 4 (100%) - Multi-stage retrieval pipeline (9-stage)
  • Phase 5 (100%) - MCP tools (17 of 17 implemented)
  • Phase 6 (100%) - Incremental indexing, optimization, testing

Overall: 100% complete

License

MIT

Author

gianged - Yup, it's me

Contributing

Contributions welcome! Please open an issue or PR on GitHub.

Acknowledgments

Built with: