cindex

Semantic code search and context retrieval for large codebases

A Model Context Protocol (MCP) server that provides intelligent code search and context retrieval for Claude Code. Handles 1M+ lines of code with accuracy-first design.

Features

  • Semantic Search - Vector embeddings for intelligent code discovery
  • Hybrid Search - Combines vector similarity with PostgreSQL full-text search for better natural language query handling
  • 9-Stage Retrieval Pipeline - Scope filtering → query → files → chunks → symbols → imports → APIs → dedup → assembly
  • Multi-Project Support - Monorepo, microservices, and reference repository indexing
  • Scope Filtering - Global, repository, service, and boundary-aware search modes
  • API Contract Search - Semantic search for REST/GraphQL/gRPC endpoints
  • Query Caching - LRU cache with 80%+ hit rate (cached queries ~50ms)
  • Progress Notifications - Real-time 9-stage pipeline tracking
  • Incremental Indexing - Only re-index changed files
  • Import Chain Analysis - Automatic dependency resolution
  • Deduplication - Remove duplicate utility functions
  • Large Codebase Support - Efficiently handles 1M+ LoC
  • Claude Code Integration - Native MCP server with 17 tools
  • Accuracy-First - Default settings optimized for relevance
  • Configurable Models - Swap embedding/LLM models via env vars

Performance

  • Indexing Speed: 300-600 files/min (with LLM summaries)
  • Query Speed: First query ~800ms, cached queries ~50ms
  • Cache Hit Rate: 80%+ for repeated queries
  • Codebase Scale: Efficiently handles 1M+ lines of code
  • Memory Efficient: LRU caching with configurable limits
  • Real-Time Progress: 9-stage pipeline notifications

Supported Languages

12 languages with full tree-sitter parsing: TypeScript, JavaScript, Python, Java, Go, Rust, C, C++, C#, PHP, Ruby, Kotlin. Swift and other languages use regex fallback parsing.

Prerequisites

Before installing cindex, you need:

1. PostgreSQL with pgvector

PostgreSQL 16+ with pgvector extension for vector similarity search:

# Ubuntu/Debian
sudo apt install postgresql-16 postgresql-16-pgvector

# macOS
brew install postgresql@16 pgvector

# Start PostgreSQL
sudo systemctl start postgresql  # Linux
brew services start postgresql@16  # macOS
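
To confirm the pgvector extension is actually available to your server, you can query the extension catalog (a quick check, assuming the default postgres superuser):

psql -U postgres -c "SELECT name, default_version FROM pg_available_extensions WHERE name = 'vector';"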

2. Ollama with Models

cindex uses Ollama for local LLM inference and requires two models:

Embedding Model (for vector generation):

# Install Ollama
curl https://ollama.ai/install.sh | sh

# Pull embedding model (bge-m3:567m recommended)
ollama pull bge-m3:567m

Coding Model (for file summaries and analysis):

# Pull coding model (qwen2.5-coder:7b recommended)
ollama pull qwen2.5-coder:7b

# Alternative for faster indexing (lower quality):
# ollama pull qwen2.5-coder:1.5b
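
Once both models are pulled, you can confirm Ollama is running and serving them (assuming the default endpoint on port 11434):

# List locally available models
ollama list

# Or query the HTTP API directly
curl http://localhost:11434/api/tags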

Model Options:

  • Embedding: bge-m3:567m (1024 dims, 8K context) - Best accuracy
  • Summary: qwen2.5-coder:7b (32K context) - High quality, RTX 4060+ recommended
  • Summary: qwen2.5-coder:3b (32K context) - Balanced
  • Summary: qwen2.5-coder:1.5b (32K context) - Fast indexing, lower quality

Installation

Database Setup

Create and initialize the cindex database:

# Create database
createdb cindex_rag_codebase

# Initialize schema (after installing cindex - see next section)
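
If the schema file does not enable pgvector itself, you can enable the extension in the new database manually (a sketch, assuming your role has the CREATE EXTENSION privilege):

psql cindex_rag_codebase -c "CREATE EXTENSION IF NOT EXISTS vector;"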

Install MCP Server

Add cindex to Claude Code using the CLI. You can install for personal use (user scope) or share with your team (project scope).

Quick Install (Personal Use)

Install for all your projects:

claude mcp add cindex --scope user --transport stdio \
  --env POSTGRES_PASSWORD="your_password" \
  -- npx -y @gianged/cindex

Team Install (Shared via Git)

Install for the current project (creates .mcp.json in project root):

claude mcp add cindex --scope project --transport stdio \
  --env POSTGRES_PASSWORD="your_password" \
  -- npx -y @gianged/cindex

Note: For project scope, set POSTGRES_PASSWORD as an environment variable on your system and reference it in the command. Never commit actual secrets to version control.
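
For example, one way to keep the real password out of the shared .mcp.json is to store only the ${POSTGRES_PASSWORD} placeholder in the config and set the value in your local shell (a sketch; it assumes Claude Code expands ${VAR} references at runtime, as in the manual project-scope config shown below):

export POSTGRES_PASSWORD="your_password"   # set locally, never committed
claude mcp add cindex --scope project --transport stdio \
  --env POSTGRES_PASSWORD='${POSTGRES_PASSWORD}' \
  -- npx -y @gianged/cindex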

Custom Configuration

Add additional environment variables using multiple --env flags:

claude mcp add cindex --scope user --transport stdio \
  --env POSTGRES_PASSWORD="your_password" \
  --env POSTGRES_HOST="localhost" \
  --env POSTGRES_DB="cindex_rag_codebase" \
  --env EMBEDDING_MODEL="bge-m3:567m" \
  --env SUMMARY_MODEL="qwen2.5-coder:7b" \
  -- npx -y @gianged/cindex

See Environment Variables section below for all available configuration options.

Manual Configuration (Alternative)

If you prefer to manually edit configuration files, you can add cindex to:

User Scope (~/.claude.json):

{
  "mcpServers": {
    "cindex": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@gianged/cindex"],
      "env": {
        "POSTGRES_PASSWORD": "your_password"
      }
    }
  }
}

Project Scope (.mcp.json in project root):

{
  "mcpServers": {
    "cindex": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@gianged/cindex"],
      "env": {
        "POSTGRES_HOST": "${POSTGRES_HOST:-localhost}",
        "POSTGRES_PORT": "${POSTGRES_PORT:-5432}",
        "POSTGRES_DB": "${POSTGRES_DB:-cindex_rag_codebase}",
        "POSTGRES_USER": "${POSTGRES_USER:-postgres}",
        "POSTGRES_PASSWORD": "${POSTGRES_PASSWORD}"
      }
    }
  }
}

Initialize Database Schema

After configuring MCP, initialize the database schema:

# Download schema file
curl -o database.sql https://raw.githubusercontent.com/gianged/cindex/main/database.sql

# Apply schema
psql cindex_rag_codebase < database.sql
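
To sanity-check the setup, list the objects the schema created (exact table names depend on the schema version):

psql cindex_rag_codebase -c "\dt"
psql cindex_rag_codebase -c "\dx"  # the vector extension should appear here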

Start Using

  1. Open Claude Code
  2. Use the index_repository tool to index your codebase
  3. Use search_codebase to find relevant code

Environment Variables

All configuration is done through environment variables in your MCP config file.

Model Configuration

| Variable                 | Default                | Range       | Description                                   |
| ------------------------ | ---------------------- | ----------- | --------------------------------------------- |
| EMBEDDING_MODEL          | bge-m3:567m            | -           | Ollama embedding model for vector generation  |
| EMBEDDING_DIMENSIONS     | 1024                   | 1-4096      | Vector dimensions (must match model output)   |
| EMBEDDING_CONTEXT_WINDOW | 4096                   | 512-131072  | Token limit for embedding model               |
| SUMMARY_MODEL            | qwen2.5-coder:7b       | -           | Ollama model for file summaries               |
| SUMMARY_CONTEXT_WINDOW   | 4096                   | 512-131072  | Token limit for summary model                 |
| OLLAMA_HOST              | http://localhost:11434 | -           | Ollama API endpoint                           |
| OLLAMA_TIMEOUT           | 30000                  | 1000-300000 | Request timeout in milliseconds               |

Context Window Notes:

  • Default 4096 matches Ollama's default and is sufficient (cindex uses first 100 lines per file)
  • Higher values = more VRAM usage + slower inference
  • qwen2.5-coder:7b supports up to 32K tokens
  • bge-m3:567m supports up to 8K tokens
  • Increase only if you encounter issues with large files

Database Configuration

| Variable                 | Default             | Range   | Description                     |
| ------------------------ | ------------------- | ------- | ------------------------------- |
| POSTGRES_HOST            | localhost           | -       | PostgreSQL server hostname      |
| POSTGRES_PORT            | 5432                | 1-65535 | PostgreSQL server port          |
| POSTGRES_DB              | cindex_rag_codebase | -       | Database name                   |
| POSTGRES_USER            | postgres            | -       | Database user                   |
| POSTGRES_PASSWORD        | required            | -       | Database password (must be set) |
| POSTGRES_MAX_CONNECTIONS | 10                  | 1-100   | Maximum connection pool size    |

Performance Tuning

| Variable                   | Default | Range   | Description                                          |
| -------------------------- | ------- | ------- | ---------------------------------------------------- |
| HNSW_EF_SEARCH             | 300     | 10-1000 | HNSW search quality (higher = more accurate, slower) |
| HNSW_EF_CONSTRUCTION       | 200     | 10-1000 | HNSW index quality (higher = better index)           |
| SIMILARITY_THRESHOLD       | 0.3     | 0.0-1.0 | Minimum similarity for file-level retrieval          |
| CHUNK_SIMILARITY_THRESHOLD | 0.2     | 0.0-1.0 | Minimum similarity for chunk-level retrieval         |
| DEDUP_THRESHOLD            | 0.92    | 0.0-1.0 | Similarity threshold for deduplication               |
| HYBRID_VECTOR_WEIGHT       | 0.7     | 0.0-1.0 | Weight for vector similarity in hybrid search        |
| HYBRID_KEYWORD_WEIGHT      | 0.3     | 0.0-1.0 | Weight for keyword (BM25) score in hybrid search     |
| IMPORT_DEPTH               | 3       | 1-10    | Maximum import chain traversal depth                 |
| WORKSPACE_DEPTH            | 2       | 1-10    | Maximum workspace dependency depth                   |
| SERVICE_DEPTH              | 1       | 1-10    | Maximum service dependency depth                     |

Indexing Configuration

| Variable         | Default | Range      | Description                        |
| ---------------- | ------- | ---------- | ---------------------------------- |
| MAX_FILE_SIZE    | 5000    | 100-100000 | Maximum file size in lines         |
| INCLUDE_MARKDOWN | false   | true/false | Include markdown files in indexing |

Feature Flags

| Variable                      | Default | Range      | Description                             |
| ----------------------------- | ------- | ---------- | --------------------------------------- |
| ENABLE_WORKSPACE_DETECTION    | true    | true/false | Detect monorepo workspaces              |
| ENABLE_SERVICE_DETECTION      | true    | true/false | Detect microservices                    |
| ENABLE_MULTI_REPO             | false   | true/false | Enable multi-repository support         |
| ENABLE_API_ENDPOINT_DETECTION | true    | true/false | Parse API contracts (REST/GraphQL/gRPC) |
| ENABLE_HYBRID_SEARCH          | true    | true/false | Combine vector + full-text search       |

Example Configurations

Minimal Configuration

Only the required password:

{
  "mcpServers": {
    "cindex": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@gianged/cindex"],
      "env": {
        "POSTGRES_PASSWORD": "your_password"
      }
    }
  }
}

Full Configuration

All available settings with defaults shown:

{
  "mcpServers": {
    "cindex": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@gianged/cindex"],
      "env": {
        "EMBEDDING_MODEL": "bge-m3:567m",
        "EMBEDDING_DIMENSIONS": "1024",
        "EMBEDDING_CONTEXT_WINDOW": "4096",
        "SUMMARY_MODEL": "qwen2.5-coder:7b",
        "SUMMARY_CONTEXT_WINDOW": "4096",
        "OLLAMA_HOST": "http://localhost:11434",
        "POSTGRES_HOST": "localhost",
        "POSTGRES_PORT": "5432",
        "POSTGRES_DB": "cindex_rag_codebase",
        "POSTGRES_USER": "postgres",
        "POSTGRES_PASSWORD": "your_password",
        "HNSW_EF_SEARCH": "300",
        "HNSW_EF_CONSTRUCTION": "200",
        "SIMILARITY_THRESHOLD": "0.3",
        "CHUNK_SIMILARITY_THRESHOLD": "0.2",
        "DEDUP_THRESHOLD": "0.92"
      }
    }
  }
}

Speed-First Configuration

For faster indexing with lower quality:

{
  "mcpServers": {
    "cindex": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@gianged/cindex"],
      "env": {
        "POSTGRES_PASSWORD": "your_password",
        "SUMMARY_MODEL": "qwen2.5-coder:1.5b",
        "SUMMARY_CONTEXT_WINDOW": "4096",
        "HNSW_EF_SEARCH": "100",
        "HNSW_EF_CONSTRUCTION": "64",
        "SIMILARITY_THRESHOLD": "0.4",
        "CHUNK_SIMILARITY_THRESHOLD": "0.25",
        "DEDUP_THRESHOLD": "0.95"
      }
    }
  }
}

Performance:

  • Indexing: 500-1000 files/min (vs 300-600 files/min default)
  • Query Time: <500ms (vs <800ms default)
  • Relevance: >85% in top 10 results (vs >92% default)

Recommended Settings

RTX 4060 / 8GB VRAM (Tested Configuration)

| Setting                    | Value            | Notes                              |
| -------------------------- | ---------------- | ---------------------------------- |
| EMBEDDING_MODEL            | bge-m3:567m      | Best accuracy/speed balance        |
| SUMMARY_MODEL              | qwen2.5-coder:7b | Good summaries, fits in VRAM       |
| EMBEDDING_CONTEXT_WINDOW   | 4096             | Default, sufficient for most files |
| HNSW_EF_SEARCH             | 300              | High accuracy retrieval            |
| SIMILARITY_THRESHOLD       | 0.3              | File-level retrieval threshold     |
| CHUNK_SIMILARITY_THRESHOLD | 0.2              | Chunk-level retrieval threshold    |
| DEDUP_THRESHOLD            | 0.92             | Prevent duplicate results          |

Performance Expectations

  • Indexing: ~30 files/min (~70 chunks/min)
  • Search: <1 second per query
  • Codebase: Tested with 40k LoC (112 files)

Managing Configuration

Verify Installation

List all installed MCP servers:

claude mcp list

View cindex configuration:

claude mcp get cindex

Update Configuration

To update environment variables, remove and re-add with new settings:

claude mcp remove cindex
claude mcp add cindex --scope user --transport stdio \
  --env POSTGRES_PASSWORD="your_password" \
  --env SUMMARY_MODEL="qwen2.5-coder:3b" \
  -- npx -y @gianged/cindex

Switch to Speed-First Mode

For faster indexing with lower quality, use these settings:

claude mcp remove cindex
claude mcp add cindex --scope user --transport stdio \
  --env POSTGRES_PASSWORD="your_password" \
  --env SUMMARY_MODEL="qwen2.5-coder:1.5b" \
  --env HNSW_EF_SEARCH="100" \
  --env HNSW_EF_CONSTRUCTION="64" \
  --env SIMILARITY_THRESHOLD="0.4" \
  --env CHUNK_SIMILARITY_THRESHOLD="0.25" \
  --env DEDUP_THRESHOLD="0.95" \
  -- npx -y @gianged/cindex

Performance:

  • Indexing: 500-1000 files/min (vs 300-600 files/min default)
  • Query Time: <500ms (vs <800ms default)
  • Relevance: >85% in top 10 results (vs >92% default)

Remove Server

claude mcp remove cindex

MCP Tools

Status: 17 of 17 tools implemented

All tools provide structured output with syntax highlighting and comprehensive metadata.

Core Search Tools

search_codebase

Semantic code search with multi-stage retrieval and dependency analysis.

Parameters:

  • query (required) - Natural language search query
  • scope - Search scope: 'global', 'repository', 'service', or 'workspace'
  • repo_id - Filter by repository ID
  • service_id - Filter by service ID
  • workspace_id - Filter by workspace ID
  • max_results - Maximum results (1-100, default: 20)
  • similarity_threshold - Minimum similarity (0.0-1.0, default: 0.75)
  • include_dependencies - Include imported dependencies (default: false)

Returns: Markdown-formatted results with file paths, line numbers, code snippets, and relevance scores.

get_file_context

Get complete context for a specific file including callers, callees, and import chain.

Parameters:

  • file_path (required) - Absolute or relative file path
  • repo_id - Repository ID (optional if file path is unique)
  • include_callers - Include functions that call this file (default: true)
  • include_callees - Include functions called by this file (default: true)
  • include_imports - Include import chain (default: true)
  • max_depth - Import chain depth (1-5, default: 2)

Returns: File summary, symbols, dependencies, and related code context.

find_symbol_definition

Locate symbol definitions and optionally show usages across the codebase.

Parameters:

  • symbol_name (required) - Function, class, or variable name
  • repo_id - Filter by repository ID
  • file_path - Filter by file path
  • symbol_type - Filter by type: 'function', 'class', 'variable', 'interface', etc.
  • include_usages - Show where symbol is used (default: false)
  • max_usages - Maximum usage results (1-100, default: 50)

Returns: Symbol definitions with file paths, line numbers, signatures, and optional usage locations.

Repository Management Tools

index_repository

Index or re-index a repository with progress notifications and multi-project support.

Parameters:

  • repo_path (required) - Absolute path to repository root
  • repo_id - Repository identifier (default: directory name)
  • repo_type - Repository type: 'monolithic', 'microservice', 'monorepo', 'library', 'reference', or 'documentation'
  • force_reindex - Force full re-index (default: false, uses incremental indexing)
  • detect_workspaces - Detect monorepo workspaces (default: true)
  • detect_services - Detect microservices (default: true)
  • detect_api_endpoints - Parse API contracts (default: true)
  • service_config - Manual service configuration (optional)
  • version - Repository version for reference repos (e.g., 'v10.3.0')
  • metadata - Additional metadata (e.g., { upstream_url: '...' })

Returns: Indexing statistics including files indexed, chunks created, symbols extracted, workspaces/services detected, and timing information.

delete_repository

Delete one or more indexed repositories and all associated data.

Parameters:

  • repo_ids (required) - Array of repository IDs to delete

Returns: Deletion confirmation with statistics (files, chunks, symbols, workspaces, services removed).

list_indexed_repos

List all indexed repositories with optional metadata, workspace counts, and service counts.

Parameters:

  • include_metadata - Include repository metadata (default: true)
  • include_workspace_count - Include workspace count for monorepos (default: true)
  • include_service_count - Include service count for microservices (default: true)
  • repo_type_filter - Filter by repository type

Returns: List of repositories with IDs, types, file counts, last indexed time, and optional metadata.

Monorepo Tools

list_workspaces

List all workspaces in indexed repositories for monorepo support.

Parameters:

  • repo_id - Filter by repository ID (optional)
  • include_dependencies - Include dependency information (default: false)
  • include_metadata - Include package.json metadata (default: false)

Returns: List of workspaces with package names, paths, file counts, and optional dependencies.

get_workspace_context

Get full context for a workspace including dependencies and dependents.

Parameters:

  • workspace_id - Workspace ID (use list_workspaces to find)
  • package_name - Package name (alternative to workspace_id)
  • repo_id - Repository ID (required if using package_name)
  • include_dependencies - Include workspace dependencies (default: true)
  • include_dependents - Include workspaces that depend on this one (default: true)
  • dependency_depth - Dependency tree depth (1-5, default: 2)

Returns: Workspace metadata, dependency tree, dependent workspaces, and file list.

find_cross_workspace_usages

Find workspace package usages across the monorepo.

Parameters:

  • workspace_id - Source workspace ID
  • package_name - Source package name (alternative to workspace_id)
  • symbol_name - Specific symbol to track (optional)
  • include_indirect - Include indirect usages (default: false)
  • max_depth - Dependency chain depth (1-5, default: 2)

Returns: List of workspaces using the target package/symbol with file locations.

Microservice Tools

list_services

List all services across indexed repositories for microservice support.

Parameters:

  • repo_id - Filter by repository ID (optional)
  • service_type - Filter by type: 'docker', 'serverless', 'mobile' (optional)
  • include_dependencies - Include service dependencies (default: false)
  • include_api_endpoints - Include API endpoint counts (default: false)

Returns: List of services with IDs, names, types, file counts, and optional API information.

get_service_context

Get full context for a service including API contracts and dependencies.

Parameters:

  • service_id - Service ID (use list_services to find)
  • service_name - Service name (alternative to service_id)
  • repo_id - Repository ID (required if using service_name)
  • include_dependencies - Include service dependencies (default: true)
  • include_dependents - Include services that depend on this one (default: true)
  • include_api_contracts - Include API endpoint definitions (default: true)
  • dependency_depth - Dependency tree depth (1-5, default: 1)

Returns: Service metadata, API contracts (REST/GraphQL/gRPC), dependency graph, and file list.

find_cross_service_calls

Find inter-service API calls across microservices.

Parameters:

  • source_service_id - Source service ID (optional)
  • target_service_id - Target service ID (optional)
  • endpoint_pattern - Endpoint regex pattern (e.g., /api/users/.*, optional)
  • include_reverse - Also show calls in reverse direction (default: false)

Returns: List of inter-service API calls with endpoints, HTTP methods, and call counts.

API Contract Tools

search_api_contracts

Search API endpoints across services with semantic understanding.

Parameters:

  • query (required) - API search query (e.g., "user authentication endpoint")
  • api_types - Filter by type: ['rest', 'graphql', 'grpc'] (default: all)
  • service_filter - Filter by service IDs (optional)
  • repo_filter - Filter by repository IDs (optional)
  • include_deprecated - Include deprecated endpoints (default: false)
  • max_results - Maximum results (1-100, default: 20)
  • similarity_threshold - Minimum similarity (0.0-1.0, default: 0.70)

Returns: API endpoints with paths, HTTP methods, service names, implementation files, and similarity scores.

Reference & Documentation Tools

Tools for searching reference materials including markdown documentation (syntax references, Context7-fetched docs) AND reference repository code (indexed frameworks/libraries).

index_documentation

Index markdown files for documentation search. Works with explicit paths only.

Parameters:

  • paths (required) - Array of file or directory paths to index (e.g., ['syntax.md', '/docs/libraries/'])
  • doc_id - Document identifier (default: derived from path)
  • tags - Tags for filtering (e.g., ['typescript', 'react'])
  • force_reindex - Force re-index even if unchanged (default: false)

Returns: Indexing statistics including files indexed, sections created, code blocks extracted, and timing.

Workflow:

  1. Fetch documentation (e.g., from Context7)
  2. Save to markdown file
  3. Index with index_documentation
  4. Search with search_references

search_references

Search reference materials including markdown documentation AND reference repository code. Combines both sources for comprehensive reference search.

Parameters:

  • query (required) - Natural language search query
  • doc_ids - Filter by document IDs (optional)
  • tags - Filter by documentation tags (optional)
  • include_docs - Include markdown documentation results (default: true)
  • include_code - Include reference repository code results (default: true)
  • max_results - Maximum results per source (1-50, default: 10)
  • include_code_blocks - Include code blocks from documentation (default: true)
  • similarity_threshold - Minimum similarity (0.0-1.0, default: 0.65)

Returns: Combined results from both documentation chunks and reference repository code, with heading breadcrumbs, content snippets, code blocks, file paths, and relevance scores.

Note: Reference repositories are indexed using index_repository with repo_type: 'reference'. They are excluded from search_codebase by default and only searchable via search_references.

list_documentation

List all indexed documentation with metadata.

Parameters:

  • doc_ids - Filter by document IDs (optional)
  • tags - Filter by tags (optional)

Returns: List of indexed documents with file counts, section counts, code block counts, and indexed timestamps.

delete_documentation

Delete indexed documentation by document ID.

Parameters:

  • doc_ids (required) - Array of document IDs to delete

Returns: Deletion confirmation with chunks and files removed.


See docs/overview.md for complete tool documentation including multi-project/monorepo/microservice architecture details.

Architecture

Hybrid Search

Combines vector similarity search with PostgreSQL full-text search (tsvector/ts_rank_cd) for improved natural language query handling:

hybrid_score = (0.7 * vector_similarity) + (0.3 * keyword_score)

  • Vector search - Semantic understanding via embeddings
  • Keyword search - Exact term matching via PostgreSQL full-text search
  • Configurable weights via HYBRID_VECTOR_WEIGHT and HYBRID_KEYWORD_WEIGHT
  • Disable with ENABLE_HYBRID_SEARCH=false to use vector-only search
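
As a quick worked example with the default weights (illustrative numbers, not measured output): a chunk with vector similarity 0.80 and a normalized keyword score of 0.50 is ranked by

hybrid_score = (0.7 * 0.80) + (0.3 * 0.50) = 0.71

so semantic matches dominate the ranking while exact keyword hits act as a tie-breaker.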

Multi-Stage Retrieval

  1. File-Level - Find relevant files via summary embeddings + full-text search
  2. Chunk-Level - Locate specific code chunks (functions/classes)
  3. Symbol Resolution - Resolve imported symbols and dependencies
  4. Import Expansion - Build dependency graph (max 3 levels)
  5. Deduplication - Remove redundant code from results

Indexing Pipeline

  1. File discovery (respects .gitignore)
  2. Tree-sitter parsing (with regex fallback)
  3. Semantic chunking (functions, classes, blocks)
  4. LLM-based file summaries (configurable model)
  5. Embedding generation (configurable model)
  6. Full-text search vector generation (tsvector)
  7. PostgreSQL + pgvector storage

Performance Characteristics

Accuracy-First Mode (Default)

  • Indexing: 300-600 files/min
  • Query Time: <800ms
  • Relevance: >92% in top 10 results
  • Context Noise: <2%

Speed-First Mode

  • Indexing: 500-1000 files/min
  • Query Time: <500ms
  • Relevance: >85% in top 10 results

System Requirements

  • Node.js 22+ (for MCP server)
  • PostgreSQL 16+ with pgvector extension
  • Ollama with models installed
  • Disk Space: ~1GB per 100k LoC indexed
  • RAM: 8GB minimum (16GB+ recommended for large codebases)
  • GPU: Optional but recommended (RTX 3060+ for qwen2.5-coder:7b)

Troubleshooting

"Vector dimension mismatch"

Update EMBEDDING_DIMENSIONS in MCP config to match your model, then update vector dimensions in database.sql.
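
If you are unsure what dimension your embedding model produces, Ollama usually reports it in the model details (look for the embedding length field):

ollama show bge-m3:567m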

"Connection refused" to PostgreSQL

Check POSTGRES_HOST and POSTGRES_PORT in MCP config. Verify PostgreSQL is running:

sudo systemctl status postgresql  # Linux
brew services list  # macOS
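
A direct connection test using the same settings cindex will use can help isolate the problem (substitute your own host, port, and user):

psql -h localhost -p 5432 -U postgres -d cindex_rag_codebase -c "SELECT 1;"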

"Model not found" in Ollama

Pull the required models:

ollama pull bge-m3:567m
ollama pull qwen2.5-coder:7b

Verify models are available:

ollama list

Slow indexing

  • Use smaller summary model: qwen2.5-coder:1.5b instead of 7b
  • Reduce HNSW_EF_CONSTRUCTION to 64
  • Enable incremental indexing (default)

Low accuracy results

  • Increase HNSW_EF_SEARCH to 300-400
  • Raise SIMILARITY_THRESHOLD to 0.4-0.5 for stricter file matching
  • Raise CHUNK_SIMILARITY_THRESHOLD to 0.3-0.4 for stricter chunk matching
  • Use better summary model: qwen2.5-coder:3b or 7b
  • Lower DEDUP_THRESHOLD to 0.90-0.92

Documentation

See docs/overview.md for detailed documentation including:

  • Complete architecture details
  • Database schema
  • Configuration reference
  • Implementation guide
  • Performance tuning

Development

git clone https://github.com/gianged/cindex.git
cd cindex
npm install
npm run build
npm test

Implementation Status

  • Phase 1 (100%) - Database schema & type system
  • Phase 2 (100%) - File discovery, parsing, chunking, workspace/service detection
  • Phase 3 (100%) - Embeddings, summaries, API parsing, 12-language support, Docker/serverless/mobile detection
  • Phase 4 (100%) - Multi-stage retrieval pipeline (9-stage)
  • Phase 5 (100%) - MCP tools (17 of 17 implemented)
  • Phase 6 (100%) - Incremental indexing, optimization, testing

Overall: 100% complete

License

MIT

Author

gianged - Yup, it's me

Contributing

Contributions welcome! Please open an issue or PR on GitHub.

Acknowledgments

Built with: