Code Intelligence MCP Server
Semantic search and code navigation for LLM agents.
This server indexes your codebase locally to provide fast, semantic, and structure-aware code navigation to tools like Claude Code, OpenCode, Trae, and Cursor.
Why Use This Server?
Unlike basic text search, this server builds a local knowledge graph to understand your code.
- Advanced Hybrid Search: Combines keyword search (BM25 via Tantivy) with semantic vector search (via LanceDB + jina-code-embeddings-0.5b) using Reciprocal Rank Fusion (RRF) — a technique that merges ranked results from different search systems by position rather than raw score.
- Smart Context Assembly: Token-aware budgeting with query-aware truncation that keeps relevant lines within context limits.
- On-Device LLM Descriptions: Automatically generates natural-language descriptions for every symbol using a local Qwen2.5-Coder-1.5B model (llama.cpp with Metal GPU), enriching search with human-readable summaries. This bridges the vocabulary gap between how developers search ("auth handler") and how code is named (`authenticate_request`).
- PageRank Scoring: Graph-based symbol importance scoring (similar to Google's original algorithm) that identifies central, heavily-used components by analyzing call graphs and type relationships.
- Learns from Feedback: Optional learning system that adapts to user selections over time.
- Production First: Multi-layer test detection (file paths, symbol names, and AST-level `#[test]`/`mod tests` analysis) ensures implementation code ranks above test helpers.
- Multi-Repo Support: Index and search across multiple repositories/monorepos simultaneously.
- OS-Native File Watching: Uses the `notify` crate with macOS FSEvents for instant re-indexing on file changes (see the sketch after this list).
- Fast & Local: Written in Rust with Metal GPU acceleration on Apple Silicon. Parallel indexing with persistent caching.
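For a feel of what the watcher layer does, here is a minimal `notify` sketch. It is illustrative only, not the server's actual code; the debouncing and pattern filtering a real indexer needs are left as comments:

```rust
use std::path::Path;

use notify::{recommended_watcher, Event, RecursiveMode, Watcher};

fn main() -> notify::Result<()> {
    // recommended_watcher picks the best OS backend (FSEvents on macOS).
    let mut watcher = recommended_watcher(|res: notify::Result<Event>| {
        if let Ok(event) = res {
            // A real indexer would debounce here and filter by INDEX_PATTERNS.
            println!("change detected: {:?}", event.paths);
        }
    })?;
    watcher.watch(Path::new("."), RecursiveMode::Recursive)?;
    std::thread::park(); // keep the process alive while watching
    Ok(())
}
```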
Quick Start
Runs directly via npx without requiring a local Rust toolchain.
Claude Code
Add to your MCP settings (global ~/.claude.json or project-level .mcp.json):
```json
{
  "mcpServers": {
    "code-intelligence": {
      "command": "npx",
      "args": ["-y", "@iceinvein/code-intelligence-mcp"],
      "env": {}
    }
  }
}
```

Or install via the CLI:

```bash
claude mcp add code-intelligence -- npx -y @iceinvein/code-intelligence-mcp
```

Once connected, Claude Code gains 23 MCP tools for semantic search (`search_code`), symbol navigation (`get_definition`, `find_references`), call/type graphs (`get_call_hierarchy`, `get_type_graph`), impact analysis (`find_affected_code`, `trace_data_flow`), and more. The server auto-detects the working directory and begins indexing in the background.
OpenCode / Trae
Add to your opencode.json (or global config):
```json
{
  "mcp": {
    "code-intelligence": {
      "type": "local",
      "command": ["npx", "-y", "@iceinvein/code-intelligence-mcp"],
      "enabled": true
    }
  }
}
```

The server will automatically download the embedding model (~531MB) and LLM (~1.1GB) on first launch, then index your project in the background.
Standalone Server Mode
By default, each MCP client spawns its own server process (stdio transport). If you run multiple clients against the same repo, a per-repo leader lock (flock()) ensures only one instance performs indexing, file watching, and LLM description generation. The leader loads the LLM (~1.1GB) during indexing and automatically frees it once descriptions are complete. Follower instances never load the LLM — they open the search index read-only and pick up the leader's changes. All instances load their own copy of the embedding model (~531MB) for query-time vector search.
Standalone mode runs a single long-lived HTTP server that all clients share. The main advantage is cross-repo deduplication — in stdio mode, each instance loads its own embedding model regardless of which repo it's on. With 5 instances across 3 repos, that's 5 copies (~2.6GB). Standalone loads the models once and shares them across all repos and clients.
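For illustration, a per-repo leader lock like the one described above can be built on an advisory file lock. This sketch uses the `fs2` crate; the crate choice and the lock-file path are assumptions, not the server's actual implementation:

```rust
use std::fs::OpenOptions;

use fs2::FileExt;

/// Returns true if this process became the indexing leader for the repo.
fn try_become_leader(repo_hash: &str) -> std::io::Result<bool> {
    // Hypothetical lock path; the real server keys the lock per repo.
    let lock_path = format!("/tmp/cimcp-{repo_hash}.lock");
    let file = OpenOptions::new().create(true).write(true).open(&lock_path)?;
    match file.try_lock_exclusive() {
        Ok(()) => {
            // Leader: index, watch files, load the LLM for descriptions.
            std::mem::forget(file); // hold the flock() for the process lifetime
            Ok(true)
        }
        // Another instance holds the lock: open the index read-only instead.
        Err(_) => Ok(false),
    }
}

fn main() {
    let leader = try_become_leader("a1b2c3d4e5f6a7b8").expect("lock failed");
    println!("leader: {leader}");
}
```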
Starting the Server
```bash
# Default: localhost:3333
npx @iceinvein/code-intelligence-mcp-standalone

# Custom host/port
npx @iceinvein/code-intelligence-mcp-standalone --port 4444 --host 0.0.0.0

# From source
./target/release/code-intelligence-mcp-server --standalone
./target/release/code-intelligence-mcp-server --standalone --port 4444

# Via environment variable
CIMCP_MODE=standalone ./target/release/code-intelligence-mcp-server
```

Connecting MCP Clients
Point your MCP clients to the standalone server using Streamable HTTP transport:
Claude Code (~/.claude.json or project-level .mcp.json):
```json
{
  "mcpServers": {
    "code-intelligence": {
      "type": "streamable-http",
      "url": "http://localhost:3333/mcp"
    }
  }
}
```

Or via the CLI:

```bash
claude mcp add --transport http code-intelligence http://localhost:3333/mcp
```

OpenCode (opencode.json):
```json
{
  "mcp": {
    "code-intelligence": {
      "type": "remote",
      "url": "http://localhost:3333/mcp",
      "enabled": true
    }
  }
}
```

Cursor (.cursor/mcp.json):
```json
{
  "mcpServers": {
    "code-intelligence": {
      "url": "http://localhost:3333/mcp"
    }
  }
}
```

The server auto-detects each client's workspace root via the MCP roots capability — no BASE_DIR needed.
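Under the hood, workspace detection relies on MCP's standard `roots/list` request. A sketch of the exchange, built with `serde_json`; the field names follow the MCP spec, and the concrete values are invented examples:

```rust
use serde_json::json;

fn main() {
    // The server asks each connected session for its workspace roots...
    let request = json!({ "jsonrpc": "2.0", "id": 1, "method": "roots/list" });
    // ...and the client replies with the file:// URIs it wants indexed.
    let response = json!({
        "jsonrpc": "2.0",
        "id": 1,
        "result": { "roots": [{ "uri": "file:///Users/me/my-repo", "name": "my-repo" }] }
    });
    println!("{request}\n{response}");
}
```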
How It Works
```mermaid
flowchart TB
    A[Claude Code - Session A] & B[Cursor - Session B] & C[Trae - Session C]
    A & B & C -- "POST /mcp (Streamable HTTP)" --> Server
    Server["Standalone MCP Server<br/>(single process, shared embedding model)"]
    Server --> RA["Repo A indexes<br/>SQLite + Tantivy + LanceDB"]
    Server --> RB["Repo B indexes<br/>SQLite + Tantivy + LanceDB"]
    Server --> RC["Repo C indexes<br/>SQLite + Tantivy + LanceDB"]
```

Each client session is bound to its workspace root. The server maintains separate indexes per repo but shares the embedding model across all of them.
Data Storage
Both embedded (stdio) and standalone (HTTP) modes store all data in ~/.code-intelligence/:
```text
~/.code-intelligence/
├── server.toml                          # Optional config file (standalone only)
├── models/                              # Shared models (loaded once, shared across repos)
│   ├── jina-code-embeddings-0.5b-gguf/  # Embedding model (~531MB, GGUF via llama.cpp)
│   └── qwen2.5-coder-1.5b-gguf/         # LLM model (~1.1GB)
├── logs/
│   └── server.log
└── repos/
    ├── registry.json                    # Tracks all known repos
    ├── a1b2c3d4e5f6a7b8/                # Per-repo data (SHA256 hash of repo path)
    │   ├── code-intelligence.db
    │   ├── tantivy-index/
    │   └── vectors/
    └── f8e7d6c5b4a3f2e1/
        └── ...
```

The same repo always maps to the same hash regardless of mode, so embedded and standalone can share the same index data.
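The directory names above are consistent with a truncated SHA-256 of the repository path. A sketch with the `sha2` crate; the exact hash input (e.g., whether the path is canonicalized first) and the 16-hex-character truncation are assumptions read off the layout:

```rust
use sha2::{Digest, Sha256};

/// Maps a repository path to its data directory name under ~/.code-intelligence/repos/.
fn repo_dir_name(repo_path: &str) -> String {
    let digest = Sha256::digest(repo_path.as_bytes());
    // Keep the first 8 bytes (16 hex chars), matching names like "a1b2c3d4e5f6a7b8".
    digest[..8].iter().map(|b| format!("{b:02x}")).collect()
}

fn main() {
    println!("{}", repo_dir_name("/Users/me/my-repo"));
}
```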
Configuration
Standalone mode is configured via ~/.code-intelligence/server.toml (created on first run with defaults). Environment variables and CLI flags override TOML settings.
Priority: CLI flags > Environment variables > server.toml > Defaults
Example server.toml:
```toml
[server]
host = "127.0.0.1"
port = 3333

[embeddings]
backend = "llamacpp"  # llamacpp (default) or hash (testing)
device = "metal"      # cpu or metal (macOS GPU)

[repos.defaults]
index_patterns = "**/*.ts,**/*.tsx,**/*.rs,**/*.py,**/*.go"
exclude_patterns = "**/node_modules/**,**/dist/**,**/.git/**"
watch_mode = true     # Auto-reindex on file changes

[lifecycle]
warm_ttl_seconds = 300  # How long idle repos stay in memory
```

Environment variable overrides (same as embedded mode):
| Variable | Example | Description |
| -------- | ------- | ----------- |
| CIMCP_MODE | standalone | Alternative to --standalone flag |
| EMBEDDINGS_BACKEND | hash | Override embedding backend (llamacpp or hash) |
| EMBEDDINGS_DEVICE | metal | Override device (cpu/metal) |
| EMBEDDINGS_MODEL_DIR | /path/to/model | Override model directory |
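The precedence chain boils down to first-match-wins per setting. A minimal sketch of that resolution; the `CIMCP_PORT` variable name is hypothetical (only `CIMCP_MODE` and the `EMBEDDINGS_*` variables are documented here):

```rust
/// Resolves the listen port with the documented precedence:
/// CLI flag > environment variable > server.toml value > built-in default.
fn resolve_port(cli_flag: Option<u16>, toml_port: Option<u16>) -> u16 {
    cli_flag
        .or_else(|| std::env::var("CIMCP_PORT").ok()?.parse::<u16>().ok())
        .or(toml_port)
        .unwrap_or(3333) // default shown in server.toml above
}

fn main() {
    // No CLI flag, no env var, no TOML entry -> falls through to 3333.
    assert_eq!(resolve_port(None, None), 3333);
    // A CLI flag wins over everything else.
    assert_eq!(resolve_port(Some(4444), Some(8080)), 4444);
}
```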
Capabilities
Available tools for the agent (23 tools total):
Core Search & Navigation
| Tool | Description |
| :------------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| search_code | Primary Search. Finds code by meaning ("how does auth work?") or structure ("class User"). Supports query decomposition (e.g., "authentication and authorization"). |
| get_definition | Retrieves the full definition of a specific symbol with disambiguation support. |
| find_references | Finds all usages of a function, class, or variable. |
| get_call_hierarchy | Shows upstream callers and downstream callees. |
| get_type_graph | Explores inheritance (extends/implements) and type aliases. |
| explore_dependency_graph | Explores module-level dependencies upstream or downstream. |
| get_file_symbols | Lists all symbols defined in a specific file. |
| get_usage_examples | Returns real-world examples of how a symbol is used in the codebase. |
Advanced Analysis
| Tool | Description |
| :----------------------- | :---------------------------------------------------------------------------------------- |
| explain_search | Returns detailed scoring breakdown to understand why results ranked as they did. |
| find_similar_code | Finds code semantically similar to a given symbol or code snippet. |
| trace_data_flow | Traces variable reads and writes through the codebase to understand data flow. |
| find_affected_code | Finds code that would be affected if a symbol changes (reverse dependencies). |
| get_similarity_cluster | Returns symbols in the same semantic similarity cluster as a given symbol. |
| summarize_file | Generates a summary of file contents including symbol counts, structure, and key exports. |
| get_module_summary | Lists all exported symbols from a module/file with their signatures. |
Testing, Frameworks & Documentation
| Tool | Description |
| :------------------------- | :------------------------------------------------------------------------------------------------------------------------ |
| search_todos | Searches for TODO and FIXME comments to track technical debt. |
| find_tests_for_symbol | Finds test files that test a given symbol or source file. |
| search_decorators | Searches for TypeScript/JavaScript decorators (@Component, @Controller, @Get, @Post, etc.). |
| search_framework_patterns | Searches for framework-specific patterns (e.g., Elysia routes, WebSocket handlers, middleware) with method/path filtering. |
Context & Learning
| Tool | Description |
| :----------------- | :------------------------------------------------------------------------------ |
| hydrate_symbols | Hydrates full context for a set of symbol IDs. |
| report_selection | Records user selection feedback for learning (call when user selects a result). |
| refresh_index | Manually triggers a re-index of the codebase. |
| get_index_stats | Returns index statistics (files, symbols, edges, last updated). |
Supported Languages
The server supports semantic navigation and symbol extraction for the following languages:
- Rust
- TypeScript / TSX
- JavaScript
- Python
- Go
- Java
- C
- C++
Smart Ranking & Context Enhancement
The search pipeline runs two parallel searches — keyword (BM25 via Tantivy) and semantic (vector embeddings via LanceDB) — then merges them using Reciprocal Rank Fusion (RRF). On top of this hybrid base, the ranking engine applies structural signals to optimize for relevance:
- PageRank Symbol Importance: Graph-based scoring that identifies central, heavily-used components (similar to Google's PageRank).
- Reciprocal Rank Fusion (RRF): Combines keyword, vector, and graph search results by rank position rather than raw score, which keeps fusion robust when the underlying scores use different scales (a minimal sketch follows this list).
- Query Decomposition: Complex queries ("X and Y") are automatically split into sub-queries for better coverage.
- Token-Aware Truncation: Context assembly keeps query-relevant lines within token budgets using BM25-style relevance scoring.
- LLM-Enriched Indexing: On-device Qwen2.5-Coder generates natural-language descriptions for each symbol, bridging the vocabulary gap between how developers search and how code is named.
- Morphological Variants: Function names are expanded with stems and derivations (e.g., `watch` → `watcher`, `index` → `reindex`) to improve recall for natural-language queries.
- Multi-Layer Test Detection: Three mechanisms — file path patterns (`*.test.ts`), symbol name heuristics (`test_*`), and SQL-based AST analysis (`#[test]`, `mod tests`) — with a final enforcement pass that prevents test code from escaping via edge expansion.
- Edge Expansion: High-ranking symbols pull in structurally related code (callers, type members) with importance filtering to avoid noise from private helpers.
- Directory Semantics: Implementation directories (`src`, `lib`, `app`) are boosted, while build artifacts (`dist`, `build`) and `node_modules` are penalized.
- Exported Symbol Boost: Exported/public symbols receive a ranking boost as they represent the primary API surface.
- Glue Code Filtering: Re-export files (e.g., `index.ts`) are deprioritized in favor of the actual implementation.
- JSDoc Boost: Symbols with documentation receive a ranking boost, and examples are included in search results.
- Learning from Feedback (optional): Tracks user selections to personalize future search results.
- Package-Aware Scoring (multi-repo): Boosts results from the same package when working in monorepos.
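To make the fusion step concrete, here is a minimal RRF sketch. The constant k = 60 is the value commonly used in the RRF literature, not necessarily what this server uses, and the symbol names are invented:

```rust
use std::collections::HashMap;

/// Merges ranked result lists by rank position: score(d) = Σ 1 / (k + rank_i(d)).
fn rrf_merge(lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (rank, id) in list.iter().enumerate() {
            // Ranks are 1-based; raw BM25/vector scores are never compared directly.
            *scores.entry(id.to_string()).or_default() += 1.0 / (k + (rank + 1) as f64);
        }
    }
    let mut merged: Vec<_> = scores.into_iter().collect();
    merged.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    merged
}

fn main() {
    let keyword = vec!["authenticate_request", "login", "Session"];
    let vector = vec!["Session", "authenticate_request", "verify_token"];
    // "authenticate_request": 1/(60+1) + 1/(60+2) ≈ 0.0325 — top of the fused list.
    println!("{:?}", rrf_merge(&[keyword, vector], 60.0));
}
```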
Intent Detection
The system detects query intent and adjusts ranking accordingly:
| Query Pattern | Intent | Effect |
| ----------------- | ------------------------- | --------------------------------------- |
| "struct User" | Definition | Boosts type definitions (1.5x) |
| "who calls login" | Callers | Triggers graph lookup |
| "verify login" | Testing | Boosts test files |
| "User schema" | Schema/Model | Boosts schema/model files (50-75x) |
| "auth and authz" | Multi-query decomposition | Splits into sub-queries, merges via RRF |
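A rough sketch of how this routing might look in code — the patterns and effects come from the table above, while the matching logic itself is an illustrative guess:

```rust
#[derive(Debug)]
enum Intent { Definition, Callers, Testing, Schema, MultiQuery, General }

/// Very rough intent classifier mirroring the table above (illustrative only).
fn detect_intent(query: &str) -> Intent {
    let q = query.to_lowercase();
    if q.starts_with("struct ") || q.starts_with("class ") {
        Intent::Definition // boost type definitions (1.5x)
    } else if q.starts_with("who calls") {
        Intent::Callers // trigger a call-graph lookup instead of plain search
    } else if q.contains("verify") || q.contains("test") {
        Intent::Testing // boost test files
    } else if q.contains("schema") || q.contains("model") {
        Intent::Schema // boost schema/model files (50-75x)
    } else if q.contains(" and ") {
        Intent::MultiQuery // split into sub-queries, merge via RRF
    } else {
        Intent::General
    }
}

fn main() {
    println!("{:?}", detect_intent("who calls login")); // Callers
}
```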
For a deep dive into the system's design, see System Architecture.
Glossary
Key terms used throughout this documentation:
| Term | Full Name | What It Means |
|------|-----------|---------------|
| MCP | Model Context Protocol | An open protocol for connecting LLM-based tools (like Claude Code, Cursor, OpenCode) to external data sources and capabilities. This server implements MCP to expose code search and navigation tools. |
| BM25 | Best Matching 25 | A probabilistic text search algorithm (used by Tantivy). Ranks results by how often your search terms appear in a document (term frequency) weighted by how rare those terms are across all documents (inverse document frequency / IDF). The standard algorithm behind most full-text search engines. |
| IDF | Inverse Document Frequency | A component of BM25 that measures how rare a term is. A term like `authenticate` appearing in only 3 files has high IDF (very discriminating), while `error` appearing in 200 files has low IDF (less useful for ranking). See the worked example after this table. |
| RRF | Reciprocal Rank Fusion | A technique for merging ranked result lists from different search systems. Instead of comparing raw scores (which have different scales), RRF uses rank positions: a result ranked #1 in keyword search and #3 in vector search gets a combined score based on those positions. This makes it robust when combining fundamentally different search approaches. |
| GGUF | GGML Unified Format | A binary format for storing quantized (compressed) neural network weights. Used by llama.cpp to run both the embedding model and the LLM efficiently on consumer hardware. Q4_K_M quantization reduces the 1.5B parameter model from ~3GB to ~1.1GB with minimal quality loss. |
| LLM | Large Language Model | In this project, a local Qwen2.5-Coder-1.5B model that generates one-sentence natural-language descriptions for each code symbol (function, class, type). These descriptions are indexed alongside the code, helping BM25 match natural-language queries to technically-named code. |
| PageRank | — | A graph algorithm (originally from Google Search) adapted here to score symbol importance. Symbols that are called/referenced by many other symbols get higher PageRank scores, indicating they are central to the codebase. |
| Tree-Sitter | — | A parser generator that builds concrete syntax trees (CSTs) for source code. Used to extract symbols (functions, classes, types), their relationships (calls, imports, type hierarchies), and structural information from 8 supported languages. |
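To put numbers on the IDF entry above: using the Lucene-style BM25 IDF formula and an assumed corpus of 1,000 indexed files (the corpus size is invented for illustration):

```rust
/// BM25's IDF component: idf(t) = ln(1 + (N - n + 0.5) / (n + 0.5)),
/// where N is the corpus size and n the number of documents containing t.
fn idf(corpus_size: f64, docs_with_term: f64) -> f64 {
    (1.0 + (corpus_size - docs_with_term + 0.5) / (docs_with_term + 0.5)).ln()
}

fn main() {
    let n = 1000.0; // assumed corpus size
    // "authenticate" in 3 files: rare, highly discriminating.
    println!("authenticate: {:.2}", idf(n, 3.0)); // ≈ 5.66
    // "error" in 200 files: common, contributes little to ranking.
    println!("error: {:.2}", idf(n, 200.0)); // ≈ 1.61
}
```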
Configuration (Optional)
Works without configuration by default. You can customize behavior via environment variables:
Core Settings
"env": {
"BASE_DIR": "/path/to/repo", // Required: Repository root
"WATCH_MODE": "true", // Watch for file changes (Default: true)
"INDEX_PATTERNS": "**/*.ts,**/*.go", // File patterns to index
"EXCLUDE_PATTERNS": "**/node_modules/**",
"REPO_ROOTS": "/path/to/repo1,/path/to/repo2" // Multi-repo support
}Embedding Model
"env": {
"EMBEDDINGS_BACKEND": "llamacpp", // llamacpp (default) or hash (testing)
"EMBEDDINGS_DEVICE": "cpu", // cpu or metal (macOS GPU)
"EMBEDDING_BATCH_SIZE": "32"
}Context Assembly
"env": {
"MAX_CONTEXT_TOKENS": "8192", // Token budget for context (default: 8192)
"TOKEN_ENCODING": "o200k_base", // tiktoken encoding model
"MAX_CONTEXT_BYTES": "200000" // Legacy byte-based limit (fallback)
}Ranking & Retrieval
"env": {
"RANK_EXPORTED_BOOST": "1.0", // Boost for exported symbols
"RANK_TEST_PENALTY": "0.1", // Penalty for test files
"RANK_POPULARITY_WEIGHT": "0.05", // PageRank influence
"RRF_ENABLED": "true", // Enable Reciprocal Rank Fusion
"HYBRID_ALPHA": "0.7" // Vector vs keyword weight (0-1)
}Learning System (Optional)
"env": {
"LEARNING_ENABLED": "false", // Enable selection tracking (default: false)
"LEARNING_SELECTION_BOOST": "0.1", // Boost for previously selected symbols
"LEARNING_FILE_AFFINITY_BOOST": "0.05" // Boost for frequently accessed files
}Performance
"env": {
"PARALLEL_WORKERS": "1", // Indexing parallelism (default: 1 for SQLite)
"EMBEDDING_CACHE_ENABLED": "true", // Persistent embedding cache
"PAGERANK_ITERATIONS": "20", // PageRank computation iterations
"METRICS_ENABLED": "true", // Prometheus metrics
"METRICS_PORT": "9090"
}Query Expansion
"env": {
"SYNONYM_EXPANSION_ENABLED": "true", // Expand "auth" → "authentication"
"ACRONYM_EXPANSION_ENABLED": "true" // Expand "db" → "database"
}Architecture
```mermaid
flowchart LR
Client[MCP Client] <==> Tools
subgraph Server [Code Intelligence Server]
direction TB
Tools[Tool Router]
subgraph Indexer [Indexing Pipeline]
direction TB
Watch[OS-Native File Watcher] --> Scan[File Scan]
Scan --> Parse[Tree-Sitter]
Parse --> Extract[Symbol Extraction]
Extract --> PageRank[PageRank Compute]
Extract --> Embed[jina-code-0.5b Embeddings - llama.cpp]
Extract --> LLMDesc[LLM Descriptions - Qwen2.5-Coder]
Extract --> JSDoc[JSDoc/Decorator/TODO Extract]
end
subgraph Storage [Storage Engine]
direction TB
SQLite[(SQLite)]
Tantivy[(Tantivy)]
Lance[(LanceDB)]
Cache[(Embedding Cache)]
end
subgraph Retrieval [Retrieval Engine]
direction TB
QueryExpand[Query Expansion]
Hybrid[Hybrid Search RRF]
Signals[Ranking Signals]
Context[Token-Aware Assembly]
end
Handlers[Tool Handlers]
Tools --> Handlers
Handlers -- Index --> Watch
PageRank --> SQLite
Embed --> Lance
Embed --> Cache
LLMDesc --> SQLite
JSDoc --> SQLite
Handlers -- Query --> QueryExpand
QueryExpand --> Hybrid
Hybrid --> Signals
Signals --> Context
Context --> Handlers
end
```

Development
- Prerequisites: Rust (stable), `protobuf`.
- Build: `cargo build --release`
- Run: `./scripts/start_mcp.sh`
- Test: `cargo test` or `EMBEDDINGS_BACKEND=hash cargo test` (faster, skips model download)
Quick Testing with Hash Backend
For faster development iteration, use the hash embedding backend which skips model downloads:
```bash
EMBEDDINGS_BACKEND=hash BASE_DIR=/path/to/repo ./target/release/code-intelligence-mcp-server
```

Project Structure
```text
src/
├── indexer/
│ ├── extract/ # Language-specific symbol extractors (Rust, TS, Python, Go, Java, C, C++)
│ ├── pipeline/ # Indexing pipeline stages (scan, parse, embed, watch, describe)
│ └── package/ # Package detection (npm, Cargo, Go, Python)
├── storage/
│ ├── sqlite/ # SQLite schema, queries, operations
│ ├── tantivy.rs # BM25 full-text search with n-gram tokenization
│ └── vector.rs # LanceDB vector embeddings
├── retrieval/
│ ├── ranking/ # Scoring signals, RRF, diversity, edge expansion, reranker
│ ├── assembler/ # Token-aware context assembly and formatting
│ ├── hyde/ # Hypothetical document expansion
│ ├── mod.rs # Search pipeline orchestrator
│ ├── hybrid.rs # Hybrid BM25 + vector scoring loop
│ └── postprocess.rs # Final enforcement, vector promotion
├── graph/ # PageRank, call hierarchy, type graphs
├── handlers/ # MCP tool handlers
├── server/ # MCP protocol routing (embedded + standalone)
│ ├── mod.rs # Shared tool dispatch, embedded handler
│ └── standalone.rs # Standalone HTTP handler with session routing
├── tools/ # Tool definitions (23 MCP tools)
├── embeddings/ # jina-code-0.5b embedding model (GGUF via llama.cpp)
├── llm/ # On-device LLM (Qwen2.5-Coder-1.5B via llama.cpp, for descriptions)
├── reranker/ # Reranker trait and cache (currently disabled)
├── path/ # Cross-platform path normalization (camino)
├── text.rs # Text processing (synonym expansion, morphological variants)
├── metrics/ # Prometheus metrics
├── config.rs # Configuration (embedded + standalone)
├── session.rs # Multi-repo session management (standalone)
└── registry.rs # Repo registry with path hashing (standalone)
```

License
MIT
