Semantic Code MCP
AI-powered semantic code search for coding agents. An MCP server with non-blocking background indexing, multi-provider embeddings (Gemini, Vertex AI, OpenAI, local), and Milvus / Zilliz Cloud vector storage — designed for multi-agent concurrent access.
Run Claude Code, Codex, Copilot, and Antigravity against the same code index simultaneously. Indexing runs in the background; search works immediately while indexing continues.
Ask "where do we handle authentication?" and find code that uses
login,session,verifyCredentials— even when no file contains the word "authentication."
Quick Start
```bash
npx -y semantic-code-mcp@latest --workspace /path/to/your/project
```

MCP config:

```json
{
  "mcpServers": {
    "semantic-code-mcp": {
      "command": "npx",
      "args": ["-y", "semantic-code-mcp@latest", "--workspace", "/path/to/project"]
    }
  }
}
```

```mermaid
graph LR
A["Claude Code"] --> M["Milvus Standalone<br/>(Docker)"]
B["Codex"] --> M
C["Copilot"] --> M
D["Antigravity"] --> M
M --> V["Shared Vector Index"]
```

Why
Traditional grep and keyword search break down when you don't know the exact terms used in the codebase. Semantic search bridges that gap:
- Concept matching — "error handling" finds `try/catch`, `onRejected`, `fallback` patterns
- Typo-tolerant — "embeding modle" still finds embedding model code
- Hybrid scoring — semantic similarity (0.7 weight) + lexical exact/partial match boost (up to +1.5)
- Search dedup — per-file result limiting (default 2) prevents a single large file from dominating results
- Context-aware chunking — AST-based (Tree-sitter) or smart regex splitting preserves code structure
- Fast — progressive indexing lets you search while the codebase is still being indexed
Based on Cursor's research showing semantic search improves AI agent performance by 12.5%.
Setup
```json
{
  "mcpServers": {
    "semantic-code-mcp": {
      "command": "npx",
      "args": ["-y", "semantic-code-mcp@latest", "--workspace", "/path/to/project"]
    }
  }
}
```

Claude Code: `~/.claude/settings.local.json` → `mcpServers`
Claude Desktop: `~/Library/Application Support/Claude/claude_desktop_config.json`
Create `.vscode/mcp.json` in your project root:

```json
{
  "servers": {
    "semantic-code-mcp": {
      "command": "npx",
      "args": ["-y", "semantic-code-mcp@latest", "--workspace", "${workspaceFolder}"]
    }
  }
}
```

VS Code and Cursor support `${workspaceFolder}`. Windsurf requires absolute paths.
Codex: `~/.codex/config.toml`

```toml
[mcp_servers.semantic-code-mcp]
command = "npx"
args = ["-y", "semantic-code-mcp@latest", "--workspace", "/path/to/project"]
```

Antigravity: `~/.gemini/antigravity/mcp_config.json`

```json
{
  "mcpServers": {
    "semantic-code-mcp": {
      "command": "npx",
      "args": ["-y", "semantic-code-mcp@latest", "--workspace", "/path/to/project"]
    }
  }
}
```

For monorepos or workspaces with 1000+ files, a shell wrapper script gives you:
- Real-time logs — see indexing progress, error details, 429 retry status
- No MCP timeout — long-running index operations won't be killed
- Environment isolation — pin provider credentials per project
Create `start-semantic-code-mcp.sh`:

```bash
#!/bin/bash
export SMART_CODING_WORKSPACE="/path/to/monorepo"
export SMART_CODING_EMBEDDING_PROVIDER="vertex"
export SMART_CODING_VECTOR_STORE_PROVIDER="milvus"
export SMART_CODING_MILVUS_ADDRESS="http://localhost:19530"
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
export SMART_CODING_VERTEX_PROJECT="your-gcp-project-id"

cd /path/to/semantic-code-mcp
exec node index.js
```

```bash
chmod +x start-semantic-code-mcp.sh
```

Then reference it in your MCP config:

```json
{
  "semantic-code-mcp": {
    "command": "/absolute/path/to/start-semantic-code-mcp.sh",
    "args": []
  }
}
```

When to use shell scripts over npx:
- Monorepo with multiple sub-projects sharing one index
- 1000+ files requiring long initial indexing
- Debugging 429 rate-limit or gRPC errors (need real-time stderr)
- Pinning specific provider credentials per workspace
Features
Multi-Provider Embeddings
| Provider          | Model                     | Privacy    | Speed         |
| ----------------- | ------------------------- | ---------- | ------------- |
| Local (default)   | nomic-embed-text-v1.5     | 100% local | ~50ms/chunk   |
| Gemini            | gemini-embedding-001      | API call   | Fast, batched |
| OpenAI            | text-embedding-3-small    | API call   | Fast          |
| OpenAI-compatible | Any compatible endpoint   | Varies     | Varies        |
| Vertex AI         | Google Cloud models       | GCP        | Fast          |
Flexible Vector Storage
- SQLite (default) — zero-config, single-file `.smart-coding-cache/embeddings.db`
- Milvus — scalable ANN search for large codebases or shared team indexes
Smart Code Chunking
Three modes to match your codebase:
- `smart` (default) — regex-based, language-aware splitting
- `ast` — Tree-sitter parsing for precise function/class boundaries
- `line` — simple fixed-size line chunks
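For example, to opt into AST chunking, set the mode in your server's `env` block (see the Configuration section below for all variables):

```json
{
  "mcpServers": {
    "semantic-code-mcp": {
      "command": "npx",
      "args": ["-y", "semantic-code-mcp@latest", "--workspace", "/path/to/project"],
      "env": {
        "SMART_CODING_CHUNKING_MODE": "ast"
      }
    }
  }
}
```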
Resource Throttling
CPU capped at 50% during indexing. Your machine stays responsive.
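The README doesn't describe the throttling mechanism itself; one common approach is a duty cycle that sleeps in proportion to each work slice (an illustrative sketch, not this project's code):

```js
// Hypothetical 50% CPU duty-cycle throttle: after each work slice, sleep for
// as long as the slice took, so busy time stays near 50% of wall time.
const MAX_CPU_PERCENT = 50;

async function throttled(processBatch, batches) {
  for (const batch of batches) {
    const start = Date.now();
    await processBatch(batch); // CPU-bound work slice
    const busy = Date.now() - start;
    // Sleep so that busy / (busy + idle) <= MAX_CPU_PERCENT / 100
    const idle = (busy * (100 - MAX_CPU_PERCENT)) / MAX_CPU_PERCENT;
    await new Promise((resolve) => setTimeout(resolve, idle));
  }
}
```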
Multi-Agent Concurrent Access
Multiple AI agents (Claude Code, Codex, Copilot, Antigravity) can query the same vector index simultaneously via Milvus Standalone (Docker). No file locking, no index corruption.
Milvus Standalone runs 3 containers working together:
```mermaid
graph LR
A["semantic-code-mcp"] -->|"gRPC :19530"| M["milvus standalone"]
M -->|"object storage"| S["minio :9000"]
M -->|"metadata"| E["etcd :2379"]
```

| Container | Role | Image |
| -------------- | ------------------------------------- | ----------------- |
| standalone | Vector engine (gRPC :19530) | milvusdb/milvus |
| etcd | Metadata store (cluster coordination) | coreos/etcd |
| minio | Object storage (index files, logs) | minio/minio |
Performance Guidelines
| Resource | Minimum | Recommended                   |
| -------- | ------- | ----------------------------- |
| RAM      | 4 GB    | 8 GB+                         |
| Disk     | 10 GB   | 50 GB+ (scales with codebase) |
| CPU      | 2 cores | 4+ cores                      |
| Docker   | v20+    | Latest                        |
⚠️ RAM is the critical bottleneck. Milvus Standalone idles at ~2.5 GB RAM across the 3 containers. Machines with < 4 GB will experience swap thrashing and gRPC timeouts. Check with `docker stats`.
1. Install with Docker Compose
```yaml
# docker-compose.yml
version: '3.5'

services:
  etcd:
    image: coreos/etcd:v3.5.18
    environment:
      ETCD_AUTO_COMPACTION_MODE: revision
      ETCD_AUTO_COMPACTION_RETENTION: "1000"
      ETCD_QUOTA_BACKEND_BYTES: "4294967296"
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    volumes:
      - etcd-data:/etcd

  minio:
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    command: minio server /minio_data --console-address ":9001"
    ports:
      - "9000:9000"
      - "9001:9001"
    volumes:
      - minio-data:/minio_data

  standalone:
    image: milvusdb/milvus:v2.5.1
    command: ["milvus", "run", "standalone"]
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    ports:
      - "19530:19530"
      - "9091:9091"
    volumes:
      - milvus-data:/var/lib/milvus
    depends_on:
      - etcd
      - minio

volumes:
  etcd-data:
  minio-data:
  milvus-data:
```

2. Start & Verify

```bash
# Start all 3 containers
docker compose up -d
# Verify all 3 containers are running
docker compose ps
# NAME STATUS
# etcd running
# minio running
# standalone running (healthy)
# Check RAM usage (expect ~2.5 GB total idle)
docker stats --no-stream
```

3. Configure MCP to use Milvus

```json
{
  "env": {
    "SMART_CODING_VECTOR_STORE_PROVIDER": "milvus",
    "SMART_CODING_MILVUS_ADDRESS": "http://localhost:19530"
  }
}
```

4. Verify connection

```bash
# Should return collection list (may be empty initially)
curl http://localhost:19530/v1/vector/collections
```

5. Lifecycle Management

```bash
# Stop all containers (preserves data)
docker compose stop
# Restart after reboot
docker compose start
# Full reset (removes all indexed vectors)
docker compose down -v
# View logs for debugging
docker compose logs -f standalone
```

6. Monitoring
- MinIO Console: http://localhost:9001 (minioadmin / minioadmin)
- Milvus Health: http://localhost:9091/healthz
- Container RAM: `docker stats --no-stream`
Troubleshooting
| Symptom | Cause | Fix |
| ------------------------------------- | ---------------------------- | -------------------------------------------------------------------------------- |
| gRPC timeout / connection refused | Milvus not fully started | Wait 30–60s after docker compose up -d, check docker compose logs standalone |
| Swap thrashing, slow queries | < 4 GB RAM | Upgrade RAM or use SQLite for single-agent setups |
| etcd: mvcc: database space exceeded | etcd compaction backlog | docker compose restart etcd |
| Milvus OOM killed | RAM pressure from other apps | Close heavy apps or increase Docker memory limit |
SQLite vs Milvus: SQLite is single-process — only one agent can write at a time. Milvus handles concurrent reads/writes from multiple agents without conflicts. Use Milvus when running 2+ agents on the same codebase.
Tools
| Tool | Description |
| ---------------------- | ------------------------------------------------------------ |
| a_semantic_search | Find code by meaning. Hybrid semantic + exact match scoring. |
| b_index_codebase | Trigger manual reindex (normally automatic & incremental). |
| c_clear_cache | Reset embeddings cache entirely. |
| d_check_last_version | Look up latest package version from 20+ registries. |
| e_set_workspace | Switch project at runtime without restart. |
| f_get_status | Server health: version, index progress, config. |
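For orientation, a raw MCP `tools/call` request for `a_semantic_search` looks roughly like this (the JSON-RPC framing is standard MCP; the argument names follow the Agent Rules section below):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "a_semantic_search",
    "arguments": {
      "query": "where do we handle authentication?",
      "maxResults": 5
    }
  }
}
```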
IDE Setup
| IDE / App | Guide | ${workspaceFolder} |
| ------------------ | ----------------------------------------- | -------------------- |
| VS Code | Setup | ✅ |
| Cursor | Setup | ✅ |
| Windsurf | Setup | ❌ |
| Claude Desktop | Setup | ❌ |
| OpenCode | Setup | ❌ |
| Raycast | Setup | ❌ |
| Antigravity | Setup | ❌ |
Multi-Project
```json
{
  "mcpServers": {
    "code-frontend": {
      "command": "npx",
      "args": ["-y", "semantic-code-mcp@latest", "--workspace", "/path/to/frontend"]
    },
    "code-backend": {
      "command": "npx",
      "args": ["-y", "semantic-code-mcp@latest", "--workspace", "/path/to/backend"]
    }
  }
}
```

Configuration
All settings via environment variables. Prefix: SMART_CODING_.
Core
| Variable | Default | Description |
| ------------------------------- | --------- | ------------------------------------------------------------------------------------------------------- |
| SMART_CODING_VERBOSE | false | Detailed logging |
| SMART_CODING_MAX_RESULTS | 5 | Search results returned |
| SMART_CODING_BATCH_SIZE | 100 | Files per parallel batch |
| SMART_CODING_MAX_FILE_SIZE | 1048576 | Max file size (1MB) |
| SMART_CODING_CHUNK_SIZE | 25 | Lines per chunk |
| SMART_CODING_CHUNKING_MODE | smart | smart / ast / line |
| SMART_CODING_WATCH_FILES | false | Auto-reindex on changes |
| SMART_CODING_AUTO_INDEX_DELAY | false | Background index on startup. false=off (multi-agent safe), true=5s, or ms value. Single-agent only. |
| SMART_CODING_MAX_CPU_PERCENT | 50 | CPU cap during indexing |
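For example, a verbose single-agent setup with file watching (values taken straight from the table above; `AUTO_INDEX_DELAY=true` means a 5s delayed background index):

```bash
export SMART_CODING_VERBOSE="true"
export SMART_CODING_WATCH_FILES="true"
export SMART_CODING_AUTO_INDEX_DELAY="true"   # single-agent only
export SMART_CODING_MAX_CPU_PERCENT="50"
```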
Embedding Provider
| Variable | Default | Description |
| ---------------------------------- | -------------------------------- | -------------------------------------------------------------- |
| SMART_CODING_EMBEDDING_PROVIDER | local | local / gemini / openai / openai-compatible / vertex |
| SMART_CODING_EMBEDDING_MODEL | nomic-ai/nomic-embed-text-v1.5 | Model name |
| SMART_CODING_EMBEDDING_DIMENSION | 128 | MRL dimension (64–768) |
| SMART_CODING_DEVICE | auto | cpu / webgpu / auto |
Gemini
| Variable | Default | Description |
| --------------------------------- | ---------------------- | ----------------- |
| SMART_CODING_GEMINI_API_KEY | — | API key |
| SMART_CODING_GEMINI_MODEL | gemini-embedding-001 | Model |
| SMART_CODING_GEMINI_DIMENSIONS | 768 | Output dimensions |
| SMART_CODING_GEMINI_BATCH_SIZE | 24 | Micro-batch size |
| SMART_CODING_GEMINI_MAX_RETRIES | 3 | Retry count |
OpenAI / Compatible
| Variable | Default | Description |
| --------------------------------- | ------- | -------------------------- |
| SMART_CODING_EMBEDDING_API_KEY | — | API key |
| SMART_CODING_EMBEDDING_BASE_URL | — | Base URL (compatible only) |
Vertex AI
| Variable | Default | Description |
| ------------------------------ | ------------- | -------------- |
| SMART_CODING_VERTEX_PROJECT | — | GCP project ID |
| SMART_CODING_VERTEX_LOCATION | us-central1 | Region |
Vector Store
| Variable | Default | Description |
| ------------------------------------ | ------------------------- | -------------------------------------- |
| SMART_CODING_VECTOR_STORE_PROVIDER | sqlite | sqlite / milvus |
| SMART_CODING_MILVUS_ADDRESS | — | Milvus endpoint or Zilliz Cloud URI |
| SMART_CODING_MILVUS_TOKEN | — | Auth token (required for Zilliz Cloud) |
| SMART_CODING_MILVUS_DATABASE | default | Database name |
| SMART_CODING_MILVUS_COLLECTION | smart_coding_embeddings | Collection |
Zilliz Cloud (Managed Milvus)
For teams or serverless deployments, use Zilliz Cloud instead of self-hosted Docker:
```json
{
  "env": {
    "SMART_CODING_VECTOR_STORE_PROVIDER": "milvus",
    "SMART_CODING_MILVUS_ADDRESS": "https://in03-xxxx.api.gcp-us-west1.zillizcloud.com",
    "SMART_CODING_MILVUS_TOKEN": "your-zilliz-api-key"
  }
}
```

| Feature     | Milvus Standalone (Docker) | Zilliz Cloud                |
| ----------- | -------------------------- | --------------------------- |
| Setup       | Self-hosted, 3 containers  | Managed SaaS                |
| RAM         | ~2.5 GB idle               | None (serverless)           |
| Multi-agent | ✅ via shared Docker        | ✅ via shared endpoint       |
| Scaling     | Manual                     | Auto-scaling                |
| Free tier   | —                          | 2 collections, 1M vectors   |
| Best for    | Local dev, single machine  | Team use, CI/CD, production |
Get your Zilliz Cloud URI and API key from the Zilliz Console → Cluster → Connect.
Search Tuning
| Variable | Default | Description |
| --------------------------------- | ------- | ------------------------------------------------------------------------------------------- |
| SMART_CODING_SEMANTIC_WEIGHT | 0.7 | Semantic score weight (ANN similarity × this value) |
| SMART_CODING_EXACT_MATCH_BOOST | 1.5 | Boost added when query appears verbatim in chunk content |
| SMART_CODING_DEDUP_MAX_PER_FILE | 1 | Max results per file. Ensures maximum source diversity — one chunk per file. 0 = disabled |
Hybrid scoring formula: `score = ANN_similarity × semanticWeight + lexicalBoost`
| Match type | Boost value |
| ------------- | ------------------------------------ |
| Exact match | +exactMatchBoost (default +1.5) |
| Partial match | +(matchedWords / totalWords) × 0.3 |
| No match | +0 |
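Putting the table together, the re-ranking step amounts to something like this (a sketch based on the documented formula; function and field names are illustrative, and case-insensitive matching is an assumption):

```js
// Hypothetical re-ranking pass over ANN candidates, per the formula above.
function hybridScore(query, chunk, annSimilarity, opts = {}) {
  const { semanticWeight = 0.7, exactMatchBoost = 1.5 } = opts;
  const text = chunk.content.toLowerCase();
  const q = query.toLowerCase();

  let lexicalBoost = 0;
  if (text.includes(q)) {
    lexicalBoost = exactMatchBoost; // exact match: +1.5 by default
  } else {
    const words = q.split(/\s+/).filter(Boolean);
    const matched = words.filter((w) => text.includes(w)).length;
    lexicalBoost = words.length ? (matched / words.length) * 0.3 : 0; // partial match
  }
  return annSimilarity * semanticWeight + lexicalBoost;
}
```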
Example with Gemini + Milvus
```json
{
  "mcpServers": {
    "semantic-code-mcp": {
      "command": "npx",
      "args": ["-y", "semantic-code-mcp@latest", "--workspace", "/path/to/project"],
      "env": {
        "SMART_CODING_EMBEDDING_PROVIDER": "gemini",
        "SMART_CODING_GEMINI_API_KEY": "YOUR_KEY",
        "SMART_CODING_VECTOR_STORE_PROVIDER": "milvus",
        "SMART_CODING_MILVUS_ADDRESS": "http://localhost:19530"
      }
    }
  }
}
```

Architecture

```mermaid
graph TD
A["MCP Server — index.js"] --> B["Features"]
B --> B1["hybrid-search"]
B --> B2["index-codebase"]
B --> B3["set-workspace / get-status / clear-cache"]
B2 --> C["Code Chunking — AST or Smart Regex"]
C --> D["Embedding — Local / Gemini / Vertex / OpenAI"]
D --> E["Vector Store — SQLite or Milvus"]
B1 --> D
B1 --> E
```

How It Works

```mermaid
flowchart LR
A["📁 Source Files"] -->|"glob + .gitignore"| B["✂️ Smart/AST<br/>Chunking"]
B -->|language-aware| C["🧠 AI Embedding<br/>(Local or API)"]
C -->|vectors| D["💾 SQLite / Milvus<br/>Storage"]
D -->|incremental hash| D
E["🔍 Search Query"] -->|embed| C
C -->|"k×5 oversample"| F["📊 Hybrid Scoring<br/>semantic × 0.7<br/>+ lexical boost"]
F --> DD["🔄 Dedup<br/>max 2 per file"]
DD --> G["🎯 Top N Results<br/>with relevance scores"]
style A fill:#2d3748,color:#e2e8f0
style C fill:#553c9a,color:#e9d8fd
style D fill:#2a4365,color:#bee3f8
style F fill:#744210,color:#fefcbf
style DD fill:#553c9a,color:#e9d8fd
style G fill:#22543d,color:#c6f6d5
```

Progressive indexing — search works immediately while indexing continues in the background. Only changed files are re-indexed on subsequent runs.
Incremental Indexing & Optimization
Semantic Code MCP uses a hash-based incremental indexing strategy to minimize redundant work:
```mermaid
flowchart TD
A["File discovered"] --> B{"Hash changed?"}
B -->|No| C["Skip — use cached vectors"]
B -->|Yes| D["Re-chunk & re-embed"]
D --> E["Update vector store"]
F["Deleted file detected"] --> G["Prune stale vectors"]
style C fill:#22543d,color:#c6f6d5
style D fill:#744210,color:#fefcbf
style G fill:#742a2a,color:#fed7d7
```

How it works:
- File discovery — glob patterns with `.gitignore`-aware filtering
- Hash comparison — each file's `mtime + size` is compared against the cached index
- Delta processing — only changed/new files are chunked and embedded
- Stale pruning — deleted files are removed from the vector store automatically
- Reconciliation sweep — see below
- Progressive search — queries work immediately, even mid-indexing
Reconciliation Sweep
Hash-based pruning catches deletions during normal indexing, but can miss ghost vectors when:
- The hash cache (`file-hashes.json`) is cleared (e.g., `c_clear_cache`)
- Files are moved outside the workspace
- A previous indexing job was interrupted
The reconciliation sweep runs automatically after each `b_index_codebase` to catch these edge cases:

```mermaid
flowchart LR
A["🔍 Query Milvus\n(all file paths)"] --> B{"File exists\non disk?"}
B -->|Yes| C["✅ Keep"]
B -->|No| D["🗑️ Delete vectors\nfilter: file == '...'"]
D --> E["📊 Report via\nf_get_status"]
style A fill:#2a4365,color:#bee3f8
style C fill:#22543d,color:#c6f6d5
style D fill:#742a2a,color:#fed7d7
style E fill:#744210,color:#fefcbf
```

Status response (via `f_get_status`):

```json
{
  "index": {
    "status": "ready",
    "lastReconcile": {
      "orphans": 0,
      "seconds": 0.43
    }
  }
}
```

Reconciliation is independent of `file-hashes.json` — it directly compares Milvus ↔ disk.
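In outline, the sweep does something like the following (a sketch; `listDistinctFilePaths` and `deleteByFilter` are assumed helper names, not the project's actual store API):

```js
// Hypothetical reconciliation sweep: compare indexed file paths against disk
// and delete vectors for files that no longer exist ("ghost vectors").
import fs from "node:fs";

async function reconcile(store) {
  const t0 = Date.now();
  const indexedPaths = await store.listDistinctFilePaths(); // assumed helper
  let orphans = 0;
  for (const file of indexedPaths) {
    if (!fs.existsSync(file)) {
      await store.deleteByFilter(`file == "${file}"`); // filter shape from the diagram
      orphans++;
    }
  }
  // Same shape as the lastReconcile field reported by f_get_status
  return { orphans, seconds: (Date.now() - t0) / 1000 };
}
```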
Performance characteristics:
| Scenario | Behavior | Typical Time |
| --------------------------- | ----------------- | ------------------------------ |
| First run (500 files) | Full index | ~30–60s (API), ~2–5min (local) |
| Subsequent run (no changes) | Hash check only | < 1s |
| 10 files changed | Incremental delta | ~2–5s |
| Branch switch | Partial re-index | ~5–15s |
| force=true | Full rebuild | Same as first run |
⚠️ Multi-agent warning: Auto-index is disabled by default to prevent concurrent Milvus writes when multiple agents share the same server. Set `SMART_CODING_AUTO_INDEX_DELAY=true` (5s) only if a single agent connects to this MCP server. Use `b_index_codebase` for explicit on-demand indexing in multi-agent setups.
MCP tool calls have timeout limits and don't expose real-time logs. For bulk operations (initial setup, full rebuild, migration), use the CLI reindex script directly:

```bash
cd /path/to/semantic-code-mcp
node reindex.js /path/to/workspace --force
```

When to use CLI over MCP tools:
| Scenario | Use |
| ---------------------------- | ----------------------------------- |
| Daily incremental updates | MCP b_index_codebase(force=false) |
| Initial workspace setup | CLI node reindex.js /path --force |
| Full rebuild after migration | CLI node reindex.js /path --force |
| 1000+ file bulk update | CLI (timeout-safe, real-time logs) |
| Debugging 429 / gRPC errors | CLI (stderr visible) |
The CLI reindex script uses the same incremental engine under the hood. `--force` only forces re-embedding; it still uses the same hash-based delta for efficiency.
Non-Blocking Indexing Workflow
All indexing operations run in the background and return immediately. The agent can search while indexing continues.
```mermaid
sequenceDiagram
participant Agent
participant MCP as semantic-code-mcp
participant BG as Background Thread
participant Store as Milvus / SQLite
Agent->>MCP: b_index_codebase(force=false)
MCP->>BG: startBackgroundIndexing()
MCP-->>Agent: {status: "started", message: "..."}
Note over Agent: ⚡ Returns instantly
loop Poll every 2-3s
Agent->>MCP: f_get_status()
MCP-->>Agent: {index.status: "indexing", progress: "150/500 files"}
end
BG->>Store: upsert vectors
BG-->>MCP: done
Agent->>MCP: f_get_status()
MCP-->>Agent: {index.status: "ready"}
Agent->>MCP: a_semantic_search(query)
MCP-->>Agent: [results]
```

Rules for agents:
- Always call `f_get_status` first — check workspace and indexing status
- Use `e_set_workspace` if workspace is wrong — before any indexing
- Poll `f_get_status` until `index.status: "ready"` before relying on search results
- Progressive search is supported — `a_semantic_search` works during indexing with partial results
- `SMART_CODING_AUTO_INDEX_DELAY=false` by default — use `b_index_codebase` for explicit on-demand indexing in multi-agent setups
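From the agent side, the polling loop in the diagram could look like this (a sketch; `mcp.callTool` stands in for your MCP client and is assumed to return the tool's parsed JSON):

```js
// Hypothetical agent-side flow: kick off indexing, poll status, then search.
async function indexThenSearch(mcp, query) {
  await mcp.callTool("b_index_codebase", { force: false }); // returns instantly

  // Poll every 2-3s until the background index reports ready.
  for (;;) {
    const { index } = await mcp.callTool("f_get_status", {});
    if (index.status === "ready") break;
    await new Promise((resolve) => setTimeout(resolve, 2500));
  }

  return mcp.callTool("a_semantic_search", { query, maxResults: 5 });
}
```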
Indexing Architecture Internals
The sister project markdown-rag (Python/asyncio) wraps every sync operation in asyncio.to_thread() to prevent blocking the event loop. This project doesn't need that — here's why:
| Operation | Python asyncio | Node.js |
| ----------------------------- | ------------------------------------------- | ---------------------------------------------------------------- |
| File I/O (stat, readFile) | Sync by default — blocks event loop | Async by default — fs.promises.* runs on libuv thread pool |
| Network I/O (Milvus gRPC) | milvus_client.delete() — sync, blocks | Native async via Promises |
| CPU-bound (embedding) | GIL limits to_thread effectiveness | Worker threads — true multi-core parallelism |
| CPU-bound (chunking) | to_thread offload needed | Event loop yields between await points |
In Python, calling os.stat() or milvus_client.insert() inside an async def function freezes the entire event loop until the call completes. That's why markdown-rag needs 7 separate asyncio.to_thread() wrappers across its pipeline.
In Node.js, await fs.stat(file) dispatches to the libuv thread pool automatically. The event loop stays responsive and can handle other MCP requests (e.g., f_get_status, a_semantic_search) while file I/O executes in the background.
The only CPU-bound bottleneck — embedding computation — is offloaded to Worker threads (worker_threads module) for true multi-core parallelism. See processChunksWithWorkers() in features/index-codebase.js.
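The underlying `worker_threads` idiom looks like this (a generic sketch run as an ES module, not the project's `processChunksWithWorkers()` implementation):

```js
// Minimal worker_threads sketch: offload CPU-bound embedding of chunks to a
// worker so the main event loop keeps serving MCP requests.
import { Worker, isMainThread, parentPort, workerData } from "node:worker_threads";
import { fileURLToPath } from "node:url";

const selfPath = fileURLToPath(import.meta.url);

if (isMainThread) {
  // Main thread: spawn a worker for a chunk batch and await its vectors.
  const embedInWorker = (chunks) =>
    new Promise((resolve, reject) => {
      const w = new Worker(selfPath, { workerData: chunks });
      w.once("message", resolve);
      w.once("error", reject);
    });

  const vectors = await embedInWorker(["function add(a, b) { return a + b; }"]);
  console.log(vectors.length, "chunks embedded");
} else {
  // Worker thread: placeholder for the real CPU-bound embedding computation.
  parentPort.postMessage(workerData.map((chunk) => ({ chunk, vector: [0.1, 0.2] })));
}
```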
```mermaid
graph TD
A["MCP Request: b_index_codebase"] --> B["handleToolCall()"]
B --> C["startBackgroundIndexing() — fire-and-forget"]
C --> D["indexAll()"]
D --> E["discoverFiles() — async fs via libuv"]
E --> F["sortFilesByPriority() — async stat via libuv"]
F --> G["Per-file: fs.stat + fs.readFile — async"]
G --> H{"Workers available?"}
H -->|Yes| I["processChunksWithWorkers() — multi-core"]
H -->|No| J["processChunksSingleThreaded() — fallback"]
I --> K["Batch insert to vector store"]
J --> K
K --> L["cache.save()"]
```

Unlike markdown-rag's explicit 3-way delta (new_files / modified_files / deleted_files), this project uses a 2-phase mtime→hash check that handles new and modified files in a single code path:

```text
For each file:
1. mtime unchanged? → skip (definitely unchanged)
2. mtime changed → read content → compute hash
3. hash unchanged? → update cached mtime, skip
4. hash changed? → removeFileFromStore() + re-chunk + re-embed
5. Not in cache? → removeFileFromStore() + re-chunk + re-embed (new file)
```

New files hit `removeFileFromStore()` (step 5), which is technically a no-op since there are no existing vectors for that file. This differs from markdown-rag, which explicitly skips delete for new files:
| Aspect | semantic-code-mcp | markdown-rag |
| ------------------------ | --------------------------------------- | -------------------------------------------------- |
| Delta classification | 2-way (changed vs unchanged) | 3-way (new / modified / deleted) |
| New file handling | removeFileFromStore() → no-op | Explicit skip — no delete call |
| Delete cost per file | SQLite DELETE WHERE file=? — <1ms | Milvus gRPC delete() — 10–50ms |
| Impact of 1000 new files | <1s total waste (negligible) | 10–50s waste (significant) |
| Deleted file pruning | Batch prune in indexAll() step 1.5 | get_index_delta_detailed() returns explicit list |
Why this design is acceptable: The vector store backend matters. With SQLite, a no-op delete is a sub-millisecond local operation. With Milvus (network gRPC), each no-op delete costs 10–50ms of round-trip time — that's why markdown-rag invested in the 3-way classification to eliminate 1,288 unnecessary gRPC calls.
Future consideration: If the Milvus backend (`milvus-cache.js`) shows measurable overhead on large new-file batches, a 3-way delta classification can be introduced. Currently, benchmarks show no meaningful difference for typical codebases (<5,000 files).
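As a sketch, the 2-phase check above reduces to the following (helper names, the cache shape, and the hash algorithm are assumptions; the project's actual cache lives in `file-hashes.json`):

```js
// Hypothetical 2-phase mtime→hash check. Returns true when the file must be
// re-chunked and re-embedded (steps 4-5), false when cached vectors are reused.
import fs from "node:fs/promises";
import crypto from "node:crypto";

async function shouldReindex(file, cache) {
  const { mtimeMs, size } = await fs.stat(file);
  const cached = cache.get(file); // assumed shape: { mtimeMs, size, hash }

  // Phase 1: cheap mtime/size check — unchanged files are never even read.
  if (cached && cached.mtimeMs === mtimeMs && cached.size === size) return false;

  // Phase 2: content hash — catches files that were touched but not modified.
  const content = await fs.readFile(file);
  const hash = crypto.createHash("sha256").update(content).digest("hex");
  if (cached && cached.hash === hash) {
    cache.set(file, { mtimeMs, size, hash }); // refresh cached mtime, skip
    return false;
  }

  cache.set(file, { mtimeMs, size, hash });
  return true; // new or changed → removeFileFromStore() + re-chunk + re-embed
}
```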
Privacy
- Local mode: everything runs on your machine. Code never leaves your system.
- API mode: code chunks are sent to the embedding API for vectorization. No telemetry beyond provider API calls.
Agent Rules (AGENTS.md Integration)
This server has a mandatory search role defined in `AGENTS.md`:

```markdown
## Search Role: semantic-code-mcp
- **Code semantic search**. Use after grep narrows scope, or when grep can't find the logic.
`a_semantic_search(query, maxResults=5)`. For duplicate detection: `maxResults=10`.
### DUAL-SEARCH MANDATE
You MUST use at least 2 tools per search. Single-tool search is FORBIDDEN.
### Decision Table
| What you need | 1st tool | 2nd tool (REQUIRED) | NEVER use |
| ------------------------- | ---------------- | ------------------- | --------- |
| Exact symbol / function | `grep` | Code RAG or view | Doc RAG |
| Code logic understanding | Code RAG | grep → `view_file` | Doc RAG |
| Config value across files | `grep --include` | Doc RAG | — |
### Parameters
- maxResults: quick=3, general=5, comprehensive/dedup=10.
- scopePath: ALWAYS set when target project is known.
- Query language: English for code search.
### Anti-patterns (FORBIDDEN)
- ❌ Doc RAG to find code locations → ✅ grep or Code RAG
- ❌ Code RAG for Korean workflow docs → ✅ Doc RAG
- ❌ Single-tool search → ✅ Always 2+ tools
```

Source: `AGENTS.md` §3 Search, `rag/SKILL.md`, `07.5-검색-도구-벤치마크` (search-tool benchmark doc)
License
MIT License
Copyright (c) 2025 Omar Haris (original), bitkyc08 (modifications, 2026)
See LICENSE for full text.
About
This project is a fork of smart-coding-mcp by Omar Haris, heavily extended for production use.
Key additions over upstream:
- Multi-provider embeddings (Gemini, Vertex AI, OpenAI, OpenAI-compatible)
- Milvus vector store with ANN search for large codebases
- Hybrid search scoring (semantic × 0.7 + lexical boost up to +1.5)
- Per-file dedup in search results for diverse output
- AST-based code chunking via Tree-sitter
- Resource throttling (CPU cap at 50%)
- Runtime workspace switching (`e_set_workspace`)
- Package version checker across 20+ registries (`d_check_last_version`)
- Comprehensive IDE setup guides (VS Code, Cursor, Windsurf, Claude Desktop, Antigravity)
- Reconciliation sweep — post-index Milvus↔disk orphan cleanup
