codebase-indexer
v1.2.1
Semantic codebase indexer with Ollama + Qdrant, MCP server for Claude Code
/\_/\ codebase-indexer
( •.•) ─────────────────
⊂/ 🌰 \⊃ Semantic code search
/| |\ powered by AI
A semantic codebase indexer that chunks your source code, generates embeddings via Ollama, and stores them in Qdrant for natural-language code search. It ships as an MCP server for Claude Code and Codex integration and also works as a standalone CLI.
Features
- Semantic search: find code by meaning, not just keywords ("retry logic with exponential backoff")
- 18 languages with symbol-aware chunking (functions, classes, interfaces, etc.)
- 70+ file extensions recognized
- Incremental indexing: only re-indexes changed files (MD5 content hashing)
- MCP server: plug directly into Claude Code or Codex as a tool provider
- File watcher: auto-reindex on save (500ms debounce)
- Interactive CLI: squirrel mascot, spinners, colored output
- Zero config: sensible defaults, works out of the box with npx
Quick Start
One-Line Install
The fastest way to get started — checks prerequisites, sets up infrastructure, indexes your project, and configures integrations interactively:
bash <(curl -fsSL https://raw.githubusercontent.com/ygtdgn/codebase-indexer/main/install.sh)
Or equivalently:
curl -fsSL https://raw.githubusercontent.com/ygtdgn/codebase-indexer/main/install.sh | bash
Manual Setup
Prerequisites
1. Set up infrastructure
npx codebase-indexer init
This starts a Qdrant container via Docker Compose, verifies your Ollama connection, and pulls the embedding model if needed.
2. Index your project
npx codebase-indexer index ./your-project
3. Search
npx codebase-indexer search "authentication middleware"
4. Use with Claude Code or Codex
# Set up Claude Code integration (writes CLAUDE.md + .mcp.json)
npx codebase-indexer index ./your-project --setup-claude
# Set up Codex integration (writes AGENTS.md + .codex/config.toml)
npx codebase-indexer index ./your-project --setup-codex
# Set up both interactively
npx codebase-indexer index ./your-project --setup
# Global MCP setup (user-level config files)
npx codebase-indexer index ./your-project --setup --setup-globally
CLI Reference
Global Options
| Option | Default | Description |
|--------|---------|-------------|
| --ollama-url <url> | http://localhost:11434 | Ollama API URL |
| --qdrant-url <url> | http://localhost:6333 | Qdrant API URL |
| --model <name> | qwen3-embedding:0.6b | Embedding model name |
| --dim <number> | 512 | Embedding vector dimension |
| --dir <path> | . | Directory to index/watch |
| --collection <name> | codebase | Qdrant collection name |
| --no-watch | — | Disable file watching in MCP mode |
Commands
init
Set up Qdrant (Docker) and check Ollama connection.
codebase-indexer init
- Creates ~/.codebase-indexer/docker-compose.yml
- Starts Qdrant container (qdrant/qdrant:latest)
- Waits for health check (30s timeout)
- Verifies Ollama and auto-pulls the embedding model
index [directory]
Index a directory for semantic search.
codebase-indexer index ./my-project [options]
| Option | Description |
|--------|-------------|
| --setup | Interactively choose setup targets (Claude/Codex) |
| --setup-claude | Write CLAUDE.md + .mcp.json for Claude Code |
| --setup-codex | Write AGENTS.md + .codex/config.toml for Codex |
| --setup-globally | Write MCP config to user-level files instead of project-local |
| --force | Re-index all files, ignoring cached hashes |
search <query>
Semantic search across the indexed codebase.
codebase-indexer search "database connection pooling" -k 5
codebase-indexer search "error handling" -l typescript
| Option | Default | Description |
|--------|---------|-------------|
| -k, --top-k <number> | 10 | Number of results |
| -l, --language <lang> | — | Filter by language |
status
Check health of Ollama and Qdrant, show index statistics.
codebase-indexer status
config
Interactively edit persistent settings.
codebase-indexer config
Opens an interactive menu to edit Ollama URL, Qdrant URL, embedding model, dimension, and collection name. Settings are saved to ~/.codebase-indexer/config.json and loaded by all commands automatically.
Default (no subcommand)
Start the MCP server over stdio.
codebase-indexer --dir ./my-project
MCP Server
When run without a subcommand (or via an MCP config), codebase-indexer starts as an MCP server using stdio JSON-RPC transport. It exposes five tools:
| Tool | Description |
|------|-------------|
| search_code | Semantic search with optional language and file_path_prefix filters |
| index_file | Index or re-index a single file |
| index_directory | Incrementally index an entire directory |
| get_index_status | Health check and index statistics |
| delete_file | Remove a file from the index |
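An MCP client invokes any of these tools by sending a JSON-RPC 2.0 `tools/call` request over the transport. The following is a minimal sketch of the message a client might send for `search_code`. The `buildToolCall` helper and the request `id` are illustrative assumptions, not part of codebase-indexer; the argument names (`query`, `top_k`) follow the usage examples in this README.

```typescript
// Illustrative only: builds the JSON-RPC 2.0 message an MCP client sends to call a tool.
// `buildToolCall` is a hypothetical helper, not an export of codebase-indexer.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const request = buildToolCall(1, "search_code", {
  query: "retry logic with exponential backoff",
  top_k: 5,
});

// With the stdio transport, each message is written as a single line of JSON.
const wireMessage = JSON.stringify(request);
console.log(wireMessage);
```

In practice Claude Code or Codex builds this message for you; the sketch is only meant to show what travels over stdio.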
Example .mcp.json
{
  "mcpServers": {
    "codebase-indexer": {
      "command": "npx",
      "args": ["codebase-indexer", "--dir", "/absolute/path/to/project"]
    }
  }
}
Example usage in Claude Code
search_code({ query: "retry logic with exponential backoff", top_k: 5 })
search_code({ query: "error handling", language: "typescript", file_path_prefix: "src/api/" })
index_file({ path: "src/new-module.ts" })
index_directory({})
get_index_status({})
delete_file({ path: "src/old-module.ts" })
Configuration
Settings are resolved in this order (last wins):
Hardcoded defaults → ~/.codebase-indexer/config.json → Environment variables → CLI flags
Environment Variables
| Variable | Maps to |
|----------|---------|
| OLLAMA_URL | --ollama-url |
| QDRANT_URL | --qdrant-url |
| EMBEDDING_MODEL | --model |
| EMBEDDING_DIM | --dim |
| COLLECTION_NAME | --collection |
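The "last wins" resolution order can be sketched as a series of object spreads. This is a rough illustration under stated assumptions, not the project's actual code: `resolveConfig` and `fromEnv` are hypothetical names, the field names are borrowed from the `config.json` example in this README, and each source is assumed to omit keys that are not set.

```typescript
// Sketch of "last wins" config resolution. `resolveConfig` and `fromEnv` are
// hypothetical helpers; the real implementation may be structured differently.
interface Config {
  ollamaUrl: string;
  qdrantUrl: string;
  model: string;
  embeddingDim: number;
  collectionName: string;
}

const DEFAULTS: Config = {
  ollamaUrl: "http://localhost:11434",
  qdrantUrl: "http://localhost:6333",
  model: "qwen3-embedding:0.6b",
  embeddingDim: 512,
  collectionName: "codebase",
};

// Map the documented environment variables onto config fields.
function fromEnv(env: Record<string, string | undefined>): Partial<Config> {
  const out: Partial<Config> = {};
  if (env.OLLAMA_URL) out.ollamaUrl = env.OLLAMA_URL;
  if (env.QDRANT_URL) out.qdrantUrl = env.QDRANT_URL;
  if (env.EMBEDDING_MODEL) out.model = env.EMBEDDING_MODEL;
  if (env.EMBEDDING_DIM) out.embeddingDim = Number(env.EMBEDDING_DIM);
  if (env.COLLECTION_NAME) out.collectionName = env.COLLECTION_NAME;
  return out;
}

// Later sources override earlier ones: defaults < config file < env < CLI flags.
// Assumes each source omits keys that were not set (no explicit undefined values).
function resolveConfig(
  file: Partial<Config>,
  env: Record<string, string | undefined>,
  cli: Partial<Config>,
): Config {
  return { ...DEFAULTS, ...file, ...fromEnv(env), ...cli };
}

const resolved = resolveConfig(
  { model: "from-file", embeddingDim: 256 }, // config.json
  { EMBEDDING_MODEL: "from-env" },            // environment
  { embeddingDim: 1024 },                     // CLI flags
);
console.log(resolved.model, resolved.embeddingDim); // from-env 1024
```

Here the environment variable overrides the model from the config file, and the CLI flag overrides the dimension from the config file, while untouched fields keep their defaults.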
Persistent Config
Run codebase-indexer config to interactively set values, or manually create ~/.codebase-indexer/config.json:
{
  "ollamaUrl": "http://localhost:11434",
  "qdrantUrl": "http://localhost:6333",
  "model": "qwen3-embedding:0.6b",
  "embeddingDim": 512,
  "collectionName": "codebase"
}
Collection Name Auto-Derivation
When using the default collection name (codebase) and indexing a specific directory, the collection name is automatically derived from the directory name:
codebase-indexer index ./my-cool-project
# → collection: "codebase-my-cool-project"
Architecture
index.ts (CLI entry, commander.js)
→ cli/commands.ts (command handlers)
→ core/indexer.ts (orchestrator)
→ core/chunker.ts (symbol-based splitting, sliding window fallback)
→ core/embedder.ts (Ollama /api/embed client, MRL truncation + L2 normalize)
→ core/vectorstore.ts (Qdrant client, cosine similarity search)
→ mcp/server.ts (MCP stdio transport, 5 tools)
→ watcher/watcher.ts (chokidar, 500ms debounce, feeds into indexer)
Pipeline
Discover files → Chunk code → Embed via Ollama → Store in Qdrant
File discovery: uses git ls-files (fast, respects .gitignore) with glob fallback. Filters by 54 code extensions, skips lock files, respects size limits (1 MB default).
Chunking uses a dual strategy per file:
- Symbol-based: regex patterns detect functions, classes, interfaces, etc. for 18 languages
- Sliding window fallback: used when symbols cover <50% of the file. Default 1500 chars with 200-char overlap.
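The sliding window fallback can be sketched as follows. The window size and overlap defaults come from this README; the function name `slidingWindow`, the `Chunk` shape, and the use of character offsets are assumptions made for illustration, and the real chunker may track line numbers or split on different boundaries.

```typescript
// Sliding-window chunking sketch. `slidingWindow` and `Chunk` are hypothetical
// names; only the 1500/200 defaults are taken from the README.
interface Chunk {
  start: number; // inclusive character offset
  end: number;   // exclusive character offset
  text: string;
}

function slidingWindow(source: string, size = 1500, overlap = 200): Chunk[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const step = size - overlap; // each window starts `step` chars after the previous one
  const chunks: Chunk[] = [];
  for (let start = 0; start < source.length; start += step) {
    const end = Math.min(start + size, source.length);
    chunks.push({ start, end, text: source.slice(start, end) });
    if (end === source.length) break; // last window reached the end of the file
  }
  return chunks;
}

const chunks = slidingWindow("x".repeat(4000));
console.log(chunks.map((c) => [c.start, c.end]));
// [[0, 1500], [1300, 2800], [2600, 4000]]
```

Consecutive windows share 200 characters, so a statement that straddles a window boundary still appears whole in at least one chunk as long as it is shorter than the overlap.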
Embedding: batches of 16 chunks sent to Ollama's /api/embed endpoint. Vectors are MRL-truncated to the target dimension and L2-normalized. Retry with exponential backoff (3 attempts).
Storage: chunks upserted to Qdrant with deterministic IDs (MD5(path:startLine:endLine)). Payload indices on file_path, language, and chunk_type for filtered search. Cosine similarity.
Incremental indexing: each file's content hash is stored in the Qdrant payload. On re-index, unchanged files are skipped entirely.
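The deterministic IDs and the incremental skip both reduce to MD5 hashing. A minimal sketch, assuming Node's built-in `crypto` module: the helper names (`md5`, `chunkId`, `needsReindex`) are hypothetical, and how the stored hash is read back from Qdrant is left out.

```typescript
import { createHash } from "node:crypto";

// MD5 hex digest of a string.
function md5(input: string): string {
  return createHash("md5").update(input).digest("hex");
}

// Deterministic chunk ID following the documented MD5(path:startLine:endLine) scheme.
// The same path and line range always produce the same ID, so re-indexing a chunk
// overwrites the existing point instead of creating a duplicate.
function chunkId(path: string, startLine: number, endLine: number): string {
  return md5(`${path}:${startLine}:${endLine}`);
}

// Incremental check: a file needs re-indexing only when its content hash differs
// from the hash stored at the previous run (undefined means "never indexed").
function needsReindex(content: string, storedHash: string | undefined): boolean {
  return md5(content) !== storedHash;
}

const source = "export const answer = 42;\n";
const stored = md5(source);
console.log(chunkId("src/answer.ts", 1, 1));
console.log(needsReindex(source, stored));        // false: unchanged, skip
console.log(needsReindex(source + "// edit", stored)); // true: content changed
```

MD5 is used here as a fast change detector and ID generator, not for security.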
Supported Languages (Symbol Detection)
| Language | Detected Symbols |
|----------|-----------------|
| TypeScript | functions, classes, interfaces, types, enums, arrow functions |
| JavaScript | functions, classes, arrow functions, module.exports |
| Python | functions, classes |
| Go | functions, types (struct, interface) |
| Rust | functions, structs, enums, traits, impl blocks |
| Java | classes, interfaces, methods |
| Kotlin | functions, classes, interfaces, objects |
| Ruby | functions, classes, modules |
| PHP | functions, classes, interfaces, traits |
| Swift | functions, classes, structs, protocols, enums |
| C# | methods, classes, interfaces |
| Scala | functions, classes, traits, objects |
| C | functions, structs, typedefs |
| C++ | functions, classes, structs, namespaces, templates |
| Elixir | functions, private functions, modules |
| Haskell | type signatures, data types, classes, instances |
| Dart | functions, classes, mixins |
| Zig | functions, const structs |
All other recognized file types fall back to sliding window chunking.
File Watcher
In MCP mode, the file watcher is enabled by default:
- Watches for file add, change, and unlink events
- 500ms debounce before processing
- 300ms write-finish stability detection
- Batch processing with retry (max 3 attempts, exponential backoff)
- Automatically re-indexes modified files and removes deleted ones
Disable with --no-watch.
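The debounce behaviour can be modelled with explicit timestamps, which keeps the example runnable without real timers. This is only a sketch: `PathDebouncer` is a hypothetical name, per-path debouncing is an assumption, and the actual watcher is built on chokidar and may batch events differently.

```typescript
// Per-path debounce modelled with explicit timestamps (milliseconds).
// `PathDebouncer` is hypothetical; only the 500 ms delay comes from the README.
type WatchEvent = "add" | "change" | "unlink";

class PathDebouncer {
  private pending = new Map<string, { event: WatchEvent; at: number }>();
  private delayMs: number;

  constructor(delayMs = 500) {
    this.delayMs = delayMs;
  }

  // Record an event; a newer event for the same path restarts its quiet period.
  record(path: string, event: WatchEvent, now: number): void {
    this.pending.set(path, { event, at: now });
  }

  // Return and clear every path that has been quiet for at least delayMs.
  flush(now: number): Array<{ path: string; event: WatchEvent }> {
    const ready: Array<{ path: string; event: WatchEvent }> = [];
    for (const [path, { event, at }] of this.pending) {
      if (now - at >= this.delayMs) {
        ready.push({ path, event });
        this.pending.delete(path);
      }
    }
    return ready;
  }
}

const debouncer = new PathDebouncer();
debouncer.record("src/a.ts", "change", 0);
debouncer.record("src/a.ts", "change", 300); // rapid second save restarts the wait
debouncer.record("src/b.ts", "unlink", 100);

const early = debouncer.flush(650); // only src/b.ts has been quiet for 500 ms
const later = debouncer.flush(800); // src/a.ts is ready now
console.log(early, later);
```

Two quick saves of `src/a.ts` collapse into a single re-index, which is the point of debouncing editor writes.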
Development
git clone https://github.com/ygtdgn/codebase-indexer.git
cd codebase-indexer
npm install
npm run dev # tsx watch mode (auto-rebuild on save)
Build
npm run build # tsc → dist/
Test
npm test # run once (vitest)
npm run test:watch # watch mode
Test coverage includes chunker (symbol detection, sliding window), embedder (truncation, normalization), file utilities (extension mapping, discovery), and hash functions (MD5, chunk IDs).
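As an illustration of the embedder behaviour these tests cover, truncation and normalization can be sketched as below. The function name `truncateAndNormalize` is an assumption and the project's own code may differ. MRL (Matryoshka) truncation keeps the leading components of the vector, and the result is rescaled to unit length because truncation changes the vector's norm.

```typescript
// MRL-style truncation followed by L2 normalization.
// `truncateAndNormalize` is a hypothetical name, not the project's actual API.
function truncateAndNormalize(vector: number[], dim: number): number[] {
  if (dim <= 0 || dim > vector.length) {
    throw new Error(`dim must be between 1 and ${vector.length}`);
  }
  const truncated = vector.slice(0, dim); // keep the leading `dim` components
  const norm = Math.sqrt(truncated.reduce((sum, x) => sum + x * x, 0));
  if (norm === 0) return truncated; // avoid dividing by zero for an all-zero vector
  return truncated.map((x) => x / norm);
}

const unit = truncateAndNormalize([3, 4, 12, 84], 2);
console.log(unit); // [ 0.6, 0.8 ]
```

With unit-length vectors, cosine similarity reduces to a dot product, which suits the cosine distance used in Qdrant.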
Project Structure
src/
├── index.ts # CLI entry point (commander.js)
├── config/
│ └── config.ts # Config interface, defaults, env vars, persistent config
├── cli/
│ ├── commands.ts # Command handlers (init, index, search, status, config)
│ ├── mascot.ts # Squirrel ASCII art with gradient colors
│ └── ui.ts # Spinners, progress formatting, result display
├── core/
│ ├── indexer.ts # Central orchestrator
│ ├── chunker.ts # Symbol-based + sliding window chunking
│ ├── embedder.ts # Ollama embedding client
│ └── vectorstore.ts # Qdrant CRUD operations
├── mcp/
│ └── server.ts # MCP stdio server (5 tools)
├── watcher/
│ └── watcher.ts # File change detection + auto-reindex
├── utils/
│ ├── files.ts # File discovery, language detection
│ ├── hash.ts # MD5, chunk IDs, index hashing
│ └── logger.ts # stderr logging (warn, error)
└── __tests__/
├── chunker.test.ts
├── embedder.test.ts
├── files.test.ts
└── hash.test.ts
License
MIT
