@dreb/semantic-search
v2.6.1
Semantic codebase search engine with embedding-based ranking and an MCP server. Extracts and indexes code using tree-sitter for AST-aware chunking and a transformer embedding model (all-MiniLM-L6-v2), then ranks results using 6-signal fusion via POEM.
Requirements
- Node.js 22+ (uses the built-in node:sqlite module)
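As a rough illustration of the version requirement, the check below parses a Node.js version string and compares its major version against 22. The helper name `meetsNodeRequirement` is hypothetical and not part of the package; the package's own `SearchEngine.isAvailable()` is the documented way to test for node:sqlite at runtime.

```typescript
// Hypothetical helper (not part of @dreb/semantic-search): returns true when a
// Node.js version string such as "22.3.0" or "v22.3.0" has a major version >= minMajor.
function meetsNodeRequirement(version: string, minMajor = 22): boolean {
  const [majorPart = ""] = version.replace(/^v/, "").split(".");
  const major = Number.parseInt(majorPart, 10);
  // parseInt yields NaN for non-numeric input, which fails the integer check.
  return Number.isInteger(major) && major >= minMajor;
}
```

In a script, this could be called with `process.versions.node` before importing the package, so that an unsupported runtime fails with a clear message.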
Installation
npm install @dreb/semantic-search
Claude Code Plugin
The package ships as a Claude Code plugin. Add the dreb marketplace and install:
/plugin marketplace add aebrer/dreb
/plugin install semantic-search@dreb
Or from the CLI outside a session:
claude plugin marketplace add aebrer/dreb
claude plugin install semantic-search@dreb
Alternatively, register the MCP server directly without the plugin system:
claude mcp add --transport stdio semantic-search -- npx @dreb/semantic-search semantic-search-mcp
For local development from a cloned repo:
claude --plugin-dir ./packages/semantic-search
MCP Server
The package exposes a search tool over the Model Context Protocol (stdio transport). The tool accepts:
| Parameter | Required | Description |
| ------------ | -------- | ------------------------------------------------ |
| query | yes | Natural language, identifier, or path query |
| projectDir | yes | Absolute path to the project directory to search |
| path | no | Restrict search to files under this path |
| limit | no | Maximum results to return (default: 20) |
| rebuild | no | Force a clean index rebuild (default: false) |
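For readers wiring up their own client, the parameters in the table map onto a standard MCP `tools/call` request. The sketch below builds such a message in TypeScript. It assumes the tool is registered under the name "search", which the README implies but does not state outright; `SearchToolArgs` and `buildSearchRequest` are illustrative names, not exports of the package.

```typescript
// Arguments accepted by the search tool, mirroring the parameter table above.
interface SearchToolArgs {
  query: string;       // natural language, identifier, or path query
  projectDir: string;  // absolute path to the project directory
  path?: string;       // restrict search to files under this path
  limit?: number;      // maximum results (server default: 20)
  rebuild?: boolean;   // force a clean index rebuild (server default: false)
}

// Build a JSON-RPC 2.0 `tools/call` request as defined by the Model Context Protocol.
// Assumption: the tool name is "search"; pass a different name if the server reports one.
function buildSearchRequest(id: number, args: SearchToolArgs, toolName = "search") {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: toolName, arguments: args },
  };
}

const request = buildSearchRequest(1, {
  query: "where is auth handled",
  projectDir: "/path/to/project",
  path: "src/",
  limit: 10,
});

// The stdio transport carries one JSON message per line.
console.log(JSON.stringify(request));
```

A real client must complete the MCP `initialize` handshake before sending this request, and can call `tools/list` to confirm the tool name the server actually exposes.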
Start the server standalone:
npx @dreb/semantic-search semantic-search-mcp
How Ranking Works
Results are ranked by fusing 6 independent signals using POEM (Pareto-Optimal Embedded Modeling) weights that vary per query type:
| Signal            | Description                                                    |
| ----------------- | -------------------------------------------------------------- |
| BM25              | Keyword matching via FTS5 full-text search                     |
| Cosine similarity | Embedding-based semantic similarity using all-MiniLM-L6-v2     |
| Path match        | Query terms appearing in the file path                         |
| Symbol match      | Query terms matching function, class, or type names            |
| Import graph      | Proximity to high-scoring files in the import/dependency graph |
| Git recency       | Recently modified files ranked higher                          |
Queries are automatically classified as identifier, natural language, or path queries, and each type applies different POEM column weights. POEM constructs a Pareto front over all signal dimensions and assigns ranks based on dominance depth — no manual weight tuning required. See Pareto-Optimal Embedded Modeling for the theoretical foundation.
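To make "dominance depth" concrete, here is a minimal sketch of ranking by successive Pareto fronts. It is a generic non-dominated sorting routine, not the package's implementation: it assumes every signal is normalized so that higher is better, and it leaves out the per-query-type column weights described above. The names `Candidate`, `dominates`, and `dominanceDepth` are illustrative.

```typescript
// One candidate result with its signal scores (assumed: higher is better for each).
interface Candidate {
  id: string;
  signals: number[];
}

// a dominates b when a is at least as good on every signal and strictly better on one.
// Missing entries are treated as 0.
function dominates(a: number[], b: number[]): boolean {
  let strictlyBetter = false;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x < y) return false;
    if (x > y) strictlyBetter = true;
  }
  return strictlyBetter;
}

// Peel off successive Pareto fronts; depth 0 is the set nothing else dominates.
function dominanceDepth(candidates: Candidate[]): Map<string, number> {
  const depth = new Map<string, number>();
  let remaining = [...candidates];
  for (let level = 0; remaining.length > 0; level++) {
    const front = remaining.filter(
      (c) => !remaining.some((other) => other !== c && dominates(other.signals, c.signals)),
    );
    for (const c of front) depth.set(c.id, level);
    remaining = remaining.filter((c) => !front.includes(c));
  }
  return depth;
}

// Signal order: BM25, cosine, path, symbol, import graph, git recency.
const depths = dominanceDepth([
  { id: "a", signals: [0.9, 0.8, 1, 1, 0.5, 0.7] },
  { id: "b", signals: [0.5, 0.9, 0, 0, 0.2, 0.1] },
  { id: "c", signals: [0.4, 0.7, 0, 0, 0.1, 0.1] },
  { id: "d", signals: [0.1, 0.1, 0, 0, 0.0, 0.0] },
]);
console.log(Object.fromEntries(depths)); // { a: 0, b: 0, c: 1, d: 2 }
```

Here `a` and `b` share the first front because `b` beats `a` on cosine similarity, so neither dominates the other; `c` loses to both, and `d` loses to `c`. Results with a lower depth would be ranked first.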
Library API
import { SearchEngine } from "@dreb/semantic-search";
const engine = new SearchEngine("/path/to/project", {
  indexDir: "/custom/index/path", // default: <projectRoot>/.search-index
  globalMemoryDir: "~/.dreb/memory", // additional directory to index
  modelCacheDir: "~/.cache/models", // default: ~/.cache/semantic-search/models
  visibleDirs: (root) => [`${root}/.special`], // extra dirs (bypasses .gitignore)
});
// First call builds the index (10-60s); subsequent calls are fast
const results = await engine.search("where is auth handled", {
  limit: 20,
  pathFilter: "src/",
  onProgress: (phase, current, total) => console.log(`${phase}: ${current}/${total}`),
});
const stats = engine.getStats(); // { files, chunks } | null
await engine.resetIndex(); // delete index, next search rebuilds
await engine.close(); // dispose resources
SearchEngine.isAvailable(); // check for node:sqlite
What Gets Indexed
- Code: tree-sitter AST chunks (functions, classes, methods, interfaces, etc.). Supported languages: TypeScript, JavaScript, Python, Go, Rust, Java, C, C++, GDScript.
- Text: Markdown (by heading), YAML/TOML (by key), JSON, plaintext (by paragraph). Godot scene (.tscn), resource (.tres), and project (.godot) files are also indexed as plaintext.
- Extra directories: via globalMemoryDir or visibleDirs, scanned even if gitignored.
The index is stored in .search-index/search.db at the project root (add .search-index/ to .gitignore).
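To illustrate what chunking Markdown "by heading" can look like, the sketch below splits a document into one chunk per ATX heading. This is a simplified stand-in rather than the package's chunker: the `TextChunk` shape and `chunkMarkdownByHeading` name are assumptions, and `#` lines inside fenced code blocks are not special-cased.

```typescript
interface TextChunk {
  heading: string; // heading text, or "" for content before the first heading
  body: string;    // text under that heading, trimmed
}

// Split Markdown into one chunk per ATX heading ("#" through "######").
function chunkMarkdownByHeading(markdown: string): TextChunk[] {
  const chunks: TextChunk[] = [];
  let current: TextChunk = { heading: "", body: "" };
  for (const line of markdown.split(/\r?\n/)) {
    const match = /^#{1,6}\s+(.*)$/.exec(line);
    if (match) {
      // A new heading closes the current chunk (skipped if it is entirely empty).
      if (current.heading || current.body.trim()) chunks.push(current);
      current = { heading: (match[1] ?? "").trim(), body: "" };
    } else {
      current.body += line + "\n";
    }
  }
  if (current.heading || current.body.trim()) chunks.push(current);
  return chunks.map((c) => ({ heading: c.heading, body: c.body.trim() }));
}

const sample = "intro text\n# Setup\nInstall it.\n## Usage\nRun it.";
console.log(chunkMarkdownByHeading(sample).map((c) => c.heading)); // [ '', 'Setup', 'Usage' ]
```

Each chunk keeps its heading alongside the body, which gives keyword and embedding signals a self-contained unit of text to score.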
License
MIT
