# @comfanion/usethis_search

v4.5.1. OpenCode plugin: semantic search with query decomposition, RRF merge, and context-efficient workspace.
Semantic code search with graph-based context for OpenCode. Search code by meaning, not by text, and get related context automatically via the code graph.

## What is this?
An OpenCode plugin that adds smart search to your project:
- Semantic search — finds code by meaning, even when words don't match
- Hybrid search — combines vector similarity + BM25 keyword matching
- Graph-based context — automatically attaches related code (imports, calls, type references) to search results
- Two-phase indexing — BM25 + graph search available immediately (Phase 1), vector search after embedding (Phase 2)
- Simplified API — 5 parameters, smart filter parsing, config-driven defaults
- Automatic indexing — files are indexed on change, zero effort
- Local vectorization — works offline, no API keys needed
- Three indexes — separate for code, docs, and configs
## Quick Start

### Installation

```sh
npm install @comfanion/usethis_search
```

### Configuration

Add to `opencode.json`:

```json
{
  "plugin": ["@comfanion/usethis_search"]
}
```

### First Run
On OpenCode startup, the plugin automatically:
- Creates indexes for code and documentation
- Phase 1: chunks files, builds code graph (fast, parallel) — BM25 search available immediately
- Phase 2: embeds chunks into vectors — hybrid search available after completion
Indexing time estimates:
- < 100 files — ~1 min
- < 500 files — ~3 min
- 500+ files — ~10 min
## Search API
The search tool has 5 parameters:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| query | string | required | What you're looking for (semantic) |
| index | string | "code" | Which index: code, docs, config |
| limit | number | 10 | Number of results |
| searchAll | boolean | false | Search across all indexes |
| filter | string | — | Filter by path or language |
### Search examples

```ts
// Basic semantic search
search({ query: "authentication logic" })

// Search documentation
search({ query: "how to deploy", index: "docs" })

// Search all indexes
search({ query: "database connection", searchAll: true })

// Filter by directory
search({ query: "tenant management", filter: "internal/domain/" })

// Filter by language
search({ query: "event handling", filter: "*.go" })
search({ query: "middleware", filter: "go" })

// Combined: directory + language
search({ query: "API routes", filter: "internal/**/*.go" })

// Substring match on file path
search({ query: "metrics", filter: "service" })

// More results
search({ query: "error handling", limit: 20 })
```

### Filter syntax
The filter parameter is smart — it auto-detects what you mean:
| Input | Parsed as |
|-------|-----------|
| "internal/domain/" | Path prefix |
| "*.go" or ".go" | Language filter (go) |
| "go" or "python" | Language filter |
| "internal/**/*.go" | Path prefix + language |
| "service" | Substring match on file path |
### Search output

Each result includes:
- Score breakdown: `Score: 0.619 (vec: 0.47, bm25: +0.04, kw: +0.11 | matched: "event", "correlation")`
- Rich metadata: language, function name, class name, heading context
- File grouping: best chunk per file + "N matching sections" count
- Related context: graph-expanded neighbors (imports, calls, type references)
- Confidence signal: warning when top score < 0.45
When vectors are not yet available (Phase 2 in progress), search automatically falls back to BM25-only mode with a banner notification.
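The breakdown above suggests the final score is the vector score plus BM25 and keyword boosts, with a warning below a 0.45 threshold. A minimal sketch of that combination, assuming the parts simply add up (the plugin's exact formula is not documented here):

```typescript
// Hypothetical sketch of the score breakdown shown above.
// The additive combination is an assumption, not the plugin's exact formula.
interface ScoreParts {
  vec: number;  // cosine similarity from vector search
  bm25: number; // BM25 keyword boost
  kw: number;   // exact keyword-match boost
}

function combineScore(parts: ScoreParts): number {
  const total = parts.vec + parts.bm25 + parts.kw;
  return Math.round(total * 1000) / 1000; // round to 3 decimals for display
}

const LOW_CONFIDENCE_THRESHOLD = 0.45; // README: warn when top score < 0.45

function confidenceWarning(topScore: number): string | null {
  return topScore < LOW_CONFIDENCE_THRESHOLD
    ? "Low-confidence results: top score below 0.45"
    : null;
}
```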
## Index Management

### CLI

```sh
# Reindex everything
bunx usethis_search reindex

# Check status
bunx usethis_search status

# List indexes
bunx usethis_search list

# Clear index
bunx usethis_search clear
```

### Tool API

```ts
// List all indexes with stats
codeindex({ action: "list" })

// Check specific index status
codeindex({ action: "status", index: "code" })

// Reindex
codeindex({ action: "reindex", index: "code" })
```

## Architecture
### Two-Phase Indexing Pipeline

Phase 1 (fast, parallel, 5 workers):

```
file -> read -> chunk -> regex analyze -> graph edges -> ChunkStore (SQLite)
```

Result: BM25 + graph search available immediately.

Phase 2 (batch, sequential):

```
ChunkStore chunks -> batch embed (32/batch) -> LanceDB
```

Result: vector/hybrid search becomes available.

### Search Strategy (auto-detect)

```
Has vectors? -> hybrid search (vector + BM25 + graph + keyword rerank)
No vectors?  -> BM25-only search (from ChunkStore + graph + keyword rerank)
```

### Storage Layout
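The Phase 2 step above batches chunks 32 at a time before embedding. A minimal sketch of that loop, with a stand-in `embed()` function (the real pipeline calls the MiniLM model and writes each batch to LanceDB; names here are illustrative):

```typescript
// Hypothetical sketch of Phase 2 batch embedding (32 chunks per batch).
// embed() is a stub standing in for the @xenova/transformers model call.
const BATCH_SIZE = 32;

type Chunk = { id: string; text: string };
type Embedded = { id: string; vector: number[] };

// Stub embedder: returns zero vectors of the model's 384 dimensions.
async function embed(texts: string[]): Promise<number[][]> {
  return texts.map(() => new Array(384).fill(0));
}

async function embedChunks(chunks: Chunk[]): Promise<Embedded[]> {
  const out: Embedded[] = [];
  for (let i = 0; i < chunks.length; i += BATCH_SIZE) {
    const batch = chunks.slice(i, i + BATCH_SIZE);
    const vectors = await embed(batch.map((c) => c.text));
    batch.forEach((c, j) => out.push({ id: c.id, vector: vectors[j] }));
    // In the real pipeline each batch would be written to LanceDB here.
  }
  return out;
}
```

Sequential batching keeps memory bounded: only one batch of texts and vectors is in flight at a time, which matches the "batch, sequential" description of Phase 2.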
```
.opencode/
  vectors/
    code/
      lancedb/        # Vector embeddings (LanceDB)
      chunks.db       # Chunk content + metadata (SQLite, ChunkStore)
      hashes.json     # File hashes for change detection
    docs/
      lancedb/
      chunks.db
      hashes.json
  graph/
    code_graph.db     # Code relationships (SQLite, GraphDB)
    doc_graph.db      # Doc relationships (SQLite, GraphDB)
  vectorizer.yaml     # Configuration
  indexer.log         # Indexing log
```

### Module Overview
| Module | Purpose |
|--------|---------|
| Core | |
| vectorizer/index.ts | CodebaseIndexer, two-phase pipeline, search, singleton pool |
| vectorizer/chunk-store.ts | SQLite chunk storage (BM25 without vectors) |
| vectorizer/graph-db.ts | SQLite triple store for code relationships |
| vectorizer/graph-builder.ts | Builds graph edges from code analysis |
| vectorizer/bm25-index.ts | Inverted index for keyword search |
| Chunking | |
| vectorizer/chunkers/code-chunker.ts | Function/class-aware splitting |
| vectorizer/chunkers/markdown-chunker.ts | Heading-aware splitting with hierarchy |
| vectorizer/chunkers/chunker-factory.ts | Routes to correct chunker by file type |
| Analysis | |
| vectorizer/analyzers/regex-analyzer.ts | Regex-based code analysis (imports, calls, types) |
| vectorizer/analyzers/lsp-analyzer.ts | LSP-based code analysis (definitions, references) |
| vectorizer/analyzers/lsp-client.ts | Language Server Protocol client |
| Search | |
| vectorizer/hybrid-search.ts | Merge vector + BM25 scores |
| vectorizer/query-cache.ts | LRU cache for query embeddings |
| vectorizer/content-cleaner.ts | Remove noise (TOC, breadcrumbs, markers) |
| vectorizer/metadata-extractor.ts | Extract file_type, language, tags, dates |
| Tracking | |
| vectorizer/search-metrics.ts | Search quality metrics |
| vectorizer/usage-tracker.ts | Usage provenance tracking |
| Tools | |
| tools/search.ts | Search tool (5 params, smart filter, score breakdown) |
| tools/codeindex.ts | Index management tool |
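The package description mentions an RRF merge, and `vectorizer/hybrid-search.ts` merges vector and BM25 results. A generic Reciprocal Rank Fusion sketch for combining two ranked lists looks like this; the constant `k = 60` is the common default from the RRF literature, not a value confirmed by this README:

```typescript
// Generic Reciprocal Rank Fusion (RRF) sketch for merging ranked lists.
// score(doc) = sum over lists of 1 / (k + rank); k = 60 is a common default.
function rrfMerge(rankedLists: string[][], k = 60): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((docId, idx) => {
      const rank = idx + 1; // ranks are 1-based
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}
```

A document ranked well by both the vector list and the BM25 list rises to the top even if neither list puts it first, which is why RRF is a robust way to fuse heterogeneous rankings without normalizing raw scores.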
### Graph-Based Context
The code graph tracks relationships between chunks:
- imports — file A imports module B
- calls — function A calls function B
- references — code references a type/interface
- implements — class implements an interface
- extends — class extends another class
- belongs_to — chunk belongs to file (structural)
When you search, results are automatically expanded with 1-hop graph neighbors. Related context is scored by `edge_weight * cosine_similarity` (or `edge_weight * 0.7` in BM25-only mode) and filtered by `min_relevance`.
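The scoring rule above can be sketched directly. This is a hypothetical `scoreNeighbors` helper; the type and function names are illustrative, but the formula and the 0.7 BM25-only factor come from the text:

```typescript
// Sketch of related-context scoring for 1-hop graph neighbors.
// Formula from the README: edge_weight * cosine_similarity,
// or edge_weight * 0.7 when no vectors are available yet.
interface Neighbor {
  chunkId: string;
  edgeWeight: number;        // weight of the graph edge (import, call, ...)
  cosineSimilarity?: number; // undefined in BM25-only mode
}

const BM25_ONLY_FACTOR = 0.7;

function scoreNeighbors(neighbors: Neighbor[], minRelevance = 0.5) {
  return neighbors
    .map((n) => ({
      ...n,
      score: n.edgeWeight * (n.cosineSimilarity ?? BM25_ONLY_FACTOR),
    }))
    .filter((n) => n.score >= minRelevance) // drop weakly related context
    .sort((a, b) => b.score - a.score);
}
```

The `minRelevance` default of 0.5 mirrors the `graph.min_relevance` setting in the configuration section below.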
### Singleton Indexer Pool
Multiple parallel searches share one CodebaseIndexer instance per (project, index) pair, so there are no SQLite lock conflicts. The pool is managed via `getIndexer()` / `releaseIndexer()` / `destroyIndexer()`.
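A minimal sketch of such a pool, assuming a reference-counted map keyed by `project:index` (the real CodebaseIndexer holds SQLite/LanceDB handles; this stub does not, and the signatures are illustrative):

```typescript
// Hypothetical sketch of the singleton indexer pool keyed by (project, index).
class CodebaseIndexer {
  constructor(readonly project: string, readonly index: string) {}
  destroy(): void { /* the real implementation would close DB handles */ }
}

const pool = new Map<string, { indexer: CodebaseIndexer; refs: number }>();

function getIndexer(project: string, index: string): CodebaseIndexer {
  const key = `${project}:${index}`;
  let entry = pool.get(key);
  if (!entry) {
    entry = { indexer: new CodebaseIndexer(project, index), refs: 0 };
    pool.set(key, entry);
  }
  entry.refs++;
  return entry.indexer; // same instance for every caller with this key
}

function releaseIndexer(project: string, index: string): void {
  const entry = pool.get(`${project}:${index}`);
  if (entry) entry.refs = Math.max(0, entry.refs - 1);
}

function destroyIndexer(project: string, index: string): void {
  const key = `${project}:${index}`;
  pool.get(key)?.indexer.destroy();
  pool.delete(key);
}
```

Because every caller for the same (project, index) pair gets the same instance, only one process-level connection touches each SQLite file, which is what avoids the lock conflicts.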
## Configuration

### Full config example

Note: the nesting below is reconstructed from the flattened original; `indexes` and `exclude` are shown as top-level keys.

```yaml
# .opencode/vectorizer.yaml
vectorizer:
  enabled: true
  auto_index: true
  model: "Xenova/all-MiniLM-L6-v2"
  debounce_ms: 1000

  cleaning:
    remove_toc: true
    remove_frontmatter_metadata: false
    remove_imports: false
    remove_comments: false

  chunking:
    strategy: "semantic" # fixed | semantic
    markdown:
      split_by_headings: true
      min_chunk_size: 200
      max_chunk_size: 2000
      preserve_heading_hierarchy: true
    code:
      split_by_functions: true
      include_function_signature: true
      min_chunk_size: 300
      max_chunk_size: 1500
    fixed:
      max_chars: 1500

  search:
    hybrid: true
    bm25_weight: 0.3
    freshen: false # Don't re-index on every search
    min_score: 0.35 # Minimum relevance cutoff
    include_archived: false
    default_limit: 10

  graph:
    enabled: true
    max_related: 4 # Max related chunks per result
    min_relevance: 0.5 # Min score for related context
    semantic_edges: false # O(n^2) — enable only for small repos
    semantic_edges_max_chunks: 500

  lsp:
    enabled: true
    timeout_ms: 5000
    read_intercept: true

  quality:
    enable_metrics: false
    enable_cache: true

indexes:
  code:
    enabled: true
    pattern: "**/*.{js,ts,jsx,tsx,mjs,cjs,py,go,rs,java,kt,swift,c,cpp,h,hpp,cs,rb,php,scala,clj}"
    ignore:
      - "**/node_modules/**"
      - "**/.git/**"
      - "**/dist/**"
      - "**/build/**"
      - "**/.opencode/**"
      - "**/vendor/**"
    hybrid: true
    bm25_weight: 0.3
  docs:
    enabled: true
    pattern: "docs/**/*.{md,mdx,txt,rst,adoc}"
    hybrid: false
    bm25_weight: 0.2
  config:
    enabled: false
    pattern: "**/*.{yaml,yml,json,toml,ini,env,xml}"
    hybrid: false
    bm25_weight: 0.3

exclude:
  - node_modules
  - vendor
  - dist
  - build
  - out
  - __pycache__
```

### Disable automatic indexing
```yaml
vectorizer:
  auto_index: false
```

### Skip auto-index via env

```sh
export OPENCODE_SKIP_AUTO_INDEX=1
```

## Debugging
### Enable logs

```sh
export DEBUG=vectorizer
# or all logs
export DEBUG=*
```

Indexing activity is logged to `.opencode/indexer.log`.
## Technical Details

- Vectorization: @xenova/transformers (ONNX Runtime)
- Vector DB: LanceDB (local, serverless)
- Chunk Store: bun:sqlite (WAL mode, concurrent reads)
- Graph DB: bun:sqlite (WAL mode, triple store)
- Model: Xenova/all-MiniLM-L6-v2 (multilingual, 384 dimensions, ~23 MB)
- Embedding speed: ~0.5 sec/file
- Phase 1 speed: ~0.05 sec/file (no embedding)
- Supported languages: JavaScript, TypeScript, Python, Go, Rust, Java, Kotlin, Swift, C/C++, C#, Ruby, PHP, Scala, Clojure
## License
MIT
Made by the Comfanion team
