# @comfanion/usethis_search

v4.5.1. OpenCode plugin: semantic search with query decomposition, RRF merge, and context-efficient workspace.
Semantic code search with graph-based context for OpenCode. Search code by meaning, not by text, and get related context automatically via the code graph.

## What is this?
An OpenCode plugin that adds smart search to your project:
- Semantic search — finds code by meaning, even when words don't match
- Hybrid search — combines vector similarity + BM25 keyword matching
- Graph-based context — automatically attaches related code (imports, calls, type references) to search results
- Two-phase indexing — BM25 + graph search available immediately (Phase 1), vector search after embedding (Phase 2)
- Simplified API — 5 parameters, smart filter parsing, config-driven defaults
- Automatic indexing — files are indexed on change, zero effort
- Local vectorization — works offline, no API keys needed
- Three indexes — separate for code, docs, and configs
## Quick Start

### Installation

```sh
npm install @comfanion/usethis_search
```

### Configuration

Add to `opencode.json`:

```json
{
  "plugin": ["@comfanion/usethis_search"]
}
```

### First Run
On OpenCode startup, the plugin automatically:
- Creates indexes for code and documentation
- Phase 1: chunks files, builds code graph (fast, parallel) — BM25 search available immediately
- Phase 2: embeds chunks into vectors — hybrid search available after completion
Indexing time estimates:
- < 100 files — ~1 min
- < 500 files — ~3 min
- 500+ files — ~10 min
## Search API
The search tool has 5 parameters:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| query | string | required | What you're looking for (semantic) |
| index | string | "code" | Which index: code, docs, config |
| limit | number | 10 | Number of results |
| searchAll | boolean | false | Search across all indexes |
| filter | string | — | Filter by path or language |
### Search examples

```ts
// Basic semantic search
search({ query: "authentication logic" })

// Search documentation
search({ query: "how to deploy", index: "docs" })

// Search all indexes
search({ query: "database connection", searchAll: true })

// Filter by directory
search({ query: "tenant management", filter: "internal/domain/" })

// Filter by language
search({ query: "event handling", filter: "*.go" })
search({ query: "middleware", filter: "go" })

// Combined: directory + language
search({ query: "API routes", filter: "internal/**/*.go" })

// Substring match on file path
search({ query: "metrics", filter: "service" })

// More results
search({ query: "error handling", limit: 20 })
```

### Filter syntax
The filter parameter is smart — it auto-detects what you mean:
| Input | Parsed as |
|-------|-----------|
| "internal/domain/" | Path prefix |
| "*.go" or ".go" | Language filter (go) |
| "go" or "python" | Language filter |
| "internal/**/*.go" | Path prefix + language |
| "service" | Substring match on file path |
### Search output

Each result includes:
- Score breakdown: `Score: 0.619 (vec: 0.47, bm25: +0.04, kw: +0.11 | matched: "event", "correlation")`
- Rich metadata: language, function name, class name, heading context
- File grouping: best chunk per file + "N matching sections" count
- Related context: graph-expanded neighbors (imports, calls, type references)
- Confidence signal: warning when top score < 0.45
When vectors are not yet available (Phase 2 in progress), search automatically falls back to BM25-only mode with a banner notification.
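The breakdown above suggests the final score is the vector score plus BM25 and keyword boosts, with a warning below a 0.45 threshold. A minimal sketch of that combination, assuming the parts simply add up (the plugin's exact formula is not documented here):

```typescript
// Hypothetical sketch of the score breakdown shown above.
// The additive combination is an assumption, not the plugin's exact formula.
interface ScoreParts {
  vec: number;  // cosine similarity from vector search
  bm25: number; // BM25 keyword boost
  kw: number;   // exact keyword-match boost
}

function combineScore(parts: ScoreParts): number {
  const total = parts.vec + parts.bm25 + parts.kw;
  return Math.round(total * 1000) / 1000; // round to 3 decimals for display
}

const LOW_CONFIDENCE_THRESHOLD = 0.45; // README: warn when top score < 0.45

function confidenceWarning(topScore: number): string | null {
  return topScore < LOW_CONFIDENCE_THRESHOLD
    ? "Low-confidence results: top score below 0.45"
    : null;
}
```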
## Index Management

### CLI

```sh
# Reindex everything
bunx usethis_search reindex

# Check status
bunx usethis_search status

# List indexes
bunx usethis_search list

# Clear index
bunx usethis_search clear
```

### Tool API

```ts
// List all indexes with stats
codeindex({ action: "list" })

// Check specific index status
codeindex({ action: "status", index: "code" })

// Reindex
codeindex({ action: "reindex", index: "code" })
```

## Architecture
### Two-Phase Indexing Pipeline

Phase 1 (fast, parallel, 5 workers):

```
file -> read -> chunk -> regex analyze -> graph edges -> ChunkStore (SQLite)
```

Result: BM25 + graph search available immediately.

Phase 2 (batch, sequential):

```
ChunkStore chunks -> batch embed (32/batch) -> LanceDB
```

Result: vector/hybrid search becomes available.

### Search Strategy (auto-detect)

```
Has vectors? -> hybrid search (vector + BM25 + graph + keyword rerank)
No vectors?  -> BM25-only search (from ChunkStore + graph + keyword rerank)
```

### Storage Layout
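The Phase 2 step above batches chunks 32 at a time before embedding. A minimal sketch of that loop, with a stand-in `embed()` function (the real pipeline calls the MiniLM model and writes each batch to LanceDB; names here are illustrative):

```typescript
// Hypothetical sketch of Phase 2 batch embedding (32 chunks per batch).
// embed() is a stub standing in for the @xenova/transformers model call.
const BATCH_SIZE = 32;

type Chunk = { id: string; text: string };
type Embedded = { id: string; vector: number[] };

// Stub embedder: returns zero vectors of the model's 384 dimensions.
async function embed(texts: string[]): Promise<number[][]> {
  return texts.map(() => new Array(384).fill(0));
}

async function embedChunks(chunks: Chunk[]): Promise<Embedded[]> {
  const out: Embedded[] = [];
  for (let i = 0; i < chunks.length; i += BATCH_SIZE) {
    const batch = chunks.slice(i, i + BATCH_SIZE);
    const vectors = await embed(batch.map((c) => c.text));
    batch.forEach((c, j) => out.push({ id: c.id, vector: vectors[j] }));
    // In the real pipeline each batch would be written to LanceDB here.
  }
  return out;
}
```

Sequential batching keeps memory bounded: only one batch of texts and vectors is in flight at a time, which matches the "batch, sequential" description of Phase 2.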
```
.opencode/
  vectors/
    code/
      lancedb/        # Vector embeddings (LanceDB)
      chunks.db       # Chunk content + metadata (SQLite, ChunkStore)
      hashes.json     # File hashes for change detection
    docs/
      lancedb/
      chunks.db
      hashes.json
  graph/
    code_graph.db     # Code relationships (SQLite, GraphDB)
    doc_graph.db      # Doc relationships (SQLite, GraphDB)
  vectorizer.yaml     # Configuration
  indexer.log         # Indexing log
```

### Module Overview
| Module | Purpose |
|--------|---------|
| Core | |
| vectorizer/index.ts | CodebaseIndexer, two-phase pipeline, search, singleton pool |
| vectorizer/chunk-store.ts | SQLite chunk storage (BM25 without vectors) |
| vectorizer/graph-db.ts | SQLite triple store for code relationships |
| vectorizer/graph-builder.ts | Builds graph edges from code analysis |
| vectorizer/bm25-index.ts | Inverted index for keyword search |
| Chunking | |
| vectorizer/chunkers/code-chunker.ts | Function/class-aware splitting |
| vectorizer/chunkers/markdown-chunker.ts | Heading-aware splitting with hierarchy |
| vectorizer/chunkers/chunker-factory.ts | Routes to correct chunker by file type |
| Analysis | |
| vectorizer/analyzers/regex-analyzer.ts | Regex-based code analysis (imports, calls, types) |
| vectorizer/analyzers/lsp-analyzer.ts | LSP-based code analysis (definitions, references) |
| vectorizer/analyzers/lsp-client.ts | Language Server Protocol client |
| Search | |
| vectorizer/hybrid-search.ts | Merge vector + BM25 scores |
| vectorizer/query-cache.ts | LRU cache for query embeddings |
| vectorizer/content-cleaner.ts | Remove noise (TOC, breadcrumbs, markers) |
| vectorizer/metadata-extractor.ts | Extract file_type, language, tags, dates |
| Tracking | |
| vectorizer/search-metrics.ts | Search quality metrics |
| vectorizer/usage-tracker.ts | Usage provenance tracking |
| Tools | |
| tools/search.ts | Search tool (5 params, smart filter, score breakdown) |
| tools/codeindex.ts | Index management tool |
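The package description mentions an RRF merge, and `vectorizer/hybrid-search.ts` merges vector and BM25 results. A generic Reciprocal Rank Fusion sketch for combining two ranked lists looks like this; the constant `k = 60` is the common default from the RRF literature, not a value confirmed by this README:

```typescript
// Generic Reciprocal Rank Fusion (RRF) sketch for merging ranked lists.
// score(doc) = sum over lists of 1 / (k + rank); k = 60 is a common default.
function rrfMerge(rankedLists: string[][], k = 60): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((docId, idx) => {
      const rank = idx + 1; // ranks are 1-based
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}
```

A document ranked well by both the vector list and the BM25 list rises to the top even if neither list puts it first, which is why RRF is a robust way to fuse heterogeneous rankings without normalizing raw scores.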
### Graph-Based Context
The code graph tracks relationships between chunks:
- imports — file A imports module B
- calls — function A calls function B
- references — code references a type/interface
- implements — class implements an interface
- extends — class extends another class
- belongs_to — chunk belongs to file (structural)
When you search, results are automatically expanded with 1-hop graph neighbors. Related context is scored by `edge_weight * cosine_similarity` (or `edge_weight * 0.7` in BM25-only mode) and filtered by `min_relevance`.
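The scoring rule above can be sketched directly. This is a hypothetical `scoreNeighbors` helper; the type and function names are illustrative, but the formula and the 0.7 BM25-only factor come from the text:

```typescript
// Sketch of related-context scoring for 1-hop graph neighbors.
// Formula from the README: edge_weight * cosine_similarity,
// or edge_weight * 0.7 when no vectors are available yet.
interface Neighbor {
  chunkId: string;
  edgeWeight: number;        // weight of the graph edge (import, call, ...)
  cosineSimilarity?: number; // undefined in BM25-only mode
}

const BM25_ONLY_FACTOR = 0.7;

function scoreNeighbors(neighbors: Neighbor[], minRelevance = 0.5) {
  return neighbors
    .map((n) => ({
      ...n,
      score: n.edgeWeight * (n.cosineSimilarity ?? BM25_ONLY_FACTOR),
    }))
    .filter((n) => n.score >= minRelevance) // drop weakly related context
    .sort((a, b) => b.score - a.score);
}
```

The `minRelevance` default of 0.5 mirrors the `graph.min_relevance` setting in the configuration section below.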
### Singleton Indexer Pool
Multiple parallel searches share one CodebaseIndexer instance per (project, index) pair, so there are no SQLite lock conflicts. The pool is managed via `getIndexer()` / `releaseIndexer()` / `destroyIndexer()`.
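A minimal sketch of such a pool, assuming a reference-counted map keyed by `project:index` (the real CodebaseIndexer holds SQLite/LanceDB handles; this stub does not, and the signatures are illustrative):

```typescript
// Hypothetical sketch of the singleton indexer pool keyed by (project, index).
class CodebaseIndexer {
  constructor(readonly project: string, readonly index: string) {}
  destroy(): void { /* the real implementation would close DB handles */ }
}

const pool = new Map<string, { indexer: CodebaseIndexer; refs: number }>();

function getIndexer(project: string, index: string): CodebaseIndexer {
  const key = `${project}:${index}`;
  let entry = pool.get(key);
  if (!entry) {
    entry = { indexer: new CodebaseIndexer(project, index), refs: 0 };
    pool.set(key, entry);
  }
  entry.refs++;
  return entry.indexer; // same instance for every caller with this key
}

function releaseIndexer(project: string, index: string): void {
  const entry = pool.get(`${project}:${index}`);
  if (entry) entry.refs = Math.max(0, entry.refs - 1);
}

function destroyIndexer(project: string, index: string): void {
  const key = `${project}:${index}`;
  pool.get(key)?.indexer.destroy();
  pool.delete(key);
}
```

Because every caller for the same (project, index) pair gets the same instance, only one process-level connection touches each SQLite file, which is what avoids the lock conflicts.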
## Configuration

### Full config example

Note: the nesting below is reconstructed from the flattened original; `indexes` and `exclude` are shown as top-level keys.

```yaml
# .opencode/vectorizer.yaml
vectorizer:
  enabled: true
  auto_index: true
  model: "Xenova/all-MiniLM-L6-v2"
  debounce_ms: 1000

  cleaning:
    remove_toc: true
    remove_frontmatter_metadata: false
    remove_imports: false
    remove_comments: false

  chunking:
    strategy: "semantic" # fixed | semantic
    markdown:
      split_by_headings: true
      min_chunk_size: 200
      max_chunk_size: 2000
      preserve_heading_hierarchy: true
    code:
      split_by_functions: true
      include_function_signature: true
      min_chunk_size: 300
      max_chunk_size: 1500
    fixed:
      max_chars: 1500

  search:
    hybrid: true
    bm25_weight: 0.3
    freshen: false # Don't re-index on every search
    min_score: 0.35 # Minimum relevance cutoff
    include_archived: false
    default_limit: 10

  graph:
    enabled: true
    max_related: 4 # Max related chunks per result
    min_relevance: 0.5 # Min score for related context
    semantic_edges: false # O(n^2) — enable only for small repos
    semantic_edges_max_chunks: 500

  lsp:
    enabled: true
    timeout_ms: 5000
    read_intercept: true

  quality:
    enable_metrics: false
    enable_cache: true

indexes:
  code:
    enabled: true
    pattern: "**/*.{js,ts,jsx,tsx,mjs,cjs,py,go,rs,java,kt,swift,c,cpp,h,hpp,cs,rb,php,scala,clj}"
    ignore:
      - "**/node_modules/**"
      - "**/.git/**"
      - "**/dist/**"
      - "**/build/**"
      - "**/.opencode/**"
      - "**/vendor/**"
    hybrid: true
    bm25_weight: 0.3
  docs:
    enabled: true
    pattern: "docs/**/*.{md,mdx,txt,rst,adoc}"
    hybrid: false
    bm25_weight: 0.2
  config:
    enabled: false
    pattern: "**/*.{yaml,yml,json,toml,ini,env,xml}"
    hybrid: false
    bm25_weight: 0.3

exclude:
  - node_modules
  - vendor
  - dist
  - build
  - out
  - __pycache__
```

### Disable automatic indexing
```yaml
vectorizer:
  auto_index: false
```

### Skip auto-index via env

```sh
export OPENCODE_SKIP_AUTO_INDEX=1
```

## Debugging
### Enable logs

```sh
export DEBUG=vectorizer
# or all logs
export DEBUG=*
```

Indexing activity is logged to `.opencode/indexer.log`.
## Technical Details

- Vectorization: @xenova/transformers (ONNX Runtime)
- Vector DB: LanceDB (local, serverless)
- Chunk Store: bun:sqlite (WAL mode, concurrent reads)
- Graph DB: bun:sqlite (WAL mode, triple store)
- Model: Xenova/all-MiniLM-L6-v2 (multilingual, 384 dimensions, ~23 MB)
- Embedding speed: ~0.5 sec/file
- Phase 1 speed: ~0.05 sec/file (no embedding)
- Supported languages: JavaScript, TypeScript, Python, Go, Rust, Java, Kotlin, Swift, C/C++, C#, Ruby, PHP, Scala, Clojure
## License
MIT
Made by the Comfanion team
