
@comfanion/usethis_search

v4.5.1

Published

OpenCode plugin: semantic search with query decomposition, RRF merge, and context-efficient workspace (v4.5.0)

Readme

@comfanion/usethis_search

Semantic code search with graph-based context for OpenCode

Search code by meaning, not by text. Get related context automatically via the code graph.


What is this?

An OpenCode plugin that adds smart search to your project:

  • Semantic search — finds code by meaning, even when words don't match
  • Hybrid search — combines vector similarity + BM25 keyword matching
  • Graph-based context — automatically attaches related code (imports, calls, type references) to search results
  • Two-phase indexing — BM25 + graph search available immediately (Phase 1), vector search after embedding (Phase 2)
  • Simplified API — 5 parameters, smart filter parsing, config-driven defaults
  • Automatic indexing — files are indexed on change, zero effort
  • Local vectorization — works offline, no API keys needed
  • Three indexes — separate for code, docs, and configs

Quick Start

Installation

npm install @comfanion/usethis_search

Configuration

Add to opencode.json:

{
  "plugin": ["@comfanion/usethis_search"]
}

First Run

On OpenCode startup, the plugin automatically:

  1. Creates indexes for code and documentation
  2. Phase 1: chunks files, builds code graph (fast, parallel) — BM25 search available immediately
  3. Phase 2: embeds chunks into vectors — hybrid search available after completion

Indexing time estimates:

  • < 100 files — ~1 min
  • < 500 files — ~3 min
  • 500+ files — ~10 min

Search API

The search tool has 5 parameters:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| query | string | required | What you're looking for (semantic) |
| index | string | "code" | Which index: code, docs, config |
| limit | number | 10 | Number of results |
| searchAll | boolean | false | Search across all indexes |
| filter | string | — | Filter by path or language |

Search examples

// Basic semantic search
search({ query: "authentication logic" })

// Search documentation
search({ query: "how to deploy", index: "docs" })

// Search all indexes
search({ query: "database connection", searchAll: true })

// Filter by directory
search({ query: "tenant management", filter: "internal/domain/" })

// Filter by language
search({ query: "event handling", filter: "*.go" })
search({ query: "middleware", filter: "go" })

// Combined: directory + language
search({ query: "API routes", filter: "internal/**/*.go" })

// Substring match on file path
search({ query: "metrics", filter: "service" })

// More results
search({ query: "error handling", limit: 20 })

Filter syntax

The filter parameter is smart — it auto-detects what you mean:

| Input | Parsed as |
|-------|-----------|
| "internal/domain/" | Path prefix |
| "*.go" or ".go" | Language filter (go) |
| "go" or "python" | Language filter |
| "internal/**/*.go" | Path prefix + language |
| "service" | Substring match on file path |
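
A minimal sketch of how that auto-detection could be implemented. The real parser lives in tools/search.ts and may differ; the alias table and return shape here are assumptions made for illustration:

// Hypothetical filter parser: maps the free-form filter string onto
// a path prefix, a language constraint, or a substring match.
type ParsedFilter = { pathPrefix?: string; language?: string; substring?: string };

// Illustrative alias table; the plugin's actual mapping may be larger.
const LANGUAGE_ALIASES: Record<string, string> = {
  go: "go", py: "python", python: "python",
  ts: "typescript", js: "javascript", rs: "rust",
};

function parseFilter(filter: string): ParsedFilter {
  // "*.go" or ".go" -> language filter
  const ext = filter.match(/^\*?\.(\w+)$/);
  if (ext) return { language: LANGUAGE_ALIASES[ext[1]] ?? ext[1] };

  // "internal/**/*.go" -> path prefix + language
  const glob = filter.match(/^(.*?)\*\*\/\*\.(\w+)$/);
  if (glob) return { pathPrefix: glob[1], language: LANGUAGE_ALIASES[glob[2]] ?? glob[2] };

  // "internal/domain/" -> path prefix
  if (filter.includes("/")) return { pathPrefix: filter };

  // "go", "python", ... -> language filter
  if (filter in LANGUAGE_ALIASES) return { language: LANGUAGE_ALIASES[filter] };

  // anything else -> substring match on the file path
  return { substring: filter };
}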

Search output

Each result includes:

  • Score breakdown: Score: 0.619 (vec: 0.47, bm25: +0.04, kw: +0.11 | matched: "event", "correlation")
  • Rich metadata: language, function name, class name, heading context
  • File grouping: best chunk per file + "N matching sections" count
  • Related context: graph-expanded neighbors (imports, calls, type references)
  • Confidence signal: warning when top score < 0.45

When vectors are not yet available (Phase 2 in progress), search automatically falls back to BM25-only mode with a banner notification.
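
To make the breakdown concrete, here is one illustrative way the parts could combine, using search.bm25_weight from the configuration below. The actual formula lives in vectorizer/hybrid-search.ts and may differ; the keyword bonus and its cap are made-up values:

// Illustrative only: combine the parts shown in the score breakdown above.
interface ScoreParts {
  vec: number;       // cosine similarity from the vector index
  bm25: number;      // BM25 score, assumed normalized to 0..1
  kwMatches: number; // number of query keywords matched in the chunk
}

function hybridScore({ vec, bm25, kwMatches }: ScoreParts, bm25Weight = 0.3): number {
  const kwBonus = Math.min(kwMatches * 0.05, 0.15); // hypothetical bonus and cap
  return vec + bm25Weight * bm25 + kwBonus;
}

// hybridScore({ vec: 0.47, bm25: 0.13, kwMatches: 2 })
//   ≈ 0.47 + 0.04 + 0.10 ≈ 0.61, in the spirit of the example above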


Index Management

CLI

# Reindex everything
bunx usethis_search reindex

# Check status
bunx usethis_search status

# List indexes
bunx usethis_search list

# Clear index
bunx usethis_search clear

Tool API

// List all indexes with stats
codeindex({ action: "list" })

// Check specific index status
codeindex({ action: "status", index: "code" })

// Reindex
codeindex({ action: "reindex", index: "code" })

Architecture

Two-Phase Indexing Pipeline

Phase 1 (fast, parallel, 5 workers):
  file -> read -> chunk -> regex analyze -> graph edges -> ChunkStore (SQLite)
  Result: BM25 + graph search available immediately

Phase 2 (batch, sequential):
  ChunkStore chunks -> batch embed (32/batch) -> LanceDB
  Result: vector/hybrid search becomes available

Search Strategy (auto-detect)

Has vectors? -> hybrid search (vector + BM25 + graph + keyword rerank)
No vectors?  -> BM25-only search (from ChunkStore + graph + keyword rerank)
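
A small sketch of that auto-detect step. The shapes, counts, and banner strings are illustrative, not the plugin's actual internals:

// Pick a strategy from the current index state.
type SearchStrategy = "hybrid" | "bm25-only";

interface IndexState {
  vectorCount: number; // rows in the LanceDB table (Phase 2 output)
  chunkCount: number;  // rows in the SQLite ChunkStore (Phase 1 output)
}

function pickStrategy(state: IndexState): { strategy: SearchStrategy; banner?: string } {
  if (state.vectorCount > 0) {
    // Vector + BM25 + graph expansion + keyword rerank
    return { strategy: "hybrid" };
  }
  // Phase 2 not finished yet: BM25 from the ChunkStore,
  // still with graph expansion and keyword rerank, plus a notice in the output.
  return {
    strategy: "bm25-only",
    banner: state.chunkCount > 0
      ? "Vectors not ready yet, falling back to BM25-only search"
      : "Index is empty, run `bunx usethis_search reindex`",
  };
}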

Storage Layout

.opencode/
  vectors/
    code/
      lancedb/          # Vector embeddings (LanceDB)
      chunks.db         # Chunk content + metadata (SQLite, ChunkStore)
      hashes.json       # File hashes for change detection
    docs/
      lancedb/
      chunks.db
      hashes.json
  graph/
    code_graph.db       # Code relationships (SQLite, GraphDB)
    doc_graph.db        # Doc relationships (SQLite, GraphDB)
  vectorizer.yaml       # Configuration
  indexer.log           # Indexing log

Module Overview

| Module | Purpose |
|--------|---------|
| Core | |
| vectorizer/index.ts | CodebaseIndexer, two-phase pipeline, search, singleton pool |
| vectorizer/chunk-store.ts | SQLite chunk storage (BM25 without vectors) |
| vectorizer/graph-db.ts | SQLite triple store for code relationships |
| vectorizer/graph-builder.ts | Builds graph edges from code analysis |
| vectorizer/bm25-index.ts | Inverted index for keyword search |
| Chunking | |
| vectorizer/chunkers/code-chunker.ts | Function/class-aware splitting |
| vectorizer/chunkers/markdown-chunker.ts | Heading-aware splitting with hierarchy |
| vectorizer/chunkers/chunker-factory.ts | Routes to correct chunker by file type |
| Analysis | |
| vectorizer/analyzers/regex-analyzer.ts | Regex-based code analysis (imports, calls, types) |
| vectorizer/analyzers/lsp-analyzer.ts | LSP-based code analysis (definitions, references) |
| vectorizer/analyzers/lsp-client.ts | Language Server Protocol client |
| Search | |
| vectorizer/hybrid-search.ts | Merge vector + BM25 scores |
| vectorizer/query-cache.ts | LRU cache for query embeddings |
| vectorizer/content-cleaner.ts | Remove noise (TOC, breadcrumbs, markers) |
| vectorizer/metadata-extractor.ts | Extract file_type, language, tags, dates |
| Tracking | |
| vectorizer/search-metrics.ts | Search quality metrics |
| vectorizer/usage-tracker.ts | Usage provenance tracking |
| Tools | |
| tools/search.ts | Search tool (5 params, smart filter, score breakdown) |
| tools/codeindex.ts | Index management tool |

Graph-Based Context

The code graph tracks relationships between chunks:

  • imports — file A imports module B
  • calls — function A calls function B
  • references — code references a type/interface
  • implements — class implements an interface
  • extends — class extends another class
  • belongs_to — chunk belongs to file (structural)

When you search, results are automatically expanded with 1-hop graph neighbors. Related context is scored by edge_weight * cosine_similarity (or edge_weight * 0.7 in BM25-only mode) and filtered by min_relevance.
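
A short sketch of that expansion-and-scoring step. The Neighbor shape is an assumption; the default constants mirror graph.max_related and graph.min_relevance from the Configuration section:

// Illustrative 1-hop expansion: score each graph neighbor and keep the best few.
interface Neighbor {
  chunkId: string;
  edgeWeight: number;        // weight of the graph edge (imports, calls, ...)
  cosineSimilarity?: number; // undefined in BM25-only mode (no vectors yet)
}

function scoreNeighbors(
  neighbors: Neighbor[],
  minRelevance = 0.5, // graph.min_relevance
  maxRelated = 4,     // graph.max_related
): Array<Neighbor & { score: number }> {
  return neighbors
    .map(n => ({ ...n, score: n.edgeWeight * (n.cosineSimilarity ?? 0.7) }))
    .filter(n => n.score >= minRelevance)
    .sort((a, b) => b.score - a.score)
    .slice(0, maxRelated);
}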

Singleton Indexer Pool

Multiple parallel searches share one CodebaseIndexer instance per (project, index) pair, which avoids SQLite lock conflicts. Instances are managed via getIndexer() / releaseIndexer() / destroyIndexer().
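
A rough idea of what such a pool can look like. Only the getIndexer / releaseIndexer / destroyIndexer names come from the plugin; the Map key, reference counting, and the stub CodebaseIndexer are assumptions:

// Hypothetical singleton pool keyed by project + index name.
// A refcount keeps the instance alive while parallel searches use it,
// so every caller shares the same SQLite connections instead of competing ones.
class CodebaseIndexer {
  constructor(readonly project: string, readonly index: string) {}
  close(): void { /* release SQLite/LanceDB handles */ }
}

interface PooledIndexer { indexer: CodebaseIndexer; refs: number }

const pool = new Map<string, PooledIndexer>();

function getIndexer(project: string, index: string): CodebaseIndexer {
  const key = `${project}:${index}`;
  let entry = pool.get(key);
  if (!entry) {
    entry = { indexer: new CodebaseIndexer(project, index), refs: 0 };
    pool.set(key, entry);
  }
  entry.refs += 1;
  return entry.indexer;
}

function releaseIndexer(project: string, index: string): void {
  const entry = pool.get(`${project}:${index}`);
  if (entry && entry.refs > 0) entry.refs -= 1;
}

function destroyIndexer(project: string, index: string): void {
  const key = `${project}:${index}`;
  pool.get(key)?.indexer.close();
  pool.delete(key);
}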


Configuration

Full config example

# .opencode/vectorizer.yaml
vectorizer:
  enabled: true
  auto_index: true
  model: "Xenova/all-MiniLM-L6-v2"
  debounce_ms: 1000

  cleaning:
    remove_toc: true
    remove_frontmatter_metadata: false
    remove_imports: false
    remove_comments: false

  chunking:
    strategy: "semantic"    # fixed | semantic
    markdown:
      split_by_headings: true
      min_chunk_size: 200
      max_chunk_size: 2000
      preserve_heading_hierarchy: true
    code:
      split_by_functions: true
      include_function_signature: true
      min_chunk_size: 300
      max_chunk_size: 1500
    fixed:
      max_chars: 1500

  search:
    hybrid: true
    bm25_weight: 0.3
    freshen: false              # Don't re-index on every search
    min_score: 0.35             # Minimum relevance cutoff
    include_archived: false
    default_limit: 10

  graph:
    enabled: true
    max_related: 4              # Max related chunks per result
    min_relevance: 0.5          # Min score for related context
    semantic_edges: false       # O(n^2) — enable only for small repos
    semantic_edges_max_chunks: 500
    lsp:
      enabled: true
      timeout_ms: 5000
    read_intercept: true

  quality:
    enable_metrics: false
    enable_cache: true

  indexes:
    code:
      enabled: true
      pattern: "**/*.{js,ts,jsx,tsx,mjs,cjs,py,go,rs,java,kt,swift,c,cpp,h,hpp,cs,rb,php,scala,clj}"
      ignore:
        - "**/node_modules/**"
        - "**/.git/**"
        - "**/dist/**"
        - "**/build/**"
        - "**/.opencode/**"
        - "**/vendor/**"
      hybrid: true
      bm25_weight: 0.3
    docs:
      enabled: true
      pattern: "docs/**/*.{md,mdx,txt,rst,adoc}"
      hybrid: false
      bm25_weight: 0.2
    config:
      enabled: false
      pattern: "**/*.{yaml,yml,json,toml,ini,env,xml}"
      hybrid: false
      bm25_weight: 0.3

  exclude:
    - node_modules
    - vendor
    - dist
    - build
    - out
    - __pycache__

Disable automatic indexing

vectorizer:
  auto_index: false

Skip auto-index via env

export OPENCODE_SKIP_AUTO_INDEX=1

Debugging

Enable logs

export DEBUG=vectorizer
# or all logs
export DEBUG=*

Indexing activity is logged to .opencode/indexer.log.


Technical Details

  • Vectorization: @xenova/transformers (ONNX Runtime)
  • Vector DB: LanceDB (local, serverless)
  • Chunk Store: bun:sqlite (WAL mode, concurrent reads)
  • Graph DB: bun:sqlite (WAL mode, triple store)
  • Model: Xenova/all-MiniLM-L6-v2 (multilingual, 384 dimensions, ~23 MB)
  • Embedding speed: ~0.5 sec/file
  • Phase 1 speed: ~0.05 sec/file (no embedding)
  • Supported languages: JavaScript, TypeScript, Python, Go, Rust, Java, Kotlin, Swift, C/C++, C#, Ruby, PHP, Scala, Clojure

License

MIT


Made by the Comfanion team