# @brainbank/reranker
Local cross-encoder reranker plugin for BrainBank. Runs Qwen3-Reranker-0.6B on-device via node-llama-cpp — no API keys, no network calls.
## Why rerank?
Vector search returns results by embedding similarity, but embeddings can miss nuance. A cross-encoder reads the query + document together and scores relevance directly — typically improving precision by 15–30% on code search.
```
Without reranker: query → vector search → results                  (good)
With reranker:    query → vector search → rerank top-K → results   (better)
```
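In other words, the rerank stage takes the top-K vector hits, scores each (query, document) pair with the cross-encoder, and reorders by that score. Below is a minimal sketch of that stage in isolation; `Candidate`, `scoreWithCrossEncoder`, and the result shape are illustrative, not this plugin's API.

```ts
// Illustrative types, not the plugin's API.
interface Candidate { id: string; text: string; vectorScore: number }

// Hypothetical cross-encoder scorer: reads query + document together
// and returns a relevance score for the pair.
declare function scoreWithCrossEncoder(query: string, doc: string): Promise<number>;

async function rerank(query: string, candidates: Candidate[], topK = 50) {
  // Only the top-K vector hits are rescored; a cross-encoder is too slow
  // to run over the whole corpus.
  const head = candidates.slice(0, topK);
  const scored = await Promise.all(
    head.map(async (c) => ({ ...c, rerankScore: await scoreWithCrossEncoder(query, c.text) })),
  );
  // Reorder by the cross-encoder score instead of embedding similarity.
  return scored.sort((a, b) => b.rerankScore - a.rerankScore);
}
```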
## Install

```bash
npm install @brainbank/reranker node-llama-cpp
```

`node-llama-cpp` is a peer dependency. The GGUF model (~640MB) auto-downloads on first use and is cached at `~/.cache/brainbank/models/`.
## Usage
### With the CLI
```bash
brainbank hsearch "auth middleware" --reranker qwen3
```

### Programmatic
```ts
import { BrainBank } from 'brainbank';
import { code } from 'brainbank/code';
import { Qwen3Reranker } from '@brainbank/reranker';

const brain = new BrainBank({ repoPath: '.' })
  .use(code());
await brain.initialize();

const reranker = new Qwen3Reranker();

const results = await brain.hybridSearch('authentication guard', {
  reranker,
  maxResults: 10,
});

// Done? Release the model from memory
await reranker.close();
```
## Options

```ts
new Qwen3Reranker({
  modelUri: 'hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/...', // custom model
  cacheDir: '~/.cache/brainbank/models/',                    // cache location
  contextSize: 2048,                                         // context window
});
```
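If your indexed chunks are long, a larger context window may reduce truncation. A minimal sketch, assuming the same constructor as above (4096 is an illustrative value, not a recommendation):

```ts
import { Qwen3Reranker } from '@brainbank/reranker';

// Illustrative: a larger context window so longer code chunks are scored
// without being truncated. Pick a size that matches your chunking.
const reranker = new Qwen3Reranker({ contextSize: 4096 });

// Used exactly like the programmatic example above, then released:
//   const results = await brain.hybridSearch(query, { reranker, maxResults: 10 });
//   await reranker.close();
```

A larger window generally costs more memory and time per scored pair, so it only pays off when truncation is actually dropping relevant context.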
## How it works

- Lazy loading — the model loads on the first `rank()` call, not at import
- Flash attention — 20× less VRAM than standard attention
- Deduplication — identical documents are scored once (see the sketch after this list)
- Truncation — oversized documents are truncated by the tokenizer, not naively cut
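To illustrate the deduplication point, the general technique is to key scoring on the document text so duplicates reuse a single result. A sketch of the idea, not this plugin's internals; `scorePair` is a hypothetical scorer:

```ts
// Hypothetical scorer signature, for illustration only.
declare function scorePair(query: string, doc: string): Promise<number>;

async function scoreUnique(query: string, docs: string[]): Promise<number[]> {
  const cache = new Map<string, Promise<number>>();
  // Score each distinct document text once; duplicates reuse the same promise.
  return Promise.all(
    docs.map((doc) => {
      if (!cache.has(doc)) cache.set(doc, scorePair(query, doc));
      return cache.get(doc)!;
    }),
  );
}
```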
## Requirements
- Node.js ≥ 18
- ~640MB disk for the model (auto-downloaded)
- Works on macOS (Metal), Linux (CUDA/CPU), Windows (CPU)
## License
MIT
