@engram-mem/rerank-onnx

v0.5.2

Local cross-encoder reranker for Engram — runs mxbai-rerank-v1 via ONNX Runtime (no API calls)

Local cross-encoder reranker for Engram. Runs mxbai-rerank-v1 (DeBERTa-v2) via ONNX Runtime through @huggingface/transformers. No API calls at query time: weights are downloaded on first use and cached locally by the Hugging Face library.

Why

Engram's default reranker is an LLM pointwise scorer (gpt-4o-mini). That works, but:

  • Every recall that crosses the rerank threshold costs ~$0.001 in API calls.
  • Latency is dominated by the round-trip to OpenAI (~1-3s per rerank).
  • The ordering quality is capped by what gpt-4o-mini can discriminate over a tight JSON-scored list.

A purpose-built cross-encoder like mxbai-rerank-large-v1 typically gives stronger ordering with:

  • Zero API cost (model runs locally).
  • ~10-50ms inference per query after the model is loaded.
  • Better calibration — cross-encoders are trained on millions of rank-pair examples.
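To make the trade-off concrete, here is a back-of-envelope sketch that plugs the approximate figures above into a simple estimate. The workload of 2,000 reranks per day is a hypothetical number chosen for illustration, and the `RerankProfile` type and `estimate` helper are not part of the package.

```typescript
// Rough comparison using the approximate figures quoted above.
// All inputs are estimates, not measurements.
interface RerankProfile {
  costPerRerankUsd: number // API cost per rerank call
  latencyMsLow: number     // optimistic per-rerank latency
  latencyMsHigh: number    // pessimistic per-rerank latency
}

const llmReranker: RerankProfile = { costPerRerankUsd: 0.001, latencyMsLow: 1000, latencyMsHigh: 3000 }
const localReranker: RerankProfile = { costPerRerankUsd: 0, latencyMsLow: 10, latencyMsHigh: 50 }

// Total API cost and cumulative rerank time over a period.
function estimate(profile: RerankProfile, reranksPerDay: number, days = 30) {
  const calls = reranksPerDay * days
  return {
    costUsd: calls * profile.costPerRerankUsd,
    totalSecondsLow: (calls * profile.latencyMsLow) / 1000,
    totalSecondsHigh: (calls * profile.latencyMsHigh) / 1000,
  }
}

// Hypothetical workload: 2,000 reranks per day for 30 days.
const llm = estimate(llmReranker, 2000)
const local = estimate(localReranker, 2000)
console.log({ llm, local })
```

For this hypothetical workload the LLM path comes to about $60 a month in API calls, while the local path has no API cost. The local latency figure applies only after the model has been loaded.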

Install

npm install @engram-mem/rerank-onnx

macOS Intel note

Recent onnxruntime-node releases dropped darwin-x64 binaries. If you run Intel macOS, pin an earlier version with an npm override in your package.json:

"overrides": {
  "onnxruntime-node": "1.22.0"
}

Apple Silicon and Linux x64/arm64 work with the default version.
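If you are unsure whether a machine is affected, a small check like the following can help. This is a sketch: `needsOnnxRuntimeOverride` is a hypothetical helper, not something the package exports, and it only encodes the platform rule stated above.

```typescript
// Returns true for Intel macOS (darwin + x64), the one combination the
// note above says needs the onnxruntime-node override.
function needsOnnxRuntimeOverride(platform: string, arch: string): boolean {
  return platform === 'darwin' && arch === 'x64'
}

console.log(needsOnnxRuntimeOverride('darwin', 'x64'))
console.log(needsOnnxRuntimeOverride('darwin', 'arm64'))
```

In a Node.js script you would pass `process.platform` and `process.arch` as the two arguments.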

Usage

Compose with an existing intelligence adapter via object spread:

import { openaiIntelligence } from '@engram-mem/openai'
import { createOnnxReranker } from '@engram-mem/rerank-onnx'
import { createMemory } from '@engram-mem/core'

const openai = openaiIntelligence({ apiKey: process.env.OPENAI_API_KEY! })
const onnx = createOnnxReranker() // default: mxbai-rerank-large-v1 @ q8
await onnx.load() // optional: rerank() auto-loads on first call

const memory = createMemory({
  storage, // your existing Engram storage adapter, configured elsewhere
  intelligence: {
    ...openai,
    // Override only the rerank step; everything else still comes from the OpenAI adapter.
    rerank: (query, docs) => onnx.rerank(query, docs),
  },
})
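Because the reranker is just a function on the intelligence object, it can be wrapped before it is spread in. The sketch below shows one possible wrapper that falls back to a second reranker if the first one throws, for example when the model fails to load. The `RerankFn` type is an assumption about the general shape of a rerank function, not the package's published signature, and the two rerankers here are stand-ins so the example runs on its own.

```typescript
// Assumed shape of a rerank function: takes a query and documents and
// resolves to the documents in a new order. Hypothetical type.
type RerankFn<D> = (query: string, docs: D[]) => Promise<D[]>

// Try the primary reranker; if it throws, use the fallback instead.
function withFallback<D>(primary: RerankFn<D>, fallback: RerankFn<D>): RerankFn<D> {
  return async (query, docs) => {
    try {
      return await primary(query, docs)
    } catch {
      return fallback(query, docs)
    }
  }
}

// Stand-in rerankers for demonstration only.
const failingLocal: RerankFn<string> = async () => {
  throw new Error('model failed to load')
}
const reverseOrder: RerankFn<string> = async (_query, docs) => [...docs].reverse()

const rerank = withFallback(failingLocal, reverseOrder)
rerank('example query', ['a', 'b', 'c']).then((result) => console.log(result))
```

In a real setup the primary would be the local reranker and the fallback whatever rerank function the existing adapter provides, assuming their signatures match.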

Options

createOnnxReranker({
  model: 'mixedbread-ai/mxbai-rerank-large-v1', // or -base-v1 / -xsmall-v1
  dtype: 'q8',        // 'fp32' | 'fp16' | 'q8' | 'q4'
  batchSize: 8,       // pairs per forward pass
  maxCandidates: 25,  // cap on docs reranked per call
  maxLength: 512,     // max tokens per pair
  maxDocChars: 1200,  // chars per doc before tokenization
})
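To illustrate how the candidate-related options interact, here is a sketch of one plausible preprocessing step: cap the candidate list, truncate each document, then split the result into batches. It is based only on the option comments above and is not the package's actual implementation. `prepareBatches` and `PrepOptions` are hypothetical names.

```typescript
interface PrepOptions {
  batchSize: number     // pairs per forward pass
  maxCandidates: number // cap on docs reranked per call
  maxDocChars: number   // chars per doc before tokenization
}

// Cap the candidate list, truncate each doc, and group into batches.
function prepareBatches(docs: string[], opts: PrepOptions): string[][] {
  const capped = docs
    .slice(0, opts.maxCandidates)
    .map((doc) => doc.slice(0, opts.maxDocChars))

  const batches: string[][] = []
  for (let i = 0; i < capped.length; i += opts.batchSize) {
    batches.push(capped.slice(i, i + opts.batchSize))
  }
  return batches
}

// 30 long documents with the default option values shown above.
const docs = Array.from({ length: 30 }, (_, i) => `doc ${i} `.repeat(300))
const batches = prepareBatches(docs, { batchSize: 8, maxCandidates: 25, maxDocChars: 1200 })
console.log(batches.map((batch) => batch.length))
```

Under these assumptions, 30 input documents are cut to 25 candidates and processed in four forward passes, with the last batch holding a single document.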

Model variants

| Model | Params | q8 size | Quality | Speed |
|---------------------------------|--------|---------|---------|-----------|
| mxbai-rerank-large-v1 (default) | 435M | ~113MB | Best | Slower |
| mxbai-rerank-base-v1 | 184M | ~47MB | Good | 3x faster |
| mxbai-rerank-xsmall-v1 | 70M | ~17MB | Decent | Fastest |

To use it with the MCP server (@engram-mem/mcp), set ENGRAM_RERANK_LOCAL=true in the server's environment. On startup, the MCP server dynamically imports this package and spreads its rerank over the openaiIntelligence adapter automatically. Select the model variant with ENGRAM_RERANK_LOCAL_MODEL (default: mixedbread-ai/mxbai-rerank-large-v1).
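The environment-driven setup can be pictured roughly as follows. This is a sketch of how the two variables might be resolved into a config object, not the MCP server's actual code. `resolveLocalRerank` is a hypothetical helper, and treating only the exact string `true` as enabled is an assumption.

```typescript
type Env = Record<string, string | undefined>

const DEFAULT_MODEL = 'mixedbread-ai/mxbai-rerank-large-v1'

// Resolve the local-rerank settings from environment variables.
// Assumption: only the exact string 'true' enables local reranking.
function resolveLocalRerank(env: Env): { enabled: boolean; model: string } {
  return {
    enabled: env['ENGRAM_RERANK_LOCAL'] === 'true',
    model: env['ENGRAM_RERANK_LOCAL_MODEL'] || DEFAULT_MODEL,
  }
}

console.log(resolveLocalRerank({ ENGRAM_RERANK_LOCAL: 'true' }))
console.log(
  resolveLocalRerank({
    ENGRAM_RERANK_LOCAL: 'true',
    ENGRAM_RERANK_LOCAL_MODEL: 'mixedbread-ai/mxbai-rerank-base-v1',
  }),
)
```

In Node.js you would pass `process.env` as the argument.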