harper-fabric-onnx
v0.3.0
ONNX Runtime embedding wrapper for Harper Fabric. Runs inference in a dedicated child process — one model instance, one thread pool, no global state races across Harper workers.
Same public API as harper-fabric-embeddings, so harper-kb can swap backends by changing one import.
Install
npm install harper-fabric-onnx

Requires Node.js 22+.
Usage
import {
downloadModel,
init,
embed,
embedBatch,
dimensions,
dispose,
} from "harper-fabric-onnx";
// Download model files (one-time)
await downloadModel(".models");
// Initialize — spawns child process, loads ONNX model
await init({ modelsDir: ".models" });
// Single embedding (768-dim, L2-normalized)
const vec = await embed("Hello world");
// Query-optimized embedding (uses "search_query:" prefix)
const queryVec = await embed("What is Harper?", "query");
// Batch embedding
const vecs = await embedBatch(["First text", "Second text"]);
// Get model dimensions
dimensions(); // 768
// Cleanup
await dispose();

How it works
ONNX Runtime has a global singleton and process-wide thread pool, so it can't safely run per-worker in Harper's multi-worker architecture. This package runs ONNX in a dedicated child process and routes all worker calls to it via a Unix domain socket.
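The wire format between workers and the child is not documented here; a plausible sketch is newline-delimited JSON with a request id so responses can be matched to in-flight calls. This is an illustration only, not the package's actual protocol:

```javascript
// Assumed wire format: one JSON object per line, each tagged with a request id.
// The real harper-fabric-onnx protocol may differ.
let nextId = 0;

// Encode an embed request as a single JSON line for the socket.
function encodeRequest(method, params) {
  return JSON.stringify({ id: ++nextId, method, params }) + "\n";
}

// Decode one JSON line back into a message object.
function decodeMessage(line) {
  return JSON.parse(line);
}

const wire = encodeRequest("embed", { text: "Hello world", type: "document" });
const msg = decodeMessage(wire.trim());
console.log(msg.method); // "embed"
```

Framing each message as one line keeps the reader on the worker side trivial: split the stream on "\n" and parse each piece.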
- init() — spawns a child process (or connects to an existing one), loads the ONNX model and tokenizer
- embed() / embedBatch() — sends text over the socket; the child tokenizes, runs inference, and returns L2-normalized vectors
- One child process is shared across all Harper worker threads
- Stale process detection via PID files — auto-recovers from crashes
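The stale-process check can be sketched with the classic signal-0 probe: a PID file records the child's pid, and process.kill(pid, 0) tests liveness without sending a real signal. File paths and layout here are assumptions, not the package's actual ones:

```javascript
import fs from "node:fs";

// Probe whether a process is still running. Signal 0 performs only an
// existence check; it delivers nothing to the target.
function isProcessAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but we lack permission to signal it.
    // Anything else (typically ESRCH) means it is gone.
    return err.code === "EPERM";
  }
}

// Read a recorded pid from a PID file; null if no child was recorded.
function readPidFile(path) {
  try {
    return Number(fs.readFileSync(path, "utf8").trim());
  } catch {
    return null;
  }
}
```

If the recorded pid turns out to be dead, the wrapper can remove the stale socket/PID files and spawn a fresh child instead of hanging on a dead socket.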
Supported models
| Model | Repo | Dimensions |
| ---------------------------- | -------------------------------- | ---------- |
| nomic-embed-text (default) | nomic-ai/nomic-embed-text-v1.5 | 768 |
| nomic-embed-text-v2-moe | nomic-ai/nomic-embed-text-v2-moe | 768 |
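The names, repos, and dimensions in the table suggest a small registry keyed by model name. The object shape below is a guess at how such a registry might look, with the values taken from the table above:

```javascript
// Hypothetical registry shape; the entries themselves come from the table above.
const MODEL_REGISTRY = {
  "nomic-embed-text": {
    repo: "nomic-ai/nomic-embed-text-v1.5",
    dimensions: 768,
  },
  "nomic-embed-text-v2-moe": {
    repo: "nomic-ai/nomic-embed-text-v2-moe",
    dimensions: 768,
  },
};

const DEFAULT_MODEL = "nomic-embed-text";

// Resolve a model name (or the default) to its registry entry.
function resolveModel(name = DEFAULT_MODEL) {
  const entry = MODEL_REGISTRY[name];
  if (!entry) throw new Error(`Unknown model: ${name}`);
  return entry;
}

console.log(resolveModel().repo); // "nomic-ai/nomic-embed-text-v1.5"
```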
API
downloadModel(dir: string, modelName?: string): Promise<string>
Downloads model and tokenizer files from HuggingFace. Returns the model directory path.
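HuggingFace serves raw repository files at https://huggingface.co/&lt;repo&gt;/resolve/&lt;revision&gt;/&lt;path&gt;, so the download step likely builds URLs along these lines. The exact file list (model.onnx, tokenizer.json) is an assumption about this package, not confirmed:

```javascript
// Build a raw-file URL for a HuggingFace repo. The URL scheme is HuggingFace's
// public one; the file names below are illustrative assumptions.
function fileUrl(repo, file, revision = "main") {
  return `https://huggingface.co/${repo}/resolve/${revision}/${file}`;
}

const files = ["model.onnx", "tokenizer.json"];
const urls = files.map((f) => fileUrl("nomic-ai/nomic-embed-text-v1.5", f));
console.log(urls[0]);
// "https://huggingface.co/nomic-ai/nomic-embed-text-v1.5/resolve/main/model.onnx"
```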
init(options: InitOptions): Promise<void>
Spawns the child process and loads the model. Options:
- modelsDir — directory containing model subdirectories (e.g., .models)
- modelPath — direct path to a specific model directory
- modelName — model name from the registry (default: nomic-embed-text)
embed(text: string, type?: 'document' | 'query'): Promise<number[]>
Returns an L2-normalized embedding vector. Default type is 'document'.
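"L2-normalized" means each vector is scaled to unit Euclidean length, so a plain dot product between two embeddings is their cosine similarity. A minimal sketch of the property:

```javascript
// L2 normalization: divide each component by the vector's Euclidean norm.
// The result always has length 1.
function l2Normalize(vec) {
  const norm = Math.hypot(...vec);
  return vec.map((x) => x / norm);
}

const v = l2Normalize([3, 4]); // [0.6, 0.8]
const len = Math.hypot(...v);  // 1
```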
embedBatch(texts: string[], type?: 'document' | 'query'): Promise<number[][]>
Returns an array of L2-normalized embedding vectors.
dimensions(): number
Returns the dimensionality of the loaded model (e.g., 768).
dispose(): Promise<void>
Shuts down the child process and cleans up socket/PID files.
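The file cleanup half of dispose() can be sketched as below; the socket and PID file paths are illustrative, not the package's real ones:

```javascript
import fs from "node:fs";

// Remove the socket and PID files so a later init() does not mistake
// leftovers from this run for a live child process.
function cleanupIpcFiles(socketPath, pidPath) {
  for (const p of [socketPath, pidPath]) {
    fs.rmSync(p, { force: true }); // force: no error if already gone
  }
}
```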
Harper component usage
// resources.js
const { Resource } = globalThis;
export class Embed extends Resource {
static loadAsInstance = false;
async post(_query, data) {
const { init, embed, embedBatch, dimensions } =
await import("harper-fabric-onnx");
await init({ modelsDir: process.env.ONNX_MODELS_DIR });
if (data.texts) {
const vecs = await embedBatch(data.texts);
return { dimensions: dimensions(), vectors: vecs };
}
const vec = await embed(data.text, data.type);
return { dimensions: dimensions(), vector: vec };
}
}

License
MIT
