optimbed

v0.2.6

Published

6 days ago

Production-grade local embeddings library for Node.js using ONNX-backed Hugging Face models.

tailorlite

embeddings onnx huggingface optimbed rag nlp nodejs

Optimbed

Optimbed is a lightweight Node.js library for generating text embeddings. It supports two modes: local inference with ONNX Runtime using Hugging Face models, which needs no cloud service, and remote calls to any OpenAI-compatible embeddings API, which needs little more than a base URL and an API key. In local mode the default model is onnx-community/Qwen3-Embedding-0.6B-ONNX.

Install

npm install optimbed

Quick Start — Local (ONNX)

import { embedder } from "optimbed";

await embedder.init();
// Note: Optimbed does not force a specific device or dtype. Configure
// runtime/backends via transformers or your environment if needed.

const query = await embedder.embedQuery("What is Node.js?");
const doc = await embedder.embedDoc("Node.js is a JavaScript runtime built on Chrome's V8 engine.");

Quick Start — Remote (OpenAI-compatible)

import { embedder } from "optimbed";

await embedder.init({
  apiKey: "sk-...",                     // or set OPENAI_API_KEY env var
  model: "text-embedding-3-small",
  baseUrl: "https://api.openai.com/v1",
});

const query = await embedder.embedQuery("What is Node.js?");
const doc = await embedder.embedDoc("Node.js is a JavaScript runtime built on Chrome's V8 engine.");

// Batch embedding for higher throughput
const results = await embedder.embedBatch(["first text", "second text", "third text"]);

When baseUrl is provided, the embedder runs in remote mode and sends requests to the OpenAI-compatible /embeddings endpoint under that base URL.
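In remote mode, the work reduces to POST requests against `{baseUrl}/embeddings`. The sketch below shows the request shape defined by the OpenAI embeddings API (a `model` plus an `input` array, authenticated with a bearer token) and splits a large input into chunks of 2000, the per-request size documented for embedBatch. It only builds request objects and sends nothing. `buildEmbeddingRequests` and `EmbeddingRequest` are hypothetical names, not Optimbed exports, and Optimbed's own request code may differ.

```typescript
// Hypothetical helper (not an Optimbed export): builds, but does not send,
// OpenAI-compatible /embeddings requests, one per batch of inputs.
interface EmbeddingRequest {
  url: string;
  headers: Record<string, string>;
  body: { model: string; input: string[] };
}

function buildEmbeddingRequests(
  baseUrl: string,
  apiKey: string,
  model: string,
  texts: string[],
  batchSize = 2000, // mirrors the per-request batch size documented for embedBatch
  extraHeaders: Record<string, string> = {},
): EmbeddingRequest[] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  // Strip trailing slashes so "https://host/v1/" does not yield "//embeddings".
  const url = `${baseUrl.replace(/\/+$/, "")}/embeddings`;
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    Authorization: `Bearer ${apiKey}`,
    ...extraHeaders, // plays the role of the documented `headers` option
  };
  const requests: EmbeddingRequest[] = [];
  for (let start = 0; start < texts.length; start += batchSize) {
    requests.push({ url, headers, body: { model, input: texts.slice(start, start + batchSize) } });
  }
  return requests;
}

// 4500 inputs are split into three requests.
const inputs = Array.from({ length: 4500 }, (_, i) => `text ${i}`);
const reqs = buildEmbeddingRequests("https://api.openai.com/v1/", "sk-...", "text-embedding-3-small", inputs);
console.log(reqs.map((r) => r.body.input.length)); // [ 2000, 2000, 500 ]
```

To actually send a batch, each object could be passed to `fetch(req.url, { method: "POST", headers: req.headers, body: JSON.stringify(req.body) })`. Per the OpenAI embeddings API, the response's `data` array holds one `embedding` per input, each tagged with its `index`.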

Advanced Configuration

Local embedder

import { embedder } from "optimbed";

await embedder.init({
  model: "onnx-community/bge-m3-ONNX",
  timeoutMs: 60_000,
  maxChars: 20_000
});

Optimbed automatically selects appropriate query/document prefixes based on the model:

Qwen models: instruction-based prefixes for query-aware embeddings
BGE models: no prefixes (prefix-free)
Instructor models: "question:" and "text:" prefixes
Custom: override the defaults with the queryPrefix and documentPrefix options
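A model-name lookup of this kind could be sketched as follows. Everything here is an assumption for illustration: the function name `resolvePrefixes`, the substring matching, the no-prefix fallback for unknown models, and the Qwen instruction text. The Qwen3-Embedding model card formats queries as `Instruct: {task}\nQuery:{query}` and leaves documents unprefixed, but the exact string Optimbed uses is not documented here. The Instructor prefixes follow the list above.

```typescript
interface Prefixes {
  query: string;
  document: string;
}

// Assumption: task text taken from the Qwen3-Embedding model card example;
// Optimbed's built-in instruction may be worded differently.
const QWEN_TASK = "Given a web search query, retrieve relevant passages that answer the query";

// Hypothetical helper: picks prefixes from the model name, then applies overrides.
function resolvePrefixes(model: string, overrides: Partial<Prefixes> = {}): Prefixes {
  const id = model.toLowerCase();
  let defaults: Prefixes = { query: "", document: "" }; // BGE and unknown models: no prefixes
  if (id.includes("qwen")) {
    // Instruction on the query side only; documents are embedded as-is.
    defaults = { query: `Instruct: ${QWEN_TASK}\nQuery:`, document: "" };
  } else if (id.includes("instructor")) {
    // Trailing space is an assumption; the README only names "question:" and "text:".
    defaults = { query: "question: ", document: "text: " };
  }
  // Explicit queryPrefix / documentPrefix settings always win.
  return {
    query: overrides.query ?? defaults.query,
    document: overrides.document ?? defaults.document,
  };
}

const qwen = resolvePrefixes("onnx-community/Qwen3-Embedding-0.6B-ONNX");
const queryInput = qwen.query + "What is Node.js?"; // the string the model would actually see
const bge = resolvePrefixes("onnx-community/bge-m3-ONNX", { query: "Represent this query: " });
```

Keeping overrides as a final `??` step means a custom prefix replaces the model default without having to know which family the model belongs to.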

Optimbed no longer manages device/dtype selection. To control whether inference runs on CPU, GPU, or DirectML, configure the backend through transformers settings or your environment.

Remote embedder

import { embedder } from "optimbed";

await embedder.init({
  model: "text-embedding-3-large",
  baseUrl: "https://api.openai.com/v1",   // enables remote mode; swap in a self-hosted server or proxy URL
  apiKey: "sk-...",                         // or set OPENAI_API_KEY env var
  maxChars: 20_000,
  timeoutMs: 60_000,
  headers: { "X-Custom-Header": "value" }, // optional extra headers
  queryPrefix: "Represent this query: ",   // optional prefix override
  documentPrefix: "Represent this text: ",
});

The remote embedder targets any OpenAI-compatible /embeddings endpoint (e.g., self-hosted Ollama, vLLM, OpenAI proxies).

API

embedder.init(options?): Initializes the model and runtime. Pass baseUrl to enable remote mode.
embedder.embed(text): Returns a normalized embedding vector.
embedder.embedQuery(text): Embeds query text, prepending the query prefix, if any.
embedder.embedDoc(text): Embeds document text, prepending the document prefix, if any.
embedder.embedBatch(texts): Embeds an array of texts in batches of up to 2000 per request and returns EmbeddedResult[].
embedder.dispose(): Frees model and runtime resources (local mode only).
cosineSimilarity(a, b): Computes the cosine similarity of two equal-length vectors.
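As a standalone illustration of the math behind cosineSimilarity (not the library's source), the sketch below computes cosine similarity, L2-normalizes a vector, and ranks documents against a query, which is the usual retrieval step in a RAG pipeline. If both vectors are already L2-normalized, cosine similarity reduces to the plain dot product. `l2Normalize` and `rankBySimilarity` are hypothetical names, and the toy 3-dimensional vectors stand in for real embeddings.

```typescript
function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// cos(a, b) = (a . b) / (|a| * |b|); returns 0 when either vector is all zeros.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new RangeError("vectors must have equal length");
  const denom = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
  return denom === 0 ? 0 : dot(a, b) / denom;
}

// Scale a vector to unit length (hypothetical helper).
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(dot(v, v));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

// Score every document against the query, highest similarity first (hypothetical helper).
function rankBySimilarity(query: number[], docs: number[][]): { index: number; score: number }[] {
  return docs
    .map((doc, index) => ({ index, score: cosine(query, doc) }))
    .sort((x, y) => y.score - x.score);
}

const query = [1, 0, 0];
const docs = [[0, 1, 0], [0.9, 0.1, 0], [0.5, 0.5, 0]];
console.log(rankBySimilarity(query, docs).map((r) => r.index)); // [ 1, 2, 0 ]
```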

`EmbedderInitOptions`

| Option | Type | Description |
|---|---|---|
| model | string | Model name (default: "onnx-community/Qwen3-Embedding-0.6B-ONNX" for local, "text-embedding-3-small" for remote) |
| device | "cpu" \| "webgpu" \| "cuda" \| "directml" | Device for local mode |
| dtype | "fp32" \| "fp16" \| "q8" | Data type for local mode |
| maxChars | number | Max input characters (default: 20000) |
| timeoutMs | number | Operation timeout in ms (default: 60000) |
| queryPrefix | string | Prefix for query text |
| documentPrefix | string | Prefix for document text |
| baseUrl | string | Remote API base URL (enables remote mode) |
| apiKey | string | API key, or set OPENAI_API_KEY env var |
| headers | Record<string, string> | Extra HTTP headers for remote requests |

Safety and Reliability

Input size validation with configurable maxChars.
Operation timeout (timeoutMs) to bound long-running work.
Strict TypeScript API and linted codebase.
Proper resource cleanup with dispose() method (local embedder).
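The first two safeguards follow a common pattern that can be sketched in a few lines. This is a hedged illustration, not Optimbed's implementation: the hypothetical `assertWithinMaxChars` rejects oversized input up front, and the hypothetical `withTimeout` races an operation against a timer, clearing the timer afterwards so it does not keep the Node.js process alive. The defaults mirror the documented maxChars (20000) and timeoutMs (60000).

```typescript
// Reject oversized input before any expensive work starts.
// Note: String.length counts UTF-16 code units, not user-visible characters.
function assertWithinMaxChars(text: string, maxChars = 20_000): void {
  if (text.length > maxChars) {
    throw new RangeError(`input has ${text.length} characters; limit is ${maxChars}`);
  }
}

// Reject if `work` has not settled within timeoutMs.
async function withTimeout<T>(work: Promise<T>, timeoutMs = 60_000): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`operation timed out after ${timeoutMs} ms`)), timeoutMs);
  });
  try {
    // Whichever settles first wins; the losing operation is not cancelled.
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // always release the timer
  }
}

// A never-settling promise stands in for a stuck inference call.
withTimeout(new Promise<string>(() => {}), 50).catch((err: Error) => console.error(err.message)); // operation timed out after 50 ms
```

A timeout bounds how long the caller waits, but the underlying work keeps running unless it supports cancellation itself.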

Cleanup & Lifecycle

For proper resource management in long-running applications or tests, call dispose() before process exit (local embedder):

import { embedder } from "optimbed";

await embedder.init();
const result = await embedder.embed("text");
await embedder.dispose(); // Free model and runtime resources

For Node.js servers, add a graceful shutdown handler:

process.on("SIGTERM", async () => {
  await embedder.dispose();
  process.exit(0);
});
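The handler above runs dispose() again if both SIGINT (Ctrl+C) and SIGTERM arrive, and a rejected dispose() surfaces as an unhandled rejection rather than a clean exit code. A slightly more defensive sketch follows, assuming only that the target exposes an async dispose() as the local embedder does. `installShutdownHandlers` and `ShutdownTarget` are hypothetical names, and `exit` is injectable so the logic can be exercised without ending the process.

```typescript
interface ShutdownTarget {
  dispose(): Promise<void>;
}

function installShutdownHandlers(
  target: ShutdownTarget,
  exit: (code: number) => void = (code) => process.exit(code),
): (signal: string) => Promise<void> {
  let shuttingDown = false;
  const handler = async (signal: string): Promise<void> => {
    if (shuttingDown) return; // ignore repeated signals while cleanup runs
    shuttingDown = true;
    try {
      await target.dispose();
      exit(0);
    } catch (err) {
      console.error(`cleanup after ${signal} failed:`, err);
      exit(1); // report failure instead of leaving an unhandled rejection
    }
  };
  for (const signal of ["SIGINT", "SIGTERM"] as const) {
    process.once(signal, () => void handler(signal));
  }
  return handler;
}

// Usage with the embedder from the examples above:
// installShutdownHandlers(embedder);
```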

Scripts

npm run validate
npm run build
npm run demo
npm run demo:remote
npm run audit

Real Usage Demo

After building the project, run the end-to-end demo program:

npm run demo -- "What is Node.js?" "Node.js is a JavaScript runtime built on Chrome's V8 engine."

For remote embedding:

npm run demo:remote -- "What is Node.js?" "Node.js is a JavaScript runtime built on Chrome's V8 engine."

License

MIT