optimbed
v0.2.6
Production-grade local embeddings library for Node.js using ONNX-backed Hugging Face models.
Optimbed
Optimbed is a lightweight Node.js library for generating text embeddings. It supports two modes: local ONNX Runtime models (downloaded from Hugging Face) and remote OpenAI-compatible embedding APIs. Local mode requires no cloud services, and remote mode needs only minimal setup. The default local model is onnx-community/Qwen3-Embedding-0.6B-ONNX.
Install
npm install optimbed
Quick Start: Local (ONNX)
import { embedder } from "optimbed";
await embedder.init();
// Note: Optimbed does not force a specific device or dtype. Configure
// runtime/backends via transformers or your environment if needed.
const query = await embedder.embedQuery("What is Node.js?");
const doc = await embedder.embedDoc("Node.js is a JavaScript runtime built on Chrome's V8 engine.");
Quick Start: Remote (OpenAI-compatible)
import { embedder } from "optimbed";
await embedder.init({
  apiKey: "sk-...", // or set OPENAI_API_KEY env var
  model: "text-embedding-3-small",
  baseUrl: "https://api.openai.com/v1",
});
const query = await embedder.embedQuery("What is Node.js?");
const doc = await embedder.embedDoc("Node.js is a JavaScript runtime built on Chrome's V8 engine.");
// Batch embedding for higher throughput
const results = await embedder.embedBatch(["first text", "second text", "third text"]);
When baseUrl is provided, the embedder automatically uses remote mode and calls the OpenAI-compatible /embeddings endpoint.
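For reference, the request that an OpenAI-compatible /embeddings endpoint expects can be sketched as below. This illustrates the wire format only and is not Optimbed's internal code: buildEmbeddingsRequest is a hypothetical helper name, and its option names simply mirror those passed to embedder.init() above.

```javascript
// Hypothetical helper (not an Optimbed export): builds the URL and fetch()
// options for a POST to an OpenAI-compatible /embeddings endpoint.
function buildEmbeddingsRequest(options, inputs) {
  // Trim trailing slashes so "https://host/v1/" and "https://host/v1" behave the same.
  const base = options.baseUrl.replace(/\/+$/, "");
  return {
    url: `${base}/embeddings`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${options.apiKey}`,
        ...options.headers, // optional extra headers; spreading undefined is a no-op
      },
      // The endpoint accepts a single string or an array of strings as "input".
      body: JSON.stringify({ model: options.model, input: inputs }),
    },
  };
}

const request = buildEmbeddingsRequest(
  { baseUrl: "https://api.openai.com/v1/", apiKey: "sk-...", model: "text-embedding-3-small" },
  ["What is Node.js?"],
);
console.log(request.url); // https://api.openai.com/v1/embeddings
```

The resulting pair can be passed to fetch(request.url, request.init) in Node.js 18 or later.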
Advanced Configuration
Local embedder
import { embedder } from "optimbed";
await embedder.init({
  model: "onnx-community/bge-m3-ONNX",
  timeoutMs: 60_000,
  maxChars: 20_000
});
Optimbed automatically selects appropriate query/document prefixes based on the model:
- Qwen models: Uses instruction-based prefixes for query-aware embeddings
- BGE models: Prefix-free; no prefixes are added, as these models distinguish queries from documents by structure
- Instructor models: Uses "question:" and "text:" prefixes
- Custom: Override with the queryPrefix and documentPrefix options
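The selection rules in this list could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the library's actual logic: selectPrefixes is a hypothetical name, matching on the model name is a guess, and the Qwen instruction text is a placeholder.

```javascript
// Hypothetical sketch of model-based prefix selection. The name matching and
// the Qwen instruction wording are assumptions made for illustration.
function selectPrefixes(model, overrides = {}) {
  const name = model.toLowerCase();
  // BGE and unrecognized models: no prefixes.
  let defaults = { queryPrefix: "", documentPrefix: "" };
  if (name.includes("qwen")) {
    // Instruction-style query prefix; documents are embedded as-is.
    defaults = {
      queryPrefix: "Instruct: Given a query, retrieve relevant passages\nQuery: ",
      documentPrefix: "",
    };
  } else if (name.includes("instructor")) {
    defaults = { queryPrefix: "question: ", documentPrefix: "text: " };
  }
  // Explicit options always win over model-based defaults.
  return {
    queryPrefix: overrides.queryPrefix ?? defaults.queryPrefix,
    documentPrefix: overrides.documentPrefix ?? defaults.documentPrefix,
  };
}

const bge = selectPrefixes("onnx-community/bge-m3-ONNX");
const custom = selectPrefixes("onnx-community/bge-m3-ONNX", {
  queryPrefix: "Represent this query: ",
});
console.log(JSON.stringify(bge.queryPrefix)); // ""
console.log(custom.queryPrefix + "What is Node.js?"); // Represent this query: What is Node.js?
```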
Optimbed no longer manages device/dtype selection. Configure preferred backends via transformers settings or your environment if you need to control CPU/GPU/DML choices.
Remote embedder
import { embedder } from "optimbed";
await embedder.init({
  model: "text-embedding-3-large",
  baseUrl: "https://api.openai.com/v1", // default; override for self-hosted proxies
  apiKey: "sk-...", // or set OPENAI_API_KEY env var
  maxChars: 20_000,
  timeoutMs: 60_000,
  headers: { "X-Custom-Header": "value" }, // optional extra headers
  queryPrefix: "Represent this query: ", // optional prefix override
  documentPrefix: "Represent this text: ",
});
The remote embedder targets any OpenAI-compatible /embeddings endpoint (e.g., self-hosted Ollama, vLLM, OpenAI proxies).
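An OpenAI-compatible /embeddings response carries one entry per input in a data array, each with an index and an embedding. A minimal sketch of extracting the vectors in input order follows; parseEmbeddingsResponse is a hypothetical helper, not an Optimbed export, and the sample values are made up.

```javascript
// Hypothetical helper: extracts embedding vectors from an OpenAI-compatible
// /embeddings response body, ordered to match the original inputs.
function parseEmbeddingsResponse(json) {
  if (!json || !Array.isArray(json.data)) {
    throw new Error("Unexpected response: missing data array");
  }
  // Sort a copy by index so the vectors line up with the input order.
  return [...json.data]
    .sort((a, b) => a.index - b.index)
    .map((item) => item.embedding);
}

// Example payload in the response shape described above (values are made up).
const sample = {
  object: "list",
  data: [
    { object: "embedding", index: 1, embedding: [0, 1] },
    { object: "embedding", index: 0, embedding: [1, 0] },
  ],
  model: "text-embedding-3-small",
};
const vectors = parseEmbeddingsResponse(sample);
console.log(vectors); // [ [ 1, 0 ], [ 0, 1 ] ]
```

Sorting by index is a defensive choice: it keeps results aligned with inputs even if a server returns entries out of order.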
API
- embedder.init(options?): Initializes model and runtime. Pass baseUrl for remote mode.
- embedder.embed(text): Returns normalized embedding vector.
- embedder.embedQuery(text): Embeds query text with optional query prefix.
- embedder.embedDoc(text): Embeds document text with optional document prefix.
- embedder.embedBatch(texts): Embeds an array of texts in batches (2000 per request), returning EmbeddedResult[].
- embedder.dispose(): Frees model and runtime resources (local mode only).
- cosineSimilarity(a, b): Computes cosine similarity for equal-length vectors.
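Two of the operations listed above, batching and cosine similarity, are simple enough to sketch standalone. The code below is an illustrative reimplementation, not the library's source; the names chunk and cosine are hypothetical, and returning 0 for zero-length vectors is a convention chosen here.

```javascript
// Split inputs into request-sized batches (2000 matches the batch size noted above).
function chunk(items, size = 2000) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Requires equal-length vectors.
function cosine(a, b) {
  if (a.length !== b.length) throw new Error("Vectors must have equal length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // convention chosen here for zero vectors
  return dot / Math.sqrt(normA * normB);
}

console.log(chunk(new Array(4500).fill("text")).map((batch) => batch.length)); // [ 2000, 2000, 500 ]
console.log(cosine([1, 0], [0, 1])); // 0
console.log(cosine([3, 4], [3, 4])); // 1
```

For vectors that are already normalized to unit length, as embed() is described as returning, the cosine similarity reduces to the dot product.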
EmbedderInitOptions
| Option | Type | Description |
|---|---|---|
| model | string | Model name (default: "onnx-community/Qwen3-Embedding-0.6B-ONNX" for local, "text-embedding-3-small" for remote) |
| device | "cpu" \| "webgpu" \| "cuda" \| "directml" | Device for local mode |
| dtype | "fp32" \| "fp16" \| "q8" | Data type for local mode |
| maxChars | number | Max input characters (default: 20000) |
| timeoutMs | number | Operation timeout in ms (default: 60000) |
| queryPrefix | string | Prefix for query text |
| documentPrefix | string | Prefix for document text |
| baseUrl | string | Remote API base URL (enables remote mode) |
| apiKey | string | API key, or set OPENAI_API_KEY env var |
| headers | Record<string, string> | Extra HTTP headers for remote requests |
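The defaults in the table could be applied along these lines. This is a sketch under assumptions rather than Optimbed's code: resolveOptions and the mode field are hypothetical names introduced for illustration, while the default values are the ones the table lists.

```javascript
// Hypothetical sketch of resolving init options against the documented defaults.
const DEFAULT_LOCAL_MODEL = "onnx-community/Qwen3-Embedding-0.6B-ONNX";
const DEFAULT_REMOTE_MODEL = "text-embedding-3-small";

function resolveOptions(options = {}, env = process.env) {
  const remote = Boolean(options.baseUrl); // baseUrl enables remote mode
  return {
    ...options,
    mode: remote ? "remote" : "local", // illustrative field, not a documented option
    model: options.model ?? (remote ? DEFAULT_REMOTE_MODEL : DEFAULT_LOCAL_MODEL),
    maxChars: options.maxChars ?? 20000,
    timeoutMs: options.timeoutMs ?? 60000,
    // An explicit apiKey wins; otherwise fall back to the environment variable.
    apiKey: remote ? (options.apiKey ?? env.OPENAI_API_KEY) : undefined,
  };
}

const local = resolveOptions();
const remote = resolveOptions(
  { baseUrl: "https://api.openai.com/v1" },
  { OPENAI_API_KEY: "sk-..." }, // stand-in for process.env
);
console.log(local.mode, local.model); // local onnx-community/Qwen3-Embedding-0.6B-ONNX
console.log(remote.mode, remote.model); // remote text-embedding-3-small
```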
Safety and Reliability
- Input size validation with configurable maxChars.
- Operation timeout (timeoutMs) to bound long-running work.
- Strict TypeScript API and linted codebase.
- Proper resource cleanup with dispose() method (local embedder).
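The first two safeguards follow a common pattern that can be sketched generically. The helpers below are hypothetical (assertWithinLimit and withTimeout are not Optimbed exports) and only illustrate the idea of a maxChars check and a timeoutMs bound.

```javascript
// Reject non-string or oversized input before any model or network work starts.
// Note: String.length counts UTF-16 code units, not user-perceived characters.
function assertWithinLimit(text, maxChars = 20000) {
  if (typeof text !== "string") throw new TypeError("Input must be a string");
  if (text.length > maxChars) {
    throw new RangeError(`Input has ${text.length} characters; limit is ${maxChars}`);
  }
  return text;
}

// Race the work against a timer, and always clear the timer afterwards so it
// cannot keep the process alive.
function withTimeout(promise, timeoutMs = 60000) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${timeoutMs} ms`)), timeoutMs);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Work that finishes in time resolves normally.
withTimeout(new Promise((resolve) => setTimeout(() => resolve("done"), 10)), 1000)
  .then((value) => console.log(value)); // done

// Work that never settles is rejected once the timer fires.
withTimeout(new Promise(() => {}), 20)
  .catch((error) => console.log(error.message)); // Timed out after 20 ms
```

A race of this kind bounds how long the caller waits, but it does not cancel the underlying work; cancelling a network request would additionally need something like an AbortController.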
Cleanup & Lifecycle
For proper resource management in long-running applications or tests, call dispose() before process exit (local embedder):
import { embedder } from "optimbed";
await embedder.init();
const result = await embedder.embed("text");
await embedder.dispose(); // Free model and runtime resources
For Node.js servers, add a graceful shutdown handler:
process.on("SIGTERM", async () => {
  await embedder.dispose();
  process.exit(0);
});
Scripts
npm run validate
npm run build
npm run demo
npm run demo:remote
npm run audit
Real Usage Demo
After building the project, run the end-to-end demo program:
npm run demo -- "What is Node.js?" "Node.js is a JavaScript runtime built on Chrome's V8 engine."
For remote embedding:
npm run demo:remote -- "What is Node.js?" "Node.js is a JavaScript runtime built on Chrome's V8 engine."
License
MIT
