hf-embedder
v0.2.1
A local text embedding library for Node.js using HuggingFace ONNX models via Transformers.js
Local text embedding for Node.js. Runs a HuggingFace ONNX model via Transformers.js — no Python, no external services.
import { Embedder } from 'hf-embedder'
const embedder = await Embedder.create({ model: 'Xenova/multilingual-e5-small' })
const vector = await embedder.embed('hello world')
// => number[384]
const batch = await embedder.embed(['cat', 'dog', 'fish'])
// => number[][] with shape [3][384]
Installation
npm install hf-embedder
Requires Node.js 20+ (ESM only).
Usage
import { Embedder } from 'hf-embedder'
// Pick any HuggingFace ONNX embedding model
const embedder = await Embedder.create({ model: 'Xenova/multilingual-e5-small' })
// Single string → number[]
const vec = await embedder.embed('your text here')
// Batch → number[][]
const vecs = await embedder.embed(['first', 'second', 'third'])
// Repeated input hits the in-memory cache (FIFO, default 100 entries)
const again = await embedder.embed('your text here')
// same values as vec, returned instantly without inference
Sync API
For environments where async inference is impractical (e.g., some bundlers, scripts), Embedder.createSync() returns a SyncEmbedder that runs the model on a background thread and blocks the calling thread via shared memory:
import { Embedder } from 'hf-embedder'
const embedder = Embedder.createSync({ model: 'Xenova/multilingual-e5-small' })
const vec = embedder.embedSync('hello world')
// => number[384]
const batch = embedder.embedSync(['cat', 'dog', 'fish'])
// => number[][] with shape [3][384]
The sync variant shares the same model cache (~/.hfembedder/.cache/models/), result cache semantics, and API shape; only the method name changes from embed to embedSync.
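The blocking behaviour described for the Sync API (a background thread plus shared memory) can be illustrated with Node's built-in worker_threads and Atomics. This is a minimal sketch of the general technique, not hf-embedder's actual implementation: `embedSyncSketch`, the inline worker source, and the fake 4-number "embedding" are all hypothetical stand-ins for real model inference.

```typescript
import { Worker } from 'node:worker_threads'

// Hypothetical worker body, run as CommonJS via `eval: true`.
// It stands in for model inference by writing a fake vector into shared memory.
const workerSource = `
const { workerData } = require('node:worker_threads')
const flag = new Int32Array(workerData.flagBuffer)
const out = new Float64Array(workerData.outBuffer)
for (let i = 0; i < out.length; i++) out[i] = workerData.text.length + i
Atomics.store(flag, 0, 1) // mark the result as ready
Atomics.notify(flag, 0)   // wake the blocked caller
`

// Hypothetical helper: blocks the calling thread until the worker has
// written its result into the shared output buffer.
function embedSyncSketch(text: string, dim = 4): number[] {
  const flagBuffer = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT)
  const outBuffer = new SharedArrayBuffer(dim * Float64Array.BYTES_PER_ELEMENT)
  const flag = new Int32Array(flagBuffer)

  const worker = new Worker(workerSource, {
    eval: true,
    workerData: { flagBuffer, outBuffer, text },
  })

  // Sleep while flag[0] is still 0, for at most 10 seconds.
  const status = Atomics.wait(flag, 0, 0, 10_000)
  void worker.terminate()
  if (status === 'timed-out') throw new Error('worker did not respond in time')

  return Array.from(new Float64Array(outBuffer))
}

const sketchVec = embedSyncSketch('hello')
console.log(sketchVec) // [5, 6, 7, 8]
```

A real implementation would presumably keep one long-lived worker with the model loaded rather than starting a thread per call; the sketch starts one each time only to stay short.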
// SyncEmbedder can also be constructed directly
import { SyncEmbedder } from 'hf-embedder'
const embedder = new SyncEmbedder({ model: 'Xenova/multilingual-e5-small', cacheSize: 50 })
const vec = embedder.embedSync('hello')
Options
interface EmbedderOptions {
  model?: string                   // HF model ID (default: 'onnx-community/Qwen3-Embedding-0.6B-ONNX')
  dtype?: string                   // quantization (default: 'q8')
  device?: string                  // execution device, e.g. 'cpu', 'cuda', 'wasm'
  pooling?: 'mean' | 'last_token'  // pooling strategy (default: 'mean')
  normalize?: boolean              // L2 normalize output (default: true)
  queue?: boolean                  // serialize inference calls (concurrency: 1)
  concurrency?: number             // set a specific concurrency limit
  cacheSize?: number               // in-memory result cache size (default 100, 0 to disable)
}
// Custom model
const custom = await Embedder.create({ model: 'other-org/my-embedding-model' })
// Use GPU
const gpu = await Embedder.create({ device: 'cuda' })
// Serial execution: safe for memory-constrained environments
const serial = await Embedder.create({ queue: true })
// Limited parallelism
const limited = await Embedder.create({ concurrency: 2 })
// No result caching
const uncached = await Embedder.create({ cacheSize: 0 })
Default Model
- Model: onnx-community/Qwen3-Embedding-0.6B-ONNX
- Quantization: q8
- Pipeline: feature-extraction with mean pooling + L2 normalization
- Output dimension: 1024
- Cache: ~/.hfembedder/.cache/models/ (auto-downloaded on first use)
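To make the "mean pooling + L2 normalization" step in the list above concrete, here is a small sketch of the arithmetic on plain arrays. The function names `meanPool` and `l2Normalize` and the toy 2-dimensional token vectors are assumptions made for illustration; in practice the pipeline performs this step on model tensors.

```typescript
// Average the per-token vectors into one vector, skipping padding tokens
// (those whose attention mask entry is 0).
function meanPool(tokenVectors: number[][], attentionMask: number[]): number[] {
  const dim = tokenVectors[0].length
  const sum = new Array<number>(dim).fill(0)
  let count = 0
  tokenVectors.forEach((vec, t) => {
    if (attentionMask[t] === 0) return // padding token: ignore
    for (let i = 0; i < dim; i++) sum[i] += vec[i]
    count++
  })
  return sum.map((x) => x / Math.max(count, 1))
}

// Scale a vector to unit length (L2 norm of 1); a zero vector is returned as-is.
function l2Normalize(vec: number[]): number[] {
  const norm = Math.hypot(...vec)
  return norm === 0 ? vec.slice() : vec.map((x) => x / norm)
}

// Three token vectors, the last one being padding.
const tokenVectors = [[1, 2], [5, 6], [9, 9]]
const attentionMask = [1, 1, 0]

const pooled = meanPool(tokenVectors, attentionMask) // [3, 4]
const unit = l2Normalize(pooled)                     // approximately [0.6, 0.8]
console.log(pooled, unit)
```

With `pooling: 'last_token'` the averaging step would presumably be replaced by taking the final non-padding token's vector; the normalization step is unchanged.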
For a smaller alternative, use Xenova/multilingual-e5-small (384-dim, ~90MB).
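Whichever model is used, vectors produced with the default `normalize: true` have unit length, so the cosine similarity of two of them reduces to a plain dot product. The following is a minimal sketch of ranking candidates against a query on that basis; `dot` and `rankBySimilarity` are hypothetical helpers, not part of hf-embedder, and the 2-dimensional vectors stand in for real embeddings.

```typescript
// Dot product of two equal-length vectors.
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch')
  let sum = 0
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i]
  return sum
}

// Return candidate indices ordered from most to least similar to the query.
// Assumes every vector is already L2-normalized.
function rankBySimilarity(query: number[], candidates: number[][]): number[] {
  return candidates
    .map((candidate, index) => ({ index, score: dot(query, candidate) }))
    .sort((x, y) => y.score - x.score)
    .map((entry) => entry.index)
}

// Toy unit vectors in place of real embeddings.
const queryVec = [1, 0]
const candidateVecs = [[0, 1], [0.6, 0.8], [1, 0]]
console.log(rankBySimilarity(queryVec, candidateVecs)) // [2, 1, 0]
```

Vectors from different models, or of different dimensions (for example 1024 versus 384), are not comparable this way; `dot` throws on a length mismatch to surface that early.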
License
MIT
