qwen-embedder
v0.1.5
Lightweight, optimized local text embedding generation using a Qwen model
Local text embedding for Node.js. Runs a quantized Qwen ONNX model via Transformers.js — no Python, no external services.
import { Embedder } from 'qwen-embedder'
const embedder = await Embedder.create()
const vector = await embedder.embed('hello world')
// => number[1024]
const batch = await embedder.embed(['cat', 'dog', 'fish'])
// => number[3][1024]
Installation
npm install qwen-embedder
Requires Node.js 20+ (ESM only).
Usage
import { Embedder } from 'qwen-embedder'
// First call downloads the model (~600MB q8) to
// ~/.qwenembedder/.cache/models/ — subsequent runs use cache
const embedder = await Embedder.create()
// Single string → number[]
const vec = await embedder.embed('your text here')
console.log(vec.length) // 1024
// Batch → number[][]
const vecs = await embedder.embed(['first', 'second', 'third'])
// Repeated input hits the in-memory cache (FIFO, default 100 entries)
const again = await embedder.embed('your text here')
// same values as vec, returned from the cache without running inference
Sync API
For environments where async inference is impractical (e.g., some bundlers, scripts), Embedder.createSync() returns a SyncEmbedder that runs the model on a background thread and blocks the calling thread via shared memory:
import { Embedder } from 'qwen-embedder'
const embedder = Embedder.createSync()
const vec = embedder.embedSync('hello world')
// => number[1024]
const batch = embedder.embedSync(['cat', 'dog', 'fish'])
// => number[3][1024]
The sync variant shares the same model cache (~/.qwenembedder/.cache/models/), result cache semantics, and API shape. Only the method name changes, from embed to embedSync.
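The blocking behavior described above can be illustrated with a minimal sketch of one common pattern: a worker thread plus a SharedArrayBuffer flag that the caller waits on with Atomics.wait. This is an assumption about how such a wrapper can be built, not a description of qwen-embedder's internals. The names workerSource and callSync are hypothetical, and the "inference" is a placeholder that returns the input length.

```typescript
import { MessageChannel, Worker, receiveMessageOnPort } from 'node:worker_threads'

// Worker body as plain JavaScript. The "inference" is a placeholder that
// returns the input length; a real worker would run the model here.
const workerSource = `
const { parentPort } = require('node:worker_threads')
parentPort.on('message', ({ text, signal, port }) => {
  port.postMessage([text.length])   // queue the result first
  const flag = new Int32Array(signal)
  Atomics.store(flag, 0, 1)         // then mark the call as finished
  Atomics.notify(flag, 0)           // and wake the blocked caller
})
`

const worker = new Worker(workerSource, { eval: true })
worker.unref() // do not keep the process alive just for this worker

// Hypothetical helper: send one request and block until the worker answers.
function callSync(text: string, timeoutMs = 10_000): number[] {
  const signal = new SharedArrayBuffer(4)
  const flag = new Int32Array(signal)
  const { port1, port2 } = new MessageChannel()
  worker.postMessage({ text, signal, port: port2 }, [port2])

  // Block this thread until the worker changes flag[0] away from 0.
  if (Atomics.wait(flag, 0, 0, timeoutMs) === 'timed-out') {
    port1.close()
    throw new Error('worker did not respond in time')
  }

  // The result was queued before the flag was set, so it can be read synchronously.
  const received = receiveMessageOnPort(port1)
  port1.close()
  if (!received) throw new Error('worker signalled completion without a result')
  return received.message as number[]
}

console.log(callSync('hello world'))
```

Because the calling thread is blocked for the whole call, timers and I/O callbacks on that thread do not run until the result arrives. That trade-off is why a pattern like this suits scripts better than servers.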
A SyncEmbedder can also be constructed directly, for example with a smaller result cache:
import { SyncEmbedder } from 'qwen-embedder'
const embedder = new SyncEmbedder({ cacheSize: 50 })
const vec = embedder.embedSync('hello')
Options
interface EmbedderOptions {
queue?: boolean // serialize inference calls (concurrency: 1)
concurrency?: number // set a specific concurrency limit
cacheSize?: number // in-memory result cache size (default 100, 0 to disable)
}
// Serial execution: safe for memory-constrained environments
const serial = await Embedder.create({ queue: true })
// Limited parallelism
const limited = await Embedder.create({ concurrency: 2 })
// No result caching
const uncached = await Embedder.create({ cacheSize: 0 })
Model
- Model: onnx-community/Qwen3-Embedding-0.6B-ONNX
- Quantization: q8
- Pipeline: feature-extraction with mean pooling + L2 normalization
- Output dimension: 1024
- Cache: ~/.qwenembedder/.cache/models/ (auto-downloaded on first use)
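To make the pooling and normalization step in the list above concrete, here is a minimal sketch in plain TypeScript. It is a simplification for illustration, not the package's code: the helper names meanPool, l2Normalize, and dot are hypothetical, and real pipelines typically also use the attention mask to skip padding tokens, which this sketch omits.

```typescript
// Average per-token vectors into one sentence vector.
// tokenVectors has one row per token; all rows share the same dimension.
function meanPool(tokenVectors: number[][]): number[] {
  if (tokenVectors.length === 0) throw new Error('need at least one token vector')
  const dim = tokenVectors[0].length
  const sum = new Array<number>(dim).fill(0)
  for (const row of tokenVectors) {
    for (let i = 0; i < dim; i++) sum[i] += row[i]
  }
  return sum.map((x) => x / tokenVectors.length)
}

// Scale a vector to unit length (L2 norm of 1). A zero vector is returned unchanged.
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((acc, x) => acc + x * x, 0))
  return norm === 0 ? v.slice() : v.map((x) => x / norm)
}

// Dot product of two equal-length vectors.
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('vectors must have the same length')
  let total = 0
  for (let i = 0; i < a.length; i++) total += a[i] * b[i]
  return total
}

// Toy input: two "tokens" with two dimensions each.
const sentence = l2Normalize(meanPool([[3, 0], [3, 8]]))
console.log(sentence, dot(sentence, sentence))
```

Since the output vectors are L2-normalized, the dot product of two embeddings equals their cosine similarity, so comparing results from embed() needs no extra normalization step.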
License
MIT
