
@jsilvanus/embedeer

v1.4.0

A Node.js embedding tool with optional GPU acceleration

embedeer

[Embedeer logo: a deer with vector numbers between its antlers. Generated by ChatGPT. Public Domain.]

A Node.js Embedding Tool

A Node.js tool for generating text embeddings using models from Hugging Face.

Supports batched input, parallel execution, isolated child-process workers (default) or in-process threads, quantization, optional GPU acceleration, and Hugging Face auth.


Features

  • Downloads any Hugging Face feature-extraction model on first use (cached in ~/.embedeer/models)
  • Isolated processes (default) — a worker crash cannot bring down the caller
  • In-process threads — opt-in via mode: 'thread' for lower overhead
  • Sequential execution when concurrency: 1
  • Configurable batch size and concurrency
  • GPU acceleration — optional CUDA (Linux x64) and DirectML (Windows x64), no extra packages needed
  • Hugging Face API token support (--token / HF_TOKEN env var)
  • Quantization via dtype (fp32 · fp16 · q8 · q4 · q4f16 · auto)
  • Rich CLI: pull model, embed from file, dump output as JSON / TXT / SQL

Installation

npm install @jsilvanus/embedeer

GPU acceleration (CUDA on Linux x64, DirectML on Windows x64) is built into onnxruntime-node which ships as a transitive dependency. No additional packages are required.

For CUDA on Linux x64 you also need the CUDA 12 system libraries:

# Ubuntu / Debian
sudo apt install cuda-toolkit-12-6 libcudnn9-cuda-12

Programmatic API

Model management

Embedeer supports pre-caching and managing downloaded models.

  • Pull (pre-cache) a model via the CLI:
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2
  • Programmatic pre-cache using loadModel():
import { loadModel } from '@jsilvanus/embedeer';

const { modelName, cacheDir } = await loadModel('Xenova/all-MiniLM-L6-v2', {
  token: 'hf_...',    // optional HF token
  dtype: 'q8',        // optional quantization
  cacheDir: '/my/cache', // optional override
});
  • Cache location: default is ~/.embedeer/models. Override with the CLI --cache-dir option or the cacheDir argument to loadModel().

  • Removing cached models: delete the model directory from the cache. Example:

# Unix
rm -rf ~/.embedeer/models/Xenova-all-MiniLM-L6-v2

# PowerShell (Windows)
Remove-Item -Recurse -Force $env:USERPROFILE\.embedeer\models\Xenova-all-MiniLM-L6-v2
  • Advanced: see src/model-management.js for low-level cache helpers.

Explainer — deterministic LLM interface

This feature was deprecated in v1.3.0 and moved to the npm package @jsilvanus/chattydeer.

Embed texts (CPU — default)

import { Embedder } from '@jsilvanus/embedeer';

const embedder = await Embedder.create('Xenova/all-MiniLM-L6-v2', {
  batchSize:   32,          // texts per worker task   (default: 32)
  concurrency: 2,           // parallel workers        (default: 2)
  mode:       'process',    // 'process' | 'thread'    (default: 'process')
  pooling:    'mean',       // 'mean' | 'cls' | 'none' (default: 'mean')
  normalize:   true,        // L2-normalise vectors    (default: true)
  token:      'hf_...',     // HF API token (optional; also reads HF_TOKEN env)
  dtype:      'q8',         // quantization dtype      (optional)
  cacheDir:   '/my/cache',  // override model cache    (default: ~/.embedeer/models)
});

const vectors = await embedder.embed(['Hello world', 'Foo bar baz']);
// → number[][]  (one 384-dim vector per text for all-MiniLM-L6-v2)

await embedder.destroy(); // shut down worker processes
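The returned vectors are plain number[] arrays, so downstream math needs no extra dependencies. As an illustrative sketch (the cosine helper below is not part of embedeer's API), comparing two returned embeddings:

```javascript
// Cosine similarity between two embedding vectors.
// With normalize: true (embedeer's default) vectors are already L2-normalised,
// so the dot product alone equals the cosine similarity; the full formula
// below works either way.
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// e.g. rank documents against a query:
// const [query, ...docs] = await embedder.embed(['query', 'doc a', 'doc b']);
// const scores = docs.map(v => cosine(query, v));
```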

TypeScript example

The package includes TypeScript declarations so imports are typed automatically.

import { Embedder } from '@jsilvanus/embedeer';

async function main() {
  const embedder = await Embedder.create('Xenova/all-MiniLM-L6-v2', { batchSize: 32, concurrency: 2 });
  const vectors = await embedder.embed(['Hello world', 'Foo bar baz']);
  // vectors: number[][]
  await embedder.destroy();
}

main().catch(console.error);

Programmatic profile generation (optional)

You can generate and save a per-user performance profile, which Embedder.create() will automatically apply. This is useful for picking the best batchSize / concurrency for your machine without manual tuning.

import { Embedder } from '@jsilvanus/embedeer';

// Quick profile generation (writes ~/.embedeer/perf-profile.json)
await Embedder.generateAndSaveProfile({ mode: 'quick', device: 'cpu', sampleSize: 100 });
// Subsequent calls to Embedder.create() will auto-apply the saved profile by default.

Embed texts with GPU

import { Embedder } from '@jsilvanus/embedeer';

// Auto-detect GPU (falls back to CPU if no provider is installed)
const autoEmbedder = await Embedder.create('Xenova/all-MiniLM-L6-v2', {
  device: 'auto',
});

// Require GPU (throws if no provider is available)
const gpuEmbedder = await Embedder.create('Xenova/all-MiniLM-L6-v2', {
  device: 'gpu',
});

// Explicitly select an execution provider
const cudaEmbedder = await Embedder.create('Xenova/all-MiniLM-L6-v2', {
  provider: 'cuda',  // 'cuda' | 'dml'
});

Pull (pre-cache) a model

Like ollama pull — downloads the model once so workers start instantly:

import { loadModel } from '@jsilvanus/embedeer';

const { modelName, cacheDir } = await loadModel('Xenova/all-MiniLM-L6-v2', {
  token: 'hf_...',   // optional
  dtype: 'q8',       // optional
});

CLI

npx @jsilvanus/embedeer [options]

Model management (pull / cache model):
  npx @jsilvanus/embedeer --model <name>

Embed texts (batch):
  npx @jsilvanus/embedeer --model <name> --data "text1" "text2" ...
  npx @jsilvanus/embedeer --model <name> --data '["text1","text2"]'
  npx @jsilvanus/embedeer --model <name> --file texts.txt
  echo '["t1","t2"]' | npx @jsilvanus/embedeer --model <name>
  printf 'a\0b\0c' | npx @jsilvanus/embedeer --model <name> --delimiter '\0'

Interactive / streaming line-reader:
  npx @jsilvanus/embedeer --model <name> --interactive --dump out.jsonl
  cat big.txt | npx @jsilvanus/embedeer --model <name> -i --output csv --dump out.csv

Options:
  -m, --model <name>           Hugging Face model (default: Xenova/all-MiniLM-L6-v2)
  -d, --data <text...>         Text(s) or JSON array to embed
      --file <path>            Input file: JSON array or delimited texts
  -D, --delimiter <str>        Record separator for stdin/file (default: \n)
                               Escape sequences supported: \0 \n \t \r
  -i, --interactive            Interactive line-reader (see below)
      --dump <path>            Write output to file instead of stdout
      --output <format>        Output: json|jsonl|csv|txt|sql (default: json)
      --with-text              Include source text alongside each embedding
  -b, --batch-size <n>         Texts per worker batch (default: 32)
  -c, --concurrency <n>        Parallel workers (default: 2)
      --mode process|thread    Worker mode (default: process)
  -p, --pooling <mode>         mean|cls|none (default: mean)
      --no-normalize           Disable L2 normalisation
      --dtype <type>           Quantization: fp32|fp16|q8|q4|q4f16|auto
      --token <tok>            Hugging Face API token (or set HF_TOKEN env)
      --cache-dir <path>       Model cache directory (default: ~/.embedeer/models)
      --device <mode>          Compute device: auto|cpu|gpu (default: cpu)
      --provider <name>        Execution provider override: cpu|cuda|dml
  -h, --help                   Show this help

Input Sources

Texts can be provided in any of these ways (checked in order):

| Source | How |
|--------|-----|
| Inline args | --data "text1" "text2" "text3" |
| Inline JSON | --data '["text1","text2"]' |
| File | --file texts.txt (JSON array or one record per line) |
| Stdin | Pipe or redirect — auto-detected; TTY is skipped |
| Interactive | --interactive / -i — line-reader, embeds as you type |

Stdin auto-detection: when stdin is not a TTY (i.e. data is piped or redirected), embedeer reads it before deciding what to do. JSON arrays are accepted directly; otherwise records are split on the delimiter.
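That detection step can be sketched roughly as follows (a simplification for illustration, not embedeer's actual implementation):

```javascript
// Interpret a raw stdin/file payload: a JSON array is used directly;
// anything else is split on the configured delimiter.
function parseRecords(raw, delimiter = '\n') {
  const trimmed = raw.trim();
  if (trimmed.startsWith('[')) {
    try {
      const parsed = JSON.parse(trimmed);
      if (Array.isArray(parsed)) return parsed.map(String);
    } catch {
      // not valid JSON — fall through to delimiter splitting
    }
  }
  return raw
    .split(delimiter)
    .map(s => s.trim())
    .filter(s => s.length > 0);
}
```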


Interactive Line-Reader Mode (-i / --interactive)

The interactive mode opens a line-by-line reader that starts embedding as records arrive — ideal for pasting large datasets into a terminal or streaming data from another process.

# Open an interactive session (paste lines, Ctrl+D when done)
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --interactive --dump embeddings.jsonl

# Stream a large file through interactive mode with CSV output
cat big.txt | npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 \
  --interactive --output csv --dump embeddings.csv

# Interactive with GPU, custom batch size, txt output
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 \
  --interactive --device auto --batch-size 16 --output txt --dump vecs.txt

How it works:

| Event | What happens |
|-------|--------------|
| Type a line, press Enter | Record is buffered |
| Buffer reaches --batch-size | Auto-flush: embed + append to output |
| Type an empty line | Manual flush: embed whatever is buffered |
| Ctrl+D (EOF) | Flush remaining records and exit |
| Ctrl+C | Flush remaining records and exit |

Behaviour notes:

  • Progress messages (Batch N: M record(s) → file) always go to stderr — they never pollute piped output.
  • When stdin is a TTY, a > prompt is shown on stderr.
  • Output defaults to stdout if --dump is omitted; a tip is printed when running in TTY mode.
  • --output json and --output sql are automatically promoted to jsonl since they produce complete documents that cannot be appended to incrementally.
  • --output csv writes the dimension header (text,dim_0,dim_1,...) on the first batch only; subsequent batches append data rows.
  • Each interactive session clears the --dump file on start so you always get a fresh output file.

Configurable delimiter (-D / --delimiter)

By default records in stdin and files are split on newline (\n). Use --delimiter to change it:

# Newline-delimited (default)
printf 'Hello\nWorld\n' | npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2

# Null-byte delimited — safe with filenames/texts that contain newlines
printf 'Hello\0World\0' | npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --delimiter '\0'

# Tab-delimited
printf 'Hello\tWorld' | npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --delimiter '\t'

# Custom multi-character delimiter
printf 'Hello|||World|||Foo' | npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --delimiter '|||'

# File with null-byte delimiter
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --file records.bin --delimiter '\0'

# Integrate with find -print0: terminate each file's contents with a NUL
# so records survive filenames/texts that contain newlines
find ./docs -name '*.txt' -print0 | \
  xargs -0 -I{} sh -c 'cat "{}"; printf "\0"' | \
  npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --delimiter '\0'

Supported escape sequences in --delimiter:

| Sequence | Character |
|----------|-----------|
| \0 | Null byte (U+0000) |
| \n | Newline (U+000A) |
| \t | Tab (U+0009) |
| \r | Carriage return (U+000D) |
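Decoding those sequences from the raw CLI argument amounts to a small lookup; an illustrative sketch (not the package's own code):

```javascript
// Translate backslash escapes as typed on the command line ('\0', '\n', ...)
// into the real control characters used for record splitting.
const ESCAPES = {
  '\\0': '\0', // null byte
  '\\n': '\n', // newline
  '\\t': '\t', // tab
  '\\r': '\r', // carriage return
};

function decodeDelimiter(arg) {
  // Multi-character delimiters like '|||' pass through unchanged.
  return ESCAPES[arg] ?? arg;
}
```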


Output Formats

| Format | Description |
|--------|-------------|
| json (default) | JSON array of float arrays: [[0.1,0.2,...],[...]] |
| json --with-text | JSON array of objects: [{"text":"...","embedding":[...]}] |
| jsonl | Newline-delimited JSON, one object per line: {"text":"...","embedding":[...]} |
| csv | CSV with header: text,dim_0,dim_1,...,dim_N |
| txt | Space-separated floats, one vector per line |
| txt --with-text | Tab-separated: <original text>\t<float float ...> |
| sql | INSERT INTO embeddings (text, vector) VALUES ...; |

Use --dump <path> to write the output to a file instead of stdout. Progress messages always go to stderr so they never interfere with piped output.
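For reference, the jsonl and csv shapes in the table can be reproduced from a (text, embedding) pair like so (a sketch of the formats, not embedeer's serializer):

```javascript
// One JSONL record per (text, embedding) pair.
function toJsonl(text, embedding) {
  return JSON.stringify({ text, embedding });
}

// CSV header row: text,dim_0,...,dim_N for N+1 dimensions.
function csvHeader(dims) {
  return ['text', ...Array.from({ length: dims }, (_, i) => `dim_${i}`)].join(',');
}

// One CSV data row; the text is JSON-quoted so embedded commas stay safe.
function csvRow(text, embedding) {
  return [JSON.stringify(text), ...embedding].join(',');
}
```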

Piping examples

MODEL=Xenova/all-MiniLM-L6-v2

# --- json (default) ---
# Embed and pretty-print with jq
echo '["Hello","World"]' | npx @jsilvanus/embedeer --model $MODEL | jq '.[0] | length'

# --- jsonl ---
# One object per line — pipe to jq, grep, awk, etc.
npx @jsilvanus/embedeer --model $MODEL --data "foo" "bar" --output jsonl

# Extract just the embedding array for downstream processing (e.g. similarity search)
npx @jsilvanus/embedeer --model $MODEL --data "query text" --output jsonl \
  | jq -c '.embedding'

# Stream a large file and store as JSONL
npx @jsilvanus/embedeer --model $MODEL --file big.txt --output jsonl --dump out.jsonl

# --- json --with-text ---
# Keep the source text next to each vector (useful for building a search index)
npx @jsilvanus/embedeer --model $MODEL --output json --with-text \
  --data "cat" "dog" "fish" \
  | jq '.[] | {text, dims: (.embedding | length)}'

# --- csv ---
# Embed then open in Python/pandas
npx @jsilvanus/embedeer --model $MODEL --file texts.txt --output csv --dump vectors.csv
python3 -c "import pandas as pd; df = pd.read_csv('vectors.csv'); print(df.shape)"

# --- txt ---
# Raw floats — useful for awk/paste/numpy text loading
npx @jsilvanus/embedeer --model $MODEL --data "Hello" "World" --output txt \
  | awk '{print NF, "dimensions"}'

# txt --with-text: original text + tab + floats, easy to parse
npx @jsilvanus/embedeer --model $MODEL --file texts.txt --output txt --with-text \
  | while IFS=$'\t' read -r text vec; do echo "TEXT: $text"; done

# --- sql ---
# Generate INSERT statements for a vector DB or SQLite
npx @jsilvanus/embedeer --model $MODEL --file texts.txt --output sql --dump inserts.sql
sqlite3 mydb.sqlite < inserts.sql

# --- Chaining with other tools ---
# Embed stdin from another command
cat docs/*.txt | npx @jsilvanus/embedeer --model $MODEL --output jsonl > embeddings.jsonl

# Null-byte input from find: terminate each file's contents with a NUL
# (handles any filename or text with newlines)
find ./corpus -name '*.txt' -print0 \
  | xargs -0 -I{} sh -c 'cat "{}"; printf "\0"' \
  | npx @jsilvanus/embedeer --model $MODEL --delimiter '\0' --output jsonl

CLI Examples

# Pull a model (like ollama pull)
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2

# Embed a few strings, output JSON (CPU)
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --data "Hello" "World"

# Auto-detect GPU, fall back to CPU if unavailable
# (uses CUDA on Linux, DirectML on Windows, CPU everywhere else)
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --device auto --data "Hello"

# Require GPU (throws with install instructions if no provider found)
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --device gpu --data "Hello GPU"

# Explicit CUDA (Linux x64 — requires CUDA 12 system libraries)
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --provider cuda --data "Hello CUDA"

# Explicit DirectML (Windows x64)
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --provider dml --data "Hello DML"

# Embed from a file, dump SQL to disk
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 \
  --file texts.txt --output sql --dump out.sql

# Use quantized model, in-process threads, private model with token
npx @jsilvanus/embedeer --model my-org/private-model \
  --token hf_xxx --dtype q8 --mode thread \
  --data "embed me"

Using GPU

No additional packages are needed — onnxruntime-node (installed with @jsilvanus/embedeer) already bundles the CUDA provider on Linux x64 and DirectML on Windows x64.

Linux x64 — NVIDIA CUDA:

# One-time: install CUDA 12 system libraries (Ubuntu/Debian)
sudo apt install cuda-toolkit-12-6 libcudnn9-cuda-12

# Auto-detect: uses CUDA here, CPU fallback on any other machine
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --device auto --data "Hello"

# Hard-require CUDA (throws with diagnostic error if unavailable):
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --device gpu --data "Hello GPU"

# Explicit CUDA provider:
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --provider cuda --data "Hello CUDA"

Windows x64 — DirectML (any GPU: NVIDIA / AMD / Intel):

npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --device auto --data "Hello"
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --device gpu  --data "Hello GPU"
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --provider dml --data "Hello DML"

GPU Acceleration

GPU support is built into onnxruntime-node (a dependency of @huggingface/transformers):

| Platform | Provider | Requirement |
|-------------|----------|--------------------------------------------------------|
| Linux x64 | CUDA | NVIDIA GPU + driver ≥ 525, CUDA 12 toolkit, cuDNN 9 |
| Windows x64 | DirectML | Any DirectX 12 GPU (most GPUs since 2016), Windows 10+ |

Provider selection logic


| device | provider | Behavior |
|---------------|----------|----------|
| cpu (default) | — | Always CPU |
| auto | — | Try GPU providers for the platform in order; silent CPU fallback |
| gpu | — | Try GPU providers; throw if none available |
| any | cuda | Load CUDA provider; throw if not available or not supported |
| any | dml | Load DirectML provider; throw if not available or not supported |
| any | cpu | Always CPU |

On Linux x64 the GPU provider order is cuda; on Windows x64 it is cuda → dml.


Testing

Run the project's tests locally:

# install deps
pnpm install

# run tests
pnpm test

# run tests with coverage
pnpm run coverage

CI is enabled via GitHub Actions (.github/workflows/ci.yml), which runs tests and collects coverage on push and pull requests.


Performance Optimizations

Embedeer exposes runtime knobs and helper scripts to tune throughput for your host.

  • Pre-load models: run loadModel(model, { dtype, cacheDir }) or use the bench scripts so workers start instantly without re-downloading models.
  • Reuse Embedder instances: create a single Embedder and call embed() repeatedly instead of creating and destroying instances per batch.
  • Batch size vs concurrency:
    • CPU: moderate batch sizes (16–64) with multiple workers (concurrency ≥ 2) usually give best throughput.
    • GPU: larger batches (64–256) with low concurrency (1–2) are typically fastest.
  • BLAS threading: avoid oversubscription by setting OMP_NUM_THREADS and MKL_NUM_THREADS to Math.floor(cpu_cores / concurrency) before starting workers.
  • Device/provider: use cuda on Linux and dml (DirectML) on Windows when available; device: 'auto' will try providers and fall back to CPU.
  • Automatic tuning: use bench/grid-search.js to sweep batchSize, concurrency, and dtype for your host and save results. You can generate and persist a per-user profile and apply it automatically via the Embedder APIs.
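The BLAS-threading advice above can be sketched as follows (blasThreadEnv is a hypothetical helper shown for illustration, not part of embedeer):

```javascript
import os from 'node:os';

// Avoid BLAS oversubscription: give each worker an equal share of cores,
// per the Math.floor(cpu_cores / concurrency) rule above.
function blasThreadEnv(concurrency) {
  const threads = Math.max(1, Math.floor(os.cpus().length / concurrency));
  return {
    OMP_NUM_THREADS: String(threads),
    MKL_NUM_THREADS: String(threads),
  };
}

// e.g. Object.assign(process.env, blasThreadEnv(2)) before Embedder.create()
```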

Examples:

# CPU quick grid
node bench/grid-search.js --device cpu --sample-size 200 --out bench/grid-results-cpu.json

# GPU quick grid
node bench/grid-search.js --device gpu --sample-size 100 --out bench/grid-results-gpu.json

Programmatic profile generation (writes ~/.embedeer/perf-profile.json):

import { Embedder } from '@jsilvanus/embedeer';

await Embedder.generateAndSaveProfile({ mode: 'quick', device: 'cpu', sampleSize: 100 });
// Embedder.create() will auto-apply a saved per-user profile by default

How it works

embed(texts)
  │
  ├─ split into batches of batchSize
  │
  └─ Promise.all(batches) ──► WorkerPool
                                 │
                                 ├─ [process mode] ChildProcessWorker 0
                                 │   resolveProvider(device, provider)
                                 │   → pipeline('feature-extraction', model, { device: 'cuda' })
                                 │   → embed batch A
                                 │
                                 └─ [process mode] ChildProcessWorker 1
                                     resolveProvider(device, provider)
                                     → pipeline(...) → embed batch B

Workers load the model once at startup and reuse it for all batches.
Provider activation happens per-worker before the pipeline is created.
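The "split into batches of batchSize" step above is plain array chunking; a minimal sketch (toBatches is an illustrative name, not the package's internal API):

```javascript
// Split texts into batches of at most batchSize for the worker pool.
function toBatches(texts, batchSize) {
  const batches = [];
  for (let i = 0; i < texts.length; i += batchSize) {
    batches.push(texts.slice(i, i + batchSize));
  }
  return batches;
}

// Each batch is then dispatched to a worker and the results are awaited
// together, e.g. await Promise.all(batches.map(b => pool.run(b))).
```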


E2E Testing

Note: HF authentication has not been tested.


Collaboration

Suggestions and PRs are welcome, especially performance-related improvements. Opening issues is also appreciated.

License

MIT