wasmicro

v0.3.1

Published 4 days ago

BERT, GPT-2 and T5 inference in a 199 KB WASM bundle. Browser, Node, Cloudflare Workers.


xzdes

wasm inference transformer bert gpt2 t5 embeddings llm nlp


Run transformer inference in any JavaScript environment (browser, Node, Cloudflare Workers, Electron) with a single 199 KB WebAssembly file. The model type is auto-detected from config.json, so no hardcoded parameters are needed.

Install

npm install wasmicro

Quick start

import init, { WasmPipeline } from "wasmicro";

await init();

// Load files (fetch, fs.readFile, etc.)
const model     = new Uint8Array(await (await fetch("model.safetensors")).arrayBuffer());
const tokenizer = new Uint8Array(await (await fetch("vocab.txt")).arrayBuffer());
const config    = await (await fetch("config.json")).text();

// merges.txt is required for GPT-2 / T5; pass null for BERT.
const pipeline = WasmPipeline.fromBytes(model, tokenizer, config, null);

// ── BERT: semantic search / embeddings ────────────────────────────────────────
const embedding = pipeline.embed("Hello world", 128);          // Float32Array
const batch     = pipeline.embedBatch(["text 1", "text 2"], 128); // Float32Array

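// ── GPT-2 / T5: text generation ───────────────────────────────────────────────
// Here model and config must come from the generative checkpoint; the tokenizer file is vocab.json.
const vocabJson = new Uint8Array(await (await fetch("vocab.json")).arrayBuffer());
const merges    = new Uint8Array(await (await fetch("merges.txt")).arrayBuffer());
const pipeline2 = WasmPipeline.fromBytes(model, vocabJson, config, merges);
const text      = pipeline2.generate("Once upon a time", 50);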
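
The quick start shows that `embedBatch` returns one `Float32Array` for several inputs. The sketch below splits such a buffer into one vector per input. It assumes the embeddings are concatenated row by row with equal length, which the README does not confirm, and `splitBatch` is a hypothetical helper, not part of wasmicro.

```javascript
// Hypothetical helper: split a flat Float32Array into `count` equal-length rows.
// Assumption: row-major layout (all values for input 0, then input 1, ...).
function splitBatch(flat, count) {
  if (count <= 0 || flat.length % count !== 0) {
    throw new RangeError(`cannot split ${flat.length} values into ${count} rows`);
  }
  const dim = flat.length / count;
  const rows = [];
  for (let i = 0; i < count; i++) {
    // subarray returns a view on the same buffer, so no data is copied.
    rows.push(flat.subarray(i * dim, (i + 1) * dim));
  }
  return rows;
}

// Stand-in data; in real use this would be the result of pipeline.embedBatch(...).
const flat = new Float32Array([1, 2, 3, 4, 5, 6]);
const rows = splitBatch(flat, 2);
console.log(rows.length, rows[0].length);
```

Because the rows are views, they stay valid only as long as the original buffer is kept alive and unmodified; use `slice` instead of `subarray` if independent copies are needed.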

Supported models

| model_type in config.json | Tokenizer file | Methods |
|---|---|---|
| bert, roberta, distilbert, electra | vocab.txt | embed, embedBatch |
| gpt2, gpt_neo, gpt_neox | vocab.json + merges.txt | generate |
| t5, mt5, longt5 | vocab.json + merges.txt | generate |
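
The table can be turned into a caller-side check that reads `model_type` from config.json and reports which tokenizer files to fetch before building a pipeline. This is only a sketch: `describeModel` and `FAMILIES` are hypothetical names, and it does not describe how wasmicro performs its own detection internally.

```javascript
// Mapping transcribed from the "Supported models" table.
const FAMILIES = [
  { types: ["bert", "roberta", "distilbert", "electra"],
    files: ["vocab.txt"], methods: ["embed", "embedBatch"] },
  { types: ["gpt2", "gpt_neo", "gpt_neox"],
    files: ["vocab.json", "merges.txt"], methods: ["generate"] },
  { types: ["t5", "mt5", "longt5"],
    files: ["vocab.json", "merges.txt"], methods: ["generate"] },
];

// Hypothetical helper (not a wasmicro API): look up the tokenizer files
// and methods that apply to the model_type found in a config.json string.
function describeModel(configText) {
  const { model_type: modelType } = JSON.parse(configText);
  const family = FAMILIES.find((f) => f.types.includes(modelType));
  if (!family) {
    throw new Error(`unsupported model_type: ${modelType}`);
  }
  return { modelType, files: family.files, methods: family.methods };
}

const info = describeModel('{"model_type": "gpt2"}');
console.log(info.files.join(", "), "->", info.methods.join(", "));
```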

Bundle size

| Runtime | WASM/JS payload |
|---|---|
| wasmicro | 199 KB |
| Candle WASM | 1.5 – 5 MB |
| transformers.js | ~10 MB |
| ONNX Runtime Web | 8 – 20 MB |

Verified against real HuggingFace checkpoints

| Model | Result |
|---|---|
| bert-base-uncased | cosine 1.000000, max \|Δ\| 8.3 × 10⁻⁷ vs PyTorch |
| openai-community/gpt2 | loads and generates correctly |
| google-t5/t5-small | encoder shape [seq, 512], all values finite |

Full verification source: wasmicro-verify
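
For readers who want to run a similar comparison on their own checkpoints, the two metrics reported for bert-base-uncased (cosine similarity and maximum absolute difference) can be computed as sketched below. `compareVectors` is a hypothetical helper and is not taken from the verification source; the reference vector would have to come from your own PyTorch run.

```javascript
// Hypothetical helper: compare an embedding against a reference vector.
// Returns the cosine similarity and the largest element-wise absolute difference.
// Note: cosine is NaN if either vector is all zeros.
function compareVectors(a, b) {
  if (a.length !== b.length) {
    throw new RangeError("vectors must have the same length");
  }
  let dot = 0, normA = 0, normB = 0, maxAbsDiff = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
    maxAbsDiff = Math.max(maxAbsDiff, Math.abs(a[i] - b[i]));
  }
  return { cosine: dot / Math.sqrt(normA * normB), maxAbsDiff };
}

// Stand-in data; in real use `a` would come from pipeline.embed(...)
// and `b` from the reference implementation.
const a = new Float32Array([0.5, 0.25, 1]);
const b = new Float32Array([0.5, 0.25, 1]);
console.log(compareVectors(a, b));
```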

License

MIT OR Apache-2.0
