@inbrowser/model
v0.1.0
On-device LLM engine. Wraps @huggingface/transformers + ONNX Runtime Web behind a narrow Engine surface. Presets ship Gemma 4 (E2B/E4B) configurations. Adapters in subpaths satisfy @inbrowser/relay's InferenceProvider and @inbrowser/agent's LlmClient so a
@inbrowser/model
On-device LLM engine. Loads ONNX models in the browser via
@huggingface/transformers + ONNX Runtime Web (WebGPU / WASM), and
exposes them behind a narrow Engine surface.
Status: POC stub. Types, presets, adapter surface, and worker RPC frames are in place. The
`@huggingface/transformers` wiring inside `createEngine` is not yet implemented; `generate()` yields an `error` event today. See `src/engine.ts`.
One-liner
```ts
import { createEngine } from '@inbrowser/model';
import { gemma4_E2B } from '@inbrowser/model/presets';

const engine = createEngine(gemma4_E2B);
await engine.ensureReady();

for await (const evt of engine.generate([
  { role: 'user', text: 'Explain WebGPU in one paragraph.' },
])) {
  // process.stdout exists only in Node; in a browser, append evt.text to the DOM instead.
  if (evt.kind === 'token') process.stdout.write(evt.text);
}
```
Surface
| Export | What it gives you |
|---|---|
| createEngine(preset) | Runtime Engine — owns load state + decode loop |
| definePreset(p) | Type-safe identity helper for community presets |
| ModelPreset, Engine, EngineEvent, … | Public types |
| @inbrowser/model/presets | gemma4_E2B, gemma4_E4B |
| @inbrowser/model/relay | createLocalInferenceProvider(engine) → relay InferenceProvider |
| @inbrowser/model/agent | createLocalLlmClient(engine, id) → agent LlmClient |
| @inbrowser/model/worker | hostEngineInWorker(self) + connectWorkerEngine(opts) |
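The `definePreset(p)` row describes a type-safe identity helper. The sketch below shows how that pattern usually works in TypeScript. It is not the package's actual source: the `ModelPreset` fields, the `Dtype` and `Backend` unions, the model id, and the context window value are all assumptions made for illustration.

```ts
// Hypothetical shapes; the real ModelPreset in @inbrowser/model may differ.
type Dtype = 'q4f16' | 'q8' | 'fp16' | 'fp32';
type Backend = 'webgpu' | 'wasm';

interface ModelPreset {
  modelId: string;
  revision: string;
  dtype: Dtype;
  backend: Backend;
  capabilities: { contextWindow: number };
}

// Identity helper: returns its argument unchanged at runtime. The generic
// constraint makes the compiler check the object against ModelPreset at the
// call site, so a misspelled dtype fails to compile instead of failing at load.
function definePreset<P extends ModelPreset>(p: P): P {
  return p;
}

const myPreset = definePreset({
  modelId: 'example-org/example-model-onnx', // placeholder, not a real repo
  revision: 'main',
  dtype: 'q4f16',
  backend: 'webgpu',
  capabilities: { contextWindow: 8192 }, // illustrative value only
});

console.log(myPreset.dtype, myPreset.capabilities.contextWindow);
```

Because the helper does nothing at runtime, a community preset written this way stays a plain object that can be passed to `createEngine` or inspected before any model is loaded.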
Vocabulary anchor
- ONNX: model file format. ONNX Runtime Web is the execution engine (`onnxruntime-web`); WebGPU and WASM are its backends.
- `dtype`: weight/activation precision selection (`q4f16`, `q8`, `fp16`, `fp32`). Distinct from parameter count.
- `ModelRef`: bare locator (HF Hub `modelId` + `revision`).
- `ModelPreset`: locator + dtype + backend + capabilities. Static.
- `Engine`: runtime object owning a loaded model. Dynamic.
- Cold start: fetch + init + warmup. Warm decode: subsequent calls on a ready engine.
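Rough arithmetic makes the point that `dtype` is distinct from parameter count. The sketch below estimates weight size for one fixed parameter count under each dtype. It assumes `q4f16` stores about 4 bits per weight and ignores quantization scales and zero-points, and the 2-billion-parameter figure is hypothetical rather than a measurement of any shipped preset.

```ts
// Approximate bits stored per weight for each dtype label.
// Rough figures only: quantized formats also store scales and
// zero-points, which this sketch ignores.
const BITS_PER_WEIGHT = {
  q4f16: 4,
  q8: 8,
  fp16: 16,
  fp32: 32,
} as const;

type Dtype = keyof typeof BITS_PER_WEIGHT;

// Estimated size of the weights alone, in bytes.
function estimateWeightBytes(paramCount: number, dtype: Dtype): number {
  return (paramCount * BITS_PER_WEIGHT[dtype]) / 8;
}

// Hypothetical model: the parameter count stays fixed, only dtype varies.
const params = 2e9;
for (const dtype of Object.keys(BITS_PER_WEIGHT) as Dtype[]) {
  const gb = estimateWeightBytes(params, dtype) / 1e9;
  console.log(`${dtype}: ~${gb.toFixed(1)} GB`);
}
```

Under these assumptions the same model ranges from roughly 1 GB at `q4f16` to roughly 8 GB at `fp32`, which is why the choice of dtype dominates the fetch portion of a cold start.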
Design notes
- One factory (`createEngine`), many presets. No `createGemmaEngine`.
- `capabilities` is on the preset, not the engine, so it is interrogable pre-load (`gemma4_E2B.capabilities.contextWindow`).
- `EngineEvent` is narrower than `InferenceEvent` / `ChatEvent`. Adapters widen.
- Worker subpath returns the same `Engine` shape; the agent runtime cannot tell whether it holds a direct or remote engine.
- Tool calling is not native to Gemma 4. The polyfill (prompt-engineered tool calling + structured-output parsing) lives in `@inbrowser/agent`, not here.
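The "adapters widen" note can be illustrated with a small mapping function. This is a sketch under assumptions, not the adapters' real code: only the `token` variant with a `text` field appears in this README, so the `error` and `done` variants, the `WideEvent` type, and the `providerId` and `seq` fields are hypothetical stand-ins for whatever `InferenceEvent` and `ChatEvent` actually carry.

```ts
// Hypothetical narrow event emitted by the engine. Only `kind: 'token'`
// with `text` is shown in the README; the other variants are assumptions.
type EngineEvent =
  | { kind: 'token'; text: string }
  | { kind: 'error'; message: string }
  | { kind: 'done' };

// Hypothetical wider event: the same variants plus metadata that only
// the adapter layer knows about.
type WideEvent = EngineEvent & { providerId: string; seq: number };

// Widening is a pure mapping: each narrow event becomes a wide event
// once the adapter stamps its own metadata on it.
function makeWidener(providerId: string): (evt: EngineEvent) => WideEvent {
  let seq = 0;
  return (evt) => ({ ...evt, providerId, seq: seq++ });
}

const widen = makeWidener('local');
const input: EngineEvent[] = [
  { kind: 'token', text: 'Hi' },
  { kind: 'done' },
];
const out = input.map((evt) => widen(evt));

console.log(out);
```

Keeping the engine's event type narrow means the engine never needs to know about relay or agent concerns; each adapter adds only the fields its consumer requires.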
