@simulatte/doppler
v0.1.6
Browser-native WebGPU inference engine for local intent and inference loops
Inference and training on raw WebGPU. Pure JS + WGSL.
Live Demo · npm · simulatte.world
Install
npm install @simulatte/doppler

Quick start
import { doppler } from '@simulatte/doppler';
const model = await doppler.load('gemma3-270m');
for await (const token of model.generate('Hello, world')) {
  process.stdout.write(token);
}

Registry IDs resolve to hosted RDRR artifacts from Clocksmith/rdrr by default. Tokens stream from a native AsyncGenerator. See more examples below or the canonical Root API guide.
Why Doppler
JS → WGSL → WebGPU. Direct JavaScript orchestration into native WebGPU kernels, avoiding ONNX runtimes, WASM blobs, and bridge layers.
for await streaming. Generation uses a native AsyncGenerator that fits normal app control flow.
LoRA hot-swap. Swap adapters at runtime without reloading the base model.
Independent model instances. Run multiple models concurrently. Each owns its pipeline, buffers, and KV cache.
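Because generation is a plain AsyncGenerator, ordinary control flow (break, early return, try/finally) applies. A minimal sketch using a stand-in generator instead of a real model — fakeGenerate and takeUntil are illustrative names, not part of the Doppler API:

```javascript
// Stand-in for model.generate(): any AsyncGenerator of tokens behaves the same.
async function* fakeGenerate() {
  for (const token of ['Hello', ',', ' world']) yield token;
}

// Collect at most `limit` tokens; breaking out of for-await closes the generator.
async function takeUntil(gen, limit) {
  const tokens = [];
  for await (const token of gen) {
    tokens.push(token);
    if (tokens.length >= limit) break; // stopping early releases the generator
  }
  return tokens;
}

// takeUntil(fakeGenerate(), 2) resolves to ['Hello', ',']
```

The same pattern works for aborting a real generation loop mid-stream, e.g. when the user navigates away.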
Evidence
Snapshot artifacts:
Under the hood
- Sharded weight loading via OPFS moves multi-GB weights into VRAM without blocking the main thread.
- Quantized inference paths (Q4K, Q8, F16) support practical model sizes on consumer GPUs.
- Kernel hot-swap between prefill and decode paths.
- Config-driven runtime keeps presets, kernel-path selection, and sampling explicit.
- Reproducible benchmarks expose deterministic knobs and auditable kernel traces.
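The sharded-loading idea above can be sketched in plain JS. This is a hypothetical illustration, not Doppler's actual loader: it only shows splitting a weight blob into fixed-size views that a loader could stage into GPU memory one shard at a time, keeping any single copy small.

```javascript
// Hypothetical sketch (not the library's API): shard a weight blob into
// fixed-size subarray views so it can be staged into VRAM shard by shard
// rather than as one multi-GB copy.
function shardBuffer(bytes, shardSize) {
  const shards = [];
  for (let offset = 0; offset < bytes.length; offset += shardSize) {
    shards.push(bytes.subarray(offset, offset + shardSize)); // views, no copies
  }
  return shards;
}
```

In the real engine the shards come from OPFS files and land in GPUBuffer uploads, but the chunking contract is the same.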
More examples
// Non-streaming
const text = await model.generateText('Explain WebGPU in one sentence');
// Load with progress logging
const modelWithProgress = await doppler.load('gemma3-270m', {
onProgress: ({ message }) => console.log(`[doppler] ${message}`),
});
// Chat
const reply = await model.chatText([
{ role: 'user', content: 'Write a dispatch that outruns its own light cone' },
]);
// LoRA hot-swap
await model.loadLoRA('https://example.com/adapters/oneshift-twoshift-redshift-blueshift/manifest.json');
// Convenience shorthand (caches model automatically)
for await (const token of doppler('Hello', { model: 'gemma3-270m' })) {
process.stdout.write(token);
}

Documentation
- Docs index (canonical navigation): docs/INDEX.md
- First-run workflow: docs/getting-started.md
- Runtime config contract: docs/config.md
- Architecture: docs/architecture.md
- Generated model support table: docs/model-support-matrix.md
Current model support
Verified right now:
- gemma-3-270m-it-wq4k-ef16-hf16
- gemma-3-1b-it-wq4k-ef16-hf16
- google-embeddinggemma-300m-wq4k-ef16
- translategemma-4b-it-wq4k-ef16-hf16
Known failing right now:
- qwen-3-5-0-8b-wq4k-ef16-hf16-f16
- qwen-3-5-2b-wq4k-ef16-hf16-f16
For the full generated status table, including models that load but remain unverified, see docs/model-support-matrix.md.
Environment requirements
- WebGPU is required.
- Supported runtimes: WebGPU-capable browsers, or Node with a WebGPU provider.
- Chrome / Edge 113+ supported.
- Firefox support varies (typically behind a flag).
- Safari support is evolving.
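A small guard for the requirement above: checking navigator.gpu is the standard WebGPU feature test, though the helper name here is ours, not part of the package.

```javascript
// Feature-detect WebGPU before attempting to load a model.
// navigator.gpu is the standard WebGPU entry point; it is undefined
// wherever WebGPU is unavailable (e.g. Firefox without the flag).
function hasWebGPU(root = globalThis) {
  return typeof root.navigator !== 'undefined' &&
         typeof root.navigator.gpu !== 'undefined';
}

if (!hasWebGPU()) {
  console.warn('WebGPU not available; doppler cannot run here.');
}
```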
