@simulatte/doppler
v0.1.6
Browser-native WebGPU inference engine for local intent and inference loops
Inference and training on raw WebGPU. Pure JS + WGSL.
Live Demo · npm · simulatte.world
Install
npm install @simulatte/doppler

Quick start
import { doppler } from '@simulatte/doppler';
const model = await doppler.load('gemma3-270m');
for await (const token of model.generate('Hello, world')) {
  process.stdout.write(token);
}

Registry IDs resolve to hosted RDRR artifacts from Clocksmith/rdrr by default. Tokens stream from a native AsyncGenerator. See more examples below or the canonical Root API guide.
Why Doppler
JS → WGSL → WebGPU. Direct JavaScript orchestration into native WebGPU kernels, avoiding ONNX runtimes, WASM blobs, and bridge layers.
for await streaming. Generation uses a native AsyncGenerator that fits normal app control flow.
LoRA hot-swap. Swap adapters at runtime without reloading the base model.
Independent model instances. Run multiple models concurrently. Each owns its pipeline, buffers, and KV cache.
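Because generation is a plain AsyncGenerator, ordinary control flow (break, early return, try/finally) applies. A minimal sketch using a stand-in generator instead of a real model — fakeGenerate and takeUntil are illustrative names, not part of the Doppler API:

```javascript
// Stand-in for model.generate(): any AsyncGenerator of tokens behaves the same.
async function* fakeGenerate() {
  for (const token of ['Hello', ',', ' world']) yield token;
}

// Collect at most `limit` tokens; breaking out of for-await closes the generator.
async function takeUntil(gen, limit) {
  const tokens = [];
  for await (const token of gen) {
    tokens.push(token);
    if (tokens.length >= limit) break; // stopping early releases the generator
  }
  return tokens;
}

// takeUntil(fakeGenerate(), 2) resolves to ['Hello', ',']
```

The same pattern works for aborting a real generation loop mid-stream, e.g. when the user navigates away.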
Evidence
Snapshot artifacts:
Under the hood
- Sharded weight loading via OPFS moves multi-GB weights into VRAM without blocking the main thread.
- Quantized inference paths (Q4K, Q8, F16) support practical model sizes on consumer GPUs.
- Kernel hot-swap between prefill and decode paths.
- Config-driven runtime keeps presets, kernel-path selection, and sampling explicit.
- Reproducible benchmarks expose deterministic knobs and auditable kernel traces.
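The sharded-loading idea above can be sketched in plain JS. This is a hypothetical illustration, not Doppler's actual loader: it only shows splitting a weight blob into fixed-size views that a loader could stage into GPU memory one shard at a time, keeping any single copy small.

```javascript
// Hypothetical sketch (not the library's API): shard a weight blob into
// fixed-size subarray views so it can be staged into VRAM shard by shard
// rather than as one multi-GB copy.
function shardBuffer(bytes, shardSize) {
  const shards = [];
  for (let offset = 0; offset < bytes.length; offset += shardSize) {
    shards.push(bytes.subarray(offset, offset + shardSize)); // views, no copies
  }
  return shards;
}
```

In the real engine the shards come from OPFS files and land in GPUBuffer uploads, but the chunking contract is the same.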
More examples
// Non-streaming
const text = await model.generateText('Explain WebGPU in one sentence');
// Load with progress logging
const modelWithProgress = await doppler.load('gemma3-270m', {
onProgress: ({ message }) => console.log(`[doppler] ${message}`),
});
// Chat
const reply = await model.chatText([
{ role: 'user', content: 'Write a dispatch that outruns its own light cone' },
]);
// LoRA hot-swap
await model.loadLoRA('https://example.com/adapters/oneshift-twoshift-redshift-blueshift/manifest.json');
// Convenience shorthand (caches model automatically)
for await (const token of doppler('Hello', { model: 'gemma3-270m' })) {
process.stdout.write(token);
}

Documentation
- Docs index (canonical navigation): docs/INDEX.md
- First-run workflow: docs/getting-started.md
- Runtime config contract: docs/config.md
- Architecture: docs/architecture.md
- Generated model support table: docs/model-support-matrix.md
Current model support
Verified right now:
- gemma-3-270m-it-wq4k-ef16-hf16
- gemma-3-1b-it-wq4k-ef16-hf16
- google-embeddinggemma-300m-wq4k-ef16
- translategemma-4b-it-wq4k-ef16-hf16
Known failing right now:
- qwen-3-5-0-8b-wq4k-ef16-hf16-f16
- qwen-3-5-2b-wq4k-ef16-hf16-f16
For the full generated status table, including models that load but remain unverified, see docs/model-support-matrix.md.
Environment requirements
- WebGPU is required.
- Supported runtimes: WebGPU-capable browsers, or Node with a WebGPU provider.
- Chrome / Edge 113+ supported.
- Firefox support varies (typically behind a flag).
- Safari support is evolving.
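A small guard for the requirement above: checking navigator.gpu is the standard WebGPU feature test, though the helper name here is ours, not part of the package.

```javascript
// Feature-detect WebGPU before attempting to load a model.
// navigator.gpu is the standard WebGPU entry point; it is undefined
// wherever WebGPU is unavailable (e.g. Firefox without the flag).
function hasWebGPU(root = globalThis) {
  return typeof root.navigator !== 'undefined' &&
         typeof root.navigator.gpu !== 'undefined';
}

if (!hasWebGPU()) {
  console.warn('WebGPU not available; doppler cannot run here.');
}
```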
