@simulatte/webgpu
v0.3.1
Headless WebGPU runtime for Node.js and Bun, powered by Doe
@simulatte/webgpu is Fawn's headless WebGPU package for Node.js and Bun: use
the raw WebGPU API through requestDevice() and device.*, or move up to the
Doe API + routines when you want the same runtime with less setup. Browser
DOM/canvas ownership lives in the separate nursery/fawn-browser lane.
Terminology in this README is deliberate:
- Doe runtime means the Zig/native WebGPU runtime underneath the package
- Doe API means the explicit JS convenience surface under doe, gpu.buffers.*, gpu.compute.run(...), and gpu.compute.compile(...)
- Doe routines means the narrower, more opinionated JS flows such as gpu.compute.once(...)
Start here
From direct WebGPU to Doe
The same simple compute pass, shown first at the raw WebGPU layer and then at the explicit Doe API layer.
1. Direct WebGPU
```javascript
import { globals, requestDevice } from "@simulatte/webgpu";

const device = await requestDevice();
const input = new Float32Array([1, 2, 3, 4]);
const bytes = input.byteLength;

const src = device.createBuffer({
  size: bytes,
  usage: globals.GPUBufferUsage.STORAGE | globals.GPUBufferUsage.COPY_DST,
});
device.queue.writeBuffer(src, 0, input);

const dst = device.createBuffer({
  size: bytes,
  usage: globals.GPUBufferUsage.STORAGE | globals.GPUBufferUsage.COPY_SRC,
});

const readback = device.createBuffer({
  size: bytes,
  usage: globals.GPUBufferUsage.COPY_DST | globals.GPUBufferUsage.MAP_READ,
});

const pipeline = device.createComputePipeline({
  layout: "auto",
  compute: {
    module: device.createShaderModule({
      code: `
        @group(0) @binding(0) var<storage, read> src: array<f32>;
        @group(0) @binding(1) var<storage, read_write> dst: array<f32>;

        @compute @workgroup_size(4)
        fn main(@builtin(global_invocation_id) gid: vec3u) {
          let i = gid.x;
          dst[i] = src[i] * 2.0;
        }
      `,
    }),
    entryPoint: "main",
  },
});

const bindGroup = device.createBindGroup({
  layout: pipeline.getBindGroupLayout(0),
  entries: [
    { binding: 0, resource: { buffer: src } },
    { binding: 1, resource: { buffer: dst } },
  ],
});

const encoder = device.createCommandEncoder();
const pass = encoder.beginComputePass();
pass.setPipeline(pipeline);
pass.setBindGroup(0, bindGroup);
pass.dispatchWorkgroups(1);
pass.end();
encoder.copyBufferToBuffer(dst, 0, readback, 0, bytes);
device.queue.submit([encoder.finish()]);
await device.queue.onSubmittedWorkDone();

await readback.mapAsync(globals.GPUMapMode.READ);
const result = new Float32Array(readback.getMappedRange().slice(0));
readback.unmap();

console.log(result); // Float32Array(4) [ 2, 4, 6, 8 ]
```

2. Doe API
Explicit Doe buffers and dispatch when you want less boilerplate but still want to manage the resources yourself.
```javascript
import { doe } from "@simulatte/webgpu/compute";

const gpu = await doe.requestDevice();
const src = gpu.buffers.fromData(Float32Array.of(1, 2, 3, 4));
const dst = gpu.buffers.like(src, {
  usage: "storageReadWrite",
});

await gpu.compute.run({
  code: `
    @group(0) @binding(0) var<storage, read> src: array<f32>;
    @group(0) @binding(1) var<storage, read_write> dst: array<f32>;

    @compute @workgroup_size(4)
    fn main(@builtin(global_invocation_id) gid: vec3u) {
      let i = gid.x;
      dst[i] = src[i] * 2.0;
    }
  `,
  // Access is inferred from the Doe buffer usage above.
  bindings: [src, dst],
  workgroups: 1,
});

console.log(await gpu.buffers.read(dst, Float32Array)); // Float32Array(4) [ 2, 4, 6, 8 ]
```

What this package gives you:
- requestDevice() gives you real headless WebGPU
- doe gives you the same runtime with less boilerplate and explicit resource control
- compute.once(...) is the more opinionated routines layer when you do not want to manage buffers and readback yourself
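
Both examples above dispatch a single workgroup because @workgroup_size(4) matches the four input elements exactly. For other input lengths, a common pattern is to round the workgroup count up. A minimal sketch of that calculation follows; workgroupCount is a hypothetical helper for illustration and is not part of the package.

```javascript
// Hypothetical helper (not part of @simulatte/webgpu): number of workgroups
// needed so that every element gets at least one invocation.
function workgroupCount(elementCount, workgroupSize) {
  if (!Number.isInteger(elementCount) || elementCount < 0) {
    throw new RangeError("elementCount must be a non-negative integer");
  }
  if (!Number.isInteger(workgroupSize) || workgroupSize < 1) {
    throw new RangeError("workgroupSize must be a positive integer");
  }
  return Math.ceil(elementCount / workgroupSize);
}

console.log(workgroupCount(4, 4));     // 1, as in the examples above
console.log(workgroupCount(1000, 64)); // 16 (the last workgroup is partly idle)
```

When the count is rounded up, the shader should compare its index against the element count (for example with the WGSL builtin arrayLength(&dst)) before writing, so the surplus invocations do nothing.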
3. Doe routines: one-shot tensor matmul
This is where the routines layer starts to separate itself: you pass typed arrays and an output spec, and the package handles upload, output allocation, dispatch, and readback while the shader and tensor shapes stay explicit.
```javascript
import { doe } from "@simulatte/webgpu/compute";

const gpu = await doe.requestDevice();
const [M, K, N] = [256, 512, 256];
const lhs = Float32Array.from({ length: M * K }, (_, i) => (i % 17) / 17);
const rhs = Float32Array.from({ length: K * N }, (_, i) => (i % 13) / 13);
const dims = new Uint32Array([M, K, N, 0]);

const result = await gpu.compute.once({
  code: `
    struct Dims {
      m: u32,
      k: u32,
      n: u32,
      _pad: u32,
    };

    @group(0) @binding(0) var<uniform> dims: Dims;
    @group(0) @binding(1) var<storage, read> lhs: array<f32>;
    @group(0) @binding(2) var<storage, read> rhs: array<f32>;
    @group(0) @binding(3) var<storage, read_write> out: array<f32>;

    @compute @workgroup_size(8, 8)
    fn main(@builtin(global_invocation_id) gid: vec3u) {
      let row = gid.y;
      let col = gid.x;
      if (row >= dims.m || col >= dims.n) {
        return;
      }
      var acc = 0.0;
      for (var i = 0u; i < dims.k; i = i + 1u) {
        acc += lhs[row * dims.k + i] * rhs[i * dims.n + col];
      }
      out[row * dims.n + col] = acc;
    }
  `,
  inputs: [
    { data: dims, usage: "uniform", access: "uniform" },
    lhs,
    rhs,
  ],
  output: {
    type: Float32Array,
    size: M * N * Float32Array.BYTES_PER_ELEMENT,
  },
  workgroups: [Math.ceil(N / 8), Math.ceil(M / 8)],
});

console.log(result.subarray(0, 8)); // Float32Array(8) [ ... ]
```

Benchmark snapshot
This package is the headless package surface of the Doe runtime, Fawn's Zig-first WebGPU implementation, and it is benchmarked through separate Node and Bun package lanes.
@simulatte/webgpu is the headless package surface of the broader
Fawn project. The same repository also
carries the Doe runtime itself, benchmarking and verification tooling, and the
separate nursery/fawn-browser Chromium/browser integration lane.
Install
```shell
npm install @simulatte/webgpu
```

The install ships platform-specific prebuilds for macOS arm64 (Metal) and
Linux x64 (Vulkan). If no prebuild matches your platform, the installer falls
back to building the native addon with node-gyp only; it does not build or
bundle libwebgpu_doe and the required Dawn sidecar for you. On unsupported
platforms, use a local Fawn workspace build for those runtime libraries.
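
To check ahead of time whether the current machine matches one of the prebuilds listed above, a script can compare process.platform and process.arch against that list. This is a sketch only: prebuiltBackend and the key format in PREBUILT_TARGETS are hypothetical and do not reflect how the installer names or selects its prebuilds.

```javascript
// Prebuilt targets listed in this README, keyed as `${process.platform}-${process.arch}`.
// The key format is an illustration, not the installer's own naming.
const PREBUILT_TARGETS = {
  "darwin-arm64": "Metal",
  "linux-x64": "Vulkan",
};

// Hypothetical helper: returns the backend name if a prebuild is listed,
// or null if the node-gyp fallback described above would apply.
function prebuiltBackend(platform = process.platform, arch = process.arch) {
  return PREBUILT_TARGETS[`${platform}-${arch}`] ?? null;
}

const backend = prebuiltBackend();
console.log(
  backend
    ? `Prebuild listed for this platform (${backend})`
    : "No prebuild listed: expect the node-gyp fallback and a local Fawn workspace build",
);
```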
Choose a surface
| Import | Surface | Includes |
| --------------------------- | --------------------- | --------------------------------------------------------- |
| @simulatte/webgpu | Default full surface | Buffers, compute, textures, samplers, render, Doe API + routines |
| @simulatte/webgpu/compute | Compute-first surface | Buffers, compute, copy/upload/readback, Doe API + routines |
| @simulatte/webgpu/full | Explicit full surface | Same contract as the default package surface |
Use @simulatte/webgpu/compute when you want the constrained package contract
for AI workloads and other buffer/dispatch-heavy headless execution. The
compute surface intentionally omits render and sampler methods from the JS
facade.
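
Because the compute surface omits render methods from the facade, code that may receive either kind of device can feature-detect rather than assume. A small sketch: supportsRender is a hypothetical helper, and the plain objects below only stand in for real devices.

```javascript
// Hypothetical helper: true when the device facade exposes render pipelines.
function supportsRender(device) {
  return typeof device.createRenderPipeline === "function";
}

// Stand-in objects shaped like the two facades described above.
const computeOnlyDevice = { createComputePipeline() {} };
const fullDevice = { createComputePipeline() {}, createRenderPipeline() {} };

console.log(supportsRender(computeOnlyDevice)); // false
console.log(supportsRender(fullDevice));        // true
```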
Basic entry points
Inspect the provider
```javascript
import { providerInfo } from "@simulatte/webgpu";

console.log(providerInfo());
```

Request a full device

```javascript
import { requestDevice } from "@simulatte/webgpu";

const device = await requestDevice();
console.log(device.limits.maxBufferSize);
```

Request a compute-only device

```javascript
import { requestDevice } from "@simulatte/webgpu/compute";

const device = await requestDevice();
console.log(typeof device.createComputePipeline); // "function"
console.log(typeof device.createRenderPipeline); // "undefined"
```

API layers
The package gives you three API styles over the same Doe runtime:
- Direct WebGPU: raw requestDevice() plus direct device.*
- Doe API: explicit Doe surface for lower-boilerplate buffer and compute flows
- Doe routines: more opinionated Doe flows where the JS surface carries more of the operation
Examples for each style ship in:
- examples/direct-webgpu/
- examples/doe-api/
- examples/doe-routines/
doe is the package's shared JS convenience surface over the Doe runtime. It is available
from both @simulatte/webgpu and @simulatte/webgpu/compute.
- await doe.requestDevice() gets a bound helper object in one step; use doe.bind(device) when you already have a device.
- gpu.buffers.*, gpu.compute.run(...), and gpu.compute.compile(...) are the main Doe API surface.
- gpu.compute.once(...) is currently the first Doe routines path.
The Doe API and Doe routines surface is the same on both package surfaces. The difference is the raw device beneath it:
- @simulatte/webgpu/compute returns a compute-only facade
- @simulatte/webgpu keeps the full headless device surface
Binding access is inferred from Doe helper-created buffer usage when possible.
For raw WebGPU buffers or non-bindable/ambiguous usage, pass
{ buffer, access } explicitly.
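
One way to see why raw WebGPU buffers need an explicit access: usage flags say that a buffer can be bound as storage, but not whether a given shader binds it as read or read_write. The sketch below is illustrative only and is not the package's inference logic. classifyUsage is a hypothetical helper; the flag values are the standard GPUBufferUsage constants from the WebGPU specification.

```javascript
// Standard WebGPU GPUBufferUsage flag values (subset).
const MAP_READ = 0x0001;
const COPY_DST = 0x0008;
const UNIFORM = 0x0040;
const STORAGE = 0x0080;

// Hypothetical classifier: what raw usage flags alone reveal about binding access.
function classifyUsage(flags) {
  const uniform = (flags & UNIFORM) !== 0;
  const storage = (flags & STORAGE) !== 0;
  if (uniform && storage) return "ambiguous: uniform or storage";
  if (uniform) return "uniform";
  if (storage) return "ambiguous: storage read or read_write";
  return "non-bindable";
}

console.log(classifyUsage(UNIFORM | COPY_DST));  // "uniform"
console.log(classifyUsage(STORAGE | COPY_DST));  // "ambiguous: storage read or read_write"
console.log(classifyUsage(COPY_DST | MAP_READ)); // "non-bindable"
```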
Runtime notes
@simulatte/webgpu is the canonical package surface for the Doe runtime. Node uses the
addon-backed path. Bun uses a platform-dependent bridge today: Linux routes
through the package FFI surface, while macOS currently uses the full
addon-backed path for correctness parity. Current builds still ship a Dawn
sidecar where proc resolution requires it.
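
The bridge selection described above can be summarized as a small decision function. This is a sketch of the README's description, not the package's loader: describeBridge and its return labels are hypothetical, and the only runtime check used is process.versions.bun, which Bun sets and Node does not.

```javascript
// Hypothetical summary of the runtime notes above; the labels are descriptive,
// not identifiers exposed by the package.
function describeBridge({ isBun, platform }) {
  if (!isBun) return "addon";                // Node: addon-backed path
  if (platform === "linux") return "ffi";    // Bun on Linux: package FFI surface
  if (platform === "darwin") return "addon"; // Bun on macOS: full addon-backed path
  return "unspecified";                      // not covered by this README
}

const current = describeBridge({
  isBun: typeof process.versions.bun === "string",
  platform: process.platform,
});
console.log(current);
```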
The Doe runtime is Fawn's Zig-first WebGPU implementation with explicit profile
and quirk binding, a native WGSL pipeline (lexer -> parser -> semantic
analysis -> IR -> backend emitters), and explicit Vulkan/Metal/D3D12
execution paths in one system.
Optional -Dlean-verified=true builds use Lean 4 as build-time proof support,
not as a runtime interpreter. When a condition is proved ahead of time, the Doe
runtime can remove that branch instead of re-checking it on every command;
package consumers should not assume that path by default.
Verify your install
```shell
npm run smoke
npm test
npm run test:bun
```

npm run smoke checks native library loading and a GPU round-trip. npm test
covers the Node package contract and a packed-tarball export/import check.
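
Beyond the packaged scripts, a common way to sanity-check GPU output in your own code is to compare it with a CPU reference within a small tolerance, since f32 accumulation can differ from the double-precision arithmetic JavaScript uses. A minimal sketch for the row-major layout used in the matmul routines example; matmulReference and maxAbsDiff are hypothetical helpers, not package APIs.

```javascript
// CPU reference for row-major C[M x N] = A[M x K] * B[K x N].
function matmulReference(lhs, rhs, M, K, N) {
  const out = new Float32Array(M * N);
  for (let row = 0; row < M; row++) {
    for (let col = 0; col < N; col++) {
      let acc = 0;
      for (let i = 0; i < K; i++) {
        acc += lhs[row * K + i] * rhs[i * N + col];
      }
      out[row * N + col] = acc;
    }
  }
  return out;
}

// Largest element-wise absolute difference between two equal-length arrays.
function maxAbsDiff(a, b) {
  if (a.length !== b.length) throw new RangeError("length mismatch");
  let max = 0;
  for (let i = 0; i < a.length; i++) {
    max = Math.max(max, Math.abs(a[i] - b[i]));
  }
  return max;
}

// Small worked case: [[1, 2], [3, 4]] x [[5, 6], [7, 8]] = [[19, 22], [43, 50]]
const expected = matmulReference(
  Float32Array.of(1, 2, 3, 4),
  Float32Array.of(5, 6, 7, 8),
  2, 2, 2,
);
console.log(Array.from(expected)); // [ 19, 22, 43, 50 ]
```

Applied to the matmul example, maxAbsDiff(result, matmulReference(lhs, rhs, M, K, N)) should be small; the acceptable tolerance depends on the workload and is left to the caller.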
Caveats
- This is a headless package, not a browser DOM/canvas package.
- @simulatte/webgpu/compute is intentionally narrower than the default full surface.
- Bun currently uses a platform-dependent bridge layer under the same package contract: FFI on Linux, full/addon-backed on macOS. Package-surface contract tests are green, and package benchmark rows are positioning data rather than the source of truth for strict backend-native Doe-vs-Dawn claims.
