paraweb-js v1.0.0
ParaWeb
ParaWeb is a TypeScript library of ten parallel programming patterns for Node.js Worker Threads, browser Web Workers, and WebGPU compute shaders. Each pattern exposes three implementation variants (Message Passing, SharedArrayBuffer, GPU) under a single calling convention, so switching between them is a single property access.
Live demos: https://paraweb-js.vercel.app — interactive per-pattern demos, an all-pattern benchmark dashboard, and an image-convolution case study, running entirely in the browser on MP / Shared / GPU.
Patterns
| Pattern | Description | Default variant |
|---|---|---|
| map | Transform each element | Shared |
| filter | Keep elements matching a predicate | Shared |
| reduce | Aggregate to a single value (associative operator) | MP |
| scan | Inclusive prefix scan (associative operator) | MP |
| scatter | Redistribute values by an index array | MP |
| farm | Distribute variable-cost tasks across workers | Shared |
| pipeline | Sequential stages with intra-stage data parallelism | MP |
| divideAndConquer | Recursive problem decomposition | MP |
| stencil | Neighborhood computation with overlap regions | MP |
| mapReduce | Fused map then reduce | Shared |
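For the patterns whose semantics are less familiar, the following sequential reference implementations sketch what each one computes. These plain-loop versions are illustrative only and are not part of the paraweb-js API:

```typescript
// Inclusive prefix scan: out[i] = op(out[i - 1], xs[i]), seeded with identity.
function scanSeq(
  op: (a: number, b: number) => number,
  xs: number[],
  identity: number,
): number[] {
  const out: number[] = [];
  let acc = identity;
  for (const x of xs) {
    acc = op(acc, x);
    out.push(acc);
  }
  return out;
}

// Pipeline: each element flows through the stages left to right.
function pipelineSeq(
  stages: Array<(v: number) => number>,
  xs: number[],
): number[] {
  return xs.map((x) => stages.reduce((v, stage) => stage(v), x));
}

console.log(scanSeq((a, b) => a + b, [1, 2, 3, 4], 0)); // [1, 3, 6, 10]
console.log(pipelineSeq([(x) => x + 1, (x) => x * 2], [1, 2])); // [4, 6]
```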
Installation
```sh
npm install paraweb-js
```

The Shared variants require `SharedArrayBuffer`. In Node.js this is available by default. In the browser, the page must be served with cross-origin isolation headers (`Cross-Origin-Opener-Policy: same-origin` and `Cross-Origin-Embedder-Policy: require-corp`).
The GPU variants require WebGPU. In modern browsers (Chrome 113+, Edge 113+, Firefox 121+, Safari 18+) this is built in. In Node.js, install the optional webgpu peer dependency (Dawn-based).
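A quick way to check at runtime which backends the current environment supports. This is an illustrative sketch, not paraweb-js API (the library performs its own detection internally):

```typescript
// Probe the globals each backend depends on, without assuming a DOM.
const g = globalThis as Record<string, unknown>;

// Shared variants need SharedArrayBuffer...
const hasShared = typeof SharedArrayBuffer !== "undefined";

// ...and, in browsers, cross-origin isolation; crossOriginIsolated is
// undefined in Node, where no isolation headers are needed.
const isolated =
  g.crossOriginIsolated === undefined || g.crossOriginIsolated === true;

// GPU variants need WebGPU: navigator.gpu in browsers (or via the
// optional webgpu package in Node).
const nav = g.navigator as { gpu?: unknown } | undefined;
const hasGPU = nav !== undefined && "gpu" in nav;

console.log({ sharedUsable: hasShared && isolated, hasGPU });
```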
Quick start
Switching between the three variants is a single property access. The call site is otherwise identical.
```ts
import paraweb from "paraweb-js";

const f = (x: number) => Math.sin(x) * Math.cos(x * 0.5);
const input = Array.from({ length: 1_000_000 }, (_, i) => i);

const y1 = await paraweb.mp.map(f, input);     // Worker Threads, postMessage
const y2 = await paraweb.shared.map(f, input); // SharedArrayBuffer, zero-copy
const y3 = await paraweb.gpu.map(              // WebGPU compute shader
  { wgsl: "sin(x) * cos(x * 0.5)" }, input);
```

The bare `paraweb.<pattern>` entry points select the empirically best variant per pattern (see the paper's Section 6 for the full evaluation):

```ts
const y = await paraweb.map(f, input); // uses Shared internally
```

Composition
Patterns return Promises and compose with ordinary JavaScript control flow:
```ts
import paraweb from "paraweb-js";

// pixels: a number[] of image samples, defined elsewhere
const denoise = (x: number) => 0.25 * x + 0.5 * x + 0.25 * x;
const features = (x: number) => Math.tanh(x);
const classify = (x: number) => (x > 0.5 ? 1 : 0);

const result = await paraweb.pipeline(
  [denoise, features, classify], pixels, /* threads */ 16);
```

Public API
```ts
// Data-parallel
paraweb.map(fn, input, threads?);
paraweb.filter(pred, input, threads?);
paraweb.reduce(op, input, identity, threads?);
paraweb.scan(op, input, identity, threads?);
paraweb.scatter(input, indices, default?, conflictFn?, threads?);

// Task-parallel
paraweb.farm(fn, input, threads?);
paraweb.pipeline(stages, input, threads?);
paraweb.divideAndConquer(divideFn, conquerFn, baseFn, input, threads?);

// Specialized
paraweb.stencil(fn, input, window, threads?, edgeOption?);
paraweb.mapReduce(mapFn, reduceOp, input, threads?);
```

Each call returns a Promise. `threads` defaults to the number of available CPU cores.
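Of the signatures above, scatter is the least self-explanatory. The sequential sketch below shows its assumed semantics, inferred from the parameter names: `out[indices[i]] = input[i]`, with `conflictFn` merging duplicate targets and `default` filling untouched slots (the output is sized like the input here for simplicity):

```typescript
// Sequential sketch of scatter semantics (assumption, not paraweb-js code).
function scatterSeq(
  input: number[],
  indices: number[],
  defaultValue: number,
  conflictFn: (a: number, b: number) => number,
): number[] {
  const out = new Array<number>(input.length).fill(defaultValue);
  const written = new Array<boolean>(input.length).fill(false);
  indices.forEach((target, i) => {
    // First write lands directly; later writes to the same slot are merged.
    out[target] = written[target] ? conflictFn(out[target], input[i]) : input[i];
    written[target] = true;
  });
  return out;
}

// Elements 10 and 30 both target index 2; Math.max resolves the conflict.
console.log(scatterSeq([10, 20, 30], [2, 0, 2], 0, Math.max)); // [20, 0, 30]
```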
Constraints
- The user-supplied function passed to a worker must be self-contained. Functions are reconstructed inside workers via `new Function()`, which does not preserve the caller's lexical scope. References to external helpers or to variables captured from the enclosing scope will throw at execution time. Inline the helper logic inside the function body, or pass additional data as part of the input array.
- The Shared variants restrict input to numeric arrays (`Float64Array`-compatible) because `SharedArrayBuffer` requires typed-array views.
- The GPU variants accept a WGSL expression string or a built-in operation name instead of a JavaScript function, because the body executes on the device.
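The first constraint can be demonstrated with plain `new Function()`. The sketch below simulates what happens when a capturing function is serialized and rebuilt, the way a worker would receive it:

```typescript
// Simulate worker-side reconstruction: serialize a function's source and
// rebuild it with new Function(), which drops the original lexical scope.
function rebuild(fn: (x: number) => number): (x: number) => number {
  return new Function(`return (${fn.toString()});`)() as (x: number) => number;
}

function demo(): [number, boolean] {
  const scale = 2;
  const capturing = (x: number) => x * scale; // captures `scale`
  const selfContained = (x: number) => x * 2; // inlines the constant

  const ok = rebuild(selfContained)(3); // works after rebuild
  let threw = false;
  try {
    rebuild(capturing)(3); // ReferenceError: scale is not defined
  } catch {
    threw = true;
  }
  return [ok, threw];
}

console.log(demo()); // [6, true]
```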
Building from source
```sh
npm install
npm run build
npm run test:functional
```

To run the benchmark suite from the paper:

```sh
npm run test:benchmark
```

Results
The figures below summarise the performance evaluation reported in the paper. The full evaluation is reproducible from this repository: run npm run test:benchmark for Node.js results and open the browser benchmark dashboard for the in-browser numbers.
Figure 2 — Speedup across all ten patterns (Node.js, CPU variants): speedup over a sequential JavaScript baseline, log-scale y-axis, plotted across data sizes (Medium / Large / Extremely Large) and thread counts (2-16). Most compute-bound patterns reach 9-12x at 16 threads on the Shared variant; reduce/scan are bandwidth-bound below an arithmetic-intensity threshold; divideAndConquer plateaus at ~2.7x because of FFT's intrinsic cross-stage synchronization.

Figure 3 — Real-time image convolution case study (3840x2160): speedup over a sequential CPU baseline as the kernel radius grows from 1 to 20 on a 4K image. The non-separable emboss filter reaches up to 414x on WebGPU at radius 20, where the GPU's 5,120 parallel lanes absorb the quadratic per-pixel cost.
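A back-of-the-envelope check on the quadratic per-pixel cost mentioned above (arithmetic only; the speedup figures themselves come from the paper):

```typescript
// A non-separable convolution with kernel radius r touches (2r + 1)^2
// taps per pixel, so per-pixel cost grows quadratically with r.
const taps = (r: number): number => (2 * r + 1) ** 2;

console.log(taps(1));  // 9
console.log(taps(20)); // 1681

// At 4K (3840 x 2160 = ~8.3M pixels), radius 20 means roughly
// 13.9 billion multiply-adds per frame — the load the GPU absorbs.
console.log(3840 * 2160 * taps(20)); // 13942886400
```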

Citing
If you use ParaWeb in academic work, please cite:
Memeti, S. ParaWeb: Parallel Programming Patterns for Web Development. International Journal of Parallel Programming, HLPP 2026 special issue (forthcoming).

License
MIT © Suejb Memeti
