@cooljapan/trustformers
v0.1.0
Published
WebAssembly bindings for TrustformeRS transformer library
Maintainers
Readme
TrustformeRS WebAssembly
WebAssembly bindings for the TrustformeRS transformer library, enabling transformer models to run directly in web browsers and Node.js environments with full WebGPU hardware acceleration.
Version: 0.1.0 | Status: Stable | Tests: 128 | SLoC: 55,504 | Last Updated: 2026-03-21
Features
- WebGPU Backend: 50-100x speedup over CPU via GPU compute shaders (wgpu 29.0 API)
- Web Workers Parallelism: Multi-threaded inference via SharedArrayBuffer
- IndexedDB Caching: Persistent model and KV-cache storage in the browser
- BERT WASM Model: Complete BERT implementation running in-browser
- React/Vue/Angular/Web Components: First-class framework bindings
- Streaming Inference: Token-by-token generation with streaming API
- SIMD Support: Hardware-accelerated tensor ops where available
- Mobile Optimization: Battery-aware, network-adaptive loading
- SciRS2 Integration: scirs2-core tensor operations in WASM
Building
Prerequisites
- Rust (latest stable)
- wasm-pack (
curl https://rustwasm.github.io/wasm-pack/installer/init.sh -sSf | sh)
Build Commands
# Build for all targets
./build.sh
# Or build individually:
wasm-pack build --target web --out-dir pkg-web
wasm-pack build --target bundler --out-dir pkg-bundler
wasm-pack build --target nodejs --out-dir pkg-nodeUsage
Browser (Direct)
<script type="module">
import init, { TrustformersWasm, WasmTensor } from './pkg-web/trustformers_wasm.js';
async function run() {
await init();
const tf = new TrustformersWasm();
console.log('Version:', tf.version); // "0.1.0"
// Create and manipulate tensors
const tensor = WasmTensor.new([1, 2, 3, 4], [2, 2]);
const result = tensor.add(tensor);
console.log('Result:', result.data);
}
run();
</script>Node.js
const { TrustformersWasm, WasmTensor } = require('./pkg-node/trustformers_wasm.js');
const tf = new TrustformersWasm();
const tensor = WasmTensor.new([1, 2, 3, 4], [2, 2]);
console.log(tensor.toString());Webpack/Bundler
import * as wasm from './pkg-bundler/trustformers_wasm';
async function run() {
await wasm.default();
const tf = new wasm.TrustformersWasm();
// Use the library...
}API Overview
Core Classes
TrustformersWasm
Main entry point for the library.
const tf = new TrustformersWasm();
console.log(tf.version); // "0.1.0"
console.log(tf.initialized); // trueWasmTensor
Core tensor operations.
// Creation
const a = WasmTensor.new([1, 2, 3, 4], [2, 2]);
const b = WasmTensor.zeros([3, 3]);
const c = WasmTensor.ones([2, 4]);
const d = WasmTensor.randn([5, 5]);
// Operations
const sum = a.add(b);
const prod = a.matmul(b);
const transposed = a.transpose();
// Activations
const relu_out = a.relu();
const gelu_out = a.gelu();
const softmax_out = a.softmax(-1);Linear
Fully connected layer.
const linear = new Linear(input_size, output_size, use_bias);
const output = linear.forward(input_tensor);BertModelWasm
BERT model running entirely in WASM.
const config = BertConfig.tiny();
const model = new BertModelWasm(config);
const output = model.forward(input_ids, attention_mask);WebGPU Backend
import { WebGpuInference, StreamingGenerator } from './pkg-web/trustformers_wasm.js';
// Initialize WebGPU (50-100x speedup vs CPU)
const inference = await WebGpuInference.new();
// Streaming token generation
const generator = new StreamingGenerator(inference, model_id);
for await (const token of generator.stream(prompt)) {
process.stdout.write(token);
}Framework Bindings
React
import { useTrustformers, TrustformersProvider } from 'trustformers-react';
function App() {
const { model, generate, isLoading } = useTrustformers('bert-base');
return (
<TrustformersProvider>
<InferenceComponent model={model} onGenerate={generate} />
</TrustformersProvider>
);
}Vue
import { useTrustformers } from 'trustformers-vue';
export default {
setup() {
const { model, tokenizer, generate } = useTrustformers('bert-base');
return { model, generate };
}
}Angular
import { TrustformersService } from 'trustformers-angular';
@Injectable({ providedIn: 'root' })
export class AppComponent {
constructor(private tf: TrustformersService) {}
async generate(prompt: string) {
return this.tf.generate(prompt).pipe(toArray()).toPromise();
}
}Web Components
<trustformers-inference-engine model="bert-base"></trustformers-inference-engine>
<trustformers-model-loader src="./models/bert.bin"></trustformers-model-loader>
<trustformers-performance-monitor></trustformers-performance-monitor>Utilities
// Performance measurement
const timer = new Timer("My Operation");
// ... do work ...
console.log(`Elapsed: ${timer.elapsed()}ms`);
// Memory statistics
const stats = get_memory_stats();
console.log(`Memory used: ${stats.used_mb} MB`);
// Feature detection
console.log(`SIMD enabled: ${enable_simd()}`);
console.log(`Features: ${features()}`);Feature Flags
webgpu— WebGPU compute shader backend (wgpu 29.0)web-workers— Web Workers parallelism (SharedArrayBuffer)shared-memory— Shared memory for multi-threaded WASMkernel-fusion— Fused transformer kernels (MHA, FFN, LayerNorm+Residual)async-executor— Async Rust executor for WASMindexeddb— IndexedDB model and KV-cache persistencememory64— WASM memory64 for models >4GBstreaming-loader— Progressive chunked model loadingreact-components— React hooks and component libraryvue-components— Vue composables and pluginangular-components— Angular services and directivesweb-components— Framework-agnostic custom elementsplayground— Interactive browser playgroundstreaming-generation— Token-by-token streaming inferencemobile-optimization— Battery/network-adaptive loadingscirs2— SciRS2-core tensor operations
WebGPU Notes (wgpu 29.0)
This crate targets wgpu 29.0 with the following API specifics:
InstanceDescriptor::new_without_display_handle()— headless instance creationbind_group_layoutsaccepts&[Option<&BindGroupLayout>]for sparse layouts- Kernel fusion enabled for MHA (2.5x), FFN (1.8x), LayerNorm+Residual (1.5x)
Examples
See the examples/ directory for complete examples:
index.html/playground.html— Interactive browser demodemo/— Full-featured playground application- Node.js example in
examples/
Performance Tips
- Enable WebGPU: Use Chrome 113+ / Edge 113+ for 50-100x speedup
- Enable SIMD: Compile with WASM SIMD128 target feature
- Batch operations: Process multiple inputs together
- Use IndexedDB caching: Avoid re-downloading models between sessions
- Enable kernel fusion:
webgpu+kernel-fusionfeatures - Reuse tensors: Minimize allocations in hot loops
Testing
# Run WASM tests
wasm-pack test --headless --firefox --chrome
# Run with specific features
cargo test --target wasm32-unknown-unknown --features webgpu
# Check compilation
cargo check --target wasm32-unknown-unknown128 unit tests with 100% pass rate, covering:
- Core tensor operations
- WebGPU backend (mock device)
- BERT forward pass
- Framework binding contracts
- Streaming generation
- IndexedDB model cache
Limitations
- WebGPU requires Chrome 113+, Edge 113+, or Safari (experimental)
- SharedArrayBuffer requires cross-origin isolation headers
- SIMD requires WASM SIMD128 browser support
- Memory typically capped at 2-4GB (use
memory64+ quantization for large models)
License
Apache-2.0
