tfjs-turbo
v2.0.0
High-performance TensorFlow.js bridge for Node.js and Bun via WebGPU/WebGL/WASM
💡 Alternative to tfjs-node. Native bindings can be brittle across OS/Node versions. tfjs-turbo provides GPU acceleration through headless Chrome instead.
Why tfjs-turbo?
- No native bindings hassle - tfjs-node relies on `tfjs_binding.node`, which can fail on some OS/Node ABI combos
- GPU acceleration - WebGPU, WebGL, and WASM backends via Chrome
- Universal runtime - Works on Node.js 18+ and Bun 1.0+
- Checkpoint & Resume - Never lose training progress
- Callback bridge - Real training events in Node.js
- Auto backend selection - Picks the best backend for your workload
tfjs-node vs tfjs-turbo
```bash
# Check current tfjs-node status yourself:
npm view @tensorflow/tfjs-node version
npm view @tensorflow/tfjs-node deprecated
```

| Aspect | tfjs-node | tfjs-turbo |
|--------|-----------|------------|
| Install | Requires native build (node-gyp) | Pure JS (Puppeteer) |
| GPU | CUDA only (Linux) | WebGPU/WebGL (cross-platform) |
| Maintenance | Prebuilt binaries may be missing | Works anywhere Chrome runs |
| Bun | ❌ Not supported | ✅ Full support |
Installation
```bash
npm install tfjs-turbo
# or
bun add tfjs-turbo
```

Quick Start
```javascript
import { TensorFlow } from 'tfjs-turbo';

const tf = new TensorFlow({ backend: 'wasm' });
await tf.ready();

// Train a model
const result = await tf.train({
  layers: [
    { type: 'dense', units: 64, activation: 'relu', inputShape: [10] },
    { type: 'dense', units: 1, activation: 'sigmoid' }
  ],
  compile: { optimizer: 'adam', loss: 'binaryCrossentropy' },
  data: { samples: 1000, features: 10, type: 'binary' },
  fit: { epochs: 10, verbose: true }
});

// Save the model
await tf.save(result.model, './my-model');

await tf.close();
```

Features
🎯 Backend Options
| Backend | Best For | Speed |
|---------|----------|-------|
| wasm | Small/medium models, CPU-only | ⭐⭐⭐⭐ |
| webgl | Medium models, GPU acceleration | ⭐⭐⭐ |
| webgpu | Large models, modern GPU | ⭐⭐⭐⭐⭐ |
| cpu | Debugging, compatibility | ⭐⭐ |
| auto | Auto-select based on workload | Smart! |
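To give a feel for what `auto` does, a workload-aware selection rule can be sketched as a plain function. This is a hedged illustration only: the thresholds and the `pickBackend` helper are hypothetical assumptions, not tfjs-turbo's actual heuristic.

```javascript
// Hypothetical sketch of an auto-backend heuristic: prefer WebGPU for large
// models when available, WebGL for medium GPU workloads, otherwise WASM.
// Thresholds and availability flags are illustrative assumptions.
function pickBackend({ paramCount, hasWebGPU, hasWebGL }) {
  if (paramCount > 5_000_000 && hasWebGPU) return 'webgpu';
  if (paramCount > 500_000 && hasWebGL) return 'webgl';
  return 'wasm'; // small models: WASM avoids GPU transfer overhead
}

console.log(pickBackend({ paramCount: 10_000_000, hasWebGPU: true, hasWebGL: true })); // → 'webgpu'
console.log(pickBackend({ paramCount: 100_000, hasWebGPU: true, hasWebGL: true }));   // → 'wasm'
```

The general trade-off: GPU backends amortize transfer overhead only once the model is large enough, which is why small models often run fastest on WASM.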
💾 Checkpoint & Resume
Training interrupted? Resume exactly where you left off:
```javascript
// First training session
const result = await tf.train({
  layers: [...],
  fit: { epochs: 50 },
  checkpoint: {
    enabled: true,
    everyNEpoch: 5,         // Save every 5 epochs
    key: 'my-training',     // Unique identifier
    includeOptimizer: true, // Save optimizer state
    resumePolicy: 'same-bucket' // Validate architecture match
  }
});

// Later... resume from checkpoint
const resumed = await tf.resume({
  layers: [...], // Same architecture
  fit: { epochs: 100 },
  checkpoint: { key: 'my-training' }
});
// Automatically continues from last saved epoch!
```

Resume Policies:
- `same-bucket` (default) - Validates that architecture and bucket shape match
- `force` - Resume without validation (use with caution)
⚠️ IMPORTANT: Checkpoints are stored in IndexedDB inside Chrome. To persist checkpoints between sessions, you MUST set `profileDir`:

```javascript
const tf = new TensorFlow({
  profileDir: './my-training-profile' // Enables checkpoint persistence
});
```
📡 Callback Bridge
Get real training events in Node.js:
```javascript
await tf.train({
  layers: [...],
  callbacks: {
    onTrainBegin: (logs) => console.log('Training started!'),
    onEpochBegin: (epoch) => console.log(`Epoch ${epoch + 1} starting...`),
    onEpochEnd: (epoch, logs) => {
      console.log(`Epoch ${epoch + 1}: loss=${logs.loss.toFixed(4)}`);
    },
    onBatchEnd: (batch, logs) => {
      // Progress tracking
    },
    onTrainEnd: (logs) => console.log(`Done in ${logs.time}ms`)
  }
});
```

📦 Bucket System (Text/Image/Audio)
Efficiently batch variable-length inputs:
```javascript
import { BucketManager } from 'tfjs-turbo';

const buckets = new BucketManager({
  textBuckets: [256, 512, 1024, 2048],
  imageBuckets: [[224, 224], [384, 384], [512, 512]],
  audioBuckets: [16000, 32000, 64000],
  maxCachedModels: 4 // LRU cache
});

// Select bucket for input
const textBucket = buckets.selectTextBucket(350);        // → 512
const imageBucket = buckets.selectImageBucket(300, 400); // → [512, 512]

// Pad sequences
const { padded, mask } = BucketManager.padSequence([1, 2, 3], 10);
```

⚙️ Configuration
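The bucketing rule above (smallest bucket that fits, then pad with a mask) can be sketched in plain JavaScript. These `selectTextBucket`/`padSequence` stand-ins mirror the API shape for illustration only; they are not the library's implementation.

```javascript
// Minimal sketch of smallest-bucket-that-fits selection and mask padding.
// Stand-ins for illustration, not BucketManager's actual implementation.
function selectTextBucket(buckets, length) {
  // Pick the smallest bucket that can hold the sequence; fall back to the largest.
  const fit = buckets.filter((b) => b >= length);
  return fit.length > 0 ? Math.min(...fit) : Math.max(...buckets);
}

function padSequence(seq, targetLength, padValue = 0) {
  // Right-pad to targetLength; mask marks real tokens (1) vs padding (0).
  const padLen = Math.max(0, targetLength - seq.length);
  const padded = seq.concat(Array(padLen).fill(padValue)).slice(0, targetLength);
  const mask = seq.map(() => 1).concat(Array(padLen).fill(0)).slice(0, targetLength);
  return { padded, mask };
}

console.log(selectTextBucket([256, 512, 1024, 2048], 350)); // → 512
console.log(padSequence([1, 2, 3], 5).padded);              // → [1, 2, 3, 0, 0]
```

Bucketing trades a little padding waste for far fewer distinct input shapes, which is what makes the LRU model cache effective.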
```javascript
const tf = new TensorFlow({
  // Backend
  backend: 'wasm',    // 'webgpu' | 'webgl' | 'wasm' | 'cpu' | 'auto'
  version: 'latest',  // TensorFlow.js version

  // Stability
  timeout: 60000,     // Navigation timeout (ms)
  protocolTimeout: 0, // CDP timeout (0 = unlimited)
  profileDir: null,   // Chrome profile for persistent IndexedDB

  // Memory
  memory: 'auto',     // Max memory MB ('auto' = 4GB, max 8GB)
  threads: 'auto',    // WASM threads ('auto' = CPU cores)

  // Performance
  simd: true,         // WASM SIMD acceleration
  turboMode: false,   // Aggressive GPU flags (may be unstable)
  autoScope: true,    // Auto memory management

  // Debug
  verbose: false,
  debug: false,
  headless: true
});
```

🔍 Extended Info
```javascript
const info = await tf.getInfo();
console.log(info);
// {
//   backend: 'wasm',
//   version: '4.22.0',
//   memory: { numTensors: 5, numBytes: 1024, ... },
//   flags: { ... },
//   uptime: 12345,
//   actualBackend: 'wasm',
//   webgpu: { vendor, architecture, device } // if webgpu
// }
```

🌊 Streaming Predict
Memory-efficient prediction for large datasets:
```javascript
const inputs = Array.from({ length: 10000 }, () => [1, 2, 3, 4, 5]);

for await (const batch of tf.predictStream(model, inputs, { batchSize: 256 })) {
  console.log(`Batch ${batch.batchIndex + 1}/${batch.totalBatches}`);
  console.log(`Progress: ${(batch.progress * 100).toFixed(0)}%`);
  // Process batch.predictions
}
```

API Reference
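The chunking behind a streaming API of this shape can be sketched as an async generator. This is an assumption-laden illustration: `predictInBatches` and the `runBatch` callback are hypothetical stand-ins, not tfjs-turbo internals.

```javascript
// Sketch of batched streaming: yields one result object per chunk with
// progress metadata, similar in shape to predictStream's batches.
// runBatch is a hypothetical stand-in for the real inference call.
async function* predictInBatches(inputs, batchSize, runBatch) {
  const totalBatches = Math.ceil(inputs.length / batchSize);
  for (let i = 0; i < totalBatches; i++) {
    const chunk = inputs.slice(i * batchSize, (i + 1) * batchSize);
    yield {
      batchIndex: i,
      totalBatches,
      progress: (i + 1) / totalBatches,
      predictions: await runBatch(chunk)
    };
  }
}

// Usage with a dummy "model" that sums each input vector:
const sumBatch = async (chunk) => chunk.map((v) => v.reduce((a, b) => a + b, 0));
(async () => {
  for await (const batch of predictInBatches([[1, 2], [3, 4], [5, 6]], 2, sumBatch)) {
    console.log(batch.batchIndex, batch.progress, batch.predictions);
  }
})();
```

Yielding per chunk keeps only one batch of predictions in memory at a time, which is the point of streaming over a single large `predict` call.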
TensorFlow Class
| Method | Description |
|--------|-------------|
| ready() | Initialize backend and browser |
| train(config) | Train a model |
| resume(config) | Resume training from checkpoint |
| predict(model, inputs) | Run inference |
| predictStream(model, inputs, opts) | Streaming inference |
| evaluate(model, x, y, opts) | Evaluate model |
| save(model, path) | Save to filesystem |
| load(path) | Load from filesystem |
| summary(model) | Get model summary |
| run(code) | Execute raw TF.js code |
| warmup(iterations) | JIT warmup |
| benchmark(opts) | Backend comparison |
| getInfo() | Get runtime info |
| disposeAll() | Clean all tensors |
| close() | Shutdown |
BucketManager Class
| Method | Description |
|--------|-------------|
| selectTextBucket(length) | Select bucket for token sequence |
| selectImageBucket(h, w) | Select bucket for image dimensions |
| selectAudioBucket(samples) | Select bucket for audio samples |
| setInCache(arch, bucket, backend, model) | Cache a model |
| getFromCache(arch, bucket, backend) | Retrieve cached model |
| getStats() | Cache statistics |
Examples
The examples/ folder contains comprehensive examples that double as tests:
| File | Description |
|------|-------------|
| 01_basic_lifecycle.js | Lifecycle: ready, close, getInfo |
| 02_training_custom_data.js | XOR training with custom data + object optimizer |
| 03_training_callbacks.js | Callback bridge (onEpochEnd, onTrainEnd) |
| 04_checkpoint_resume.js | Real crash simulation: checkpoint survives process death |
| 05_model_io.js | Save/Load model + summary |
| 06_inference.js | predict, predictStream, evaluate |
| 07_advanced_run_tidy.js | run() and tidy() with { tf } destructuring |
| 08_memory_management.js | Memory tracking and leak detection |
| 09_data_utils.js | Data preprocessing utilities |
| 10_bucket_manager.js | Shape bucketing for variable inputs |
| 11_profiler_utilities.js | Profiler, EarlyStopping, LR schedulers |
| 12_constants_exports.js | All exports verification |
| run_all.js | CI runner - fail-fast test suite |
```bash
# Run all examples (CI mode)
bun examples/run_all.js
```

Known Limitations
- WebGPU context lost - Long training may trigger GPU reset. Use checkpoints!
- Large weights transfer - Very large models (>500MB) may be slow via CDP
- Browser required - Needs Chrome/Chromium (Puppeteer handles this)
- RAM overhead - Chrome process uses ~100-300MB baseline RAM
Memory Usage Guide
| State | Expected RAM |
|-------|--------------|
| Idle (after ready) | ~150-200MB |
| Small model training | ~200-400MB |
| Large model training | ~400-800MB |
| After close() | ~0MB (browser terminated) |
Best practices:
- Call `disposeAll()` between training runs
- Use `close()` when done
- Enable checkpoints for long training
- Monitor with `trackMemory()` + `getMemoryReport()`
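The monitoring pattern above amounts to comparing tensor counts before and after a workload. A minimal sketch, assuming only a counter callback (the tracker here is a stand-in, not the library's `trackMemory` utility):

```javascript
// Minimal leak-detection sketch: snapshot a tensor count before and after a
// workload and report growth. getNumTensors is a stand-in for reading
// something like tf.getInfo() memory stats.
function makeTracker(getNumTensors) {
  let baseline = getNumTensors();
  return {
    reset() { baseline = getNumTensors(); },
    report() {
      const now = getNumTensors();
      return { baseline, current: now, leaked: now - baseline };
    }
  };
}

// Simulated tensor counter for demonstration:
let count = 5;
const tracker = makeTracker(() => count);
count = 9; // pretend a training run left 4 tensors behind
console.log(tracker.report()); // → { baseline: 5, current: 9, leaked: 4 }
```

A steadily growing `leaked` value across runs is the usual signal to add `disposeAll()` calls or enable `autoScope`.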
Troubleshooting
"WebGPU not available"
WebGPU requires Chrome 113+. On Linux, pass the `--enable-unsafe-webgpu` flag:
```javascript
const tf = new TensorFlow({
  backend: 'webgpu',
  browserArgs: ['--enable-unsafe-webgpu']
});
```

Fallback: Use `backend: 'webgl'` or `backend: 'wasm'`.
"CDP timeout" or "Protocol timeout"
For long training sessions, set unlimited timeout:
```javascript
const tf = new TensorFlow({ protocolTimeout: 0 });
```

"No usable sandbox"
A Puppeteer sandbox issue; tfjs-turbo already launches Chrome with the `--no-sandbox` flag.
If issues persist on Linux, set: `CHROME_DEVEL_SANDBOX=/usr/local/sbin/chrome-devel-sandbox`
"Context lost" during training
GPU driver reset. Use checkpoints to recover:
```javascript
const tf = new TensorFlow({ backend: 'webgpu' });

await tf.train({
  // ...
  checkpoint: { enabled: true, everyNEpoch: 5, key: 'my-training' }
});
```

High RAM usage
- tfjs-turbo runs Chrome + TF.js = ~300-500MB baseline
- Use `backend: 'wasm'` for a lower memory footprint
- Call `await tf.disposeAll()` to clear tensors
- Reduce `batchSize` for large models
Security Note
⚠️ tf.run() executes arbitrary code via eval()
DO NOT:
- Pass user input to `tf.run()`
- Run untrusted code
- Expose `tf.run()` to external APIs

SAFE:

```javascript
// Your own trusted code only
await tf.run(async ({ tf }) => {
  const a = tf.tensor([1, 2, 3]);
  return a.sum().dataSync()[0];
});
```

Note: Benchmark utilities also use internal `eval` for controlled scripts (no untrusted input).
Roadmap
- [x] Checkpoint & Resume (IndexedDB)
- [x] Callback Bridge
- [x] Bucket System
- [x] Auto Backend Selection
- [x] Streaming Predict
- [x] Memory Leak Detector
- [ ] Workload-aware AutoTuner
- [ ] Filesystem Checkpoint Export (save to disk, resume from IDB)
TypeScript Support
Full TypeScript definitions included:
```typescript
import { TensorFlow, TrainConfig, ModelData } from 'tfjs-turbo';

const tf = new TensorFlow({ backend: 'wasm' });
// Full autocomplete and type safety!
```

Contributing
Contributions welcome! Please open an issue first.
License
MIT
