tfjs-turbo
v2.0.0
High-performance TensorFlow.js bridge for Node.js and Bun via WebGPU/WebGL/WASM
💡 Alternative to tfjs-node. Native bindings can be brittle across OS/Node versions. tfjs-turbo provides GPU acceleration through headless Chrome instead.
Why tfjs-turbo?
- No native bindings hassle - tfjs-node relies on `tfjs_binding.node`, which can fail on some OS/Node ABI combos
- GPU acceleration - WebGPU, WebGL, and WASM backends via Chrome
- Universal runtime - Works on Node.js 18+ and Bun 1.0+
- Checkpoint & Resume - Never lose training progress
- Callback bridge - Real training events in Node.js
- Auto backend selection - Picks the best backend for your workload
tfjs-node vs tfjs-turbo
```bash
# Check current tfjs-node status yourself:
npm view @tensorflow/tfjs-node version
npm view @tensorflow/tfjs-node deprecated
```

| Aspect | tfjs-node | tfjs-turbo |
|--------|-----------|------------|
| Install | Requires native build (node-gyp) | Pure JS (Puppeteer) |
| GPU | CUDA only (Linux) | WebGPU/WebGL (cross-platform) |
| Maintenance | Prebuilt binaries may be missing | Works anywhere Chrome runs |
| Bun | ❌ Not supported | ✅ Full support |
Installation
```bash
npm install tfjs-turbo
# or
bun add tfjs-turbo
```

Quick Start
```javascript
import { TensorFlow } from 'tfjs-turbo';

const tf = new TensorFlow({ backend: 'wasm' });
await tf.ready();

// Train a model
const result = await tf.train({
  layers: [
    { type: 'dense', units: 64, activation: 'relu', inputShape: [10] },
    { type: 'dense', units: 1, activation: 'sigmoid' }
  ],
  compile: { optimizer: 'adam', loss: 'binaryCrossentropy' },
  data: { samples: 1000, features: 10, type: 'binary' },
  fit: { epochs: 10, verbose: true }
});

// Save the model
await tf.save(result.model, './my-model');

await tf.close();
```

Features
🎯 Backend Options
| Backend | Best For | Speed |
|---------|----------|-------|
| wasm | Small/medium models, CPU-only | ⭐⭐⭐⭐ |
| webgl | Medium models, GPU acceleration | ⭐⭐⭐ |
| webgpu | Large models, modern GPU | ⭐⭐⭐⭐⭐ |
| cpu | Debugging, compatibility | ⭐⭐ |
| auto | Auto-select based on workload | Smart! |
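To give a feel for what `auto` does, a workload-aware selection rule can be sketched as a plain function. This is a hedged illustration only: the thresholds and the `pickBackend` helper are hypothetical assumptions, not tfjs-turbo's actual heuristic.

```javascript
// Hypothetical sketch of an auto-backend heuristic: prefer WebGPU for large
// models when available, WebGL for medium GPU workloads, otherwise WASM.
// Thresholds and availability flags are illustrative assumptions.
function pickBackend({ paramCount, hasWebGPU, hasWebGL }) {
  if (paramCount > 5_000_000 && hasWebGPU) return 'webgpu';
  if (paramCount > 500_000 && hasWebGL) return 'webgl';
  return 'wasm'; // small models: WASM avoids GPU transfer overhead
}

console.log(pickBackend({ paramCount: 10_000_000, hasWebGPU: true, hasWebGL: true })); // → 'webgpu'
console.log(pickBackend({ paramCount: 100_000, hasWebGPU: true, hasWebGL: true }));   // → 'wasm'
```

The general trade-off: GPU backends amortize transfer overhead only once the model is large enough, which is why small models often run fastest on WASM.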
💾 Checkpoint & Resume
Training interrupted? Resume exactly where you left off:
```javascript
// First training session
const result = await tf.train({
  layers: [...],
  fit: { epochs: 50 },
  checkpoint: {
    enabled: true,
    everyNEpoch: 5,         // Save every 5 epochs
    key: 'my-training',     // Unique identifier
    includeOptimizer: true, // Save optimizer state
    resumePolicy: 'same-bucket' // Validate architecture match
  }
});

// Later... resume from checkpoint
const resumed = await tf.resume({
  layers: [...], // Same architecture
  fit: { epochs: 100 },
  checkpoint: { key: 'my-training' }
});
// Automatically continues from last saved epoch!
```

Resume Policies:
- `same-bucket` (default) - Validates that architecture and bucket shape match
- `force` - Resume without validation (use with caution)
⚠️ IMPORTANT: Checkpoints are stored in IndexedDB inside Chrome. To persist checkpoints between sessions, you MUST set `profileDir`:

```javascript
const tf = new TensorFlow({
  profileDir: './my-training-profile' // Enables checkpoint persistence
});
```
📡 Callback Bridge
Get real training events in Node.js:
```javascript
await tf.train({
  layers: [...],
  callbacks: {
    onTrainBegin: (logs) => console.log('Training started!'),
    onEpochBegin: (epoch) => console.log(`Epoch ${epoch + 1} starting...`),
    onEpochEnd: (epoch, logs) => {
      console.log(`Epoch ${epoch + 1}: loss=${logs.loss.toFixed(4)}`);
    },
    onBatchEnd: (batch, logs) => {
      // Progress tracking
    },
    onTrainEnd: (logs) => console.log(`Done in ${logs.time}ms`)
  }
});
```

📦 Bucket System (Text/Image/Audio)
Efficiently batch variable-length inputs:
```javascript
import { BucketManager } from 'tfjs-turbo';

const buckets = new BucketManager({
  textBuckets: [256, 512, 1024, 2048],
  imageBuckets: [[224, 224], [384, 384], [512, 512]],
  audioBuckets: [16000, 32000, 64000],
  maxCachedModels: 4 // LRU cache
});

// Select bucket for input
const textBucket = buckets.selectTextBucket(350);        // → 512
const imageBucket = buckets.selectImageBucket(300, 400); // → [512, 512]

// Pad sequences
const { padded, mask } = BucketManager.padSequence([1, 2, 3], 10);
```

⚙️ Configuration
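The bucketing rule above (smallest bucket that fits, then pad with a mask) can be sketched in plain JavaScript. These `selectTextBucket`/`padSequence` stand-ins mirror the API shape for illustration only; they are not the library's implementation.

```javascript
// Minimal sketch of smallest-bucket-that-fits selection and mask padding.
// Stand-ins for illustration, not BucketManager's actual implementation.
function selectTextBucket(buckets, length) {
  // Pick the smallest bucket that can hold the sequence; fall back to the largest.
  const fit = buckets.filter((b) => b >= length);
  return fit.length > 0 ? Math.min(...fit) : Math.max(...buckets);
}

function padSequence(seq, targetLength, padValue = 0) {
  // Right-pad to targetLength; mask marks real tokens (1) vs padding (0).
  const padLen = Math.max(0, targetLength - seq.length);
  const padded = seq.concat(Array(padLen).fill(padValue)).slice(0, targetLength);
  const mask = seq.map(() => 1).concat(Array(padLen).fill(0)).slice(0, targetLength);
  return { padded, mask };
}

console.log(selectTextBucket([256, 512, 1024, 2048], 350)); // → 512
console.log(padSequence([1, 2, 3], 5).padded);              // → [1, 2, 3, 0, 0]
```

Bucketing trades a little padding waste for far fewer distinct input shapes, which is what makes the LRU model cache effective.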
```javascript
const tf = new TensorFlow({
  // Backend
  backend: 'wasm',    // 'webgpu' | 'webgl' | 'wasm' | 'cpu' | 'auto'
  version: 'latest',  // TensorFlow.js version

  // Stability
  timeout: 60000,     // Navigation timeout (ms)
  protocolTimeout: 0, // CDP timeout (0 = unlimited)
  profileDir: null,   // Chrome profile for persistent IndexedDB

  // Memory
  memory: 'auto',     // Max memory MB ('auto' = 4GB, max 8GB)
  threads: 'auto',    // WASM threads ('auto' = CPU cores)

  // Performance
  simd: true,         // WASM SIMD acceleration
  turboMode: false,   // Aggressive GPU flags (may be unstable)
  autoScope: true,    // Auto memory management

  // Debug
  verbose: false,
  debug: false,
  headless: true
});
```

🔍 Extended Info
```javascript
const info = await tf.getInfo();
console.log(info);
// {
//   backend: 'wasm',
//   version: '4.22.0',
//   memory: { numTensors: 5, numBytes: 1024, ... },
//   flags: { ... },
//   uptime: 12345,
//   actualBackend: 'wasm',
//   webgpu: { vendor, architecture, device } // if webgpu
// }
```

🌊 Streaming Predict
Memory-efficient prediction for large datasets:
```javascript
const inputs = Array.from({ length: 10000 }, () => [1, 2, 3, 4, 5]);

for await (const batch of tf.predictStream(model, inputs, { batchSize: 256 })) {
  console.log(`Batch ${batch.batchIndex + 1}/${batch.totalBatches}`);
  console.log(`Progress: ${(batch.progress * 100).toFixed(0)}%`);
  // Process batch.predictions
}
```

API Reference
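The chunking behind a streaming API of this shape can be sketched as an async generator. This is an assumption-laden illustration: `predictInBatches` and the `runBatch` callback are hypothetical stand-ins, not tfjs-turbo internals.

```javascript
// Sketch of batched streaming: yields one result object per chunk with
// progress metadata, similar in shape to predictStream's batches.
// runBatch is a hypothetical stand-in for the real inference call.
async function* predictInBatches(inputs, batchSize, runBatch) {
  const totalBatches = Math.ceil(inputs.length / batchSize);
  for (let i = 0; i < totalBatches; i++) {
    const chunk = inputs.slice(i * batchSize, (i + 1) * batchSize);
    yield {
      batchIndex: i,
      totalBatches,
      progress: (i + 1) / totalBatches,
      predictions: await runBatch(chunk)
    };
  }
}

// Usage with a dummy "model" that sums each input vector:
const sumBatch = async (chunk) => chunk.map((v) => v.reduce((a, b) => a + b, 0));
(async () => {
  for await (const batch of predictInBatches([[1, 2], [3, 4], [5, 6]], 2, sumBatch)) {
    console.log(batch.batchIndex, batch.progress, batch.predictions);
  }
})();
```

Yielding per chunk keeps only one batch of predictions in memory at a time, which is the point of streaming over a single large `predict` call.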
TensorFlow Class
| Method | Description |
|--------|-------------|
| ready() | Initialize backend and browser |
| train(config) | Train a model |
| resume(config) | Resume training from checkpoint |
| predict(model, inputs) | Run inference |
| predictStream(model, inputs, opts) | Streaming inference |
| evaluate(model, x, y, opts) | Evaluate model |
| save(model, path) | Save to filesystem |
| load(path) | Load from filesystem |
| summary(model) | Get model summary |
| run(code) | Execute raw TF.js code |
| warmup(iterations) | JIT warmup |
| benchmark(opts) | Backend comparison |
| getInfo() | Get runtime info |
| disposeAll() | Clean all tensors |
| close() | Shutdown |
BucketManager Class
| Method | Description |
|--------|-------------|
| selectTextBucket(length) | Select bucket for token sequence |
| selectImageBucket(h, w) | Select bucket for image dimensions |
| selectAudioBucket(samples) | Select bucket for audio samples |
| setInCache(arch, bucket, backend, model) | Cache a model |
| getFromCache(arch, bucket, backend) | Retrieve cached model |
| getStats() | Cache statistics |
Examples
The examples/ folder contains comprehensive examples that double as tests:
| File | Description |
|------|-------------|
| 01_basic_lifecycle.js | Lifecycle: ready, close, getInfo |
| 02_training_custom_data.js | XOR training with custom data + object optimizer |
| 03_training_callbacks.js | Callback bridge (onEpochEnd, onTrainEnd) |
| 04_checkpoint_resume.js | Real crash simulation: checkpoint survives process death |
| 05_model_io.js | Save/Load model + summary |
| 06_inference.js | predict, predictStream, evaluate |
| 07_advanced_run_tidy.js | run() and tidy() with { tf } destructuring |
| 08_memory_management.js | Memory tracking and leak detection |
| 09_data_utils.js | Data preprocessing utilities |
| 10_bucket_manager.js | Shape bucketing for variable inputs |
| 11_profiler_utilities.js | Profiler, EarlyStopping, LR schedulers |
| 12_constants_exports.js | All exports verification |
| run_all.js | CI runner - fail-fast test suite |
```bash
# Run all examples (CI mode)
bun examples/run_all.js
```

Known Limitations
- WebGPU context lost - Long training may trigger GPU reset. Use checkpoints!
- Large weights transfer - Very large models (>500MB) may be slow via CDP
- Browser required - Needs Chrome/Chromium (Puppeteer handles this)
- RAM overhead - Chrome process uses ~100-300MB baseline RAM
Memory Usage Guide
| State | Expected RAM |
|-------|--------------|
| Idle (after ready) | ~150-200MB |
| Small model training | ~200-400MB |
| Large model training | ~400-800MB |
| After close() | ~0MB (browser terminated) |
Best practices:
- Call `disposeAll()` between training runs
- Use `close()` when done
- Enable checkpoints for long training
- Monitor with `trackMemory()` + `getMemoryReport()`
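The monitoring pattern above amounts to comparing tensor counts before and after a workload. A minimal sketch, assuming only a counter callback (the tracker here is a stand-in, not the library's `trackMemory` utility):

```javascript
// Minimal leak-detection sketch: snapshot a tensor count before and after a
// workload and report growth. getNumTensors is a stand-in for reading
// something like tf.getInfo() memory stats.
function makeTracker(getNumTensors) {
  let baseline = getNumTensors();
  return {
    reset() { baseline = getNumTensors(); },
    report() {
      const now = getNumTensors();
      return { baseline, current: now, leaked: now - baseline };
    }
  };
}

// Simulated tensor counter for demonstration:
let count = 5;
const tracker = makeTracker(() => count);
count = 9; // pretend a training run left 4 tensors behind
console.log(tracker.report()); // → { baseline: 5, current: 9, leaked: 4 }
```

A steadily growing `leaked` value across runs is the usual signal to add `disposeAll()` calls or enable `autoScope`.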
Troubleshooting
"WebGPU not available"
WebGPU requires Chrome 113+. On Linux, pass the `--enable-unsafe-webgpu` flag:
```javascript
const tf = new TensorFlow({
  backend: 'webgpu',
  browserArgs: ['--enable-unsafe-webgpu']
});
```

Fallback: Use `backend: 'webgl'` or `backend: 'wasm'`.
"CDP timeout" or "Protocol timeout"
For long training sessions, set unlimited timeout:
```javascript
const tf = new TensorFlow({ protocolTimeout: 0 });
```

"No usable sandbox"
A Puppeteer sandbox issue; tfjs-turbo already launches Chrome with the `--no-sandbox` flag.
If issues persist on Linux, set: `CHROME_DEVEL_SANDBOX=/usr/local/sbin/chrome-devel-sandbox`
"Context lost" during training
GPU driver reset. Use checkpoints to recover:
```javascript
const tf = new TensorFlow({ backend: 'webgpu' });

await tf.train({
  // ...
  checkpoint: { enabled: true, everyNEpoch: 5, key: 'my-training' }
});
```

High RAM usage
- tfjs-turbo runs Chrome + TF.js = ~300-500MB baseline
- Use `backend: 'wasm'` for a lower memory footprint
- Call `await tf.disposeAll()` to clear tensors
- Reduce `batchSize` for large models
Security Note
⚠️ tf.run() executes arbitrary code via eval()
DO NOT:
- Pass user input to `tf.run()`
- Run untrusted code
- Expose `tf.run()` to external APIs

SAFE:

```javascript
// Your own trusted code only
await tf.run(async ({ tf }) => {
  const a = tf.tensor([1, 2, 3]);
  return a.sum().dataSync()[0];
});
```

Note: Benchmark utilities also use internal `eval` for controlled scripts (no untrusted input).
Roadmap
- [x] Checkpoint & Resume (IndexedDB)
- [x] Callback Bridge
- [x] Bucket System
- [x] Auto Backend Selection
- [x] Streaming Predict
- [x] Memory Leak Detector
- [ ] Workload-aware AutoTuner
- [ ] Filesystem Checkpoint Export (save to disk, resume from IDB)
TypeScript Support
Full TypeScript definitions included:
```typescript
import { TensorFlow, TrainConfig, ModelData } from 'tfjs-turbo';

const tf = new TensorFlow({ backend: 'wasm' });
// Full autocomplete and type safety!
```

Contributing
Contributions welcome! Please open an issue first.
License
MIT
