WebInfer.js
Browser ML inference framework with task scheduling and smart caching.
Documentation · Examples · API Reference · English | 中文
✨ Features
- 📋 Task Scheduler - Priority queue, concurrency control, task cancellation
- 🔄 Batch Processing - Efficient batch inference out of the box
- 💾 Memory Management - Automatic memory tracking and cleanup with scopes
- 📥 Smart Model Loading - Preloading, sharding, resume download support
- 💿 Offline Caching - IndexedDB-based model caching for offline use
- ⚡ Multi-Backend - WebGPU, WebNN, WASM with automatic fallback
- 🤗 HuggingFace Hub - Direct model download with one line
- 🔤 Real Tokenizers - BPE & WordPiece tokenizers, load tokenizer.json directly
- 👷 Web Worker Support - Run inference in background threads
- 📦 Batteries Included - ONNX Runtime bundled, zero configuration needed
- 🎯 TypeScript First - Full type support with intuitive APIs
📦 Installation
```bash
npm install WebInferjs
# or
yarn add WebInferjs
# or
pnpm add WebInferjs
```

Note: ONNX Runtime is included as a dependency. No additional setup required.
🚀 Quick Start
Try the Demo
Run the interactive demo locally to test all features:
```bash
# Clone and install
git clone https://github.com/hanschurer/WebInfer.js.git
cd WebInfer.js
npm install

# Build and start demo server
npm run demo
```

Open http://localhost:3000 in your browser:

Load Model - Enter a Hugging Face ONNX model URL and click "Load Model":

```
https://huggingface.co/Xenova/distilbert-base-uncased-finetuned-sst-2-english/resolve/main/onnx/model_quantized.onnx
```

Test Features:
- 🧮 Tensor Operations - Test tensor creation, math ops, softmax, relu
- 📝 Text Classification - Run sentiment analysis on text
- 🔍 Feature Extraction - Extract embeddings from text
- 🏷️ Named Entity Recognition - Extract entities (PER, ORG, LOC, MISC) from text
- 📋 Task Scheduler - Test priority-based task scheduling
- 💾 Memory Management - Test allocation and cleanup
Basic Usage
```js
import { pipeline } from 'WebInferjs';

// Create a sentiment analysis pipeline
const sentiment = await pipeline('sentiment-analysis');

// Run inference
const result = await sentiment.run('I love this product!');
console.log(result);
// { label: 'positive', score: 0.98, processingTime: 12.5 }
```

Batch Processing
```js
// Native batch processing support
const results = await sentiment.run([
  'This is amazing!',
  'This is terrible.',
  'It\'s okay I guess.'
]);
console.log(results);
// [
//   { label: 'positive', score: 0.95 },
//   { label: 'negative', score: 0.92 },
//   { label: 'neutral', score: 0.68 }
// ]
```

Multiple Pipelines
```js
import { pipeline } from 'WebInferjs';

// Create multiple pipelines
const classifier = await pipeline('text-classification');
const extractor = await pipeline('feature-extraction');

// Run in parallel with Promise.all
const [classification, features] = await Promise.all([
  classifier.run('Sample text'),
  extractor.run('Sample text')
]);
```

Image Classification
```js
import { pipeline } from 'WebInferjs';

const classifier = await pipeline('image-classification');

// From URL
const urlResult = await classifier.run('https://example.com/image.jpg');

// From HTMLImageElement
const img = document.getElementById('myImage');
const imgResult = await classifier.run(img);

// Batch
const results = await classifier.run([img1, img2, img3]);
```

Text Generation (Streaming)
```js
import { pipeline } from 'WebInferjs';

const generator = await pipeline('text-generation');

// Simple generation
const result = await generator.run('Once upon a time', {
  maxNewTokens: 50,
  temperature: 0.8,
});
console.log(result.generatedText);

// Streaming output
for await (const event of generator.stream('Hello, ')) {
  process.stdout.write(event.token);
  if (event.done) break;
}
```

Zero-shot Classification
```js
import { pipeline } from 'WebInferjs';

const classifier = await pipeline('zero-shot-classification');

const result = await classifier.classify(
  'I love playing soccer on weekends',
  ['sports', 'politics', 'technology', 'entertainment']
);
console.log(result.labels[0], result.scores[0]);
// 'sports', 0.92
```

Question Answering
```js
import { pipeline } from 'WebInferjs';

const qa = await pipeline('question-answering');

const result = await qa.run({
  question: 'What is the capital of France?',
  context: 'Paris is the capital and largest city of France.'
});
console.log(result.answer); // 'Paris'
```

Named Entity Recognition (NER)
```js
import { pipeline } from 'WebInferjs';

const ner = await pipeline('token-classification');

const entities = await ner.run('Barack Obama visited Beijing and met researchers at OpenAI Labs.', {
  threshold: 0.5,
  entityTypes: ['PER', 'ORG', 'LOC'] // Optional: filter by entity types
});
console.log(entities);
// [
//   { entity: 'PER', word: 'Barack Obama', score: 0.98, start: 0, end: 12 },
//   { entity: 'LOC', word: 'Beijing', score: 0.95, start: 21, end: 28 },
//   { entity: 'ORG', word: 'OpenAI Labs', score: 0.92, start: 48, end: 60 }
// ]
```

Load from HuggingFace Hub
```js
import { fromHub, fromTask } from 'WebInferjs';

// Load by model ID (auto-downloads model, tokenizer, config)
const bundle = await fromHub('Xenova/distilbert-base-uncased-finetuned-sst-2-english');
console.log(bundle.tokenizer); // Tokenizer instance
console.log(bundle.config);    // Model config

// Load by task name (uses recommended model)
const sentimentBundle = await fromTask('sentiment-analysis');
```

Web Workers (Background Inference)
```js
import { runInWorker, WorkerPool, isWorkerSupported } from 'WebInferjs';

// Simple: run inference in background thread
if (isWorkerSupported()) {
  const outputs = await runInWorker(modelUrl, inputs);
}

// Advanced: use worker pool for parallel processing
const pool = new WorkerPool({ numWorkers: 4 });
await pool.init();
const modelId = await pool.loadModel(modelUrl);
const results = await pool.runBatch(modelId, batchInputs);
pool.terminate();
```

🎯 Supported Tasks
| Task | Pipeline | Status |
|------|----------|--------|
| Text Classification | text-classification | ✅ |
| Sentiment Analysis | sentiment-analysis | ✅ |
| Feature Extraction | feature-extraction | ✅ |
| Image Classification | image-classification | ✅ |
| Text Generation | text-generation | ✅ |
| Named Entity Recognition | token-classification | ✅ |
| Object Detection | object-detection | ✅ |
| Speech Recognition | automatic-speech-recognition | ✅ |
| Zero-shot Classification | zero-shot-classification | ✅ |
| Question Answering | question-answering | ✅ |
⚡ Key Differentiators
Comparison with transformers.js
| Feature | transformers.js | WebInfer.js |
|---------|-----------------|-------------|
| Task Scheduler | ❌ None | ✅ Priority queue with limits |
| Task Cancellation | ❌ None | ✅ Cancel pending tasks |
| Batch Processing | ⚠️ Manual | ✅ Built-in batching |
| Memory Scopes | ❌ None | ✅ Auto cleanup with scopes |
| Model Preloading | ❌ None | ✅ Background loading |
| Resume Download | ❌ None | ✅ Chunked with resume |
| Model Caching | ⚠️ Basic | ✅ IndexedDB with stats |
| TypeScript | ✅ Full | ✅ Full |
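The scheduler ideas above (a priority queue drained under a concurrency limit) can be sketched in a few lines of plain JavaScript. This is an illustrative toy, not WebInfer.js's implementation; `MiniScheduler` and its fields are hypothetical names:

```js
// Minimal sketch of a priority-aware, concurrency-limited task queue.
// Not the library's actual scheduler — just the underlying idea.
class MiniScheduler {
  constructor(maxConcurrent) {
    this.max = maxConcurrent;
    this.running = 0;
    this.pending = []; // kept sorted by descending priority
  }
  run(task, priority = 0) {
    return new Promise((resolve, reject) => {
      this.pending.push({ task, priority, resolve, reject });
      this.pending.sort((a, b) => b.priority - a.priority);
      this._drain();
    });
  }
  _drain() {
    // Start queued tasks until the concurrency limit is reached
    while (this.running < this.max && this.pending.length > 0) {
      const { task, resolve, reject } = this.pending.shift();
      this.running++;
      Promise.resolve()
        .then(task)
        .then(resolve, reject)
        .finally(() => {
          this.running--;
          this._drain(); // a slot freed up — start the next pending task
        });
    }
  }
}
```

Cancellation of a pending task then amounts to removing its entry from `pending` before it is drained.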
🔧 Configuration
Runtime Selection
```js
import { pipeline } from 'WebInferjs';

// Automatic (recommended)
const model = await pipeline('text-classification');

// Specify runtime explicitly
const gpuModel = await pipeline('text-classification', {
  runtime: 'webgpu' // or 'webnn', 'wasm', 'auto'
});
```

Memory Management
```js
import { pipeline, getMemoryStats, gc } from 'WebInferjs';

const model = await pipeline('text-classification');

// Use the model
await model.run('text');

// Check memory usage
console.log(getMemoryStats());
// { allocated: 50MB, used: 45MB, peak: 52MB, tensorCount: 12 }

// Explicit cleanup
model.dispose();

// Force garbage collection
gc();
```

Scheduler Configuration
```js
import { configureScheduler } from 'WebInferjs';

configureScheduler({
  maxConcurrentTasks: 4,
  maxConcurrentPerModel: 1,
  defaultTimeout: 30000,
  enableBatching: true,
  maxBatchSize: 32,
});
```

Caching
```js
import { pipeline, Cache } from 'WebInferjs';

// Create a cache
const cache = new Cache({
  strategy: 'lru',
  maxSize: 100 * 1024 * 1024, // 100MB
  persistent: true, // Use IndexedDB
});

const model = await pipeline('text-classification', {
  cache: true
});
```

🛠️ Advanced Usage
Custom Model Loading
```js
import { loadModel, runInference } from 'WebInferjs';

// Load from URL with caching, sharding, and resume support
const model = await loadModel('https://example.com/model.bin', {
  runtime: 'webgpu',
  quantization: 'int8',
  cache: true,      // Enable IndexedDB caching (default: true)
  resumable: true,  // Enable resume download (default: true)
  chunkSize: 5 * 1024 * 1024, // 5MB chunks for large models
  onProgress: (progress) => console.log(`Loading: ${progress * 100}%`)
});

// Run inference
const outputs = await runInference(model, inputs);

// Cleanup
model.dispose();
```

Preloading Models
```js
import { preloadModel, preloadModels, getPreloadStatus } from 'WebInferjs';

// Preload a single model in background (with priority)
preloadModel('https://example.com/model1.onnx', { priority: 10 });

// Preload multiple models
preloadModels([
  { url: 'https://example.com/model1.onnx', priority: 10 },
  { url: 'https://example.com/model2.onnx', priority: 5 },
]);

// Check preload status
const status = getPreloadStatus('https://example.com/model1.onnx');
// 'pending' | 'loading' | 'complete' | 'error' | 'not_found'
```

Model Caching
```js
import {
  isModelCached,
  getCachedModel,
  deleteCachedModel,
  clearModelCache,
  getModelCacheStats
} from 'WebInferjs';

// Check if model is cached
if (await isModelCached('https://example.com/model.onnx')) {
  console.log('Model is cached!');
}

// Get cached model data directly
const modelData = await getCachedModel('https://example.com/model.onnx');

// Delete a specific cached model
await deleteCachedModel('https://example.com/model.onnx');

// Clear all cached models
await clearModelCache();

// Get cache statistics
const stats = await getModelCacheStats();
console.log(`${stats.models} models cached, ${stats.totalSize} bytes total`);
```

Resume Downloads
Large model downloads automatically support resuming from where they left off:
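Under the hood, a chunked, resumable download boils down to splitting the file into byte ranges and fetching each with an HTTP `Range` header. The helper below is an illustrative sketch of that splitting step (`chunkRanges` is a hypothetical name, not WebInfer.js internals):

```js
// Illustrative sketch: compute the byte-range boundaries for a chunked,
// resumable download. Each range maps to one HTTP request like
//   fetch(url, { headers: { Range: `bytes=${start}-${end}` } })
function chunkRanges(totalBytes, chunkSize) {
  const ranges = [];
  for (let start = 0; start < totalBytes; start += chunkSize) {
    const end = Math.min(start + chunkSize, totalBytes) - 1; // inclusive byte index
    ranges.push({ start, end });
  }
  return ranges;
}
```

Resuming then means skipping the ranges whose chunks are already stored locally.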
```js
import { loadModelData } from 'WebInferjs';

// Download with progress and resume support
const modelData = await loadModelData('https://example.com/large-model.onnx', {
  resumable: true,
  chunkSize: 10 * 1024 * 1024, // 10MB chunks
  parallelConnections: 4, // Download 4 chunks in parallel
  onProgress: (progress) => {
    console.log(`${progress.percent.toFixed(1)}% downloaded`);
    console.log(`Speed: ${(progress.speed / 1024 / 1024).toFixed(2)} MB/s`);
    console.log(`ETA: ${(progress.eta / 1000).toFixed(0)}s`);
    console.log(`Chunk ${progress.currentChunk}/${progress.totalChunks}`);
  }
});
```

Model Quantization
```js
import { quantize } from 'WebInferjs/tools';

const quantized = await quantize(model, {
  method: 'int8',
  calibrationData: samples,
});
console.log(`Compression: ${quantized.compressionRatio}x`);
// Compression: 3.8x
```

Benchmarking
```js
import { benchmark } from 'WebInferjs/tools';

const result = await benchmark(
  () => model.run('sample text'),
  { warmupRuns: 5, runs: 100 }
);
console.log(result);
// {
//   avgTime: 12.5,
//   minTime: 10.2,
//   maxTime: 18.3,
//   throughput: 80 // inferences/sec
// }
```

Memory Scope
```js
import { withMemoryScope, tensor } from 'WebInferjs';

const result = await withMemoryScope(async (scope) => {
  // Tensors tracked in scope
  const a = scope.track(tensor([1, 2, 3]));
  const b = scope.track(tensor([4, 5, 6]));

  // Process...
  const output = process(a, b);

  // Keep result, dispose others
  return scope.keep(output);
});
// a and b automatically disposed
```

🔌 Tensor Operations
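The softmax used below is the standard normalization of scores into probabilities. As a plain-JavaScript reference for the 1-D case (the library version presumably applies this along a tensor's last axis; `softmax1d` is just an illustrative name):

```js
// Plain-JS reference for 1-D softmax: exponentiate (shifted by the max
// for numerical stability), then normalize so the outputs sum to 1.
function softmax1d(xs) {
  const max = Math.max(...xs);
  const exps = xs.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}
```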
```js
import { tensor, zeros, ones, matmul, softmax, relu } from 'WebInferjs';

// Create tensors
const a = tensor([[1, 2], [3, 4]]);
const b = zeros([2, 2]);
const c = ones([2, 2]);

// Operations
const d = matmul(a, c);
const probs = softmax(d);
const activated = relu(d);

// Cleanup
a.dispose();
b.dispose();
c.dispose();
```

🌐 Browser Support
| Browser | WebGPU | WebNN | WASM |
|---------|--------|-------|------|
| Chrome 113+ | ✅ | ✅ | ✅ |
| Edge 113+ | ✅ | ✅ | ✅ |
| Firefox 118+ | ⚠️ Flag | ❌ | ✅ |
| Safari 17+ | ⚠️ Preview | ❌ | ✅ |
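Backend availability can be probed with standard Web API feature checks; the `auto` runtime presumably performs similar detection. A sketch (the `detectBackends` helper is illustrative, not the library's code):

```js
// Standard feature checks for the three backends, guarded so the
// snippet also runs outside a browser (e.g. in Node).
function detectBackends() {
  const nav = typeof navigator !== 'undefined' ? navigator : {};
  return {
    webgpu: 'gpu' in nav,                   // WebGPU (navigator.gpu)
    webnn: 'ml' in nav,                     // WebNN (navigator.ml)
    wasm: typeof WebAssembly === 'object',  // WebAssembly
  };
}
```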
📖 API Reference
Core
- `pipeline(task, options?)` - Create a pipeline for a task
- `loadModel(url, options?)` - Load a model from URL
- `runInference(model, inputs)` - Run model inference
- `getScheduler()` - Get the global scheduler
- `getMemoryManager()` - Get the memory manager
- `runInWorker(url, inputs)` - Run inference in a Web Worker
- `WorkerPool` - Manage multiple workers for parallel inference
Pipelines
- `TextClassificationPipeline` - Text/sentiment classification
- `SentimentAnalysisPipeline` - Sentiment analysis
- `FeatureExtractionPipeline` - Text embeddings
- `ImageClassificationPipeline` - Image classification
- `TextGenerationPipeline` - Text generation with streaming
- `TokenClassificationPipeline` - Named Entity Recognition (NER)
- `ObjectDetectionPipeline` - Object detection with bounding boxes
- `AutomaticSpeechRecognitionPipeline` - Speech to text
- `ZeroShotClassificationPipeline` - Classify without training
- `QuestionAnsweringPipeline` - Extractive QA
HuggingFace Hub
- `fromHub(modelId, options?)` - Load model bundle from HuggingFace
- `fromTask(task, options?)` - Load recommended model for task
- `downloadTokenizer(modelId)` - Download tokenizer only
- `downloadConfig(modelId)` - Download config only
- `POPULAR_MODELS` - Registry of popular models by task
Utilities
- `Tokenizer` - BPE/WordPiece tokenization with HuggingFace support
- `ImagePreprocessor` - Image preprocessing with HuggingFace config support
- `AudioPreprocessor` - Audio preprocessing for Whisper/wav2vec
- `Cache` - LRU caching utilities
Tools
- `quantize(model, options)` - Quantize a model
- `prune(model, options)` - Prune model weights
- `benchmark(fn, options)` - Benchmark inference
- `analyzeModel(model)` - Analyze model structure
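For reference, the `lru` eviction strategy used by the `Cache` utility above can be sketched with a `Map`, whose iteration order tracks insertion. This `MiniLRU` is an illustrative toy, not the library's `Cache` (which adds size limits and IndexedDB persistence):

```js
// Minimal LRU cache sketch: Map iteration order is insertion order, so
// re-inserting on access keeps the most recently used entry last.
class MiniLRU {
  constructor(capacity) {
    this.capacity = capacity;
    this.map = new Map();
  }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key); // refresh recency by re-inserting at the end
    this.map.set(key, value);
    return value;
  }
  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // evict least recently used: the first key in iteration order
      this.map.delete(this.map.keys().next().value);
    }
  }
}
```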
🤝 Contributing
We welcome contributions! Please see our Contributing Guide for details.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
📄 License
MIT © WebInfer.js Contributors
