WebInfer.js
Browser ML inference framework with task scheduling and smart caching.
Documentation · Examples · API Reference · English | 中文
✨ Features
- 📋 Task Scheduler - Priority queue, concurrency control, task cancellation
- 🔄 Batch Processing - Efficient batch inference out of the box
- 💾 Memory Management - Automatic memory tracking and cleanup with scopes
- 📥 Smart Model Loading - Preloading, sharding, resume download support
- 💿 Offline Caching - IndexedDB-based model caching for offline use
- ⚡ Multi-Backend - WebGPU, WebNN, WASM with automatic fallback
- 🤗 HuggingFace Hub - Direct model download with one line
- 🔤 Real Tokenizers - BPE & WordPiece tokenizers, load tokenizer.json directly
- 👷 Web Worker Support - Run inference in background threads
- 📦 Batteries Included - ONNX Runtime bundled, zero configuration needed
- 🎯 TypeScript First - Full type support with intuitive APIs
📦 Installation
```bash
npm install WebInferjs
# or
yarn add WebInferjs
# or
pnpm add WebInferjs
```

Note: ONNX Runtime is included as a dependency. No additional setup required.
🚀 Quick Start
Try the Demo
Run the interactive demo locally to test all features:
```bash
# Clone and install
git clone https://github.com/hanschurer/WebInfer.js.git
cd WebInfer.js
npm install

# Build and start demo server
npm run demo
```

Open http://localhost:3000 in your browser:

Load Model - Enter a Hugging Face ONNX model URL and click "Load Model":

```
https://huggingface.co/Xenova/distilbert-base-uncased-finetuned-sst-2-english/resolve/main/onnx/model_quantized.onnx
```

Test Features:
- 🧮 Tensor Operations - Test tensor creation, math ops, softmax, relu
- 📝 Text Classification - Run sentiment analysis on text
- 🔍 Feature Extraction - Extract embeddings from text
- 🏷️ Named Entity Recognition - Extract entities (PER, ORG, LOC, MISC) from text
- 📋 Task Scheduler - Test priority-based task scheduling
- 💾 Memory Management - Test allocation and cleanup
Basic Usage
```js
import { pipeline } from 'WebInferjs';

// Create a sentiment analysis pipeline
const sentiment = await pipeline('sentiment-analysis');

// Run inference
const result = await sentiment.run('I love this product!');
console.log(result);
// { label: 'positive', score: 0.98, processingTime: 12.5 }
```

Batch Processing
```js
// Native batch processing support
const results = await sentiment.run([
  'This is amazing!',
  'This is terrible.',
  'It\'s okay I guess.'
]);
console.log(results);
// [
//   { label: 'positive', score: 0.95 },
//   { label: 'negative', score: 0.92 },
//   { label: 'neutral', score: 0.68 }
// ]
```

Multiple Pipelines
```js
import { pipeline } from 'WebInferjs';

// Create multiple pipelines
const classifier = await pipeline('text-classification');
const extractor = await pipeline('feature-extraction');

// Run in parallel with Promise.all
const [classification, features] = await Promise.all([
  classifier.run('Sample text'),
  extractor.run('Sample text')
]);
```

Image Classification
```js
import { pipeline } from 'WebInferjs';

const classifier = await pipeline('image-classification');

// From URL
const urlResult = await classifier.run('https://example.com/image.jpg');

// From HTMLImageElement
const img = document.getElementById('myImage');
const imgResult = await classifier.run(img);

// Batch
const results = await classifier.run([img1, img2, img3]);
```

Text Generation (Streaming)
```js
import { pipeline } from 'WebInferjs';

const generator = await pipeline('text-generation');

// Simple generation
const result = await generator.run('Once upon a time', {
  maxNewTokens: 50,
  temperature: 0.8,
});
console.log(result.generatedText);

// Streaming output
for await (const event of generator.stream('Hello, ')) {
  process.stdout.write(event.token);
  if (event.done) break;
}
```

Zero-shot Classification
```js
import { pipeline } from 'WebInferjs';

const classifier = await pipeline('zero-shot-classification');

const result = await classifier.classify(
  'I love playing soccer on weekends',
  ['sports', 'politics', 'technology', 'entertainment']
);
console.log(result.labels[0], result.scores[0]);
// 'sports', 0.92
```

Question Answering
```js
import { pipeline } from 'WebInferjs';

const qa = await pipeline('question-answering');

const result = await qa.run({
  question: 'What is the capital of France?',
  context: 'Paris is the capital and largest city of France.'
});
console.log(result.answer); // 'Paris'
```

Named Entity Recognition (NER)
```js
import { pipeline } from 'WebInferjs';

const ner = await pipeline('token-classification');

const entities = await ner.run('Barack Obama visited Beijing and met researchers at OpenAI Labs.', {
  threshold: 0.5,
  entityTypes: ['PER', 'ORG', 'LOC'] // Optional: filter by entity types
});
console.log(entities);
// [
//   { entity: 'PER', word: 'Barack Obama', score: 0.98, start: 0, end: 12 },
//   { entity: 'LOC', word: 'Beijing', score: 0.95, start: 21, end: 28 },
//   { entity: 'ORG', word: 'OpenAI Labs', score: 0.92, start: 48, end: 60 }
// ]
```

Load from HuggingFace Hub
```js
import { fromHub, fromTask } from 'WebInferjs';

// Load by model ID (auto-downloads model, tokenizer, config)
const bundle = await fromHub('Xenova/distilbert-base-uncased-finetuned-sst-2-english');
console.log(bundle.tokenizer); // Tokenizer instance
console.log(bundle.config);    // Model config

// Load by task name (uses recommended model)
const sentimentBundle = await fromTask('sentiment-analysis');
```

Web Workers (Background Inference)
```js
import { runInWorker, WorkerPool, isWorkerSupported } from 'WebInferjs';

// Simple: run inference in background thread
if (isWorkerSupported()) {
  const outputs = await runInWorker(modelUrl, inputs);
}

// Advanced: use worker pool for parallel processing
const pool = new WorkerPool({ numWorkers: 4 });
await pool.init();
const modelId = await pool.loadModel(modelUrl);
const results = await pool.runBatch(modelId, batchInputs);
pool.terminate();
```

🎯 Supported Tasks
| Task | Pipeline | Status |
|------|----------|--------|
| Text Classification | text-classification | ✅ |
| Sentiment Analysis | sentiment-analysis | ✅ |
| Feature Extraction | feature-extraction | ✅ |
| Image Classification | image-classification | ✅ |
| Text Generation | text-generation | ✅ |
| Named Entity Recognition | token-classification | ✅ |
| Object Detection | object-detection | ✅ |
| Speech Recognition | automatic-speech-recognition | ✅ |
| Zero-shot Classification | zero-shot-classification | ✅ |
| Question Answering | question-answering | ✅ |
⚡ Key Differentiators
Comparison with transformers.js
| Feature | transformers.js | WebInfer.js |
|---------|-----------------|-------------|
| Task Scheduler | ❌ None | ✅ Priority queue with limits |
| Task Cancellation | ❌ None | ✅ Cancel pending tasks |
| Batch Processing | ⚠️ Manual | ✅ Built-in batching |
| Memory Scopes | ❌ None | ✅ Auto cleanup with scopes |
| Model Preloading | ❌ None | ✅ Background loading |
| Resume Download | ❌ None | ✅ Chunked with resume |
| Model Caching | ⚠️ Basic | ✅ IndexedDB with stats |
| TypeScript | ✅ Full | ✅ Full |
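The scheduler ideas above (a priority queue drained under a concurrency limit) can be sketched in a few lines of plain JavaScript. This is an illustrative toy, not WebInfer.js's implementation; `MiniScheduler` and its fields are hypothetical names:

```js
// Minimal sketch of a priority-aware, concurrency-limited task queue.
// Not the library's actual scheduler — just the underlying idea.
class MiniScheduler {
  constructor(maxConcurrent) {
    this.max = maxConcurrent;
    this.running = 0;
    this.pending = []; // kept sorted by descending priority
  }
  run(task, priority = 0) {
    return new Promise((resolve, reject) => {
      this.pending.push({ task, priority, resolve, reject });
      this.pending.sort((a, b) => b.priority - a.priority);
      this._drain();
    });
  }
  _drain() {
    // Start queued tasks until the concurrency limit is reached
    while (this.running < this.max && this.pending.length > 0) {
      const { task, resolve, reject } = this.pending.shift();
      this.running++;
      Promise.resolve()
        .then(task)
        .then(resolve, reject)
        .finally(() => {
          this.running--;
          this._drain(); // a slot freed up — start the next pending task
        });
    }
  }
}
```

Cancellation of a pending task then amounts to removing its entry from `pending` before it is drained.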
🔧 Configuration
Runtime Selection
```js
import { pipeline } from 'WebInferjs';

// Automatic (recommended)
const model = await pipeline('text-classification');

// Specify runtime explicitly
const gpuModel = await pipeline('text-classification', {
  runtime: 'webgpu' // or 'webnn', 'wasm', 'auto'
});
```

Memory Management
```js
import { pipeline, getMemoryStats, gc } from 'WebInferjs';

const model = await pipeline('text-classification');

// Use the model
await model.run('text');

// Check memory usage
console.log(getMemoryStats());
// { allocated: 50MB, used: 45MB, peak: 52MB, tensorCount: 12 }

// Explicit cleanup
model.dispose();

// Force garbage collection
gc();
```

Scheduler Configuration
```js
import { configureScheduler } from 'WebInferjs';

configureScheduler({
  maxConcurrentTasks: 4,
  maxConcurrentPerModel: 1,
  defaultTimeout: 30000,
  enableBatching: true,
  maxBatchSize: 32,
});
```

Caching
```js
import { pipeline, Cache } from 'WebInferjs';

// Create a cache
const cache = new Cache({
  strategy: 'lru',
  maxSize: 100 * 1024 * 1024, // 100MB
  persistent: true, // Use IndexedDB
});

const model = await pipeline('text-classification', {
  cache: true
});
```

🛠️ Advanced Usage
Custom Model Loading
```js
import { loadModel, runInference } from 'WebInferjs';

// Load from URL with caching, sharding, and resume support
const model = await loadModel('https://example.com/model.bin', {
  runtime: 'webgpu',
  quantization: 'int8',
  cache: true,      // Enable IndexedDB caching (default: true)
  resumable: true,  // Enable resume download (default: true)
  chunkSize: 5 * 1024 * 1024, // 5MB chunks for large models
  onProgress: (progress) => console.log(`Loading: ${progress * 100}%`)
});

// Run inference
const outputs = await runInference(model, inputs);

// Cleanup
model.dispose();
```

Preloading Models
```js
import { preloadModel, preloadModels, getPreloadStatus } from 'WebInferjs';

// Preload a single model in background (with priority)
preloadModel('https://example.com/model1.onnx', { priority: 10 });

// Preload multiple models
preloadModels([
  { url: 'https://example.com/model1.onnx', priority: 10 },
  { url: 'https://example.com/model2.onnx', priority: 5 },
]);

// Check preload status
const status = getPreloadStatus('https://example.com/model1.onnx');
// 'pending' | 'loading' | 'complete' | 'error' | 'not_found'
```

Model Caching
```js
import {
  isModelCached,
  getCachedModel,
  deleteCachedModel,
  clearModelCache,
  getModelCacheStats
} from 'WebInferjs';

// Check if model is cached
if (await isModelCached('https://example.com/model.onnx')) {
  console.log('Model is cached!');
}

// Get cached model data directly
const modelData = await getCachedModel('https://example.com/model.onnx');

// Delete a specific cached model
await deleteCachedModel('https://example.com/model.onnx');

// Clear all cached models
await clearModelCache();

// Get cache statistics
const stats = await getModelCacheStats();
console.log(`${stats.models} models cached, ${stats.totalSize} bytes total`);
```

Resume Downloads
Large model downloads automatically support resuming from where they left off:
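Under the hood, a chunked, resumable download boils down to splitting the file into byte ranges and fetching each with an HTTP `Range` header. The helper below is an illustrative sketch of that splitting step (`chunkRanges` is a hypothetical name, not WebInfer.js internals):

```js
// Illustrative sketch: compute the byte-range boundaries for a chunked,
// resumable download. Each range maps to one HTTP request like
//   fetch(url, { headers: { Range: `bytes=${start}-${end}` } })
function chunkRanges(totalBytes, chunkSize) {
  const ranges = [];
  for (let start = 0; start < totalBytes; start += chunkSize) {
    const end = Math.min(start + chunkSize, totalBytes) - 1; // inclusive byte index
    ranges.push({ start, end });
  }
  return ranges;
}
```

Resuming then means skipping the ranges whose chunks are already stored locally.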
```js
import { loadModelData } from 'WebInferjs';

// Download with progress and resume support
const modelData = await loadModelData('https://example.com/large-model.onnx', {
  resumable: true,
  chunkSize: 10 * 1024 * 1024, // 10MB chunks
  parallelConnections: 4, // Download 4 chunks in parallel
  onProgress: (progress) => {
    console.log(`${progress.percent.toFixed(1)}% downloaded`);
    console.log(`Speed: ${(progress.speed / 1024 / 1024).toFixed(2)} MB/s`);
    console.log(`ETA: ${(progress.eta / 1000).toFixed(0)}s`);
    console.log(`Chunk ${progress.currentChunk}/${progress.totalChunks}`);
  }
});
```

Model Quantization
```js
import { quantize } from 'WebInferjs/tools';

const quantized = await quantize(model, {
  method: 'int8',
  calibrationData: samples,
});
console.log(`Compression: ${quantized.compressionRatio}x`);
// Compression: 3.8x
```

Benchmarking
```js
import { benchmark } from 'WebInferjs/tools';

const result = await benchmark(
  () => model.run('sample text'),
  { warmupRuns: 5, runs: 100 }
);
console.log(result);
// {
//   avgTime: 12.5,
//   minTime: 10.2,
//   maxTime: 18.3,
//   throughput: 80 // inferences/sec
// }
```

Memory Scope
```js
import { withMemoryScope, tensor } from 'WebInferjs';

const result = await withMemoryScope(async (scope) => {
  // Tensors tracked in scope
  const a = scope.track(tensor([1, 2, 3]));
  const b = scope.track(tensor([4, 5, 6]));

  // Process...
  const output = process(a, b);

  // Keep result, dispose others
  return scope.keep(output);
});
// a and b automatically disposed
```

🔌 Tensor Operations
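The softmax used below is the standard normalization of scores into probabilities. As a plain-JavaScript reference for the 1-D case (the library version presumably applies this along a tensor's last axis; `softmax1d` is just an illustrative name):

```js
// Plain-JS reference for 1-D softmax: exponentiate (shifted by the max
// for numerical stability), then normalize so the outputs sum to 1.
function softmax1d(xs) {
  const max = Math.max(...xs);
  const exps = xs.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}
```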
```js
import { tensor, zeros, ones, matmul, softmax, relu } from 'WebInferjs';

// Create tensors
const a = tensor([[1, 2], [3, 4]]);
const b = zeros([2, 2]);
const c = ones([2, 2]);

// Operations
const d = matmul(a, c);
const probs = softmax(d);
const activated = relu(d);

// Cleanup
a.dispose();
b.dispose();
c.dispose();
```

🌐 Browser Support
| Browser | WebGPU | WebNN | WASM |
|---------|--------|-------|------|
| Chrome 113+ | ✅ | ✅ | ✅ |
| Edge 113+ | ✅ | ✅ | ✅ |
| Firefox 118+ | ⚠️ Flag | ❌ | ✅ |
| Safari 17+ | ⚠️ Preview | ❌ | ✅ |
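Backend availability can be probed with standard Web API feature checks; the `auto` runtime presumably performs similar detection. A sketch (the `detectBackends` helper is illustrative, not the library's code):

```js
// Standard feature checks for the three backends, guarded so the
// snippet also runs outside a browser (e.g. in Node).
function detectBackends() {
  const nav = typeof navigator !== 'undefined' ? navigator : {};
  return {
    webgpu: 'gpu' in nav,                   // WebGPU (navigator.gpu)
    webnn: 'ml' in nav,                     // WebNN (navigator.ml)
    wasm: typeof WebAssembly === 'object',  // WebAssembly
  };
}
```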
📖 API Reference
Core
- `pipeline(task, options?)` - Create a pipeline for a task
- `loadModel(url, options?)` - Load a model from URL
- `runInference(model, inputs)` - Run model inference
- `getScheduler()` - Get the global scheduler
- `getMemoryManager()` - Get the memory manager
- `runInWorker(url, inputs)` - Run inference in a Web Worker
- `WorkerPool` - Manage multiple workers for parallel inference
Pipelines
- `TextClassificationPipeline` - Text/sentiment classification
- `SentimentAnalysisPipeline` - Sentiment analysis
- `FeatureExtractionPipeline` - Text embeddings
- `ImageClassificationPipeline` - Image classification
- `TextGenerationPipeline` - Text generation with streaming
- `TokenClassificationPipeline` - Named Entity Recognition (NER)
- `ObjectDetectionPipeline` - Object detection with bounding boxes
- `AutomaticSpeechRecognitionPipeline` - Speech to text
- `ZeroShotClassificationPipeline` - Classify without training
- `QuestionAnsweringPipeline` - Extractive QA
HuggingFace Hub
- `fromHub(modelId, options?)` - Load model bundle from HuggingFace
- `fromTask(task, options?)` - Load recommended model for task
- `downloadTokenizer(modelId)` - Download tokenizer only
- `downloadConfig(modelId)` - Download config only
- `POPULAR_MODELS` - Registry of popular models by task
Utilities
- `Tokenizer` - BPE/WordPiece tokenization with HuggingFace support
- `ImagePreprocessor` - Image preprocessing with HuggingFace config support
- `AudioPreprocessor` - Audio preprocessing for Whisper/wav2vec
- `Cache` - LRU caching utilities
Tools
- `quantize(model, options)` - Quantize a model
- `prune(model, options)` - Prune model weights
- `benchmark(fn, options)` - Benchmark inference
- `analyzeModel(model)` - Analyze model structure
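For reference, the `lru` eviction strategy used by the `Cache` utility above can be sketched with a `Map`, whose iteration order tracks insertion. This `MiniLRU` is an illustrative toy, not the library's `Cache` (which adds size limits and IndexedDB persistence):

```js
// Minimal LRU cache sketch: Map iteration order is insertion order, so
// re-inserting on access keeps the most recently used entry last.
class MiniLRU {
  constructor(capacity) {
    this.capacity = capacity;
    this.map = new Map();
  }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key); // refresh recency by re-inserting at the end
    this.map.set(key, value);
    return value;
  }
  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // evict least recently used: the first key in iteration order
      this.map.delete(this.map.keys().next().value);
    }
  }
}
```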
🤝 Contributing
We welcome contributions! Please see our Contributing Guide for details.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
📄 License
MIT © WebInfer.js Contributors
