
webinfer-js

v0.1.0

Lightweight, high-performance browser ML inference framework with native concurrency support

WebInfer.js

Browser ML inference framework with task scheduling and smart caching.


Documentation · Examples · API Reference · English | 中文


✨ Features

  • 📋 Task Scheduler - Priority queue, concurrency control, task cancellation
  • 🔄 Batch Processing - Efficient batch inference out of the box
  • 💾 Memory Management - Automatic memory tracking and cleanup with scopes
  • 📥 Smart Model Loading - Preloading, sharding, resume download support
  • 💿 Offline Caching - IndexedDB-based model caching for offline use
  • ⚡ Multi-Backend - WebGPU, WebNN, WASM with automatic fallback
  • 🤗 HuggingFace Hub - Direct model download with one line
  • 🔤 Real Tokenizers - BPE & WordPiece tokenizers, load tokenizer.json directly
  • 👷 Web Worker Support - Run inference in background threads
  • 📦 Batteries Included - ONNX Runtime bundled, zero configuration needed
  • 🎯 TypeScript First - Full type support with intuitive APIs

📦 Installation

npm install webinfer-js
yarn add webinfer-js
pnpm add webinfer-js

Note: ONNX Runtime is included as a dependency. No additional setup required.

🚀 Quick Start

Try the Demo

Run the interactive demo locally to test all features:

# Clone and install
git clone https://github.com/hanschurer/WebInfer.js.git
cd WebInfer.js
npm install

# Build and start demo server
npm run demo

Open http://localhost:3000 in your browser:

  1. Load Model - Enter a Hugging Face ONNX model URL and click "Load Model"

    https://huggingface.co/Xenova/distilbert-base-uncased-finetuned-sst-2-english/resolve/main/onnx/model_quantized.onnx
  2. Test Features:

    • 🧮 Tensor Operations - Test tensor creation, math ops, softmax, relu
    • 📝 Text Classification - Run sentiment analysis on text
    • 🔍 Feature Extraction - Extract embeddings from text
    • 🏷️ Named Entity Recognition - Extract entities (PER, ORG, LOC, MISC) from text
    • 📋 Task Scheduler - Test priority-based task scheduling
    • 💾 Memory Management - Test allocation and cleanup

Basic Usage

import { pipeline } from 'webinfer-js';

// Create a sentiment analysis pipeline
const sentiment = await pipeline('sentiment-analysis');

// Run inference
const result = await sentiment.run('I love this product!');
console.log(result);
// { label: 'positive', score: 0.98, processingTime: 12.5 }

Batch Processing

// Native batch processing support
const results = await sentiment.run([
  'This is amazing!',
  'This is terrible.',
  'It\'s okay I guess.'
]);

console.log(results);
// [
//   { label: 'positive', score: 0.95 },
//   { label: 'negative', score: 0.92 },
//   { label: 'neutral', score: 0.68 }
// ]

Multiple Pipelines

import { pipeline } from 'webinfer-js';

// Create multiple pipelines
const classifier = await pipeline('text-classification');
const extractor = await pipeline('feature-extraction');

// Run in parallel with Promise.all
const [classification, features] = await Promise.all([
  classifier.run('Sample text'),
  extractor.run('Sample text')
]);

Image Classification

import { pipeline } from 'webinfer-js';

const classifier = await pipeline('image-classification');

// From URL
const urlResult = await classifier.run('https://example.com/image.jpg');

// From HTMLImageElement
const img = document.getElementById('myImage');
const imgResult = await classifier.run(img);

// Batch
const results = await classifier.run([img1, img2, img3]);

Text Generation (Streaming)

import { pipeline } from 'webinfer-js';

const generator = await pipeline('text-generation');

// Simple generation
const result = await generator.run('Once upon a time', {
  maxNewTokens: 50,
  temperature: 0.8,
});
console.log(result.generatedText);

// Streaming output (accumulate tokens as they arrive)
let text = '';
for await (const event of generator.stream('Hello, ')) {
  text += event.token;
  if (event.done) break;
}

Zero-shot Classification

import { pipeline } from 'webinfer-js';

const classifier = await pipeline('zero-shot-classification');

const result = await classifier.classify(
  'I love playing soccer on weekends',
  ['sports', 'politics', 'technology', 'entertainment']
);

console.log(result.labels[0], result.scores[0]);
// 'sports', 0.92

Question Answering

import { pipeline } from 'webinfer-js';

const qa = await pipeline('question-answering');

const result = await qa.run({
  question: 'What is the capital of France?',
  context: 'Paris is the capital and largest city of France.'
});

console.log(result.answer); // 'Paris'

Named Entity Recognition (NER)

import { pipeline } from 'webinfer-js';

const ner = await pipeline('token-classification');

const entities = await ner.run('Barack Obama visited Beijing and met researchers at OpenAI Labs.', {
  threshold: 0.5,
  entityTypes: ['PER', 'ORG', 'LOC'] // Optional: filter by entity types
});

console.log(entities);
// [
//   { entity: 'PER', word: 'Barack Obama', score: 0.98, start: 0, end: 12 },
//   { entity: 'LOC', word: 'Beijing', score: 0.95, start: 21, end: 28 },
//   { entity: 'ORG', word: 'OpenAI Labs', score: 0.92, start: 52, end: 63 }
// ]

Load from HuggingFace Hub

import { fromHub, fromTask } from 'webinfer-js';

// Load by model ID (auto-downloads model, tokenizer, config)
const bundle = await fromHub('Xenova/distilbert-base-uncased-finetuned-sst-2-english');
console.log(bundle.tokenizer); // Tokenizer instance
console.log(bundle.config);    // Model config

// Load by task name (uses recommended model)
const sentimentBundle = await fromTask('sentiment-analysis');

Web Workers (Background Inference)

import { runInWorker, WorkerPool, isWorkerSupported } from 'webinfer-js';

// Simple: run inference in background thread
if (isWorkerSupported()) {
  const outputs = await runInWorker(modelUrl, inputs);
}

// Advanced: use worker pool for parallel processing
const pool = new WorkerPool({ numWorkers: 4 });
await pool.init();

const modelId = await pool.loadModel(modelUrl);
const results = await pool.runBatch(modelId, batchInputs);

pool.terminate();

🎯 Supported Tasks

| Task | Pipeline | Status |
|------|----------|--------|
| Text Classification | text-classification | ✅ |
| Sentiment Analysis | sentiment-analysis | ✅ |
| Feature Extraction | feature-extraction | ✅ |
| Image Classification | image-classification | ✅ |
| Text Generation | text-generation | ✅ |
| Named Entity Recognition | token-classification | ✅ |
| Object Detection | object-detection | ✅ |
| Speech Recognition | automatic-speech-recognition | ✅ |
| Zero-shot Classification | zero-shot-classification | ✅ |
| Question Answering | question-answering | ✅ |

⚡ Key Differentiators

Comparison with transformers.js

| Feature | transformers.js | WebInfer.js |
|---------|-----------------|-------------|
| Task Scheduler | ❌ None | ✅ Priority queue with limits |
| Task Cancellation | ❌ None | ✅ Cancel pending tasks |
| Batch Processing | ⚠️ Manual | ✅ Built-in batching |
| Memory Scopes | ❌ None | ✅ Auto cleanup with scopes |
| Model Preloading | ❌ None | ✅ Background loading |
| Resume Download | ❌ None | ✅ Chunked with resume |
| Model Caching | ⚠️ Basic | ✅ IndexedDB with stats |
| TypeScript | ✅ Full | ✅ Full |

🔧 Configuration

Runtime Selection

import { pipeline } from 'webinfer-js';

// Automatic (recommended)
const autoModel = await pipeline('text-classification');

// Specify runtime explicitly
const webgpuModel = await pipeline('text-classification', {
  runtime: 'webgpu' // or 'webnn', 'wasm', 'auto'
});

Memory Management

import { pipeline, getMemoryStats, gc } from 'webinfer-js';

const model = await pipeline('text-classification');

// Use the model
await model.run('text');

// Check memory usage
console.log(getMemoryStats());
// { allocated: 50MB, used: 45MB, peak: 52MB, tensorCount: 12 }

// Explicit cleanup
model.dispose();

// Force garbage collection
gc();

Scheduler Configuration

import { configureScheduler } from 'webinfer-js';

configureScheduler({
  maxConcurrentTasks: 4,
  maxConcurrentPerModel: 1,
  defaultTimeout: 30000,
  enableBatching: true,
  maxBatchSize: 32,
});
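The options above configure a priority queue with a concurrency cap. As a rough illustration of that behavior (a conceptual sketch, not WebInfer.js's actual internals), a minimal priority scheduler might look like:

```javascript
// Illustrative sketch of priority scheduling with a concurrency limit --
// the concept behind maxConcurrentTasks, not the library's implementation.
class TinyScheduler {
  constructor(maxConcurrent) {
    this.maxConcurrent = maxConcurrent;
    this.running = 0;
    this.queue = []; // pending tasks, kept sorted by priority (high first)
  }

  submit(fn, priority = 0) {
    return new Promise((resolve, reject) => {
      this.queue.push({ fn, priority, resolve, reject });
      this.queue.sort((a, b) => b.priority - a.priority);
      this.drain();
    });
  }

  drain() {
    while (this.running < this.maxConcurrent && this.queue.length > 0) {
      const task = this.queue.shift();
      this.running++;
      Promise.resolve()
        .then(task.fn)
        .then(task.resolve, task.reject)
        .finally(() => {
          this.running--;
          this.drain(); // a finished slot frees capacity for the next task
        });
    }
  }
}
```

With a cap of 1, tasks queued while a slot is busy run strictly in priority order, which is the behavior `maxConcurrentPerModel: 1` above requests per model.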

Caching

import { pipeline, Cache } from 'webinfer-js';

// Create a custom cache
const cache = new Cache({
  strategy: 'lru',
  maxSize: 100 * 1024 * 1024, // 100MB
  persistent: true, // Use IndexedDB
});

const model = await pipeline('text-classification', {
  cache, // pass the custom cache (or `cache: true` for defaults)
});
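For intuition on the `'lru'` strategy, here is a minimal sketch of least-recently-used eviction built on a Map's insertion order. It illustrates the policy only (counting entries rather than bytes) and is not WebInfer.js's implementation:

```javascript
// Minimal LRU sketch: the oldest-touched entry is evicted first.
class LruCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.map = new Map(); // Map preserves insertion order: oldest first
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // Re-insert to mark this entry as most recently used
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      // Evict the least recently used entry (first in insertion order)
      this.map.delete(this.map.keys().next().value);
    }
  }
}
```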

🛠️ Advanced Usage

Custom Model Loading

import { loadModel, runInference } from 'webinfer-js';

// Load from URL with caching, sharding, and resume support
const model = await loadModel('https://example.com/model.bin', {
  runtime: 'webgpu',
  quantization: 'int8',
  cache: true,           // Enable IndexedDB caching (default: true)
  resumable: true,       // Enable resume download (default: true)
  chunkSize: 5 * 1024 * 1024, // 5MB chunks for large models
  onProgress: (progress) => console.log(`Loading: ${progress * 100}%`)
});

// Run inference
const outputs = await runInference(model, inputs);

// Cleanup
model.dispose();

Preloading Models

import { preloadModel, preloadModels, getPreloadStatus } from 'webinfer-js';

// Preload a single model in background (with priority)
preloadModel('https://example.com/model1.onnx', { priority: 10 });

// Preload multiple models
preloadModels([
  { url: 'https://example.com/model1.onnx', priority: 10 },
  { url: 'https://example.com/model2.onnx', priority: 5 },
]);

// Check preload status
const status = getPreloadStatus('https://example.com/model1.onnx');
// 'pending' | 'loading' | 'complete' | 'error' | 'not_found'

Model Caching

import { 
  isModelCached, 
  getCachedModel, 
  deleteCachedModel, 
  clearModelCache,
  getModelCacheStats 
} from 'webinfer-js';

// Check if model is cached
if (await isModelCached('https://example.com/model.onnx')) {
  console.log('Model is cached!');
}

// Get cached model data directly
const modelData = await getCachedModel('https://example.com/model.onnx');

// Delete a specific cached model
await deleteCachedModel('https://example.com/model.onnx');

// Clear all cached models
await clearModelCache();

// Get cache statistics
const stats = await getModelCacheStats();
console.log(`${stats.models} models cached, ${stats.totalSize} bytes total`);

Resume Downloads

Large model downloads automatically support resuming from where they left off:

import { loadModelData } from 'webinfer-js';

// Download with progress and resume support
const modelData = await loadModelData('https://example.com/large-model.onnx', {
  resumable: true,
  chunkSize: 10 * 1024 * 1024, // 10MB chunks
  parallelConnections: 4,      // Download 4 chunks in parallel
  onProgress: (progress) => {
    console.log(`${progress.percent.toFixed(1)}% downloaded`);
    console.log(`Speed: ${(progress.speed / 1024 / 1024).toFixed(2)} MB/s`);
    console.log(`ETA: ${(progress.eta / 1000).toFixed(0)}s`);
    console.log(`Chunk ${progress.currentChunk}/${progress.totalChunks}`);
  }
});

Model Quantization

import { quantize } from 'webinfer-js/tools';

const quantized = await quantize(model, {
  method: 'int8',
  calibrationData: samples,
});

console.log(`Compression: ${quantized.compressionRatio}x`);
// Compression: 3.8x

Benchmarking

import { benchmark } from 'webinfer-js/tools';

const result = await benchmark(
  () => model.run('sample text'),
  { warmupRuns: 5, runs: 100 }
);

console.log(result);
// {
//   avgTime: 12.5,
//   minTime: 10.2,
//   maxTime: 18.3,
//   throughput: 80 // inferences/sec
// }

Memory Scope

import { withMemoryScope, tensor } from 'webinfer-js';

const result = await withMemoryScope(async (scope) => {
  // Tensors tracked in scope
  const a = scope.track(tensor([1, 2, 3]));
  const b = scope.track(tensor([4, 5, 6]));
  
  // Process...
  const output = process(a, b);
  
  // Keep result, dispose others
  return scope.keep(output);
});
// a and b automatically disposed

🔌 Tensor Operations

import { tensor, zeros, ones, matmul, softmax, relu } from 'webinfer-js';

// Create tensors
const a = tensor([[1, 2], [3, 4]]);
const b = zeros([2, 2]);
const c = ones([2, 2]);

// Operations
const d = matmul(a, c);
const probs = softmax(d);
const activated = relu(d);

// Cleanup
a.dispose();
b.dispose();
c.dispose();

🌐 Browser Support

| Browser | WebGPU | WebNN | WASM |
|---------|--------|-------|------|
| Chrome 113+ | ✅ | ✅ | ✅ |
| Edge 113+ | ✅ | ✅ | ✅ |
| Firefox 118+ | ⚠️ Flag | ❌ | ✅ |
| Safari 17+ | ⚠️ Preview | ❌ | ✅ |
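The `'auto'` runtime follows the fallback order this table implies (WebGPU → WebNN → WASM). A sketch of that selection logic, with capability flags passed in explicitly — in a real browser the flags would come from feature detection such as `'gpu' in navigator`, and the function name here is illustrative, not part of the API:

```javascript
// Pick the fastest available backend, falling back in order.
// Flags are plain inputs so the order itself is easy to see and test.
function pickBackend({ webgpu = false, webnn = false, wasm = true } = {}) {
  if (webgpu) return 'webgpu'; // fastest where available
  if (webnn) return 'webnn';   // hardware-accelerated on supported browsers
  if (wasm) return 'wasm';     // universal fallback
  throw new Error('No supported inference backend available');
}
```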

📖 API Reference

Core

  • pipeline(task, options?) - Create a pipeline for a task
  • loadModel(url, options?) - Load a model from URL
  • runInference(model, inputs) - Run model inference
  • getScheduler() - Get the global scheduler
  • getMemoryManager() - Get the memory manager
  • runInWorker(url, inputs) - Run inference in a Web Worker
  • WorkerPool - Manage multiple workers for parallel inference

Pipelines

  • TextClassificationPipeline - Text/sentiment classification
  • SentimentAnalysisPipeline - Sentiment analysis
  • FeatureExtractionPipeline - Text embeddings
  • ImageClassificationPipeline - Image classification
  • TextGenerationPipeline - Text generation with streaming
  • TokenClassificationPipeline - Named Entity Recognition (NER)
  • ObjectDetectionPipeline - Object detection with bounding boxes
  • AutomaticSpeechRecognitionPipeline - Speech to text
  • ZeroShotClassificationPipeline - Classify without training
  • QuestionAnsweringPipeline - Extractive QA

HuggingFace Hub

  • fromHub(modelId, options?) - Load model bundle from HuggingFace
  • fromTask(task, options?) - Load recommended model for task
  • downloadTokenizer(modelId) - Download tokenizer only
  • downloadConfig(modelId) - Download config only
  • POPULAR_MODELS - Registry of popular models by task

Utilities

  • Tokenizer - BPE/WordPiece tokenization with HuggingFace support
  • ImagePreprocessor - Image preprocessing with HuggingFace config support
  • AudioPreprocessor - Audio preprocessing for Whisper/wav2vec
  • Cache - LRU caching utilities

Tools

  • quantize(model, options) - Quantize a model
  • prune(model, options) - Prune model weights
  • benchmark(fn, options) - Benchmark inference
  • analyzeModel(model) - Analyze model structure

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

MIT © WebInfer.js Contributors


Get Started · API Docs · Examples