inferis-ml

v1.0.3

Worker pool for running AI models in the browser — WebGPU/WASM auto-detection, model lifecycle, streaming, cross-tab dedup

inferis-ml

Run AI models in the browser. No server, no per-request cost, no data leaving the device.

Live Demo — try it in your browser.

import { createPool } from 'inferis-ml';
import { transformersAdapter } from 'inferis-ml/adapters/transformers';

const pool = await createPool({ adapter: transformersAdapter() });
const model = await pool.load<number[][]>('feature-extraction', {
  model: 'mixedbread-ai/mxbai-embed-xsmall-v1',
});

const embeddings = await model.run(['Hello world', 'Another sentence']);
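The `number[][]` result holds one vector per input sentence. For semantic search, a common next step is to compare those vectors with cosine similarity. The sketch below is plain TypeScript and independent of inferis-ml. It assumes each row of `embeddings` is a plain `number[]`, as the `pool.load<number[][]>` type suggests. The README doesn't say whether the model already normalizes its output, so the function normalizes explicitly. `cosineSimilarity` and `rankBySimilarity` are hypothetical helpers, not package exports.

```typescript
// Cosine similarity between two equal-length vectors, in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // An all-zero vector has no direction; treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank document vectors by similarity to a query vector, highest score first.
function rankBySimilarity(
  query: number[],
  docs: number[][],
): { index: number; score: number }[] {
  return docs
    .map((vec, index) => ({ index, score: cosineSimilarity(query, vec) }))
    .sort((x, y) => y.score - x.score);
}
```

With the example above, `rankBySimilarity(embeddings[0], embeddings.slice(1))` would score 'Another sentence' against 'Hello world'.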

Why

Existing browser runtimes (transformers.js, web-llm, onnxruntime-web) give you inference but leave everything else to you — worker management, postMessage boilerplate, model lifecycle, memory budgets, cross-tab dedup, WebGPU fallback, streaming.

inferis-ml handles all of it. You get a clean async API and focus on the product.

| Problem | Without inferis-ml | With inferis-ml |
|---------|-------------------|-----------------|
| UI freezes during inference | Main thread blocked | Runs in Web Workers |
| 5 tabs = 5 model copies | 10 GB RAM, browser crashes | crossTab: true — one shared copy |
| WebGPU not everywhere | Manual detection + swap | defaultDevice: 'auto' |

Install

npm install inferis-ml

# Pick your adapter (peer deps):
npm install @huggingface/transformers   # transformersAdapter
npm install @mlc-ai/web-llm             # webLlmAdapter
npm install onnxruntime-web             # onnxAdapter

Quick Start

LLM Streaming

import { createPool } from 'inferis-ml';
import { webLlmAdapter } from 'inferis-ml/adapters/web-llm';

const pool = await createPool({
  adapter: webLlmAdapter(),
  defaultDevice: 'webgpu',
  maxWorkers: 1,
});

const llm = await pool.load<string>('text-generation', {
  model: 'Llama-3.2-3B-Instruct-q4f32_1-MLC',
  onProgress: ({ phase }) => console.log(phase),
});

const stream = llm.stream({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain WebGPU in 3 sentences.' },
  ],
});

for await (const token of stream) {
  output.textContent += token;
}
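`stream()` returns a `ReadableStream`, and the loop above relies on `for await` over it. Async iteration of `ReadableStream` has not been available in every browser, so check current compatibility data for your targets. Where it is missing, the same token loop can be written with an explicit reader. `collectStream` is a hypothetical helper using only the standard Streams API (`getReader`, `read`, `releaseLock`).

```typescript
// Read every chunk from a ReadableStream via an explicit reader and join them.
// Works in runtimes where ReadableStream is not async-iterable.
async function collectStream(
  stream: ReadableStream<string>,
  onToken?: (token: string) => void,
): Promise<string> {
  const reader = stream.getReader();
  let text = '';
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      text += value;
      onToken?.(value); // e.g. append to the page as each token arrives
    }
  } finally {
    // Release the lock so the stream can be cancelled or read elsewhere afterwards.
    reader.releaseLock();
  }
  return text;
}
```

Usage, with `output` being whatever element you render into: `await collectStream(llm.stream({ messages }), (t) => { output.textContent += t; });`.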

Speech Transcription

const transcriber = await pool.load<{ text: string }>('automatic-speech-recognition', {
  model: 'openai/whisper-base',
  estimatedMemoryMB: 80,
});

const result = await transcriber.run(audioData);
console.log(result.text);

Abort Inference

const ctrl = new AbortController();
stopButton.onclick = () => ctrl.abort();

try {
  for await (const token of llm.stream(input, { signal: ctrl.signal })) {
    output.textContent += token;
  }
} catch (e) {
  // Rethrow anything that is not a cancellation.
  if ((e as Error).name !== 'AbortError') throw e;
  output.textContent += ' [stopped]';
}
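The README doesn't describe how inferis-ml forwards the signal to the worker. As a rough, generic sketch of the pattern on the consuming side: check the signal between chunks and end the loop with a `DOMException` named `AbortError`, the name the catch block above tests for. `abortable` is a hypothetical helper, not a package export. Cancellation takes effect at the next chunk, not mid-chunk.

```typescript
// Wrap an async iterable so it stops with an AbortError once `signal` fires.
async function* abortable<T>(source: AsyncIterable<T>, signal: AbortSignal): AsyncGenerator<T> {
  // Fail fast if the caller aborted before iteration started.
  if (signal.aborted) throw new DOMException('The operation was aborted.', 'AbortError');
  for await (const item of source) {
    if (signal.aborted) {
      // Throwing here exits the for-await loop, which calls source's return()
      // so the underlying iterator can clean up.
      throw new DOMException('The operation was aborted.', 'AbortError');
    }
    yield item;
  }
}
```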

Cross-Tab Deduplication

const pool = await createPool({
  adapter: transformersAdapter(),
  crossTab: true, // SharedWorker > leader election > per-tab fallback
});
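The comment `SharedWorker > leader election > per-tab fallback` describes a tiered choice. Below is a hedged sketch of that selection logic, not inferis-ml's actual detection code. Using the Web Locks API (`navigator.locks`) as the signal for leader election is an assumption; it is suggested by the Leader Election row of the Browser Support table, whose version numbers match Web Locks support. `pickCrossTabTier` and `detectCrossTabEnv` are hypothetical.

```typescript
type CrossTabTier = 'shared-worker' | 'leader-election' | 'per-tab';

// Capability snapshot; kept separate from detection so the choice is easy to test.
interface CrossTabEnv {
  hasSharedWorker: boolean;
  hasWebLocks: boolean; // assumption: leader election built on navigator.locks
}

// Pick the best available tier in the documented order.
function pickCrossTabTier(env: CrossTabEnv): CrossTabTier {
  if (env.hasSharedWorker) return 'shared-worker';
  if (env.hasWebLocks) return 'leader-election';
  return 'per-tab';
}

// Read capabilities from the global scope; safe to call outside a browser.
function detectCrossTabEnv(): CrossTabEnv {
  const g = globalThis as unknown as {
    SharedWorker?: unknown;
    navigator?: { locks?: unknown };
  };
  return {
    hasSharedWorker: typeof g.SharedWorker === 'function',
    hasWebLocks: g.navigator?.locks !== undefined,
  };
}
```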

Model State Changes

model.onStateChange((state) => {
  if (state === 'loading')  showSpinner();
  if (state === 'ready')    hideSpinner();
  if (state === 'error')    showError('Failed to load model');
  if (state === 'disposed') disableUI();
});
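The example above handles four states, but the ModelHandle reference lists seven. If you want the compiler to flag an unhandled state, an exhaustive `switch` with a `never` check works well. This sketch assumes the callback receives exactly the string union from that table; the labels and the `statusLabel` helper are illustrative.

```typescript
// The states listed for ModelHandle.state in the API reference.
type ModelState =
  | 'idle'
  | 'loading'
  | 'ready'
  | 'inferring'
  | 'unloading'
  | 'error'
  | 'disposed';

// Map every state to a UI label. If a new state is added to the union,
// the `never` assignment in `default` stops compiling until it is handled.
function statusLabel(state: ModelState): string {
  switch (state) {
    case 'idle':
      return 'Not loaded';
    case 'loading':
      return 'Loading model';
    case 'ready':
      return 'Ready';
    case 'inferring':
      return 'Running';
    case 'unloading':
      return 'Unloading';
    case 'error':
      return 'Failed to load model';
    case 'disposed':
      return 'Disposed';
    default: {
      const unhandled: never = state;
      return unhandled;
    }
  }
}
```

Since `onStateChange` returns an unsubscribe function, keep it (`const unsubscribe = model.onStateChange((s) => { status.textContent = statusLabel(s); });`, where `status` is your own element) and call it when the component goes away.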

Features

  • Runtime-agnostic — adapters for @huggingface/transformers, @mlc-ai/web-llm, onnxruntime-web, or your own
  • Zero framework deps — works with React, Vue, Svelte, or vanilla JS
  • WebGPU -> WASM fallback — auto-detected or configured explicitly
  • Streaming: ReadableStream + for await for token-by-token output
  • Memory budget — LRU eviction when models exceed the configured cap
  • Cross-tab dedup — SharedWorker (tier 1), leader election (tier 2), per-tab (tier 3)
  • AbortController — cancel any in-flight inference
  • TypeScript — full type safety, generic output types
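The memory budget works by LRU eviction against `maxMemoryMB`, with each model's `estimatedMemoryMB` as its cost. Below is a minimal sketch of that bookkeeping. `MemoryBudget` is hypothetical, and the README doesn't specify edge cases such as what happens when a single model exceeds the whole cap, so the behavior chosen here is an assumption.

```typescript
// Hypothetical LRU bookkeeping for a memory cap, in megabytes.
class MemoryBudget {
  // Map iteration follows insertion order, so the first key is the least recently used.
  private readonly entries = new Map<string, number>();
  private used = 0;
  private readonly maxMemoryMB: number;

  constructor(maxMemoryMB: number) {
    this.maxMemoryMB = maxMemoryMB;
  }

  get usedMB(): number {
    return this.used;
  }

  // Mark a model as recently used by moving it to the end of the Map.
  touch(id: string): void {
    const mb = this.entries.get(id);
    if (mb === undefined) return;
    this.entries.delete(id);
    this.entries.set(id, mb);
  }

  remove(id: string): void {
    const mb = this.entries.get(id);
    if (mb === undefined) return;
    this.entries.delete(id);
    this.used -= mb;
  }

  // Record a model and evict least recently used ones until it fits.
  // Returns the evicted ids, oldest first. A model larger than the whole cap
  // evicts everything else and is still admitted (an assumption).
  add(id: string, mb: number): string[] {
    this.remove(id); // re-adding replaces the old estimate
    const evicted: string[] = [];
    for (const [oldId, oldMb] of this.entries) {
      if (this.used + mb <= this.maxMemoryMB) break;
      this.entries.delete(oldId);
      this.used -= oldMb;
      evicted.push(oldId);
    }
    this.entries.set(id, mb);
    this.used += mb;
    return evicted;
  }
}
```

In a pool, each evicted id would then be unloaded, for example through the handle's `dispose()`.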

API Reference

createPool(config)

const pool = await createPool({
  adapter: transformersAdapter(),   // required
  workerUrl: new URL('inferis-ml/worker', import.meta.url),
  maxWorkers: Math.max(1, navigator.hardwareConcurrency - 1), // leave a core for the main thread, never 0
  maxMemoryMB: 2048,
  defaultDevice: 'auto',           // 'webgpu' | 'wasm' | 'auto'
  crossTab: false,
  taskTimeout: 120_000,
});
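`taskTimeout: 120_000` reads as a per-task limit in milliseconds (two minutes); the unit is an assumption, since the README doesn't state it. Below is a generic sketch of how such a limit can be enforced around any promise. `withTimeout` is hypothetical. It only stops waiting and does not cancel the underlying work; how inferis-ml handles a timed-out task, for example whether it terminates the worker, isn't described here.

```typescript
// Reject if `promise` does not settle within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`Task timed out after ${ms} ms`)), ms);
    promise.then(
      (value) => {
        clearTimeout(timer); // settled in time: stop the pending timeout
        resolve(value);
      },
      (err: unknown) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```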

pool.load<TOutput>(task, config)

Loads a model and returns a ModelHandle. If already loaded, returns the existing handle.

const model = await pool.load<number[][]>('feature-extraction', {
  model: 'mixedbread-ai/mxbai-embed-xsmall-v1',
  estimatedMemoryMB: 30,
  onProgress: (p) => console.log(p.phase, p.loaded, p.total),
});
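`pool.load` returns the existing handle when the model is already loaded, and a handle's id has the form `task:model`. One way to get that behavior, including for two calls that race before the first load finishes, is to cache the load *promise* by id. `HandleCache` is a hypothetical sketch of the idea, not inferis-ml's internals.

```typescript
// Hypothetical cache of in-flight or finished loads, keyed like ModelHandle.id.
class HandleCache<H> {
  private readonly pending = new Map<string, Promise<H>>();

  load(task: string, model: string, loader: () => Promise<H>): Promise<H> {
    const id = `${task}:${model}`;
    // Returning the stored promise also dedups calls made while the first load is still running.
    const existing = this.pending.get(id);
    if (existing) return existing;
    const p = loader().catch((err: unknown) => {
      this.pending.delete(id); // a failed load should not poison later attempts
      throw err;
    });
    this.pending.set(id, p);
    return p;
  }
}
```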

ModelHandle<TOutput>

| Method | Description |
|--------|-------------|
| run(input, options?) | Non-streaming inference. Returns Promise<TOutput>. |
| stream(input, options?) | Streaming inference. Returns ReadableStream<TOutput>. |
| dispose() | Unload model and free memory. |
| onStateChange(cb) | Subscribe to state changes. Returns unsubscribe function. |
| id | Unique model ID (task:model). |
| state | Current state: idle \| loading \| ready \| inferring \| unloading \| error \| disposed. |
| memoryMB | Approximate memory usage. |
| device | Resolved device: webgpu or wasm. |

InferenceOptions

interface InferenceOptions {
  signal?: AbortSignal;
  priority?: 'high' | 'normal' | 'low';
}
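`priority` accepts `'high' | 'normal' | 'low'`, but the README doesn't say how the pool orders queued tasks. A straightforward reading is strict priority with first-in, first-out order inside each level, sketched below. `PriorityQueue` is hypothetical, and treating `'normal'` as the default is an assumption.

```typescript
type Priority = 'high' | 'normal' | 'low';

// Levels are served in this order.
const PRIORITY_ORDER: readonly Priority[] = ['high', 'normal', 'low'];

class PriorityQueue<T> {
  private readonly queues: Record<Priority, T[]> = { high: [], normal: [], low: [] };

  // 'normal' as the default priority is an assumption.
  push(item: T, priority: Priority = 'normal'): void {
    this.queues[priority].push(item);
  }

  // Remove and return the next item, or undefined if every level is empty.
  shift(): T | undefined {
    for (const level of PRIORITY_ORDER) {
      const queue = this.queues[level];
      if (queue.length > 0) return queue.shift();
    }
    return undefined;
  }

  get size(): number {
    return this.queues.high.length + this.queues.normal.length + this.queues.low.length;
  }
}
```

One trade-off of strict priority: if high-priority tasks keep arriving, low-priority ones can wait indefinitely.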

detectCapabilities()

import { detectCapabilities } from 'inferis-ml';

const caps = await detectCapabilities();
if (caps.webgpu.supported) {
  console.log('GPU vendor:', caps.webgpu.adapter?.vendor);
} else {
  console.log('WASM SIMD:', caps.wasm.simd);
}
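For reference, here is roughly what a capability check like this has to do under the hood. It uses standard web APIs and is not a description of `detectCapabilities()` itself. WebGPU counts as usable only if `navigator.gpu.requestAdapter()` resolves to a non-null adapter. WASM SIMD can be probed by asking `WebAssembly.validate` about a tiny module that returns a `v128` value. `wasmSimdSupported` and `webgpuSupported` are hypothetical helpers.

```typescript
// Minimal module whose only function returns a v128, so it validates only with SIMD support.
const SIMD_PROBE = new Uint8Array([
  0, 97, 115, 109, 1, 0, 0, 0, // "\0asm" magic + version 1
  1, 5, 1, 96, 0, 1, 123, // type section: () -> v128
  3, 2, 1, 0, // function section: one function of type 0
  10, 10, 1, 8, 0, 65, 0, 253, 15, 253, 98, 11, // code: i32.const 0; i8x16.splat; i8x16.popcnt; end
]);

function wasmSimdSupported(): boolean {
  return typeof WebAssembly === 'object' && WebAssembly.validate(SIMD_PROBE);
}

interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

// navigator.gpu may exist while requestAdapter() still resolves to null (no usable GPU).
async function webgpuSupported(): Promise<boolean> {
  const nav = (globalThis as unknown as { navigator?: { gpu?: GpuLike } }).navigator;
  if (!nav?.gpu) return false;
  try {
    return (await nav.gpu.requestAdapter()) !== null;
  } catch {
    return false;
  }
}
```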

Custom Adapter

import type { ModelAdapter, ModelAdapterFactory } from 'inferis-ml';

export function myCustomAdapter(): ModelAdapterFactory {
  return {
    name: 'my-adapter',

    async create(): Promise<ModelAdapter> {
      const { MyRuntime } = await import('my-runtime');
      // MyRuntime is a value binding here, so derive the instance type for the casts below.
      type Runtime = InstanceType<typeof MyRuntime>;

      return {
        name: 'my-adapter',

        estimateMemoryMB(_task, config) {
          return (config.estimatedMemoryMB as number | undefined) ?? 50;
        },

        async load(_task, config, device, onProgress) {
          onProgress({ phase: 'loading', loaded: 0, total: 1 });
          const instance = await MyRuntime.load(config.model as string, { device });
          onProgress({ phase: 'done', loaded: 1, total: 1 });
          return { instance, memoryMB: 50 };
        },

        async run(model, input) {
          return (model.instance as Runtime).infer(input);
        },

        async stream(model, input, onChunk) {
          for await (const chunk of (model.instance as Runtime).stream(input)) {
            onChunk(chunk);
          }
        },

        async unload(model) {
          await (model.instance as Runtime).dispose();
        },
      };
    },
  };
}

Framework Integrations

Official bindings with idiomatic APIs for popular frameworks:

| Package | Install | Docs |
|---------|---------|------|
| inferis-react | npm i inferis-react | README |
| inferis-vue | npm i inferis-vue | README |
| inferis-svelte | npm i inferis-svelte | README |

Each package provides context/provider setup, model lifecycle management, streaming, capability detection, and memory monitoring -- all wired into the framework's reactivity system.

// React
const { text, start } = useStream(model);

// Vue
const { text, start } = useStream(model);

// Svelte
const { text, start } = useStream(model);  // $text in template

Bundler & Framework Setup

inferis-ml is browser-only. In SSR frameworks, ensure initialization runs only on the client.
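Besides the framework-specific guards below, a small runtime check can confirm that the minimum listed under Browser Support (Web Workers + WebAssembly) is present before you import inferis-ml. `canRunInBrowser` is a hypothetical helper, not part of the package.

```typescript
// True only where a window, Web Workers, and WebAssembly all exist.
// Returns false during SSR and in plain Node.
function canRunInBrowser(): boolean {
  const g = globalThis as unknown as {
    window?: unknown;
    Worker?: unknown;
    WebAssembly?: unknown;
  };
  return (
    g.window !== undefined &&
    typeof g.Worker === 'function' &&
    typeof g.WebAssembly === 'object'
  );
}
```

Usage: `if (canRunInBrowser()) { const { createPool } = await import('inferis-ml'); /* ... */ }`. The dynamic import keeps the package out of the server bundle path.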

Vite

// vite.config.ts
export default {
  worker: { format: 'es' },
};

webpack 5

// webpack.config.js
module.exports = {
  experiments: { asyncWebAssembly: true },
};

Next.js

'use client';

import { useEffect, useState } from 'react';
import type { WorkerPoolInterface } from 'inferis-ml';

export default function AI() {
  const [pool, setPool] = useState<WorkerPoolInterface | null>(null);

  useEffect(() => {
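    // Load inferis-ml and its adapter on the client only.
    Promise.all([
      import('inferis-ml'),
      import('inferis-ml/adapters/transformers'),
    ])
      .then(([{ createPool }, { transformersAdapter }]) =>
        createPool({ adapter: transformersAdapter() })
      )
      .then(setPool);
  }, []);

  if (!pool) return <p>Loading...</p>;
  // use pool
  return <p>Ready</p>;
}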

Nuxt

<template>
  <ClientOnly>
    <InferenceComponent />
  </ClientOnly>
</template>

// composables/useInferis.ts
export async function useInferis() {
  const { createPool } = await import('inferis-ml');
  const { transformersAdapter } = await import('inferis-ml/adapters/transformers');
  return createPool({ adapter: transformersAdapter() });
}

SvelteKit

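import { onMount } from 'svelte';

let pool;
// onMount runs only in the browser, so inferis-ml is never loaded during SSR.
onMount(async () => {
  const { createPool } = await import('inferis-ml');
  const { transformersAdapter } = await import('inferis-ml/adapters/transformers');
  pool = await createPool({ adapter: transformersAdapter() });
});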

Popular Models

Models download from Hugging Face Hub on first use and are cached in the browser's Cache API. Later loads skip the network download and work offline.

Embeddings / Semantic Search

| Model | Size | Notes |
|-------|------|-------|
| mixedbread-ai/mxbai-embed-xsmall-v1 | 23 MB | Best quality/size for English |
| Xenova/all-MiniLM-L6-v2 | 23 MB | Popular, English |
| Xenova/multilingual-e5-small | 118 MB | 100+ languages |

Text Generation (LLM)

Requires @mlc-ai/web-llm + defaultDevice: 'webgpu'.

| Model | Size | Notes |
|-------|------|-------|
| Llama-3.2-1B-Instruct-q4f32_1-MLC | 0.8 GB | Fastest |
| Llama-3.2-3B-Instruct-q4f32_1-MLC | 2 GB | Good balance |
| Phi-3.5-mini-instruct-q4f16_1-MLC | 2.2 GB | Strong reasoning |
| gemma-2-2b-it-q4f16_1-MLC | 1.5 GB | Fast on mobile GPU |

Speech Recognition

| Model | Size | Notes |
|-------|------|-------|
| openai/whisper-tiny | 39 MB | Fastest |
| openai/whisper-base | 74 MB | Good balance |
| openai/whisper-small | 244 MB | Better accuracy |

Text Classification

| Model | Size | Notes |
|-------|------|-------|
| Xenova/distilbert-base-uncased-finetuned-sst-2-english | 67 MB | Sentiment |
| Xenova/toxic-bert | 438 MB | Toxicity detection |

Translation

| Model | Size | Notes |
|-------|------|-------|
| Xenova/opus-mt-en-ru | 74 MB | EN -> RU |
| Xenova/opus-mt-ru-en | 74 MB | RU -> EN |
| Xenova/nllb-200-distilled-600M | 600 MB | 200 languages |

Image Classification

| Model | Size | Notes |
|-------|------|-------|
| Xenova/efficientnet-lite4 | 13 MB | Fastest, 1000 classes |
| Xenova/mobilevit-small | 22 MB | Mobile-friendly |

Model Sources

Models are not locked to Hugging Face. Each adapter has its own sources:

  • transformers.js — HF Hub ID or any direct URL
  • web-llm — MLC registry, or register custom models
  • onnxruntime-web — direct URL to .onnx file
  • Custom adapter — load from anywhere (fetch, IndexedDB, bundled)

Caching

First visit:  download -> Cache API -> run  (5-60s)
Next visits:  Cache API -> run              (1-3s, no network)
Offline:      Cache API -> run              (works without internet)
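To tell whether a model file is already cached before calling `pool.load` (for example, to warn users about a large first download), you can query the Cache API directly. Each runtime chooses its own cache name and request URLs, which this README doesn't list, so `cacheName` and `url` are placeholders you would fill in for your adapter. `isCached` is a hypothetical helper built on the standard `caches.has`, `caches.open`, and `Cache.match` calls.

```typescript
// Minimal structural types so the sketch compiles without DOM typings.
interface CacheLike {
  match(request: string): Promise<unknown>;
}
interface CacheStorageLike {
  has(cacheName: string): Promise<boolean>;
  open(cacheName: string): Promise<CacheLike>;
}

// Resolve to true if `url` is stored in the Cache Storage cache named `cacheName`.
// Resolves to false where the Cache API is missing (SSR, Node, insecure non-HTTPS pages).
async function isCached(url: string, cacheName: string): Promise<boolean> {
  const storage = (globalThis as unknown as { caches?: CacheStorageLike }).caches;
  if (!storage) return false;
  // Check has() first: open() would create an empty cache as a side effect.
  if (!(await storage.has(cacheName))) return false;
  const cache = await storage.open(cacheName);
  // Cache.match resolves to undefined when nothing matches.
  return (await cache.match(url)) !== undefined;
}
```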

Browser Support

| Feature | Chrome | Firefox | Safari | Edge |
|---------|--------|---------|--------|------|
| Core (Worker + WASM) | 57+ | 52+ | 11+ | 16+ |
| WebGPU | 113+ | 141+ | 26+ | 113+ |
| WASM SIMD | 91+ | 89+ | 16.4+ | 91+ |
| SharedWorker | 4+ | 29+ | 16+ | 79+ |
| Leader Election | 69+ | 96+ | 15.4+ | 79+ |

Minimum: Web Workers + WebAssembly (97%+ of browsers). All advanced features are progressive enhancements.

Performance Tips

  • maxWorkers: 1 for GPU-bound workloads (LLMs)
  • defaultDevice: 'webgpu' when targeting modern hardware
  • estimatedMemoryMB for accurate LRU eviction
  • crossTab: true for multi-tab apps (chat, editors)
  • Reuse ModelHandle — re-loading a ready model is a no-op

When To Use

| Use case | Fit? |
|----------|------|
| Semantic search, chatbot, speech, classification, translation | Yes |
| Private data (never leaves device) | Yes |
| Offline after first load | Yes |
| Server-side batch processing | No |
| Models > 4 GB | No |

License

MIT