web-llm-runner

v0.2.1 · Published 3 months ago · Maintainer: ashu01304

Hardware-accelerated language model chats in browsers

Vulnerabilities: 0 high, 0 medium, 0 low

Keywords: llm, large language model, machine learning

Web-LLM-Runner

A lightweight, self-contained, browser-native npm library for running large language models (LLMs) entirely on the frontend with hardware-accelerated WebGPU. Inference runs locally in the browser, so no backend server is required and prompts are not sent to one.

Installation

```bash
npm install web-llm-runner
```

Quick Start

```javascript
import { WebLLM } from "web-llm-runner";

const llm = new WebLLM();
```

⚡ Core API Endpoints

The library hides the otherwise complex WebGPU setup behind a small set of methods on the `WebLLM` instance.

1. `models_available` (Property)

An array of strings listing the IDs of all preconfigured, compatible models.

```javascript
console.log(llm.models_available);
// ["Llama-3.1-8B-Instruct-q4f32_1-1k", "TinyLlama-1.1B-Chat-v1.0-q4f16_1-MLC", ...]
```
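Because `models_available` is a plain string array, ordinary array methods work on it. The sketch below is a hypothetical helper (`pickModel` is not part of web-llm-runner) that picks the first model from a preference list that the library actually supports; the placeholder ID in the usage comment is invented for illustration.

```javascript
// Hypothetical helper, not part of web-llm-runner: returns the first ID in
// `preferred` that also appears in `available`, or null if none match.
function pickModel(available, preferred) {
  return preferred.find((id) => available.includes(id)) ?? null;
}

// Usage with the documented property ("my-preferred-model-id" is a placeholder):
// const modelId = pickModel(llm.models_available, [
//   "my-preferred-model-id",
//   "TinyLlama-1.1B-Chat-v1.0-q4f16_1-MLC",
// ]);
```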

2. `local_model_available(model_id: string)` -> `Promise<boolean>`

Checks whether a specific model is already cached in the browser's IndexedDB.

```javascript
const isCached = await llm.local_model_available("TinyLlama-1.1B-Chat-v1.0-q4f16_1-MLC");
```
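One way to use this check is to ask the user before starting a large download. The sketch below is a hypothetical helper (`loadModel` and its `confirmDownload` option are not part of the library). It assumes `download_model` is also what initializes the engine, so it still calls it for cached models, which should then load from the local cache rather than the network; verify that behavior against the library before relying on it.

```javascript
// Hypothetical helper, not part of web-llm-runner. `llm` is a WebLLM instance
// (or any object exposing the two methods used here).
async function loadModel(llm, modelId, { confirmDownload = async () => true, onProgress } = {}) {
  const cached = await llm.local_model_available(modelId);
  // Only prompt the user when a (possibly multi-gigabyte) download is needed.
  if (!cached && !(await confirmDownload(modelId))) {
    return false; // user declined; nothing was downloaded
  }
  // Assumption: download_model also initializes the engine, so it is called
  // even for cached models, which should then load from the local cache.
  await llm.download_model(modelId, onProgress);
  return true;
}
```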

3. `download_model(model_id: string, progressCallback?: Function)` -> `Promise<void>`

Initializes the Web Worker engine, downloads the chosen model incrementally, and loads it into WebGPU memory. The optional `progressCallback` receives progress messages as loading proceeds.

```javascript
await llm.download_model(
  "TinyLlama-1.1B-Chat-v1.0-q4f16_1-MLC",
  (progressText) => console.log(progressText) // e.g., "Loading... 24%"
);
```
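To drive a progress bar from the callback, the percentage has to be pulled out of the message text. The exact message format is not documented beyond the example above, so the hypothetical `parsePercent` helper below only extracts the first `NN%` or `NN.N%` it finds and returns `null` otherwise, letting the caller ignore messages without a percentage.

```javascript
// Hypothetical helper: extracts a percentage from a progress string such as
// "Loading... 24%". Returns null when the text contains no percentage, since
// the library's message format is not documented.
function parsePercent(text) {
  const match = /(\d+(?:\.\d+)?)\s*%/.exec(String(text));
  return match ? Number(match[1]) : null;
}

// Usage (`progressBar` is a hypothetical <progress max="100"> element):
// await llm.download_model(modelId, (progressText) => {
//   const pct = parsePercent(progressText);
//   if (pct !== null) progressBar.value = pct;
// });
```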

4. `chat_stream(prompt: string)` -> `AsyncGenerator<string>`

A stateful, streaming chat endpoint that keeps the conversation history locally, so each prompt is answered in the context of earlier turns. Yields text chunks as they are generated.

```javascript
const generator = llm.chat_stream("Count from 1 to 5.");
for await (const chunk of generator) {
  console.log(chunk);
}
```
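In a UI you usually want both: render each chunk as it arrives and keep the complete reply at the end. The sketch below works on any `AsyncGenerator<string>`; `collectStream` is a hypothetical helper, and it assumes each yielded chunk is a new piece of text rather than the cumulative reply so far.

```javascript
// Consumes any AsyncGenerator<string>, such as the one llm.chat_stream(...)
// returns. Each chunk is passed to `onChunk` (for example, to append it to the
// page) and the concatenated reply is returned when the stream ends.
// Assumes chunks are incremental deltas, not the whole reply so far.
async function collectStream(generator, onChunk = () => {}) {
  let reply = "";
  for await (const chunk of generator) {
    reply += chunk;
    onChunk(chunk, reply);
  }
  return reply;
}

// Usage (`output` is a hypothetical DOM element):
// const reply = await collectStream(llm.chat_stream("Count from 1 to 5."),
//   (chunk) => { output.textContent += chunk; });
```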

5. `chat_no_stream(prompt: string)` -> `Promise<string>`

A stateful chat endpoint whose promise resolves only after generation completes, with the full response at once.

```javascript
const reply = await llm.chat_no_stream("What is the capital of France?");
```

6. `generate_stream(prompt: string)` -> `AsyncGenerator<string>`

A stateless streaming endpoint for single, independent completion tasks; it does not use the chat history.

```javascript
const generator = llm.generate_stream("Write a haiku about the ocean.");
for await (const chunk of generator) {
  console.log(chunk);
}
```
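Streaming also makes it possible to stop reading early, for example once enough text has arrived for a preview. In the sketch below, `takeChars` is a hypothetical helper: breaking out of `for await` calls the generator's `return()` method, which closes the generator on the consumer side. Whether that also halts generation inside web-llm-runner's worker is an assumption, not documented behavior.

```javascript
// Reads from an AsyncGenerator<string> (e.g. llm.generate_stream(...)) until at
// least `maxChars` characters have arrived, then stops and closes the stream.
// Breaking out of `for await` invokes the generator's return() method.
async function takeChars(generator, maxChars) {
  let text = "";
  for await (const chunk of generator) {
    text += chunk;
    if (text.length >= maxChars) break;
  }
  return text.slice(0, maxChars);
}

// Usage:
// const preview = await takeChars(llm.generate_stream("Write a haiku about the ocean."), 40);
```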

7. `generate_no_stream(prompt: string)` -> `Promise<string>`

A stateless, non-streaming completion endpoint whose promise resolves with the full response.

```javascript
const text = await llm.generate_no_stream("Say exactly the word 'Hello'.");
```
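Because the generate endpoints are stateless, a list of unrelated prompts can be processed in a simple loop. The hypothetical `generateAll` helper below awaits each call before starting the next; this cautiously assumes the single worker engine should not receive overlapping requests, which the library does not document either way.

```javascript
// Hypothetical helper, not part of web-llm-runner: runs independent prompts
// through generate_no_stream one at a time and returns the responses in order.
// Sequential awaiting avoids overlapping requests on the single engine
// (a cautious assumption, not documented behavior).
async function generateAll(llm, prompts) {
  const results = [];
  for (const prompt of prompts) {
    results.push(await llm.generate_no_stream(prompt));
  }
  return results;
}

// Usage:
// const [haiku, greeting] = await generateAll(llm, [
//   "Write a haiku about the ocean.",
//   "Say exactly the word 'Hello'.",
// ]);
```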

8. `reset_chat()` -> `Promise<void>`

Clears the conversation history accumulated by the `chat_stream` and `chat_no_stream` endpoints.

```javascript
await llm.reset_chat();
```

9. `delete_model(model_id: string)` -> `Promise<void>`

Removes a downloaded model from the browser's local cache, freeing the disk space it occupied (several gigabytes for larger models).

```javascript
await llm.delete_model("TinyLlama-1.1B-Chat-v1.0-q4f16_1-MLC");
```
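Combining `models_available`, `local_model_available`, and `delete_model` gives a "clear all downloaded models" action. The helper below is hypothetical (not part of the library) and only covers models that appear in `models_available`.

```javascript
// Hypothetical helper, not part of web-llm-runner: deletes every model listed
// in models_available that is currently cached and returns the deleted IDs.
async function deleteAllCachedModels(llm) {
  const removed = [];
  for (const id of llm.models_available) {
    if (await llm.local_model_available(id)) {
      await llm.delete_model(id);
      removed.push(id);
    }
  }
  return removed;
}

// Usage:
// const removed = await deleteAllCachedModels(llm);
// console.log(`Deleted ${removed.length} cached model(s)`);
```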

📱 Hybrid Inference & Device Sensitivity

web-llm-runner uses a hybrid engine that automatically selects an inference backend based on what the device supports:

- WebGPU (primary): hardware-accelerated inference on desktop and mobile browsers that support WebGPU.
- ONNX/WASM (fallback): CPU-based inference on older devices and on browsers without WebGPU support (such as many current mobile browsers).

> [!NOTE]
> Streaming support: WebGPU streaming is stable, but ONNX-based streaming is still in active development. Performance and responsiveness may vary on low-resource mobile devices.
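If your app wants to warn users before they fall back to the slower CPU path, it can probe for WebGPU itself. The sketch below is a hypothetical helper that mirrors the selection rule described above; it is not the library's own detection code. It uses the standard WebGPU entry point `navigator.gpu.requestAdapter()`, which resolves to an adapter or `null`, and takes the navigator as a parameter so it can be tested outside a browser.

```javascript
// Hypothetical helper, not web-llm-runner's own logic: reports which backend a
// device is likely to get. WebGPU counts as available only if navigator.gpu
// exists and requestAdapter() yields an adapter; otherwise report "wasm".
async function detectBackend(nav = globalThis.navigator) {
  if (nav && nav.gpu) {
    try {
      const adapter = await nav.gpu.requestAdapter();
      if (adapter) return "webgpu";
    } catch {
      // Treat any error from requestAdapter the same as having no adapter.
    }
  }
  return "wasm";
}

// Usage:
// if ((await detectBackend()) === "wasm") showNotice("Running on CPU; replies may be slow.");
// (`showNotice` is a hypothetical UI function.)
```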

⚖ License

MIT
