@localmode/webllm
WebLLM provider for local-first LLM inference. Uses 4-bit quantized models for efficient browser-based text generation.
Installation
pnpm install @localmode/webllm @localmode/core
Quick Start
import { generateText } from '@localmode/core';
import { webllm } from '@localmode/webllm';
// Generate text
const { text, usage } = await generateText({
model: webllm.languageModel('Llama-3.2-1B-Instruct-q4f16_1-MLC'),
prompt: 'Explain quantum computing in simple terms.',
});
console.log(text);
console.log(`Generated in ${usage.durationMs}ms`);
Streaming
import { streamText } from '@localmode/core';
import { webllm } from '@localmode/webllm';
const result = await streamText({
model: webllm.languageModel('Llama-3.2-1B-Instruct-q4f16_1-MLC'),
prompt: 'Write a haiku about programming.',
});
for await (const chunk of result.stream) {
process.stdout.write(chunk.text);
}
Model Preloading
import { preloadModel, isModelCached, deleteModelCache } from '@localmode/webllm';
// Check if model is already cached
if (!(await isModelCached('Llama-3.2-1B-Instruct-q4f16_1-MLC'))) {
// Preload with progress
await preloadModel('Llama-3.2-1B-Instruct-q4f16_1-MLC', {
onProgress: (p) => console.log(`Loading: ${p.progress?.toFixed(1)}%`),
});
}
// Delete cached model
await deleteModelCache('Llama-3.2-1B-Instruct-q4f16_1-MLC');
Available Models
Tiny (< 500MB)
| Model | Size | Context | Description |
| ----- | ---- | ------- | ----------- |
| SmolLM2-135M-Instruct-q0f16-MLC | 78MB | 2K | Tiniest model, instant loading |
| SmolLM2-360M-Instruct-q4f16_1-MLC | 210MB | 2K | Very small, surprisingly capable |
| Qwen2.5-0.5B-Instruct-q4f16_1-MLC | 278MB | 4K | Tiny Qwen, great quality for size |
| Qwen3-0.6B-q4f16_1-MLC | 350MB | 4K | Latest tiny model |
| TinyLlama-1.1B-Chat-v1.0-q4f16_1-MLC | 400MB | 2K | Fast and capable chat |
Small (500MB – 1GB)
| Model | Size | Context | Description |
| ----- | ---- | ------- | ----------- |
| Llama-3.2-1B-Instruct-q4f16_1-MLC | 712MB | 4K | Great for simple tasks |
| Qwen2.5-1.5B-Instruct-q4f16_1-MLC | 868MB | 4K | Multilingual |
| Qwen2.5-Coder-1.5B-Instruct-q4f16_1-MLC | 868MB | 4K | Code-specialized |
Medium (1 – 2GB)
| Model | Size | Context | Description |
| ----- | ---- | ------- | ----------- |
| Qwen3-1.7B-q4f16_1-MLC | 1.1GB | 4K | Latest multilingual |
| SmolLM2-1.7B-Instruct-q4f16_1-MLC | 1GB | 2K | Best small model (requires shader-f16) |
| gemma-2-2b-it-q4f16_1-MLC | 1.44GB | 2K | Google Gemma 2 (requires shader-f16) |
| Qwen2.5-3B-Instruct-q4f16_1-MLC | 1.7GB | 4K | High quality |
| Qwen2.5-Coder-3B-Instruct-q4f16_1-MLC | 1.7GB | 4K | Mid-range code model |
| Llama-3.2-3B-Instruct-q4f16_1-MLC | 1.76GB | 4K | Excellent quality |
| Hermes-3-Llama-3.2-3B-q4f16_1-MLC | 1.76GB | 4K | Enhanced chat |
| Ministral-3-3B-Instruct-2512-BF16-q4f16_1-MLC | 1.8GB | 4K | Latest Mistral 3B architecture |
| Ministral-3-3B-Reasoning-2512-q4f16_1-MLC | 1.8GB | 4K | Reasoning-tuned 3B |
Large (> 2GB)
| Model | Size | Context | Description |
| ----- | ---- | ------- | ----------- |
| Phi-3.5-mini-instruct-q4f16_1-MLC | 2.1GB | 4K | Excellent reasoning |
| Phi-3-mini-4k-instruct-q4f16_1-MLC | 2.2GB | 4K | Reasoning and coding |
| Phi-3.5-vision-instruct-q4f16_1-MLC | 2.4GB | 1K | Vision — multimodal (text + images) |
| Qwen3-4B-q4f16_1-MLC | 2.2GB | 4K | Best quality in medium param range |
| Mistral-7B-Instruct-v0.3-q4f16_1-MLC | 4GB | 4K | Strong general-purpose |
| Qwen2.5-7B-Instruct-q4f16_1-MLC | 4GB | 4K | Excellent multilingual |
| Qwen2.5-Coder-7B-Instruct-q4f16_1-MLC | 4GB | 4K | Best-in-class code model |
| DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC | 4.18GB | 4K | Advanced reasoning |
| DeepSeek-R1-Distill-Llama-8B-q4f16_1-MLC | 4.41GB | 4K | Best reasoning |
| Llama-3.1-8B-Instruct-q4f16_1-MLC | 4.5GB | 4K | Strong general-purpose |
| Qwen3-8B-q4f16_1-MLC | 4.5GB | 4K | Highest quality multilingual |
| Hermes-3-Llama-3.1-8B-q4f16_1-MLC | 4.9GB | 4K | Hermes 3 8B, DPO-optimized chat |
| gemma-2-9b-it-q4f16_1-MLC | 5GB | 1K | Google Gemma 2 9B, highest quality |
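The catalog above spans roughly 78MB to 5GB, so it often makes sense to pick a model based on what the device can realistically download and hold in GPU memory. Below is a minimal sketch, assuming the getModelSize and isWebGPUAvailable utilities described later in this readme; the candidate list and the byte budget are illustrative choices, not part of the package.
import { getModelSize, isWebGPUAvailable } from '@localmode/webllm';
// Candidate models from the tables above, ordered by preference (illustrative order).
const candidates = [
  'Llama-3.2-3B-Instruct-q4f16_1-MLC',
  'Llama-3.2-1B-Instruct-q4f16_1-MLC',
  'Qwen2.5-0.5B-Instruct-q4f16_1-MLC',
];
// Pick the first candidate whose estimated download size fits the given budget.
async function pickModel(maxBytes: number): Promise<string | undefined> {
  if (!(await isWebGPUAvailable())) return undefined; // no WebGPU, no WebLLM
  return candidates.find((id) => getModelSize(id) <= maxBytes);
}
const modelId = await pickModel(1_000_000_000); // ~1GB budget, adjust per device
console.log(modelId ?? 'No suitable model for this device');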
Vision (Image Input)
Phi 3.5 Vision supports multimodal input — send images alongside text:
import { streamText } from '@localmode/core';
import { webllm } from '@localmode/webllm';
const model = webllm.languageModel('Phi-3.5-vision-instruct-q4f16_1-MLC');
console.log(model.supportsVision); // true
const result = await streamText({
model,
prompt: '',
messages: [{
role: 'user',
content: [
{ type: 'text', text: 'What is in this image?' },
{ type: 'image', data: base64Data, mimeType: 'image/jpeg' },
],
}],
});
Custom Configuration
import { createWebLLM } from '@localmode/webllm';
const myWebLLM = createWebLLM({
onProgress: (p) => updateLoadingBar(p.progress),
});
const model = myWebLLM.languageModel('Llama-3.2-1B-Instruct-q4f16_1-MLC', {
systemPrompt: 'You are a helpful coding assistant.',
temperature: 0.5,
maxTokens: 1024,
});
Default Settings
| Setting | Default | Description |
| ------- | ------- | ----------- |
| temperature | 0.7 | Sampling temperature |
| topP | 0.95 | Nucleus sampling threshold |
| maxTokens | 512 | Maximum tokens to generate |
| contextLength | 4096 | Context window size |
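These defaults apply when no options are passed. As a sketch of overriding them per model instance, assuming topP and contextLength are accepted alongside the temperature and maxTokens options shown in Custom Configuration above:
import { generateText } from '@localmode/core';
import { webllm } from '@localmode/webllm';
// Override the package defaults for one model instance.
const model = webllm.languageModel('Llama-3.2-1B-Instruct-q4f16_1-MLC', {
  temperature: 0.2,    // more deterministic than the 0.7 default
  topP: 0.9,           // assumption: nucleus sampling option mirroring the defaults table
  maxTokens: 256,      // shorter completions than the 512 default
  contextLength: 2048, // assumption: smaller context window to save memory
});
const { text } = await generateText({ model, prompt: 'Summarize WebGPU in two sentences.' });
console.log(text);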
Utilities
import {
preloadModel,
isModelCached,
deleteModelCache,
getModelSize,
isWebGPUAvailable,
} from '@localmode/webllm';
// Check WebGPU support
const gpuAvailable = await isWebGPUAvailable();
// Get estimated model size in bytes
const size = getModelSize('Llama-3.2-1B-Instruct-q4f16_1-MLC');
// Delete cached model data
await deleteModelCache('Llama-3.2-1B-Instruct-q4f16_1-MLC');
Requirements
- WebGPU support (Chrome 113+, Edge 113+)
- Sufficient GPU memory for the model
- Some models (SmolLM2-1.7B, Gemma 2 2B) require the shader-f16 WebGPU extension, which is not available on all devices (e.g., Qualcomm/Android). Use q4f32_1 variants as fallbacks for broader compatibility; a detection sketch follows this list.
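A minimal detection sketch using the standard WebGPU API; the q4f32_1 fallback model id is an assumed example, not a guarantee that this exact variant exists.
// Detect shader-f16 support and fall back to a q4f32_1 variant when it is missing.
async function pickSmolLM2(): Promise<string> {
  const adapter = await navigator.gpu?.requestAdapter();
  const hasF16 = adapter?.features.has('shader-f16') ?? false;
  return hasF16
    ? 'SmolLM2-1.7B-Instruct-q4f16_1-MLC'  // requires shader-f16
    : 'SmolLM2-1.7B-Instruct-q4f32_1-MLC'; // assumed f32-quantized fallback id
}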
Acknowledgments
This package is built on WebLLM by MLC AI — high-performance LLM inference in the browser with WebGPU.
