# @localmode/webllm
WebLLM provider for local-first LLM inference. Uses 4-bit quantized models for efficient browser-based text generation.
## Installation

```sh
pnpm install @localmode/webllm @localmode/core
```

## Quick Start
```ts
import { generateText } from '@localmode/core';
import { webllm } from '@localmode/webllm';

// Generate text
const { text, usage } = await generateText({
  model: webllm.languageModel('Llama-3.2-1B-Instruct-q4f16'),
  prompt: 'Explain quantum computing in simple terms.',
});

console.log(text);
console.log(`Generated in ${usage.durationMs}ms`);
```

## Streaming
```ts
import { streamText } from '@localmode/core';
import { webllm } from '@localmode/webllm';

const stream = await streamText({
  model: webllm.languageModel('Llama-3.2-1B-Instruct-q4f16'),
  prompt: 'Write a haiku about programming.',
});

for await (const { text } of stream) {
  process.stdout.write(text);
}
```

## Model Preloading
```ts
import { preloadModel, isModelCached } from '@localmode/webllm';

// Check if the model is already cached
if (!(await isModelCached('Llama-3.2-1B-Instruct-q4f16'))) {
  // Preload with progress reporting
  await preloadModel('Llama-3.2-1B-Instruct-q4f16', {
    onProgress: (p) => console.log(`Loading: ${p.progress?.toFixed(1)}%`),
  });
}
```

## Available Models
| Model | Size | Context | Best For |
| ----------------------------- | ----- | ------- | ------------------ |
| Llama-3.2-1B-Instruct-q4f16 | 700MB | 4K | Simple tasks, fast |
| Llama-3.2-3B-Instruct-q4f16 | 1.8GB | 4K | General purpose |
| Phi-3.5-mini-instruct-q4f16 | 2.4GB | 4K | Reasoning |
| Qwen2.5-1.5B-Instruct-q4f16 | 1GB | 4K | Multilingual |
| SmolLM2-1.7B-Instruct-q4f16 | 1.1GB | 2K | Compact, fast |
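
The table above can drive a simple runtime heuristic for choosing a model. A minimal sketch, assuming the host app supplies a memory budget (the `pickModel` helper and its thresholds are illustrative, not part of the package):

```typescript
// Model ids and approximate download sizes, taken from the table above,
// ordered largest-first so the best-fitting model wins.
const MODELS = [
  { id: 'Llama-3.2-3B-Instruct-q4f16', sizeMB: 1800 },
  { id: 'Qwen2.5-1.5B-Instruct-q4f16', sizeMB: 1000 },
  { id: 'Llama-3.2-1B-Instruct-q4f16', sizeMB: 700 },
];

// Pick the largest model whose weights fit the given budget; fall back to
// the smallest model when nothing fits. Browsers do not expose exact GPU
// memory, so the budget has to be an application-level estimate.
function pickModel(budgetMB: number): string {
  const fit = MODELS.find((m) => m.sizeMB <= budgetMB);
  return fit ? fit.id : MODELS[MODELS.length - 1].id;
}
```

The chosen id can then be passed straight to `webllm.languageModel(...)`.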
## Custom Configuration
```ts
import { createWebLLM } from '@localmode/webllm';

const myWebLLM = createWebLLM({
  onProgress: (p) => updateLoadingBar(p.progress),
});

const model = myWebLLM.languageModel('Llama-3.2-1B-Instruct-q4f16', {
  systemPrompt: 'You are a helpful coding assistant.',
  temperature: 0.5,
  maxTokens: 1024,
});
```

## Requirements
- WebGPU support (Chrome 113+, Edge 113+)
- Sufficient GPU memory for the model
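
Before loading a model, the WebGPU requirement can be checked at runtime with plain feature detection. A minimal sketch using the standard WebGPU API (`navigator.gpu` and `requestAdapter()`), not a `@localmode` helper:

```typescript
// Returns true only when the browser exposes WebGPU and can actually
// provide an adapter; requestAdapter() resolves to null otherwise.
async function hasWebGPU(): Promise<boolean> {
  const nav = (globalThis as any).navigator;
  if (!nav?.gpu) return false;
  const adapter = await nav.gpu.requestAdapter();
  return adapter !== null;
}
```

Calling this before `preloadModel` lets an app fall back to a server-side provider instead of failing mid-download.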
