browser-llm-engine
v0.1.3
A browser-friendly library for running LLM inference using Wllama with preset and dynamic model loading, caching, and download capabilities.
A browser-friendly library for running large language models (LLMs) directly in the browser using Wllama. This library provides a simple interface to load .gguf or .bin models (e.g., from Hugging Face) and generate text completions, including streaming token support.
Features
- Plug-and-Play: Easy to integrate into your web projects.
- Local or Remote Models: Load a URL from Hugging Face or pass local File objects.
- Token-by-Token Streaming: Handle partial results in real-time via the onNewToken callback.
- Templates: Leverages Jinja to format chat-based prompts.
- Lightweight: Bundles a minimal set of dependencies.
Installation
npm install browser-llm-engine
Or with Yarn:
yarn add browser-llm-engine
Usage
Quick Start
import { createLlmEngine, CHAT_ROLE, PRESET_MODELS } from 'browser-llm-engine';

(async () => {
  // 1) Create an engine instance
  const llm = createLlmEngine({
    // Optional: provide custom WASM paths or config
    wasmPaths: {}
  });

  // 2) Load a preset model from the library
  const modelUrl = PRESET_MODELS["SmolLM2 (360M)"].url;
  await llm.loadModel(modelUrl, {
    progressCallback: (progress) => console.log(`Loading: ${progress}%`),
  });

  // 3) Generate a completion
  const result = await llm.createCompletion("Hello from the browser!");
  console.log("Full model response:", result);

  // 4) Clean up
  await llm.exit();
})();
That’s it! You have a working LLM in the browser.
Streaming
To get partial tokens as they are generated, supply an onNewToken callback:
import { createLlmEngine, PRESET_MODELS } from 'browser-llm-engine';

// Top-level await requires an ES module context
const llm = createLlmEngine();
await llm.loadModel(PRESET_MODELS["SmolLM2 (360M)"].url);

let outputSoFar = "";
await llm.createCompletion("What's the weather today?", {
  nPredict: 128,
  sampling: { temp: 0.7, penalty_repeat: 1.1 },
  // Called once per generated token
  onNewToken: (token) => {
    outputSoFar += token;
    console.log("Streamed token:", token);
  }
});
console.log("Final streamed output:", outputSoFar);
Loading Local Files
If you want to load the model from your local machine:
<input type="file" id="modelFile" multiple />
<script type="module">
  import { createLlmEngine } from 'browser-llm-engine';

  const fileInput = document.getElementById("modelFile");
  const llm = createLlmEngine();

  fileInput.addEventListener("change", async () => {
    try {
      // fileInput.files is a FileList
      await llm.loadModel(fileInput.files);
      console.log("Model loaded locally!");
    } catch (error) {
      console.error("Failed to load local model:", error);
    }
  });
</script>
Preset Models
The library includes a models.json with references to a few hosted models. You can get them via:
import { PRESET_MODELS } from 'browser-llm-engine';
console.log("Available models:", PRESET_MODELS);
Feel free to add or remove entries if you fork this library.
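Because `PRESET_MODELS` is indexed by display name (as in the Quick Start), a mistyped name yields `undefined`, and the failure only surfaces when `.url` is read. The following is a minimal sketch of a lookup that fails early instead. `getPresetUrl` and the `examplePresets` stand-in are hypothetical and not part of the library; the only assumption about an entry's shape is the `url` field used in the Quick Start.

```javascript
// Hypothetical helper (not part of browser-llm-engine): resolve a preset name
// to its URL, failing early with a list of the valid names.
function getPresetUrl(presets, name) {
  const entry = presets[name];
  if (!entry || typeof entry.url !== "string") {
    const known = Object.keys(presets).join(", ");
    throw new Error(`Unknown preset "${name}". Available presets: ${known}`);
  }
  return entry.url;
}

// Stand-in data with the shape the Quick Start relies on (name -> { url }).
// The URL is a placeholder, not a real model location.
const examplePresets = {
  "SmolLM2 (360M)": { url: "https://example.com/smollm2-360m.gguf" },
};

console.log(getPresetUrl(examplePresets, "SmolLM2 (360M)"));
```

In a real page you would pass the imported `PRESET_MODELS` in place of `examplePresets`.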
API
createLlmEngine(config?)
Creates a new engine instance.
- Parameters:
  - config (Object) – Optional configuration, e.g. { wasmPaths: { ... } }.
loadModel(source, options?)
Loads the model from either a remote URL or local File objects.
- Parameters:
  - source (String | File[] | FileList) – The source of the model.
  - options (Object) – Additional load options:
    - progressCallback (function): (progress) => {} for tracking loading progress
    - useCache (Boolean): Cache the model for faster reloads
    - allowOffline (Boolean): If false, tries to fetch from network
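Since `loadModel` accepts any of these source types, one possible pattern is to try several sources in order, for example a primary URL followed by a mirror. The sketch below is hypothetical and not part of the library. It assumes that `loadModel` rejects on failure (as the try/catch in the local-file example suggests) and that an engine can be reused after a failed load, which this README does not confirm.

```javascript
// Hypothetical helper: try each source in order until one loads.
// Assumes engine.loadModel(source, options) rejects when loading fails.
async function loadFirstAvailable(engine, sources, options = {}) {
  const errors = [];
  for (const source of sources) {
    try {
      await engine.loadModel(source, options);
      return source; // report which source succeeded
    } catch (error) {
      errors.push(error); // remember the failure and try the next source
    }
  }
  throw new AggregateError(errors, "None of the model sources could be loaded");
}
```

A call might look like `await loadFirstAvailable(llm, [primaryUrl, mirrorUrl], { useCache: true })`, where `primaryUrl` and `mirrorUrl` are placeholders for your own model locations.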
formatChat(messages, useProvidedTemplate?)
Takes an array of messages (each with role and content) and formats them into a single prompt with Jinja.
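As a rough illustration of the message shape described above, here is a sketch that assembles such an array. `buildMessages` is a hypothetical helper, and the role strings `"system"`, `"user"`, and `"assistant"` are assumptions: the library exports `CHAT_ROLE`, whose members are not listed in this README, so prefer those constants if they differ.

```javascript
// Hypothetical helper: build a { role, content } message array.
// Role strings are assumed; check them against the library's CHAT_ROLE export.
function buildMessages(systemPrompt, turns) {
  const messages = [];
  if (systemPrompt) {
    messages.push({ role: "system", content: systemPrompt });
  }
  // Turns alternate between user and assistant, starting with the user.
  turns.forEach((content, index) => {
    messages.push({ role: index % 2 === 0 ? "user" : "assistant", content });
  });
  return messages;
}

const messages = buildMessages("You are a concise assistant.", [
  "What is a GGUF file?",
]);
console.log(messages);
```

The resulting array would then be passed to `llm.formatChat(messages)`. Whether `formatChat` returns the prompt directly or a promise is not stated here; using `await` on the result covers both cases.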
createCompletion(prompt, options?)
Generates a text completion for the given prompt.
- Parameters:
  - prompt (String) – The text to generate from.
  - options (Object) – Options that control generation:
    - nPredict (Number) – Maximum tokens to predict (default 512)
    - sampling (Object) – e.g. { temp: 0.7, penalty_repeat: 1.1 }
    - onNewToken (function) – A callback for streaming tokens
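When `onNewToken` drives a UI, updating the page on every token can be wasteful. The sketch below accumulates tokens and reports the text in batches. `createTokenCollector` and its `batchSize` parameter are hypothetical and not part of the library; the sketch assumes the callback receives each token as a string, as in the Streaming example.

```javascript
// Hypothetical helper: accumulate streamed tokens and batch UI updates.
function createTokenCollector(onUpdate, batchSize = 8) {
  let text = "";
  let pending = 0; // tokens received since the last update
  return {
    // Pass this function as the onNewToken option.
    onNewToken(token) {
      text += token;
      pending += 1;
      if (pending >= batchSize) {
        pending = 0;
        onUpdate(text);
      }
    },
    // Call once generation has finished to flush any remaining tokens.
    finish() {
      if (pending > 0) {
        pending = 0;
        onUpdate(text);
      }
      return text;
    },
  };
}
```

You would pass `collector.onNewToken` in the options of `createCompletion` and call `collector.finish()` after the call resolves. The methods rely on closure state rather than `this`, so they can be passed around detached.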
exit()
Cleans up resources used by Wllama.
- Example:
await llm.exit();
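If loading or generation throws, a plain `await llm.exit()` at the end of a script is never reached. One way to guarantee cleanup is a try/finally wrapper, sketched below. `withEngine` is a hypothetical helper and not part of the library; it assumes only that the engine exposes the `exit()` method described above.

```javascript
// Hypothetical helper: run a task with an engine and always call exit(),
// whether the task succeeds or throws.
async function withEngine(engine, task) {
  try {
    return await task(engine);
  } finally {
    await engine.exit();
  }
}
```

For example, `await withEngine(createLlmEngine(), async (llm) => { await llm.loadModel(url); return llm.createCompletion("Hello"); })`, where `url` stands for your model location.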
Local Development
If you want to develop locally:
- Clone the repo:
git clone https://github.com/you/browser-llm-engine.git
cd browser-llm-engine
- Install dependencies:
npm install
- Build the library:
npm run build
This will create dist/ with both ESM and CJS bundles.
- (Optional) Start a dev server (if you add a script in package.json):
npm run dev
- Open index.html (or any dev test page) in your browser to play around with the library.
License
This project is released under the MIT License. Feel free to fork, adapt, and contribute!
Happy coding and enjoy using your LLM in the browser!
