browser-llm-engine

v0.1.3, published a year ago

A browser-friendly library for running LLM inference using Wllama with preset and dynamic model loading, caching, and download capabilities.

Vulnerabilities: 0 High, 0 Medium, 0 Low

dunkerbunker

llm inference browser wasm wllama ai gguf

browser-llm-engine

A browser-friendly library for running large language models (LLMs) directly in the browser using Wllama. This library provides a simple interface to load .gguf or .bin models (e.g., from Hugging Face) and generate text completions, including streaming token support.

Features

Plug-and-Play: Easy to integrate into your web projects.
Local or Remote Models: Load a model from a URL (e.g., on Hugging Face) or pass local File objects.
Token-by-Token Streaming: Handle partial results in real time via the onNewToken callback.
Templates: Leverages Jinja to format chat-based prompts.
Lightweight: Bundles a minimal set of dependencies.

Installation

npm install browser-llm-engine

Or with Yarn:

yarn add browser-llm-engine

Usage

Quick Start

```
import { createLlmEngine, CHAT_ROLE, PRESET_MODELS } from 'browser-llm-engine';

(async () => {
  // 1) Create an engine instance
  const llm = createLlmEngine({
    // Optional: provide custom WASM paths or config
    wasmPaths: {}
  });

  // 2) Load a preset model from the library
  const modelUrl = PRESET_MODELS["SmolLM2 (360M)"].url;
  await llm.loadModel(modelUrl, {
    progressCallback: (progress) => console.log(`Loading: ${progress}%`),
  });

  // 3) Generate a completion
  const result = await llm.createCompletion("Hello from the browser!");
  console.log("Full model response:", result);

  // 4) Clean up
  await llm.exit();
})();
```

That’s it! You have a working LLM in the browser.

Streaming

To get partial tokens as they are generated, supply an onNewToken callback:

```
import { createLlmEngine, PRESET_MODELS } from 'browser-llm-engine';

// Top-level await requires an ES module context.
const llm = createLlmEngine();
await llm.loadModel(PRESET_MODELS["SmolLM2 (360M)"].url);

let outputSoFar = "";
await llm.createCompletion("What's the weather today?", {
  nPredict: 128,
  sampling: { temp: 0.7, penalty_repeat: 1.1 },
  // Called once per generated token
  onNewToken: (token) => {
    outputSoFar += token;
    console.log("Streamed token:", token);
  }
});

console.log("Final streamed output:", outputSoFar);
```
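
Updating the page on every token can be wasteful, so the callback can be wrapped in a small batching helper. The sketch below is illustrative only: createTokenBatcher is a hypothetical helper, not part of browser-llm-engine, and it assumes onNewToken receives each token as a string, as in the streaming example.

```javascript
// Hypothetical helper (not part of browser-llm-engine): collects streamed
// tokens and hands them to `sink` in batches of `batchSize` tokens.
function createTokenBatcher(sink, batchSize = 8) {
  let pending = [];
  let fullText = "";

  // Send any buffered tokens to the sink as one string.
  function flush() {
    if (pending.length === 0) return;
    sink(pending.join(""));
    pending = [];
  }

  // Pass this function as the onNewToken callback.
  function push(token) {
    fullText += token;
    pending.push(token);
    if (pending.length >= batchSize) flush();
  }

  return { push, flush, text: () => fullText };
}

// Possible usage (assumes an engine `llm` with a loaded model):
//   const batcher = createTokenBatcher((chunk) => console.log(chunk));
//   await llm.createCompletion(prompt, { onNewToken: batcher.push });
//   batcher.flush(); // emit whatever is left after generation ends
```

Calling flush() once after createCompletion resolves ensures the last partial batch is not lost.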

Loading Local Files

To load a model from the user's machine, pass the File objects from a file input to loadModel:

```
<input type="file" id="modelFile" multiple />
<script type="module">
  // Note: a bare specifier like 'browser-llm-engine' only resolves in the
  // browser through a bundler or an import map.
  import { createLlmEngine } from 'browser-llm-engine';

  const fileInput = document.getElementById("modelFile");
  const llm = createLlmEngine();

  fileInput.addEventListener("change", async () => {
    // Nothing to load if the selection is empty
    if (fileInput.files.length === 0) return;
    try {
      // fileInput.files is a FileList
      await llm.loadModel(fileInput.files);
      console.log("Model loaded locally!");
    } catch (error) {
      console.error("Failed to load local model:", error);
    }
  });
</script>
```
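
Because the file input accepts any file, it can help to filter the selection before handing it to loadModel. This is a minimal sketch: selectModelFiles and MODEL_EXTENSIONS are hypothetical names, and the extension list simply mirrors the .gguf and .bin formats mentioned above. The library itself may perform its own checks.

```javascript
// Hypothetical helper (not part of browser-llm-engine).
// Extensions taken from the formats this README mentions.
const MODEL_EXTENSIONS = [".gguf", ".bin"];

// Accepts a FileList or an array of objects with a `name` property and
// returns only the entries that look like model files.
function selectModelFiles(fileList) {
  const files = Array.from(fileList);
  const accepted = files.filter((file) =>
    MODEL_EXTENSIONS.some((ext) => file.name.toLowerCase().endsWith(ext))
  );
  if (accepted.length === 0) {
    throw new Error("No .gguf or .bin file selected");
  }
  return accepted;
}

// Possible usage inside the "change" handler:
//   await llm.loadModel(selectModelFiles(fileInput.files));
```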

Preset Models

The library includes a models.json with references to a few hosted models. You can access them via the PRESET_MODELS export:

```
import { PRESET_MODELS } from 'browser-llm-engine';

console.log("Available models:", PRESET_MODELS);
```

Feel free to add or remove entries if you fork this library.
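
A mistyped preset name would otherwise surface as a confusing "cannot read property url of undefined" error. The sketch below shows one way to guard the lookup. getPresetUrl is a hypothetical helper, and it assumes only what the Quick Start shows: that PRESET_MODELS is an object keyed by display name whose entries have a url field.

```javascript
// Hypothetical helper (not part of browser-llm-engine).
// `presets` is expected to look like { "Name": { url: "..." }, ... }.
function getPresetUrl(presets, name) {
  const entry = presets[name];
  if (!entry || typeof entry.url !== "string") {
    const known = Object.keys(presets).join(", ");
    throw new Error(`Unknown preset "${name}". Available: ${known}`);
  }
  return entry.url;
}

// Possible usage:
//   const url = getPresetUrl(PRESET_MODELS, "SmolLM2 (360M)");
//   await llm.loadModel(url);
```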

API

`createLlmEngine(config?)`

Creates a new engine instance.

Parameters:
- config (Object) – Optional configuration, e.g. { wasmPaths: { ... } }.

`loadModel(source, options?)`

Loads the model from either a remote URL or local File objects.

Parameters:
- source (String | File[] | FileList) – The source of the model.
- options (Object) – Additional load options:
  - progressCallback (function) – Called as (progress) => {} to report loading progress
  - useCache (Boolean) – Cache the model for faster reloads
  - allowOffline (Boolean) – If false, the loader tries to fetch the model from the network
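
The options above can be combined as in the following sketch. createProgressReporter and loadWithProgress are hypothetical helpers, and the code assumes progress is reported as a number between 0 and 100, as the `${progress}%` log line in the Quick Start suggests. The actual range is not documented here.

```javascript
// Hypothetical helper (not part of browser-llm-engine): reports progress
// only when the rounded percentage changes, to avoid flooding the UI.
function createProgressReporter(report) {
  let lastPercent = -1;
  return (progress) => {
    if (!Number.isFinite(progress)) return; // ignore unusable values
    // Assumption: progress is a percentage; clamp it to 0..100.
    const percent = Math.min(100, Math.max(0, Math.round(progress)));
    if (percent !== lastPercent) {
      lastPercent = percent;
      report(percent);
    }
  };
}

// Loads a model with caching enabled and throttled progress reporting.
// `llm` is an engine returned by createLlmEngine().
async function loadWithProgress(llm, source, report) {
  await llm.loadModel(source, {
    progressCallback: createProgressReporter(report),
    useCache: true,
    allowOffline: false,
  });
}

// Possible usage:
//   await loadWithProgress(llm, modelUrl, (p) => console.log(`Loading: ${p}%`));
```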

`formatChat(messages, useProvidedTemplate?)`

Takes an array of messages (each with a role and content) and formats them into a single prompt string using a Jinja template.
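
A chat turn might be assembled and sent as in the sketch below. buildMessages and askChat are hypothetical helpers. The role strings "system" and "user" are assumptions, since the values behind the exported CHAT_ROLE constant are not listed here, and the code awaits formatChat so that it works whether or not that method returns a promise.

```javascript
// Hypothetical helper (not part of browser-llm-engine): builds the
// messages array. Role strings are assumed; prefer CHAT_ROLE if its
// values match.
function buildMessages(systemPrompt, userText, history = []) {
  const messages = [];
  if (systemPrompt) {
    messages.push({ role: "system", content: systemPrompt });
  }
  messages.push(...history);
  messages.push({ role: "user", content: userText });
  return messages;
}

// Formats the conversation into a prompt, then requests a completion.
// `llm` is an engine with a loaded model.
async function askChat(llm, messages, options = {}) {
  const prompt = await llm.formatChat(messages);
  return llm.createCompletion(prompt, options);
}

// Possible usage:
//   const messages = buildMessages("You are concise.", "What is WASM?");
//   const reply = await askChat(llm, messages, { nPredict: 128 });
```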

`createCompletion(prompt, options?)`

Generates a text completion for the given prompt.

Parameters:
- prompt (String) – The text to generate from.
- options (Object) – Generation settings:
  - nPredict (Number) – Maximum tokens to predict (default 512)
  - sampling (Object) – e.g. { temp: 0.7, penalty_repeat: 1.1 }
  - onNewToken (function) – A callback for streaming tokens
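
To keep call sites tidy, the options object can be built and validated in one place. This is a sketch only: buildCompletionOptions is a hypothetical helper, and its camelCase penaltyRepeat argument is an invented convenience that maps to the penalty_repeat sampling key shown above. The 512 default mirrors the documented nPredict default.

```javascript
// Hypothetical helper (not part of browser-llm-engine): builds the
// options object for createCompletion with basic validation.
function buildCompletionOptions({ nPredict = 512, temp, penaltyRepeat, onNewToken } = {}) {
  if (!Number.isInteger(nPredict) || nPredict <= 0) {
    throw new RangeError("nPredict must be a positive integer");
  }
  const options = { nPredict };

  // Only include sampling keys the caller actually set.
  const sampling = {};
  if (temp !== undefined) sampling.temp = temp;
  if (penaltyRepeat !== undefined) sampling.penalty_repeat = penaltyRepeat;
  if (Object.keys(sampling).length > 0) options.sampling = sampling;

  if (typeof onNewToken === "function") options.onNewToken = onNewToken;
  return options;
}

// Possible usage:
//   const options = buildCompletionOptions({ nPredict: 128, temp: 0.7 });
//   const text = await llm.createCompletion("Hello!", options);
```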

`exit()`

Cleans up resources used by Wllama.

Example:
```
await llm.exit();
```
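
If loading or generation throws, exit() may never run. One way to guard against that is a try/finally wrapper, sketched below. withEngine is a hypothetical helper, and it assumes only that the engine exposes the exit() method described above.

```javascript
// Hypothetical helper (not part of browser-llm-engine): runs `task` with
// a fresh engine and always calls exit(), even if the task throws.
async function withEngine(createEngine, task) {
  const llm = createEngine();
  try {
    return await task(llm);
  } finally {
    await llm.exit();
  }
}

// Possible usage:
//   const text = await withEngine(createLlmEngine, async (llm) => {
//     await llm.loadModel(modelUrl);
//     return llm.createCompletion("Hello from the browser!");
//   });
```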

Local Development

To develop locally:

Clone the repo:
```
git clone https://github.com/you/browser-llm-engine.git
cd browser-llm-engine
```
Install dependencies:
```
npm install
```
Build the library:
```
npm run build
```
This will create dist/ with both ESM and CJS bundles.

(Optional) Start a dev server (this requires adding a dev script to package.json):
```
npm run dev
```
Open index.html (or any dev test page) in your browser to play around with the library.

License

This project is released under the MIT License. Feel free to fork, adapt, and contribute!

Happy coding and enjoy using your LLM in the browser!
