# @agentlayerzero/dendrite
WebGPU LLM runner — run AI chat personas entirely in the browser. Zero server required.
Part of AgentLayerZero.
## Install

```sh
npm install @agentlayerzero/dendrite
```

## Quick Start
```ts
import { createNeuron } from '@agentlayerzero/dendrite'

// That's it — no worker file, no extra imports, no setup
const neuron = createNeuron({
  modelId: 'Qwen2.5-1.5B-Instruct-q4f16_1-MLC',
  systemPrompt: 'You are a helpful cooking assistant.',
  onProgress: (percent, text) => console.log(`Loading: ${percent}%`),
})

// Stream tokens
for await (const token of neuron.send('How do I make pasta?')) {
  process.stdout.write(token)
}

// Or get the full response
const reply = await neuron.complete('What about carbonara?')
console.log(reply)
```

## Personality Docs
Shape the LLM's behavior with typed personality documents:
```ts
const neuron = createNeuron({
  modelId: 'Qwen2.5-1.5B-Instruct-q4f16_1-MLC',
  personalityDocs: [
    { type: 'zero-shot', content: 'You are a pirate captain named Blackbeard.' },
    { type: 'personality', content: 'Speak in pirate dialect. Say "arrr" frequently.' },
    { type: 'knowledge', content: 'You sailed the Caribbean from 1716 to 1718.' },
    { type: 'instructions', content: 'Never break character.' },
  ],
  temperature: 0.7,
})
```

Doc types:
- `zero-shot` — what the LLM IS (identity)
- `personality` — HOW it responds (tone, style)
- `knowledge` — WHAT it knows (facts, resume, docs)
- `instructions` — specific rules and constraints
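In app code it can be convenient to assemble the `personalityDocs` array from a plain profile object rather than writing it inline. A minimal sketch follows; the `PersonalityDoc` shape is taken from the examples above, but `buildPersonalityDocs` and the profile field names (`identity`, `tone`, `facts`, `rules`) are illustrative helpers, not part of the package API:

```typescript
// Sketch: building a personalityDocs array from a plain profile object.
// The doc shape matches the README's examples; the helper itself is
// illustrative, not exported by @agentlayerzero/dendrite.
type PersonalityDoc = {
  type: 'zero-shot' | 'personality' | 'knowledge' | 'instructions'
  content: string
}

function buildPersonalityDocs(profile: {
  identity: string      // → zero-shot
  tone?: string         // → personality
  facts?: string[]      // → knowledge
  rules?: string[]      // → instructions
}): PersonalityDoc[] {
  const docs: PersonalityDoc[] = [{ type: 'zero-shot', content: profile.identity }]
  if (profile.tone) docs.push({ type: 'personality', content: profile.tone })
  for (const fact of profile.facts ?? []) docs.push({ type: 'knowledge', content: fact })
  for (const rule of profile.rules ?? []) docs.push({ type: 'instructions', content: rule })
  return docs
}

const docs = buildPersonalityDocs({
  identity: 'You are a pirate captain named Blackbeard.',
  tone: 'Speak in pirate dialect.',
  rules: ['Never break character.'],
})
console.log(docs.length) // 3
```

The resulting array can be passed straight to `createNeuron({ personalityDocs: docs, ... })`.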
## API Connection (Optional)
Load persona config from an AgentLayerZero API instead of inline:
```ts
const neuron = createNeuron({
  modelId: 'Qwen2.5-1.5B-Instruct-q4f16_1-MLC',
  apiUrl: 'https://your-api.fly.dev',
  username: 'shyaboi',
  instanceSlug: 'career-coach',
})
```

## API
### `createNeuron(config): Neuron`
| Method | Description |
|--------|-------------|
| `send(message)` | Stream tokens as an async generator |
| `complete(message)` | Get the full response string |
| `stop()` | Stop generation mid-stream |
| `setModel(id)` | Switch to a different model |
| `setPersonalityDocs(docs)` | Update personality at runtime |
| `setSystemPrompt(prompt)` | Update system prompt at runtime |
| `getHistory()` | Get conversation history |
| `setHistory(messages)` | Restore conversation history |
| `clearHistory()` | Clear conversation history |
| `destroy()` | Clean up: terminate worker, release resources |
### Properties
| Property | Type | Description |
|----------|------|-------------|
| `isLoading` | `boolean` | Model is downloading/initializing |
| `isGenerating` | `boolean` | LLM is generating a response |
| `loadProgress` | `{ percent, text }` | Loading progress |
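`loadProgress` pairs naturally with `onProgress` for driving a loading UI. A small illustrative formatter, assuming only the `{ percent, text }` shape from the table above (`formatProgress` itself is not part of the package):

```typescript
// Sketch: turning loadProgress into a display string for a status bar.
// Only the { percent, text } shape is taken from the docs; the helper
// and its clamping behavior are illustrative.
function formatProgress(p: { percent: number; text: string }): string {
  // Guard against out-of-range or fractional percentages from callbacks
  const clamped = Math.max(0, Math.min(100, Math.round(p.percent)))
  return `${clamped}% (${p.text})`
}

console.log(formatProgress({ percent: 42.4, text: 'Fetching weights' }))
// "42% (Fetching weights)"
```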
## Utilities
```ts
import { checkWebGPU, deleteAllModelCaches, MODEL_OPTIONS } from '@agentlayerzero/dendrite'

// Check browser support
const { available, reason } = checkWebGPU()

// Clear corrupted model cache
await deleteAllModelCaches()

// List available models
MODEL_OPTIONS.forEach(m => console.log(`${m.label} (${m.vram})`))
```

## Web Worker (Optional)
By default, Dendrite runs on the main thread — zero setup. For better performance, offload inference to a separate thread by passing a worker:
```ts
// worker.ts
import { WebWorkerMLCEngineHandler } from '@mlc-ai/web-llm'

const handler = new WebWorkerMLCEngineHandler()
self.onmessage = (msg) => handler.onmessage(msg)
```

```ts
// main thread
const neuron = createNeuron({
  modelId: 'Qwen2.5-1.5B-Instruct-q4f16_1-MLC',
  worker: new Worker(new URL('./worker.ts', import.meta.url), { type: 'module' }),
})
```

## Requirements
- Browser with WebGPU support (Chrome 113+, Edge 113+, Safari 18+)
- ~0.5–5 GB storage for model weights (cached after first download)
## License
GPL-3.0
