# @agentlayerzero/dendrite
WebGPU LLM runner — run AI chat personas entirely in the browser. Zero server required.
Part of AgentLayerZero.
## Install

```sh
npm install @agentlayerzero/dendrite
```

## Quick Start
```ts
import { createNeuron } from '@agentlayerzero/dendrite'

// That's it — no worker file, no extra imports, no setup
const neuron = createNeuron({
  modelId: 'Qwen2.5-1.5B-Instruct-q4f16_1-MLC',
  systemPrompt: 'You are a helpful cooking assistant.',
  onProgress: (percent, text) => console.log(`Loading: ${percent}%`),
})

// Stream tokens
for await (const token of neuron.send('How do I make pasta?')) {
  process.stdout.write(token)
}

// Or get the full response
const reply = await neuron.complete('What about carbonara?')
console.log(reply)
```

## Personality Docs
Shape the LLM's behavior with typed personality documents:
```ts
const neuron = createNeuron({
  modelId: 'Qwen2.5-1.5B-Instruct-q4f16_1-MLC',
  personalityDocs: [
    { type: 'zero-shot', content: 'You are a pirate captain named Blackbeard.' },
    { type: 'personality', content: 'Speak in pirate dialect. Say "arrr" frequently.' },
    { type: 'knowledge', content: 'You sailed the Caribbean from 1716 to 1718.' },
    { type: 'instructions', content: 'Never break character.' },
  ],
  temperature: 0.7,
})
```

Doc types:
- `zero-shot` — what the LLM IS (identity)
- `personality` — HOW it responds (tone, style)
- `knowledge` — WHAT it knows (facts, resume, docs)
- `instructions` — specific rules and constraints
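In app code it can be convenient to assemble the `personalityDocs` array from a plain profile object rather than writing it inline. A minimal sketch follows; the `PersonalityDoc` shape is taken from the examples above, but `buildPersonalityDocs` and the profile field names (`identity`, `tone`, `facts`, `rules`) are illustrative helpers, not part of the package API:

```typescript
// Sketch: building a personalityDocs array from a plain profile object.
// The doc shape matches the README's examples; the helper itself is
// illustrative, not exported by @agentlayerzero/dendrite.
type PersonalityDoc = {
  type: 'zero-shot' | 'personality' | 'knowledge' | 'instructions'
  content: string
}

function buildPersonalityDocs(profile: {
  identity: string      // → zero-shot
  tone?: string         // → personality
  facts?: string[]      // → knowledge
  rules?: string[]      // → instructions
}): PersonalityDoc[] {
  const docs: PersonalityDoc[] = [{ type: 'zero-shot', content: profile.identity }]
  if (profile.tone) docs.push({ type: 'personality', content: profile.tone })
  for (const fact of profile.facts ?? []) docs.push({ type: 'knowledge', content: fact })
  for (const rule of profile.rules ?? []) docs.push({ type: 'instructions', content: rule })
  return docs
}

const docs = buildPersonalityDocs({
  identity: 'You are a pirate captain named Blackbeard.',
  tone: 'Speak in pirate dialect.',
  rules: ['Never break character.'],
})
console.log(docs.length) // 3
```

The resulting array can be passed straight to `createNeuron({ personalityDocs: docs, ... })`.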
## API Connection (Optional)
Load persona config from an AgentLayerZero API instead of inline:
```ts
const neuron = createNeuron({
  modelId: 'Qwen2.5-1.5B-Instruct-q4f16_1-MLC',
  apiUrl: 'https://your-api.fly.dev',
  username: 'shyaboi',
  instanceSlug: 'career-coach',
})
```

## API
### `createNeuron(config): Neuron`
| Method | Description |
|--------|-------------|
| `send(message)` | Stream tokens as an async generator |
| `complete(message)` | Get the full response string |
| `stop()` | Stop generation mid-stream |
| `setModel(id)` | Switch to a different model |
| `setPersonalityDocs(docs)` | Update personality at runtime |
| `setSystemPrompt(prompt)` | Update system prompt at runtime |
| `getHistory()` | Get conversation history |
| `setHistory(messages)` | Restore conversation history |
| `clearHistory()` | Clear conversation history |
| `destroy()` | Clean up: terminate worker, release resources |
### Properties
| Property | Type | Description |
|----------|------|-------------|
| `isLoading` | `boolean` | Model is downloading/initializing |
| `isGenerating` | `boolean` | LLM is generating a response |
| `loadProgress` | `{ percent, text }` | Loading progress |
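`loadProgress` pairs naturally with `onProgress` for driving a loading UI. A small illustrative formatter, assuming only the `{ percent, text }` shape from the table above (`formatProgress` itself is not part of the package):

```typescript
// Sketch: turning loadProgress into a display string for a status bar.
// Only the { percent, text } shape is taken from the docs; the helper
// and its clamping behavior are illustrative.
function formatProgress(p: { percent: number; text: string }): string {
  // Guard against out-of-range or fractional percentages from callbacks
  const clamped = Math.max(0, Math.min(100, Math.round(p.percent)))
  return `${clamped}% (${p.text})`
}

console.log(formatProgress({ percent: 42.4, text: 'Fetching weights' }))
// "42% (Fetching weights)"
```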
## Utilities
```ts
import { checkWebGPU, deleteAllModelCaches, MODEL_OPTIONS } from '@agentlayerzero/dendrite'

// Check browser support
const { available, reason } = checkWebGPU()

// Clear corrupted model cache
await deleteAllModelCaches()

// List available models
MODEL_OPTIONS.forEach(m => console.log(`${m.label} (${m.vram})`))
```

## Web Worker (Optional)
By default, Dendrite runs on the main thread — zero setup. For better performance, offload inference to a separate thread by passing a worker:
```ts
// worker.ts
import { WebWorkerMLCEngineHandler } from '@mlc-ai/web-llm'

const handler = new WebWorkerMLCEngineHandler()
self.onmessage = (msg) => handler.onmessage(msg)
```

```ts
// main thread
const neuron = createNeuron({
  modelId: 'Qwen2.5-1.5B-Instruct-q4f16_1-MLC',
  worker: new Worker(new URL('./worker.ts', import.meta.url), { type: 'module' }),
})
```

## Requirements
- Browser with WebGPU support (Chrome 113+, Edge 113+, Safari 18+)
- ~0.5–5 GB storage for model weights (cached after first download)
## License
GPL-3.0
