@aivue/browser-llm
🚀 High-Performance In-Browser LLM Inference for Vue.js
Run powerful AI models completely locally in your browser with zero API costs and 100% privacy!
Demo | Documentation | Report Bug
✨ Features
🔒 100% Privacy - All processing happens locally, no data sent to servers
💰 Zero Cost - No API keys needed, no usage fees
⚡ WebGPU Accelerated - Hardware-accelerated inference using GPU
📦 Model Caching - Models cached in browser for offline use
🔄 OpenAI-Compatible API - Same interface as existing chatbot packages
🎯 Multiple Models - Support for Llama, Phi, Gemma, Mistral, Qwen, and more
📱 Progressive Loading - Show download progress for models
🔌 Drop-in Replacement - Works with existing @aivue/chatbot components
🌐 Vue 2 & 3 Compatible - Works with both Vue 2.6+ and Vue 3.x
📦 Installation
npm install @aivue/browser-llm
# or
yarn add @aivue/browser-llm
# or
pnpm add @aivue/browser-llm
🚀 Quick Start
Basic Usage
<template>
<div>
<!-- Model Selector -->
<ModelSelector
:available-models="availableModels"
:current-model="currentModel"
:is-loading="isLoading"
:download-progress="downloadProgress"
:error="error"
:is-web-g-p-u-supported="isWebGPUSupported"
@load-model="loadModel"
/>
<!-- Chat Interface -->
<BrowserLLMChat
:messages="messages"
:is-model-loaded="isModelLoaded"
:is-loading="isLoading"
:current-model="currentModel"
:error="error"
:tokens-per-second="tokensPerSecond"
@send-message="sendMessage"
@stream-message="handleStreamMessage"
@clear="clearMessages"
/>
</div>
</template>
<script setup>
import { useBrowserLLM, ModelSelector, BrowserLLMChat } from '@aivue/browser-llm';
const {
availableModels,
selectedModel,
loadModel,
messages,
sendMessage,
streamMessage,
clearMessages,
isModelLoaded,
isLoading,
downloadProgress,
error,
isWebGPUSupported,
tokensPerSecond,
currentModel,
} = useBrowserLLM({
defaultModel: 'Llama-3.2-1B-Instruct-q4f16_1-MLC',
temperature: 0.7,
systemPrompt: 'You are a helpful AI assistant.',
});
const handleStreamMessage = async (content) => {
for await (const chunk of streamMessage(content)) {
// Handle streaming chunks if needed
}
};
</script>
📚 Available Models
| Model | Size | Speed | Use Case |
|-------|------|-------|----------|
| Llama 3.2 1B | ~800MB | ⚡⚡⚡ | Fast chat, mobile-friendly |
| Llama 3.2 3B | ~2GB | ⚡⚡ | Balanced performance |
| Llama 3.1 8B | ~5GB | ⚡ | High quality responses |
| Phi 3.5 Mini | ~2.5GB | ⚡⚡ | Excellent for reasoning |
| Gemma 2 2B | ~1.5GB | ⚡⚡⚡ | Google's efficient model |
| Mistral 7B | ~4GB | ⚡ | High quality general purpose |
| Qwen 2.5 1.5B | ~1GB | ⚡⚡⚡ | Fast multilingual |
| Qwen 2.5 7B | ~4.5GB | ⚡ | Excellent multilingual |
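The same catalogue is exposed at runtime through availableModels, so you can pick a model programmatically before loading it. A minimal sketch (the exact fields on each ModelInfo entry are assumptions here; the id format follows the Quick Start example):
<script setup>
import { useBrowserLLM } from '@aivue/browser-llm';
const { availableModels, loadModel } = useBrowserLLM();
// Prefer a small (1B) model on constrained devices; fall back to the first entry.
// Matching on the model id string is an illustration, not a guaranteed API.
const modelIds = availableModels.value.map((m) => m.id);
const smallModel = modelIds.find((id) => id.includes('1B')) ?? modelIds[0];
await loadModel(smallModel);
</script>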
🎯 API Reference
useBrowserLLM(options)
Main composable for browser LLM functionality.
Options
interface BrowserLLMOptions {
defaultModel?: string; // Default model to load
temperature?: number; // 0.0 - 2.0 (default: 0.7)
topP?: number; // 0.0 - 1.0 (default: 0.9)
maxTokens?: number; // Max tokens to generate (default: 2048)
systemPrompt?: string; // System prompt for the model
}
Returns
interface UseBrowserLLMReturn {
// Model Management
availableModels: Ref<ModelInfo[]>;
selectedModel: Ref<string>;
loadModel: (modelId: string) => Promise<void>;
unloadModel: () => Promise<void>;
isModelLoaded: Ref<boolean>;
// Chat
messages: Ref<BrowserLLMMessage[]>;
sendMessage: (content: string, options?: ChatOptions) => Promise<string>;
streamMessage: (content: string, options?: ChatOptions) => AsyncGenerator<string>;
clearMessages: () => void;
// Progress & Status
downloadProgress: Ref<ModelDownloadProgress | null>;
isLoading: Ref<boolean>;
error: Ref<Error | null>;
isWebGPUSupported: Ref<boolean>;
// Performance
tokensPerSecond: Ref<number>;
generationStats: Ref<GenerationStats | null>;
}
🔧 Advanced Usage
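Switching Models
The composable also returns unloadModel, so you can free the current model before loading a different one. A sketch, assuming unloadModel releases the loaded weights before the next load begins (the second model id is illustrative):
<script setup>
import { useBrowserLLM } from '@aivue/browser-llm';
const { loadModel, unloadModel, isModelLoaded } = useBrowserLLM();
// Release the current model first, then load the replacement.
async function switchModel(modelId) {
  if (isModelLoaded.value) {
    await unloadModel();
  }
  await loadModel(modelId);
}
await switchModel('Llama-3.2-3B-Instruct-q4f16_1-MLC');
</script>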
Using Composable Only
<script setup>
import { useBrowserLLM } from '@aivue/browser-llm';
const { loadModel, sendMessage, messages } = useBrowserLLM();
// Load a model
await loadModel('Llama-3.2-1B-Instruct-q4f16_1-MLC');
// Send a message
const response = await sendMessage('Hello, how are you?');
console.log(response);
</script>
Streaming Responses
<script setup>
import { useBrowserLLM } from '@aivue/browser-llm';
const { loadModel, streamMessage } = useBrowserLLM();
await loadModel('Llama-3.2-1B-Instruct-q4f16_1-MLC');
// Stream a response
for await (const chunk of streamMessage('Tell me a story')) {
console.log(chunk); // Print each chunk as it arrives
}
</script>
Custom Chat Options
<script setup>
import { useBrowserLLM } from '@aivue/browser-llm';
const { sendMessage } = useBrowserLLM();
// Send with custom options
const response = await sendMessage('Explain quantum physics', {
temperature: 0.3, // More focused
max_tokens: 500, // Limit response length
top_p: 0.95,
});
</script>
Checking WebGPU Support
<script setup>
import { checkWebGPUSupport, getWebGPUErrorMessage } from '@aivue/browser-llm';
const gpuInfo = await checkWebGPUSupport();
if (!gpuInfo.supported) {
console.error(getWebGPUErrorMessage());
} else {
console.log('WebGPU supported!', gpuInfo);
}
</script>
🌐 Browser Requirements
- WebGPU Support: Chrome 113+, Edge 113+, or WebGPU-enabled browser
- RAM: 4GB+ recommended (8GB+ for larger models)
- Storage: 2-8GB for model caching
- Internet: Required for initial model download (then works offline)
Browser Compatibility
| Browser | Version | Status |
|---------|---------|--------|
| Chrome | 113+ | ✅ Fully Supported |
| Edge | 113+ | ✅ Fully Supported |
| Firefox | Nightly (with flag) | ⚠️ Experimental |
| Safari | Not yet | ❌ Not Supported |
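Since support is still uneven, it helps to guard the UI at runtime. A minimal sketch that renders the chat only when isWebGPUSupported is true and otherwise shows the message from getWebGPUErrorMessage (only a subset of the chat props is bound here for brevity):
<template>
  <!-- Show the in-browser chat only when WebGPU is available -->
  <BrowserLLMChat
    v-if="isWebGPUSupported"
    :messages="messages"
    :is-model-loaded="isModelLoaded"
    :is-loading="isLoading"
    @send-message="sendMessage"
  />
  <!-- Otherwise explain why it is unavailable -->
  <p v-else>{{ fallbackMessage }}</p>
</template>
<script setup>
import { useBrowserLLM, BrowserLLMChat, getWebGPUErrorMessage } from '@aivue/browser-llm';
const { isWebGPUSupported, messages, isModelLoaded, isLoading, sendMessage } = useBrowserLLM();
const fallbackMessage = getWebGPUErrorMessage();
</script>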
💡 Use Cases
🔒 Privacy-First Applications
- Medical/Healthcare apps with sensitive data
- Legal document analysis
- Personal journaling with AI assistance
💰 Cost-Effective Solutions
- Educational platforms
- Prototyping and development
- High-volume applications
📱 Offline-Capable Apps
- Field work applications
- Remote area tools
- Airplane mode functionality
🎓 Learning & Education
- AI/ML education without API costs
- Student projects
- Research and experimentation
🎨 Components
<BrowserLLMChat>
Full-featured chat interface component.
Props:
- title (string): Chat window title
- placeholder (string): Input placeholder text
- messages (BrowserLLMMessage[]): Chat messages
- isModelLoaded (boolean): Whether model is loaded
- isLoading (boolean): Loading state
- currentModel (string | null): Current model ID
- error (Error | null): Error state
- tokensPerSecond (number): Performance metric
- useStreaming (boolean): Enable streaming (default: true)
Events:
- @send-message: Emitted when user sends a message
- @stream-message: Emitted when streaming a message
- @clear: Emitted when clearing chat
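If you prefer complete responses over token-by-token output, disable streaming and handle only @send-message. A sketch, assuming the event payload is the user's message text as in the Quick Start example:
<template>
  <BrowserLLMChat
    :messages="messages"
    :is-model-loaded="isModelLoaded"
    :is-loading="isLoading"
    :use-streaming="false"
    @send-message="sendMessage"
    @clear="clearMessages"
  />
</template>
<script setup>
import { useBrowserLLM, BrowserLLMChat } from '@aivue/browser-llm';
const { messages, isModelLoaded, isLoading, sendMessage, clearMessages } = useBrowserLLM();
</script>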
<ModelSelector>
Model selection and loading component.
Props:
- availableModels (ModelInfo[]): Available models
- currentModel (string | null): Currently loaded model
- isLoading (boolean): Loading state
- downloadProgress (ModelDownloadProgress | null): Download progress
- error (Error | null): Error state
- isWebGPUSupported (boolean): WebGPU support status
Events:
- @load-model: Emitted when loading a model
- @select-model: Emitted when selecting a model
🔍 Troubleshooting
WebGPU Not Supported
Problem: Browser doesn't support WebGPU
Solution:
- Update to Chrome 113+ or Edge 113+
- Enable WebGPU in browser flags (chrome://flags)
- Check GPU drivers are up to date
Model Download Fails
Problem: Model fails to download
Solution:
- Check internet connection
- Clear browser cache
- Try a smaller model first
- Check available disk space
Out of Memory
Problem: Browser runs out of memory
Solution:
- Use a smaller model (1B-3B parameters)
- Close other browser tabs
- Increase system RAM if possible
- Use low-resource models
Slow Performance
Problem: Model runs slowly
Solution:
- Ensure WebGPU is enabled (not CPU fallback)
- Close background applications
- Try a smaller model
- Check GPU is being utilized
📊 Performance Tips
- Start Small: Begin with 1B-3B parameter models
- Cache Models: Models are cached after first download
- Use Streaming: Better UX with streaming responses
- Monitor Performance: Check the tokensPerSecond metric (see the sketch after this list)
- Optimize Prompts: Shorter prompts = faster responses
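For the monitoring tip above, a small sketch that watches tokensPerSecond and warns on slow generations (the 5 tokens/s threshold is arbitrary, and the contents of generationStats depend on the package's GenerationStats type):
<script setup>
import { watch } from 'vue';
import { useBrowserLLM } from '@aivue/browser-llm';
const { tokensPerSecond, generationStats } = useBrowserLLM();
// Warn when throughput drops, e.g. to suggest switching to a smaller model.
watch(tokensPerSecond, (tps) => {
  if (tps > 0 && tps < 5) {
    console.warn(`Slow generation: ${tps.toFixed(1)} tokens/s`, generationStats.value);
  }
});
</script>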
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
📄 License
MIT © reachbrt
Made with ❤️ by reachbrt
If you find this package useful, please consider giving it a ⭐ on GitHub!
