voxglide
v1.1.1
Embeddable voice AI SDK for web pages — form filling, navigation, Q&A via speech recognition and server proxy
Features
- Voice & text input — Browser Speech API with automatic text fallback
- Form filling — Detects fields, fills values, triggers React/Vue/Angular change detection
- Smart page scanning — Auto-discovers forms, headings, navigation, interactive elements
- Multi-LLM support — Gemini, OpenAI, Anthropic, Ollama (any OpenAI-compatible API)
- Themeable UI — Presets, sizes, full color control
- Conversation workflows — Guided multi-step flows with validation
- Accessibility — ARIA live regions, keyboard shortcuts, screen reader tools
- Zero dependencies — Self-contained SDK with Shadow DOM isolation
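Triggering React/Vue/Angular change detection (the form-filling point above) relies on a well-known DOM trick: set the value through the prototype's native setter, then dispatch `input` and `change` events so framework listeners fire. A minimal sketch of that technique — illustrative only, not voxglide's actual implementation:

```javascript
// Sketch: fill an input so that framework change detection notices.
// Setting el.value directly can be swallowed by framework-patched
// setters; calling the prototype's native setter and then dispatching
// bubbling events makes React/Vue/Angular see the new value.
function setNativeValue(el, value) {
  const proto = Object.getPrototypeOf(el);
  const desc = Object.getOwnPropertyDescriptor(proto, 'value');
  if (desc && desc.set) {
    desc.set.call(el, value); // bypass any instance-level patched setter
  } else {
    el.value = value;
  }
  el.dispatchEvent(new Event('input', { bubbles: true }));
  el.dispatchEvent(new Event('change', { bubbles: true }));
}
```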
Architecture
Browser (SDK)                       Server (proxy)
─────────────                       ──────────────
SpeechRecognition → text  ──WS──→   Receives text
Execute DOM actions       ←──WS──   LLM tool calls
SpeechSynthesis (TTS)               LLM API (holds key)
Page context scanning               Session/history mgmt
Shadow DOM UI                       Context caching

Quick Start
1. Start the server
The server is a thin proxy that holds your API key — the key never reaches the browser.
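Conceptually, the proxy injects the key server-side so the payload sent from the browser never contains it. A hypothetical sketch of that step (illustrative names, not the actual server code):

```javascript
// Sketch: build the upstream LLM request on the server, pulling the
// API key from environment variables. The browser only ever sends
// plain text over the WebSocket; the key stays in this process.
function buildUpstreamRequest(provider, userText) {
  const keys = {
    gemini: process.env.GEMINI_API_KEY,
    openai: process.env.OPENAI_API_KEY,
    anthropic: process.env.ANTHROPIC_API_KEY,
  };
  const key = keys[provider];
  if (provider !== 'ollama' && !key) {
    throw new Error(`Missing API key for provider: ${provider}`);
  }
  return {
    headers: key ? { Authorization: `Bearer ${key}` } : {},
    body: JSON.stringify({ messages: [{ role: 'user', content: userText }] }),
  };
}
```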
cd server && npm install
# Gemini (default)
GEMINI_API_KEY=your-key npm run dev

# OpenAI / GPT
OPENAI_API_KEY=your-key LLM_PROVIDER=openai npm run dev
# Anthropic / Claude
ANTHROPIC_API_KEY=your-key LLM_PROVIDER=anthropic npm run dev
# Ollama (local, no key needed)
LLM_PROVIDER=ollama npm run dev

2. Add the SDK
Script tag (IIFE):
<script src="https://your-server.com/sdk/voice-sdk.iife.js"></script>
<script>
const sdk = new VoxGlide.VoiceSDK({
serverUrl: 'wss://your-server.com',
});
</script>

Or as an ES module:

import { VoiceSDK } from 'voxglide';
const sdk = new VoiceSDK({
serverUrl: 'wss://your-server.com',
autoContext: true,
tts: true,
});

That's it. The SDK auto-discovers forms and interactive elements on the page.
Configuration
const sdk = new VoiceSDK({
serverUrl: 'wss://your-server.com', // Required
autoContext: true, // Auto-scan DOM for context
context: 'This is a checkout page', // Developer-supplied context
language: 'en-US', // Speech recognition language
tts: true, // Enable browser text-to-speech
ui: { theme: 'ocean', size: 'md' }, // UI theming
debug: false, // Verbose logging
autoReconnect: true, // Reconnect after navigation
});

See docs/configuration.md for the full configuration reference.
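Since `serverUrl` is the only required option, an SDK like this typically merges user options over defaults. A hypothetical sketch of that resolution step — the actual default values live in docs/configuration.md and may differ:

```javascript
// Sketch: merge user options over assumed defaults, with a shallow
// nested merge for the `ui` object and a check for the required field.
// Default values here are illustrative, not voxglide's documented ones.
const DEFAULTS = {
  autoContext: true,
  language: 'en-US',
  tts: false,
  debug: false,
  autoReconnect: true,
  ui: { theme: 'default', size: 'md' },
};

function resolveConfig(options) {
  if (!options || typeof options.serverUrl !== 'string') {
    throw new Error('serverUrl is required');
  }
  return { ...DEFAULTS, ...options, ui: { ...DEFAULTS.ui, ...options.ui } };
}
```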
Custom Tools
Pages can expose tools via window.nbt_functions — the SDK auto-discovers them:
<script>
window.nbt_functions = {
lookupOrder: {
description: 'Look up an order by ID',
parameters: {
orderId: { type: 'string', description: 'The order ID', required: true },
},
handler: async (args) => {
return await fetch(`/api/orders/${args.orderId}`).then(r => r.json());
},
},
};
</script>

You can also register tools via SDK config or at runtime. See the custom tools docs.
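Under the hood, an entry like `lookupOrder` has to become a tool declaration the LLM API understands — most providers expect a JSON-schema-style shape. A sketch of that conversion (hypothetical helper; voxglide's internal format may differ):

```javascript
// Sketch: normalize one window.nbt_functions entry into a
// JSON-schema-style tool declaration, collecting required params.
function toToolSchema(name, def) {
  const properties = {};
  const required = [];
  for (const [param, spec] of Object.entries(def.parameters ?? {})) {
    properties[param] = { type: spec.type, description: spec.description };
    if (spec.required) required.push(param);
  }
  return {
    name,
    description: def.description,
    parameters: { type: 'object', properties, required },
  };
}
```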
Documentation
| Topic | Link |
|-------|------|
| Configuration & theming | docs/configuration.md |
| Custom tools | docs/custom-tools.md |
| Conversation workflows | docs/workflows.md |
| Events reference | docs/events.md |
| Server setup | docs/server.md |
| Architecture overview | docs/architecture.md |
Examples
The examples/ directory contains demo pages:
- basic.html — Minimal integration
- form-filling.html — Form auto-fill demo
- custom-actions.html — Custom tool registration
Contributing
We welcome contributions! See CONTRIBUTING.md for setup instructions, code style, and PR guidelines.
git clone https://github.com/billiax/voxglide.git
cd voxglide && npm install
npm run check # typecheck + lint + test