voxglide
v1.1.1
Embeddable voice AI SDK for web pages — form filling, navigation, Q&A via speech recognition and server proxy
Features
- Voice & text input — Browser Speech API with automatic text fallback
- Form filling — Detects fields, fills values, triggers React/Vue/Angular change detection
- Smart page scanning — Auto-discovers forms, headings, navigation, interactive elements
- Multi-LLM support — Gemini, OpenAI, Anthropic, Ollama (any OpenAI-compatible API)
- Themeable UI — Presets, sizes, full color control
- Conversation workflows — Guided multi-step flows with validation
- Accessibility — ARIA live regions, keyboard shortcuts, screen reader tools
- Zero dependencies — Self-contained SDK with Shadow DOM isolation
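Triggering React/Vue/Angular change detection (the form-filling point above) relies on a well-known DOM trick: set the value through the prototype's native setter, then dispatch `input` and `change` events so framework listeners fire. A minimal sketch of that technique — illustrative only, not voxglide's actual implementation:

```javascript
// Sketch: fill an input so that framework change detection notices.
// Setting el.value directly can be swallowed by framework-patched
// setters; calling the prototype's native setter and then dispatching
// bubbling events makes React/Vue/Angular see the new value.
function setNativeValue(el, value) {
  const proto = Object.getPrototypeOf(el);
  const desc = Object.getOwnPropertyDescriptor(proto, 'value');
  if (desc && desc.set) {
    desc.set.call(el, value); // bypass any instance-level patched setter
  } else {
    el.value = value;
  }
  el.dispatchEvent(new Event('input', { bubbles: true }));
  el.dispatchEvent(new Event('change', { bubbles: true }));
}
```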
Architecture
Browser (SDK)                       Server (proxy)
─────────────                       ──────────────
SpeechRecognition → text  ──WS──→   Receives text
Execute DOM actions       ←──WS──   LLM tool calls
SpeechSynthesis (TTS)               LLM API (holds key)
Page context scanning               Session/history mgmt
Shadow DOM UI                       Context caching

Quick Start
1. Start the server
The server is a thin proxy that holds your API key — the key never reaches the browser.
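Conceptually, the proxy injects the key server-side so the payload sent from the browser never contains it. A hypothetical sketch of that step (illustrative names, not the actual server code):

```javascript
// Sketch: build the upstream LLM request on the server, pulling the
// API key from environment variables. The browser only ever sends
// plain text over the WebSocket; the key stays in this process.
function buildUpstreamRequest(provider, userText) {
  const keys = {
    gemini: process.env.GEMINI_API_KEY,
    openai: process.env.OPENAI_API_KEY,
    anthropic: process.env.ANTHROPIC_API_KEY,
  };
  const key = keys[provider];
  if (provider !== 'ollama' && !key) {
    throw new Error(`Missing API key for provider: ${provider}`);
  }
  return {
    headers: key ? { Authorization: `Bearer ${key}` } : {},
    body: JSON.stringify({ messages: [{ role: 'user', content: userText }] }),
  };
}
```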
cd server && npm install
# Gemini (default)
GEMINI_API_KEY=your-key npm run dev

# OpenAI / GPT
OPENAI_API_KEY=your-key LLM_PROVIDER=openai npm run dev
# Anthropic / Claude
ANTHROPIC_API_KEY=your-key LLM_PROVIDER=anthropic npm run dev
# Ollama (local, no key needed)
LLM_PROVIDER=ollama npm run dev

2. Add the SDK
Script tag (IIFE):
<script src="https://your-server.com/sdk/voice-sdk.iife.js"></script>
<script>
const sdk = new VoxGlide.VoiceSDK({
serverUrl: 'wss://your-server.com',
});
</script>

Or as an ES module:

import { VoiceSDK } from 'voxglide';
const sdk = new VoiceSDK({
serverUrl: 'wss://your-server.com',
autoContext: true,
tts: true,
});

That's it. The SDK auto-discovers forms and interactive elements on the page.
Configuration
const sdk = new VoiceSDK({
serverUrl: 'wss://your-server.com', // Required
autoContext: true, // Auto-scan DOM for context
context: 'This is a checkout page', // Developer-supplied context
language: 'en-US', // Speech recognition language
tts: true, // Enable browser text-to-speech
ui: { theme: 'ocean', size: 'md' }, // UI theming
debug: false, // Verbose logging
autoReconnect: true, // Reconnect after navigation
});

See docs/configuration.md for the full configuration reference.
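Since `serverUrl` is the only required option, an SDK like this typically merges user options over defaults. A hypothetical sketch of that resolution step — the actual default values live in docs/configuration.md and may differ:

```javascript
// Sketch: merge user options over assumed defaults, with a shallow
// nested merge for the `ui` object and a check for the required field.
// Default values here are illustrative, not voxglide's documented ones.
const DEFAULTS = {
  autoContext: true,
  language: 'en-US',
  tts: false,
  debug: false,
  autoReconnect: true,
  ui: { theme: 'default', size: 'md' },
};

function resolveConfig(options) {
  if (!options || typeof options.serverUrl !== 'string') {
    throw new Error('serverUrl is required');
  }
  return { ...DEFAULTS, ...options, ui: { ...DEFAULTS.ui, ...options.ui } };
}
```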
Custom Tools
Pages can expose tools via window.nbt_functions — the SDK auto-discovers them:
<script>
window.nbt_functions = {
lookupOrder: {
description: 'Look up an order by ID',
parameters: {
orderId: { type: 'string', description: 'The order ID', required: true },
},
handler: async (args) => {
return await fetch(`/api/orders/${args.orderId}`).then(r => r.json());
},
},
};
</script>

You can also register tools via SDK config or at runtime. See the custom tools docs.
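Under the hood, an entry like `lookupOrder` has to become a tool declaration the LLM API understands — most providers expect a JSON-schema-style shape. A sketch of that conversion (hypothetical helper; voxglide's internal format may differ):

```javascript
// Sketch: normalize one window.nbt_functions entry into a
// JSON-schema-style tool declaration, collecting required params.
function toToolSchema(name, def) {
  const properties = {};
  const required = [];
  for (const [param, spec] of Object.entries(def.parameters ?? {})) {
    properties[param] = { type: spec.type, description: spec.description };
    if (spec.required) required.push(param);
  }
  return {
    name,
    description: def.description,
    parameters: { type: 'object', properties, required },
  };
}
```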
Documentation
| Topic | Link |
|-------|------|
| Configuration & theming | docs/configuration.md |
| Custom tools | docs/custom-tools.md |
| Conversation workflows | docs/workflows.md |
| Events reference | docs/events.md |
| Server setup | docs/server.md |
| Architecture overview | docs/architecture.md |
Examples
The examples/ directory contains demo pages:
- basic.html — Minimal integration
- form-filling.html — Form auto-fill demo
- custom-actions.html — Custom tool registration
Contributing
We welcome contributions! See CONTRIBUTING.md for setup instructions, code style, and PR guidelines.
git clone https://github.com/billiax/voxglide.git
cd voxglide && npm install
npm run check # typecheck + lint + test