audio-forms
v0.1.0
Fill forms with voice. Powered by Gemini Live API.
A React component that lets users fill out forms by speaking. Drop in <AudioForm> around your inputs, run the included server, and your forms fill themselves from voice.
Features
- Voice-to-form: Users speak naturally, fields fill automatically
- Secure by design: API key stays on your server, never exposed to the browser
- Zero config: Detects form fields from name attributes automatically
- Double-check mode: Model confirms spelling of names, emails, and numbers before filling
- Thinking levels: Adjustable reasoning depth for complex or ambiguous inputs
- Works with any React app: No framework lock-in, just wrap your inputs
Quick Start
1. Install

    npm install audio-forms

2. Start the server

    // server.ts
    import { createAudioFormsServer } from 'audio-forms/server';

    createAudioFormsServer({
      apiKey: process.env.GEMINI_API_KEY,
      port: 3001,
    });

Then run it:

    npx tsx server.ts

3. Add to your React app

    import { AudioForm } from 'audio-forms';

    function ContactForm() {
      return (
        <AudioForm serverUrl="ws://localhost:3001">
          <input name="fullName" placeholder="Full Name" />
          <input name="email" placeholder="Email" type="email" />
          <input name="phone" placeholder="Phone" type="tel" />
        </AudioForm>
      );
    }

Click the mic button. Speak. Watch the form fill itself.
How It Works

    Browser                      Your Server                  Gemini Live API
    ┌──────────┐    WebSocket    ┌──────────┐    WebSocket    ┌──────────┐
    │ AudioForm│ ──────────────> │  Proxy   │ ──────────────> │  Gemini  │
    │  (mic)   │ <────────────── │ (apiKey) │ <────────────── │ (voice)  │
    └──────────┘   audio/tool    └──────────┘   audio/tool    └──────────┘

- User clicks the mic and the browser captures audio (16 kHz PCM)
- Audio streams to your server via WebSocket
- Server proxies to Gemini Live API (API key added here)
- Model extracts field values via function calling
- Fields update in real time, and the model speaks back confirmations
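
The first step mentions 16 kHz PCM. Browsers usually deliver microphone samples as 32-bit floats in the range -1 to 1, so some conversion to 16-bit integers is typically needed before streaming. The sketch below shows one common way to do that. `floatTo16BitPCM` is a hypothetical helper written for illustration; it is not part of the audio-forms API, and the package may handle this step differently.

```typescript
// Hypothetical helper (not part of audio-forms): converts Web Audio float
// samples in the range -1..1 to signed 16-bit PCM.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range samples do not wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale toward -32768, positive values toward 32767.
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}
```

Resampling to 16 kHz is a separate step that this sketch leaves out.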
API
<AudioForm> Props
| Prop | Type | Default | Description |
|------|------|---------|-------------|
| serverUrl | string | required | WebSocket URL of your audio-forms server |
| model | string | "gemini-3.1-flash-live-preview" | Gemini model to use |
| doubleCheck | boolean | false | Spell back and confirm values before filling |
| thinkingLevel | "none" \| "low" \| "medium" \| "high" | "none" | Model reasoning depth |
| onFieldUpdate | (field, value) => void | — | Called when a field is filled |
| onStatusChange | (status) => void | — | Status: idle, connecting, listening, processing, error |
| onError | (error) => void | — | Called on errors |
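
As an illustration of how the onStatusChange values might drive a UI, the sketch below maps each status from the table to a label. The `AudioFormStatus` type name and the `statusLabel` helper are assumptions made for this example; the package may export its own status type under a different name.

```typescript
// Status values as listed in the props table. The type name is hypothetical.
type AudioFormStatus = 'idle' | 'connecting' | 'listening' | 'processing' | 'error';

// Hypothetical helper: turns a status into text for a mic button or hint.
function statusLabel(status: AudioFormStatus): string {
  switch (status) {
    case 'idle':
      return 'Click the mic to start';
    case 'connecting':
      return 'Connecting';
    case 'listening':
      return 'Listening';
    case 'processing':
      return 'Processing';
    case 'error':
      return 'Something went wrong';
  }
}
```

A component could store the result in state from inside the onStatusChange callback and render it next to the form.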
createAudioFormsServer(config)
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| apiKey | string | required | Your Gemini API key |
| port | number | 3001 | WebSocket server port |
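
Because apiKey is required, it can help to fail fast when the environment variable is missing instead of passing undefined through. The sketch below validates both options before the server starts. `loadServerConfig` and the `PORT` variable are assumptions for illustration only; audio-forms itself documents just apiKey and port.

```typescript
// Shape of the two documented options.
interface ServerConfig {
  apiKey: string;
  port: number;
}

// Hypothetical helper: builds a config from environment variables.
// GEMINI_API_KEY comes from the Quick Start; PORT is an assumption.
function loadServerConfig(env: Record<string, string | undefined>): ServerConfig {
  const apiKey = env['GEMINI_API_KEY'];
  if (!apiKey) {
    throw new Error('GEMINI_API_KEY is not set');
  }
  const rawPort = env['PORT'];
  // Fall back to the documented default of 3001.
  const port = rawPort === undefined ? 3001 : Number(rawPort);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${rawPort}`);
  }
  return { apiKey, port };
}
```

In server.ts this could be used as `createAudioFormsServer(loadServerConfig(process.env))`.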
Examples
Basic

    <AudioForm serverUrl="ws://localhost:3001">
      <input name="name" placeholder="Name" />
      <input name="email" placeholder="Email" />
    </AudioForm>

Accurate mode (recommended for names and emails)

    <AudioForm serverUrl="ws://localhost:3001" doubleCheck={true}>
      <input name="name" placeholder="Name" />
      <input name="email" placeholder="Email" />
      <input name="phone" placeholder="Phone" />
    </AudioForm>

With thinking for complex forms

    <AudioForm
      serverUrl="ws://localhost:3001"
      doubleCheck={true}
      thinkingLevel="medium"
    >
      <input name="name" placeholder="Full Name" />
      <input name="email" placeholder="Email" />
      <input name="phone" placeholder="Phone" />
      <input name="address" placeholder="Address" />
      <input name="dob" placeholder="Date of Birth" />
      <textarea name="notes" placeholder="Notes" />
    </AudioForm>

With event handlers

    <AudioForm
      serverUrl="ws://localhost:3001"
      onFieldUpdate={(field, value) => {
        console.log(`Filled ${field}: ${value}`);
      }}
      onStatusChange={(status) => {
        console.log('Status:', status);
      }}
      onError={(err) => {
        console.error('Audio form error:', err);
      }}
    >
      <input name="name" placeholder="Name" />
    </AudioForm>

Field Detection
The component automatically detects fields by scanning children for <input>, <textarea>, and <select> elements with a name attribute.
| Attribute | Used for |
|-----------|----------|
| name | Field identifier (sent to model) |
| placeholder | Human-readable label the model sees |
| aria-label | Fallback label if no placeholder |
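
The lookup described in the table can be sketched as a small pure function. This is not the package's actual implementation: `FieldAttrs`, `FieldDescriptor`, and `describeFields` are hypothetical names, and falling back to the name itself when neither label attribute is present is an assumption the README does not state.

```typescript
// Attributes read from an <input>, <textarea>, or <select> element.
interface FieldAttrs {
  name?: string;
  placeholder?: string;
  'aria-label'?: string;
}

// What the model would be told about each field (hypothetical shape).
interface FieldDescriptor {
  name: string;
  label: string;
}

function describeFields(elements: FieldAttrs[]): FieldDescriptor[] {
  const fields: FieldDescriptor[] = [];
  for (const el of elements) {
    // Elements without a name attribute are not detected as fields.
    if (!el.name) continue;
    // placeholder first, then aria-label; the final fallback to name
    // is an assumption made for this sketch.
    const label = el.placeholder || el['aria-label'] || el.name;
    fields.push({ name: el.name, label });
  }
  return fields;
}
```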
Security
Your Gemini API key never reaches the browser. The architecture:
- Browser connects to your server via WebSocket
- Your server connects to Gemini Live API with the API key
- Messages are piped bidirectionally
- The client has no knowledge of or access to the API key
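
The bidirectional piping above can be sketched without any WebSocket library by relaying between two objects that share a minimal interface. `SocketLike`, `relay`, and `FakeSocket` are hypothetical names for illustration and do not reflect how audio-forms is implemented. The API key would be used only when the server opens the upstream connection, which this sketch leaves out.

```typescript
// Minimal interface standing in for a WebSocket connection (hypothetical).
interface SocketLike {
  send(data: string): void;
  onMessage(handler: (data: string) => void): void;
}

// Forwards every message from the client upstream and every upstream
// message back to the client. Nothing about the key is sent to the client.
function relay(client: SocketLike, upstream: SocketLike): void {
  client.onMessage((data) => upstream.send(data));
  upstream.onMessage((data) => client.send(data));
}

// In-memory stand-in used to exercise the relay without a network.
class FakeSocket implements SocketLike {
  sent: string[] = [];
  private handlers: Array<(data: string) => void> = [];

  send(data: string): void {
    this.sent.push(data);
  }

  onMessage(handler: (data: string) => void): void {
    this.handlers.push(handler);
  }

  // Simulates a message arriving on this socket.
  receive(data: string): void {
    for (const handler of this.handlers) handler(data);
  }
}
```

With two FakeSocket instances passed to relay, a message received on the browser side shows up in the upstream socket's sent list, and the reverse holds for messages from upstream.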
Get Started with LLMs
Want to integrate audio-forms using an AI coding assistant? Copy the contents of llms.txt into your LLM's context. It contains a complete integration guide formatted for AI consumption.
Requirements
- Node.js 18+
- React 18+
- A Gemini API key
- Browser with microphone access (HTTPS in production)
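
One consequence of the HTTPS requirement: browsers generally block ws:// connections from pages served over HTTPS, so the serverUrl in production would normally use wss://. The sketch below picks the scheme from the page protocol. `audioFormsServerUrl` is a hypothetical helper, not part of the package, and the host values are placeholders.

```typescript
// Hypothetical helper: chooses ws:// or wss:// to match the page protocol.
// pageProtocol is expected in the form of window.location.protocol,
// for example 'http:' or 'https:'.
function audioFormsServerUrl(pageProtocol: string, host: string): string {
  const scheme = pageProtocol === 'https:' ? 'wss' : 'ws';
  return `${scheme}://${host}`;
}
```

In the browser this might be called as `audioFormsServerUrl(window.location.protocol, 'localhost:3001')` and the result passed to the serverUrl prop.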
Development

    git clone https://github.com/voiceshop/audio-forms
    cd audio-forms
    npm install
    npm run build    # Build the package
    npm run example  # Run the demo (http://localhost:3000)

License
MIT
