audio-forms

v0.1.0

Published

9 days ago

Fill forms with voice using Gemini Live API

Downloads

129

Vulnerabilities: 0 High, 0 Medium, 0 Low

vaibhavm

audio forms voice gemini live-api speech-to-form

audio-forms

Fill forms with voice. Powered by Gemini Live API.

A React component that lets users fill out forms by speaking. Wrap your inputs in <AudioForm>, run the included server, and the fields fill in from the user's voice.

Features

Voice-to-form — Users speak naturally, fields fill automatically
Secure by design — API key stays on your server, never exposed to the browser
Zero config — Detects form fields from name attributes automatically
Double-check mode — Model confirms spelling of names, emails, and numbers before filling
Thinking levels — Adjustable reasoning depth for complex or ambiguous inputs
Works with any React app — No framework lock-in, just wrap your inputs

Quick Start

1. Install

npm install audio-forms

2. Start the server

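// server.ts
import { createAudioFormsServer } from 'audio-forms/server';

// Fail fast if the key is missing; process.env values may be undefined.
const apiKey = process.env.GEMINI_API_KEY;
if (!apiKey) throw new Error('GEMINI_API_KEY is not set');

createAudioFormsServer({
  apiKey,
  port: 3001,
});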

npx tsx server.ts

3. Add to your React app

import { AudioForm } from 'audio-forms';

function ContactForm() {
  return (
    <AudioForm serverUrl="ws://localhost:3001">
      <input name="fullName" placeholder="Full Name" />
      <input name="email" placeholder="Email" type="email" />
      <input name="phone" placeholder="Phone" type="tel" />
    </AudioForm>
  );
}

Click the mic button. Speak. Watch the form fill itself.

How It Works

Browser                    Your Server                 Gemini Live API
┌──────────┐    WebSocket    ┌──────────┐    WebSocket    ┌──────────┐
│ AudioForm│ ──────────────> │  Proxy   │ ──────────────> │  Gemini  │
│ (mic)    │ <────────────── │ (apiKey) │ <────────────── │  (voice) │
└──────────┘    audio/tool   └──────────┘    audio/tool   └──────────┘

1. User clicks the mic; the browser captures audio (16 kHz PCM)
2. Audio streams to your server via WebSocket
3. Server proxies to the Gemini Live API (API key added here)
4. Model extracts field values via function calling
5. Fields update in real time; the model speaks back confirmations
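
The first step above mentions 16 kHz PCM capture. As a rough sketch of what that encoding can look like, the helper below converts Web Audio float samples in the range [-1, 1] to PCM. It assumes 16-bit signed little-endian samples (the README only says "16kHz PCM"), and `floatTo16BitPCM` is a hypothetical name, not something taken from the package's implementation.

```typescript
// Hypothetical helper, not part of the audio-forms API.
// Assumes 16-bit signed little-endian PCM; the README only states "16kHz PCM".
function floatTo16BitPCM(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  samples.forEach((sample, i) => {
    // Clamp to [-1, 1] so out-of-range samples do not wrap around.
    const s = Math.max(-1, Math.min(1, sample));
    // Scale negatives by 32768 and positives by 32767 to fill the int16 range.
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  });
  return buffer;
}
```

Resampling the microphone input down to 16 kHz is a separate step that this sketch does not cover.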

API

`<AudioForm>` Props

| Prop | Type | Default | Description |
|------|------|---------|-------------|
| serverUrl | string | required | WebSocket URL of your audio-forms server |
| model | string | "gemini-3.1-flash-live-preview" | Gemini model to use |
| doubleCheck | boolean | false | Spell back and confirm values before filling |
| thinkingLevel | "none" \| "low" \| "medium" \| "high" | "none" | Model reasoning depth |
| onFieldUpdate | (field, value) => void | - | Called when a field is filled |
| onStatusChange | (status) => void | - | Status: idle, connecting, listening, processing, error |
| onError | (error) => void | - | Called on errors |

`createAudioFormsServer(config)`

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| apiKey | string | required | Your Gemini API key |
| port | number | 3001 | WebSocket server port |

Examples

Basic

<AudioForm serverUrl="ws://localhost:3001">
  <input name="name" placeholder="Name" />
  <input name="email" placeholder="Email" />
</AudioForm>

Accurate mode (recommended for names and emails)

<AudioForm serverUrl="ws://localhost:3001" doubleCheck={true}>
  <input name="name" placeholder="Name" />
  <input name="email" placeholder="Email" />
  <input name="phone" placeholder="Phone" />
</AudioForm>

With thinking for complex forms

<AudioForm
  serverUrl="ws://localhost:3001"
  doubleCheck={true}
  thinkingLevel="medium"
>
  <input name="name" placeholder="Full Name" />
  <input name="email" placeholder="Email" />
  <input name="phone" placeholder="Phone" />
  <input name="address" placeholder="Address" />
  <input name="dob" placeholder="Date of Birth" />
  <textarea name="notes" placeholder="Notes" />
</AudioForm>

With event handlers

<AudioForm
  serverUrl="ws://localhost:3001"
  onFieldUpdate={(field, value) => {
    console.log(`Filled ${field}: ${value}`);
  }}
  onStatusChange={(status) => {
    console.log('Status:', status);
  }}
  onError={(err) => {
    console.error('Audio form error:', err);
  }}
>
  <input name="name" placeholder="Name" />
</AudioForm>
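
The status values listed for onStatusChange lend themselves to a string union. The sketch below assumes the callback receives exactly those five strings; the `AudioFormStatus` type and the `statusLabel` helper are illustrative names, not exports of the package.

```typescript
// Assumed from the props table: idle, connecting, listening, processing, error.
type AudioFormStatus = "idle" | "connecting" | "listening" | "processing" | "error";

// Text you might show next to the mic button for each status.
const STATUS_LABELS: Record<AudioFormStatus, string> = {
  idle: "Click the mic to start",
  connecting: "Connecting...",
  listening: "Listening...",
  processing: "Filling fields...",
  error: "Something went wrong",
};

function statusLabel(status: AudioFormStatus): string {
  return STATUS_LABELS[status];
}
```

A handler such as `(status) => setLabel(statusLabel(status as AudioFormStatus))` could then drive a small status line in your UI, where `setLabel` is your own state setter.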

Field Detection

The component automatically detects fields by scanning children for <input>, <textarea>, and <select> elements with a name attribute.

| Attribute | Used for |
|-----------|----------|
| name | Field identifier (sent to model) |
| placeholder | Human-readable label the model sees |
| aria-label | Fallback label if no placeholder |
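
To make the label fallback concrete, here is a small sketch of how a detected element could be reduced to the identifier and label described in the table. The `FieldAttrs` and `FieldDescriptor` types and the `describeField` and `describeFields` functions are hypothetical names for illustration; the package's internal representation may differ. Falling back to `name` when neither label attribute is present is also an assumption.

```typescript
// Hypothetical types; not the package's actual internals.
interface FieldAttrs {
  name?: string;
  placeholder?: string;
  "aria-label"?: string;
}

interface FieldDescriptor {
  id: string;    // from `name`, sent to the model
  label: string; // human-readable label the model sees
}

// Returns null for elements without a `name`, which are not detected.
function describeField(attrs: FieldAttrs): FieldDescriptor | null {
  if (!attrs.name) return null;
  // Prefer placeholder, then aria-label; falling back to `name` is an assumption.
  const label = attrs.placeholder || attrs["aria-label"] || attrs.name;
  return { id: attrs.name, label };
}

function describeFields(all: FieldAttrs[]): FieldDescriptor[] {
  return all
    .map(describeField)
    .filter((f): f is FieldDescriptor => f !== null);
}
```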

Security

Your Gemini API key never reaches the browser. The architecture:

Browser connects to your server via WebSocket
Your server connects to Gemini Live API with the API key
Messages are piped bidirectionally
The client has no knowledge of or access to the API key
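
The bidirectional piping can be pictured with a small in-memory sketch. The `Peer` interface and the `pipe` and `createFakePeer` functions are hypothetical stand-ins; a real server would use actual WebSocket connections and would attach the API key only when opening the upstream connection to Gemini.

```typescript
// Minimal stand-in for a WebSocket-like connection (hypothetical interface).
interface Peer {
  send(data: string): void;
  onMessage(handler: (data: string) => void): void;
}

// Forward every message from each side to the other.
// The API key would only be used when the server opens `upstream`,
// so nothing in this relay hands it to `client`.
function pipe(client: Peer, upstream: Peer): void {
  client.onMessage((data) => upstream.send(data));
  upstream.onMessage((data) => client.send(data));
}

// In-memory peer for trying the relay without a network.
// `emit` simulates a message arriving from this peer;
// `received` records everything sent to it.
function createFakePeer(): Peer & { received: string[]; emit(data: string): void } {
  const handlers: Array<(data: string) => void> = [];
  const received: string[] = [];
  return {
    received,
    send: (data) => { received.push(data); },
    onMessage: (handler) => { handlers.push(handler); },
    emit: (data) => { handlers.forEach((h) => h(data)); },
  };
}
```

Because the relay only forwards message payloads, the browser side never needs to see the credentials used to open the upstream connection.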

Get Started with LLMs

Want to integrate audio-forms using an AI coding assistant? Copy the contents of llms.txt into your LLM's context. It contains a complete integration guide formatted for AI consumption.

Requirements

Node.js 18+
React 18+
A Gemini API key
Browser with microphone access (HTTPS in production)
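
Because pages served over HTTPS generally cannot open insecure ws:// connections, serverUrl will usually need to use wss:// in production. A minimal sketch of deriving the scheme from the page's protocol follows; `toServerUrl` and the example host are hypothetical and not part of the package.

```typescript
// Hypothetical helper: choose ws:// or wss:// to match the page protocol.
function toServerUrl(pageProtocol: string, host: string): string {
  const scheme = pageProtocol === "https:" ? "wss:" : "ws:";
  return `${scheme}//${host}`;
}
```

In a browser you might call `toServerUrl(window.location.protocol, "localhost:3001")` and pass the result as serverUrl.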

Development

git clone https://github.com/voiceshop/audio-forms
cd audio-forms
npm install
npm run build        # Build the package
npm run example      # Run the demo (http://localhost:3000)

License

MIT
