@myscheme/voice-form-filling
v0.1.2
Voice-driven form filling demo using Azure Speech SDK and Amazon Bedrock.
Voice Form Filling Demo
Voice-first experience that extracts HTML forms, guides users through each question with Azure Speech Services, and uses Amazon Bedrock to intelligently map responses back to the correct form fields. The code is written in TypeScript and ships as an embeddable initialization helper plus a Vite-powered demo page.
Features
- Speech-to-text and text-to-speech via Azure Cognitive Services Speech SDK.
- Automatic extraction of standard HTML form fields, including validation metadata and select/radio/checkbox options.
- Routing of free-form user answers to the correct fields using an Amazon Bedrock model via AWS SDK-only integration.
- Multi-language greeting and prompt flow (English and Hindi) with configurable voice selection.
- Constraint-aware field assignment with audible retry prompts if user input violates form rules.
- Multi-step National Scholarship Portal registration demo covering personal, academic, bank, and document reference sections.
- Simple UI hooks to surface prompts, transcripts, and assignments inside any application.
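As a sketch of the constraint-aware assignment idea (all names here are hypothetical, not the library's actual API), a validator might check a transcribed answer against the extracted field metadata before filling the form, returning a reason string that can be spoken back as a retry prompt:

```typescript
// Hypothetical field metadata shape, mirroring what a DOM extractor might
// collect from required/pattern attributes and <option> lists.
interface ExtractedField {
  name: string;
  required?: boolean;
  pattern?: string;   // regex from the input's pattern attribute
  options?: string[]; // for select/radio/checkbox inputs
}

// Returns null when the answer satisfies the field's constraints,
// otherwise a human-readable reason suitable for a spoken retry prompt.
function validateAnswer(field: ExtractedField, answer: string): string | null {
  const value = answer.trim();
  if (field.required && value === "") {
    return `${field.name} is required.`;
  }
  if (field.pattern && !new RegExp(`^(?:${field.pattern})$`).test(value)) {
    return `${field.name} does not match the expected format.`;
  }
  if (
    field.options &&
    !field.options.some((o) => o.toLowerCase() === value.toLowerCase())
  ) {
    return `Please choose one of: ${field.options.join(", ")}.`;
  }
  return null;
}
```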
Project Structure
```
voice-form-filling/
├─ src/                # Library source (compiled with `tsc`)
│  ├─ index.ts         # Public initializeVoiceForm() entry point
│  ├─ VoiceFormService.ts
│  ├─ bedrockRouter.ts # Bedrock SDK integration helper
│  ├─ formExtractor.ts # DOM field parsing helpers
│  └─ types.ts         # Shared type definitions
├─ tsconfig.json       # Library TypeScript config (emits to dist/)
├─ tsconfig.app.json   # Demo TypeScript config used by Vite tooling
├─ vite.config.ts      # Vite dev/build configuration (root set to demo/)
└─ package.json
```
Prerequisites
- Node.js 18+
- Azure Speech resource with a subscription key and region.
- (Optional) Amazon Bedrock access with an identity that can invoke your chosen model.
Setup
```bash
npm install
```
Copy your secrets into environment variables or keep them handy; the demo prompts for them at runtime, so they are never committed.
Running the Demo
```bash
npm run dev
```
Then open the printed Vite dev server URL (default http://localhost:3045) and provide:
- Azure Speech Key and Region – required for speech recognition and synthesis.
- AWS/Bedrock credentials, supplied as Vite env vars before running: `VITE_AZURE_SPEECH_KEY`, `VITE_AZURE_SPEECH_REGION`, `VITE_BEDROCK_MODEL_ID`, `VITE_BEDROCK_REGION`, `VITE_AWS_ACCESS_KEY_ID`, `VITE_AWS_SECRET_ACCESS_KEY`, and optionally `VITE_AWS_SESSION_TOKEN`.
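For local development, these can live in an untracked `.env.local` file that Vite loads automatically. The values below are placeholders only (pick the model ID and regions you actually use, and never commit real keys):

```shell
# .env.local — placeholder values; Vite exposes VITE_-prefixed vars to the app
VITE_AZURE_SPEECH_KEY=your-azure-speech-key
VITE_AZURE_SPEECH_REGION=centralindia
VITE_BEDROCK_MODEL_ID=anthropic.claude-3-haiku-20240307-v1:0
VITE_BEDROCK_REGION=us-east-1
VITE_AWS_ACCESS_KEY_ID=AKIA...
VITE_AWS_SECRET_ACCESS_KEY=...
# VITE_AWS_SESSION_TOKEN=...   # only needed for temporary credentials
```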
Press Start Voice Flow to begin. The assistant will ask for your language preference (English or Hindi), announce each pending field in the National Scholarship Portal-style form, read options for select/radio/checkbox inputs, and listen for your spoken answers. When speech input falls silent, the Bedrock router determines which fields were answered and fills the underlying HTML form. The activity log in the left panel mirrors spoken prompts, transcripts, and field assignments, while the on-page stepper highlights the current section.
Testing on Mobile (HTTPS Required)
Browsers only expose the microphone to pages loaded in a secure context (HTTPS or http://localhost). When you visit the dev server from a phone or tablet using `http://<your-ip>:3045`, mobile browsers block audio capture and the mic never starts. Generate a trusted development certificate (for example with `mkcert`) and point the Vite dev server at it:

```bash
mkcert -install
mkcert -key-file certs/dev-key.pem -cert-file certs/dev-cert.pem 127.0.0.1 ::1 <your-ip> mysite.test
DEV_SERVER_USE_HTTPS=true \
DEV_SERVER_SSL_KEY=certs/dev-key.pem \
DEV_SERVER_SSL_CERT=certs/dev-cert.pem \
npm run dev
```

These environment variables enable HTTPS in `vite.config.ts`. Update the hostnames passed to `mkcert` to match how you access the site on mobile, then load `https://<your-ip>:3045` (accept the certificate the first time). Once served over HTTPS, the microphone prompt appears and speech recognition works on mobile browsers.
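The repository's actual `vite.config.ts` may differ, but the env-var wiring described above could look roughly like this (a sketch under the assumption that the config reads `DEV_SERVER_USE_HTTPS` and the two certificate paths, not the shipped file):

```typescript
// vite.config.ts (sketch) — enable HTTPS only when DEV_SERVER_USE_HTTPS is set
import { defineConfig } from "vite";
import fs from "node:fs";

export default defineConfig({
  root: "demo",
  server: {
    port: 3045,
    host: true, // listen on all interfaces so phones on the LAN can connect
    https:
      process.env.DEV_SERVER_USE_HTTPS === "true"
        ? {
            key: fs.readFileSync(process.env.DEV_SERVER_SSL_KEY!),
            cert: fs.readFileSync(process.env.DEV_SERVER_SSL_CERT!),
          }
        : undefined,
  },
});
```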
Embedding in Your Application
```ts
import { initializeVoiceForm, AwsBedrockRouter } from "@myscheme/voice-form-filling";
import { BedrockRuntimeClient } from "@aws-sdk/client-bedrock-runtime";

const controller = await initializeVoiceForm({
  formSelector: "#checkout-form",
  azureSpeech: {
    subscriptionKey: process.env.AZURE_SPEECH_KEY!,
    region: process.env.AZURE_SPEECH_REGION!,
  },
  bedrock: {
    modelId: "anthropic.claude-3-haiku-20240307-v1:0",
    router: new AwsBedrockRouter({
      client: new BedrockRuntimeClient({ region: "us-east-1" }),
      modelId: "anthropic.claude-3-haiku-20240307-v1:0",
    }),
  },
});

await controller.start();
```
Use the optional uiHooks callbacks to surface transcripts or status updates, and call controller.stop() when the user leaves the flow.
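The exact uiHooks shape isn't documented here, so the following is an illustration only (all callback names are hypothetical): each callback lets the host application mirror the voice flow, for example into an activity log like the demo's left-hand panel.

```typescript
// Hypothetical uiHooks contract — illustrative, not the library's documented API.
interface UiHooks {
  onPrompt?: (text: string) => void;     // assistant is about to speak
  onTranscript?: (text: string) => void; // user speech was recognized
  onAssignment?: (field: string, value: string) => void; // a field was filled
}

// A simple hook set that collects events into a log array, suitable for
// rendering a transcript/status panel in any UI framework.
function makeLoggingHooks(log: string[]): UiHooks {
  return {
    onPrompt: (t) => log.push(`prompt: ${t}`),
    onTranscript: (t) => log.push(`heard: ${t}`),
    onAssignment: (f, v) => log.push(`set ${f} = ${v}`),
  };
}
```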
Amazon Bedrock Notes
- Browsers should not store long-lived AWS secrets. For production, expose a secure backend endpoint that proxies Bedrock requests and pass it into the library as a custom `BedrockRouter` implementation.
- The included `AwsBedrockRouter` helper formats prompts for Anthropic Claude 3 models. Adapt the prompt builder if you prefer other providers.
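A backend-proxy router could look roughly like the sketch below. The endpoint path, request shape, and `route` method name are all assumptions for illustration, not the library's real `BedrockRouter` contract; the fetch-like function is injected so credentials stay on the server and the class is testable without a network.

```typescript
// Hypothetical router contract: given the user's utterance and the extracted
// fields, return { fieldName: value } assignments decided by the backend.
type FieldAssignments = Record<string, string>;

interface RouteRequest {
  transcript: string;
  fields: { name: string; label: string }[];
}

class BackendProxyRouter {
  constructor(
    private endpoint: string,
    // Injectable fetch-like function; in the browser, pass window.fetch.
    private fetchFn: (
      url: string,
      init: unknown,
    ) => Promise<{ json(): Promise<FieldAssignments> }>,
  ) {}

  // POSTs the transcript and field metadata to the backend, which holds the
  // AWS credentials and calls Bedrock on the client's behalf.
  async route(req: RouteRequest): Promise<FieldAssignments> {
    const res = await this.fetchFn(this.endpoint, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(req),
    });
    return res.json();
  }
}
```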
Developing the Library
```bash
npm run typecheck
npm run build
```
Compiled artifacts land in dist/, mirroring the structure in src/.
Next Steps
- Extend validation rules (e.g., custom date ranges, dependent questions).
- Persist conversation state to resume flows.
License
This project is provided for demonstration purposes without an explicit license. Add one before distributing.
