@myscheme/voice-navigation-sdk v0.1.5
Voice navigation SDK using Azure Speech and AWS Bedrock
Voice Navigation SDK
A TypeScript SDK for voice-controlled navigation using Azure Speech-to-Text and AWS Bedrock for intent understanding.
🚀 Features
- 🎤 Real-time speech recognition using Azure Speech SDK
- 🤖 AI-powered intent extraction using AWS Bedrock (Claude)
- 🌐 Dynamic page navigation via XML configuration
- 🧭 Voice-controlled navigation actions
- 🖱️ Rich browser and media controls
- ♿ Accessibility-first design
- 🎨 Customizable floating UI control
- 🔘 Flexible button options - use default or your own custom button
- 📍 Configurable button placement - position the default button anywhere
- 📦 Full TypeScript support
- 🔄 Reusable across multiple websites
- 🔍 Vector search integration with OpenSearch
📦 Installation
npm install @myscheme/voice-navigation-sdk
Dependencies
The following packages are automatically installed:
- @aws-sdk/client-bedrock-runtime - AWS Bedrock integration
- microsoft-cognitiveservices-speech-sdk - Azure Speech-to-Text
🎯 Quick Start
Basic Setup
import { initNavigationOnMicrophone } from "@myscheme/voice-navigation-sdk";
const controller = initNavigationOnMicrophone({
// Azure Speech configuration
azure: {
subscriptionKey: "your-azure-subscription-key",
region: "centralindia", // e.g., 'eastus', 'westus'
},
// AWS Bedrock configuration
aws: {
accessKeyId: "your-aws-access-key-id",
secretAccessKey: "your-aws-secret-access-key",
modelId: "anthropic.claude-3-sonnet-20240229-v1:0",
region: "ap-south-1",
},
// Optional: Default language
language: "en-IN",
// Optional: Auto-start voice control
autoStart: false,
});
With Dynamic Pages
const controller = initNavigationOnMicrophone({
azure: {
/* ... */
},
aws: {
/* ... */
},
// Dynamic page navigation
pages: {
xml: "/navigation-pages.xml",
xmlType: "url", // or "string" for inline XML
},
});
⚙️ Configuration
NavigationConfig Options
| Property | Type | Required | Description |
| ----------------------- | --------- | -------- | ---------------------------------------------- |
| azure.subscriptionKey | string | ✅ | Azure Speech subscription key |
| azure.region | string | ✅ | Azure Speech region |
| aws.accessKeyId | string | ✅ | AWS access key ID |
| aws.secretAccessKey | string | ✅ | AWS secret access key |
| aws.modelId | string | ✅ | AWS Bedrock model ID |
| aws.region | string | ❌ | AWS region (default: 'ap-south-1') |
| language | string | ❌ | Speech recognition language (default: 'en-IN') |
| autoStart | boolean | ❌ | Auto-start voice control (default: false) |
| actionHandlers | object | ❌ | Custom action callbacks |
| pages | object | ❌ | Dynamic page navigation configuration |
| opensearch | object | ❌ | OpenSearch vector search configuration |
| ui | object | ❌ | UI customization (button placement & style) |
OpenSearch Configuration
When providing the optional opensearch block:
| Property | Type | Required | Description |
| --------------- | ---------- | -------- | --------------------------------------------------------------- |
| node | string | ✅ | OpenSearch cluster URL |
| username | string | ✅ | OpenSearch username |
| password | string | ✅ | OpenSearch password |
| index | string | ✅ | Index name containing embeddings |
| vectorField | string | ❌ | Embedding field (default: embedding) |
| size | number | ❌ | Result count (default: 5) |
| numCandidates | number | ❌ | k-NN candidates (default: Math.max(size*4,20)) |
| minScore | number | ❌ | Minimum match score (default: 0) |
| sourceFields | string[] | ❌ | Source fields to retrieve |
| apiPath | string | ❌ | Proxy endpoint (default: /api/voice-navigation/vector-search) |
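The numCandidates default in the table can be stated as code. This helper is purely illustrative (the SDK applies the formula internally and does not export it):

```typescript
// Default k-NN candidate count when `numCandidates` is omitted,
// per the table above: Math.max(size * 4, 20).
// Illustrative helper only; not exported by the SDK.
function defaultNumCandidates(size: number): number {
  return Math.max(size * 4, 20);
}

// With the default result count of 5, at least 20 candidates are examined;
// larger `size` values scale the candidate pool by 4x.
```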
UI Configuration
The library provides flexible button options - use the default floating button or integrate with your own custom button.
Configuration Options
| Property | Type | Required | Description |
| ---------------------- | ----------------- | -------- | ------------------------------------------------------------- |
| showDefaultButton | boolean | ❌ | Show library's default button (default: true) |
| customButtonSelector | string | ❌ | CSS selector for your custom button (e.g., "#my-voice-btn") |
| buttonPlacement | ButtonPlacement | ❌ | Position of default button (default: "bottom-right") |
ButtonPlacement options: "bottom-right" | "bottom-left" | "top-right" | "top-left" | "center-right" | "center-left"
Option 1: Default Button with Custom Placement
Use the library's pre-styled floating button at different positions:
const controller = initNavigationOnMicrophone({
azure: {
/* ... */
},
aws: {
/* ... */
},
// Place button at top-left corner
ui: {
showDefaultButton: true,
buttonPlacement: "top-left",
},
});
Available placements:
- "bottom-right" (default) - Bottom right corner
- "bottom-left" - Bottom left corner
- "top-right" - Top right corner
- "top-left" - Top left corner
- "center-right" - Vertically centered on right
- "center-left" - Vertically centered on left
Option 2: Custom Button
Use your own button design and have the library attach voice control functionality to it:
1. Create your button in HTML:
<button id="my-voice-button" class="my-custom-style">
<span>🎤 Voice Control</span>
</button>
2. Configure the SDK to use your button:
const controller = initNavigationOnMicrophone({
azure: {
/* ... */
},
aws: {
/* ... */
},
ui: {
showDefaultButton: false,
customButtonSelector: "#my-voice-button",
},
});
Important Notes for Custom Buttons:
- Button must exist in the DOM before SDK initialization - add the SDK initialization code after your button is rendered, or use the DOMContentLoaded event
- The library automatically adds aria-pressed and aria-label attributes for accessibility
- The button can be any clickable element (button, div, etc.)
- The library handles all click events and state management
- The visual feedback panel displays near the button when voice control is active
- You are responsible for styling the button (pressed, disabled, etc.) based on your design requirements
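The "button must exist first" rule can be enforced with a small readiness guard. `whenDomReady` is an illustrative helper, not an SDK export; the guarded init call in the usage comment mirrors the configuration shown above:

```typescript
// Sketch: defer SDK initialization until the DOM (and thus your custom
// button) has been parsed. Illustrative helper, not part of the SDK.
function whenDomReady(
  run: () => void,
  readyState: string = typeof document !== "undefined" ? document.readyState : "complete",
): void {
  if (readyState === "loading") {
    // Button markup not parsed yet; wait for DOMContentLoaded.
    document.addEventListener("DOMContentLoaded", run);
  } else {
    run();
  }
}

// Usage (inside your app's entry script):
// whenDomReady(() => {
//   initNavigationOnMicrophone({
//     azure: { /* ... */ },
//     aws: { /* ... */ },
//     ui: { showDefaultButton: false, customButtonSelector: "#my-voice-button" },
//   });
// });
```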
React Example with Custom Button:
import { useEffect, useRef } from "react";
import { initNavigationOnMicrophone } from "@myscheme/voice-navigation-sdk";
export default function VoiceNavigationProvider() {
const controllerRef = useRef(null);
useEffect(() => {
// Initialize after button is rendered
controllerRef.current = initNavigationOnMicrophone({
azure: {
subscriptionKey: process.env.NEXT_PUBLIC_AZURE_SPEECH_KEY!,
region: process.env.NEXT_PUBLIC_AZURE_SPEECH_REGION!,
},
aws: {
accessKeyId: process.env.NEXT_PUBLIC_AWS_ACCESS_KEY_ID!,
secretAccessKey: process.env.NEXT_PUBLIC_AWS_SECRET_ACCESS_KEY!,
modelId: "anthropic.claude-3-sonnet-20240229-v1:0",
region: "ap-south-1",
},
ui: {
showDefaultButton: false,
customButtonSelector: "#voice-control-btn",
},
});
return () => controllerRef.current?.destroy();
}, []);
return (
<button
id="voice-control-btn"
className="px-4 py-2 bg-blue-600 text-white rounded-full hover:bg-blue-700"
>
🎤 Voice Control
</button>
);
}
Next.js App Router Example:
// app/components/voice-button.tsx
"use client";
export default function VoiceButton() {
return (
<button
id="voice-nav-button"
className="fixed bottom-4 right-4 w-16 h-16 rounded-full bg-gradient-to-r from-purple-500 to-pink-500 text-white shadow-lg hover:scale-110 transition-transform"
aria-label="Activate voice navigation"
>
🎤
</button>
);
}
// app/providers/voice-navigation.tsx
"use client";
import { useEffect } from "react";
import { initNavigationOnMicrophone } from "@myscheme/voice-navigation-sdk";
export default function VoiceNavigationProvider() {
useEffect(() => {
const controller = initNavigationOnMicrophone({
azure: {
/* ... */
},
aws: {
/* ... */
},
ui: {
showDefaultButton: false,
customButtonSelector: "#voice-nav-button",
},
});
return () => controller.destroy();
}, []);
return null;
}
// app/layout.tsx
import VoiceButton from "./components/voice-button";
import VoiceNavigationProvider from "./providers/voice-navigation";
export default function RootLayout({ children }) {
return (
<html>
<body>
<VoiceButton />
<VoiceNavigationProvider />
{children}
</body>
</html>
);
}
Vanilla JavaScript Example:
<!DOCTYPE html>
<html>
<head>
<style>
.voice-btn {
position: fixed;
bottom: 20px;
right: 20px;
width: 60px;
height: 60px;
border-radius: 50%;
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
color: white;
border: none;
cursor: pointer;
font-size: 24px;
box-shadow: 0 4px 12px rgba(0, 0, 0, 0.3);
transition: transform 0.2s;
}
.voice-btn:hover {
transform: scale(1.1);
}
.voice-btn:active {
transform: scale(0.95);
}
</style>
</head>
<body>
<button id="voice-control" class="voice-btn">🎤</button>
<script type="module">
import { initNavigationOnMicrophone } from "@myscheme/voice-navigation-sdk";
const controller = initNavigationOnMicrophone({
azure: {
subscriptionKey: "your-key",
region: "your-region",
},
aws: {
accessKeyId: "your-key",
secretAccessKey: "your-secret",
modelId: "anthropic.claude-3-sonnet-20240229-v1:0",
region: "ap-south-1",
},
ui: {
showDefaultButton: false,
customButtonSelector: "#voice-control",
},
});
</script>
</body>
</html>
🌐 Dynamic Page Navigation
Instead of hardcoding page navigation, configure pages dynamically using XML or direct configuration.
Method 1: XML File (Recommended)
Create XML file (public/navigation-pages.xml):
<?xml version="1.0" encoding="UTF-8"?>
<navigation>
<pages>
<page
id="home"
name="Home"
path="/"
keywords="main,homepage,start"
description="Main homepage"
/>
<page
id="about"
name="About"
path="/about"
keywords="information,company"
description="About us page"
/>
<page
id="contact"
name="Contact"
path="/contact"
keywords="reach,support,help"
description="Contact information"
/>
</pages>
</navigation>
Configure SDK:
const controller = initNavigationOnMicrophone({
// ... azure & aws config ...
pages: {
xml: "/navigation-pages.xml",
xmlType: "url",
},
});
Method 2: Remote XML URL
pages: {
xml: "https://yoursite.com/api/navigation-pages.xml",
xmlType: "url",
}
Method 3: Inline XML String
const xmlConfig = `<?xml version="1.0" encoding="UTF-8"?>
<navigation>
<pages>
<page id="home" name="Home" path="/" />
<page id="about" name="About" path="/about" />
</pages>
</navigation>`;
pages: {
xml: xmlConfig,
xmlType: "string",
}
Method 4: Direct Configuration
pages: {
pages: [
{
id: "home",
name: "Home",
path: "/",
keywords: ["main", "start"]
},
{
id: "about",
name: "About",
path: "/about",
keywords: ["information", "company"]
},
],
}
XML Schema Reference
Required attributes:
- id - Unique identifier (creates a navigate_<id> action)
- name - Display name for the page
- path - URL path to navigate to
Optional attributes:
- keywords - Comma-separated keywords for voice matching
- description - Page description
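Taken together, a page entry has the shape below. The field names come from the examples in this README; the SDK's actual exported type is not documented, so the `PageEntry` name here is hypothetical. Note that `keywords` is a comma-separated string in XML but an array of strings in direct configuration:

```typescript
// Hypothetical type for one navigation page, mirroring the attributes above.
interface PageEntry {
  id: string;          // unique identifier; generates the navigate_<id> action
  name: string;        // display name for the page
  path: string;        // URL path to navigate to
  keywords?: string[]; // optional keywords for voice matching
  description?: string;
}

// Example entry matching the XML "contact" page shown earlier:
const contact: PageEntry = {
  id: "contact",
  name: "Contact",
  path: "/contact",
  keywords: ["reach", "support", "help"],
};
```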
Voice Command Examples
With the configuration above, users can say:
- "Go to home" → navigates to /
- "Open about page" → navigates to /about
- "Show me contact" → navigates to /contact
- "Take me to the team page" → navigates to /team (assuming a team page is configured)
🎬 Supported Actions
Dynamic Page Navigation
Configure your own pages (see above). Each page automatically gets a navigate_<id> action.
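The generated action name follows a fixed rule: the "about" page from the XML above yields a navigate_about action. A one-line illustrative helper (not an SDK export) captures it:

```typescript
// The SDK registers one action per configured page: `navigate_<id>`.
// Illustrative helper only; the SDK performs this mapping internally.
function actionForPage(pageId: string): string {
  return `navigate_${pageId}`;
}

// actionForPage("about") yields the action name for the /about page.
```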
Core Navigation Actions
Scrolling:
- scroll_up / scroll_down
- scroll_left / scroll_right
- scroll_top / scroll_bottom
- page_up / page_down
Zoom:
- zoom_in / zoom_out
Browser:
- go_back / go_forward
- reload_page
- print_page
- copy_url
UI Controls:
- open_menu / close_menu
- focus_search
- toggle_fullscreen / exit_fullscreen
Media:
- play_media / pause_media
- mute_media / unmute_media
Other:
- search_content - Vector search (requires OpenSearch)
- stop - Stop voice control
📚 API Reference
VoiceNavigationController
Methods
start(): Promise<void>
Start voice control and begin listening.
await controller.start();
stop(): Promise<void>
Stop voice control and process pending speech.
await controller.stop();
setLanguage(language: string): void
Change the speech recognition language.
controller.setLanguage("hi-IN"); // Switch to Hindi
setAutoStart(enabled: boolean): void
Enable or disable auto-start on future page loads.
controller.setAutoStart(true);
destroy(): void
Clean up and remove the controller.
controller.destroy();
Events
Listen for SDK events:
// State changes
window.addEventListener("navigate:state-change", (event) => {
console.log("State:", event.detail.state);
});
// Action detection
window.addEventListener("navigate:action-detected", (event) => {
console.log("Action:", event.detail.action);
});
// Action performance
window.addEventListener("navigate:action-performed", (event) => {
console.log("Performed:", event.detail.performed);
});
// Errors
window.addEventListener("navigate:error", (event) => {
console.error("Error:", event.detail.error);
});
🔧 Framework Integration Examples
React
import { useEffect, useRef } from "react";
import { VoiceNavigationController } from "@myscheme/voice-navigation-sdk";
export function VoiceNavigation() {
const controllerRef = useRef<VoiceNavigationController | null>(null);
useEffect(() => {
controllerRef.current = new VoiceNavigationController({
azure: {
subscriptionKey: import.meta.env.VITE_AZURE_SPEECH_KEY,
region: import.meta.env.VITE_AZURE_SPEECH_REGION,
},
aws: {
accessKeyId: import.meta.env.VITE_AWS_ACCESS_KEY_ID,
secretAccessKey: import.meta.env.VITE_AWS_SECRET_ACCESS_KEY,
modelId: "anthropic.claude-3-sonnet-20240229-v1:0",
region: "ap-south-1",
},
pages: {
xml: "/navigation-pages.xml",
xmlType: "url",
},
language: "en-US",
autoStart: false,
});
return () => {
controllerRef.current?.destroy();
};
}, []);
return null;
}
Next.js (App Router)
// app/providers/voice-navigation-provider.tsx
"use client";
import { useEffect } from "react";
import { VoiceNavigationController } from "@myscheme/voice-navigation-sdk";
export function VoiceNavigationProvider() {
useEffect(() => {
const controller = new VoiceNavigationController({
azure: {
subscriptionKey: process.env.NEXT_PUBLIC_AZURE_SPEECH_KEY!,
region: process.env.NEXT_PUBLIC_AZURE_SPEECH_REGION!,
},
aws: {
accessKeyId: process.env.NEXT_PUBLIC_AWS_ACCESS_KEY_ID!,
secretAccessKey: process.env.NEXT_PUBLIC_AWS_SECRET_ACCESS_KEY!,
modelId: "anthropic.claude-3-sonnet-20240229-v1:0",
region: "ap-south-1",
},
pages: {
xml: "/navigation-pages.xml",
xmlType: "url",
},
});
return () => controller.destroy();
}, []);
return null;
}
// app/layout.tsx
import { VoiceNavigationProvider } from "./providers/voice-navigation-provider";
export default function RootLayout({ children }) {
return (
<html>
<body>
<VoiceNavigationProvider />
{children}
</body>
</html>
);
}
Next.js (Pages Router)
// pages/_app.js
import { useEffect, useRef } from "react";
import { initNavigationOnMicrophone } from "@myscheme/voice-navigation-sdk";
export default function App({ Component, pageProps }) {
const voiceControllerRef = useRef(null);
useEffect(() => {
if (typeof window === "undefined") return;
try {
voiceControllerRef.current = initNavigationOnMicrophone({
azure: {
subscriptionKey: process.env.NEXT_PUBLIC_AZURE_SPEECH_KEY,
region: process.env.NEXT_PUBLIC_AZURE_SPEECH_REGION,
},
aws: {
accessKeyId: process.env.NEXT_PUBLIC_AWS_ACCESS_KEY_ID,
secretAccessKey: process.env.NEXT_PUBLIC_AWS_SECRET_ACCESS_KEY,
modelId: "anthropic.claude-3-sonnet-20240229-v1:0",
region: "ap-south-1",
},
pages: {
xml: "/navigation-pages.xml",
xmlType: "url",
},
language: "en-IN",
autoStart: false,
});
} catch (error) {
console.error("Failed to initialize voice navigation:", error);
}
return () => {
voiceControllerRef.current?.destroy?.();
voiceControllerRef.current = null;
};
}, []);
return <Component {...pageProps} />;
}
🚀 Advanced Usage
Vector Search with OpenSearch
1. Configure SDK with OpenSearch:
initNavigationOnMicrophone({
azure: {
/* ... */
},
aws: {
/* ... */
},
opensearch: {
node: process.env.NEXT_PUBLIC_OPENSEARCH_NODE!,
username: process.env.NEXT_PUBLIC_OPENSEARCH_USERNAME!,
password: process.env.NEXT_PUBLIC_OPENSEARCH_PASSWORD!,
index: "my-embeddings-index",
vectorField: "embedding",
size: 5,
apiPath: "/api/voice-navigation/vector-search",
},
});
2. Create proxy endpoint (pages/api/voice-navigation/vector-search.ts):
import type { NextApiRequest, NextApiResponse } from "next";
import { createOpenSearchProxyHandler } from "@myscheme/voice-navigation-sdk/server";
const handler = createOpenSearchProxyHandler({
allowedOrigins: process.env.OPENSEARCH_ALLOWED_ORIGINS?.split(","),
});
export default async function vectorSearchProxy(
req: NextApiRequest,
res: NextApiResponse,
) {
await handler(req, res);
}
export const config = {
api: {
bodyParser: false,
},
};
Custom Service Initialization
Use individual services separately:
import {
AzureSpeechService,
BedrockService,
} from "@myscheme/voice-navigation-sdk";
// Azure Speech Service
const azureService = new AzureSpeechService({
subscriptionKey: "your-key",
region: "your-region",
});
const tokenResponse = await azureService.fetchToken();
// Bedrock Service
const bedrockService = new BedrockService({
region: "ap-south-1",
accessKeyId: "your-key",
secretAccessKey: "your-secret",
modelId: "anthropic.claude-3-sonnet-20240229-v1:0",
});
const action = await bedrockService.extractAction("zoom in");
Programmatic Page Management
import { PageRegistry, setPageRegistry } from "@myscheme/voice-navigation-sdk";
// Create custom registry
const registry = new PageRegistry([
{ id: "home", name: "Home", path: "/" },
{ id: "about", name: "About", path: "/about" },
]);
// Add pages dynamically
registry.addPage({
id: "blog",
name: "Blog",
path: "/blog",
keywords: ["articles", "posts"],
});
// Set as global registry
setPageRegistry(registry);
Custom Action Handlers
import { performAgentAction } from "@myscheme/voice-navigation-sdk";
const result = performAgentAction("zoom_in", {
onStop: () => console.log("Stopped"),
});
console.log("Action performed:", result.performed);
console.log("New zoom:", result.info.newZoom);
🐛 Troubleshooting
Pages Not Loading
Symptoms:
- Voice commands navigate to wrong page
- navigate_search is triggered instead of page-specific actions
- Console shows "Unknown action"
Solutions:
1. Check the XML file exists:
   - File should be in the public/ folder
   - Verify the URL: http://localhost:3000/navigation-pages.xml
2. Validate the XML syntax:
   <?xml version="1.0" encoding="UTF-8"?>
   <navigation>
     <pages>
       <page id="unique_id" name="Display Name" path="/path" />
     </pages>
   </navigation>
3. Check the browser console:
   - Look for [VoiceNavigation] messages
   - You should see: "✓ Page registry initialized with X pages"
4. Debug in the console:
   console.log("Registry size:", window.__navigatePageRegistry?.size);
   console.log("Pages:", window.__navigatePageRegistry?.getAllPages());
5. Clear cache and hard refresh:
   - Clear the Next.js cache: rm -rf .next
   - Hard refresh the browser: Ctrl+Shift+R / Cmd+Shift+R
Voice Not Recognized
Solutions:
- Check microphone permissions
- Reduce background noise
- Speak clearly at normal pace
- Ensure microphone is working in other apps
Wrong Page Navigation
Solutions:
- Add more specific keywords to pages
- Make page names more distinct
- Remove overlapping keywords
- Use exact page names in commands
SDK Initialization Fails
Solutions:
- Check all required config values are provided
- Verify AWS credentials are valid
- Confirm Azure subscription key is active
- Check browser console for error messages
Testing Checklist
- [ ] XML file accessible in browser
- [ ] Pages config in SDK initialization
- [ ] Browser console shows successful load
- [ ] window.__navigatePageRegistry.size > 0
- [ ] Hard refresh after changes
💡 Best Practices
Security
Never expose credentials in client code:
// ❌ BAD
aws: {
  accessKeyId: "AKIAXXXXXXXX",
  secretAccessKey: "xxxxxxxxxxxxx",
}
// ✅ GOOD
aws: {
  accessKeyId: process.env.NEXT_PUBLIC_AWS_ACCESS_KEY_ID!,
  secretAccessKey: process.env.NEXT_PUBLIC_AWS_SECRET_ACCESS_KEY!,
}
Note that NEXT_PUBLIC_ variables are still bundled into client-side code; for production, prefer one of the approaches below.
Use temporary credentials:
- AWS Cognito for user-specific tokens
- STS for temporary access keys
- API Gateway with authorization
Implement backend proxy:
- Proxy Bedrock requests through your server
- Never expose AWS keys to the browser
- Use environment-specific configurations
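For the Azure side, one common proxy pattern is a token endpoint: the server exchanges the subscription key for a short-lived token at Azure's documented issueToken STS endpoint, and only the token reaches the browser. The sketch below is framework-agnostic; the handler name and wiring are assumptions, while the STS URL and Ocp-Apim-Subscription-Key header are Azure's documented interface:

```typescript
// Server-side sketch: exchange the Azure Speech subscription key for a
// short-lived token so the key itself never ships to the browser.
// The URL below is Azure's documented STS endpoint for Speech tokens.
function speechTokenUrl(region: string): string {
  return `https://${region}.api.cognitive.microsoft.com/sts/v1.0/issueToken`;
}

// Hypothetical handler: call this from your own API route.
async function issueSpeechToken(region: string, subscriptionKey: string): Promise<string> {
  const res = await fetch(speechTokenUrl(region), {
    method: "POST",
    headers: { "Ocp-Apim-Subscription-Key": subscriptionKey },
  });
  if (!res.ok) throw new Error(`Token request failed: ${res.status}`);
  return res.text(); // a short-lived JWT (valid for roughly 10 minutes)
}
```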
Performance
- Use XML files over direct config for better caching
- Add specific keywords to reduce AI processing time
- Limit the number of pages to essential navigation
- Enable auto-start carefully (consider user experience)
User Experience
- Clear page names: Use descriptive, unique names
- Relevant keywords: Include synonyms and common phrases
- Test voice commands: Try different phrasings
- Provide UI feedback: Listen for SDK events
- Handle errors gracefully: Show helpful error messages
Page Configuration
- Use descriptive IDs: privacy_policy, not pp
- Add multiple keywords: Cover variations and synonyms
- Keep paths accurate: Match your actual routes
- Start small: Begin with core pages, expand gradually
- Test variations: Try different voice commands
🌐 Browser Support
Requirements:
- Modern browser with ES2020+ support
- Microphone access
- MediaDevices API
- fetch API
- DOMParser (for XML)
Tested on:
- Chrome 90+
- Firefox 88+
- Safari 14+
- Edge 90+
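The requirements above can be checked defensively before initializing. The SDK does not export such a check; this sketch makes the environment injectable so the logic itself is testable outside a browser:

```typescript
// Check for the browser features listed above before initializing the SDK.
// `env` is injectable so the check can run outside a browser.
function missingFeatures(env: {
  mediaDevices?: unknown;
  fetch?: unknown;
  DOMParser?: unknown;
}): string[] {
  const missing: string[] = [];
  if (!env.mediaDevices) missing.push("MediaDevices API");
  if (!env.fetch) missing.push("fetch API");
  if (!env.DOMParser) missing.push("DOMParser");
  return missing;
}

// In the browser:
// const missing = missingFeatures({
//   mediaDevices: navigator.mediaDevices,
//   fetch: window.fetch,
//   DOMParser: window.DOMParser,
// });
// if (missing.length > 0) console.warn("Voice navigation unavailable:", missing);
```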
🔒 Security Considerations
⚠️ Important Security Notes:
Credential Protection:
- Never hardcode AWS/Azure credentials
- Use environment variables
- Rotate keys regularly
- Use least-privilege IAM policies
Production Setup:
- Implement AWS Cognito for temporary credentials
- Use API Gateway for request authorization
- Proxy sensitive operations through backend
- Enable CloudWatch logging for monitoring
Best Practices:
- Validate all user inputs
- Sanitize XML content
- Implement rate limiting
- Monitor API usage
- Use HTTPS only
📄 Environment Variables
Create .env.local (Next.js) or .env file:
# AWS Bedrock
NEXT_PUBLIC_AWS_REGION=ap-south-1
NEXT_PUBLIC_AWS_ACCESS_KEY_ID=your_access_key_here
NEXT_PUBLIC_AWS_SECRET_ACCESS_KEY=your_secret_key_here
# Azure Speech
NEXT_PUBLIC_AZURE_SPEECH_KEY=your_subscription_key_here
NEXT_PUBLIC_AZURE_SPEECH_REGION=centralindia
# OpenSearch (Optional)
NEXT_PUBLIC_OPENSEARCH_NODE=https://your-cluster.example.com
NEXT_PUBLIC_OPENSEARCH_USERNAME=your_username
NEXT_PUBLIC_OPENSEARCH_PASSWORD=your_password
NEXT_PUBLIC_OPENSEARCH_INDEX=your_index
📝 License
MIT
🤝 Contributing
Contributions are welcome! This library is currently in beta phase.
📧 Support
For issues and questions, please open an issue on the repository.
🎯 Version
Current version: 0.1.5 (Beta)
This library is under active development. APIs may change between releases.
