@iloveagents/azure-voice-live-react
A comprehensive, production-ready React library for Azure Voice Live API with complete feature coverage and TypeScript support.
Overview
Azure Voice Live enables real-time voice conversations with AI models through native audio streaming. This library provides a complete React implementation with full API coverage and a fluent configuration API.
Key Features:
- Complete API Coverage - All Azure Voice Live parameters supported and typed
- TypeScript First - Comprehensive type definitions with full IntelliSense support
- Production Ready - Enterprise-grade code with proper error handling and validation
- Fluent API - 25+ composable helper functions for streamlined configuration
- React Hooks - Modern hooks-based architecture with integrated microphone capture
- Zero Config Audio - Microphone auto-starts when session is ready (no manual coordination)
- Avatar Support - Real-time avatar video with GPU-accelerated chroma key compositing
- Audio Enhancements - Built-in echo cancellation, noise suppression, and semantic VAD
- Function Calling - Complete tool support with async executor pattern
- Zero Dependencies - No external runtime dependencies (React only)
Installation
npm install @iloveagents/azure-voice-live-react
Or using other package managers:
yarn add @iloveagents/azure-voice-live-react
pnpm add @iloveagents/azure-voice-live-react
Security
Important: Never commit API keys to version control.
For Development:
- Use environment variables (.env files)
- Add .env to .gitignore
- Example: apiKey: process.env.VITE_AZURE_SPEECH_KEY
For Production:
- Recommended: Use a backend proxy with Microsoft Entra ID (MSAL) authentication (see the sketch below)
- Use managed identities for Azure-hosted applications
- Never expose API keys in client-side code
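A minimal sketch of the backend-credential pattern is shown below. The /api/voice-live-credentials endpoint, its response shape, and the component split are hypothetical placeholders; the Secure Proxy playground examples (listed under Examples & Playground) show complete MSAL-based implementations.
import { useEffect, useState } from 'react';
import { useVoiceLive } from '@iloveagents/azure-voice-live-react';

// Fetch short-lived credentials from your own backend instead of bundling a key.
function SecureVoiceApp() {
  const [apiKey, setApiKey] = useState<string | null>(null);

  useEffect(() => {
    fetch('/api/voice-live-credentials') // hypothetical endpoint; authenticate the user server-side
      .then((res) => res.json())
      .then((data) => setApiKey(data.apiKey))
      .catch(console.error);
  }, []);

  // Only mount the voice assistant once credentials are available.
  return apiKey ? <VoiceAssistant apiKey={apiKey} /> : <p>Loading…</p>;
}

function VoiceAssistant({ apiKey }: { apiKey: string }) {
  const { connect, connectionState } = useVoiceLive({
    connection: { resourceName: 'your-resource-name', apiKey },
  });

  return (
    <button onClick={connect} disabled={connectionState === 'connected'}>
      Connect
    </button>
  );
}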
Quick Start
Basic Implementation
import { useVoiceLive, VoiceLiveAvatar } from '@iloveagents/azure-voice-live-react';
function VoiceAssistant() {
// Microphone automatically starts when session is ready!
const { videoStream, connect, disconnect, connectionState } = useVoiceLive({
connection: {
resourceName: 'your-azure-resource-name',
apiKey: process.env.AZURE_VOICE_LIVE_KEY,
model: 'gpt-realtime', // GPT-4o Realtime model (recommended)
},
session: {
instructions: 'You are a helpful AI assistant.',
voice: 'en-US-Ava:DragonHDLatestNeural',
},
});
return (
<div>
<VoiceLiveAvatar videoStream={videoStream} />
<button onClick={connect} disabled={connectionState === 'connected'}>
Connect
</button>
<button onClick={disconnect} disabled={connectionState === 'disconnected'}>
Disconnect
</button>
<p>Status: {connectionState}</p>
</div>
);
}
That's it! The microphone automatically:
- Requests permissions when you call connect()
- Starts capturing when the session is ready
- Stops when you call disconnect()
No manual audio coordination needed!
Voice-Only with Visualization
For voice-only applications, the hook provides a pre-configured audioAnalyser for effortless visualization:
import { useRef, useEffect } from 'react';
import { useVoiceLive, createVoiceLiveConfig } from '@iloveagents/azure-voice-live-react';
function VoiceVisualizer() {
const canvasRef = useRef<HTMLCanvasElement>(null);
const audioRef = useRef<HTMLAudioElement>(null);
const config = createVoiceLiveConfig({
connection: {
resourceName: 'your-resource-name',
apiKey: process.env.AZURE_VOICE_LIVE_KEY,
},
});
// Get pre-configured audio analyser - no manual setup needed!
const { connect, disconnect, audioStream, audioAnalyser, connectionState } = useVoiceLive(config);
// Connect audio stream for playback
useEffect(() => {
if (audioRef.current && audioStream) {
audioRef.current.srcObject = audioStream;
audioRef.current.play().catch(console.error);
}
}, [audioStream]);
// Visualize audio using the pre-configured analyser
useEffect(() => {
if (!audioAnalyser || !canvasRef.current) return;
const canvas = canvasRef.current;
const ctx = canvas.getContext('2d');
if (!ctx) return;
const dataArray = new Uint8Array(audioAnalyser.frequencyBinCount);
const draw = () => {
requestAnimationFrame(draw);
audioAnalyser.getByteFrequencyData(dataArray);
// Your visualization logic here
// No need to create AudioContext or AnalyserNode manually!
};
draw();
}, [audioAnalyser]);
return (
<div>
<canvas ref={canvasRef} width={800} height={200} />
<audio ref={audioRef} autoPlay hidden />
<button onClick={connect}>Start</button>
<button onClick={disconnect}>Stop</button>
</div>
);
}
No audio complexity - the hook handles:
- AudioContext creation and configuration (48kHz, low-latency)
- Professional-grade Lanczos-3 resampling (24kHz → 48kHz)
- AnalyserNode setup for visualization
- Audio routing and stream management
- Proper cleanup on disconnect
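To fill in the "Your visualization logic here" placeholder above, the draw function could render simple frequency bars with the Canvas 2D API. The sketch below reuses the canvas, ctx, audioAnalyser, and dataArray variables from the example; colors and bar sizing are arbitrary choices.
const draw = () => {
  requestAnimationFrame(draw);
  audioAnalyser.getByteFrequencyData(dataArray);

  // Clear the canvas, then draw one bar per frequency bin.
  ctx.fillStyle = '#111';
  ctx.fillRect(0, 0, canvas.width, canvas.height);

  const barWidth = canvas.width / dataArray.length;
  ctx.fillStyle = '#4fc3f7';
  for (let i = 0; i < dataArray.length; i++) {
    const barHeight = (dataArray[i] / 255) * canvas.height;
    ctx.fillRect(i * barWidth, canvas.height - barHeight, barWidth - 1, barHeight);
  }
};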
Configuration API
Simple Configuration Builder
Use createVoiceLiveConfig to build your configuration:
import { useVoiceLive, createVoiceLiveConfig } from '@iloveagents/azure-voice-live-react';
const config = createVoiceLiveConfig({
connection: {
resourceName: 'your-resource-name',
apiKey: process.env.AZURE_VOICE_LIVE_KEY,
},
session: {
instructions: 'You are a helpful assistant.',
voice: 'en-US-Ava:DragonHDLatestNeural',
},
});
const { videoStream, connect } = useVoiceLive(config);
Fluent Helper Functions
Build custom configurations using composable helper functions:
import {
useVoiceLive,
withHDVoice,
withSemanticVAD,
withEchoCancellation,
withDeepNoiseReduction,
compose
} from '@iloveagents/azure-voice-live-react';
// Compose multiple configuration helpers
const enhanceAudio = compose(
withEchoCancellation,
withDeepNoiseReduction,
(config) => withSemanticVAD({
threshold: 0.5,
removeFillerWords: true,
interruptResponse: true,
}, config),
(config) => withHDVoice('en-US-Ava:DragonHDLatestNeural', {
temperature: 0.9,
rate: '1.1'
}, config)
);
const { videoStream, connect } = useVoiceLive({
connection: {
resourceName: 'your-resource-name',
apiKey: process.env.AZURE_VOICE_LIVE_KEY,
},
session: enhanceAudio({
instructions: 'You are a helpful assistant.',
}),
});
Available Helper Functions
Voice Configuration:
- withVoice(voice, config) - Configure voice (string or VoiceConfig)
- withHDVoice(name, options, config) - Configure HD voice with temperature/rate control
- withCustomVoice(name, config) - Configure custom trained voice
Avatar Configuration:
- withAvatar(character, style, options, config) - Configure avatar character and style
- withTransparentBackground(config, options?) - Enable transparent background with chroma key (default green, customizable)
- withBackgroundImage(url, config) - Add custom background image
- withAvatarCrop(crop, config) - Configure video cropping for portrait mode
Turn Detection:
- withSemanticVAD(options, config) - Azure Semantic VAD (recommended)
- withMultilingualVAD(languages, options, config) - Multi-language semantic VAD
- withEndOfUtterance(options, config) - Advanced end-of-utterance detection
- withoutTurnDetection(config) - Disable automatic turn detection (manual mode)
Audio Enhancements:
- withEchoCancellation(config) - Enable server-side echo cancellation
- withoutEchoCancellation(config) - Disable echo cancellation
- withDeepNoiseReduction(config) - Azure deep noise suppression
- withNearFieldNoiseReduction(config) - Near-field noise reduction
- withoutNoiseReduction(config) - Disable noise reduction
- withSampleRate(rate, config) - Set sample rate (16000 or 24000 Hz)
Output Features:
- withViseme(config) - Enable viseme data for lip-sync animation
- withWordTimestamps(config) - Enable word-level audio timestamps
- withTranscription(options, config) - Enable input audio transcription
- withoutTranscription(config) - Disable transcription
Function Calling:
- withTools(tools, config) - Add function tools
- withToolChoice(choice, config) - Set tool choice behavior ('auto', 'none', 'required')
Composition:
- compose(...fns) - Compose multiple configuration functions
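Helpers that take only a config argument (for example withViseme and withWordTimestamps) can be passed to compose directly, just like withEchoCancellation in the earlier example. A small illustrative composition:
import { compose, withViseme, withWordTimestamps } from '@iloveagents/azure-voice-live-react';

// Enable viseme output and word-level timestamps with one reusable helper.
const withOutputFeatures = compose(withViseme, withWordTimestamps);

// Then, inside a component:
//   session: withOutputFeatures({ instructions: 'You are a helpful assistant.' })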
API Reference
useVoiceLive(config) Hook
Main hook for Azure Voice Live API integration with integrated microphone capture.
Parameters:
interface UseVoiceLiveConfig {
// Connection configuration
connection: {
resourceName: string; // Azure AI Foundry resource name
apiKey: string; // Azure API key
model?: string; // Model name (default: 'gpt-realtime')
apiVersion?: string; // API version (default: '2025-10-01')
};
// Session configuration (optional)
session?: VoiceLiveSessionConfig;
// Auto-connect on mount (default: false)
autoConnect?: boolean;
// Microphone configuration
autoStartMic?: boolean; // Auto-start mic when ready (default: true)
audioSampleRate?: number; // Sample rate (default: 24000)
audioConstraints?: MediaTrackConstraints | boolean; // Microphone selection
// Event handler for all Voice Live events
onEvent?: (event: VoiceLiveEvent) => void;
// Tool executor for function calling
toolExecutor?: (toolCall: ToolCall) => Promise<any>;
}
Returns:
interface UseVoiceLiveReturn {
// Connection state
connectionState: 'disconnected' | 'connecting' | 'connected';
// Media streams
videoStream: MediaStream | null; // Avatar video stream (WebRTC)
audioStream: MediaStream | null; // Audio stream for playback
// Audio visualization (voice-only mode)
audioContext: AudioContext | null; // Web Audio API context
audioAnalyser: AnalyserNode | null; // Pre-configured analyser for visualization
// Microphone state and control
isMicActive: boolean; // Whether microphone is capturing
startMic: () => Promise<void>; // Manually start microphone
stopMic: () => void; // Manually stop microphone
// Connection methods
connect: () => Promise<void>; // Establish connection
disconnect: () => void; // Close connection
// Communication methods
sendEvent: (event: any) => void; // Send custom event to API
updateSession: (config: Partial<VoiceLiveSessionConfig>) => void; // Update session
// Advanced features
isReady: boolean; // Whether session is ready for interaction
error: string | null; // Error message if any
}
Microphone Control:
By default, the microphone automatically starts when the session is ready (autoStartMic: true). You can:
// Use default auto-start behavior (recommended)
const { connect, disconnect } = useVoiceLive(config);
// Manual control
const { connect, startMic, stopMic, isMicActive } = useVoiceLive({
...config,
autoStartMic: false, // Disable auto-start
});
// Select specific microphone device
const { connect } = useVoiceLive({
...config,
audioConstraints: { deviceId: 'specific-device-id' },
});
VoiceLiveAvatar Component
Component for rendering avatar video with optional chroma key compositing.
Props:
interface VoiceLiveAvatarProps {
videoStream: MediaStream | null;
// Chroma key settings
enableChromaKey?: boolean; // Enable green screen removal
chromaKeyColor?: string; // Key color (default: '#00FF00')
chromaKeySimilarity?: number; // Color similarity (0-1, default: 0.4)
chromaKeySmoothness?: number; // Edge smoothness (0-1, default: 0.1)
// Styling
className?: string;
style?: React.CSSProperties;
// Callbacks
onVideoReady?: () => void; // Called when video is ready
}
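A minimal usage sketch combining these props (videoStream comes from useVoiceLive as in the Quick Start; the other values are illustrative, and the defaults above apply to anything omitted):
<VoiceLiveAvatar
  videoStream={videoStream}
  enableChromaKey
  chromaKeyColor="#00FF00"   // key out the green avatar background
  chromaKeySimilarity={0.4}
  chromaKeySmoothness={0.1}
  style={{ width: 480, height: 480 }}
  onVideoReady={() => console.log('Avatar video ready')}
/>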
useAudioCapture() Hook
Note: Microphone capture is now integrated into useVoiceLive. You typically don't need this hook unless you're building custom audio processing pipelines.
Hook for standalone microphone audio capture with AudioWorklet processing.
Parameters:
interface AudioCaptureConfig {
sampleRate?: number; // Sample rate (default: 24000)
workletPath?: string; // Custom AudioWorklet path (optional)
audioConstraints?: MediaTrackConstraints; // getUserMedia constraints
onAudioData?: (data: ArrayBuffer) => void; // Audio data callback
autoStart?: boolean; // Auto-start capture (default: false)
}
Returns:
interface AudioCaptureReturn {
isCapturing: boolean; // Capture state
startCapture: () => Promise<void>; // Start capturing
stopCapture: () => void; // Stop capturing
pauseCapture: () => void; // Pause capture
resumeCapture: () => void; // Resume capture
}
Advanced Example (Custom Processing):
import { useAudioCapture } from '@iloveagents/azure-voice-live-react';
// Only needed for custom audio processing outside of Voice Live
const { startCapture, stopCapture, isCapturing } = useAudioCapture({
sampleRate: 24000,
onAudioData: (audioData) => {
// Custom processing logic here
processAudioData(audioData);
},
});
Audio Helper Utilities
Convenience helpers for audio processing.
arrayBufferToBase64()
Low-level utility for safely converting an ArrayBuffer to a base64 string.
Usage:
import { arrayBufferToBase64 } from '@iloveagents/azure-voice-live-react';
const base64 = arrayBufferToBase64(audioData);
sendEvent({ type: 'input_audio_buffer.append', audio: base64 });
Note: Uses chunked conversion (32 KB chunks) to avoid a stack overflow from the spread operator.
When to use: This is only needed for advanced use cases where you're manually processing audio data. For standard usage, microphone capture is integrated into useVoiceLive and handles this automatically.
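For illustration, the chunked-conversion idea behind the note above looks roughly like the sketch below (this is not the library's exact source): converting the buffer in 32 KB slices keeps the argument list passed to String.fromCharCode small, which is what avoids the stack overflow.
function arrayBufferToBase64Sketch(buffer: ArrayBuffer): string {
  const bytes = new Uint8Array(buffer);
  const CHUNK_SIZE = 32 * 1024; // 32 KB per slice
  let binary = '';
  for (let i = 0; i < bytes.length; i += CHUNK_SIZE) {
    const chunk = bytes.subarray(i, i + CHUNK_SIZE);
    binary += String.fromCharCode(...chunk); // spread is safe on a small slice
  }
  return btoa(binary); // browser base64 encoder
}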
Session Configuration
The session parameter supports all Azure Voice Live API options:
interface VoiceLiveSessionConfig {
// System instructions
instructions?: string;
// Model parameters
temperature?: number; // Response creativity (0-1)
maxResponseOutputTokens?: number; // Maximum response length
// Voice configuration
voice?: string | VoiceConfig;
// Turn detection
turnDetection?: TurnDetectionConfig | null;
// Audio enhancements
inputAudioEchoCancellation?: EchoCancellationConfig | null;
inputAudioNoiseReduction?: NoiseReductionConfig | null;
inputAudioSamplingRate?: 16000 | 24000;
// Avatar configuration
avatar?: AvatarConfig;
// Function calling
tools?: ToolDefinition[];
toolChoice?: 'auto' | 'none' | 'required';
// Output configuration
animation?: AnimationConfig; // Viseme output
outputAudioTimestampTypes?: TimestampType[]; // Word timestamps
// Input transcription
inputAudioTranscription?: TranscriptionConfig | null;
// Additional parameters...
}
For complete type definitions, see the TypeScript types included with the package.
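For orientation, an illustrative session object using a few of these fields (the values are examples, not recommendations):
const session = {
  instructions: 'You are a concise support agent.',
  temperature: 0.7,                        // response creativity (0-1)
  maxResponseOutputTokens: 1024,           // cap response length
  voice: 'en-US-Ava:DragonHDLatestNeural',
  inputAudioSamplingRate: 24000,           // 16000 or 24000
  toolChoice: 'auto',
};

// Pass it to the hook: useVoiceLive({ connection, session })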
Advanced Examples
Avatar with Transparent Background
import {
useVoiceLive,
VoiceLiveAvatar,
withAvatar,
withTransparentBackground,
compose
} from '@iloveagents/azure-voice-live-react';
const configureAvatar = compose(
(config) => withAvatar('lisa', 'casual-standing', {
resolution: { width: 1920, height: 1080 },
bitrate: 2000000,
}, config),
withTransparentBackground // No color needed - uses default green
);
function AvatarApp() {
const { videoStream, connect } = useVoiceLive({
connection: {
resourceName: process.env.AZURE_RESOURCE_NAME,
apiKey: process.env.AZURE_API_KEY,
},
session: configureAvatar({
instructions: 'You are a helpful assistant.',
}),
});
return <VoiceLiveAvatar videoStream={videoStream} enableChromaKey />;
}
Function Calling
import { useVoiceLive, withTools } from '@iloveagents/azure-voice-live-react';
const weatherTool = {
type: 'function',
name: 'get_weather',
description: 'Get current weather for a location',
parameters: {
type: 'object',
properties: {
location: {
type: 'string',
description: 'City name or zip code'
}
},
required: ['location']
}
};
function WeatherAssistant() {
const executeTool = async (toolCall: ToolCall) => {
if (toolCall.name === 'get_weather') {
const { location } = toolCall.arguments;
// Fetch weather data
const weather = await fetchWeather(location);
return weather;
}
};
const { videoStream, connect } = useVoiceLive({
connection: {
resourceName: process.env.AZURE_RESOURCE_NAME,
apiKey: process.env.AZURE_API_KEY,
},
session: withTools([weatherTool], {
instructions: 'You are a weather assistant with access to real-time weather data.',
}),
toolExecutor: executeTool,
});
return <VoiceLiveAvatar videoStream={videoStream} />;
}
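The fetchWeather helper in the example above is not provided by the library. A placeholder implementation against a hypothetical backend route could look like this:
async function fetchWeather(location: string) {
  const res = await fetch(`/api/weather?location=${encodeURIComponent(location)}`); // hypothetical route
  if (!res.ok) throw new Error(`Weather lookup failed: ${res.status}`);
  return res.json(); // the executor's return value is used as the tool result
}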
Event Handling
import { useVoiceLive } from '@iloveagents/azure-voice-live-react';
function EventMonitor() {
const handleEvent = (event: VoiceLiveEvent) => {
switch (event.type) {
case 'session.created':
console.log('Session established');
break;
case 'conversation.item.input_audio_transcription.completed':
console.log('User said:', event.transcript);
break;
case 'response.audio_transcript.delta':
console.log('Assistant saying:', event.delta);
break;
case 'response.done':
console.log('Response complete');
break;
case 'error':
console.error('Error occurred:', event.error);
break;
}
};
const { videoStream, connect } = useVoiceLive({
connection: {
resourceName: process.env.AZURE_RESOURCE_NAME,
apiKey: process.env.AZURE_API_KEY,
},
session: {
instructions: 'You are a helpful assistant.',
},
onEvent: handleEvent,
});
return <VoiceLiveAvatar videoStream={videoStream} />;
}
Best Practices
1. Use Recommended Defaults
The library defaults to optimal settings:
- Model: gpt-realtime
- Turn Detection: azure_semantic_vad (most reliable)
- Sample Rate: 24000 Hz (best quality)
- Echo Cancellation: Enabled
- Noise Suppression: Enabled
- Auto-start Mic: Enabled (no manual coordination needed)
2. Enable Audio Enhancements
For production deployments, always enable audio enhancements:
const enhanceAudio = compose(
withEchoCancellation,
withDeepNoiseReduction,
(config) => withSemanticVAD({ threshold: 0.5 }, config)
);
3. Use Azure Semantic VAD
Azure Semantic VAD provides superior turn detection compared to simple volume-based detection:
withSemanticVAD({
threshold: 0.5, // Detection threshold
removeFillerWords: true, // Remove "um", "uh", etc.
interruptResponse: true, // Allow user interruptions
endOfUtteranceDetection: { // Advanced end-of-speech detection
model: 'semantic_detection_v1',
thresholdLevel: 'medium',
timeoutMs: 1000,
}
})
4. Handle Errors Properly
Implement robust error handling:
onEvent: (event) => {
if (event.type === 'error') {
console.error('Voice Live error:', event.error);
// Implement retry logic or user notification
}
}
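One way to implement the retry logic mentioned in the comment above is a small wrapper hook that attempts a single reconnect after an error. The wrapper name, the 2-second delay, and the single-retry policy are illustrative choices, not part of the library:
import { useEffect, useRef } from 'react';
import { useVoiceLive } from '@iloveagents/azure-voice-live-react';

function useVoiceLiveWithRetry(config: Parameters<typeof useVoiceLive>[0]) {
  const voiceLive = useVoiceLive(config);
  const { error, connectionState, connect } = voiceLive;
  const retriedRef = useRef(false);

  useEffect(() => {
    if (error && connectionState === 'disconnected' && !retriedRef.current) {
      retriedRef.current = true; // retry once, then surface the error to the user
      const timer = setTimeout(() => connect().catch(console.error), 2000);
      return () => clearTimeout(timer);
    }
  }, [error, connectionState, connect]);

  return voiceLive;
}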
5. Secure API Keys
Never expose API keys in client-side code:
// ❌ Bad - API key in code
const apiKey = 'your-api-key-here';
// ✅ Good - Use environment variables
const apiKey = process.env.AZURE_VOICE_LIVE_KEY;
// ✅ Better - Fetch from secure backend
const apiKey = await fetchApiKeyFromBackend();
Requirements
Peer Dependencies
- React: ≥16.8.0 (Hooks support required)
- React DOM: ≥16.8.0
Browser Requirements
- Modern browser with WebRTC support
- WebAudio API support
- AudioWorklet support (for microphone capture)
Azure Requirements
- Azure AI Foundry resource with Voice Live API enabled
- Deployed voice-enabled model (gpt-realtime or gpt-realtime-mini)
- Valid API key with appropriate permissions
Azure Setup
Create Azure AI Foundry Resource
- Navigate to Azure Portal
- Create new AI Foundry resource
- Note your resource name
Enable Voice Live API
- In your AI Foundry resource, enable Voice Live API
- Deploy a voice-enabled model (gpt-realtime recommended)
Get API Key
- Navigate to Keys and Endpoint
- Copy your API key
- Store securely (use environment variables)
For detailed setup instructions, see Azure Voice Live Documentation.
Examples & Playground
The library includes a comprehensive playground with working examples for all features:
Voice Examples
- Voice Chat - Simple - Basic voice-only implementation
- Voice Chat - Advanced - Full configuration with semantic VAD
- Voice Chat - Secure Proxy - Backend proxy with API key
- Voice Chat - Secure Proxy (MSAL) - User-level authentication
Avatar Examples
- Avatar - Simple - Basic avatar with video
- Avatar - Advanced - Chroma key + noise suppression
- Avatar - Secure Proxy - Backend proxy with API key
- Avatar - Secure Proxy (MSAL) - User-level authentication
Advanced Features
- Function Calling - Tool/function integration
- Audio Visualizer - Real-time audio visualization
- Viseme Animation - Custom avatar lip-sync
- Agent Service - Azure AI Foundry Agent integration
Running the Playground
# Install and start
npm install
npm run dev
Open http://localhost:3001 to explore all examples.
Contributing
Contributions are welcome! Please open an issue or pull request on GitHub.
License
MIT © iloveagents
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: API Reference
Acknowledgments
Built for the Azure Voice Live API. For service details, see Microsoft's official Azure Voice Live documentation.
Built by iloveagents
