@iloveagents/azure-voice-live-react
A comprehensive, production-ready React library for Azure Voice Live API with complete feature coverage and TypeScript support.
Overview
Azure Voice Live enables real-time voice conversations with AI models through native audio streaming. This library provides a complete React implementation with full API coverage and a fluent configuration API.
Key Features:
- Complete API Coverage - All Azure Voice Live parameters supported and typed
- TypeScript First - Comprehensive type definitions with full IntelliSense support
- Production Ready - Enterprise-grade code with proper error handling and validation
- Fluent API - 25+ composable helper functions for streamlined configuration
- React Hooks - Modern hooks-based architecture with integrated microphone capture
- Zero Config Audio - Microphone auto-starts when session is ready (no manual coordination)
- Avatar Support - Real-time avatar video with GPU-accelerated chroma key compositing
- Audio Enhancements - Built-in echo cancellation, noise suppression, and semantic VAD
- Function Calling - Complete tool support with async executor pattern
- Zero Dependencies - No external runtime dependencies (React only)
Installation
npm install @iloveagents/azure-voice-live-react
Or using other package managers:
yarn add @iloveagents/azure-voice-live-react
pnpm add @iloveagents/azure-voice-live-react
Security
Important: Never commit API keys to version control.
For Development:
- Use environment variables (.env files)
- Add .env to .gitignore
- Example: apiKey: process.env.VITE_AZURE_SPEECH_KEY
For Production:
- Recommended: Use a backend proxy with Microsoft Entra ID (MSAL) authentication (see the sketch below)
- Use managed identities for Azure-hosted applications
- Never expose API keys in client-side code
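A minimal sketch of the backend-credential pattern is shown below. The /api/voice-live-credentials endpoint, its response shape, and the component split are hypothetical placeholders; the Secure Proxy playground examples (listed under Examples & Playground) show complete MSAL-based implementations.
import { useEffect, useState } from 'react';
import { useVoiceLive } from '@iloveagents/azure-voice-live-react';

// Fetch short-lived credentials from your own backend instead of bundling a key.
function SecureVoiceApp() {
  const [apiKey, setApiKey] = useState<string | null>(null);

  useEffect(() => {
    fetch('/api/voice-live-credentials') // hypothetical endpoint; authenticate the user server-side
      .then((res) => res.json())
      .then((data) => setApiKey(data.apiKey))
      .catch(console.error);
  }, []);

  // Only mount the voice assistant once credentials are available.
  return apiKey ? <VoiceAssistant apiKey={apiKey} /> : <p>Loading…</p>;
}

function VoiceAssistant({ apiKey }: { apiKey: string }) {
  const { connect, connectionState } = useVoiceLive({
    connection: { resourceName: 'your-resource-name', apiKey },
  });

  return (
    <button onClick={connect} disabled={connectionState === 'connected'}>
      Connect
    </button>
  );
}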
Quick Start
Basic Implementation
import { useVoiceLive, VoiceLiveAvatar } from '@iloveagents/azure-voice-live-react';
function VoiceAssistant() {
// Microphone automatically starts when session is ready!
const { videoStream, connect, disconnect, connectionState } = useVoiceLive({
connection: {
resourceName: 'your-azure-resource-name',
apiKey: process.env.AZURE_VOICE_LIVE_KEY,
model: 'gpt-realtime', // GPT-4o Realtime model (recommended)
},
session: {
instructions: 'You are a helpful AI assistant.',
voice: 'en-US-Ava:DragonHDLatestNeural',
},
});
return (
<div>
<VoiceLiveAvatar videoStream={videoStream} />
<button onClick={connect} disabled={connectionState === 'connected'}>
Connect
</button>
<button onClick={disconnect} disabled={connectionState === 'disconnected'}>
Disconnect
</button>
<p>Status: {connectionState}</p>
</div>
);
}
That's it! The microphone automatically:
- Requests permissions when you call connect()
- Starts capturing when the session is ready
- Stops when you call disconnect()
No manual audio coordination needed!
Voice-Only with Visualization
For voice-only applications, the hook provides a pre-configured audioAnalyser for effortless visualization:
import { useRef, useEffect } from 'react';
import { useVoiceLive, createVoiceLiveConfig } from '@iloveagents/azure-voice-live-react';
function VoiceVisualizer() {
const canvasRef = useRef<HTMLCanvasElement>(null);
const audioRef = useRef<HTMLAudioElement>(null);
const config = createVoiceLiveConfig({
connection: {
resourceName: 'your-resource-name',
apiKey: process.env.AZURE_VOICE_LIVE_KEY,
},
});
// Get pre-configured audio analyser - no manual setup needed!
const { connect, disconnect, audioStream, audioAnalyser, connectionState } = useVoiceLive(config);
// Connect audio stream for playback
useEffect(() => {
if (audioRef.current && audioStream) {
audioRef.current.srcObject = audioStream;
audioRef.current.play().catch(console.error);
}
}, [audioStream]);
// Visualize audio using the pre-configured analyser
useEffect(() => {
if (!audioAnalyser || !canvasRef.current) return;
const canvas = canvasRef.current;
const ctx = canvas.getContext('2d');
if (!ctx) return;
const dataArray = new Uint8Array(audioAnalyser.frequencyBinCount);
const draw = () => {
requestAnimationFrame(draw);
audioAnalyser.getByteFrequencyData(dataArray);
// Your visualization logic here
// No need to create AudioContext or AnalyserNode manually!
};
draw();
}, [audioAnalyser]);
return (
<div>
<canvas ref={canvasRef} width={800} height={200} />
<audio ref={audioRef} autoPlay hidden />
<button onClick={connect}>Start</button>
<button onClick={disconnect}>Stop</button>
</div>
);
}
No audio complexity - the hook handles:
- AudioContext creation and configuration (48kHz, low-latency)
- Professional-grade Lanczos-3 resampling (24kHz → 48kHz)
- AnalyserNode setup for visualization
- Audio routing and stream management
- Proper cleanup on disconnect
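To fill in the "Your visualization logic here" placeholder above, the draw function could render simple frequency bars with the Canvas 2D API. The sketch below reuses the canvas, ctx, audioAnalyser, and dataArray variables from the example; colors and bar sizing are arbitrary choices.
const draw = () => {
  requestAnimationFrame(draw);
  audioAnalyser.getByteFrequencyData(dataArray);

  // Clear the canvas, then draw one bar per frequency bin.
  ctx.fillStyle = '#111';
  ctx.fillRect(0, 0, canvas.width, canvas.height);

  const barWidth = canvas.width / dataArray.length;
  ctx.fillStyle = '#4fc3f7';
  for (let i = 0; i < dataArray.length; i++) {
    const barHeight = (dataArray[i] / 255) * canvas.height;
    ctx.fillRect(i * barWidth, canvas.height - barHeight, barWidth - 1, barHeight);
  }
};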
Configuration API
Simple Configuration Builder
Use createVoiceLiveConfig to build your configuration:
import { useVoiceLive, createVoiceLiveConfig } from '@iloveagents/azure-voice-live-react';
const config = createVoiceLiveConfig({
connection: {
resourceName: 'your-resource-name',
apiKey: process.env.AZURE_VOICE_LIVE_KEY,
},
session: {
instructions: 'You are a helpful assistant.',
voice: 'en-US-Ava:DragonHDLatestNeural',
},
});
const { videoStream, connect } = useVoiceLive(config);
Fluent Helper Functions
Build custom configurations using composable helper functions:
import {
useVoiceLive,
withHDVoice,
withSemanticVAD,
withEchoCancellation,
withDeepNoiseReduction,
compose
} from '@iloveagents/azure-voice-live-react';
// Compose multiple configuration helpers
const enhanceAudio = compose(
withEchoCancellation,
withDeepNoiseReduction,
(config) => withSemanticVAD({
threshold: 0.5,
removeFillerWords: true,
interruptResponse: true,
}, config),
(config) => withHDVoice('en-US-Ava:DragonHDLatestNeural', {
temperature: 0.9,
rate: '1.1'
}, config)
);
const { videoStream, connect } = useVoiceLive({
connection: {
resourceName: 'your-resource-name',
apiKey: process.env.AZURE_VOICE_LIVE_KEY,
},
session: enhanceAudio({
instructions: 'You are a helpful assistant.',
}),
});
Available Helper Functions
Voice Configuration:
- withVoice(voice, config) - Configure voice (string or VoiceConfig)
- withHDVoice(name, options, config) - Configure HD voice with temperature/rate control
- withCustomVoice(name, config) - Configure custom trained voice
Avatar Configuration:
- withAvatar(character, style, options, config) - Configure avatar character and style
- withTransparentBackground(config, options?) - Enable transparent background with chroma key (default green, customizable)
- withBackgroundImage(url, config) - Add custom background image
- withAvatarCrop(crop, config) - Configure video cropping for portrait mode
Turn Detection:
- withSemanticVAD(options, config) - Azure Semantic VAD (recommended)
- withMultilingualVAD(languages, options, config) - Multi-language semantic VAD
- withEndOfUtterance(options, config) - Advanced end-of-utterance detection
- withoutTurnDetection(config) - Disable automatic turn detection (manual mode)
Audio Enhancements:
- withEchoCancellation(config) - Enable server-side echo cancellation
- withoutEchoCancellation(config) - Disable echo cancellation
- withDeepNoiseReduction(config) - Azure deep noise suppression
- withNearFieldNoiseReduction(config) - Near-field noise reduction
- withoutNoiseReduction(config) - Disable noise reduction
- withSampleRate(rate, config) - Set sample rate (16000 or 24000 Hz)
Output Features:
- withViseme(config) - Enable viseme data for lip-sync animation
- withWordTimestamps(config) - Enable word-level audio timestamps
- withTranscription(options, config) - Enable input audio transcription
- withoutTranscription(config) - Disable transcription
Function Calling:
- withTools(tools, config) - Add function tools
- withToolChoice(choice, config) - Set tool choice behavior ('auto', 'none', 'required')
Composition:
- compose(...fns) - Compose multiple configuration functions
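Helpers that take only a config argument (for example withViseme and withWordTimestamps) can be passed to compose directly, just like withEchoCancellation in the earlier example. A small illustrative composition:
import { compose, withViseme, withWordTimestamps } from '@iloveagents/azure-voice-live-react';

// Enable viseme output and word-level timestamps with one reusable helper.
const withOutputFeatures = compose(withViseme, withWordTimestamps);

// Then, inside a component:
//   session: withOutputFeatures({ instructions: 'You are a helpful assistant.' })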
API Reference
useVoiceLive(config) Hook
Main hook for Azure Voice Live API integration with integrated microphone capture.
Parameters:
interface UseVoiceLiveConfig {
// Connection configuration
connection: {
resourceName: string; // Azure AI Foundry resource name
apiKey: string; // Azure API key
model?: string; // Model name (default: 'gpt-realtime')
apiVersion?: string; // API version (default: '2025-10-01')
};
// Session configuration (optional)
session?: VoiceLiveSessionConfig;
// Auto-connect on mount (default: false)
autoConnect?: boolean;
// Microphone configuration
autoStartMic?: boolean; // Auto-start mic when ready (default: true)
audioSampleRate?: number; // Sample rate (default: 24000)
audioConstraints?: MediaTrackConstraints | boolean; // Microphone selection
// Event handler for all Voice Live events
onEvent?: (event: VoiceLiveEvent) => void;
// Tool executor for function calling
toolExecutor?: (toolCall: ToolCall) => Promise<any>;
}
Returns:
interface UseVoiceLiveReturn {
// Connection state
connectionState: 'disconnected' | 'connecting' | 'connected';
// Media streams
videoStream: MediaStream | null; // Avatar video stream (WebRTC)
audioStream: MediaStream | null; // Audio stream for playback
// Audio visualization (voice-only mode)
audioContext: AudioContext | null; // Web Audio API context
audioAnalyser: AnalyserNode | null; // Pre-configured analyser for visualization
// Microphone state and control
isMicActive: boolean; // Whether microphone is capturing
startMic: () => Promise<void>; // Manually start microphone
stopMic: () => void; // Manually stop microphone
// Connection methods
connect: () => Promise<void>; // Establish connection
disconnect: () => void; // Close connection
// Communication methods
sendEvent: (event: any) => void; // Send custom event to API
updateSession: (config: Partial<VoiceLiveSessionConfig>) => void; // Update session
// Advanced features
isReady: boolean; // Whether session is ready for interaction
error: string | null; // Error message if any
}
Microphone Control:
By default, the microphone automatically starts when the session is ready (autoStartMic: true). You can:
// Use default auto-start behavior (recommended)
const { connect, disconnect } = useVoiceLive(config);
// Manual control
const { connect, startMic, stopMic, isMicActive } = useVoiceLive({
...config,
autoStartMic: false, // Disable auto-start
});
// Select specific microphone device
const { connect } = useVoiceLive({
...config,
audioConstraints: { deviceId: 'specific-device-id' },
});
VoiceLiveAvatar Component
Component for rendering avatar video with optional chroma key compositing.
Props:
interface VoiceLiveAvatarProps {
videoStream: MediaStream | null;
// Chroma key settings
enableChromaKey?: boolean; // Enable green screen removal
chromaKeyColor?: string; // Key color (default: '#00FF00')
chromaKeySimilarity?: number; // Color similarity (0-1, default: 0.4)
chromaKeySmoothness?: number; // Edge smoothness (0-1, default: 0.1)
// Styling
className?: string;
style?: React.CSSProperties;
// Callbacks
onVideoReady?: () => void; // Called when video is ready
}
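A minimal usage sketch combining these props (videoStream comes from useVoiceLive as in the Quick Start; the other values are illustrative, and the defaults above apply to anything omitted):
<VoiceLiveAvatar
  videoStream={videoStream}
  enableChromaKey
  chromaKeyColor="#00FF00"   // key out the green avatar background
  chromaKeySimilarity={0.4}
  chromaKeySmoothness={0.1}
  style={{ width: 480, height: 480 }}
  onVideoReady={() => console.log('Avatar video ready')}
/>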
useAudioCapture() Hook
Note: Microphone capture is now integrated into useVoiceLive. You typically don't need this hook unless you're building custom audio processing pipelines.
Hook for standalone microphone audio capture with AudioWorklet processing.
Parameters:
interface AudioCaptureConfig {
sampleRate?: number; // Sample rate (default: 24000)
workletPath?: string; // Custom AudioWorklet path (optional)
audioConstraints?: MediaTrackConstraints; // getUserMedia constraints
onAudioData?: (data: ArrayBuffer) => void; // Audio data callback
autoStart?: boolean; // Auto-start capture (default: false)
}
Returns:
interface AudioCaptureReturn {
isCapturing: boolean; // Capture state
startCapture: () => Promise<void>; // Start capturing
stopCapture: () => void; // Stop capturing
pauseCapture: () => void; // Pause capture
resumeCapture: () => void; // Resume capture
}
Advanced Example (Custom Processing):
import { useAudioCapture } from '@iloveagents/azure-voice-live-react';
// Only needed for custom audio processing outside of Voice Live
const { startCapture, stopCapture, isCapturing } = useAudioCapture({
sampleRate: 24000,
onAudioData: (audioData) => {
// Custom processing logic here
processAudioData(audioData);
},
});
Audio Helper Utilities
Convenience helpers for audio processing.
arrayBufferToBase64()
Low-level utility for safely converting an ArrayBuffer to a base64 string.
Usage:
import { arrayBufferToBase64 } from '@iloveagents/azure-voice-live-react';
const base64 = arrayBufferToBase64(audioData);
sendEvent({ type: 'input_audio_buffer.append', audio: base64 });
Note: Uses chunked conversion (32 KB chunks) to avoid a stack overflow from the spread operator.
When to use: This is only needed for advanced use cases where you're manually processing audio data. For standard usage, microphone capture is integrated into useVoiceLive and handles this automatically.
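For illustration, the chunked-conversion idea behind the note above looks roughly like the sketch below (this is not the library's exact source): converting the buffer in 32 KB slices keeps the argument list passed to String.fromCharCode small, which is what avoids the stack overflow.
function arrayBufferToBase64Sketch(buffer: ArrayBuffer): string {
  const bytes = new Uint8Array(buffer);
  const CHUNK_SIZE = 32 * 1024; // 32 KB per slice
  let binary = '';
  for (let i = 0; i < bytes.length; i += CHUNK_SIZE) {
    const chunk = bytes.subarray(i, i + CHUNK_SIZE);
    binary += String.fromCharCode(...chunk); // spread is safe on a small slice
  }
  return btoa(binary); // browser base64 encoder
}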
Session Configuration
The session parameter supports all Azure Voice Live API options:
interface VoiceLiveSessionConfig {
// System instructions
instructions?: string;
// Model parameters
temperature?: number; // Response creativity (0-1)
maxResponseOutputTokens?: number; // Maximum response length
// Voice configuration
voice?: string | VoiceConfig;
// Turn detection
turnDetection?: TurnDetectionConfig | null;
// Audio enhancements
inputAudioEchoCancellation?: EchoCancellationConfig | null;
inputAudioNoiseReduction?: NoiseReductionConfig | null;
inputAudioSamplingRate?: 16000 | 24000;
// Avatar configuration
avatar?: AvatarConfig;
// Function calling
tools?: ToolDefinition[];
toolChoice?: 'auto' | 'none' | 'required';
// Output configuration
animation?: AnimationConfig; // Viseme output
outputAudioTimestampTypes?: TimestampType[]; // Word timestamps
// Input transcription
inputAudioTranscription?: TranscriptionConfig | null;
// Additional parameters...
}
For complete type definitions, see the TypeScript types included with the package.
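For orientation, an illustrative session object using a few of these fields (the values are examples, not recommendations):
const session = {
  instructions: 'You are a concise support agent.',
  temperature: 0.7,                        // response creativity (0-1)
  maxResponseOutputTokens: 1024,           // cap response length
  voice: 'en-US-Ava:DragonHDLatestNeural',
  inputAudioSamplingRate: 24000,           // 16000 or 24000
  toolChoice: 'auto',
};

// Pass it to the hook: useVoiceLive({ connection, session })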
Advanced Examples
Avatar with Transparent Background
import {
useVoiceLive,
VoiceLiveAvatar,
withAvatar,
withTransparentBackground,
compose
} from '@iloveagents/azure-voice-live-react';
const configureAvatar = compose(
(config) => withAvatar('lisa', 'casual-standing', {
resolution: { width: 1920, height: 1080 },
bitrate: 2000000,
}, config),
withTransparentBackground // No color needed - uses default green
);
function AvatarApp() {
const { videoStream, connect } = useVoiceLive({
connection: {
resourceName: process.env.AZURE_RESOURCE_NAME,
apiKey: process.env.AZURE_API_KEY,
},
session: configureAvatar({
instructions: 'You are a helpful assistant.',
}),
});
return <VoiceLiveAvatar videoStream={videoStream} enableChromaKey />;
}
Function Calling
import { useVoiceLive, withTools } from '@iloveagents/azure-voice-live-react';
const weatherTool = {
type: 'function',
name: 'get_weather',
description: 'Get current weather for a location',
parameters: {
type: 'object',
properties: {
location: {
type: 'string',
description: 'City name or zip code'
}
},
required: ['location']
}
};
function WeatherAssistant() {
const executeTool = async (toolCall: ToolCall) => {
if (toolCall.name === 'get_weather') {
const { location } = toolCall.arguments;
// Fetch weather data
const weather = await fetchWeather(location);
return weather;
}
};
const { videoStream, connect } = useVoiceLive({
connection: {
resourceName: process.env.AZURE_RESOURCE_NAME,
apiKey: process.env.AZURE_API_KEY,
},
session: withTools([weatherTool], {
instructions: 'You are a weather assistant with access to real-time weather data.',
}),
toolExecutor: executeTool,
});
return <VoiceLiveAvatar videoStream={videoStream} />;
}
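The fetchWeather helper in the example above is not provided by the library. A placeholder implementation against a hypothetical backend route could look like this:
async function fetchWeather(location: string) {
  const res = await fetch(`/api/weather?location=${encodeURIComponent(location)}`); // hypothetical route
  if (!res.ok) throw new Error(`Weather lookup failed: ${res.status}`);
  return res.json(); // the executor's return value is used as the tool result
}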
Event Handling
import { useVoiceLive } from '@iloveagents/azure-voice-live-react';
function EventMonitor() {
const handleEvent = (event: VoiceLiveEvent) => {
switch (event.type) {
case 'session.created':
console.log('Session established');
break;
case 'conversation.item.input_audio_transcription.completed':
console.log('User said:', event.transcript);
break;
case 'response.audio_transcript.delta':
console.log('Assistant saying:', event.delta);
break;
case 'response.done':
console.log('Response complete');
break;
case 'error':
console.error('Error occurred:', event.error);
break;
}
};
const { videoStream, connect } = useVoiceLive({
connection: {
resourceName: process.env.AZURE_RESOURCE_NAME,
apiKey: process.env.AZURE_API_KEY,
},
session: {
instructions: 'You are a helpful assistant.',
},
onEvent: handleEvent,
});
return <VoiceLiveAvatar videoStream={videoStream} />;
}
Best Practices
1. Use Recommended Defaults
The library defaults to optimal settings:
- Model: gpt-realtime
- Turn Detection: azure_semantic_vad (most reliable)
- Sample Rate: 24000 Hz (best quality)
- Echo Cancellation: Enabled
- Noise Suppression: Enabled
- Auto-start Mic: Enabled (no manual coordination needed)
2. Enable Audio Enhancements
For production deployments, always enable audio enhancements:
const enhanceAudio = compose(
withEchoCancellation,
withDeepNoiseReduction,
(config) => withSemanticVAD({ threshold: 0.5 }, config)
);
3. Use Azure Semantic VAD
Azure Semantic VAD provides superior turn detection compared to simple volume-based detection:
withSemanticVAD({
threshold: 0.5, // Detection threshold
removeFillerWords: true, // Remove "um", "uh", etc.
interruptResponse: true, // Allow user interruptions
endOfUtteranceDetection: { // Advanced end-of-speech detection
model: 'semantic_detection_v1',
thresholdLevel: 'medium',
timeoutMs: 1000,
}
})
4. Handle Errors Properly
Implement robust error handling:
onEvent: (event) => {
if (event.type === 'error') {
console.error('Voice Live error:', event.error);
// Implement retry logic or user notification
}
}
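One way to implement the retry logic mentioned in the comment above is a small wrapper hook that attempts a single reconnect after an error. The wrapper name, the 2-second delay, and the single-retry policy are illustrative choices, not part of the library:
import { useEffect, useRef } from 'react';
import { useVoiceLive } from '@iloveagents/azure-voice-live-react';

function useVoiceLiveWithRetry(config: Parameters<typeof useVoiceLive>[0]) {
  const voiceLive = useVoiceLive(config);
  const { error, connectionState, connect } = voiceLive;
  const retriedRef = useRef(false);

  useEffect(() => {
    if (error && connectionState === 'disconnected' && !retriedRef.current) {
      retriedRef.current = true; // retry once, then surface the error to the user
      const timer = setTimeout(() => connect().catch(console.error), 2000);
      return () => clearTimeout(timer);
    }
  }, [error, connectionState, connect]);

  return voiceLive;
}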
5. Secure API Keys
Never expose API keys in client-side code:
// ❌ Bad - API key in code
const apiKey = 'your-api-key-here';
// ✅ Good - Use environment variables
const apiKey = process.env.AZURE_VOICE_LIVE_KEY;
// ✅ Better - Fetch from secure backend
const apiKey = await fetchApiKeyFromBackend();
Requirements
Peer Dependencies
- React: ≥16.8.0 (Hooks support required)
- React DOM: ≥16.8.0
Browser Requirements
- Modern browser with WebRTC support
- WebAudio API support
- AudioWorklet support (for microphone capture)
Azure Requirements
- Azure AI Foundry resource with Voice Live API enabled
- Deployed voice-enabled model (gpt-realtime or gpt-realtime-mini)
- Valid API key with appropriate permissions
Azure Setup
Create Azure AI Foundry Resource
- Navigate to Azure Portal
- Create new AI Foundry resource
- Note your resource name
Enable Voice Live API
- In your AI Foundry resource, enable Voice Live API
- Deploy a voice-enabled model (gpt-realtime recommended)
Get API Key
- Navigate to Keys and Endpoint
- Copy your API key
- Store securely (use environment variables)
For detailed setup instructions, see Azure Voice Live Documentation.
Examples & Playground
The library includes a comprehensive playground with working examples for all features:
Voice Examples
- Voice Chat - Simple - Basic voice-only implementation
- Voice Chat - Advanced - Full configuration with semantic VAD
- Voice Chat - Secure Proxy - Backend proxy with API key
- Voice Chat - Secure Proxy (MSAL) - User-level authentication
Avatar Examples
- Avatar - Simple - Basic avatar with video
- Avatar - Advanced - Chroma key + noise suppression
- Avatar - Secure Proxy - Backend proxy with API key
- Avatar - Secure Proxy (MSAL) - User-level authentication
Advanced Features
- Function Calling - Tool/function integration
- Audio Visualizer - Real-time audio visualization
- Viseme Animation - Custom avatar lip-sync
- Agent Service - Azure AI Foundry Agent integration
Running the Playground
# Install and start
npm install
npm run dev
Open http://localhost:3001 to explore all examples.
Contributing
Contributions are welcome! Please open an issue or pull request on GitHub.
License
MIT © iloveagents
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: API Reference
Acknowledgments
Built for the Azure Voice Live API. For service details, see Microsoft's official Azure Voice Live documentation.
Built by iloveagents
