@iloveagents/azure-voice-live-react


A comprehensive, production-ready React library for Azure Voice Live API with complete feature coverage and TypeScript support.

Overview

Azure Voice Live enables real-time voice conversations with AI models through native audio streaming. This library provides a complete React implementation with full API coverage and a fluent configuration API.

Key Features:

  • Complete API Coverage - All Azure Voice Live parameters supported and typed
  • TypeScript First - Comprehensive type definitions with full IntelliSense support
  • Production Ready - Enterprise-grade code with proper error handling and validation
  • Fluent API - 25+ composable helper functions for streamlined configuration
  • React Hooks - Modern hooks-based architecture with integrated microphone capture
  • Zero Config Audio - Microphone auto-starts when session is ready (no manual coordination)
  • Avatar Support - Real-time avatar video with GPU-accelerated chroma key compositing
  • Audio Enhancements - Built-in echo cancellation, noise suppression, and semantic VAD
  • Function Calling - Complete tool support with async executor pattern
  • Zero Dependencies - No external runtime dependencies (React only)

Installation

npm install @iloveagents/azure-voice-live-react

Or using other package managers:

yarn add @iloveagents/azure-voice-live-react
pnpm add @iloveagents/azure-voice-live-react

Security

Important: Never commit API keys to version control.

For Development:

  • Use environment variables (.env files)
  • Add .env to .gitignore
  • Example: apiKey: import.meta.env.VITE_AZURE_SPEECH_KEY (Vite exposes VITE_-prefixed variables via import.meta.env)

For Production:

  • Recommended: Use backend proxy with Microsoft Entra ID (MSAL) authentication
  • Use managed identities for Azure-hosted applications
  • Never expose API keys in client-side code
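
As a sketch of the backend-proxy idea, the client can fetch the key (or a short-lived token) from your own server so it never ships in the bundle. This pairs with the fetchApiKeyFromBackend pattern shown under Secure API Keys below; the /api/voice-live-key endpoint and response shape here are assumptions, not part of this library:

// Hypothetical helper: the endpoint name and response shape are assumptions.
// Your backend holds the real Azure key (or exchanges it for a short-lived token).
async function fetchApiKeyFromBackend(): Promise<string> {
  const response = await fetch('/api/voice-live-key', { credentials: 'include' });
  if (!response.ok) {
    throw new Error(`Failed to fetch Voice Live credentials: ${response.status}`);
  }
  const { apiKey } = await response.json();
  return apiKey;
}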

Quick Start

Basic Implementation

import { useVoiceLive, VoiceLiveAvatar } from '@iloveagents/azure-voice-live-react';

function VoiceAssistant() {
  // Microphone automatically starts when session is ready!
  const { videoStream, connect, disconnect, connectionState } = useVoiceLive({
    connection: {
      resourceName: 'your-azure-resource-name',
      apiKey: process.env.AZURE_VOICE_LIVE_KEY,
      model: 'gpt-realtime', // GPT-4o Realtime model (recommended)
    },
    session: {
      instructions: 'You are a helpful AI assistant.',
      voice: 'en-US-Ava:DragonHDLatestNeural',
    },
  });

  return (
    <div>
      <VoiceLiveAvatar videoStream={videoStream} />
      <button onClick={connect} disabled={connectionState === 'connected'}>
        Connect
      </button>
      <button onClick={disconnect} disabled={connectionState === 'disconnected'}>
        Disconnect
      </button>
      <p>Status: {connectionState}</p>
    </div>
  );
}

That's it! The microphone automatically:

  • Requests permissions when you call connect()
  • Starts capturing when the session is ready
  • Stops when you call disconnect()

No manual audio coordination needed!

Voice-Only with Visualization

For voice-only applications, the hook provides a pre-configured audioAnalyser for effortless visualization:

import { useRef, useEffect } from 'react';
import { useVoiceLive, createVoiceLiveConfig } from '@iloveagents/azure-voice-live-react';

function VoiceVisualizer() {
  const canvasRef = useRef<HTMLCanvasElement>(null);
  const audioRef = useRef<HTMLAudioElement>(null);

  const config = createVoiceLiveConfig({
    connection: {
      resourceName: 'your-resource-name',
      apiKey: process.env.AZURE_VOICE_LIVE_KEY,
    },
  });

  // Get pre-configured audio analyser - no manual setup needed!
  const { connect, disconnect, audioStream, audioAnalyser, connectionState } = useVoiceLive(config);

  // Connect audio stream for playback
  useEffect(() => {
    if (audioRef.current && audioStream) {
      audioRef.current.srcObject = audioStream;
      audioRef.current.play().catch(console.error);
    }
  }, [audioStream]);

  // Visualize audio using the pre-configured analyser
  useEffect(() => {
    if (!audioAnalyser || !canvasRef.current) return;

    const canvas = canvasRef.current;
    const ctx = canvas.getContext('2d');
    if (!ctx) return;

    const dataArray = new Uint8Array(audioAnalyser.frequencyBinCount);
    let frameId = 0;

    const draw = () => {
      frameId = requestAnimationFrame(draw);
      audioAnalyser.getByteFrequencyData(dataArray);

      // Your visualization logic here
      // No need to create AudioContext or AnalyserNode manually!
    };
    draw();

    // Cancel the animation loop when the analyser changes or the component unmounts
    return () => cancelAnimationFrame(frameId);
  }, [audioAnalyser]);

  return (
    <div>
      <canvas ref={canvasRef} width={800} height={200} />
      <audio ref={audioRef} autoPlay hidden />
      <button onClick={connect}>Start</button>
      <button onClick={disconnect}>Stop</button>
    </div>
  );
}

No audio complexity - the hook handles:

  • AudioContext creation and configuration (48kHz, low-latency)
  • Professional-grade Lanczos-3 resampling (24kHz → 48kHz)
  • AnalyserNode setup for visualization
  • Audio routing and stream management
  • Proper cleanup on disconnect

Configuration API

Simple Configuration Builder

Use createVoiceLiveConfig to build your configuration:

import { useVoiceLive, createVoiceLiveConfig } from '@iloveagents/azure-voice-live-react';

const config = createVoiceLiveConfig({
  connection: {
    resourceName: 'your-resource-name',
    apiKey: process.env.AZURE_VOICE_LIVE_KEY,
  },
  session: {
    instructions: 'You are a helpful assistant.',
    voice: 'en-US-Ava:DragonHDLatestNeural',
  },
});

const { videoStream, connect } = useVoiceLive(config);

Fluent Helper Functions

Build custom configurations using composable helper functions:

import {
  useVoiceLive,
  withHDVoice,
  withSemanticVAD,
  withEchoCancellation,
  withDeepNoiseReduction,
  compose
} from '@iloveagents/azure-voice-live-react';

// Compose multiple configuration helpers
const enhanceAudio = compose(
  withEchoCancellation,
  withDeepNoiseReduction,
  (config) => withSemanticVAD({
    threshold: 0.5,
    removeFillerWords: true,
    interruptResponse: true,
  }, config),
  (config) => withHDVoice('en-US-Ava:DragonHDLatestNeural', {
    temperature: 0.9,
    rate: '1.1'
  }, config)
);

const { videoStream, connect } = useVoiceLive({
  connection: {
    resourceName: 'your-resource-name',
    apiKey: process.env.AZURE_VOICE_LIVE_KEY,
  },
  session: enhanceAudio({
    instructions: 'You are a helpful assistant.',
  }),
});

Available Helper Functions

Voice Configuration:

  • withVoice(voice, config) - Configure voice (string or VoiceConfig)
  • withHDVoice(name, options, config) - Configure HD voice with temperature/rate control
  • withCustomVoice(name, config) - Configure custom trained voice

Avatar Configuration:

  • withAvatar(character, style, options, config) - Configure avatar character and style
  • withTransparentBackground(config, options?) - Enable transparent background with chroma key (default green, customizable)
  • withBackgroundImage(url, config) - Add custom background image
  • withAvatarCrop(crop, config) - Configure video cropping for portrait mode

Turn Detection:

  • withSemanticVAD(options, config) - Azure Semantic VAD (recommended)
  • withMultilingualVAD(languages, options, config) - Multi-language semantic VAD
  • withEndOfUtterance(options, config) - Advanced end-of-utterance detection
  • withoutTurnDetection(config) - Disable automatic turn detection (manual mode)

Audio Enhancements:

  • withEchoCancellation(config) - Enable server-side echo cancellation
  • withoutEchoCancellation(config) - Disable echo cancellation
  • withDeepNoiseReduction(config) - Azure deep noise suppression
  • withNearFieldNoiseReduction(config) - Near-field noise reduction
  • withoutNoiseReduction(config) - Disable noise reduction
  • withSampleRate(rate, config) - Set sample rate (16000 or 24000 Hz)

Output Features:

  • withViseme(config) - Enable viseme data for lip-sync animation
  • withWordTimestamps(config) - Enable word-level audio timestamps
  • withTranscription(options, config) - Enable input audio transcription
  • withoutTranscription(config) - Disable transcription

Function Calling:

  • withTools(tools, config) - Add function tools
  • withToolChoice(choice, config) - Set tool choice behavior ('auto', 'none', 'required')

Composition:

  • compose(...fns) - Compose multiple configuration functions
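
For example, the output and tool-choice helpers compose the same way as the audio helpers shown earlier. A sketch using the signatures listed above (the transcription options shape is an assumption; see TranscriptionConfig in the package types):

import {
  withViseme,
  withWordTimestamps,
  withTranscription,
  withToolChoice,
  compose
} from '@iloveagents/azure-voice-live-react';

// Enable viseme data, word-level timestamps, and input transcription,
// and let the model decide when to call tools.
const configureOutputs = compose(
  withViseme,
  withWordTimestamps,
  (config) => withTranscription({ model: 'whisper-1' }, config), // options shape assumed
  (config) => withToolChoice('auto', config)
);

const session = configureOutputs({
  instructions: 'You are a helpful assistant.',
});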

API Reference

useVoiceLive(config) Hook

Main hook for Azure Voice Live API integration with integrated microphone capture.

Parameters:

interface UseVoiceLiveConfig {
  // Connection configuration
  connection: {
    resourceName: string;      // Azure AI Foundry resource name
    apiKey: string;            // Azure API key
    model?: string;            // Model name (default: 'gpt-realtime')
    apiVersion?: string;       // API version (default: '2025-10-01')
  };

  // Session configuration (optional)
  session?: VoiceLiveSessionConfig;

  // Auto-connect on mount (default: false)
  autoConnect?: boolean;

  // Microphone configuration
  autoStartMic?: boolean;                       // Auto-start mic when ready (default: true)
  audioSampleRate?: number;                     // Sample rate (default: 24000)
  audioConstraints?: MediaTrackConstraints | boolean; // Microphone selection

  // Event handler for all Voice Live events
  onEvent?: (event: VoiceLiveEvent) => void;

  // Tool executor for function calling
  toolExecutor?: (toolCall: ToolCall) => Promise<any>;
}

Returns:

interface UseVoiceLiveReturn {
  // Connection state
  connectionState: 'disconnected' | 'connecting' | 'connected';

  // Media streams
  videoStream: MediaStream | null;      // Avatar video stream (WebRTC)
  audioStream: MediaStream | null;      // Audio stream for playback

  // Audio visualization (voice-only mode)
  audioContext: AudioContext | null;    // Web Audio API context
  audioAnalyser: AnalyserNode | null;   // Pre-configured analyser for visualization

  // Microphone state and control
  isMicActive: boolean;                 // Whether microphone is capturing
  startMic: () => Promise<void>;        // Manually start microphone
  stopMic: () => void;                  // Manually stop microphone

  // Connection methods
  connect: () => Promise<void>;         // Establish connection
  disconnect: () => void;                // Close connection

  // Communication methods
  sendEvent: (event: any) => void;      // Send custom event to API
  updateSession: (config: Partial<VoiceLiveSessionConfig>) => void; // Update session

  // Advanced features
  isReady: boolean;                     // Whether session is ready for interaction
  error: string | null;                 // Error message if any
}

Microphone Control:

By default, the microphone automatically starts when the session is ready (autoStartMic: true). You can:

// Use default auto-start behavior (recommended)
const { connect, disconnect } = useVoiceLive(config);

// Manual control
const { connect, startMic, stopMic, isMicActive } = useVoiceLive({
  ...config,
  autoStartMic: false, // Disable auto-start
});

// Select specific microphone device
const { connect } = useVoiceLive({
  ...config,
  audioConstraints: { deviceId: 'specific-device-id' },
});

VoiceLiveAvatar Component

Component for rendering avatar video with optional chroma key compositing.

Props:

interface VoiceLiveAvatarProps {
  videoStream: MediaStream | null;

  // Chroma key settings
  enableChromaKey?: boolean;            // Enable green screen removal
  chromaKeyColor?: string;              // Key color (default: '#00FF00')
  chromaKeySimilarity?: number;         // Color similarity (0-1, default: 0.4)
  chromaKeySmoothness?: number;         // Edge smoothness (0-1, default: 0.1)

  // Styling
  className?: string;
  style?: React.CSSProperties;

  // Callbacks
  onVideoReady?: () => void;            // Called when video is ready
}
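
A minimal usage sketch combining these props (values are illustrative; the defaults are spelled out for clarity):

import { VoiceLiveAvatar } from '@iloveagents/azure-voice-live-react';

function AvatarView({ videoStream }: { videoStream: MediaStream | null }) {
  return (
    <VoiceLiveAvatar
      videoStream={videoStream}
      enableChromaKey
      chromaKeyColor="#00FF00"     // default key color
      chromaKeySimilarity={0.4}    // default similarity
      chromaKeySmoothness={0.1}    // default smoothness
      style={{ width: 480, height: 270 }}
      onVideoReady={() => console.log('Avatar video ready')}
    />
  );
}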

useAudioCapture() Hook

Note: Microphone capture is now integrated into useVoiceLive. You typically don't need this hook unless you're building custom audio processing pipelines.

Hook for standalone microphone audio capture with AudioWorklet processing.

Parameters:

interface AudioCaptureConfig {
  sampleRate?: number;                  // Sample rate (default: 24000)
  workletPath?: string;                 // Custom AudioWorklet path (optional)
  audioConstraints?: MediaTrackConstraints; // getUserMedia constraints
  onAudioData?: (data: ArrayBuffer) => void; // Audio data callback
  autoStart?: boolean;                  // Auto-start capture (default: false)
}

Returns:

interface AudioCaptureReturn {
  isCapturing: boolean;                 // Capture state
  startCapture: () => Promise<void>;    // Start capturing
  stopCapture: () => void;              // Stop capturing
  pauseCapture: () => void;             // Pause capture
  resumeCapture: () => void;            // Resume capture
}

Advanced Example (Custom Processing):

import { useAudioCapture } from '@iloveagents/azure-voice-live-react';

// Only needed for custom audio processing outside of Voice Live
const { startCapture, stopCapture, isCapturing } = useAudioCapture({
  sampleRate: 24000,
  onAudioData: (audioData) => {
    // Custom processing logic here
    processAudioData(audioData);
  },
});

Audio Helper Utilities

Convenience helpers for audio processing.

arrayBufferToBase64()

Low-level utility for safely converting an ArrayBuffer to a base64 string.

Usage:

import { arrayBufferToBase64 } from '@iloveagents/azure-voice-live-react';

const base64 = arrayBufferToBase64(audioData);
sendEvent({ type: 'input_audio_buffer.append', audio: base64 });

Note: Uses chunked conversion (32KB chunks) to avoid the stack overflow that spreading a large buffer can cause.

When to use: This is only needed for advanced use cases where you're manually processing audio data. For standard usage, microphone capture is integrated into useVoiceLive and handles this automatically.

Session Configuration

The session parameter supports all Azure Voice Live API options:

interface VoiceLiveSessionConfig {
  // System instructions
  instructions?: string;

  // Model parameters
  temperature?: number;                  // Response creativity (0-1)
  maxResponseOutputTokens?: number;      // Maximum response length

  // Voice configuration
  voice?: string | VoiceConfig;

  // Turn detection
  turnDetection?: TurnDetectionConfig | null;

  // Audio enhancements
  inputAudioEchoCancellation?: EchoCancellationConfig | null;
  inputAudioNoiseReduction?: NoiseReductionConfig | null;
  inputAudioSamplingRate?: 16000 | 24000;

  // Avatar configuration
  avatar?: AvatarConfig;

  // Function calling
  tools?: ToolDefinition[];
  toolChoice?: 'auto' | 'none' | 'required';

  // Output configuration
  animation?: AnimationConfig;           // Viseme output
  outputAudioTimestampTypes?: TimestampType[]; // Word timestamps

  // Input transcription
  inputAudioTranscription?: TranscriptionConfig | null;

  // Additional parameters...
}

For complete type definitions, see the TypeScript types included with the package.
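
Session options can also be updated after connecting via updateSession from the hook; it takes a Partial<VoiceLiveSessionConfig>, so you patch only the fields you change. A sketch (the instruction and voice values are illustrative):

const { connect, updateSession } = useVoiceLive(config);

// Later, e.g. from a settings panel:
function switchToSpanish() {
  updateSession({
    instructions: 'Eres un asistente útil. Responde en español.',
    voice: 'es-ES-ElviraNeural', // illustrative Azure voice name
  });
}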

Advanced Examples

Avatar with Transparent Background

import {
  useVoiceLive,
  VoiceLiveAvatar,
  withAvatar,
  withTransparentBackground,
  compose
} from '@iloveagents/azure-voice-live-react';

const configureAvatar = compose(
  (config) => withAvatar('lisa', 'casual-standing', {
    resolution: { width: 1920, height: 1080 },
    bitrate: 2000000,
  }, config),
  withTransparentBackground  // No color needed - uses default green
);

function AvatarApp() {
  const { videoStream, connect } = useVoiceLive({
    connection: {
      resourceName: process.env.AZURE_RESOURCE_NAME,
      apiKey: process.env.AZURE_API_KEY,
    },
    session: configureAvatar({
      instructions: 'You are a helpful assistant.',
    }),
  });

  return <VoiceLiveAvatar videoStream={videoStream} enableChromaKey />;
}

Function Calling

import { useVoiceLive, withTools, VoiceLiveAvatar, type ToolCall } from '@iloveagents/azure-voice-live-react';

const weatherTool = {
  type: 'function',
  name: 'get_weather',
  description: 'Get current weather for a location',
  parameters: {
    type: 'object',
    properties: {
      location: {
        type: 'string',
        description: 'City name or zip code'
      }
    },
    required: ['location']
  }
};

function WeatherAssistant() {
  const executeTool = async (toolCall: ToolCall) => {
    if (toolCall.name === 'get_weather') {
      const { location } = toolCall.arguments;
      // Fetch weather data
      const weather = await fetchWeather(location);
      return weather;
    }
  };

  const { videoStream, connect } = useVoiceLive({
    connection: {
      resourceName: process.env.AZURE_RESOURCE_NAME,
      apiKey: process.env.AZURE_API_KEY,
    },
    session: withTools([weatherTool], {
      instructions: 'You are a weather assistant with access to real-time weather data.',
    }),
    toolExecutor: executeTool,
  });

  return <VoiceLiveAvatar videoStream={videoStream} />;
}

Event Handling

import { useVoiceLive, VoiceLiveAvatar, type VoiceLiveEvent } from '@iloveagents/azure-voice-live-react';

function EventMonitor() {
  const handleEvent = (event: VoiceLiveEvent) => {
    switch (event.type) {
      case 'session.created':
        console.log('Session established');
        break;

      case 'conversation.item.input_audio_transcription.completed':
        console.log('User said:', event.transcript);
        break;

      case 'response.audio_transcript.delta':
        console.log('Assistant saying:', event.delta);
        break;

      case 'response.done':
        console.log('Response complete');
        break;

      case 'error':
        console.error('Error occurred:', event.error);
        break;
    }
  };

  const { videoStream, connect } = useVoiceLive({
    connection: {
      resourceName: process.env.AZURE_RESOURCE_NAME,
      apiKey: process.env.AZURE_API_KEY,
    },
    session: {
      instructions: 'You are a helpful assistant.',
    },
    onEvent: handleEvent,
  });

  return <VoiceLiveAvatar videoStream={videoStream} />;
}

Best Practices

1. Use Recommended Defaults

The library defaults to optimal settings:

  • Model: gpt-realtime
  • Turn Detection: azure_semantic_vad (most reliable)
  • Sample Rate: 24000 Hz (best quality)
  • Echo Cancellation: Enabled
  • Noise Suppression: Enabled
  • Auto-start Mic: Enabled (no manual coordination needed)
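
In practice this means a connection-only config is a sensible starting point; the defaults above apply automatically. A minimal sketch:

// Relies entirely on the library defaults listed above.
const { connect, disconnect, connectionState } = useVoiceLive({
  connection: {
    resourceName: 'your-resource-name',
    apiKey: process.env.AZURE_VOICE_LIVE_KEY,
  },
});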

2. Enable Audio Enhancements

For production deployments, always enable audio enhancements:

const enhanceAudio = compose(
  withEchoCancellation,
  withDeepNoiseReduction,
  (config) => withSemanticVAD({ threshold: 0.5 }, config)
);

3. Use Azure Semantic VAD

Azure Semantic VAD provides superior turn detection compared to simple volume-based detection:

withSemanticVAD({
  threshold: 0.5,                  // Detection threshold
  removeFillerWords: true,         // Remove "um", "uh", etc.
  interruptResponse: true,         // Allow user interruptions
  endOfUtteranceDetection: {       // Advanced end-of-speech detection
    model: 'semantic_detection_v1',
    thresholdLevel: 'medium',
    timeoutMs: 1000,
  }
})

4. Handle Errors Properly

Implement robust error handling:

onEvent: (event) => {
  if (event.type === 'error') {
    console.error('Voice Live error:', event.error);
    // Implement retry logic or user notification
  }
}

5. Secure API Keys

Never expose API keys in client-side code:

// ❌ Bad - API key in code
const apiKey = 'your-api-key-here';

// ✅ Good - Use environment variables
const apiKey = process.env.AZURE_VOICE_LIVE_KEY;

// ✅ Better - Fetch from secure backend
const apiKey = await fetchApiKeyFromBackend();

Requirements

Peer Dependencies

  • React: ≥16.8.0 (Hooks support required)
  • React DOM: ≥16.8.0

Browser Requirements

  • Modern browser with WebRTC support
  • WebAudio API support
  • AudioWorklet support (for microphone capture)

Azure Requirements

  • Azure AI Foundry resource with Voice Live API enabled
  • Deployed voice-enabled model (gpt-realtime or gpt-realtime-mini)
  • Valid API key with appropriate permissions

Azure Setup

  1. Create Azure AI Foundry Resource

    • Navigate to Azure Portal
    • Create new AI Foundry resource
    • Note your resource name
  2. Enable Voice Live API

    • In your AI Foundry resource, enable Voice Live API
    • Deploy a voice-enabled model (gpt-realtime recommended)
  3. Get API Key

    • Navigate to Keys and Endpoint
    • Copy your API key
    • Store securely (use environment variables)

For detailed setup instructions, see Azure Voice Live Documentation.

Examples & Playground

The library includes a comprehensive playground with working examples for all features, grouped into voice examples, avatar examples, and advanced features.

Running the Playground

# Install and start
npm install
npm run dev

Open http://localhost:3001 to explore all examples.

Contributing

Contributions are welcome! Please open an issue or pull request on GitHub.

License

MIT © iloveagents

Acknowledgments

Built for the Azure Voice Live API. For official guidance, see Microsoft's Azure Voice Live documentation.


Built by iloveagents