Foundry Voice Live React SDK
React hooks and components for Microsoft Foundry Voice Live API. Build real-time voice AI apps with Azure video avatars, Live2D avatars, 3D avatars, audio visualizers, function calling, and TypeScript support.
Install
```bash
npm install @iloveagents/foundry-voice-live-react
```

Quick Start
Region Availability: The default model (`gpt-realtime`) is only available in the East US 2 and Sweden Central regions. Make sure your Azure AI Foundry resource is deployed in one of those regions. See the Microsoft docs for current availability.
Voice Only
```tsx
import { useVoiceLive } from '@iloveagents/foundry-voice-live-react';

function App() {
  const { connect, disconnect, connectionState, audioStream } = useVoiceLive({
    connection: {
      resourceName: 'your-foundry-resource', // Azure AI Foundry resource name
      apiKey: 'your-foundry-api-key', // For dev only - see "Production" below
    },
    session: {
      instructions: 'You are a helpful assistant.',
    },
  });

  return (
    <>
      <p>Status: {connectionState}</p>
      <button onClick={connect} disabled={connectionState === 'connected'}>Start</button>
      <button onClick={disconnect} disabled={connectionState !== 'connected'}>Stop</button>
      <audio ref={el => { if (el && audioStream) el.srcObject = audioStream; }} autoPlay />
    </>
  );
}
```

With Avatar
```tsx
import { useVoiceLive, VoiceLiveAvatar } from '@iloveagents/foundry-voice-live-react';

function App() {
  const { videoStream, audioStream, connect, disconnect } = useVoiceLive({
    connection: {
      resourceName: 'your-foundry-resource',
      apiKey: 'your-foundry-api-key',
    },
    session: {
      instructions: 'You are a helpful assistant.',
      voice: { name: 'en-US-AvaMultilingualNeural', type: 'azure-standard' },
      avatar: { character: 'lisa', style: 'casual-sitting' },
    },
  });

  return (
    <>
      <VoiceLiveAvatar videoStream={videoStream} audioStream={audioStream} />
      <button onClick={connect}>Start</button>
      <button onClick={disconnect}>Stop</button>
    </>
  );
}
```

The microphone starts automatically when connected. No manual audio setup is needed.
Production
Never expose API keys in client-side code. Use a proxy server to secure your credentials.
1. Start the Proxy
```bash
# Docker (recommended)
docker run -p 8080:8080 \
  -e FOUNDRY_RESOURCE_NAME=your-foundry-resource \
  -e FOUNDRY_API_KEY="your-api-key" \
  -e ALLOWED_ORIGINS="*" \
  ghcr.io/iloveagents/foundry-voice-live-proxy:latest
```

Or with npx:

```bash
FOUNDRY_RESOURCE_NAME=your-foundry-resource \
FOUNDRY_API_KEY="your-api-key" \
ALLOWED_ORIGINS="*" \
npx @iloveagents/foundry-voice-live-proxy-node
```

2. Connect from Your App
```tsx
import { useVoiceLive } from '@iloveagents/foundry-voice-live-react';

function App() {
  const { connect, disconnect, connectionState, audioStream } = useVoiceLive({
    connection: {
      proxyUrl: 'ws://localhost:8080/ws', // Proxy handles auth
    },
    session: {
      instructions: 'You are a helpful assistant.',
    },
  });

  return (
    <>
      <p>Status: {connectionState}</p>
      <button onClick={connect}>Start</button>
      <button onClick={disconnect}>Stop</button>
      <audio ref={el => { if (el && audioStream) el.srcObject = audioStream; }} autoPlay />
    </>
  );
}
```

Authentication Options
- API Key via Proxy — Backend holds the key, client uses `proxyUrl`
- MSAL Token — Pass the token in the query string: `proxyUrl + '?token=' + msalToken`
See proxy package docs and proxy examples.
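A minimal sketch of the MSAL option. Only `proxyUrl` is part of this library's API; `msalToken` is a placeholder for a token you acquire through your own MSAL flow:

```tsx
import { useVoiceLive } from '@iloveagents/foundry-voice-live-react';

// Placeholder: acquire this with your MSAL flow (e.g. acquireTokenSilent)
// before rendering the component. The acquisition itself is not shown here.
declare const msalToken: string;

function App() {
  const { connect } = useVoiceLive({
    connection: {
      // The proxy validates the Entra ID token passed in the query string
      proxyUrl: 'ws://localhost:8080/ws?token=' + encodeURIComponent(msalToken),
    },
    session: {
      instructions: 'You are a helpful assistant.',
    },
  });

  return <button onClick={connect}>Start</button>;
}
```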
Configuration Helpers
Session Builder (Recommended)
Use the fluent `sessionConfig()` builder for clean, chainable configuration:
```tsx
import { useVoiceLive, sessionConfig } from '@iloveagents/foundry-voice-live-react';

const config = sessionConfig()
  .instructions('You are a helpful assistant.')
  .hdVoice('en-US-Ava:DragonHDLatestNeural', { temperature: 0.8 })
  .avatar('lisa', 'casual-sitting', { codec: 'h264' })
  .semanticVAD({ multilingual: true, interruptResponse: true })
  .echoCancellation()
  .noiseReduction()
  .build();

const { videoStream, audioStream } = useVoiceLive({
  connection: { resourceName: 'your-foundry-resource', apiKey: 'your-key' },
  session: config,
});
```

Builder Methods
| Method | Description |
| ------ | ----------- |
| .instructions(text) | Set system prompt |
| .voice(name) | Set voice by name |
| .hdVoice(name, { temperature?, rate? }) | Set HD voice with options |
| .customVoice(name) | Set custom voice |
| .avatar(character, style, options?) | Configure avatar |
| .transparentBackground() | Enable chroma key background |
| .backgroundImage(url) | Set avatar background image |
| .semanticVAD(options?) | Configure turn detection (use { multilingual: true } for 10-language support) |
| .endOfUtterance(options?) | Add end-of-utterance detection |
| .noTurnDetection() | Disable turn detection (manual mode) |
| .echoCancellation() | Enable server echo cancellation |
| .noiseReduction(type?) | Enable noise reduction ('deep' or 'nearField') |
| .transcription(options?) | Configure input transcription |
| .viseme() | Enable viseme output (lip-sync) |
| .wordTimestamps() | Enable word timestamps |
| .tools(tools) | Add function tools |
| .toolChoice(choice) | Set tool choice mode |
| .build() | Build the final config |
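As a second illustration, here is a sketch combining a few more methods from the table. Only documented builder calls are used; the particular pairing (viseme plus word timestamps with deep noise reduction) is just one plausible combination, not a prescribed recipe:

```tsx
import { sessionConfig } from '@iloveagents/foundry-voice-live-react';

// Sketch: standard voice with viseme and word-timestamp output enabled,
// e.g. for driving lip-sync. All method names come from the table above.
const lipSyncConfig = sessionConfig()
  .instructions('You are a helpful assistant.')
  .voice('en-US-AvaMultilingualNeural')
  .noiseReduction('deep')
  .viseme()
  .wordTimestamps()
  .build();
```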
Transcription with Phrase Lists
Improve speech recognition accuracy for specific terms:
```tsx
const config = sessionConfig()
  .transcription({
    model: 'azure-speech',
    language: 'en',
    phraseList: ['Neo QLED TV', 'TUF Gaming', 'AutoQuote Explorer'],
  })
  .build();
```

Note: `phraseList` and `customSpeech` require `model: 'azure-speech'` and don't work with gpt-realtime models.
Function Calling
Define tools the AI can call, then handle execution and send results back:
```tsx
import { useRef, useCallback } from 'react';
import { useVoiceLive } from '@iloveagents/foundry-voice-live-react';

function App() {
  // Ref lets toolExecutor call sendEvent before the hook result exists
  const sendEventRef = useRef<(event: any) => void>(() => {});

  const toolExecutor = useCallback((name: string, args: string, callId: string) => {
    const parsedArgs = JSON.parse(args);
    let result = {};
    if (name === 'get_weather') {
      result = { temperature: '72°F', location: parsedArgs.location };
    }
    // Send the result back to the API, then request a new response
    sendEventRef.current({
      type: 'conversation.item.create',
      item: { type: 'function_call_output', call_id: callId, output: JSON.stringify(result) },
    });
    sendEventRef.current({ type: 'response.create' });
  }, []);

  const { connect, sendEvent } = useVoiceLive({
    connection: { resourceName: 'your-foundry-resource', apiKey: 'your-key' },
    session: {
      instructions: 'You can check the weather.',
      tools: [{
        type: 'function',
        name: 'get_weather',
        description: 'Get weather for a location',
        parameters: { type: 'object', properties: { location: { type: 'string' } }, required: ['location'] },
      }],
      toolChoice: 'auto',
    },
    toolExecutor,
  });
  sendEventRef.current = sendEvent;

  return <button onClick={connect}>Start</button>;
}
```

Event Handling
```tsx
const { connect } = useVoiceLive({
  connection: { resourceName: 'your-foundry-resource', apiKey: 'your-key' },
  onEvent: (event) => {
    switch (event.type) {
      case 'session.created':
        console.log('Connected');
        break;
      case 'conversation.item.input_audio_transcription.completed':
        console.log('User:', event.transcript);
        break;
      case 'response.audio_transcript.delta':
        console.log('AI:', event.delta);
        break;
      case 'error':
        console.error(event.error);
        break;
    }
  },
});
```

API
useVoiceLive(config)
Returns:
```ts
{
  connectionState: 'disconnected' | 'connecting' | 'connected';
  videoStream: MediaStream | null;    // Avatar video
  audioStream: MediaStream | null;    // Audio playback
  audioAnalyser: AnalyserNode | null; // For visualization
  isMicActive: boolean;
  connect: () => Promise<void>;
  disconnect: () => void;
  sendEvent: (event: any) => void;
  updateSession: (config) => void;
  error: string | null;
}
```
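Since `audioAnalyser` is a standard Web Audio `AnalyserNode`, any visualization that reads frequency or time-domain data can consume it. A minimal level-meter sketch (the polling loop and component below are illustrative; only the `AnalyserNode` itself comes from the hook):

```tsx
import { useEffect, useRef } from 'react';

// Illustrative level meter: polls the AnalyserNode returned by useVoiceLive
// and writes the average frequency magnitude into a <progress> element.
function LevelMeter({ analyser }: { analyser: AnalyserNode | null }) {
  const barRef = useRef<HTMLProgressElement>(null);

  useEffect(() => {
    if (!analyser) return;
    const data = new Uint8Array(analyser.frequencyBinCount);
    let frame = 0;
    const tick = () => {
      analyser.getByteFrequencyData(data);
      const avg = data.reduce((sum, v) => sum + v, 0) / data.length;
      if (barRef.current) barRef.current.value = avg / 255; // normalize to 0..1
      frame = requestAnimationFrame(tick);
    };
    frame = requestAnimationFrame(tick);
    return () => cancelAnimationFrame(frame);
  }, [analyser]);

  return <progress ref={barRef} max={1} />;
}
```

Pass the hook's value straight in: `<LevelMeter analyser={audioAnalyser} />`.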
VoiceLiveAvatar

```tsx
<VoiceLiveAvatar
  videoStream={videoStream}   // Required: video from useVoiceLive
  audioStream={audioStream}   // Required: audio from useVoiceLive
  enableChromaKey={true}      // Remove green background
  chromaKeyColor="#00FF00"    // Key color
  chromaKeySimilarity={0.4}   // Color match threshold
  chromaKeySmoothness={0.1}   // Edge smoothness
  loadingMessage="Loading..." // Shown before video starts
/>
```

Examples
Working examples for all features:
| Example | Description |
| ---------------- | -------------------- |
| Voice Basic | Minimal voice chat |
| Voice Advanced | VAD, noise reduction |
| Voice Proxy | Secure proxy pattern |
| Voice MSAL | Entra ID auth |
| Avatar Basic | Avatar video |
| Avatar Advanced | Chroma key, 1080p |
| Function Calling | Tool integration |
| Audio Visualizer | Waveform display |
| Viseme | Lip-sync data |
| Live2D Avatar | Live2D integration |
| 3D Avatar | React Three Fiber |
| Agent Service | Foundry Agent |
Run examples locally:
```bash
git clone https://github.com/iLoveAgents/foundry-voice-live
cd foundry-voice-live
just install
just dev # Opens at http://localhost:3001
```

Related
- Proxy Package - Secure WebSocket proxy for production
- Voice Live API Docs - Microsoft documentation
- Examples - Full working examples
- iLoveAgents Blog - Guides for Microsoft Foundry & Agent Framework
Support
If this library made your life easier, a coffee is a simple way to say thanks ☕. It directly supports maintenance and future features.
License
MIT - iLoveAgents
