# Voice Agent SDK

`@tamasha/voice-agent-sdk` v1.0.0
The `@tamasha/voice-agent-sdk` is an agent-aware React SDK that integrates Tamasha's voice agent capabilities directly into your React applications. It handles WebRTC connections to Azure OpenAI, manages audio recording, and provides a simple hook-based API for interacting with specialized AI agents.
## Installation

```bash
npm install @tamasha/voice-agent-sdk
# or
yarn add @tamasha/voice-agent-sdk
```

## Quick Start

### 1. Wrap your app with VoiceAgentProvider (optional but recommended for shared state)
While the hook can be used standalone, wrapping your app allows for potential future global state management.
```jsx
import { VoiceAgentProvider } from '@tamasha/voice-agent-sdk';

function App() {
  return (
    <VoiceAgentProvider>
      <YourComponent />
    </VoiceAgentProvider>
  );
}
```

### 2. Use the useVoiceAgent Hook
The core of the SDK is the `useVoiceAgent` hook. It manages the entire lifecycle of the voice session.
```jsx
import React from 'react';
import { useVoiceAgent } from '@tamasha/voice-agent-sdk';

const VoiceAssistant = () => {
  const {
    status,       // 'idle' | 'connecting' | 'connected' | 'speaking' | 'listening' | 'processing' | 'ended'
    isConnected,  // boolean
    isAiSpeaking, // boolean
    duration,     // number (seconds)
    audioRef,     // React ref for the audio element
    start,        // Function to start the session
    stop,         // Function to stop the session
    error         // Error object if something goes wrong
  } = useVoiceAgent({
    // Required configuration
    agentType: 'user-insights',         // The type of agent to load
    ephemeralKey: 'YOUR_EPHEMERAL_KEY', // From your backend
    token: 'YOUR_USER_TOKEN',           // Auth token
    workspaceId: 'YOUR_WORKSPACE_ID',
    playerId: 'CURRENT_USER_ID',
    sessionId: 'UNIQUE_SESSION_ID',     // Or auto-generated by backend
    appVersion: '1.0.0',
    environment: 'development',         // 'development' | 'staging' | 'production'

    // Optional
    name: 'User Name',                  // For personalized greetings
    language: 'english',                // Preferred language
    webhook: {                          // Webhook for session completion events
      url: 'https://your-api.com/webhook',
      events: ['call.ended']
    }
  });

  return (
    <div>
      <h2>Status: {status}</h2>
      <p>Duration: {duration}s</p>

      {/* Hidden audio element for playback */}
      <audio ref={audioRef} autoPlay />

      <div style={{ gap: '10px', display: 'flex' }}>
        <button onClick={start} disabled={isConnected}>
          Start Conversation
        </button>
        <button onClick={stop} disabled={!isConnected}>
          End Conversation
        </button>
      </div>

      {error && <p style={{ color: 'red' }}>Error: {error.message}</p>}
    </div>
  );
};
```

## Available Agents
### `user-insights`
An agent designed to collect user profile information through natural conversation. It intelligently gathers data across multiple categories:
- Personal Info: Name, age, occupation, etc.
- Interests: Hobbies and social context.
- Preferences: Host matching preferences.
- Experience: Usage of similar apps.
- Safety: Monitors for risk flags (NSFW, etc.).
**Configuration:** set `agentType: 'user-insights'` in the `useVoiceAgent` config.
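The categories above could be modeled with a record shape like the following. This is an illustrative sketch only; `UserInsights` and its field names are assumptions, not types exported by the SDK:

```typescript
// Hypothetical shape for the data the user-insights agent gathers.
// All names here are illustrative; the SDK does not export these types.
interface UserInsights {
  personalInfo: { name?: string; age?: number; occupation?: string };
  interests: string[];    // hobbies and social context
  preferences: string[];  // host matching preferences
  experience: string[];   // similar apps the user has tried
  safetyFlags: string[];  // risk flags surfaced during the call (e.g. NSFW)
}

// Example of a partially filled record mid-conversation:
const insights: UserInsights = {
  personalInfo: { name: 'Ada', occupation: 'engineer' },
  interests: ['chess', 'live audio rooms'],
  preferences: [],
  experience: ['Clubhouse'],
  safetyFlags: [],
};
```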
## API Reference

### VoiceAgentConfig
| Property | Type | Required | Description |
|----------|------|----------|-------------|
| agentType | string | Yes | Identifier for the agent behavior (e.g., 'user-insights'). |
| ephemeralKey | string | Yes | Temporary key for Azure OpenAI connection. |
| token | string | Yes | Authentication token for backend API calls. |
| workspaceId | string | Yes | Your workspace identifier. |
| playerId | string | Yes | Unique ID of the current user. |
| sessionId | string | Yes | Unique session identifier for this call. |
| appVersion | string | Yes | Version of your application. |
| environment | 'development' \| 'staging' \| 'production' | Yes | Target backend environment. |
| name | string | No | User's name for personalized interactions. |
| language | string | No | Preferred language for the agent. |
| webhook | WebhookConfig | No | Configuration for server-side event webhooks. |
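Since eight of the fields above are required, a quick runtime check before calling the hook can catch missing values early. This helper is an illustrative addition, not part of the SDK:

```typescript
// Required keys per the VoiceAgentConfig table above.
const REQUIRED_KEYS = [
  'agentType', 'ephemeralKey', 'token', 'workspaceId',
  'playerId', 'sessionId', 'appVersion', 'environment',
] as const;

// Returns the names of any required fields that are missing or empty.
function missingConfigKeys(config: Record<string, unknown>): string[] {
  return REQUIRED_KEYS.filter((key) => {
    const value = config[key];
    return value === undefined || value === null || value === '';
  });
}
```

You might call this with your assembled config and surface the result before `start()` is ever invoked, rather than letting the session fail mid-connection.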
### VoiceAgentReturn

The object returned by `useVoiceAgent`:

- `status`: Current state of the agent.
- `isConnected`: Boolean helper for connection state.
- `isAiSpeaking`: Boolean indicating whether the agent is currently talking.
- `duration`: Duration of the current session in seconds.
- `sessionId`: The active session ID.
- `error`: Any error encountered during the session.
- `start()`: Async function to initialize the connection and recording.
- `stop()`: Async function to end the session, upload the recording, and disconnect.
- `audioRef`: React `RefObject<HTMLAudioElement>` that must be attached to an `<audio>` element in your UI for sound to work.
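Because `duration` is a raw second count, a small formatter (illustrative, not provided by the SDK) makes it friendlier to render than the plain `{duration}s` used in the example above:

```typescript
// Format a second count as m:ss, e.g. 65 -> "1:05".
function formatDuration(totalSeconds: number): string {
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = Math.floor(totalSeconds % 60);
  return `${minutes}:${String(seconds).padStart(2, '0')}`;
}
```

In the component you would render `<p>Duration: {formatDuration(duration)}</p>` instead of the raw value.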
## How It Works

1. **Initialization**: The hook initializes the WebRTC manager and Tool Executor based on the `agentType`.
2. **Connection**: `start()` establishes a WebRTC connection to Azure OpenAI using the `ephemeralKey`.
3. **Conversation**:
   - Audio is streamed bi-directionally.
   - The AI uses "Tools" (function calling) to save data to your backend (`/voice-api/user-insights`).
   - The SDK handles all tool execution automatically.
4. **Completion**:
   - `stop()` or the AI's `end_call` tool triggers the shutdown sequence.
   - The recording is stopped and uploaded.
   - A final summary event is sent to the backend.
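On the receiving side, the `call.ended` webhook configured earlier can be routed through a small dispatcher. The payload shape below (`event`, `sessionId`) is an assumption for illustration; inspect the events actually delivered to your endpoint for the real fields:

```typescript
// Hypothetical webhook payload; field names are illustrative assumptions.
interface WebhookEvent {
  event: string; // e.g. "call.ended"
  sessionId?: string;
}

// Parses the raw request body and returns a description of what was done.
// A real handler would persist the session outcome instead of returning a string.
function handleWebhookEvent(rawBody: string): string {
  let parsed: WebhookEvent;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    return 'ignored: invalid JSON';
  }
  if (parsed.event === 'call.ended') {
    return `call ended: ${parsed.sessionId ?? 'unknown session'}`;
  }
  return `ignored: ${parsed.event}`;
}
```

Keeping the parsing logic a pure function like this makes it easy to unit-test independently of whatever HTTP framework hosts the endpoint.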
## Development

To build the SDK locally:

```bash
npm install
npm run build
```

The output will be in the `dist/` folder.
