# Voice Agent SDK

`@tamasha/voice-agent-sdk` v1.0.0
The `@tamasha/voice-agent-sdk` is an agent-aware React SDK that integrates Tamasha's voice agent capabilities directly into your React applications. It handles WebRTC connections to Azure OpenAI, manages audio recording, and provides a simple hook-based API for interacting with specialized AI agents.
## Installation

```bash
npm install @tamasha/voice-agent-sdk
# or
yarn add @tamasha/voice-agent-sdk
```

## Quick Start

### 1. Wrap your app with VoiceAgentProvider (optional but recommended for shared state)
While the hook can be used standalone, wrapping your app allows for potential future global state management.
```jsx
import { VoiceAgentProvider } from '@tamasha/voice-agent-sdk';

function App() {
  return (
    <VoiceAgentProvider>
      <YourComponent />
    </VoiceAgentProvider>
  );
}
```

### 2. Use the useVoiceAgent Hook
The core of the SDK is the `useVoiceAgent` hook. It manages the entire lifecycle of the voice session.
```jsx
import React from 'react';
import { useVoiceAgent } from '@tamasha/voice-agent-sdk';

const VoiceAssistant = () => {
  const {
    status,       // 'idle' | 'connecting' | 'connected' | 'speaking' | 'listening' | 'processing' | 'ended'
    isConnected,  // boolean
    isAiSpeaking, // boolean
    duration,     // number (seconds)
    audioRef,     // React ref for the audio element
    start,        // Function to start the session
    stop,         // Function to stop the session
    error         // Error object if something goes wrong
  } = useVoiceAgent({
    // Required configuration
    agentType: 'user-insights',         // The type of agent to load
    ephemeralKey: 'YOUR_EPHEMERAL_KEY', // From your backend
    token: 'YOUR_USER_TOKEN',           // Auth token
    workspaceId: 'YOUR_WORKSPACE_ID',
    playerId: 'CURRENT_USER_ID',
    sessionId: 'UNIQUE_SESSION_ID',     // Or auto-generated by backend
    appVersion: '1.0.0',
    environment: 'development',         // 'development' | 'staging' | 'production'

    // Optional
    name: 'User Name',                  // For personalized greetings
    language: 'english',                // Preferred language
    webhook: {                          // Webhook for session completion events
      url: 'https://your-api.com/webhook',
      events: ['call.ended']
    }
  });

  return (
    <div>
      <h2>Status: {status}</h2>
      <p>Duration: {duration}s</p>

      {/* Hidden audio element for playback */}
      <audio ref={audioRef} autoPlay />

      <div style={{ gap: '10px', display: 'flex' }}>
        <button onClick={start} disabled={isConnected}>
          Start Conversation
        </button>
        <button onClick={stop} disabled={!isConnected}>
          End Conversation
        </button>
      </div>

      {error && <p style={{ color: 'red' }}>Error: {error.message}</p>}
    </div>
  );
};
```

## Available Agents
### `user-insights`
An agent designed to collect user profile information through natural conversation. It intelligently gathers data across multiple categories:
- Personal Info: Name, age, occupation, etc.
- Interests: Hobbies and social context.
- Preferences: Host matching preferences.
- Experience: Usage of similar apps.
- Safety: Monitors for risk flags (NSFW, etc.).
**Configuration:** set `agentType: 'user-insights'` in the `useVoiceAgent` config.
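The categories above could be modeled with a record shape like the following. This is an illustrative sketch only; `UserInsights` and its field names are assumptions, not types exported by the SDK:

```typescript
// Hypothetical shape for the data the user-insights agent gathers.
// All names here are illustrative; the SDK does not export these types.
interface UserInsights {
  personalInfo: { name?: string; age?: number; occupation?: string };
  interests: string[];    // hobbies and social context
  preferences: string[];  // host matching preferences
  experience: string[];   // similar apps the user has tried
  safetyFlags: string[];  // risk flags surfaced during the call (e.g. NSFW)
}

// Example of a partially filled record mid-conversation:
const insights: UserInsights = {
  personalInfo: { name: 'Ada', occupation: 'engineer' },
  interests: ['chess', 'live audio rooms'],
  preferences: [],
  experience: ['Clubhouse'],
  safetyFlags: [],
};
```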
## API Reference

### VoiceAgentConfig
| Property | Type | Required | Description |
|----------|------|----------|-------------|
| agentType | string | Yes | Identifier for the agent behavior (e.g., 'user-insights'). |
| ephemeralKey | string | Yes | Temporary key for Azure OpenAI connection. |
| token | string | Yes | Authentication token for backend API calls. |
| workspaceId | string | Yes | Your workspace identifier. |
| playerId | string | Yes | Unique ID of the current user. |
| sessionId | string | Yes | Unique session identifier for this call. |
| appVersion | string | Yes | Version of your application. |
| environment | 'development' \| 'staging' \| 'production' | Yes | Target backend environment. |
| name | string | No | User's name for personalized interactions. |
| language | string | No | Preferred language for the agent. |
| webhook | WebhookConfig | No | Configuration for server-side event webhooks. |
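Since eight of the fields above are required, a quick runtime check before calling the hook can catch missing values early. This helper is an illustrative addition, not part of the SDK:

```typescript
// Required keys per the VoiceAgentConfig table above.
const REQUIRED_KEYS = [
  'agentType', 'ephemeralKey', 'token', 'workspaceId',
  'playerId', 'sessionId', 'appVersion', 'environment',
] as const;

// Returns the names of any required fields that are missing or empty.
function missingConfigKeys(config: Record<string, unknown>): string[] {
  return REQUIRED_KEYS.filter((key) => {
    const value = config[key];
    return value === undefined || value === null || value === '';
  });
}
```

You might call this with your assembled config and surface the result before `start()` is ever invoked, rather than letting the session fail mid-connection.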
### VoiceAgentReturn

The object returned by `useVoiceAgent`:

- `status`: Current state of the agent.
- `isConnected`: Boolean helper for connection state.
- `isAiSpeaking`: Boolean indicating whether the agent is currently talking.
- `duration`: Duration of the current session in seconds.
- `sessionId`: The active session ID.
- `error`: Any error encountered during the session.
- `start()`: Async function to initialize the connection and recording.
- `stop()`: Async function to end the session, upload the recording, and disconnect.
- `audioRef`: React `RefObject<HTMLAudioElement>` that must be attached to an `<audio>` element in your UI for sound to work.
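Because `duration` is a raw second count, a small formatter (illustrative, not provided by the SDK) makes it friendlier to render than the plain `{duration}s` used in the example above:

```typescript
// Format a second count as m:ss, e.g. 65 -> "1:05".
function formatDuration(totalSeconds: number): string {
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = Math.floor(totalSeconds % 60);
  return `${minutes}:${String(seconds).padStart(2, '0')}`;
}
```

In the component you would render `<p>Duration: {formatDuration(duration)}</p>` instead of the raw value.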
## How It Works

1. **Initialization**: The hook initializes the WebRTC manager and Tool Executor based on the `agentType`.
2. **Connection**: `start()` establishes a WebRTC connection to Azure OpenAI using the `ephemeralKey`.
3. **Conversation**:
   - Audio is streamed bi-directionally.
   - The AI uses "Tools" (function calling) to save data to your backend (`/voice-api/user-insights`).
   - The SDK handles all tool execution automatically.
4. **Completion**:
   - `stop()` or the AI's `end_call` tool triggers the shutdown sequence.
   - The recording is stopped and uploaded.
   - A final summary event is sent to the backend.
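On the receiving side, the `call.ended` webhook configured earlier can be routed through a small dispatcher. The payload shape below (`event`, `sessionId`) is an assumption for illustration; inspect the events actually delivered to your endpoint for the real fields:

```typescript
// Hypothetical webhook payload; field names are illustrative assumptions.
interface WebhookEvent {
  event: string; // e.g. "call.ended"
  sessionId?: string;
}

// Parses the raw request body and returns a description of what was done.
// A real handler would persist the session outcome instead of returning a string.
function handleWebhookEvent(rawBody: string): string {
  let parsed: WebhookEvent;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    return 'ignored: invalid JSON';
  }
  if (parsed.event === 'call.ended') {
    return `call ended: ${parsed.sessionId ?? 'unknown session'}`;
  }
  return `ignored: ${parsed.event}`;
}
```

Keeping the parsing logic a pure function like this makes it easy to unit-test independently of whatever HTTP framework hosts the endpoint.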
## Development

To build the SDK locally:

```bash
npm install
npm run build
```

The output will be in the `dist/` folder.
