@voxdiscover/voiceserver

v0.1.0

Published

12 days ago

Framework-agnostic TypeScript SDK for Voxdiscover voice agents

max_voxdiscover

voice ai sdk webrtc daily voxdiscover voice-agent

@voxdiscover/voiceserver

Framework-agnostic TypeScript SDK for Voice_server voice agents. Provides session token authentication, Daily.js WebRTC integration, and typed events for building voice-enabled applications.

Installation

npm install @voxdiscover/voiceserver @daily-co/daily-js
# or
pnpm add @voxdiscover/voiceserver @daily-co/daily-js
# or
yarn add @voxdiscover/voiceserver @daily-co/daily-js

Note: @daily-co/daily-js is a peer dependency and must be installed separately.

Quick Start

1. Obtain Session Token

First, obtain a session token from your backend (which calls Voice_server's session API):

// Your backend endpoint
const response = await fetch('https://voiceserver.voxdiscover.com/api/voice-session', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ userId: 'user_123' }),
});

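// fetch() only rejects on network failure, so check the HTTP status explicitly
if (!response.ok) {
  throw new Error(`Session request failed with status ${response.status}`);
}

const { token } = await response.json();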
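The snippet above destructures `token` straight out of the response body. As a minimal sketch, assuming the endpoint returns a JSON object of the form `{ token: string }` as shown, the body could be validated before it reaches the SDK. `parseSessionResponse` is a hypothetical helper for illustration and is not part of `@voxdiscover/voiceserver`.

```typescript
// Hypothetical helper (not part of the SDK): validates the JSON body
// returned by the session endpoint and extracts the token string.
function parseSessionResponse(body: unknown): string {
  if (typeof body !== 'object' || body === null) {
    throw new Error('Session response is not a JSON object');
  }
  const token = (body as { token?: unknown }).token;
  if (typeof token !== 'string' || token.length === 0) {
    throw new Error('Session response is missing a "token" string');
  }
  return token;
}
```

Usage would look like `const token = parseSessionResponse(await response.json());`, which turns a malformed backend response into an explicit error instead of an `undefined` token.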

2. Initialize SDK

import { VoiceAgent } from '@voxdiscover/voiceserver';

const agent = new VoiceAgent({ token });

3. Subscribe to Events

// Connection state changes
agent.on('connection:state', (state) => {
  console.log('Connection state:', state);
  // states: 'connecting' | 'connected' | 'reconnecting' | 'disconnected' | 'failed'
});

// Transcripts (streaming)
agent.on('transcript:interim', ({ text, speaker }) => {
  console.log(`[interim] ${speaker}: ${text}`);
});

agent.on('transcript:final', ({ text, speaker }) => {
  console.log(`[final] ${speaker}: ${text}`);
});

// Errors
agent.on('connection:error', (error) => {
  console.error('Connection error:', error.message);
  if (error.context?.suggestion) {
    console.log('Suggestion:', error.context.suggestion);
  }
});
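Because interim transcripts are partial and final transcripts arrive once per turn, a UI usually wants to overwrite the pending line rather than append to it. The following is a rough sketch of one way to do that. `TranscriptLog` and `TranscriptLine` are hypothetical names, and the sketch assumes `speaker` is a string, which the examples above suggest but do not state.

```typescript
// Hypothetical line shape for a chat view; `text` and `speaker` mirror the
// fields destructured in the transcript handlers above.
interface TranscriptLine {
  speaker: string;
  text: string;
  final: boolean;
}

// Hypothetical helper: keeps at most one pending (interim) line per speaker
// and marks it final when the completed turn arrives.
class TranscriptLog {
  private lines: TranscriptLine[] = [];

  addInterim(speaker: string, text: string): void {
    const pending = this.lines.find((l) => l.speaker === speaker && !l.final);
    if (pending) {
      pending.text = text; // overwrite the partial text in place
    } else {
      this.lines.push({ speaker, text, final: false });
    }
  }

  addFinal(speaker: string, text: string): void {
    const pending = this.lines.find((l) => l.speaker === speaker && !l.final);
    if (pending) {
      pending.text = text;
      pending.final = true;
    } else {
      // Some agent types emit no interim events, so a final may arrive alone
      this.lines.push({ speaker, text, final: true });
    }
  }

  // Returns copies so callers cannot mutate internal state
  snapshot(): TranscriptLine[] {
    return this.lines.map((l) => ({ ...l }));
  }
}
```

The two methods could be wired to `transcript:interim` and `transcript:final` respectively, with `snapshot()` feeding the render step.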

4. Connect to Voice Agent

try {
  await agent.connect();
  console.log('Connected! State:', agent.state);
} catch (error) {
  console.error('Failed to connect:', error);
}

5. Control Audio

// Mute microphone
agent.mute();

// Unmute microphone
agent.unmute();
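Since the SDK exposes separate `mute()` and `unmute()` calls, a single mute button needs to remember which one to call next. Here is a small sketch of a toggle built on those two methods only. `createMuteToggle` and the `Mutable` interface are hypothetical, and the sketch assumes the microphone starts unmuted unless told otherwise.

```typescript
// Structural type: only the two documented methods are required, so any
// object with mute()/unmute() (including a VoiceAgent) satisfies it.
interface Mutable {
  mute(): void;
  unmute(): void;
}

// Hypothetical helper: tracks mute state locally and returns the new state
// (true = muted) each time the toggle is invoked.
function createMuteToggle(agent: Mutable, initiallyMuted = false): () => boolean {
  let muted = initiallyMuted;
  return () => {
    if (muted) {
      agent.unmute();
    } else {
      agent.mute();
    }
    muted = !muted;
    return muted;
  };
}
```

A click handler could then call the returned function and use its boolean result to update the button label.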

6. Disconnect

await agent.disconnect();

API Reference

`VoiceAgent`

Main SDK class for managing voice conversations.

Constructor

new VoiceAgent(config: VoiceAgentConfig)

Config options:

token (required): Session token from backend
baseUrl (optional): Backend base URL for validation (default: https://voiceserver.voxdiscover.com)
reconnection (optional):
- enabled (default: true): Enable automatic reconnection
- maxAttempts (default: 5): Max reconnection attempts
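To make the documented defaults concrete, the sketch below resolves a partial config into a fully populated one. It only illustrates the defaults listed above; the interface names and `resolveConfig` are hypothetical, and the SDK's own `VoiceAgentConfig` type and internal resolution logic may differ.

```typescript
// Sketch types: they mirror the documented options, not the SDK's declarations.
interface ReconnectionOptions {
  enabled?: boolean;
  maxAttempts?: number;
}

interface AgentConfigSketch {
  token: string;
  baseUrl?: string;
  reconnection?: ReconnectionOptions;
}

interface ResolvedAgentConfig {
  token: string;
  baseUrl: string;
  reconnection: { enabled: boolean; maxAttempts: number };
}

// Hypothetical helper: fills in the defaults listed in the config options.
function resolveConfig(config: AgentConfigSketch): ResolvedAgentConfig {
  return {
    token: config.token,
    baseUrl: config.baseUrl ?? 'https://voiceserver.voxdiscover.com',
    reconnection: {
      enabled: config.reconnection?.enabled ?? true,
      maxAttempts: config.reconnection?.maxAttempts ?? 5,
    },
  };
}
```

Using `??` rather than `||` matters here: an explicit `enabled: false` must survive, and only `undefined` should fall back to the default.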

Properties

state: Current connection state (read-only)
- 'connecting' - Establishing connection
- 'connected' - Successfully connected
- 'reconnecting' - Attempting to reconnect
- 'disconnected' - Not connected
- 'failed' - Connection failed
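The five states above map naturally onto user-facing status text. The sketch below redeclares the state union locally so that it is self-contained (in an application you would import `ConnectionState` from the package instead) and uses an exhaustive `switch` so the compiler flags any state that is not handled. `describeState` and the label strings are illustrative only.

```typescript
// Local copy of the documented states, kept here so the sketch stands alone.
type ConnectionState =
  | 'connecting'
  | 'connected'
  | 'reconnecting'
  | 'disconnected'
  | 'failed';

// Hypothetical helper: maps a connection state to a status label.
function describeState(state: ConnectionState): string {
  switch (state) {
    case 'connecting':
      return 'Connecting...';
    case 'connected':
      return 'Connected';
    case 'reconnecting':
      return 'Connection lost, retrying...';
    case 'disconnected':
      return 'Disconnected';
    case 'failed':
      return 'Connection failed';
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a new
      // state is added to the union without a matching case.
      const unreachable: never = state;
      throw new Error(`Unhandled state: ${String(unreachable)}`);
    }
  }
}
```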

Methods

connect(): Promise<void> - Connect to voice session (validates token, joins Daily room, starts remote audio)
disconnect(): Promise<void> - Disconnect and cleanup resources (leaves room, releases audio elements)
mute(): void - Mute microphone
unmute(): void - Unmute microphone

Events

Subscribe to events using agent.on(event, callback):

Connection events:

connection:state - (state: ConnectionState) => void
connection:error - (error: VoiceAgentError) => void

Transcript events:

transcript:interim - (data: TranscriptData) => void - Partial transcripts (not emitted by all agent types)
transcript:final - (data: TranscriptData) => void - One event per completed turn; emitted in real-time as each user or agent turn finishes

Audio events:

audio:muted - () => void
audio:unmuted - () => void

Session events:

session:expiring - (expiresIn: number) => void - Emitted 5 minutes before the session expires

Error Handling

The SDK provides typed error classes for different scenarios:

import {
  VoiceAgentError,
  TokenExpiredError,
  TokenInvalidError,
  ConnectionFailedError,
  PermissionDeniedError,
} from '@voxdiscover/voiceserver';

try {
  await agent.connect();
} catch (error) {
  // Pattern 1: instanceof checks
  if (error instanceof TokenExpiredError) {
    console.log('Token expired, requesting new session...');
    // Request new token from backend
  } else if (error instanceof PermissionDeniedError) {
    console.log('Microphone permission denied');
    // Show permission request UI
  }

  // Pattern 2: code property checks
  // (caught values are `unknown` in strict TypeScript, so narrow to the base class first)
  if (error instanceof VoiceAgentError) {
    if (error.code === 'CONNECTION_FAILED' && error.retryable) {
      console.log('Retryable error, will auto-reconnect');
    }

    // Access error details
    console.log('Message:', error.message);
    console.log('Suggestion:', error.context?.suggestion);
    console.log('Retryable:', error.retryable);
  }
}

Error types:

TokenExpiredError - Session token expired (non-retryable)
TokenInvalidError - Token malformed or invalid (non-retryable)
ConnectionFailedError - WebRTC connection failed (retryable)
PermissionDeniedError - Microphone permission denied (non-retryable)
NetworkError - Network error during API call (retryable)
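When errors cross a module boundary or come from a `catch` clause, `instanceof` checks are not always convenient. As a sketch, a structural type guard can test for the `code`, `message`, and `retryable` properties used in the examples above. `SdkErrorShape`, `isSdkErrorShape`, and `shouldRetry` are hypothetical names, and the guard assumes every SDK error carries those three properties.

```typescript
// Assumed shape, based on the properties read in the error-handling example.
interface SdkErrorShape {
  code: string;
  message: string;
  retryable: boolean;
}

// Hypothetical type guard: true for any value that looks like an SDK error.
function isSdkErrorShape(error: unknown): error is SdkErrorShape {
  if (typeof error !== 'object' || error === null) {
    return false;
  }
  const e = error as Record<string, unknown>;
  return (
    typeof e['code'] === 'string' &&
    typeof e['message'] === 'string' &&
    typeof e['retryable'] === 'boolean'
  );
}

// Hypothetical helper: anything that is not a recognizable SDK error is
// treated as non-retryable.
function shouldRetry(error: unknown): boolean {
  return isSdkErrorShape(error) && error.retryable;
}
```

Treating unknown values as non-retryable is a deliberately conservative choice: it avoids retry loops on programming errors such as a `TypeError`.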

Complete Example

import { VoiceAgent, TokenExpiredError } from '@voxdiscover/voiceserver';

async function startVoiceCall() {
  // 1. Get session token from your backend
  const { token } = await fetch('/api/voice-session', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ agentId: 'support-agent' }),
  }).then(r => r.json());

  // 2. Initialize agent
  const agent = new VoiceAgent({
    token,
    baseUrl: 'https://voiceserver.voxdiscover.com',
    reconnection: { enabled: true, maxAttempts: 5 },
  });

  // 3. Set up event listeners
  agent.on('connection:state', (state) => {
    updateUI({ connectionState: state });
  });

  agent.on('transcript:final', ({ text, speaker }) => {
    addMessageToChat({ speaker, text });
  });

  agent.on('connection:error', async (error) => {
    if (error instanceof TokenExpiredError) {
      // Clean up this agent, then start over: startVoiceCall() fetches a
      // fresh token in step 1 and creates a new agent with it
      await agent.disconnect();
      await startVoiceCall();
    } else {
      showError(error.message, error.context?.suggestion);
    }
  });

  // 4. Connect
  try {
    await agent.connect();
    showUI('connected');
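  } catch (error) {
    // Caught values are `unknown` in strict TypeScript, so narrow before reading .message
    showUI('error', error instanceof Error ? error.message : String(error));
  }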

  return agent;
}

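// Usage in UI event handlers
// Register each handler once: registering the mute and end-call handlers
// inside the start-call handler would stack duplicates on every new call
let currentAgent: VoiceAgent | undefined;

document.getElementById('startCall')?.addEventListener('click', async () => {
  currentAgent = await startVoiceCall();
});

document.getElementById('muteBtn')?.addEventListener('click', () => {
  currentAgent?.mute();
});

document.getElementById('endCall')?.addEventListener('click', async () => {
  await currentAgent?.disconnect();
  currentAgent = undefined;
});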

Analytics Hooks

The SDK provides standardized analytics hooks for integrating with observability platforms like Segment, DataDog, or PostHog. Analytics hooks emit lifecycle and error events only (not transcripts or audio events) to keep analytics data clean.

Registering an Analytics Callback

import { VoiceAgent } from '@voxdiscover/voiceserver';

const agent = new VoiceAgent({ token });

agent.onAnalyticsEvent((event) => {
  console.log('Analytics event:', event.eventType, {
    sessionId: event.sessionId,
    agentId: event.agentId,
    userId: event.userId,
    timestamp: event.timestamp,
  });
});

Event Types

| Event Type | When Emitted |
|------------|-------------|
| session_started | WebRTC connection established (joined Daily room) |
| session_ended | Session disconnected (explicit disconnect) |
| connection_failed | Connection error (token invalid, network failure, etc.) |
| agent_swap_completed | Agent hot-swap completed successfully |
| agent_swap_failed | Agent hot-swap failed |
| error | Categorized SDK error requiring developer attention |

Event Payload Structure

interface AnalyticsEvent {
  timestamp: number;        // Unix timestamp in milliseconds
  eventType: string;        // One of the event types above
  sessionId: string;        // Session identifier from token
  agentId?: string;         // Agent identifier from token
  userId?: string;          // User identifier from session context
  customContext?: Record<string, any>;  // Context from session creation
  error?: {
    code: string;           // Programmatic error code
    message: string;        // Human-readable error description
    retryable: boolean;     // Whether operation can be retried
  };
}
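The payload declares `eventType` as a plain `string`, so downstream code gets no compile-time help when switching on it. As a sketch, the six event types from the table can be captured in a literal union with a runtime guard. `ANALYTICS_EVENT_TYPES`, `AnalyticsEventType`, and `isAnalyticsEventType` are hypothetical names, and the list reflects only the event types documented here.

```typescript
// The six event types documented in the Event Types table.
const ANALYTICS_EVENT_TYPES = [
  'session_started',
  'session_ended',
  'connection_failed',
  'agent_swap_completed',
  'agent_swap_failed',
  'error',
] as const;

// Literal union derived from the array, so the two cannot drift apart.
type AnalyticsEventType = (typeof ANALYTICS_EVENT_TYPES)[number];

// Hypothetical guard: narrows an arbitrary string to a documented event type.
function isAnalyticsEventType(value: string): value is AnalyticsEventType {
  return (ANALYTICS_EVENT_TYPES as readonly string[]).includes(value);
}
```

Inside an analytics callback, `if (isAnalyticsEventType(event.eventType)) { ... }` would narrow the type and let unrecognized values from a newer SDK version be logged or ignored explicitly.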

Integration with Segment

import Analytics from 'analytics';
import segmentPlugin from '@analytics/segment';

// Initialize Segment
const analytics = Analytics({
  app: 'my-voice-app',
  plugins: [
    segmentPlugin({
      writeKey: 'YOUR_SEGMENT_WRITE_KEY',
    }),
  ],
});

// Register analytics callback
const agent = new VoiceAgent({ token });

agent.onAnalyticsEvent((event) => {
  analytics.track(event.eventType, {
    session_id: event.sessionId,
    agent_id: event.agentId,
    user_id: event.userId,
    timestamp: event.timestamp,
    // Error details (only present on failure events)
    ...(event.error && {
      error_code: event.error.code,
      error_message: event.error.message,
      error_retryable: event.error.retryable,
    }),
    // Custom context (from session creation)
    ...event.customContext,
  });
});

await agent.connect();

Integration with DataDog / PostHog

Any analytics platform that accepts key-value event properties works the same way:

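// Assumes both browser SDKs are installed and already initialized elsewhere
import posthog from 'posthog-js';
import { datadogRum } from '@datadog/browser-rum';

agent.onAnalyticsEvent((event) => {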
  // PostHog example
  posthog.capture(event.eventType, {
    distinct_id: event.userId,
    session_id: event.sessionId,
    agent_id: event.agentId,
    $timestamp: new Date(event.timestamp).toISOString(),
  });

  // DataDog example
  datadogRum.addAction(event.eventType, {
    session_id: event.sessionId,
    user_id: event.userId,
  });
});

Multiple Callbacks

Multiple callbacks can be registered - all receive every event:

// Log to console
agent.onAnalyticsEvent((event) => {
  console.log('[Voice Analytics]', event.eventType, event.sessionId);
});

// Send to Segment
agent.onAnalyticsEvent((event) => {
  analytics.track(event.eventType, { session_id: event.sessionId });
});

// Send to custom backend
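agent.onAnalyticsEvent((event) => {
  fetch('/api/analytics', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(event),
  }).catch((err) => console.warn('Analytics upload failed:', err)); // avoid unhandled rejections
});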
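With several callbacks registered, it can be worth making sure a bug in one analytics sink cannot surface as an exception inside the SDK's event dispatch. Whether the SDK isolates callback errors itself is not documented here, so the wrapper below is a defensive sketch. `safeCallback` is a hypothetical helper, and it only handles synchronous exceptions.

```typescript
// Hypothetical wrapper: runs `fn` and routes any synchronous exception to
// `onError` instead of letting it propagate to the caller.
function safeCallback<T>(
  fn: (event: T) => void,
  onError: (err: unknown) => void = () => {},
): (event: T) => void {
  return (event: T) => {
    try {
      fn(event);
    } catch (err) {
      onError(err);
    }
  };
}
```

A registration would then read `agent.onAnalyticsEvent(safeCallback((event) => analytics.track(event.eventType), console.warn));`. Rejected promises still need their own `.catch`.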

IMPORTANT: Read-Only Callbacks

Analytics callbacks MUST be read-only. Do NOT call SDK methods (connect, disconnect, mute, etc.) inside a callback. Calling SDK methods from within an analytics callback creates a circular event chain that triggers the circuit breaker and disables analytics for the remainder of the session.

Do NOT do this:

// WRONG: Calling SDK methods inside analytics callback
agent.onAnalyticsEvent((event) => {
  if (event.eventType === 'session_ended') {
    agent.connect(); // This will trigger another analytics event -> infinite loop
  }
});

Do this instead:

// CORRECT: React to events outside the callback
agent.on('connection:state', (state) => {
  if (state === 'disconnected') {
    handleReconnect(); // Handle reconnection in event listener, not analytics callback
  }
});

agent.onAnalyticsEvent((event) => {
  // Read-only: only forward events to external services
  myAnalytics.track(event.eventType, { session_id: event.sessionId });
});
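To see why a callback that calls back into the SDK is a problem, it may help to look at how a re-entrancy guard of this kind can work in general. The sketch below is not the SDK's implementation: `GuardedEmitter`, its depth limit of 3, and its trip behavior are all assumptions chosen for illustration.

```typescript
// Illustrative only: a tiny emitter that disables itself when callbacks
// re-enter emit() too deeply, which is what a circular event chain does.
class GuardedEmitter<T> {
  private callbacks: Array<(event: T) => void> = [];
  private depth = 0;
  private tripped = false;
  private readonly maxDepth: number;

  constructor(maxDepth = 3) {
    this.maxDepth = maxDepth;
  }

  subscribe(callback: (event: T) => void): void {
    this.callbacks.push(callback);
  }

  emit(event: T): void {
    if (this.tripped) {
      return; // breaker is open: events are dropped from now on
    }
    if (this.depth >= this.maxDepth) {
      this.tripped = true; // nested too deeply: assume a circular chain
      return;
    }
    this.depth++;
    try {
      for (const callback of this.callbacks) {
        callback(event);
      }
    } finally {
      this.depth--;
    }
  }

  get disabled(): boolean {
    return this.tripped;
  }
}
```

In this sketch a read-only callback never increases the nesting depth, so the guard stays closed. A callback that triggers another emit does so on every invocation, reaches the limit almost immediately, and loses all later events.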

TypeScript Support

The SDK is written in TypeScript and includes full type definitions. All types are exported:

import type {
  VoiceAgentConfig,
  ConnectionState,
  TranscriptData,
  VoiceAgentEvents,
  VoiceAgentErrorCode,
} from '@voxdiscover/voiceserver';

Internal Architecture Notes

Headless Mode Audio

VoiceAgent uses DailyIframe.createCallObject() (headless mode), which does not auto-play remote audio. The SDK manages this internally: a track-started handler creates an <audio> element per remote participant and attaches the incoming track to it. No additional setup is needed in your application.

Transcript Delivery

Transcripts are streamed in real-time over Daily's app-message channel. Each completed turn (user or agent) triggers one transcript:final event. The server uses Pipecat's OutputTransportMessageFrame API to broadcast each turn to all room participants.

Browser Support

Chrome 90+
Firefox 88+
Safari 14+
Edge 90+

Requirements:

WebRTC support
getUserMedia API support
ES2022 features
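The first two requirements can be checked at runtime before offering a call button. The sketch below takes the global object as a parameter so that it can be exercised outside a browser. `BrowserGlobals` and `missingFeatures` are hypothetical names, and ES2022 support is not covered because it is a syntax-level requirement that cannot be reliably probed this way.

```typescript
// Minimal view of the globals this check reads; a real `window` has far more.
interface BrowserGlobals {
  RTCPeerConnection?: unknown;
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
}

// Hypothetical helper: returns the names of required features that are absent.
function missingFeatures(env: BrowserGlobals): string[] {
  const missing: string[] = [];
  if (typeof env.RTCPeerConnection !== 'function') {
    missing.push('WebRTC (RTCPeerConnection)');
  }
  if (typeof env.navigator?.mediaDevices?.getUserMedia !== 'function') {
    missing.push('getUserMedia');
  }
  return missing;
}
```

In a browser this could be called as `missingFeatures(window as unknown as BrowserGlobals)`. Note that `navigator.mediaDevices` is only exposed in secure contexts (HTTPS or localhost), so a non-empty result on a supported browser often points to the page being served over plain HTTP.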

License

MIT
