Hamsa Voice Agents Web SDK
Hamsa Voice Agents Web SDK is a JavaScript library for integrating voice agents from https://dashboard.tryhamsa.com. This SDK provides a seamless way to incorporate voice interactions into your web applications with high-quality real-time audio communication.
Installation
Install the SDK via npm:
npm i @hamsa-ai/voice-agents-sdk
Usage
Using via npm
First, import the package in your code:
import { HamsaVoiceAgent } from "@hamsa-ai/voice-agents-sdk";
Initialize the SDK with your API key:
const agent = new HamsaVoiceAgent(API_KEY);
Using via CDN
Include the script from a CDN:
<script src="https://unpkg.com/@hamsa-ai/voice-agents-sdk@LATEST_VERSION/dist/index.umd.js"></script>
Then, you can initialize the agent like this:
const agent = new HamsaVoiceAgent("YOUR_API_KEY");
agent.on("callStarted", () => {
console.log("Conversation has started!");
});
// Example: Start a call
// agent.start({ agentId: 'YOUR_AGENT_ID' });
Make sure to replace LATEST_VERSION with the actual latest version number.
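For example, to pin the version published at the time of writing:
<script src="https://unpkg.com/@hamsa-ai/voice-agents-sdk@0.4.6/dist/index.umd.js"></script>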
Start a Conversation with an Existing Agent
Start a conversation with an existing agent by calling the "start" function. You can create and manage agents in our Dashboard or using our API (see: https://docs.tryhamsa.com):
agent.start({
agentId: YOUR_AGENT_ID,
params: {
param1: "NAME",
param2: "NAME2",
},
voiceEnablement: true,
userId: "user-123", // Optional user tracking
preferHeadphonesForIosDevices: true, // iOS audio optimization
connectionDelay: {
android: 3000, // 3 second delay for Android
ios: 0,
default: 0,
},
});
When creating an agent, you can reference parameters in your pre-defined values. For example, set your Greeting Message to "Hello {{name}}, how can I help you today?" and pass "name" as a parameter so the agent greets the user by name.
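For instance, with that greeting template, a call might look like this (the agent ID and parameter value are placeholders):
agent.start({
  agentId: "YOUR_AGENT_ID",
  params: {
    name: "Sara", // substituted into {{name}} in the Greeting Message
  },
  voiceEnablement: true,
});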
Pause/Resume a Conversation
To pause the conversation, call the "pause" function. This will prevent the SDK from sending or receiving new data until you resume the conversation:
agent.pause();
To resume the conversation:
agent.resume();
End a Conversation
To end a conversation, simply call the "end" function:
agent.end();
Advanced Audio Controls
The SDK provides comprehensive audio control features for professional voice applications:
Volume Management
// Set agent voice volume (0.0 to 1.0)
agent.setVolume(0.8);
// Get current output volume
const currentVolume = agent.getOutputVolume();
console.log(`Volume: ${Math.round(currentVolume * 100)}%`);
// Get user microphone input level
const inputLevel = agent.getInputVolume();
if (inputLevel > 0.1) {
showUserSpeakingIndicator();
}
Microphone Control
// Mute/unmute microphone
agent.setMicMuted(true); // Mute
agent.setMicMuted(false); // Unmute
// Check mute status
if (agent.isMicMuted()) {
showUnmutePrompt();
}
// Toggle microphone
const currentMuted = agent.isMicMuted();
agent.setMicMuted(!currentMuted);
// Listen for microphone events
agent.on('micMuted', () => {
document.getElementById('micButton').classList.add('muted');
});
agent.on('micUnmuted', () => {
document.getElementById('micButton').classList.remove('muted');
});
Audio Visualization
Create real-time audio visualizers using frequency data:
// Input visualizer (user's microphone)
function createInputVisualizer() {
const canvas = document.getElementById('inputVisualizer');
const ctx = canvas.getContext('2d');
function draw() {
const frequencyData = agent.getInputByteFrequencyData();
ctx.clearRect(0, 0, canvas.width, canvas.height);
const barWidth = canvas.width / frequencyData.length;
for (let i = 0; i < frequencyData.length; i++) {
const barHeight = (frequencyData[i] / 255) * canvas.height;
ctx.fillStyle = `hsl(${i * 2}, 70%, 60%)`;
ctx.fillRect(i * barWidth, canvas.height - barHeight, barWidth, barHeight);
}
requestAnimationFrame(draw);
}
draw();
}
// Output visualizer (agent's voice)
function createOutputVisualizer() {
const canvas = document.getElementById('outputVisualizer');
const ctx = canvas.getContext('2d');
agent.on('speaking', () => {
function draw() {
const frequencyData = agent.getOutputByteFrequencyData();
if (frequencyData.length > 0) {
ctx.clearRect(0, 0, canvas.width, canvas.height);
// Draw voice characteristics
for (let i = 0; i < frequencyData.length; i++) {
const barHeight = (frequencyData[i] / 255) * canvas.height;
ctx.fillStyle = `hsl(${240 + i}, 70%, 60%)`;
ctx.fillRect(i * 2, canvas.height - barHeight, 2, barHeight);
}
requestAnimationFrame(draw);
}
}
draw();
});
}
Audio Capture
Capture raw audio data from the agent or user for forwarding to third-party services, custom recording, or advanced audio processing.
The SDK provides three levels of API for different use cases:
Level 1: Simple Callback (Recommended for Most Users)
The easiest way is to pass a callback to start():
// Dead simple - captures agent audio automatically
await agent.start({
agentId: 'agent-123',
voiceEnablement: true,
onAudioData: (audioData) => {
// Send to third-party service
thirdPartyWebSocket.send(audioData);
}
});
This automatically:
- ✅ Captures agent audio only
- ✅ Uses opus-webm format (efficient, compressed)
- ✅ Delivers 100ms chunks (good balance of latency/efficiency)
- ✅ Starts immediately when call connects
- ✅ No timing issues or event handling needed
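Taken together, these defaults make basic recording straightforward. A minimal sketch that collects the opus-webm chunks and builds a downloadable Blob when the call ends (assuming each onAudioData chunk is an ArrayBuffer, per the format notes below):
const chunks = [];
await agent.start({
  agentId: "agent-123",
  voiceEnablement: true,
  onAudioData: (audioData) => chunks.push(audioData),
});
agent.on("callEnded", () => {
  const blob = new Blob(chunks, { type: "audio/webm;codecs=opus" });
  const url = URL.createObjectURL(blob);
  // e.g. point a download link's href at url
});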
Level 2: Inline Configuration
Need more control? Use captureAudio options:
await agent.start({
agentId: 'agent-123',
voiceEnablement: true,
captureAudio: {
source: 'both', // Capture both agent and user
format: 'pcm-f32', // Raw PCM for processing
bufferSize: 4096,
onData: (audioData, metadata) => {
if (metadata.source === 'agent') {
processAgentAudio(audioData);
} else {
processUserAudio(audioData);
}
}
}
});
Level 3: Dynamic Control
For advanced users who need runtime control:
// Start without capture
await agent.start({
agentId: 'agent-123',
voiceEnablement: true
});
// Enable capture later, conditionally
if (userWantsRecording) {
agent.enableAudioCapture({
source: 'agent',
format: 'opus-webm',
chunkSize: 100,
callback: (audioData, metadata) => {
thirdPartyWebSocket.send(audioData);
}
});
}
// Disable when done
agent.disableAudioCapture();
Audio Capture Formats
The SDK supports three audio formats:
opus-webm (default, recommended)
- Efficient Opus codec in WebM container
- Small file size, good quality
- Best for forwarding to services or recording
- audioData is an ArrayBuffer
pcm-f32
- Raw PCM audio as Float32Array
- Values range from -1.0 to 1.0
- Best for audio analysis or DSP
- audioData is a Float32Array
pcm-i16
- Raw PCM audio as Int16Array
- Values range from -32768 to 32767
- Best for compatibility with audio APIs
- audioData is an Int16Array
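If you need to move between the two PCM representations, the conversion is a simple rescale (a standalone sketch, not an SDK method):
// Convert pcm-i16 samples to the pcm-f32 range (-1.0 to 1.0)
function i16ToF32(samples) {
  const out = new Float32Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    out[i] = samples[i] / 32768;
  }
  return out;
}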
Common Use Cases
Forward agent audio to third-party service:
const socket = new WebSocket('wss://your-service.com/audio');
agent.enableAudioCapture({
source: 'agent',
format: 'opus-webm',
chunkSize: 100,
callback: (audioData, metadata) => {
socket.send(audioData);
}
});
Capture both agent and user audio:
agent.enableAudioCapture({
source: 'both',
format: 'opus-webm',
chunkSize: 100,
callback: (audioData, metadata) => {
if (metadata.source === 'agent') {
processAgentAudio(audioData);
} else {
processUserAudio(audioData);
}
}
});
Advanced: Custom audio analysis with PCM:
agent.enableAudioCapture({
source: 'agent',
format: 'pcm-f32',
bufferSize: 4096,
callback: (audioData, metadata) => {
const samples = audioData; // Float32Array
// Calculate RMS volume
let sum = 0;
for (let i = 0; i < samples.length; i++) {
sum += samples[i] * samples[i];
}
const rms = Math.sqrt(sum / samples.length);
console.log('Agent voice level:', rms);
// Apply custom DSP, analyze frequencies, etc.
customAudioProcessor.process(samples, metadata.sampleRate);
}
});
Real-time transcription:
const transcriptionWS = new WebSocket('wss://transcription-service.com');
agent.enableAudioCapture({
source: 'user',
format: 'opus-webm',
chunkSize: 50, // Lower latency
callback: (audioData, metadata) => {
transcriptionWS.send(JSON.stringify({
audio: Array.from(new Uint8Array(audioData)),
timestamp: metadata.timestamp,
participant: metadata.participant
}));
}
});
TypeScript support:
import { AudioCaptureOptions, AudioCaptureMetadata } from '@hamsa-ai/voice-agents-sdk';
const options: AudioCaptureOptions = {
source: 'agent',
format: 'pcm-f32',
bufferSize: 4096,
callback: (audioData: Float32Array | Int16Array | ArrayBuffer, metadata: AudioCaptureMetadata) => {
console.log('Audio captured:', {
participant: metadata.participant,
source: metadata.source, // 'agent' | 'user'
trackId: metadata.trackId,
timestamp: metadata.timestamp,
sampleRate: metadata.sampleRate, // For PCM formats
channels: metadata.channels, // For PCM formats
format: metadata.format
});
}
};
agent.enableAudioCapture(options);
Advanced Configuration Options
Platform-Specific Optimizations
agent.start({
agentId: "your-agent-id",
// Optimize audio for iOS devices
preferHeadphonesForIosDevices: true,
// Platform-specific delays to prevent audio cutoff
connectionDelay: {
android: 3000, // Android needs longer delay for audio mode switching
ios: 500, // Shorter delay for iOS
default: 1000 // Default for other platforms
},
// Disable wake lock for battery optimization
disableWakeLock: false,
// User tracking
userId: "customer-12345"
});
Events
During the conversation, the SDK emits events to update your application about the conversation status.
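Handlers follow the Node-style EventEmitter pattern (the SDK depends on the events package, listed under Dependencies), so, assuming the agent exposes the standard emitter methods, you can detach them during cleanup; a brief sketch:
const onStarted = () => console.log("Conversation has started!");
agent.on("callStarted", onStarted);
// Later, during teardown:
agent.removeListener("callStarted", onStarted);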
Conversation Status Events
agent.on("callStarted", () => {
console.log("Conversation has started!");
});
agent.on("callEnded", () => {
console.log("Conversation has ended!");
});
agent.on("callPaused", () => {
console.log("The conversation is paused");
});
agent.on("callResumed", () => {
console.log("Conversation has resumed");
});
Agent Status Events
agent.on("speaking", () => {
console.log("The agent is speaking");
});
agent.on("listening", () => {
console.log("The agent is listening");
});
// Unified agent state change event
agent.on("agentStateChanged", (state) => {
console.log("Agent state:", state);
// state can be: 'idle', 'initializing', 'listening', 'thinking', 'speaking'
});
Conversation Script Events
agent.on("transcriptionReceived", (text) => {
console.log("User speech transcription received", text);
});
agent.on("answerReceived", (text) => {
console.log("Agent answer received", text);
});
Error Events
agent.on("closed", () => {
console.log("Conversation was closed");
});
agent.on("error", (e) => {
console.log("Error was received", e);
});
Advanced Analytics Events
The SDK provides comprehensive analytics for monitoring call quality, performance, and custom agent events:
// Real-time connection quality updates
agent.on("connectionQualityChanged", ({ quality, participant, metrics }) => {
console.log(`Connection quality: ${quality}`, metrics);
});
// Periodic analytics updates (every second during calls)
agent.on("analyticsUpdated", (analytics) => {
console.log("Call analytics:", analytics);
// Contains: connectionStats, audioMetrics, performanceMetrics, etc.
});
// Participant events
agent.on("participantConnected", (participant) => {
console.log("Participant joined:", participant.identity);
});
agent.on("participantDisconnected", (participant) => {
console.log("Participant left:", participant.identity);
});
// Track subscription events (audio/video streams)
agent.on("trackSubscribed", ({ track, participant, trackStats }) => {
console.log("New track:", track.kind, "from", participant);
});
agent.on("trackUnsubscribed", ({ track, participant }) => {
console.log("Track ended:", track.kind, "from", participant);
});
// Connection state changes
agent.on("reconnecting", () => {
console.log("Attempting to reconnect...");
});
agent.on("reconnected", () => {
console.log("Successfully reconnected");
});
// Custom events from agents
agent.on("customEvent", (eventType, eventData, metadata) => {
console.log(`Custom event: ${eventType}`, eventData);
// Examples: flow_navigation, tool_execution, agent_state_change
});
Analytics & Monitoring
The SDK provides comprehensive real-time analytics for monitoring call quality, performance metrics, and custom agent events. Access analytics data through both synchronous methods and event-driven updates.
Analytics Architecture
The SDK uses a clean modular design with four specialized components:
- Connection Management: Handles room connections, participants, and network state
- Analytics Engine: Processes WebRTC statistics and performance metrics
- Audio Management: Manages audio tracks, volume control, and quality monitoring
- Tool Registry: Handles RPC method registration and client-side tool execution
Synchronous Analytics Methods
Get real-time analytics data instantly for dashboards and monitoring:
// Connection quality and network statistics
const connectionStats = agent.getConnectionStats();
console.log(connectionStats);
/*
{
quality: 'good', // Connection quality: excellent/good/poor/lost
connectionAttempts: 1, // Total connection attempts
reconnectionAttempts: 0, // Reconnection attempts
connectionEstablishedTime: 250, // Time to establish connection (ms)
isConnected: true // Current connection status
}
*/
// Audio levels and quality metrics
const audioLevels = agent.getAudioLevels();
console.log(audioLevels);
/*
{
userAudioLevel: 0.8, // Current user audio level
agentAudioLevel: 0.3, // Current agent audio level
userSpeakingTime: 30000, // User speaking duration (ms)
agentSpeakingTime: 20000, // Agent speaking duration (ms)
audioDropouts: 0, // Audio interruption count
echoCancellationActive: true,// Echo cancellation status
volume: 1.0, // Current volume setting
isPaused: false // Pause state
}
*/
// Performance metrics
const performance = agent.getPerformanceMetrics();
console.log(performance);
/*
{
responseTime: 1200, // Total response time
callDuration: 60000, // Current call duration (ms)
connectionEstablishedTime: 250, // Time to establish connection
reconnectionCount: 0, // Number of reconnections
averageResponseTime: 1200 // Average response time
}
*/
// Participant information
const participants = agent.getParticipants();
console.log(participants);
/*
[
{
identity: "agent",
sid: "participant-sid",
connectionTime: 1638360000000,
metadata: "agent-metadata"
}
]
*/
// Track statistics (audio/video streams)
const trackStats = agent.getTrackStats();
console.log(trackStats);
/*
{
totalTracks: 2,
activeTracks: 2,
audioElements: 1,
trackDetails: [
["track-id", { trackId: "track-id", kind: "audio", participant: "agent" }]
]
}
*/
// Complete analytics snapshot
const analytics = agent.getCallAnalytics();
console.log(analytics);
/*
{
connectionStats: { quality: 'good', connectionAttempts: 1, isConnected: true, ... },
audioMetrics: { userAudioLevel: 0.8, agentAudioLevel: 0.3, ... },
performanceMetrics: { callDuration: 60000, responseTime: 1200, ... },
participants: [{ identity: 'agent', sid: 'participant-sid', ... }],
trackStats: { totalTracks: 2, activeTracks: 2, ... },
callStats: { connectionAttempts: 1, packetsLost: 0, ... },
metadata: {
callStartTime: 1638360000000,
isConnected: true,
isPaused: false,
volume: 1.0
}
}
*/
Real-time Dashboard Example
Build live monitoring dashboards using the analytics data:
// Update dashboard every second
const updateDashboard = () => {
const stats = agent.getConnectionStats();
const audio = agent.getAudioLevels();
const performance = agent.getPerformanceMetrics();
// Update UI elements
document.getElementById("quality").textContent = stats.quality;
document.getElementById("attempts").textContent = stats.connectionAttempts;
document.getElementById("duration").textContent = `${Math.floor(
performance.callDuration / 1000
)}s`;
document.getElementById("user-audio").style.width = `${
audio.userAudioLevel * 100
}%`;
document.getElementById("agent-audio").style.width = `${
audio.agentAudioLevel * 100
}%`;
};
// Start dashboard updates when call begins
agent.on("callStarted", () => {
const dashboardInterval = setInterval(updateDashboard, 1000);
agent.on("callEnded", () => {
clearInterval(dashboardInterval);
});
});
Custom Event Tracking
Track custom events from your voice agents:
agent.on("customEvent", (eventType, eventData, metadata) => {
switch (eventType) {
case "flow_navigation":
console.log("Agent navigated:", eventData.from, "->", eventData.to);
// Track conversation flow
break;
case "tool_execution":
console.log(
"Tool called:",
eventData.toolName,
"Result:",
eventData.success
);
// Monitor tool usage
break;
case "agent_state_change":
console.log("Agent state:", eventData.state);
// Track agent behavior
break;
case "user_intent_detected":
console.log(
"User intent:",
eventData.intent,
"Confidence:",
eventData.confidence
);
// Analyze user intent
break;
default:
console.log("Custom event:", eventType, eventData);
}
});
Configuration Options
The SDK accepts optional configuration parameters:
const agent = new HamsaVoiceAgent("YOUR_API_KEY", {
API_URL: "https://api.tryhamsa.com", // API endpoint (default)
});
Client-Side Tools
You can register client-side tools that the agent can call during conversations:
const tools = [
{
function_name: "getUserInfo",
description: "Get user information",
parameters: [
{
name: "userId",
type: "string",
description: "User ID to look up",
},
],
required: ["userId"],
fn: async (userId) => {
// Your tool implementation
const userInfo = await fetchUserInfo(userId);
return userInfo;
},
},
];
agent.start({
agentId: "YOUR_AGENT_ID",
tools: tools,
voiceEnablement: true,
});
Migration from Previous Versions
If you're upgrading from a previous version, see the Migration Guide for detailed instructions. Connection details are now automatically managed and no longer need to be configured.
Browser Compatibility
This SDK supports modern browsers with WebRTC capabilities:
- Chrome 60+
- Firefox 60+
- Safari 12+
- Edge 79+
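If you also serve older browsers, you may want a quick capability check before starting a call. A minimal sketch (showUnsupportedBrowserNotice is a hypothetical helper):
const hasWebRTC =
  typeof window.RTCPeerConnection === "function" &&
  !!(navigator.mediaDevices && navigator.mediaDevices.getUserMedia);
if (!hasWebRTC) {
  showUnsupportedBrowserNotice(); // hypothetical fallback UI
}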
TypeScript Support
The SDK includes comprehensive TypeScript definitions with detailed analytics interfaces:
import {
HamsaVoiceAgent,
AgentState,
AudioCaptureOptions,
AudioCaptureMetadata,
CallAnalyticsResult,
ParticipantData,
CustomEventMetadata,
} from "@hamsa-ai/voice-agents-sdk";
// All analytics methods return strongly typed data
const agent = new HamsaVoiceAgent("API_KEY");
// TypeScript will provide full autocomplete and type checking for all methods
const connectionStats = agent.getConnectionStats(); // ConnectionStatsResult | null
const audioLevels = agent.getAudioLevels(); // AudioLevelsResult | null
const performance = agent.getPerformanceMetrics(); // PerformanceMetricsResult | null
const participants = agent.getParticipants(); // ParticipantData[]
const trackStats = agent.getTrackStats(); // TrackStatsResult | null
const analytics = agent.getCallAnalytics(); // CallAnalyticsResult | null
// Advanced audio control methods
const outputVolume = agent.getOutputVolume(); // number
const inputVolume = agent.getInputVolume(); // number
const isMuted = agent.isMicMuted(); // boolean
const inputFreqData = agent.getInputByteFrequencyData(); // Uint8Array
const outputFreqData = agent.getOutputByteFrequencyData(); // Uint8Array
// Audio capture with full type safety
agent.enableAudioCapture({
source: 'agent',
format: 'opus-webm',
chunkSize: 100,
callback: (audioData: ArrayBuffer | Float32Array | Int16Array, metadata: AudioCaptureMetadata) => {
// Full TypeScript autocomplete for metadata
console.log(metadata.participant); // string
console.log(metadata.source); // 'agent' | 'user'
console.log(metadata.timestamp); // number
console.log(metadata.trackId); // string
console.log(metadata.sampleRate); // number | undefined
}
});
// Strongly typed start options with all advanced features
await agent.start({
agentId: "agent-id",
voiceEnablement: true,
userId: "user-123",
params: {
userName: "John Doe",
sessionId: "session-456"
},
preferHeadphonesForIosDevices: true,
connectionDelay: {
android: 3000,
ios: 500,
default: 1000
},
disableWakeLock: false
});
// Strongly typed event handlers
agent.on("analyticsUpdated", (analytics: CallAnalyticsResult) => {
console.log(analytics.connectionStats.quality); // string
console.log(analytics.audioMetrics.userAudioLevel); // number
console.log(analytics.performanceMetrics.callDuration); // number
console.log(analytics.participants.length); // number
});
// Audio control events
agent.on("micMuted", () => {
console.log("Microphone was muted");
});
agent.on("micUnmuted", () => {
console.log("Microphone was unmuted");
});
// Agent state tracking with type safety
agent.on("agentStateChanged", (state: AgentState) => {
console.log("Agent state:", state); // 'idle' | 'initializing' | 'listening' | 'thinking' | 'speaking'
// TypeScript provides autocomplete and type checking
if (state === 'thinking') {
showThinkingIndicator();
}
});
// Strongly typed custom events
agent.on(
"customEvent",
(eventType: string, eventData: any, metadata: CustomEventMetadata) => {
console.log(metadata.timestamp); // number
console.log(metadata.participant); // string
}
);
// Strongly typed participant events
agent.on("participantConnected", (participant: ParticipantData) => {
console.log(participant.identity); // string
console.log(participant.connectionTime); // number
});
Use Cases
Agent State UI Updates
agent.on("agentStateChanged", (state) => {
// Update UI based on agent state
const statusElement = document.getElementById("agent-status");
switch (state) {
case 'idle':
statusElement.textContent = "Agent is idle";
statusElement.className = "status-idle";
break;
case 'initializing':
statusElement.textContent = "Agent is starting...";
statusElement.className = "status-initializing";
break;
case 'listening':
statusElement.textContent = "Agent is listening";
statusElement.className = "status-listening";
showMicrophoneAnimation();
break;
case 'thinking':
statusElement.textContent = "Agent is thinking...";
statusElement.className = "status-thinking";
showThinkingAnimation();
break;
case 'speaking':
statusElement.textContent = "Agent is speaking";
statusElement.className = "status-speaking";
showSpeakerAnimation();
break;
}
});
Real-time Call Quality Monitoring
agent.on("connectionQualityChanged", ({ quality, metrics }) => {
if (quality === "poor") {
showNetworkWarning();
logQualityIssue(metrics);
}
});
Analytics Dashboard
const analytics = agent.getCallAnalytics();
sendToAnalytics({
callDuration: analytics.performanceMetrics.callDuration,
audioQuality: analytics.audioMetrics,
participantCount: analytics.participants.length,
performance: analytics.performanceMetrics,
});
Conversation Flow Analysis
agent.on("customEvent", (eventType, data) => {
if (eventType === "flow_navigation") {
trackConversationFlow(data.from, data.to);
optimizeAgentResponses(data);
}
});
Dependencies
- livekit-client v2.15.4: Real-time communication infrastructure
- events v3.3.0: EventEmitter for browser compatibility
The SDK uses LiveKit's native WebRTC capabilities for high-quality real-time audio communication and comprehensive analytics.
