@sipgate/ai-flow-sdk
Official SDK for sipgate AI Flow - A powerful TypeScript SDK for building AI-powered voice assistants with real-time speech processing capabilities.
Table of Contents
- Installation
- Quick Start
- Core Concepts
- API Reference
- Integration Guides
- Working Without the Assistant Wrapper
- Troubleshooting
Installation
npm install @sipgate/ai-flow-sdk
# or
yarn add @sipgate/ai-flow-sdk
# or
pnpm add @sipgate/ai-flow-sdk
Requirements:
- Node.js >= 22.0.0
- TypeScript 5.x recommended
Quick Start
Basic Assistant
import { AiFlowAssistant } from "@sipgate/ai-flow-sdk";
const assistant = AiFlowAssistant.create({
debug: true,
onSessionStart: async (event) => {
console.log(`Session started for ${event.session.phone_number}`);
return "Hello! How can I help you today?";
},
onUserSpeak: async (event) => {
const userText = event.text;
console.log(`User said: ${userText}`);
// Process user input and return response
return `You said: ${userText}`;
},
onSessionEnd: async (event) => {
console.log(`Session ${event.session.id} ended`);
},
onUserBargeIn: async (event) => {
console.log(`User interrupted with: ${event.text}`);
return "I'm listening, please continue.";
},
});
Core Concepts
Event-Driven Architecture
The SDK uses an event-driven model where your assistant responds to events from the AI Flow service (a wiring sketch follows the list):
- Session Start - Called when a new call session begins
- User Speak - Called when the user says something (after speech-to-text)
- User Barge In - Called when the user interrupts the assistant
- Assistant Speak - Called when the assistant starts speaking (this event may be omitted by some text-to-speech models)
- Assistant Speech Ended - Called when the assistant's speech playback ends
- User Input Timeout - Called when no user speech is detected within the configured timeout period
- Session End - Called when the call ends
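A minimal sketch wiring up all seven handlers; the handler bodies are placeholders, and any handler you don't need can be omitted:
import { AiFlowAssistant } from "@sipgate/ai-flow-sdk";
const assistant = AiFlowAssistant.create({
  onSessionStart: async (event) => "Hello!", // greet the caller
  onUserSpeak: async (event) => `You said: ${event.text}`, // answer recognized speech
  onUserBargeIn: async (event) => "Go ahead.", // react to interruptions
  onAssistantSpeak: async (event) => null, // e.g. record event.duration_ms
  onAssistantSpeechEnded: async (event) => null, // e.g. trigger a follow-up prompt
  onUserInputTimeout: async (event) => "Are you still there?", // re-prompt on silence
  onSessionEnd: async (event) => {
    // cleanup only; no action is delivered after the session ends
  },
});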
Response Types
Event handlers can return three types of responses:
// 1. Simple string (automatically converted to speak action)
return "Hello, how can I help?";
// 2. Action object (for advanced control)
return {
type: AiFlowActionType.SPEAK,
session_id: event.session.id,
text: "Hello!",
barge_in: { strategy: BargeInStrategy.MINIMUM_CHARACTERS },
};
// 3. null/undefined (no response needed)
return null;
API Reference
AiFlowAssistant
The main class for creating AI voice assistants.
AiFlowAssistant.create(options)
Creates a new assistant instance.
Options:
interface AiFlowAssistantOptions {
// Optional API key for authentication
apiKey?: string;
// Enable debug logging
debug?: boolean;
// Event handlers
onSessionStart?: (
event: AiFlowApiEventSessionStart
) => Promise<InvocationResponseType>;
onUserSpeak?: (
event: AiFlowApiEventUserSpeak
) => Promise<InvocationResponseType>;
onAssistantSpeak?: (
event: AiFlowApiEventAssistantSpeak
) => Promise<InvocationResponseType>;
onAssistantSpeechEnded?: (
event: AiFlowEventAssistantSpeechEnded
) => Promise<InvocationResponseType>;
onUserInputTimeout?: (
event: AiFlowEventUserInputTimeout
) => Promise<InvocationResponseType>;
onSessionEnd?: (
event: AiFlowApiEventSessionEnd
) => Promise<InvocationResponseType>;
onUserBargeIn?: (
event: AiFlowEventUserBargeIn
) => Promise<InvocationResponseType>;
}
type InvocationResponseType = AiFlowApiAction | string | null | undefined;
Instance Methods
assistant.express()
Returns an Express.js middleware function for handling webhook requests.
app.post("/webhook", assistant.express());
assistant.ws(websocket)
Returns a WebSocket message handler.
wss.on("connection", (ws) => {
ws.on("message", assistant.ws(ws));
});
assistant.onEvent(event)
Manually process an event (useful for custom integrations).
const action = await assistant.onEvent(event);
Event Types
SessionStart Event
Triggered when a new call session begins.
interface AiFlowApiEventSessionStart {
type: "session_start";
session: {
id: string; // UUID of the session
account_id: string; // Account identifier
phone_number: string; // Phone number for this flow session
direction?: "inbound" | "outbound"; // Call direction
from_phone_number: string; // Phone number of the caller
to_phone_number: string; // Phone number of the callee
};
}
Example:
onSessionStart: async (event) => {
// Log session details
console.log(`${event.session.direction} call from ${event.session.from_phone_number} to ${event.session.to_phone_number}`);
// Return greeting
return "Welcome to our service!";
};
UserSpeak Event
Triggered when the user speaks and speech-to-text completes.
interface AiFlowApiEventUserSpeak {
type: "user_speak";
text: string; // Recognized speech text
session: {
id: string;
account_id: string;
phone_number: string;
};
}
Example:
onUserSpeak: async (event) => {
const intent = analyzeIntent(event.text);
if (intent === "help") {
return "I can help you with billing, support, or sales.";
}
return processUserInput(event.text);
};
AssistantSpeak Event
Triggered after the assistant starts speaking. This event may be omitted by some text-to-speech models.
interface AiFlowApiEventAssistantSpeak {
type: "assistant_speak";
text?: string; // Text that was spoken
ssml?: string; // SSML that was used (if applicable)
duration_ms: number; // Duration of speech in milliseconds
speech_started_at: number; // Unix timestamp (ms) when speech started
session: SessionInfo;
}
Example:
onAssistantSpeak: async (event) => {
console.log(`Spoke for ${event.duration_ms}ms`);
// Track conversation metrics
trackMetrics({
sessionId: event.session.id,
duration: event.duration_ms,
text: event.text,
});
};
AssistantSpeechEnded Event
Triggered after the assistant finishes speaking.
interface AiFlowEventAssistantSpeechEnded {
type: "assistant_speech_ended";
session: SessionInfo;
}
Example:
onAssistantSpeechEnded: async (event) => {
console.log(`Finished speaking for session ${event.session.id}`);
// Hangup if needed
};
UserInputTimeout Event
Triggered when no user speech is detected within the configured timeout period after the assistant finishes speaking.
interface AiFlowEventUserInputTimeout {
type: "user_input_timeout";
session: SessionInfo;
}
When Triggered:
- A speak action includes a user_input_timeout_seconds field
- The assistant finishes speaking (the assistant_speech_ended event fires)
- The specified timeout period elapses without any user speech detected
Example:
onUserInputTimeout: async (event) => {
console.log(`User input timeout for session ${event.session.id}`);
// Retry the question
return {
type: "speak",
session_id: event.session.id,
text: "Are you still there? Please say yes or no.",
user_input_timeout_seconds: 5
};
};
Configuring Timeout:
Set user_input_timeout_seconds in the speak action:
onSessionStart: async (event) => {
return {
type: "speak",
session_id: event.session.id,
text: "What is your account number?",
user_input_timeout_seconds: 5 // Wait 5 seconds for response
};
};
Common Use Cases:
// Hangup after multiple timeouts
const timeoutCounts = new Map<string, number>();
onUserInputTimeout: async (event) => {
const sessionId = event.session.id;
const count = (timeoutCounts.get(sessionId) || 0) + 1;
timeoutCounts.set(sessionId, count);
if (count >= 3) {
return {
type: "hangup",
session_id: sessionId
};
}
return {
type: "speak",
session_id: sessionId,
text: `I didn't hear anything. Please respond. Attempt ${count} of 3.`,
user_input_timeout_seconds: 5
};
};
SessionEnd Event
Triggered when the call session ends.
interface AiFlowApiEventSessionEnd {
type: "session_end";
session: SessionInfo;
}
Example:
onSessionEnd: async (event) => {
// Save conversation history
await saveConversation(event.session.id);
// Send analytics
await trackSessionEnd(event.session);
};
Barge-In Detection
User interruptions are detected via the barged_in flag in user_speak events:
interface AiFlowEventUserSpeak {
type: "user_speak";
text: string; // Recognized speech text
barged_in?: boolean; // true if user interrupted assistant
session: SessionInfo;
}
When barged_in is true, the user interrupted the assistant mid-speech. The SDK automatically routes these to your onUserBargeIn handler.
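If you handle events without the AiFlowAssistant wrapper, a minimal sketch of doing this routing yourself (the helper names are hypothetical placeholders):
// Hypothetical helpers - replace with your own conversation logic.
const handleBargeIn = (e: { text: string }) => `Heard you: ${e.text}`;
const handleSpeech = (e: { text: string }) => `You said: ${e.text}`;
function routeUserSpeak(event: { type: "user_speak"; text: string; barged_in?: boolean }) {
  // barged_in === true means the user spoke over active playback
  return event.barged_in ? handleBargeIn(event) : handleSpeech(event);
}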
Action Types
Actions are responses that tell the AI Flow service what to do next.
Speak Action
Speaks text or SSML to the user.
interface AiFlowActionSpeak {
type: "speak";
session_id: string;
// Either text OR ssml (not both)
text?: string; // Plain text to speak
ssml?: string; // SSML markup for advanced control
// Optional configurations
tts?: TtsConfig; // TTS provider settings
barge_in?: BargeInConfig; // Barge-in behavior
}
Examples:
// Simple text
return {
type: AiFlowActionType.SPEAK,
session_id: event.session.id,
text: "Hello, how can I help you?",
};
// With SSML
return {
type: AiFlowActionType.SPEAK,
session_id: event.session.id,
ssml: `
<speak version="1.0" xml:lang="en-US">
<voice name="en-US-JennyNeural">
<prosody rate="slow">Please listen carefully.</prosody>
<break time="500ms"/>
Your account balance is <say-as interpret-as="currency">$42.50</say-as>
</voice>
</speak>
`,
};
// With custom TTS provider
return {
type: AiFlowActionType.SPEAK,
session_id: event.session.id,
text: "Hello in a different voice",
tts: {
provider: TtsProvider.AZURE,
language: "en-US",
voice: "en-US-JennyNeural",
},
};
Audio Action
Plays pre-recorded audio to the user.
interface AiFlowActionAudio {
type: "audio";
session_id: string;
audio: string; // Base64 encoded WAV (8kHz, mono, 16-bit)
barge_in?: BargeInConfig;
}
Example:
// Play hold music or pre-recorded message
return {
type: AiFlowActionType.AUDIO,
session_id: event.session.id,
audio: base64EncodedWavData,
barge_in: {
strategy: BargeInStrategy.MINIMUM_CHARACTERS,
minimum_characters: 3,
},
};
Hangup Action
Ends the call.
interface AiFlowActionHangup {
type: "hangup";
session_id: string;
}
Example:
onUserSpeak: async (event) => {
if (event.text.toLowerCase().includes("goodbye")) {
return {
type: AiFlowActionType.HANGUP,
session_id: event.session.id,
};
}
};
Transfer Action
Transfers the call to another phone number.
interface AiFlowActionTransfer {
type: "transfer";
session_id: string;
target_phone_number: string; // E.164 format recommended
caller_id_name: string;
caller_id_number: string;
}
Example:
// Transfer to sales department
return {
type: AiFlowActionType.TRANSFER,
session_id: event.session.id,
target_phone_number: "+1234567890",
caller_id_name: "Sales Department",
caller_id_number: "+1234567890",
};
DTMF Action
Sends DTMF (touch-tone) digits.
interface AiFlowActionDtmf {
type: "dtmf";
session_id: string;
dtmf: number; // Single digit (0-9)
}
Example:
// Send DTMF digit
return {
type: AiFlowActionType.DTMF,
session_id: event.session.id,
dtmf: 5,
};
Gather DTMF Action
Gathers DTMF (touch-tone) input from the user during a call.
interface AiFlowActionGatherDtmf {
type: "gather_dtmf";
session_id: string;
allowed_gathered_characters?: string; // Default: "0123456789"
audio_url?: string; // Optional audio to play while gathering
finish_on_key?: string; // Default: "#"
num_characters?: number; // Default: 1
timeout_ms?: number; // Default: 5000
}
Configuration Options:
- allowed_gathered_characters - Characters allowed as input (0-9, #, *). All other characters are ignored. Default: "0123456789"
- audio_url - URL of audio to play when gathering DTMF input (optional)
- finish_on_key - Key that signals the end of input. Default: "#"
- num_characters - Maximum number of characters to gather. Default: 1
- timeout_ms - Milliseconds to wait for input before timing out. Default: 5000
Examples:
// Gather a single digit with default settings
return {
type: AiFlowActionType.GATHER_DTMF,
session_id: event.session.id,
};
// Gather a 4-digit PIN with custom prompt
return {
type: AiFlowActionType.GATHER_DTMF,
session_id: event.session.id,
num_characters: 4,
audio_url: "https://example.com/prompts/enter-pin.wav",
finish_on_key: "#",
timeout_ms: 10000,
};
// Gather account number (digits only, up to 10 characters)
return {
type: AiFlowActionType.GATHER_DTMF,
session_id: event.session.id,
allowed_gathered_characters: "0123456789",
num_characters: 10,
finish_on_key: "#",
timeout_ms: 15000,
};
// Menu selection with * and # included
return {
type: AiFlowActionType.GATHER_DTMF,
session_id: event.session.id,
allowed_gathered_characters: "0123456789*#",
num_characters: 1,
audio_url: "https://example.com/prompts/menu.wav",
};
Use Cases:
- Collecting PIN codes or account numbers
- Interactive voice menus (IVR)
- Verification codes
- Extension dialing
- Survey responses
BargeIn Action
Manually triggers barge-in (interrupts current playback).
interface AiFlowActionBargeIn {
type: "barge_in";
session_id: string;
}
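The SDK does not prescribe when to send this action. One hedged sketch, assuming playback was started with BargeInStrategy.MANUAL and user_speak events still arrive during playback, is to cut audio on a keyword:
onUserSpeak: async (event) => {
  // With BargeInStrategy.MANUAL, speech does not interrupt playback
  // automatically - stop it explicitly when the caller says "stop".
  if (event.text.toLowerCase().includes("stop")) {
    return {
      type: "barge_in",
      session_id: event.session.id,
    };
  }
  return null;
};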
TTS Providers
Configure text-to-speech providers for different voices and languages. The SDK supports both Azure Cognitive Services and ElevenLabs for high-quality voice synthesis.
Azure Cognitive Services
Azure provides a wide range of neural voices across many languages and regions.
interface TtsProviderConfigAzure {
provider: TtsProvider.AZURE;
language?: string; // BCP-47 format (e.g., "en-US", "de-DE")
voice?: string; // Voice name (e.g., "en-US-JennyNeural")
}
Examples:
// English (US) - Female
provider: {
provider: TtsProvider.AZURE,
language: "en-US",
voice: "en-US-JennyNeural"
}
// English (GB) - Female
provider: {
provider: TtsProvider.AZURE,
language: "en-GB",
voice: "en-GB-SoniaNeural"
}
// German - Male
provider: {
provider: TtsProvider.AZURE,
language: "de-DE",
voice: "de-DE-ConradNeural"
}
// Spanish - Female
provider: {
provider: TtsProvider.AZURE,
language: "es-ES",
voice: "es-ES-ElviraNeural"
}
Popular Azure Voices:
| Language | Voice Name | Gender | Description |
| -------- | ------------------ | ------ | ---------------------- |
| en-US | en-US-JennyNeural | Female | Friendly, professional |
| en-US | en-US-GuyNeural | Male | Clear, neutral |
| en-GB | en-GB-SoniaNeural | Female | British, professional |
| en-GB | en-GB-RyanNeural | Male | British, friendly |
| de-DE | de-DE-KatjaNeural | Female | Professional, clear |
| de-DE | de-DE-ConradNeural | Male | Deep, authoritative |
Full Voice List: See the Azure TTS documentation for the complete list of 400+ voices in 140+ languages.
ElevenLabs
ElevenLabs provides ultra-realistic AI voices optimized for conversational use cases.
interface TtsProviderConfigElevenLabs {
provider: TtsProvider.ELEVEN_LABS;
voice?: string; // Voice ID (e.g., "21m00Tcm4TlvDq8ikWAM")
}
Examples:
// Using voice ID
provider: {
provider: TtsProvider.ELEVEN_LABS,
voice: "21m00Tcm4TlvDq8ikWAM" // Rachel
}
// Using default voice (optional)
provider: {
provider: TtsProvider.ELEVEN_LABS
}
Available ElevenLabs Voices:
| Voice Name | ID | Description | Verified Locales |
| ---------- | -------------------- | ------------------------------------------------------------------------- | ---------------------------------- |
| Rachel | 21m00Tcm4TlvDq8ikWAM | Matter-of-fact, personable woman. Great for conversational use cases. | en-US |
| Sarah | EXAVITQu4vr4xnSDxMaL | Young adult woman with a confident and warm, mature quality. | en-US, fr-FR, cmn-CN, hi-IN |
| Laura | FGY2WhTYpPnrIDTdsKH5 | Young adult female delivers sunny enthusiasm with quirky attitude. | en-US, fr-FR, cmn-CN, de-DE |
| George | JBFqnCBsd6RMkjVDRZzb | Warm resonance that instantly captivates listeners. | en-GB, fr-FR, ja-JP, cs-CZ |
| Thomas | GBv7mTt0atIp3Br8iCZE | Soft and subdued male, optimal for narrations or meditations. | en-US |
| Roger | CwhRBWXzGAHq8TQ4Fs17 | Easy going and perfect for casual conversations. | en-US, fr-FR, de-DE, nl-NL |
| Eric | cjVigY5qzO86Huf0OWal | Smooth tenor pitch from a man in his 40s - perfect for agentic use cases. | en-US, fr-FR, de-DE, sk-SK |
| Brian | nPczCjzI2devNBz1zQrb | Middle-aged man with resonant and comforting tone. | en-US, cmn-CN, de-DE, nl-NL |
| Jessica | cgSgspJ2msm6clMCkdW9 | Young and playful American female, perfect for trendy content. | en-US, fr-FR, ja-JP, cmn-CN, de-DE |
| Liam | TX3LPaxmHKxFdv7VOQHJ | Young adult with energy and warmth - suitable for reels and shorts. | en-US, de-DE, cs-CZ, pl-PL, tr-TR |
| Alice | Xb7hH8MSUJpSbSDYk0k2 | Clear and engaging, friendly British woman suitable for e-learning. | en-GB, it-IT, fr-FR, ja-JP, pl-PL |
| Daniel | onwK4e9ZLuTAKqWW03F9 | Strong voice perfect for professional broadcast or news. | en-GB, de-DE, tr-TR |
| Lily | pFZP5JQG7iQjIQuC4Bku | Velvety British female delivers news with warmth and clarity. | it-IT, de-DE, cmn-CN, cs-CZ, nl-NL |
| River | SAz9YHcvj6GT2YYXdXww | Relaxed, neutral voice ready for narrations or conversational projects. | en-US, it-IT, fr-FR, cmn-CN |
| Charlie | IKne3meq5aSn9XLyUdCD | Young Australian male with confident and energetic voice. | en-AU, cmn-CN, fil-PH |
| Aria | 9BWtsMINqrJLrRacOk9x | Middle-aged female with African-American accent. Calm with hint of rasp. | en-US, fr-FR, cmn-CN, tr-TR |
| Matilda | XrExE9yKIg1WjnnlVkGX | Professional woman with pleasing alto pitch. Suitable for many use cases. | en-US, it-IT, fr-FR, de-DE |
| Will | bIHbv24MWmeRgasZH58o | Conversational and laid back. | en-US, fr-FR, de-DE, cmn-CN, cs-CZ |
| Chris | iP95p4xoKVk53GoZ742B | Natural and real, down-to-earth voice great across many use-cases. | en-US, fr-FR, sv-SE, hi-IN |
| Bill | pqHfZKP75CvOlQylNhV4 | Friendly and comforting voice ready to narrate stories. | en-US, fr-FR, cmn-CN, de-DE, cs-CZ |
Note: more than 50 voices are available in total. The SDK includes full TypeScript type definitions for all voice IDs and names.
Choosing a TTS Provider
Use Azure when:
- You need support for many languages (140+ languages available)
- You want consistent quality across all locales
- You need specific regional accents or dialects
- Budget is a primary concern
Use ElevenLabs when:
- You need the most natural, human-like voices
- Conversational quality is critical (phone calls, virtual assistants)
- You're primarily working with English or common European languages
- You want voices with distinct personalities
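A small helper can encode this decision. The sketch below is illustrative only; the ttsConfigFor name, the mapping, and the fallback are assumptions, not an official API:
import { TtsProvider } from "@sipgate/ai-flow-sdk";
// Illustrative policy: ElevenLabs for conversational English, Azure elsewhere.
function ttsConfigFor(language: string) {
  if (language.startsWith("en")) {
    return { provider: TtsProvider.ELEVEN_LABS, voice: "21m00Tcm4TlvDq8ikWAM" }; // Rachel
  }
  return { provider: TtsProvider.AZURE, language }; // Azure covers the long tail of locales
}
The result can be passed as the tts field of a speak action (shown earlier) or the provider field in the wrapper-less action format.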
Barge-In Configuration
Control how users can interrupt the assistant while speaking.
interface BargeInConfig {
strategy: BargeInStrategy;
minimum_characters?: number; // Default: 3
allow_after_ms?: number; // Delay before allowing interruption
}
Strategies
BargeInStrategy.NONE
Disables barge-in completely. Audio plays fully without interruption.
barge_in: {
strategy: BargeInStrategy.NONE;
}
Use cases:
- Critical information that must be heard
- Legal disclaimers
- Emergency instructions
BargeInStrategy.MANUAL
Allows manual barge-in via API only (no automatic detection).
barge_in: {
strategy: BargeInStrategy.MANUAL;
}
Use cases:
- Custom interruption logic
- Button-triggered interruption
- External event-based interruption
BargeInStrategy.MINIMUM_CHARACTERS
Automatically detects barge-in when the user's speech exceeds the configured character threshold.
barge_in: {
strategy: BargeInStrategy.MINIMUM_CHARACTERS,
minimum_characters: 5, // Trigger after 5 characters
allow_after_ms: 500 // Wait 500ms before allowing interruption
}
Use cases:
- Natural conversation flow
- Customer service scenarios
- Interactive voice menus
Example with protection period:
return {
type: AiFlowActionType.SPEAK,
session_id: event.session.id,
text: "Your account number is 1234567890. Please write this down.",
barge_in: {
strategy: BargeInStrategy.MINIMUM_CHARACTERS,
minimum_characters: 10, // Require substantial speech
allow_after_ms: 2000, // Protect first 2 seconds
},
};
Integration Guides
Express.js Integration
A complete example with logging and a health check endpoint:
import express from "express";
import { AiFlowAssistant } from "@sipgate/ai-flow-sdk";
const app = express();
app.use(express.json());
const assistant = AiFlowAssistant.create({
debug: process.env.NODE_ENV !== "production",
onSessionStart: async (event) => {
return "Welcome! How can I help you today?";
},
onUserSpeak: async (event) => {
// Your conversation logic here
return processUserInput(event.text);
},
onSessionEnd: async (event) => {
await cleanupSession(event.session.id);
},
});
// Webhook endpoint
app.post("/webhook", assistant.express());
// Health check
app.get("/health", (req, res) => {
res.json({ status: "ok" });
});
const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
console.log(`AI Flow assistant running on port ${PORT}`);
});
WebSocket Integration
import WebSocket from "ws";
import { AiFlowAssistant } from "@sipgate/ai-flow-sdk";
const wss = new WebSocket.Server({
port: 8080,
perMessageDeflate: false,
});
const assistant = AiFlowAssistant.create({
onUserSpeak: async (event) => {
return "Hello from WebSocket!";
},
});
wss.on("connection", (ws, req) => {
console.log("New WebSocket connection");
ws.on("message", assistant.ws(ws));
ws.on("error", (error) => {
console.error("WebSocket error:", error);
});
ws.on("close", () => {
console.log("WebSocket connection closed");
});
});
console.log("WebSocket server listening on port 8080");Advanced Example: Customer Service Bot
A more complete example demonstrating state management and routing:
import { AiFlowAssistant, BargeInStrategy } from "@sipgate/ai-flow-sdk";
import express from "express";
// Session state management
const sessions = new Map<string, { state: string; data: any }>();
const assistant = AiFlowAssistant.create({
debug: true,
onSessionStart: async (event) => {
// Initialize session state
sessions.set(event.session.id, {
state: "greeting",
data: { attempts: 0 },
});
return {
type: "speak",
session_id: event.session.id,
text: "Welcome to customer support. How can I help you today? You can ask about billing, technical support, or sales.",
barge_in: {
strategy: BargeInStrategy.MINIMUM_CHARACTERS,
minimum_characters: 3,
},
};
},
onUserSpeak: async (event) => {
const session = sessions.get(event.session.id);
if (!session) return null;
const text = event.text.toLowerCase();
// Intent routing
if (text.includes("billing") || text.includes("invoice")) {
return {
type: "transfer",
session_id: event.session.id,
target_phone_number: "+1234567890",
caller_id_name: "Billing Department",
caller_id_number: "+1234567890",
};
}
if (text.includes("goodbye") || text.includes("bye")) {
return {
type: "speak",
session_id: event.session.id,
text: "Thank you for calling. Have a great day!",
barge_in: { strategy: BargeInStrategy.NONE }, // Don't allow interruption
};
}
if (text.includes("technical") || text.includes("support")) {
session.state = "technical_support";
return "I'll connect you with our technical support team. Please describe your issue.";
}
// Default response
session.data.attempts++;
if (session.data.attempts > 2) {
return "I'm having trouble understanding. Let me transfer you to a representative.";
}
return "I can help with billing, technical support, or sales. Which would you like?";
},
onUserBargeIn: async (event) => {
console.log(`User interrupted: ${event.text}`);
return "Yes, I'm listening.";
},
onSessionEnd: async (event) => {
// Cleanup session state
sessions.delete(event.session.id);
console.log(`Session ${event.session.id} ended`);
},
});
const app = express();
app.use(express.json());
app.post("/webhook", assistant.express());
app.listen(3000, () => {
console.log("Customer service bot running on port 3000");
});
Working Without the Assistant Wrapper
If you prefer to work directly with the SDK's event and action system without using the AiFlowAssistant wrapper, you can manually handle events and construct actions.
Complete Event Reference
All events extend the base event structure:
interface BaseEvent {
session: {
id: string; // UUID of the session
account_id: string; // Account identifier
phone_number: string; // Phone number for this flow session
direction?: "inbound" | "outbound"; // Call direction
from_phone_number: string; // Phone number of the caller
to_phone_number: string; // Phone number of the callee
};
}
All Event Types
| Event Type | Description | When Triggered |
| ---------------------- | --------------------------- | ------------------------------------------------------------ |
| session_start | Call session begins | When a new call is initiated |
| user_speak | User speech detected | After speech-to-text completes (includes barged_in flag) |
| assistant_speak | Assistant starts speaking | When TTS playback begins (may be omitted by some TTS models) |
| assistant_speech_ended | Assistant finished speaking | When TTS playback completes |
| user_input_timeout | No user input detected | When the configured timeout elapses after playback ends |
| session_end | Call session ends | When the call terminates |
Event Type Definitions
// session_start
interface AiFlowEventSessionStart {
type: "session_start";
session: {
id: string;
account_id: string;
phone_number: string; // Phone number for this flow session
direction?: "inbound" | "outbound"; // Call direction
from_phone_number: string;
to_phone_number: string;
};
}
// user_speak
interface AiFlowEventUserSpeak {
type: "user_speak";
text: string; // Recognized speech text
barged_in?: boolean; // true if user interrupted assistant
session: SessionInfo;
}
// assistant_speak
interface AiFlowEventAssistantSpeak {
type: "assistant_speak";
text?: string; // Text that was spoken
ssml?: string; // SSML that was used (if applicable)
duration_ms: number; // Duration of speech in milliseconds
speech_started_at: number; // Unix timestamp (ms) when speech started
session: SessionInfo;
}
// assistant_speech_ended
interface AiFlowEventAssistantSpeechEnded {
  type: "assistant_speech_ended";
  session: SessionInfo;
}
// user_input_timeout
interface AiFlowEventUserInputTimeout {
  type: "user_input_timeout";
  session: SessionInfo;
}
// session_end
interface AiFlowEventSessionEnd {
  type: "session_end";
  session: SessionInfo;
}
Complete Action Reference
All actions require a session_id and type field:
interface BaseAction {
session_id: string; // UUID from the event's session.id
type: string; // Action type identifier
}
All Action Types
| Action Type | Description | Primary Use Case |
| -------------- | --------------------------- | --------------------------------------- |
| speak | Speak text or SSML | Respond to user with synthesized speech |
| audio | Play pre-recorded audio | Play hold music, pre-recorded messages |
| hangup | End the call | Terminate conversation |
| transfer | Transfer to another number | Route to human agent or department |
| dtmf | Send touch-tone digit | Navigate IVR systems |
| gather_dtmf | Gather touch-tone input | Collect PIN codes, menu selections |
| barge_in | Manually interrupt playback | Stop current audio immediately |
Action Type Definitions
// speak - Text-to-speech response
interface AiFlowActionSpeak {
type: "speak";
session_id: string;
// Provide either text OR ssml (not both)
text?: string;
ssml?: string;
// Optional configurations
provider?: {
provider: "azure" | "eleven_labs";
language?: string; // e.g., "en-US", "de-DE"
voice?: string; // Provider-specific voice ID/name
};
barge_in?: {
strategy: "none" | "manual" | "minimum_characters";
minimum_characters?: number; // Default: 3
allow_after_ms?: number; // Delay before allowing interruption
};
}
// audio - Play pre-recorded audio
interface AiFlowActionAudio {
type: "audio";
session_id: string;
audio: string; // Base64 encoded WAV (8kHz, mono, 16-bit PCM)
barge_in?: {
strategy: "none" | "manual" | "minimum_characters";
minimum_characters?: number;
allow_after_ms?: number;
};
}
// hangup - End call
interface AiFlowActionHangup {
type: "hangup";
session_id: string;
}
// transfer - Transfer call
interface AiFlowActionTransfer {
type: "transfer";
session_id: string;
target_phone_number: string; // E.164 format recommended
caller_id_name: string;
caller_id_number: string;
}
// dtmf - Send touch-tone digit
interface AiFlowActionDtmf {
type: "dtmf";
session_id: string;
dtmf: number; // Single digit (0-9)
}
// gather_dtmf - Gather touch-tone input
interface AiFlowActionGatherDtmf {
type: "gather_dtmf";
session_id: string;
allowed_gathered_characters?: string; // Default: "0123456789"
audio_url?: string; // Optional audio to play while gathering
finish_on_key?: string; // Default: "#"
num_characters?: number; // Default: 1
timeout_ms?: number; // Default: 5000
}
// barge_in - Manual interrupt
interface AiFlowActionBargeIn {
type: "barge_in";
session_id: string;
}
Direct Integration Example
Here's how to handle events and construct actions without the assistant wrapper:
import express from "express";
import { AiFlowActionType } from "@sipgate/ai-flow-sdk";
const app = express();
app.use(express.json());
app.post("/webhook", async (req, res) => {
const event = req.body;
let action = null;
switch (event.type) {
case "session_start":
action = {
type: AiFlowActionType.SPEAK,
session_id: event.session.id,
text: "Welcome to our service!",
barge_in: {
strategy: "minimum_characters",
minimum_characters: 5,
},
};
break;
case "user_speak":
if (event.text.toLowerCase().includes("menu")) {
// Present IVR menu using DTMF gathering
action = {
type: AiFlowActionType.GATHER_DTMF,
session_id: event.session.id,
num_characters: 1,
audio_url: "https://example.com/prompts/main-menu.wav",
timeout_ms: 10000,
};
} else if (event.barged_in) {
// User interrupted
console.log(`User interrupted with: ${event.text}`);
action = {
type: AiFlowActionType.SPEAK,
session_id: event.session.id,
text: "I'm listening, go ahead.",
};
} else if (event.text.toLowerCase().includes("transfer")) {
action = {
type: AiFlowActionType.TRANSFER,
session_id: event.session.id,
target_phone_number: "+1234567890",
caller_id_name: "Support",
caller_id_number: "+1234567890",
};
} else if (event.text.toLowerCase().includes("goodbye")) {
action = {
type: AiFlowActionType.HANGUP,
session_id: event.session.id,
};
} else {
action = {
type: AiFlowActionType.SPEAK,
session_id: event.session.id,
text: `You said: ${event.text}`,
};
}
break;
case "assistant_speak":
console.log(`Spoke for ${event.duration_ms}ms`);
// Optional: track metrics, no action needed
break;
case "session_end":
console.log(`Session ${event.session.id} ended`);
// Cleanup logic, no action needed
break;
}
// Return action if one was created
if (action) {
res.json(action);
} else {
res.status(204).send();
}
});
app.listen(3000, () => {
console.log("Webhook server listening on port 3000");
});
Event-Action Flow Diagram
┌─────────────────┐
│ session_start │──> Respond with speak/audio or do nothing
└─────────────────┘
┌─────────────────┐
│ user_speak │──> Respond with speak/audio/transfer/hangup/dtmf
│ (barged_in?) │ Check barged_in flag for interruptions
└─────────────────┘
┌─────────────────┐
│ assistant_speak │──> Optional: track metrics, trigger next action
└─────────────────┘
┌─────────────────┐
│ session_end │──> Cleanup only, no actions accepted
└─────────────────┘
Validation with Zod
The SDK exports Zod schemas for runtime validation:
import { AiFlowEventSchema, AiFlowActionSchema } from "@sipgate/ai-flow-sdk";
// Validate incoming event
try {
const event = AiFlowEventSchema.parse(req.body);
// event is now type-safe
} catch (error) {
console.error("Invalid event:", error);
}
// Validate outgoing action
try {
const action = AiFlowActionSchema.parse({
type: "speak",
session_id: event.session.id,
text: "Hello!",
});
res.json(action);
} catch (error) {
console.error("Invalid action:", error);
}
Troubleshooting
Common Issues
WebSocket Connection Errors
If you encounter WebSocket connection issues:
wss.on("connection", (ws, req) => {
ws.on("error", (error) => {
console.error("WebSocket error:", error);
});
ws.on("close", (code, reason) => {
console.log(`Connection closed: ${code} - ${reason}`);
});
ws.on("message", assistant.ws(ws));
});
Event Validation Errors
Use Zod schemas to validate incoming events:
import { AiFlowEventSchema } from "@sipgate/ai-flow-sdk";
app.post("/webhook", async (req, res) => {
try {
const event = AiFlowEventSchema.parse(req.body);
const action = await assistant.onEvent(event);
if (action) {
res.json(action);
} else {
res.status(204).send();
}
} catch (error) {
console.error("Invalid event:", error);
res.status(400).json({ error: "Invalid event format" });
}
});
Debug Mode
Enable debug logging to see all events and actions:
const assistant = AiFlowAssistant.create({
debug: true, // Logs all events and actions
// ... your handlers
});
Audio Format Issues
When using the audio action, ensure your audio is in the correct format:
- Format: WAV
- Sample Rate: 8kHz
- Channels: Mono
- Bit Depth: 16-bit PCM
- Encoding: Base64
// Example: Convert audio file to correct format
import fs from "fs";
const audioBuffer = fs.readFileSync("audio.wav");
const base64Audio = audioBuffer.toString("base64");
return {
type: AiFlowActionType.AUDIO,
session_id: event.session.id,
audio: base64Audio,
};
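Before encoding, you can sanity-check the header. The sketch below is not part of the SDK and assumes a canonical 44-byte PCM WAV layout; files with extra chunks need a real parser:
import fs from "fs";
// Verify 8kHz / mono / 16-bit PCM before base64-encoding the file.
function loadAiFlowAudio(path: string): string {
  const buf = fs.readFileSync(path);
  if (buf.toString("ascii", 0, 4) !== "RIFF" || buf.toString("ascii", 8, 12) !== "WAVE") {
    throw new Error("Not a WAV file");
  }
  const channels = buf.readUInt16LE(22); // canonical fmt chunk offsets
  const sampleRate = buf.readUInt32LE(24);
  const bitsPerSample = buf.readUInt16LE(34);
  if (channels !== 1 || sampleRate !== 8000 || bitsPerSample !== 16) {
    throw new Error(`Expected 8kHz mono 16-bit, got ${sampleRate}Hz/${channels}ch/${bitsPerSample}-bit`);
  }
  return buf.toString("base64");
}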
Additional Resources
- Official Documentation: sipgate.de/lp/ai-flow
- Support Email: [email protected]
- GitHub Issues: Report bugs or request features
License
Apache-2.0
Need help? Contact us at [email protected]
