@lautmaler/jovo-platform-aiflow
v0.3.4
Sipgate AI Flow integration for Jovo Framework
Jovo Platform Plugin for Sipgate AI Flow
This plugin allows you to build voice applications for the Sipgate AI Flow platform with the Jovo Framework.
It targets Sipgate AI Flow v1.11.0.
Breaking changes in 0.3.0
- DTMF is now an incoming event, not an output. The dtmf action and DtmfOutput have been removed (the action never existed in the AI Flow API). Handle keypad input via the dtmf_received event (InputType.DTMF_RECEIVED, digit on this.$input.digit).
- Barge-in arrives as user_speak with barged_in: true (routed to InputType.USER_BARGE_IN) rather than a dedicated event.
- SpeakOutput's TTS option is tts (earlier docs incorrectly showed provider).
- New events now have dedicated InputTypes instead of falling through to NOTIFICATION_EVENT_UNHANDLED: user_speech_started → InputType.USER_SPEECH_STARTED, dtmf_received → InputType.DTMF_RECEIVED, sms_failed → InputType.SMS_FAILED.

⚠️ Upgrading from 0.2.x: if your app relied on these events arriving as NOTIFICATION_EVENT_UNHANDLED (or had no handler for them), you must now add an explicit handler for each event you don't act on; otherwise it routes to your UNHANDLED handler. This matters most for user_speech_started, which fires on voice onset (no transcript) at the start of every caller turn; without a handler the bot will respond/speak spuriously. Add a no-op, e.g.:
@Handle({ types: [InputType.USER_SPEECH_STARTED] })
userSpeechStarted() {
  return this.$send(EmptyOutput);
}
New: mix_audio, send_sms, configure_transcription, configure_voice_to_voice outputs; transfer timeout; speak vad; Kugelaudio TTS; immediate barge-in; multi-action array responses; optional X-API-TOKEN auth; session.direction for outbound calls.
Installation
npm install @lautmaler/jovo-platform-aiflow
Quick Start
Basic Setup
import { App } from '@jovotech/framework';
import { SipgateAiflowPlatform } from '@lautmaler/jovo-platform-aiflow';
const app = new App({
plugins: [
new SipgateAiflowPlatform({
apiKey: process.env.AIFLOW_API_KEY, // Optional: shared secret checked against the X-API-TOKEN header
debug: true,
}),
],
});
The platform integrates with Jovo's standard server adapters to handle HTTP POST requests from Sipgate AI Flow.
Event Types
The platform handles all Sipgate AI Flow v1.11.0 event types and maps them to Jovo's input system:
| Event Type | Description | Jovo InputType |
|------------|-------------|----------------|
| session_start | Call session begins | InputType.Launch |
| user_speech_started | Speech onset detected (WebSocket only, no action) | InputType.USER_SPEECH_STARTED |
| user_speak | User speech detected | InputType.Text |
| user_speak with barged_in: true | User interrupted the assistant | InputType.USER_BARGE_IN |
| user_barge_in (legacy) | Dedicated interruption event | InputType.USER_BARGE_IN |
| dtmf_received | Keypad digit pressed (0–9, *, #) | InputType.DTMF_RECEIVED |
| assistant_speak | Assistant started speaking | InputType.ASSISTANT_SPEAK |
| assistant_speech_ended | Assistant finished speaking | InputType.ASSISTANT_SPEECH_ENDED |
| user_input_timeout | No user speech within the configured timeout | InputType.USER_INPUT_TIMEOUT |
| sms_failed | A send_sms action failed | InputType.SMS_FAILED |
| session_end | Call session ends (no action accepted) | InputType.SESSION_END |
Note: Barge-in events can arrive either as a dedicated user_barge_in event or as a regular user_speak event with barged_in: true. Both are mapped to InputType.USER_BARGE_IN automatically, so your handler covers both cases.
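The routing in the table can be pictured as a lookup plus one special case for the barged_in flag. The following is a minimal sketch, not the plugin's actual implementation: AiflowEventSketch and resolveInputTypeName are hypothetical names, the strings are the InputType member names from the table rather than real enum values, and the fallback for unknown events is an assumption.

```typescript
// Hypothetical, minimal shape of an incoming event: only the fields this sketch reads.
interface AiflowEventSketch {
  type: string;
  barged_in?: boolean;
}

// Event type -> name of the InputType member listed in the table above.
const INPUT_TYPE_BY_EVENT = new Map<string, string>([
  ['session_start', 'Launch'],
  ['user_speech_started', 'USER_SPEECH_STARTED'],
  ['user_speak', 'Text'],
  ['user_barge_in', 'USER_BARGE_IN'],
  ['dtmf_received', 'DTMF_RECEIVED'],
  ['assistant_speak', 'ASSISTANT_SPEAK'],
  ['assistant_speech_ended', 'ASSISTANT_SPEECH_ENDED'],
  ['user_input_timeout', 'USER_INPUT_TIMEOUT'],
  ['sms_failed', 'SMS_FAILED'],
  ['session_end', 'SESSION_END'],
]);

function resolveInputTypeName(event: AiflowEventSketch): string {
  // A user_speak event flagged as an interruption is routed like user_barge_in.
  if (event.type === 'user_speak' && event.barged_in === true) {
    return 'USER_BARGE_IN';
  }
  // Assumption: unknown event types fall through to a generic "unhandled" type.
  return INPUT_TYPE_BY_EVENT.get(event.type) ?? 'NOTIFICATION_EVENT_UNHANDLED';
}
```

Because both barge-in variants resolve to the same name, a single USER_BARGE_IN handler is enough regardless of which form AI Flow sends.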
Handling Events
import { BaseComponent, Component, Handle } from '@jovotech/framework';
import { InputType } from '@lautmaler/jovo-platform-aiflow';
@Component()
export class GlobalComponent extends BaseComponent {
@Handle({
types: [InputType.SESSION_START],
})
sessionStart() {
console.log('New call session started');
// Initialize session data
}
@Handle({
types: [InputType.SESSION_END],
})
sessionEnd() {
console.log('Call session ended');
// Cleanup
this.$session.state = [];
return this.$send('{}');
}
@Handle({
types: [InputType.USER_BARGE_IN],
})
userBargeIn() {
console.log('User interrupted during playback');
// Handle interruption — see "Barge-In Handling" section below
}
}
Output Types
All output types require a sessionId parameter. You can access the session ID from this.$session.sessionInfo.id:
SpeakOutput
Send text-to-speech output to the caller.
import { SpeakOutput } from '@lautmaler/jovo-platform-aiflow';
@Intents('HelpIntent')
helpHandler() {
const sessionId = (this.$session as any).sessionInfo?.id;
return this.$send(SpeakOutput, {
message: 'How can I help you today?',
sessionId,
// Optional parameters:
ssml: '<speak>How can I <emphasis>help</emphasis> you?</speak>',
tts: {
provider: 'azure', // 'azure' | 'eleven_labs' | 'kugelaudio'
language: 'en-US',
voice: 'en-US-JennyNeural',
},
bargeIn: {
strategy: 'immediate', // 'immediate' | 'minimum_characters' | 'manual' | 'none'
allow_after_ms: 500,
},
userInputTimeoutSec: 5, // fire a user_input_timeout event after 5s of silence
vad: { end_of_turn_silence_ms: 1000 }, // tune end-of-turn detection
});
}
Options:
- message (string | string[]): Text to speak
- sessionId (string, required): Session identifier
- ssml (string, optional): SSML markup (overrides message)
- tts (TtsConfig, optional): TTS provider configuration (Azure, ElevenLabs or Kugelaudio)
- bargeIn (BargeInConfig, optional): Barge-in behavior
- userInputTimeoutSec (number, optional): Seconds to wait for the caller before firing user_input_timeout
- vad (VadConfig, optional): End-of-turn silence tuning (end_of_turn_silence_ms)
HangupOutput
End the call.
import { HangupOutput, SpeakOutput } from '@lautmaler/jovo-platform-aiflow';
@Intents('EndConversationIntent')
async endConversation() {
const sessionId = (this.$session as any).sessionInfo?.id;
await this.$send(SpeakOutput, {
message: 'Goodbye!',
sessionId,
});
return this.$send(HangupOutput, { sessionId });
}
Options:
- sessionId (string, required): Session identifier
TransferOutput
Transfer the call to another number.
import { SpeakOutput, TransferOutput } from '@lautmaler/jovo-platform-aiflow';
@Intents('HumanOperatorIntent')
async transferToOperator() {
const sessionId = (this.$session as any).sessionInfo?.id;
await this.$send(SpeakOutput, {
message: 'Transferring you to an operator.',
sessionId,
});
return this.$send(TransferOutput, {
sessionId,
targetPhoneNumber: '491234567890',
callerIdName: 'Support',
callerIdNumber: '490987654321',
timeout: 30, // optional: fall back to the agent if not answered within 30s
});
}
Options:
- sessionId (string, required): Session identifier
- targetPhoneNumber (string, required): Phone number to transfer to (E.164 without leading +)
- callerIdName (string, required): Caller ID name
- callerIdNumber (string, required): Caller ID number
- timeout (number, optional): Transfer timeout in seconds (5–120). Enables transfer fallback: if the target doesn't answer, AI Flow re-emits session_start with the same session.id.
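The number format and timeout range above are easy to get wrong, so it can help to check them before sending the action. The following is a minimal sketch under stated assumptions: validateTransferOptions and TransferOptionsSketch are hypothetical helpers that are not part of the plugin, and the E.164 check is the generic "up to 15 digits, no leading zero" rule rather than anything AI Flow specific.

```typescript
// Hypothetical subset of the TransferOutput options that this sketch checks.
interface TransferOptionsSketch {
  targetPhoneNumber: string;
  timeout?: number;
}

// Returns a list of human-readable problems; an empty list means the options look valid.
function validateTransferOptions(options: TransferOptionsSketch): string[] {
  const problems: string[] = [];

  // E.164 without the leading "+": 2 to 15 digits, first digit not zero.
  if (!/^[1-9]\d{1,14}$/.test(options.targetPhoneNumber)) {
    problems.push('targetPhoneNumber must be E.164 digits without a leading "+"');
  }

  // The documented timeout range is 5 to 120 seconds.
  if (options.timeout !== undefined) {
    const inRange = Number.isFinite(options.timeout) && options.timeout >= 5 && options.timeout <= 120;
    if (!inRange) {
      problems.push('timeout must be between 5 and 120 seconds');
    }
  }

  return problems;
}
```

Calling it with { targetPhoneNumber: '491234567890', timeout: 30 } yields no problems, while a number written as '+491234567890' is flagged.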
AudioOutput
Play a base64-encoded WAV audio file.
import { AudioOutput } from '@lautmaler/jovo-platform-aiflow';
@Intents('PlayMusicIntent')
playMusic() {
const sessionId = (this.$session as any).sessionInfo?.id;
const audioBase64 = '...'; // Base64 encoded WAV
return this.$send(AudioOutput, {
audio: audioBase64,
sessionId,
bargeIn: {
strategy: 'minimum_characters',
minimum_characters: 5,
},
});
}
Options:
- audio (string, required): Base64-encoded WAV audio
- sessionId (string, required): Session identifier
- bargeIn (BargeInConfig, optional): Barge-in behavior
MixAudioOutput
Loop a background sound (e.g. café, office) under the call. Sending another MixAudioOutput replaces the active loop; sending { stop: true } removes it. The loop is dropped automatically on session end.
import { MixAudioOutput } from '@lautmaler/jovo-platform-aiflow';
return this.$send(MixAudioOutput, {
sessionId,
audio: ambientBase64, // WAV 16kHz/mono/16-bit, base64
volume: 0.3, // 0.0–1.0, default 0.5
});
// later: this.$send(MixAudioOutput, { sessionId, stop: true });
Options: sessionId (required), audio (base64 WAV, required unless stop), volume (0.0–1.0), stop (boolean).
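To experiment with the audio option without a recording at hand, a WAV payload in the format mentioned above (16 kHz, mono, 16-bit) can be generated in code. The following is a minimal sketch using only Node's Buffer; buildSilentWavBase64 is a hypothetical helper, and it produces silence, which is only useful for checking the plumbing.

```typescript
import { Buffer } from 'node:buffer';

// Builds a PCM WAV file containing silence and returns it base64-encoded.
function buildSilentWavBase64(durationMs: number, sampleRate = 16000): string {
  const bytesPerSample = 2; // 16-bit mono
  const numSamples = Math.floor((sampleRate * durationMs) / 1000);
  const dataSize = numSamples * bytesPerSample;
  const wav = Buffer.alloc(44 + dataSize); // zero-filled, so the samples are silence

  wav.write('RIFF', 0, 'ascii');
  wav.writeUInt32LE(36 + dataSize, 4); // file size minus the first 8 bytes
  wav.write('WAVE', 8, 'ascii');
  wav.write('fmt ', 12, 'ascii');
  wav.writeUInt32LE(16, 16); // size of the fmt chunk
  wav.writeUInt16LE(1, 20); // audio format: PCM
  wav.writeUInt16LE(1, 22); // channels: mono
  wav.writeUInt32LE(sampleRate, 24);
  wav.writeUInt32LE(sampleRate * bytesPerSample, 28); // byte rate
  wav.writeUInt16LE(bytesPerSample, 32); // block align
  wav.writeUInt16LE(16, 34); // bits per sample
  wav.write('data', 36, 'ascii');
  wav.writeUInt32LE(dataSize, 40);

  return wav.toString('base64');
}
```

For a real recording, reading the file into a Buffer and calling toString('base64') on it gives the same kind of string.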
SendSmsOutput
Send an SMS to the caller while the call is active. Delivery failures arrive as a sms_failed event.
import { SendSmsOutput } from '@lautmaler/jovo-platform-aiflow';
return this.$send(SendSmsOutput, {
sessionId,
phoneNumber: '4915790000687', // E.164
text: 'Your confirmation code is ABC123',
});
Options: sessionId (required), phoneNumber (E.164, required), text (required).
ConfigureTranscriptionOutput
Change the STT provider and/or recognition language(s) mid-call without hanging up. Uses full-replace semantics.
import { ConfigureTranscriptionOutput, TranscriptionProvider } from '@lautmaler/jovo-platform-aiflow';
return this.$send(ConfigureTranscriptionOutput, {
sessionId,
provider: TranscriptionProvider.DEEPGRAM, // optional: AZURE | DEEPGRAM | ELEVEN_LABS | SIPGATE_QWEN | SIPGATE_PARAKEET
languages: ['de-DE', 'en-US'], // BCP-47, 1–4 entries; omit to reset to provider default
customVocabulary: ['sipgate'],
vad: { end_of_turn_silence_ms: 1200 },
});
Options: sessionId (required), provider, languages, customVocabulary, vad (all optional; sending none is a no-op).
ConfigureVoiceToVoiceOutput
Preview feature. Switch to end-to-end speech-to-speech mode. Send a ConfigureTranscriptionOutput to revert to the standard STT→text→TTS pipeline.
import { ConfigureVoiceToVoiceOutput } from '@lautmaler/jovo-platform-aiflow';
return this.$send(ConfigureVoiceToVoiceOutput, { sessionId });
Options: sessionId (required).
BargeInOutput
Acknowledge a barge-in event.
import { BargeInOutput } from '@lautmaler/jovo-platform-aiflow';
@Handle({
types: [InputType.USER_BARGE_IN],
})
handleBargeIn() {
const sessionId = (this.$session as any).sessionInfo?.id;
return this.$send(BargeInOutput, { sessionId });
}
Options:
- sessionId (string, required): Session identifier
Multiple Actions
AI Flow executes an array of actions in sequence. Send several outputs in one turn and the platform serializes them as a JSON array (a single output is still serialized as a bare object):
// Start ambient sound, greet, then hang up — delivered as one array
await this.$send(MixAudioOutput, { sessionId, audio: ambientBase64, volume: 0.3 });
await this.$send(SpeakOutput, { message: 'Welcome!', sessionId });
return this.$send(HangupOutput, { sessionId });
Consecutive speak-only outputs are merged into a single utterance (one TTS call). Mixed action types are kept as a sequence and serialized as a JSON array. On HTTP, all outputs of a turn are batched into a single response. On a WebSocket connection (async mode), each $send() is delivered immediately; batch actions into one array by passing them to a single $send() if you need them in one frame.
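The batching rules described above (merge consecutive speak outputs, emit a bare object for one action and an array for several) can be illustrated with a small standalone function. This is a sketch of the described behavior, not the plugin's code: ActionSketch, its field names, and joining merged texts with a space are all assumptions.

```typescript
// Hypothetical action shape; the real AI Flow action fields may differ.
interface ActionSketch {
  type: string;
  session_id: string;
  text?: string;
}

// Collapses runs of consecutive speak actions into one speak action.
function mergeSpeakRuns(actions: ActionSketch[]): ActionSketch[] {
  const merged: ActionSketch[] = [];
  for (const action of actions) {
    const previous = merged[merged.length - 1];
    if (previous !== undefined && previous.type === 'speak' && action.type === 'speak') {
      // Assumption: merged texts are joined with a single space.
      const text = `${previous.text ?? ''} ${action.text ?? ''}`.trim();
      merged[merged.length - 1] = { ...previous, text };
    } else {
      merged.push({ ...action });
    }
  }
  return merged;
}

// One action becomes a bare JSON object, several become a JSON array.
function serializeTurn(actions: ActionSketch[]): string {
  const merged = mergeSpeakRuns(actions);
  return JSON.stringify(merged.length === 1 ? merged[0] : merged);
}
```

Two speak actions followed by a hangup therefore serialize as a two-element array: one merged utterance, then the hangup.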
Barge-In Handling
Barge-in occurs when a caller interrupts the assistant while it's speaking. The platform provides tools for both configuring when barge-in is allowed and handling the interruption intelligently.
Barge-In Strategies
Configure how sensitive barge-in detection should be via BargeInConfig:
| Strategy | Latency | Reliability | Use Case |
|----------|---------|-------------|----------|
| immediate | 20-100ms | May trigger on noise | Natural conversations |
| minimum_characters | 50-200ms | Very reliable | Balanced approach (recommended) |
| manual | N/A | Perfect | Custom logic |
| none | N/A | Perfect | Critical info only (legal disclaimers, confirmation codes) |
import { BargeInStrategy } from '@lautmaler/jovo-platform-aiflow';
// Most natural — triggers on voice detection
{ strategy: BargeInStrategy.IMMEDIATE, allow_after_ms: 500 }
// Balanced — waits for a few characters
{ strategy: BargeInStrategy.MINIMUM_CHARACTERS, minimum_characters: 3 }
// For critical information — higher threshold
{ strategy: BargeInStrategy.MINIMUM_CHARACTERS, minimum_characters: 10, allow_after_ms: 3000 }
// No interruption allowed
{ strategy: BargeInStrategy.NONE }
The allow_after_ms option prevents accidental triggers during the first milliseconds of playback.
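As a mental model only, the interplay between strategy, minimum_characters and allow_after_ms can be written as a predicate. AI Flow performs this gating on its side, so the following sketch is an illustration under assumptions: wouldInterrupt and its parameters are hypothetical, the default of 3 characters is taken from the platform defaults, and manual is treated as "never interrupts automatically" because that decision is left to custom logic.

```typescript
type StrategySketch = 'immediate' | 'minimum_characters' | 'manual' | 'none';

interface BargeInConfigSketch {
  strategy: StrategySketch;
  minimum_characters?: number;
  allow_after_ms?: number;
}

// Decides whether caller speech would cut off playback under the given config.
function wouldInterrupt(
  config: BargeInConfigSketch,
  playbackElapsedMs: number,
  transcriptSoFar: string,
): boolean {
  // Guard window: nothing interrupts before allow_after_ms has passed.
  if (playbackElapsedMs < (config.allow_after_ms ?? 0)) {
    return false;
  }
  switch (config.strategy) {
    case 'immediate':
      return true; // any detected voice is enough
    case 'minimum_characters':
      return transcriptSoFar.trim().length >= (config.minimum_characters ?? 3);
    case 'manual':
    case 'none':
      return false; // never interrupts automatically in this model
  }
}
```

With { strategy: 'minimum_characters', minimum_characters: 10, allow_after_ms: 3000 }, a short "ok" never interrupts, and even a long utterance only does so once three seconds of playback have elapsed.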
Classifying Barge-In Intent
Use BargeInHelper to classify what the user meant by their interruption:
import { BargeInHelper, BargeInIntent } from '@lautmaler/jovo-platform-aiflow';
@Handle({ types: [InputType.USER_BARGE_IN] })
handleBargeIn() {
const text = this.$request.getInputText();
const sessionId = (this.$session as any).sessionInfo?.id;
const intent = BargeInHelper.classify(text);
switch (intent) {
case BargeInIntent.NOISE:
case BargeInIntent.STOP:
// Short noise or "stop"/"quiet": return empty (204)
return this.$send(EmptyOutput);
case BargeInIntent.ACKNOWLEDGMENT:
// "Got it", "okay", "yes": brief follow-up
return this.$send(SpeakOutput, { message: 'Great! What else can I help with?', sessionId });
case BargeInIntent.REDIRECT:
// "Actually", "wait", "hold on": let user continue
return this.$send(SpeakOutput, { message: 'Sure, go ahead.', sessionId });
case BargeInIntent.CORRECTION:
case BargeInIntent.NEW_QUESTION:
// Substantial input (>25 chars or contains "?") — process as new input
// Re-route to your normal speech handling logic
return this.handleUserInput(text);
}
}
Convenience methods:
- BargeInHelper.shouldSilentAcknowledge(text): returns true for noise/stop (return 204)
- BargeInHelper.isSubstantialInput(text): returns true for new questions/corrections (process normally)
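To make the categories concrete, here is a rough rule-based classifier built from the hints in the handler comments above (keyword examples, more than 25 characters, or a question mark). It is an illustrative sketch, not BargeInHelper's actual logic: classifyInterruption, the order of the checks, the 3-character noise threshold and the final fallback are all assumptions.

```typescript
type InterruptionKind = 'noise' | 'stop' | 'acknowledgment' | 'redirect' | 'substantial';

// Keyword lists taken from the examples in the handler comments above.
const STOP_WORDS = /\b(stop|quiet)\b/;
const ACK_WORDS = /\b(got it|okay|yes)\b/;
const REDIRECT_WORDS = /\b(actually|wait|hold on)\b/;

function classifyInterruption(rawText: string): InterruptionKind {
  const text = rawText.trim().toLowerCase();

  // Long input or a question is treated as something to process normally.
  if (text.length > 25 || text.includes('?')) {
    return 'substantial';
  }
  if (STOP_WORDS.test(text)) {
    return 'stop';
  }
  if (ACK_WORDS.test(text)) {
    return 'acknowledgment';
  }
  if (REDIRECT_WORDS.test(text)) {
    return 'redirect';
  }
  // Assumption: very short fragments that match no keyword are noise.
  if (text.length < 3) {
    return 'noise';
  }
  // Assumption: anything else is worth processing as regular input.
  return 'substantial';
}
```

Checking the length and question mark first means a long sentence that happens to start with "actually" is still processed as new input rather than answered with a short prompt.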
Tracking Impatient Users
The session automatically tracks how many times a user has barged in via bargeInCount. Use this to adapt response brevity:
@Handle({ types: [InputType.USER_BARGE_IN] })
handleBargeIn() {
const session = this.$session as SipgateAiflowSession;
const text = this.$request.getInputText();
// Frequent interrupters get shorter responses
if (session.bargeInCount > 3) {
if (BargeInHelper.shouldSilentAcknowledge(text)) {
return this.$send(EmptyOutput);
}
return this.$send(SpeakOutput, { message: 'Yes?', sessionId: (this.$session as any).sessionInfo?.id });
}
// Normal barge-in handling for patient users
// ...
}
Checking barged_in on Input
The barged_in flag is available on AiflowInput for any handler that needs to know if the current input was an interruption:
const input = this.$input as AiflowInput;
if (input.barged_in) {
// This input interrupted the assistant
}
Cancelling Async Operations
When using async mode (SipgateAiflowAsync), cancel pending operations on barge-in so the user isn't waiting for stale responses:
@Handle({ types: [InputType.USER_BARGE_IN] })
handleBargeIn() {
if (this.jovo instanceof SipgateAiflowAsync) {
this.jovo.cancelPendingOperations();
}
// Handle the interruption...
}
Use this.jovo.abortSignal in your own async operations to respect cancellation:
const response = await fetch(url, { signal: this.jovo.abortSignal });
Session Information
Session information from Sipgate AI Flow is available via this.$session.sessionInfo:
interface SessionInfo {
id: string; // UUID of the session
account_id: string; // Account identifier
phone_number: string; // Phone number associated with this flow
direction?: 'inbound' | 'outbound'; // Call direction (use for outbound greetings)
from_phone_number?: string; // Caller's number
to_phone_number?: string; // Callee's number
}
The call direction is also available directly on the request via request.getDirection().
Example usage:
LAUNCH() {
const sessionInfo = (this.$session as any).sessionInfo;
if (sessionInfo) {
console.log(`Call from: ${sessionInfo.phone_number}`);
console.log(`Session ID: ${sessionInfo.id}`);
}
return this.$redirect(MainComponent);
}
Platform Configuration
interface SipgateAiflowConfig {
apiKey?: string; // Shared secret: when set, requests must carry a matching X-API-TOKEN header
debug?: boolean; // Enable debug logging
tts?: TtsConfig; // Default TTS provider settings
bargeIn?: BargeInConfig; // Default barge-in configuration
userInputTimeoutSec?: number; // Timeout for user input in seconds (default: 4)
asyncOutput?: boolean; // Enable/disable async output (auto-detects by default)
plugins?: Plugin[]; // Additional platform-specific plugins
}
Default Configuration
The platform provides sensible defaults:
{
tts: {
provider: 'azure',
language: 'en-US',
voice: 'en-US-JennyNeural',
},
bargeIn: {
strategy: 'minimum_characters',
minimum_characters: 3,
},
userInputTimeoutSec: 4,
debug: false,
}
SpeakOutput applies the configured tts, bargeIn and userInputTimeoutSec as fallbacks when the corresponding option is not passed; per-output options always take precedence.
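The fallback rule can be sketched as a per-option choice between the value passed to the output and the platform default. This is an illustration rather than the plugin's code: resolveSpeakSettings and the *Sketch types are hypothetical, and the sketch assumes each option is replaced as a whole (no deep merge of nested fields), which the sentence above suggests but does not spell out.

```typescript
interface TtsSketch {
  provider: string;
  language: string;
  voice: string;
}

interface BargeInSketch {
  strategy: string;
  minimum_characters?: number;
  allow_after_ms?: number;
}

interface SpeakSettingsSketch {
  tts: TtsSketch;
  bargeIn: BargeInSketch;
  userInputTimeoutSec: number;
}

// Mirrors the documented platform defaults.
const platformDefaults: SpeakSettingsSketch = {
  tts: { provider: 'azure', language: 'en-US', voice: 'en-US-JennyNeural' },
  bargeIn: { strategy: 'minimum_characters', minimum_characters: 3 },
  userInputTimeoutSec: 4,
};

// Per-output options win; a missing option falls back to the platform default.
function resolveSpeakSettings(
  perOutput: Partial<SpeakSettingsSketch>,
  defaults: SpeakSettingsSketch,
): SpeakSettingsSketch {
  return {
    tts: perOutput.tts ?? defaults.tts,
    bargeIn: perOutput.bargeIn ?? defaults.bargeIn,
    userInputTimeoutSec: perOutput.userInputTimeoutSec ?? defaults.userInputTimeoutSec,
  };
}
```

Under that whole-option assumption, passing bargeIn: { strategy: 'none' } does not inherit minimum_characters from the defaults, so it is safest to pass a complete object for any option you override.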
Custom Configuration
new SipgateAiflowPlatform({
apiKey: process.env.AIFLOW_API_KEY,
debug: true,
tts: {
provider: 'azure',
language: 'de-DE',
voice: 'de-DE-KatjaNeural',
},
bargeIn: {
strategy: 'immediate',
allow_after_ms: 500,
},
userInputTimeoutSec: 5,
})
Authentication (X-API-TOKEN)
When apiKey is set, the platform validates the incoming X-API-TOKEN header against it and rejects mismatched requests. Leave apiKey unset to disable the check. For WebSocket connections, pass the upgrade-request headers as the third argument to WebSocketServer so the token can be validated:
wss.on('connection', (ws, req) => {
ws.on('message', async (data) => {
await app.handle(new WebSocketServer(ws, data, req.headers));
});
});
Async Output (WebSocket)
When using a WebSocket connection, the platform supports async output mode where each $send() call delivers the response immediately over the WebSocket, instead of batching all outputs into a single HTTP response.
How It Works
- In sync mode (default for HTTP): All $send() calls are collected and returned as a single response at the end of the handler.
- In async mode (default for WebSocket): Each $send() call immediately sends its output over the WebSocket connection. The final HTTP response is suppressed to avoid duplicate delivery.
This is useful for scenarios like sending a speak action followed by a hangup, where each action should be delivered to Sipgate AI Flow as soon as it's ready.
Configuration
Async output is controlled via the asyncOutput config option:
new SipgateAiflowPlatform({
asyncOutput: true, // Force enable async output
// asyncOutput: false, // Force disable async output
// asyncOutput: undefined, // Auto-detect (default)
})
| Value | Behavior |
|-------|----------|
| undefined (default) | Auto-detect: enabled when using WebSocketServer, disabled for HTTP servers |
| true | Force enable async output regardless of server type |
| false | Force disable async output, even when using WebSocket |
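The three rows of the table reduce to a single rule: an explicit boolean wins, and undefined defers to the server type. A minimal sketch of that decision follows; isAsyncOutputEnabled is a hypothetical function name, not something the plugin exports.

```typescript
// asyncOutput: the config value; usesWebSocketServer: whether the request came in via WebSocketServer.
function isAsyncOutputEnabled(
  asyncOutput: boolean | undefined,
  usesWebSocketServer: boolean,
): boolean {
  // An explicit true/false is respected; undefined means auto-detect.
  return asyncOutput ?? usesWebSocketServer;
}
```

So isAsyncOutputEnabled(undefined, true) is true, while isAsyncOutputEnabled(false, true) stays false even on a WebSocket connection.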
Example: Multiple Outputs with WebSocket
@Intents('EndConversationIntent')
async endConversation() {
const sessionId = (this.$session as any).sessionInfo?.id;
// With async mode, this is sent immediately over WebSocket
await this.$send(SpeakOutput, {
message: 'Goodbye!',
sessionId,
});
// This is also sent immediately, right after the speak
return this.$send(HangupOutput, { sessionId });
}
WebSocket Server Setup
To use async output with the built-in WebSocketServer:
import { WebSocketServer } from '@lautmaler/jovo-platform-aiflow';
import WebSocket from 'ws';
const wss = new WebSocket.Server({ port: 3000 });
wss.on('connection', (ws) => {
ws.on('message', async (data) => {
await app.handle(new WebSocketServer(ws, data));
});
});
When WebSocketServer is used, async output is enabled automatically; no additional configuration is needed.
Server Integration
The platform works with Jovo's standard Express.js server adapter:
Express.js with Jovo Webhook
import { ExpressJs, Webhook } from '@jovotech/server-express';
import { app } from './app'; // Your Jovo app
const port = process.env.JOVO_PORT || 3000;
(async () => {
await app.initialize();
Webhook.listen(port, () => {
console.log(`Server listening on port ${port}`);
});
Webhook.post('/webhook', async (req, res) => {
await app.handle(new ExpressJs(req, res));
});
})();
Custom Express Server
If you need a custom Express server adapter, extend Jovo's Server class:
import { AnyObject, Server } from '@jovotech/framework';
import type { Request, Response } from 'express';
export class ExpressJsExt extends Server {
constructor(public req: Request, public res: Response) {
super();
}
fail(error: Error): void {
if (!this.res.headersSent) {
this.res.status(500).json({
code: 500,
msg: error.message,
});
}
}
getRequestObject(): AnyObject {
return this.req.body;
}
setResponse(response: unknown): Promise<void> {
return new Promise<void>((resolve) => {
if (!this.res.headersSent) {
this.res.json(response);
}
resolve();
});
}
}
Complete Example
import { App } from '@jovotech/framework';
import {
SipgateAiflowPlatform,
SpeakOutput,
HangupOutput,
InputType,
} from '@lautmaler/jovo-platform-aiflow';
import { BaseComponent, Component, Intents, Handle } from '@jovotech/framework';
// Define your component
@Component()
class MyComponent extends BaseComponent {
LAUNCH() {
const sessionInfo = (this.$session as any).sessionInfo;
return this.$send(SpeakOutput, {
message: 'Welcome to my voice app!',
sessionId: sessionInfo?.id,
});
}
@Intents('HelpIntent')
helpHandler() {
const sessionId = (this.$session as any).sessionInfo?.id;
return this.$send(SpeakOutput, {
message: 'I can help you with various tasks.',
sessionId,
});
}
@Intents('EndConversationIntent')
async endConversation() {
const sessionId = (this.$session as any).sessionInfo?.id;
await this.$send(SpeakOutput, {
message: 'Goodbye!',
sessionId,
});
return this.$send(HangupOutput, { sessionId });
}
@Handle({
types: [InputType.SESSION_END],
})
sessionEnd() {
this.$session.state = [];
return this.$send('{}');
}
}
// Create your app
const app = new App({
components: [MyComponent],
plugins: [
new SipgateAiflowPlatform({
apiKey: process.env.AIFLOW_API_KEY,
debug: true,
}),
],
});
export { app };
Testing
The platform supports Jovo's TestSuite for unit testing:
import { TestSuite } from '@jovotech/framework';
import { InputType } from '@lautmaler/jovo-platform-aiflow';
const testSuite = new TestSuite({
stage: 'dev',
locale: 'en-US',
});
beforeEach(() => {
testSuite.$session.sessionInfo = {
id: 'test-session-id',
account_id: 'test-account',
phone_number: '+1234567890',
};
});
test('should handle help intent', async () => {
await testSuite.run({
type: InputType.Launch,
});
const { output } = await testSuite.run({
intent: 'HelpIntent',
});
expect(output.length).toBe(1);
expect(output[0]?.platforms?.['sipgate-aiflow']?.text).toBeDefined();
});
Sample Requests
Example event payloads can be found in the sample-requests/ directory:
- SessionStart.json - Call session begins
- UserSpeak.json - User speech detected
- UserBargeIn.json - User interrupted the assistant (dedicated legacy event)
- UserSpeakBargedIn.json - User speech with the barged_in flag
- UserSpeechStarted.json - Speech onset detected (WebSocket only)
- Dtmf.json - Keypad digit pressed (dtmf_received)
- AssistantSpeak.json - Assistant started speaking
- AssistantSpeechEnded.json - Assistant finished speaking
- UserInputTimeout.json - No user speech within timeout
- SmsFailed.json - A send_sms action failed
- SessionEnd.json - Call session ends
License
See LICENSE file
