# @juspay/breeze-buddy-client-sdk
Browser SDK for Buddy AI voice agent — WebRTC voice sessions via Daily.co and Pipecat.
```bash
npm install @juspay/breeze-buddy-client-sdk
```

Pure TypeScript, zero framework dependencies. Works in React, Vue, Svelte, or vanilla JS.
> **Stability:** pre-1.0. The surface may change between minor versions until 1.0.0.
## Table of Contents

- Quick start
- Constructing and starting a session
- Execution modes
- The Session handle
- Two subscription styles — raw vs. typed helpers
- Using raw events
- Listening to the user (typed helpers)
- Making the assistant speak (typed helpers)
- Transcript vs. Speaking
- Wildcard subscription
- Events catalog
- Errors
- Low-level API & reference
## Quick start
Two flows, pick whichever matches where your backend lives. Both examples use the raw session.on(event, handler) API — the same pattern is expanded in Using raw events. If you prefer role-filtered subscriptions, Listening to the user and Making the assistant speak show the typed-helper equivalents.
**SDK creates the lead (full flow):**

```ts
import { BuddyClient } from '@juspay/breeze-buddy-client-sdk';
const client = new BuddyClient({
auth: { token: 'your-jwt-token' },
resellerId: 'my-reseller',
merchantId: 'my-merchant'
});
const session = await client.startSession({
templateId: 'f47ac10b-58cc-4372-a567-0e02b2c3d479',
payload: { customer_name: 'John' }
});
session.on('transcript', (entry) => {
if (entry.role === 'user') console.log('user:', entry.text);
if (entry.role === 'assistant') console.log('assistant:', entry.text);
});
// …later
await session.close();
```

**Your backend provisions the Daily room (direct join):**

```ts
import { joinRoom } from '@juspay/breeze-buddy-client-sdk';
const session = await joinRoom({ roomUrl, token });
```

Same `Session` handle returned by both.
### Stream-mode example (deterministic output)
Three calls drive the whole loop when your backend is in DAILY_STREAM mode:

```ts
import { joinRoom } from '@juspay/breeze-buddy-client-sdk';
// 1. Join — your backend gave you { roomUrl, token }
const session = await joinRoom({ roomUrl, token });
// 2. Listen to the user — raw 'transcript' event, branch on role
session.on('transcript', (entry) => {
if (entry.role === 'user' && entry.isComplete) {
console.log('user said:', entry.text);
// Decide what the assistant should say next…
}
});
// (optional) Observe the assistant speaking — raw TTS lifecycle events
session.on('tts-start', () => showSpeakingIndicator());
session.on('tts-chunk', (text) => appendWord(text));
session.on('tts-end', () => hideSpeakingIndicator());
// 3. Make the assistant speak — bypasses the LLM in stream mode
await session.assistantSpeak('Hello! How can I help you today?');
await session.close();
```

## Constructing and starting a session
### new BuddyClient(options)
Create once per authenticated user. Long-lived — reuse across multiple calls.
| ClientOptions | Type | Req. | Description |
| --------------- | ------------ | ---- | -------------------------------------------------------------------- |
| auth | AuthConfig | Yes | { token: string } — short-lived JWT, never embed long-lived tokens |
| resellerId | string | Yes | Must be one of the reseller IDs authorized in your JWT claims |
| merchantId | string | Yes | Must be one of the merchant IDs authorized in your JWT claims |
| baseUrl | string | No | API base URL. Defaults to https://clairvoyance.breezelabs.app |
> The JWT carries `reseller_ids` and `merchant_ids` as authorization lists — a single token may authorize multiple reseller/merchant combos, so you pick one per client.
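For example, one token whose claims authorize two combos yields one `BuddyClient` per combo — a sketch with hypothetical IDs and a hypothetical `fetchShortLivedJwt()` helper:

```ts
import { BuddyClient } from '@juspay/breeze-buddy-client-sdk';

const token = await fetchShortLivedJwt(); // hypothetical — mint a short-lived JWT server-side

// One client per reseller/merchant combo authorized in the JWT claims.
const retailClient = new BuddyClient({
  auth: { token },
  resellerId: 'reseller-retail', // hypothetical IDs
  merchantId: 'merchant-retail'
});
const supportClient = new BuddyClient({
  auth: { token },
  resellerId: 'reseller-support',
  merchantId: 'merchant-support'
});
```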
### client.startSession(options)
Creates a lead via the API, then auto-connects WebRTC.
| StartSessionOptions | Type | Req. | Description |
| --------------------- | ------------------------------------ | ---- | ------------------------------------------------------------ |
| templateId | string | Yes | Template UUID |
| payload | Record<string, unknown> | No | Template-specific payload |
| executionMode | 'production' \| 'test' \| 'stream' | No | Defaults to 'production' |
| requestId | string | No | Unique request ID for idempotency. Auto-generated if omitted |
| on | Partial<SessionEventMap> | No | Handlers registered before connect — no events missed |
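Putting the options together — a sketch that starts a `'stream'`-mode session with handlers registered before connect (template UUID reused from the quick start):

```ts
const session = await client.startSession({
  templateId: 'f47ac10b-58cc-4372-a567-0e02b2c3d479',
  payload: { customer_name: 'John' },
  executionMode: 'stream',
  on: {
    // Registered before WebRTC connects, so no early event is missed.
    'state-change': (status) => console.log('[state]', status),
    transcript: (entry) => console.log(entry.role, entry.text)
  }
});
```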
### joinRoom(options) — direct join, no client construction
| JoinRoomOptions | Type | Req. | Description |
| ----------------- | -------------------------- | ---- | ------------------- |
| roomUrl | string | Yes | Daily room URL |
| token | string | Yes | Daily meeting token |
| on | Partial<SessionEventMap> | No | Initial handlers |
No auth, resellerId, or merchantId — joinRoom makes zero API calls, so none of that is needed.
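The typical pairing: your backend creates the lead and provisions the room, then hands the browser just enough to join. The `/api/voice-session` endpoint below is a hypothetical stand-in for your own API:

```ts
import { joinRoom } from '@juspay/breeze-buddy-client-sdk';

// Hypothetical backend route that returns { roomUrl, token }.
const res = await fetch('/api/voice-session', { method: 'POST' });
const { roomUrl, token } = await res.json();

const session = await joinRoom({ roomUrl, token });
```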
## Execution modes
| Mode | Wire | Pipeline | Use for |
| -------------- | -------------- | ------------------------- | --------------------------------------------------------- |
| 'production' | DAILY | STT → LLM → TTS | Normal conversational flow (default) |
| 'test' | DAILY_TEST | STT → LLM → TTS (sandbox) | Sandbox with no telephony side effects |
| 'stream' | DAILY_STREAM | STT → TTS (no LLM) | Deterministic, scripted output — compliance, IVR, handoff |
Pick 'stream' when you want the assistant to say exactly what you tell it to, via session.assistantSpeak(text), without the LLM rewriting it.
## The Session handle
Returned by both startSession() and joinRoom(). Everything you can do with a live session is listed here.
### Lifecycle & mic
| Method | Description |
| ------------------------- | ---------------------------------------------------------------- |
| getState() | Read-only snapshot of current state |
| close() | End the call, release audio, remove listeners, clear transcripts |
| [Symbol.asyncDispose]() | Alias for close() — enables await using (ES2024+ engines) |
| mute() / unmute() | Mic on/off |
| setMicEnabled(enabled) | Explicit set |
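A minimal mic-toggle sketch on top of these methods (`Session` is the exported handle type):

```ts
import type { Session } from '@juspay/breeze-buddy-client-sdk';

// Derive the next state from the snapshot instead of tracking a separate flag.
function toggleMic(session: Session) {
  const { isMicEnabled } = session.getState();
  session.setMicEnabled(!isMicEnabled);
}
```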
### Outbound
| Method | Description |
| ----------------------------- | --------------------------------------------------------------------- |
| assistantSpeak(text) | Send text to TTS. Returns Promise<void> resolving on next tts-end |
| sendMessage(msgType, data?) | Low-level RTVI escape hatch for custom backend handlers |
### Events
| Method | Description |
| -------------------------------- | ------------------------------------------------------------------------------------ |
| on(event, handler) | Subscribe to any session event (see Events catalog) |
| off(event, handler) | Unsubscribe |
| onUserTranscript(handler) | Filtered: user transcripts only. Returns Unsubscribe |
| onAssistantTranscript(handler) | Filtered: assistant transcripts only. Returns Unsubscribe |
| onToolCall(handler) | Filtered: tool-call transcripts only. Returns Unsubscribe |
| onUserSpeaking(handler) | User VAD — {start}, {end}. Returns Unsubscribe |
| onAssistantSpeaking(handler) | Assistant TTS lifecycle — {start}, {chunk, text}, {end}. Returns Unsubscribe |
### Snapshot shape — getState()

```ts
type SessionState = {
status: ConnectionStatus;
isMicEnabled: boolean;
transcripts: TranscriptEntry[];
assistantAudioTrack: MediaStreamTrack | null;
userAudioTrack: MediaStreamTrack | null;
error: string | null;
};
```
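The snapshot is useful for rendering UI on a late mount or after a reconnect, before any new events arrive — a sketch with hypothetical render helpers:

```ts
const { status, isMicEnabled, transcripts, error } = session.getState();
renderStatus(status);                       // hypothetical UI helpers
renderMicButton(isMicEnabled);
transcripts.forEach(renderTranscriptEntry);
if (error) showError(error);
```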
### await using — automatic cleanup (ES2024+)

```ts
await using session = await joinRoom({ roomUrl, token });
// session.close() runs automatically when the block exits
```

## Two subscription styles — raw vs. typed helpers
The SDK gives you two equivalent ways to react to what happens in a session:
- **Raw events** — `session.on(eventName, handler)` / `session.off(eventName, handler)`. One subscription method, all event names kebab-case. Everything the SDK can tell you is a plain event. This is the primary API.
- **Typed helpers** — `session.onUserTranscript(...)`, `session.onAssistantSpeaking(...)`, etc. Thin wrappers over the raw events that pre-filter (e.g. only user transcripts) or aggregate (e.g. `onAssistantSpeaking` merges `tts-start`/`tts-chunk`/`tts-end` into one handler). Each returns an `Unsubscribe` function, so no paired `off` call is needed.
Pick one style and stay consistent. Mixing them inside a single feature (e.g. subscribing to raw 'transcript' for the user and typed onAssistantSpeaking for the assistant) works — but is noise for readers. The rest of this doc shows raw first, then the typed-helper equivalents.
## Using raw events
Everything the session surfaces comes through session.on(eventName, handler). All event names are kebab-case, matching the Daily / Pipecat / Web-API convention. Handlers registered via on must be removed with session.off(event, handler) when you no longer want them — or they'll be cleaned up automatically at session.close().
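`off` takes the same handler reference you passed to `on`, so keep a named function if you plan to unsubscribe before close:

```ts
import type { TranscriptEntry } from '@juspay/breeze-buddy-client-sdk';

const onTranscript = (entry: TranscriptEntry) => appendTranscript(entry); // hypothetical UI helper

session.on('transcript', onTranscript);
// …later, once the feature unmounts
session.off('transcript', onTranscript);
```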
### Listening to the user
User text (STT output) and user speech activity (VAD) are two independent streams.
**User text — `'transcript'` event, branch on role:**

```ts
session.on('transcript', (entry) => {
if (entry.role !== 'user') return;
updateBubble(entry.id, entry.text, entry.isComplete);
});
```

Transcripts stream in place — the same `id` is emitted multiple times as text grows, with `isComplete: true` on the final version.
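That makes the natural rendering pattern an upsert keyed on `entry.id` — a sketch of the `updateBubble` helper used above, with a hypothetical `#chat` container:

```ts
const bubbles = new Map<string, HTMLElement>();

// Upsert one bubble per transcript id; later emissions overwrite earlier text.
function updateBubble(id: string, text: string, isComplete: boolean) {
  let el = bubbles.get(id);
  if (!el) {
    el = document.createElement('p');
    bubbles.set(id, el);
    document.querySelector('#chat')!.appendChild(el); // hypothetical container
  }
  el.textContent = text;
  el.classList.toggle('interim', !isComplete);
}
```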
**User speech activity (no text) — VAD events:**

```ts
session.on('user-speech-start', () => showListeningIndicator());
session.on('user-speech-end', () => hideListeningIndicator());
```

Fires near-instantly when the mic picks up speech, well before STT produces text. Useful for "🎙️ listening" indicators.
### Observing the assistant
The assistant surfaces on two different pipeline stages. Both come through raw events.
**Assistant text — `'transcript'` event with `role === 'assistant'`:**

```ts
session.on('transcript', (entry) => {
if (entry.role !== 'assistant') return;
renderAssistantText(entry.id, entry.text, entry.isComplete);
});
```

> Does not fire in `'stream'` mode (no LLM). Use the TTS events below as your text source in stream mode.
**Assistant TTS lifecycle — what the user actually hears:**

```ts
session.on('tts-start', () => showSpeakingIndicator());
session.on('tts-chunk', (text) => appendWord(text));
session.on('tts-end', () => hideSpeakingIndicator());
```

Fires in every execution mode, including 'stream'. See Transcript vs. Speaking for when to pick which.
**Tool calls — also on `'transcript'` with `role === 'tool_call'`:**

```ts
session.on('transcript', (entry) => {
if (entry.role !== 'tool_call') return;
console.log('tool invoked:', entry.functionName, 'complete=', entry.isComplete);
});
```

### Connection and conversation lifecycle

```ts
session.on('state-change', (status) => {
if (status === 'connected') showCallUI();
if (status === 'disconnected') showEndedScreen();
if (status === 'error') showErrorScreen();
});
session.on('connected', () => console.log('WebRTC up'));
session.on('assistant-ready', () => enableInput()); // bot pipeline is live
session.on('disconnected', () => console.log('call ended'));
session.on('error', (message) => showError(message));
// Server-emitted conversation events (Breeze Buddy):
session.on('conversation-start', () => markCallStarted());
session.on('conversation-end', (reason) => logEndReason(reason));
session.on('pipeline-error', (details) => logPipelineError(details));
```

Status graph:
```
idle → connecting → connected → disconnecting → disconnected
                  ↘ error
```

### Media tracks and mic

```ts
session.on('track-started', (track, local) => attachTrack(track, local));
session.on('track-stopped', (track, local) => detachTrack(track, local));
session.on('mic-change', (enabled) => updateMicUI(enabled));
```

### Barge-in (raw-event version)
Pipecat's VAD auto-cancels TTS when the user speaks. Detect the overlap with raw events:

```ts
let assistantIsSpeaking = false;
session.on('tts-start', () => {
assistantIsSpeaking = true;
});
session.on('tts-end', () => {
assistantIsSpeaking = false;
});
session.on('user-speech-start', () => {
if (assistantIsSpeaking) handleBargeIn();
});
```

### Registering handlers before connect
Both client.startSession({...}) and joinRoom({...}) accept an `on` map so handlers see every event from 'connecting' onward — no race where a fast 'connected' fires before you subscribe.

```ts
const session = await joinRoom({
roomUrl,
token,
on: {
'state-change': (status) => console.log('[state]', status),
transcript: (entry) => appendTranscript(entry),
'tts-start': () => showSpeakingIndicator(),
'tts-end': () => hideSpeakingIndicator()
}
});
```

## Listening to the user (typed helpers)
Same underlying events as above, pre-filtered. Each helper returns an Unsubscribe function — call it to remove the handler (no paired off needed).
### User text — onUserTranscript
Delivers only entries where role === 'user', so no manual branching:

```ts
const unsubscribe = session.onUserTranscript((entry) => {
updateBubble(entry.id, entry.text, entry.isComplete);
});
// …later
unsubscribe();
```

If you only care about the final text:

```ts
session.onUserTranscript((entry) => {
if (entry.isComplete) console.log('user said:', entry.text);
});
```

### User speech activity — onUserSpeaking
Merges 'user-speech-start' + 'user-speech-end' into one discriminated event:

```ts
session.onUserSpeaking((event) => {
if (event.type === 'start') showListeningIndicator();
if (event.type === 'end') hideListeningIndicator();
});
```

No chunk variant — user speech has no text here; that's what onUserTranscript is for.
## Making the assistant speak (typed helpers)
Same events as in Using raw events, wrapped for ergonomics.
### assistantSpeak(text) — push text to TTS
session.assistantSpeak(text) sends text straight to TTS. In 'stream' mode this bypasses the LLM entirely — text is spoken verbatim.

```ts
await session.assistantSpeak('Hello, how can I help you today?');
startListening();
await session.assistantSpeak('Please hold while I transfer you.');
transferCall();
```

Signature & behavior:

```ts
session.assistantSpeak(text: string): Promise<void>
```

- Resolves on the next `'tts-end'` after sending.
- Rejects with `SessionError` if the session isn't connected or closes before completion.
- Rejects with `InvalidRequestError` if `text` is empty / whitespace-only.
- Text over 2000 chars is truncated (with a console warning).
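A sketch of handling both rejection cases around a speak call:

```ts
import { InvalidRequestError, SessionError } from '@juspay/breeze-buddy-client-sdk';

try {
  await session.assistantSpeak(nextLine); // nextLine: your scripted text
} catch (err) {
  if (err instanceof InvalidRequestError) {
    console.warn('skipped an empty utterance');
  } else if (err instanceof SessionError) {
    showEndedScreen(); // hypothetical UI helper — session dropped mid-utterance
  } else {
    throw err;
  }
}
```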
### Observing speech — onAssistantSpeaking
Merges 'tts-start' / 'tts-chunk' / 'tts-end' into one discriminated event. Subscribe once, not per call:

```ts
session.onAssistantSpeaking((event) => {
switch (event.type) {
case 'start':
showSpeakingIndicator();
break;
case 'chunk':
appendWord(event.text);
break;
case 'end':
hideSpeakingIndicator();
break;
}
});
```

Fires in every execution mode, including 'stream' (because it's tied to TTS, not the LLM).
### Observing assistant text — onAssistantTranscript
Delivers only entries where role === 'assistant' — the LLM's streaming response, before it reaches TTS:

```ts
session.onAssistantTranscript((entry) => {
renderAssistantBubble(entry.id, entry.text, entry.isComplete);
});
```

Does not fire in 'stream' mode (no LLM). Use onAssistantSpeaking as your text source there.
### Tool calls — onToolCall
Delivers only entries where role === 'tool_call':

```ts
session.onToolCall((entry) => {
console.log('tool invoked:', entry.functionName, 'complete=', entry.isComplete);
});
```

### Why no per-utterance callback on assistantSpeak?
Pipecat's TTS events carry no correlation ID, and the pipeline can produce TTS for reasons other than your call (server-initiated idle prompts, barge-in interruption, template-baked audio). A callback claiming "these events are for your utterance" would lie about a precision the underlying system doesn't provide. The Promise resolves on "the next tts-end" — honest and scoped; for live observation you subscribe to the global stream via onAssistantSpeaking.
### Barge-in detection (typed-helper version)

```ts
let assistantIsSpeaking = false;
session.onAssistantSpeaking((e) => {
if (e.type === 'start') assistantIsSpeaking = true;
if (e.type === 'end') assistantIsSpeaking = false;
});
session.onUserSpeaking((e) => {
if (e.type === 'start' && assistantIsSpeaking) handleBargeIn();
});
```

### Cancelling TTS — TODO (cross-team)
No client-triggerable way to flush the assistant mid-utterance today. The only cancellation path is automatic VAD-driven barge-in. Programmatic flush requires a new on_client_message handler in clairvoyance (at app/ai/voice/agents/breeze_buddy/agent/__init__.py:650-665, where tts-speak is registered); the SDK side is a 3-line session.cancelSpeech() wrapper once the backend ships.
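Once the backend handler exists, the wrapper would presumably ride on `sendMessage` — a purely hypothetical sketch (the `'cancel-speech'` msgType is a placeholder, not a shipped handler):

```ts
import type { Session } from '@juspay/breeze-buddy-client-sdk';

// HYPOTHETICAL — requires a clairvoyance on_client_message handler that has not shipped.
function cancelSpeech(session: Session) {
  session.sendMessage('cancel-speech'); // placeholder msgType
}
```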
## Transcript vs. Speaking
The assistant has two "what it said" streams that fire at different pipeline stages. Pick by use case, not by feel.
| | onAssistantTranscript / 'transcript' (assistant) | onAssistantSpeaking / 'tts-*' |
| ------------------------------------------------- | ---------------------------------------------------- | ---------------------------------- |
| Source | LLM token stream | TTS pipeline output |
| Fires when | Model is generating text | Audio is being synthesized |
| Stream mode (no LLM) | ❌ never fires | ✅ fires — only text stream |
| Production / test mode | ✅ fires (earlier in the pipeline) | ✅ fires (after TTS begins) |
| Handler receives | Streaming AssistantTranscript | 'start' \| 'chunk' \| 'end' |
| Reflects post-processing? (PII, profanity filter) | No — raw LLM output | Yes — what the user actually hears |
| Use for | Render the model's response as text | Sync UI with actual audio |
Rule of thumb:

- What the model said → `onAssistantTranscript` (or `'transcript'` with a role filter)
- What the user is hearing → `onAssistantSpeaking` (or `'tts-*'` events)
- In stream mode → `onAssistantSpeaking` / `'tts-*'` is your only text stream
Symmetric helpers on the user side: onUserTranscript (STT text) vs onUserSpeaking (VAD activity).
## Wildcard subscription
Pass '*' to session.on to receive every other event in one place — useful for logging, analytics, or mirroring the entire session into a state store.
The handler signature is (eventName, ...originalArgs). eventName is excluded from the wildcard namespace (you won't get '*' for '*'), and the args are the original event's args — so casting per-case gives you full type safety.

```ts
import type {
ConnectionStatus,
ConversationEndReason,
PipelineErrorDetails,
TranscriptEntry
} from '@juspay/breeze-buddy-client-sdk';
session.on('*', (event, ...args) => {
switch (event) {
// --- Connection ---
case 'connected':
onConnected();
break;
case 'disconnected':
onDisconnected();
break;
case 'error': {
const [message] = args as [string];
showError(message);
break;
}
case 'state-change': {
const [status] = args as [ConnectionStatus];
renderStatus(status);
break;
}
case 'assistant-ready':
enableInput();
break;
// --- Conversation lifecycle (server-emitted) ---
case 'conversation-start':
markCallStarted();
break;
case 'conversation-end': {
const [reason] = args as [ConversationEndReason];
logEndReason(reason);
break;
}
case 'pipeline-error': {
const [details] = args as [PipelineErrorDetails];
logPipelineError(details);
break;
}
// --- Media ---
case 'track-started': {
const [track, local] = args as [MediaStreamTrack, boolean];
attachTrack(track, local);
break;
}
case 'track-stopped': {
const [track, local] = args as [MediaStreamTrack, boolean];
detachTrack(track, local);
break;
}
case 'mic-change': {
const [enabled] = args as [boolean];
updateMicUI(enabled);
break;
}
// --- Speech activity (VAD — no text) ---
case 'user-speech-start':
case 'user-speech-end':
case 'assistant-speech-start':
case 'assistant-speech-end':
markSpeechActivity(event);
break;
// --- TTS lifecycle ---
case 'tts-start':
showSpeakingIndicator();
break;
case 'tts-chunk': {
const [text] = args as [string];
appendWord(text);
break;
}
case 'tts-end':
hideSpeakingIndicator();
break;
// --- Transcripts ---
case 'transcript': {
const [entry] = args as [TranscriptEntry];
if (entry.role === 'user') updateUserBubble(entry);
else if (entry.role === 'assistant') updateAssistantBubble(entry);
else if (entry.role === 'tool_call') logToolCall(entry);
break;
}
// --- Telemetry ---
case 'metrics': {
const [data] = args as [unknown];
pushMetrics(data);
break;
}
}
});
```

The wildcard fires in addition to any specific subscriptions you've made — not instead of them. So you can keep per-event subscriptions for hot paths and use '*' purely for observability.
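For pure observability you rarely need the full switch — a sketch that mirrors every event into an analytics call (`track` is a hypothetical helper):

```ts
session.on('*', (event, ...args) => {
  track(`buddy:${event}`, { args }); // hypothetical analytics sink
});
```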
## Events catalog
Reference table of every event. Subscribe via session.on(event, handler) or via options.on on startSession / joinRoom.
### Connection
| Event | Handler |
| ------------------- | ------------------------------------ |
| 'connected' | () => void |
| 'disconnected' | () => void |
| 'error' | (message: string) => void |
| 'state-change' | (status: ConnectionStatus) => void |
| 'assistant-ready' | () => void |
### Conversation lifecycle (server-emitted)
| Event | Handler |
| ---------------------- | ----------------------------------------- |
| 'conversation-start' | () => void |
| 'conversation-end' | (reason: ConversationEndReason) => void |
| 'pipeline-error' | (details: PipelineErrorDetails) => void |
### Media
| Event | Handler |
| ----------------- | --------------------------------------------------- |
| 'track-started' | (track: MediaStreamTrack, local: boolean) => void |
| 'track-stopped' | (track: MediaStreamTrack, local: boolean) => void |
| 'mic-change' | (enabled: boolean) => void |
### Speech activity (VAD — no text)
| Event | Handler |
| -------------------------- | ------------ |
| 'user-speech-start' | () => void |
| 'user-speech-end' | () => void |
| 'assistant-speech-start' | () => void |
| 'assistant-speech-end' | () => void |
### TTS lifecycle
| Event | Handler |
| ------------- | ------------------------ |
| 'tts-start' | () => void |
| 'tts-chunk' | (text: string) => void |
| 'tts-end' | () => void |
### Transcripts & telemetry
| Event | Handler |
| -------------- | ---------------------------------- |
| 'transcript' | (entry: TranscriptEntry) => void |
| 'metrics' | (data: unknown) => void |
### Wildcard
| Event | Handler |
| ----- | ----------------------------------------------------------------- |
| '*' | (event: Exclude<SessionEvent, '*'>, ...args: unknown[]) => void |
See Wildcard subscription for a complete switch/case example.
## Errors
All errors extend BuddyError. Branch with instanceof — no string code matching.

```ts
import {
BuddyError,
AuthenticationError,
APIError,
NetworkError,
TimeoutError,
InvalidRequestError,
SessionError
} from '@juspay/breeze-buddy-client-sdk';
try {
const session = await client.startSession({ templateId, payload });
} catch (err) {
if (err instanceof AuthenticationError) return refreshTokenAndRetry();
if (err instanceof NetworkError || err instanceof TimeoutError) return showRetryBanner();
if (err instanceof APIError) console.error(err.statusCode, err.details);
if (err instanceof BuddyError) console.error(err.message);
}
```

| Class | Thrown when |
| --------------------- | ---------------------------------------------------- |
| BuddyError | Base class — catch-all for SDK errors |
| AuthenticationError | HTTP 401 / 403 |
| APIError | Other non-2xx API responses |
| NetworkError | Fetch failed (offline, DNS, CORS, etc.) |
| TimeoutError | Request exceeded the 30s timeout |
| InvalidRequestError | Invalid SDK usage (e.g. empty assistantSpeak text) |
| SessionError | Session-lifecycle error (e.g. speak before connect) |
Every instance carries .message, optional .statusCode, and optional .details (raw response body).
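A catch-all logger sketch using those fields, rethrowing anything that isn't an SDK error:

```ts
import { BuddyError } from '@juspay/breeze-buddy-client-sdk';

function logBuddyError(err: unknown) {
  if (!(err instanceof BuddyError)) throw err; // not an SDK error — rethrow
  console.error(err.message, err.statusCode ?? '(no status)', err.details ?? '(no body)');
}
```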
## Low-level API & reference
### client.api.createLead(...)
For workflows that don't fit startSession:

```ts
const { leadId } = await client.api.createLead({
templateId: 'f47ac10b-58cc-4372-a567-0e02b2c3d479',
payload: { customer_name: 'John' },
executionMode: 'stream'
});
```

### session.sendMessage(msgType, data?)
Send raw RTVI messages to custom backend handlers:

```ts
session.sendMessage('my-custom-handler', { some: 'data' });
```

### Exports

```ts
// Classes
export {
BuddyClient,
BuddyError,
AuthenticationError,
APIError,
NetworkError,
TimeoutError,
InvalidRequestError,
SessionError
};
// Functions
export { joinRoom };
// Types
export type {
ClientOptions,
StartSessionOptions,
JoinRoomOptions,
Session,
SessionState,
SessionEvent,
SessionEventMap,
WildcardHandler,
Unsubscribe,
ConnectionStatus,
ExecutionMode,
TranscriptEntry,
UserTranscript,
AssistantTranscript,
ToolCallTranscript,
ConversationEndReason,
PipelineErrorDetails,
AssistantSpeakingEvent,
UserSpeakingEvent,
CreateLeadOptions,
LeadResult,
API
};
```