# @juspay/breeze-buddy-client-sdk
Browser SDK for Buddy AI voice agent — WebRTC voice sessions via Daily.co and Pipecat.
```bash
npm install @juspay/breeze-buddy-client-sdk
```

Pure TypeScript, zero framework dependencies. Works in React, Vue, Svelte, or vanilla JS.
> **Stability:** pre-1.0. The surface may change between minor versions until 1.0.0.
## Table of Contents

- Quick start
- Constructing and starting a session
- Execution modes
- The Session handle
- Two subscription styles — raw vs. typed helpers
- Using raw events
- Listening to the user (typed helpers)
- Making the assistant speak (typed helpers)
- Transcript vs. Speaking
- Wildcard subscription
- Events catalog
- Errors
- Low-level API & reference
## Quick start
Two flows, pick whichever matches where your backend lives. Both examples use the raw session.on(event, handler) API — the same pattern is expanded in Using raw events. If you prefer role-filtered subscriptions, Listening to the user and Making the assistant speak show the typed-helper equivalents.
**SDK creates the lead (full flow):**

```ts
import { BuddyClient } from '@juspay/breeze-buddy-client-sdk';
const client = new BuddyClient({
auth: { token: 'your-jwt-token' },
resellerId: 'my-reseller',
merchantId: 'my-merchant'
});
const session = await client.startSession({
templateId: 'f47ac10b-58cc-4372-a567-0e02b2c3d479',
payload: { customer_name: 'John' }
});
session.on('transcript', (entry) => {
if (entry.role === 'user') console.log('user:', entry.text);
if (entry.role === 'assistant') console.log('assistant:', entry.text);
});
// …later
await session.close();
```

**Your backend provisions the Daily room (direct join):**

```ts
import { joinRoom } from '@juspay/breeze-buddy-client-sdk';
const session = await joinRoom({ roomUrl, token });
```

Same `Session` handle returned by both.
### Stream-mode example (deterministic output)
Three calls drive the whole loop when your backend is in DAILY_STREAM mode:

```ts
import { joinRoom } from '@juspay/breeze-buddy-client-sdk';
// 1. Join — your backend gave you { roomUrl, token }
const session = await joinRoom({ roomUrl, token });
// 2. Listen to the user — raw 'transcript' event, branch on role
session.on('transcript', (entry) => {
if (entry.role === 'user' && entry.isComplete) {
console.log('user said:', entry.text);
// Decide what the assistant should say next…
}
});
// (optional) Observe the assistant speaking — raw TTS lifecycle events
session.on('tts-start', () => showSpeakingIndicator());
session.on('tts-chunk', (text) => appendWord(text));
session.on('tts-end', () => hideSpeakingIndicator());
// 3. Make the assistant speak — bypasses the LLM in stream mode
await session.assistantSpeak('Hello! How can I help you today?');
await session.close();
```

## Constructing and starting a session
### new BuddyClient(options)
Create once per authenticated user. Long-lived — reuse across multiple calls.
| ClientOptions | Type | Req. | Description |
| --------------- | ------------ | ---- | -------------------------------------------------------------------- |
| auth | AuthConfig | Yes | { token: string } — short-lived JWT, never embed long-lived tokens |
| resellerId | string | Yes | Must be one of the reseller IDs authorized in your JWT claims |
| merchantId | string | Yes | Must be one of the merchant IDs authorized in your JWT claims |
| baseUrl | string | No | API base URL. Defaults to https://clairvoyance.breezelabs.app |
> The JWT carries `reseller_ids` and `merchant_ids` as authorization lists — a single token may authorize multiple reseller/merchant combos, so you pick one per client.
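For example, one token whose claims authorize two combos yields one `BuddyClient` per combo — a sketch with hypothetical IDs and a hypothetical `fetchShortLivedJwt()` helper:

```ts
import { BuddyClient } from '@juspay/breeze-buddy-client-sdk';

const token = await fetchShortLivedJwt(); // hypothetical — mint a short-lived JWT server-side

// One client per reseller/merchant combo authorized in the JWT claims.
const retailClient = new BuddyClient({
  auth: { token },
  resellerId: 'reseller-retail', // hypothetical IDs
  merchantId: 'merchant-retail'
});
const supportClient = new BuddyClient({
  auth: { token },
  resellerId: 'reseller-support',
  merchantId: 'merchant-support'
});
```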
### client.startSession(options)
Creates a lead via the API, then auto-connects WebRTC.
| StartSessionOptions | Type | Req. | Description |
| --------------------- | ------------------------------------ | ---- | ------------------------------------------------------------ |
| templateId | string | Yes | Template UUID |
| payload | Record<string, unknown> | No | Template-specific payload |
| executionMode | 'production' \| 'test' \| 'stream' | No | Defaults to 'production' |
| requestId | string | No | Unique request ID for idempotency. Auto-generated if omitted |
| on | Partial<SessionEventMap> | No | Handlers registered before connect — no events missed |
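Putting the options together — a sketch that starts a `'stream'`-mode session with handlers registered before connect (template UUID reused from the quick start):

```ts
const session = await client.startSession({
  templateId: 'f47ac10b-58cc-4372-a567-0e02b2c3d479',
  payload: { customer_name: 'John' },
  executionMode: 'stream',
  on: {
    // Registered before WebRTC connects, so no early event is missed.
    'state-change': (status) => console.log('[state]', status),
    transcript: (entry) => console.log(entry.role, entry.text)
  }
});
```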
### joinRoom(options) — direct join, no client construction
| JoinRoomOptions | Type | Req. | Description |
| ----------------- | -------------------------- | ---- | ------------------- |
| roomUrl | string | Yes | Daily room URL |
| token | string | Yes | Daily meeting token |
| on | Partial<SessionEventMap> | No | Initial handlers |
No auth, resellerId, or merchantId — joinRoom makes zero API calls, so none of that is needed.
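The typical pairing: your backend creates the lead and provisions the room, then hands the browser just enough to join. The `/api/voice-session` endpoint below is a hypothetical stand-in for your own API:

```ts
import { joinRoom } from '@juspay/breeze-buddy-client-sdk';

// Hypothetical backend route that returns { roomUrl, token }.
const res = await fetch('/api/voice-session', { method: 'POST' });
const { roomUrl, token } = await res.json();

const session = await joinRoom({ roomUrl, token });
```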
## Execution modes
| Mode | Wire | Pipeline | Use for |
| -------------- | -------------- | ------------------------- | --------------------------------------------------------- |
| 'production' | DAILY | STT → LLM → TTS | Normal conversational flow (default) |
| 'test' | DAILY_TEST | STT → LLM → TTS (sandbox) | Sandbox with no telephony side effects |
| 'stream' | DAILY_STREAM | STT → TTS (no LLM) | Deterministic, scripted output — compliance, IVR, handoff |
Pick 'stream' when you want the assistant to say exactly what you tell it to, via session.assistantSpeak(text), without the LLM rewriting it.
## The Session handle
Returned by both startSession() and joinRoom(). Everything you can do with a live session is listed here.
### Lifecycle & mic
| Method | Description |
| ------------------------- | ---------------------------------------------------------------- |
| getState() | Read-only snapshot of current state |
| close() | End the call, release audio, remove listeners, clear transcripts |
| [Symbol.asyncDispose]() | Alias for close() — enables await using (ES2024+ engines) |
| mute() / unmute() | Mic on/off |
| setMicEnabled(enabled) | Explicit set |
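A minimal mic-toggle sketch on top of these methods (`Session` is the exported handle type):

```ts
import type { Session } from '@juspay/breeze-buddy-client-sdk';

// Derive the next state from the snapshot instead of tracking a separate flag.
function toggleMic(session: Session) {
  const { isMicEnabled } = session.getState();
  session.setMicEnabled(!isMicEnabled);
}
```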
### Outbound
| Method | Description |
| ----------------------------- | --------------------------------------------------------------------- |
| assistantSpeak(text) | Send text to TTS. Returns Promise<void> resolving on next tts-end |
| sendMessage(msgType, data?) | Low-level RTVI escape hatch for custom backend handlers |
### Events
| Method | Description |
| -------------------------------- | ------------------------------------------------------------------------------------ |
| on(event, handler) | Subscribe to any session event (see Events catalog) |
| off(event, handler) | Unsubscribe |
| onUserTranscript(handler) | Filtered: user transcripts only. Returns Unsubscribe |
| onAssistantTranscript(handler) | Filtered: assistant transcripts only. Returns Unsubscribe |
| onToolCall(handler) | Filtered: tool-call transcripts only. Returns Unsubscribe |
| onUserSpeaking(handler) | User VAD — {start}, {end}. Returns Unsubscribe |
| onAssistantSpeaking(handler) | Assistant TTS lifecycle — {start}, {chunk, text}, {end}. Returns Unsubscribe |
### Snapshot shape — getState()

```ts
type SessionState = {
status: ConnectionStatus;
isMicEnabled: boolean;
transcripts: TranscriptEntry[];
assistantAudioTrack: MediaStreamTrack | null;
userAudioTrack: MediaStreamTrack | null;
error: string | null;
};
```
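The snapshot is useful for rendering UI on a late mount or after a reconnect, before any new events arrive — a sketch with hypothetical render helpers:

```ts
const { status, isMicEnabled, transcripts, error } = session.getState();
renderStatus(status);                       // hypothetical UI helpers
renderMicButton(isMicEnabled);
transcripts.forEach(renderTranscriptEntry);
if (error) showError(error);
```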
### await using — automatic cleanup (ES2024+)

```ts
await using session = await joinRoom({ roomUrl, token });
// session.close() runs automatically when the block exits
```

## Two subscription styles — raw vs. typed helpers
The SDK gives you two equivalent ways to react to what happens in a session:
- **Raw events** — `session.on(eventName, handler)` / `session.off(eventName, handler)`. One subscription method, all event names kebab-case. Everything the SDK can tell you is a plain event. This is the primary API.
- **Typed helpers** — `session.onUserTranscript(...)`, `session.onAssistantSpeaking(...)`, etc. Thin wrappers over the raw events that pre-filter (e.g. only user transcripts) or aggregate (e.g. `onAssistantSpeaking` merges `tts-start`/`tts-chunk`/`tts-end` into one handler). Each returns an `Unsubscribe` function, so no paired `off` call is needed.
Pick one style and stay consistent. Mixing them inside a single feature (e.g. subscribing to raw 'transcript' for the user and typed onAssistantSpeaking for the assistant) works — but is noise for readers. The rest of this doc shows raw first, then the typed-helper equivalents.
## Using raw events
Everything the session surfaces comes through session.on(eventName, handler). All event names are kebab-case, matching the Daily / Pipecat / Web-API convention. Handlers registered via on must be removed with session.off(event, handler) when you no longer want them — or they'll be cleaned up automatically at session.close().
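`off` takes the same handler reference you passed to `on`, so keep a named function if you plan to unsubscribe before close:

```ts
import type { TranscriptEntry } from '@juspay/breeze-buddy-client-sdk';

const onTranscript = (entry: TranscriptEntry) => appendTranscript(entry); // hypothetical UI helper

session.on('transcript', onTranscript);
// …later, once the feature unmounts
session.off('transcript', onTranscript);
```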
### Listening to the user
User text (STT output) and user speech activity (VAD) are two independent streams.
**User text — `'transcript'` event, branch on role:**

```ts
session.on('transcript', (entry) => {
if (entry.role !== 'user') return;
updateBubble(entry.id, entry.text, entry.isComplete);
});
```

Transcripts stream in place — the same `id` is emitted multiple times as text grows, with `isComplete: true` on the final version.
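That makes the natural rendering pattern an upsert keyed on `entry.id` — a sketch of the `updateBubble` helper used above, with a hypothetical `#chat` container:

```ts
const bubbles = new Map<string, HTMLElement>();

// Upsert one bubble per transcript id; later emissions overwrite earlier text.
function updateBubble(id: string, text: string, isComplete: boolean) {
  let el = bubbles.get(id);
  if (!el) {
    el = document.createElement('p');
    bubbles.set(id, el);
    document.querySelector('#chat')!.appendChild(el); // hypothetical container
  }
  el.textContent = text;
  el.classList.toggle('interim', !isComplete);
}
```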
**User speech activity (no text) — VAD events:**

```ts
session.on('user-speech-start', () => showListeningIndicator());
session.on('user-speech-end', () => hideListeningIndicator());
```

Fires near-instantly when the mic picks up speech, well before STT produces text. Useful for "🎙️ listening" indicators.
### Observing the assistant
The assistant surfaces on two different pipeline stages. Both come through raw events.
**Assistant text — `'transcript'` event with `role === 'assistant'`:**

```ts
session.on('transcript', (entry) => {
if (entry.role !== 'assistant') return;
renderAssistantText(entry.id, entry.text, entry.isComplete);
});
```

> Does not fire in `'stream'` mode (no LLM). Use the TTS events below as your text source in stream mode.
**Assistant TTS lifecycle — what the user actually hears:**

```ts
session.on('tts-start', () => showSpeakingIndicator());
session.on('tts-chunk', (text) => appendWord(text));
session.on('tts-end', () => hideSpeakingIndicator());
```

Fires in every execution mode, including 'stream'. See Transcript vs. Speaking for when to pick which.
**Tool calls — also on `'transcript'` with `role === 'tool_call'`:**

```ts
session.on('transcript', (entry) => {
if (entry.role !== 'tool_call') return;
console.log('tool invoked:', entry.functionName, 'complete=', entry.isComplete);
});
```

### Connection and conversation lifecycle

```ts
session.on('state-change', (status) => {
if (status === 'connected') showCallUI();
if (status === 'disconnected') showEndedScreen();
if (status === 'error') showErrorScreen();
});
session.on('connected', () => console.log('WebRTC up'));
session.on('assistant-ready', () => enableInput()); // bot pipeline is live
session.on('disconnected', () => console.log('call ended'));
session.on('error', (message) => showError(message));
// Server-emitted conversation events (Breeze Buddy):
session.on('conversation-start', () => markCallStarted());
session.on('conversation-end', (reason) => logEndReason(reason));
session.on('pipeline-error', (details) => logPipelineError(details));
```

Status graph:
```
idle → connecting → connected → disconnecting → disconnected
                  ↘ error
```

### Media tracks and mic

```ts
session.on('track-started', (track, local) => attachTrack(track, local));
session.on('track-stopped', (track, local) => detachTrack(track, local));
session.on('mic-change', (enabled) => updateMicUI(enabled));
```

### Barge-in (raw-event version)
Pipecat's VAD auto-cancels TTS when the user speaks. Detect the overlap with raw events:

```ts
let assistantIsSpeaking = false;
session.on('tts-start', () => {
assistantIsSpeaking = true;
});
session.on('tts-end', () => {
assistantIsSpeaking = false;
});
session.on('user-speech-start', () => {
if (assistantIsSpeaking) handleBargeIn();
});
```

### Registering handlers before connect
Both client.startSession({...}) and joinRoom({...}) accept an `on` map so handlers see every event from 'connecting' onward — no race where a fast 'connected' fires before you subscribe.

```ts
const session = await joinRoom({
roomUrl,
token,
on: {
'state-change': (status) => console.log('[state]', status),
transcript: (entry) => appendTranscript(entry),
'tts-start': () => showSpeakingIndicator(),
'tts-end': () => hideSpeakingIndicator()
}
});
```

## Listening to the user (typed helpers)
Same underlying events as above, pre-filtered. Each helper returns an Unsubscribe function — call it to remove the handler (no paired off needed).
### User text — onUserTranscript
Delivers only entries where role === 'user', so no manual branching:

```ts
const unsubscribe = session.onUserTranscript((entry) => {
updateBubble(entry.id, entry.text, entry.isComplete);
});
// …later
unsubscribe();
```

If you only care about the final text:

```ts
session.onUserTranscript((entry) => {
if (entry.isComplete) console.log('user said:', entry.text);
});
```

### User speech activity — onUserSpeaking
Merges 'user-speech-start' + 'user-speech-end' into one discriminated event:

```ts
session.onUserSpeaking((event) => {
if (event.type === 'start') showListeningIndicator();
if (event.type === 'end') hideListeningIndicator();
});
```

No chunk variant — user speech has no text here; that's what onUserTranscript is for.
## Making the assistant speak (typed helpers)
Same events as in Using raw events, wrapped for ergonomics.
### assistantSpeak(text) — push text to TTS
session.assistantSpeak(text) sends text straight to TTS. In 'stream' mode this bypasses the LLM entirely — text is spoken verbatim.

```ts
await session.assistantSpeak('Hello, how can I help you today?');
startListening();
await session.assistantSpeak('Please hold while I transfer you.');
transferCall();
```

Signature & behavior:

```ts
session.assistantSpeak(text: string): Promise<void>
```

- Resolves on the next `'tts-end'` after sending.
- Rejects with `SessionError` if the session isn't connected or closes before completion.
- Rejects with `InvalidRequestError` if `text` is empty / whitespace-only.
- Text over 2000 chars is truncated (with a console warning).
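A sketch of handling both rejection cases around a speak call:

```ts
import { InvalidRequestError, SessionError } from '@juspay/breeze-buddy-client-sdk';

try {
  await session.assistantSpeak(nextLine); // nextLine: your scripted text
} catch (err) {
  if (err instanceof InvalidRequestError) {
    console.warn('skipped an empty utterance');
  } else if (err instanceof SessionError) {
    showEndedScreen(); // hypothetical UI helper — session dropped mid-utterance
  } else {
    throw err;
  }
}
```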
### Observing speech — onAssistantSpeaking
Merges 'tts-start' / 'tts-chunk' / 'tts-end' into one discriminated event. Subscribe once, not per call:

```ts
session.onAssistantSpeaking((event) => {
switch (event.type) {
case 'start':
showSpeakingIndicator();
break;
case 'chunk':
appendWord(event.text);
break;
case 'end':
hideSpeakingIndicator();
break;
}
});
```

Fires in every execution mode, including 'stream' (because it's tied to TTS, not the LLM).
### Observing assistant text — onAssistantTranscript
Delivers only entries where role === 'assistant' — the LLM's streaming response, before it reaches TTS:

```ts
session.onAssistantTranscript((entry) => {
renderAssistantBubble(entry.id, entry.text, entry.isComplete);
});
```

Does not fire in 'stream' mode (no LLM). Use onAssistantSpeaking as your text source there.
### Tool calls — onToolCall
Delivers only entries where role === 'tool_call':

```ts
session.onToolCall((entry) => {
console.log('tool invoked:', entry.functionName, 'complete=', entry.isComplete);
});
```

### Why no per-utterance callback on assistantSpeak?
Pipecat's TTS events carry no correlation ID, and the pipeline can produce TTS for reasons other than your call (server-initiated idle prompts, barge-in interruption, template-baked audio). A callback claiming "these events are for your utterance" would lie about a precision the underlying system doesn't provide. The Promise resolves on "the next tts-end" — honest and scoped; for live observation you subscribe to the global stream via onAssistantSpeaking.
### Barge-in detection (typed-helper version)

```ts
let assistantIsSpeaking = false;
session.onAssistantSpeaking((e) => {
if (e.type === 'start') assistantIsSpeaking = true;
if (e.type === 'end') assistantIsSpeaking = false;
});
session.onUserSpeaking((e) => {
if (e.type === 'start' && assistantIsSpeaking) handleBargeIn();
});
```

### Cancelling TTS — TODO (cross-team)
No client-triggerable way to flush the assistant mid-utterance today. The only cancellation path is automatic VAD-driven barge-in. Programmatic flush requires a new on_client_message handler in clairvoyance (at app/ai/voice/agents/breeze_buddy/agent/__init__.py:650-665, where tts-speak is registered); the SDK side is a 3-line session.cancelSpeech() wrapper once the backend ships.
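Once the backend handler exists, the wrapper would presumably ride on `sendMessage` — a purely hypothetical sketch (the `'cancel-speech'` msgType is a placeholder, not a shipped handler):

```ts
import type { Session } from '@juspay/breeze-buddy-client-sdk';

// HYPOTHETICAL — requires a clairvoyance on_client_message handler that has not shipped.
function cancelSpeech(session: Session) {
  session.sendMessage('cancel-speech'); // placeholder msgType
}
```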
## Transcript vs. Speaking
The assistant has two "what it said" streams that fire at different pipeline stages. Pick by use case, not by feel.
| | onAssistantTranscript / 'transcript' (assistant) | onAssistantSpeaking / 'tts-*' |
| ------------------------------------------------- | ---------------------------------------------------- | ---------------------------------- |
| Source | LLM token stream | TTS pipeline output |
| Fires when | Model is generating text | Audio is being synthesized |
| Stream mode (no LLM) | ❌ never fires | ✅ fires — only text stream |
| Production / test mode | ✅ fires (earlier in the pipeline) | ✅ fires (after TTS begins) |
| Handler receives | Streaming AssistantTranscript | 'start' \| 'chunk' \| 'end' |
| Reflects post-processing? (PII, profanity filter) | No — raw LLM output | Yes — what the user actually hears |
| Use for | Render the model's response as text | Sync UI with actual audio |
Rule of thumb:

- What the model said → `onAssistantTranscript` (or `'transcript'` with a role filter)
- What the user is hearing → `onAssistantSpeaking` (or `'tts-*'` events)
- In stream mode → `onAssistantSpeaking` / `'tts-*'` is your only text stream
Symmetric helpers on the user side: onUserTranscript (STT text) vs onUserSpeaking (VAD activity).
## Wildcard subscription
Pass '*' to session.on to receive every other event in one place — useful for logging, analytics, or mirroring the entire session into a state store.
The handler signature is (eventName, ...originalArgs). eventName is excluded from the wildcard namespace (you won't get '*' for '*'), and the args are the original event's args — so casting per-case gives you full type safety.

```ts
import type {
ConnectionStatus,
ConversationEndReason,
PipelineErrorDetails,
TranscriptEntry
} from '@juspay/breeze-buddy-client-sdk';
session.on('*', (event, ...args) => {
switch (event) {
// --- Connection ---
case 'connected':
onConnected();
break;
case 'disconnected':
onDisconnected();
break;
case 'error': {
const [message] = args as [string];
showError(message);
break;
}
case 'state-change': {
const [status] = args as [ConnectionStatus];
renderStatus(status);
break;
}
case 'assistant-ready':
enableInput();
break;
// --- Conversation lifecycle (server-emitted) ---
case 'conversation-start':
markCallStarted();
break;
case 'conversation-end': {
const [reason] = args as [ConversationEndReason];
logEndReason(reason);
break;
}
case 'pipeline-error': {
const [details] = args as [PipelineErrorDetails];
logPipelineError(details);
break;
}
// --- Media ---
case 'track-started': {
const [track, local] = args as [MediaStreamTrack, boolean];
attachTrack(track, local);
break;
}
case 'track-stopped': {
const [track, local] = args as [MediaStreamTrack, boolean];
detachTrack(track, local);
break;
}
case 'mic-change': {
const [enabled] = args as [boolean];
updateMicUI(enabled);
break;
}
// --- Speech activity (VAD — no text) ---
case 'user-speech-start':
case 'user-speech-end':
case 'assistant-speech-start':
case 'assistant-speech-end':
markSpeechActivity(event);
break;
// --- TTS lifecycle ---
case 'tts-start':
showSpeakingIndicator();
break;
case 'tts-chunk': {
const [text] = args as [string];
appendWord(text);
break;
}
case 'tts-end':
hideSpeakingIndicator();
break;
// --- Transcripts ---
case 'transcript': {
const [entry] = args as [TranscriptEntry];
if (entry.role === 'user') updateUserBubble(entry);
else if (entry.role === 'assistant') updateAssistantBubble(entry);
else if (entry.role === 'tool_call') logToolCall(entry);
break;
}
// --- Telemetry ---
case 'metrics': {
const [data] = args as [unknown];
pushMetrics(data);
break;
}
}
});
```

The wildcard fires in addition to any specific subscriptions you've made — not instead of them. So you can keep per-event subscriptions for hot paths and use '*' purely for observability.
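For pure observability you rarely need the full switch — a sketch that mirrors every event into an analytics call (`track` is a hypothetical helper):

```ts
session.on('*', (event, ...args) => {
  track(`buddy:${event}`, { args }); // hypothetical analytics sink
});
```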
## Events catalog
Reference table of every event. Subscribe via session.on(event, handler) or via options.on on startSession / joinRoom.
### Connection
| Event | Handler |
| ------------------- | ------------------------------------ |
| 'connected' | () => void |
| 'disconnected' | () => void |
| 'error' | (message: string) => void |
| 'state-change' | (status: ConnectionStatus) => void |
| 'assistant-ready' | () => void |
### Conversation lifecycle (server-emitted)
| Event | Handler |
| ---------------------- | ----------------------------------------- |
| 'conversation-start' | () => void |
| 'conversation-end' | (reason: ConversationEndReason) => void |
| 'pipeline-error' | (details: PipelineErrorDetails) => void |
### Media
| Event | Handler |
| ----------------- | --------------------------------------------------- |
| 'track-started' | (track: MediaStreamTrack, local: boolean) => void |
| 'track-stopped' | (track: MediaStreamTrack, local: boolean) => void |
| 'mic-change' | (enabled: boolean) => void |
### Speech activity (VAD — no text)
| Event | Handler |
| -------------------------- | ------------ |
| 'user-speech-start' | () => void |
| 'user-speech-end' | () => void |
| 'assistant-speech-start' | () => void |
| 'assistant-speech-end' | () => void |
### TTS lifecycle
| Event | Handler |
| ------------- | ------------------------ |
| 'tts-start' | () => void |
| 'tts-chunk' | (text: string) => void |
| 'tts-end' | () => void |
### Transcripts & telemetry
| Event | Handler |
| -------------- | ---------------------------------- |
| 'transcript' | (entry: TranscriptEntry) => void |
| 'metrics' | (data: unknown) => void |
### Wildcard
| Event | Handler |
| ----- | ----------------------------------------------------------------- |
| '*' | (event: Exclude<SessionEvent, '*'>, ...args: unknown[]) => void |
See Wildcard subscription for a complete switch/case example.
## Errors
All errors extend BuddyError. Branch with instanceof — no string code matching.

```ts
import {
BuddyError,
AuthenticationError,
APIError,
NetworkError,
TimeoutError,
InvalidRequestError,
SessionError
} from '@juspay/breeze-buddy-client-sdk';
try {
const session = await client.startSession({ templateId, payload });
} catch (err) {
if (err instanceof AuthenticationError) return refreshTokenAndRetry();
if (err instanceof NetworkError || err instanceof TimeoutError) return showRetryBanner();
if (err instanceof APIError) console.error(err.statusCode, err.details);
if (err instanceof BuddyError) console.error(err.message);
}
```

| Class | Thrown when |
| --------------------- | ---------------------------------------------------- |
| BuddyError | Base class — catch-all for SDK errors |
| AuthenticationError | HTTP 401 / 403 |
| APIError | Other non-2xx API responses |
| NetworkError | Fetch failed (offline, DNS, CORS, etc.) |
| TimeoutError | Request exceeded the 30s timeout |
| InvalidRequestError | Invalid SDK usage (e.g. empty assistantSpeak text) |
| SessionError | Session-lifecycle error (e.g. speak before connect) |
Every instance carries .message, optional .statusCode, and optional .details (raw response body).
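A catch-all logger sketch using those fields, rethrowing anything that isn't an SDK error:

```ts
import { BuddyError } from '@juspay/breeze-buddy-client-sdk';

function logBuddyError(err: unknown) {
  if (!(err instanceof BuddyError)) throw err; // not an SDK error — rethrow
  console.error(err.message, err.statusCode ?? '(no status)', err.details ?? '(no body)');
}
```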
## Low-level API & reference
### client.api.createLead(...)
For workflows that don't fit startSession:

```ts
const { leadId } = await client.api.createLead({
templateId: 'f47ac10b-58cc-4372-a567-0e02b2c3d479',
payload: { customer_name: 'John' },
executionMode: 'stream'
});
```

### session.sendMessage(msgType, data?)
Send raw RTVI messages to custom backend handlers:

```ts
session.sendMessage('my-custom-handler', { some: 'data' });
```

### Exports

```ts
// Classes
export {
BuddyClient,
BuddyError,
AuthenticationError,
APIError,
NetworkError,
TimeoutError,
InvalidRequestError,
SessionError
};
// Functions
export { joinRoom };
// Types
export type {
ClientOptions,
StartSessionOptions,
JoinRoomOptions,
Session,
SessionState,
SessionEvent,
SessionEventMap,
WildcardHandler,
Unsubscribe,
ConnectionStatus,
ExecutionMode,
TranscriptEntry,
UserTranscript,
AssistantTranscript,
ToolCallTranscript,
ConversationEndReason,
PipelineErrorDetails,
AssistantSpeakingEvent,
UserSpeakingEvent,
CreateLeadOptions,
LeadResult,
API
};
```