ultron-live-sdk

v2.0.1

Published

15 days ago

Official JavaScript/TypeScript SDK for Ultron Live — real-time AI vision, screen commentary, session management, and TTS, powered by GPT-4o and Gemini.

npm_metabrix

ultron vision screen-share ai-commentary screenshare real-time gemini gpt4o sdk live tts autocommentary

Ultron Live SDK — Complete Documentation

Version 1.2.0

📧 For any confusion or queries, contact us at [email protected]

🔑 To get your API key, visit live.ultronai.me, log in with your email address, and copy your API key from the Playground page.

The official JavaScript/TypeScript SDK for Ultron Live — real-time AI vision analysis, screen commentary, session chat, text-to-speech, realtime voice AI, and WebRTC low-latency streaming. Supports REST, WebRTC, and WebSocket transports.

Installation

npm install ultron-live-sdk

Import

// ESM (recommended)
import { UltronLive } from 'ultron-live-sdk';

// CommonJS
const { UltronLive } = require('ultron-live-sdk');

Browser (CDN)

<script type="module">
  import { UltronLive } from 'https://cdn.skypack.dev/ultron-live-sdk';
</script>

Quick Start

import { UltronLive } from 'ultron-live-sdk';

// 1. Initialize with your API key
const ultron = new UltronLive({ apiKey: 'ulk_your_api_key' });

// 2. Start screen share with AI commentary
await ultron.startScreenShare({
  model: 'gemini-2.5-flash',
  onTranscript: (text) => { /* show a text preview or other UI */ },
  onCreditsUpdate: (remaining) => { /* show the remaining credit count */ },
  onCreditsExhausted: () => { /* react to credits running out */ },
  onError: (err) => console.error(err),
  onStop: () => { /* clean up when streaming stops */ },
});

// 3. Stop when done
ultron.stop();

SDK Configuration

Constructor: `new UltronLive(config)`

const ultron = new UltronLive({
  apiKey: 'ulk_your_api_key',           // Required unless token or tokenProvider is set
  baseUrl: 'https://live-api.ultronai.me', // Optional — backend URL
  apiVersion: 'v1',                      // Optional — 'v1' (default) or 'v2'
  model: 'gemini-3.1-flash-lite-preview',  // Optional — default AI model
  historyWindow: 5,                      // Optional — context window size
  frameInterval: 1500,                   // Optional — ms between frame captures (300–3000)
  frameQuality: 0.4,                     // Optional — JPEG quality (0–1)
  maxFrameWidth: 1280,                   // Optional — max capture width in px
  maxFrameHeight: 720,                   // Optional — max capture height in px
});

`UltronSDKConfig` — All Options

| Option | Type | Default | Description |
|---|---|---|---|
| apiKey | string | — | Your Ultron API key (starts with ulk_). Server-side only — never ship to a browser. Obtain from your dashboard or via auth.getMe(). |
| token | string | — | A JWT (from auth.verifyOtp() or a minted session token). Provide apiKey, token, or tokenProvider. |
| tokenProvider | () => string \| Promise<string> | — | Browser-safe auth. A callback returning a short-lived JWT; the SDK caches it until just before expiry and refreshes automatically (and on 401). See Browser-safe auth. |
| baseUrl | string | https://live-api.ultronai.me | Base URL of the Ultron Live backend. |
| apiVersion | 'v1' \| 'v2' | v1 | Backend workflow version. v1 = stable pipeline (backward-compatible). v2 = current pipeline (Golden Frame filtering, realtime tools, assessment engine, MCP servers). See API Versions. |
| model | ModelValue | gemini-3.1-flash-lite-preview | Default AI model for vision analysis. |
| historyWindow | number | 5 | Number of recent commentary strings kept in context for continuity. |
| frameInterval | number | 1500 | Milliseconds between each frame capture during streaming. |
| frameQuality | number | 0.4 | JPEG compression quality for captured frames (0 = worst, 1 = best). |
| maxFrameWidth | number | 1280 | Maximum width in pixels for captured frames. Frames are scaled down if larger. |
| maxFrameHeight | number | 720 | Maximum height in pixels for captured frames. |
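The frame-related defaults in the table above can be gathered into one object. The sketch below merges per-call overrides on top of those documented defaults. Two parts of it are assumptions based on ranges mentioned elsewhere in this guide, not confirmed SDK behavior: clamping `frameInterval` to 300–3000 ms and clamping `frameQuality` to 0–1. `FrameSettings` and `resolveFrameSettings` are hypothetical names.

```typescript
// Hypothetical helper: not part of ultron-live-sdk.
interface FrameSettings {
  frameInterval: number;  // ms between captures
  frameQuality: number;   // JPEG quality, 0–1
  maxFrameWidth: number;  // px
  maxFrameHeight: number; // px
  historyWindow: number;  // past commentaries kept as context
}

// Documented defaults from the UltronSDKConfig table.
const FRAME_DEFAULTS: FrameSettings = {
  frameInterval: 1500,
  frameQuality: 0.4,
  maxFrameWidth: 1280,
  maxFrameHeight: 720,
  historyWindow: 5,
};

const clamp = (v: number, lo: number, hi: number): number => Math.min(hi, Math.max(lo, v));

// Merge overrides over the defaults, then clamp to the assumed valid ranges.
function resolveFrameSettings(overrides: Partial<FrameSettings> = {}): FrameSettings {
  const merged = { ...FRAME_DEFAULTS, ...overrides };
  return {
    ...merged,
    frameInterval: clamp(merged.frameInterval, 300, 3000), // assumed range
    frameQuality: clamp(merged.frameQuality, 0, 1),
  };
}
```

For example, `resolveFrameSettings({ frameInterval: 50 })` would yield a 300 ms interval and keep every other default.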

Reading Config at Runtime

const config = ultron.config;
console.log(config.apiKey);       // current API key
console.log(config.model);        // current default model
console.log(config.frameInterval); // 1500

ultron.config returns a Readonly<Required<UltronSDKConfig>> — all fields are guaranteed present.

API Versions (v1 & v2)

The Ultron Live backend runs two parallel workflows. Choose one per UltronLive instance with the apiVersion option.

| | v1 — Stable (default) | v2 — Current |
|---|---|---|
| Vision pipeline | One model call per frame | Golden Frame pipeline — streaming filter + two-tier description |
| Realtime voice | Standard relay | Relay with realtime tool-calling |
| Extra features | — | Assessment engine, MCP servers |
| REST paths | /api/v1/* | /api/v2/* |
| WebSocket paths | /ws, /ws/realtime | /ws/v2, /ws/v2/realtime |

// Stable (default) — existing behavior, nothing changes
const ultron = new UltronLive({ apiKey: 'ulk_...' });

// Current pipeline — golden frames, assessment, MCP, realtime tools
const ultronV2 = new UltronLive({ apiKey: 'ulk_...', apiVersion: 'v2' });

ultron.apiVersion;   // 'v1'
ultronV2.apiVersion; // 'v2'

The version flows through everything automatically — REST calls, createWebRTC(), and createRealtimeVoice() all use the configured version's paths. You don't pass it again.
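You may connect manually, for example to the realtime relay. If so, the version table translates into concrete URLs roughly as shown below. `pathsFor` and `wsUrl` are hypothetical helpers derived from the table; the SDK does not export them. The sketch also assumes your `baseUrl` has no path prefix of its own.

```typescript
// Hypothetical helpers mirroring the v1/v2 path table; not the SDK's internals.
type ApiVersion = 'v1' | 'v2';

interface VersionPaths {
  rest: string;     // REST prefix
  ws: string;       // WebSocket path
  realtime: string; // realtime voice WebSocket path
}

function pathsFor(version: ApiVersion): VersionPaths {
  return version === 'v2'
    ? { rest: '/api/v2', ws: '/ws/v2', realtime: '/ws/v2/realtime' }
    : { rest: '/api/v1', ws: '/ws', realtime: '/ws/realtime' };
}

// Turn an http(s):// base URL into the matching ws(s):// URL for a WebSocket path.
function wsUrl(baseUrl: string, path: string): string {
  const url = new URL(path, baseUrl);
  url.protocol = url.protocol === 'http:' ? 'ws:' : 'wss:';
  return url.toString();
}
```

With the default base URL, `wsUrl('https://live-api.ultronai.me', pathsFor('v2').realtime)` gives the v2 relay URL.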

Backward compatible: v1 is the default, so upgrading the SDK package does not change behavior. Opt into v2 to use the new features below. Calling a v2-only API (assessment.*, mcp.*) on a v1 instance throws a clear error.

Authentication

The SDK provides a built-in OTP-based authentication flow via ultron.auth. Two authentication methods are supported:

API Key — x-api-key header or Authorization: Bearer ulk_...
JWT — Authorization: Bearer <jwt> (obtained via OTP verification)

The flow is always: request an OTP → verify it → get a JWT + API key.
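The two header forms listed above can be built as follows for manual REST calls, such as the server-side session-token request. `authHeaders` is a hypothetical helper. It assumes that a credential starting with `ulk_` is an API key and that anything else is a JWT.

```typescript
// Hypothetical helper: builds request headers for either auth method.
function authHeaders(credential: string): Record<string, string> {
  // API keys (ulk_...) go in x-api-key; JWTs use an Authorization: Bearer header.
  return credential.startsWith('ulk_')
    ? { 'x-api-key': credential }
    : { Authorization: `Bearer ${credential}` };
}
```

Spread the result into a fetch call, for example `headers: { 'Content-Type': 'application/json', ...authHeaders(key) }`.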

Step 1: Request an OTP

Use signup for a brand-new account, login for an existing one (both email a 6-digit OTP). sendOtp is the legacy get-or-create call, kept for backward compatibility.

// New user
await ultron.auth.signup({
  email: '[email protected]',
  firstName: 'Ada',      // optional
  lastName: 'Lovelace',  // optional
  dateOfBirth: '1990-12-10', // optional, ISO date
});

// Existing user
await ultron.auth.login({ email: '[email protected]' });

// Legacy (get-or-create)
const result = await ultron.auth.sendOtp({ email: '[email protected]' });
console.log(result); // { success: true, message: 'OTP sent successfully to your email' }

Parameters:

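| Field | Type | Required | Description |
|---|---|---|---|
| email | string | Yes | The email address to send the OTP to. |
| firstName | string | No | signup only. |
| lastName | string | No | signup only. |
| dateOfBirth | string | No | signup only. ISO date (e.g. 1990-12-10). |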

Returns: Promise<{ success: boolean; message: string }>

Step 2: Verify OTP

const response = await ultron.auth.verifyOtp({
  email: '[email protected]',
  otp: '123456',
  timezone: 'America/New_York', // optional — IANA timezone string
});

const { token, user } = response.data;

console.log(token);        // JWT token string
console.log(user.email);   // '[email protected]'
console.log(user.credits); // 10000

// Store and apply the token
ultron.setApiKey(token);

Parameters:

| Field | Type | Required | Description |
|---|---|---|---|
| email | string | Yes | The email used in Step 1. |
| otp | string | Yes | The 6-digit OTP received via email. |
| timezone | string | No | IANA timezone string (e.g. America/New_York). Stored for timezone-aware features. |

Returns: Promise<AuthResult>

interface AuthResult {
  success: boolean;
  data: {
    token: string;
    user: UltronUser;
  };
}

interface UltronUser {
  id: string;
  email: string;
  apiKey: string;
  credits: number;
  isVerified: boolean;
  lastLoginAt: string;
  createdAt: string;
  timezone: string | null;
  plan: 'free' | 'pro' | 'enterprise';
  planStartedAt: string | null;
  subscriptionStatus: string | null;
  subscriptionCurrentPeriodEnd: string | null;
  cancelAtPeriodEnd: boolean;
}

Step 3: Get Current User Profile

const me = await ultron.auth.getMe();
console.log(me.data.email);   // '[email protected]'
console.log(me.data.credits); // remaining credits

Returns: Promise<GetMeResult>

interface GetMeResult {
  success: boolean;
  data: UltronUser;
}

Step 4: Update Timezone

await ultron.auth.updateTimezone({ timezone: 'Asia/Kolkata' });

Validates the IANA timezone string and updates the user record. Safe to call on every page load.

| Field | Type | Required | Description |
|---|---|---|---|
| timezone | string | Yes | Valid IANA timezone string (e.g. Europe/London, Asia/Kolkata). |
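The server validates the zone anyway, so a client-side pre-check is optional. It does avoid a round trip with a bad value. The sketch relies only on the standard `Intl.DateTimeFormat`, which throws a `RangeError` for unknown zones. `timezoneToSync` is a hypothetical helper that skips the call when nothing changed; it assumes you compare against `timezone` from `UltronUser`.

```typescript
// Returns true if the runtime's Intl implementation recognizes the IANA zone.
function isValidTimeZone(tz: string): boolean {
  try {
    new Intl.DateTimeFormat('en-US', { timeZone: tz });
    return true;
  } catch {
    return false; // Intl throws a RangeError for unknown time zones
  }
}

// Hypothetical guard: returns the zone to send, or null when no update is needed.
function timezoneToSync(current: string | null, detected: string): string | null {
  return isValidTimeZone(detected) && detected !== current ? detected : null;
}
```

A typical page-load call would look like this. First compute `const tz = timezoneToSync(me.data.timezone, Intl.DateTimeFormat().resolvedOptions().timeZone)`, then run `if (tz) await ultron.auth.updateTimezone({ timezone: tz })`.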

Updating the API Key

After login or token refresh, update the SDK's key:

ultron.setApiKey('new_jwt_token');

This updates both the internal config and the HTTP client immediately.

Full Auth Flow Example

import { UltronLive } from 'ultron-live-sdk';

const ultron = new UltronLive({ apiKey: '' }); // empty key initially

// 1. Send OTP
await ultron.auth.sendOtp({ email: '[email protected]' });

// 2. User enters OTP from email...
const userVerification = await ultron.auth.verifyOtp({
  email: '[email protected]',
  otp: '123456',
  timezone: Intl.DateTimeFormat().resolvedOptions().timeZone,
});

const { token, user } = userVerification.data;

// 3. Apply the token
ultron.setApiKey(token);

// 4. Now use the SDK normally
await ultron.startScreenShare({
  onTranscript: (text) => console.log(text),
});

Browser-safe auth (keep your API key secret)

Never put your ulk_… API key in browser code. Anything shipped to the browser can be read through the Network tab, DevTools, or an extension, so the key would be exposed. For any client-facing app, use a short-lived session token minted by your own backend, and give the SDK a tokenProvider instead of an apiKey.

How it works

browser ──▶ your backend  /api/ultron-token      (you authenticate your user)
                  │  calls Ultron with your API key (server-side only)
                  ▼
        POST /api/v2/auth/session-token  →  { token, expiresIn, expiresAt }
                  ▼
browser ◀── { token }        // short-lived, session-scoped JWT

1) Your backend mints the token (Node/Next.js example — the API key stays here):

// POST /api/ultron-token   (server-side)
export async function POST(req: Request) {
  const user = await authenticateYourUser(req);          // your own auth
  if (!user) return new Response('Unauthorized', { status: 401 });

  const r = await fetch('https://live-api.ultronai.me/api/v2/auth/session-token', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'x-api-key': process.env.ULTRON_API_KEY! },
    body: JSON.stringify({ endUserId: user.id, ttlSeconds: 600 }),
  });
  if (!r.ok) return new Response('Failed to mint session token', { status: 502 });
  const { data } = await r.json();
  return Response.json({ token: data.token });            // return ONLY the token
}

2) The browser uses tokenProvider (no key ever ships to the client):

const ultron = new UltronLive({
  apiVersion: 'v2',
  tokenProvider: async () => {
    const r = await fetch('/api/ultron-token', { credentials: 'include' });
    if (!r.ok) throw new Error('Could not fetch session token');
    return (await r.json()).token;
  },
});

// Use everything normally — REST, createWebRTC(), createRealtimeVoice() all
// pull the token from the provider, cache it until just before it expires, and
// refresh automatically (and on a 401).
await ultron.startScreenShare({ onTranscript: console.log });
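The caching behavior described in the comments above can be sketched on its own: reuse the token until just before expiry, and refetch after a 401. This is not the SDK's internal code. The following are all illustrative assumptions:

- the name `createTokenCache`;
- the 30-second refresh margin;
- treating `expiresAt` (returned by the session-token endpoint) as epoch milliseconds.

Concurrent callers share one in-flight request, so a burst of calls triggers a single fetch.

```typescript
// Illustrative sketch; not the SDK's implementation.
interface MintedToken {
  token: string;
  expiresAt: number; // assumed: epoch milliseconds
}

function createTokenCache(
  fetchToken: () => Promise<MintedToken>,
  refreshMarginMs = 30_000,
  now: () => number = Date.now,
) {
  let cached: MintedToken | null = null;
  let inFlight: Promise<MintedToken> | null = null;

  return {
    // Resolve a token, fetching only when none is cached or it is about to expire.
    async get(): Promise<string> {
      if (cached && now() < cached.expiresAt - refreshMarginMs) return cached.token;
      // Share one pending request between concurrent callers.
      const pending: Promise<MintedToken> =
        inFlight ??
        (inFlight = fetchToken()
          .then((t) => { cached = t; return t; })
          .finally(() => { inFlight = null; }));
      return (await pending).token;
    },
    // Drop the cached token, e.g. after the server answers 401.
    invalidate(): void {
      cached = null;
    },
  };
}
```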

What a session token can and can't do

✅ Runs inference (vision, assessment, WebRTC, realtime voice, chat) — billed to your account's shared credit pool.
✅ Reads templates and effective MCP tools.
❌ Cannot manage billing/subscription, edit profile, manage MCP servers, write templates, or mint more tokens — those return 403. Do them server-side with your API key.
⏱️ Expires fast (default 10 min, max 1 hr). A leaked token is useless within minutes.

POST /api/v2/auth/session-token accepts only API-key auth (server-side). Body: { endUserId?, ttlSeconds?, metadata? }. endUserId is for your own attribution/rate-limiting — it never changes which account is billed.

Updating the token at runtime

ultron.setToken('new_jwt');                 // set a token directly
ultron.setTokenProvider(async () => '…');   // or (re)configure the provider

Screen Share Streaming

The primary feature of the SDK. Requests browser screen share access and runs a continuous AI commentary loop.

`ultron.startScreenShare(options?)`

await ultron.startScreenShare({
  // Model override (optional — uses SDK default if omitted)
  model: 'gemini-2.5-flash',

  // Frame capture settings (optional — uses SDK defaults)
  frameInterval: 1500, // ms; choose a value in the 300–3500 range
  frameQuality: 0.4,
  maxFrameWidth: 1280,
  maxFrameHeight: 720,
  historyWindow: 5,

  // Provide your own MediaStream instead of requesting getDisplayMedia()
  stream: existingMediaStream, // optional

  // Callbacks
  onTranscript: (text) => {
    // Called with each AI commentary string (≤20 words)
  },
  onCreditsUpdate: (remaining) => {
    // Called after each frame with updated credit balance
  },
  onLatency: (ms) => {
    // Called with processing latency in milliseconds
  },
  onFrame: (dataUrl) => {
    // Called with the raw captured frame as a JPEG data-URL
    // Useful for displaying a preview
    previewImg.src = dataUrl;
  },
  onCreditsExhausted: () => {
    // Called when credits reach zero — stream auto-stops
    // e.g. tell the user they have reached their usage limit
  },
  onError: (err) => {
    // Called on API/network errors (non-credit errors)
    console.error('Error:', err.message);
  },
  onStop: () => {
    // Called when the stream stops (user action or track ended)
    setStreamEnded(true); // e.g. update your UI state
  },
});

`StartScreenShareOptions` — All Fields

| Field | Type | Default | Description |
|---|---|---|---|
| stream | MediaStream | — | Supply an existing stream instead of requesting getDisplayMedia(). |
| model | ModelValue | SDK default | Override the AI model for this session. |
| frameInterval | number | 1500 | Milliseconds between frame captures. |
| frameQuality | number | 0.4 | JPEG quality (0–1). |
| maxFrameWidth | number | 1280 | Max frame width in px. |
| maxFrameHeight | number | 720 | Max frame height in px. |
| historyWindow | number | 5 | Number of past commentaries to send as context. |
| onTranscript | (text: string) => void | — | New AI commentary. |
| onCreditsUpdate | (remaining: number) => void | — | Updated credit balance. |
| onLatency | (ms: number) => void | — | Frame processing latency. |
| onFrame | (dataUrl: string) => void | — | Raw captured frame data-URL. |
| onStop | () => void | — | Stream ended. |
| onCreditsExhausted | () => void | — | Credits hit zero. |
| onError | (err: Error) => void | — | API/network error. |
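The `maxFrameWidth` and `maxFrameHeight` options cap the captured frame size. One common way to honor both limits without distorting the image is a single uniform scale factor. The sketch below assumes that approach for sizing a canvas in a custom capture loop; `fitWithin` is a hypothetical helper, not an SDK export.

```typescript
// Scale (width, height) down to fit inside (maxW, maxH), preserving aspect ratio.
// Frames that already fit are returned unchanged (never upscaled).
function fitWithin(width: number, height: number, maxW = 1280, maxH = 720) {
  const scale = Math.min(1, maxW / width, maxH / height);
  return { width: Math.round(width * scale), height: Math.round(height * scale) };
}
```

A 1920×1080 capture becomes 1280×720. An ultrawide 3840×1080 capture is limited by its width and becomes 1280×360, rather than being squashed to 1280×720.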

Stopping the Stream

ultron.stop();

This:

Stops the frame capture interval
Stops all MediaStream tracks (ends the screen share)
Stops any playing audio
Clears the history buffer

Checking Stream State

if (ultron.isStreaming) {
  // Currently streaming, session id: ultron.sessionId
}

| Property | Type | Description |
|---|---|---|
| isStreaming | boolean | Whether a stream analysis loop is running. |
| sessionId | string \| null | The active session ID, or null when not streaming. |

Video Stream Analysis

Analyze any MediaStream — webcam, <video> element capture, canvas stream, etc.

`ultron.startVideoStream(options)`

// From a webcam
const stream = await navigator.mediaDevices.getUserMedia({ video: true });

await ultron.startVideoStream({
  stream,  // Required — any MediaStream
  model: 'gpt-4o',
  frameInterval: 2000,
  onTranscript: (text) => { /* show the transcript text */ },
  onCreditsUpdate: (remaining) => { /* show users their remaining credits */ },
  onError: (err) => console.error(err),
});

From a `<video>` Element

const videoEl = document.getElementById('my-video') as HTMLVideoElement;
const stream = videoEl.captureStream();

await ultron.startVideoStream({
  stream,
  model: 'gemini-2.5-flash',
  onTranscript: (text) => { /* perform your task */ },
});

From a Canvas

const canvas = document.getElementById('game-canvas') as HTMLCanvasElement;
const stream = canvas.captureStream(30); // 30 FPS

await ultron.startVideoStream({
  stream,
  model: 'gemini-2.5-flash',
  frameInterval: 1000,
  onTranscript: (text) => showCommentary(text),
});

`StartVideoStreamOptions` — All Fields

Same as StartScreenShareOptions except stream is required (not optional).

| Field | Type | Required | Description |
|---|---|---|---|
| stream | MediaStream | Yes | The video stream to analyze. |
| All other fields | — | No | Same as StartScreenShareOptions above. |

Frame Analysis (Low-Level)

Send a single frame for AI vision analysis without using the streaming loop. Useful for custom capture pipelines or one-off analysis.

`ultron.analyseFrame(options)`

const result = await ultron.analyseFrame({
  image: 'data:image/jpeg;base64,/9j/4AAQSkZJRg...',
  model: 'gemini-2.5-flash',           // optional — uses SDK default
  history: ['Player opened the menu.'], // optional — previous commentaries
  sessionId: 'sess_abc123',            // optional — attach to a session
  timestamp: Date.now(),               // optional — client timestamp
  enableAudio: true,                   // optional — include TTS audio in response
  voiceName: 'Kore',                   // optional — Gemini TTS voice name
});

console.log(result.response);          // "Player is navigating the inventory screen."
console.log(result.latency);           // 342 (ms)
console.log(result.creditsRemaining);  // 9842
console.log(result.creditsUsed);       // 3
console.log(result.usage);             // { inputTokens: 1200, outputTokens: 45 }
console.log(result.model);             // 'gemini-2.5-flash'
console.log(result.provider);          // 'gemini'
console.log(result.audio);             // base64 WAV audio (if enableAudio was true)

`AnalyseFrameOptions`

| Field | Type | Required | Description |
|---|---|---|---|
| image | string | Yes | Base-64 data-URL (data:image/jpeg;base64,...) or raw base-64 JPEG string. |
| model | ModelValue | No | Override the AI model. Falls back to SDK default. |
| history | string[] | No | Previous commentary strings for context continuity. |
| sessionId | string \| null | No | Session ID to attach this frame to. |
| timestamp | number | No | Client-side Unix timestamp in milliseconds. Defaults to Date.now(). |
| enableAudio | boolean | No | If true, includes TTS audio of the commentary in the response. |
| voiceName | string | No | Gemini TTS voice name (e.g. Kore, Zephyr, Puck). Default Kore. |
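When you call `analyseFrame()` yourself, you manage the `history` array. The sketch below keeps a sliding window whose size matches `historyWindow` (default 5). It assumes oldest-first order, as in the `history: ['Player opened the menu.']` example. `pushHistory` is a hypothetical helper.

```typescript
// Hypothetical helper: append a commentary and keep only the most recent `windowSize`.
function pushHistory(history: readonly string[], text: string, windowSize = 5): string[] {
  const next = [...history, text];
  return next.slice(Math.max(0, next.length - windowSize));
}
```

In a custom loop, run `const result = await ultron.analyseFrame({ image, history, sessionId })` on each frame. Then update the window with `history = pushHistory(history, result.response)`.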

`AnalyseFrameResult`

| Field | Type | Description |
|---|---|---|
| success | boolean | Whether the analysis succeeded. |
| response | string | The AI-generated commentary (≤20 words). |
| latency | number | Processing time in milliseconds. |
| receivedAt | string | ISO 8601 server timestamp. |
| model | string | Model that was used. |
| provider | string | Provider ('openai' or 'gemini'). |
| usage | { inputTokens: number; outputTokens: number } | Token usage for this frame. |
| creditsUsed | number | Credits consumed by this call. |
| creditsRemaining | number | Remaining credit balance. |
| audio | string | (optional) Base64-encoded WAV audio. Present only when enableAudio: true. |
| audioMimeType | string | (optional) "audio/wav". Present only when enableAudio: true. |

Session Management

Sessions group frames and chat messages together, giving the AI context about an ongoing viewing session. The backend automatically builds rich context from sessions including:

Frame commentaries — timestamped AI descriptions of each captured frame
Summaries — AI-generated summaries produced every 10 frames using Gemini 2.5 Pro
Curated high-context frames — deduplicated, tagged frames selected by AI for efficient retrieval

`ultron.startSession()`

const { success, sessionId } = await ultron.startSession();
console.log(sessionId); // 'sess_abc123def456'

Returns: Promise<StartSessionResult>

interface StartSessionResult {
  success: boolean;
  sessionId: string;
}

Sessions are created automatically when you call startScreenShare() or startVideoStream(). Use startSession() only when building a custom pipeline with analyseFrame() and chat().

WebRTC (Low-Latency Mode)

The SDK supports a WebRTC transport for lower latency, persistent connections, and binary frame transfer. All real-time data flows through WebRTC data channels instead of individual HTTP requests.

When to Use WebRTC vs REST

| | REST (default) | WebRTC |
|---|---|---|
| Latency | Higher (new TCP per request) | Lower (persistent P2P) |
| Frame format | Base64 in JSON (33% bloat) | Raw binary ArrayBuffer |
| Connection | Stateless | Persistent |
| Setup complexity | None | Requires signaling handshake |
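The 33% figure follows from how base64 works: every 3 bytes become 4 ASCII characters. The arithmetic below makes that concrete. The 300,000-byte frame is only an example size, and the extra JSON and data-URL prefix bytes are ignored.

```typescript
// Length of the padded base64 encoding of `bytes` raw bytes.
function base64Length(bytes: number): number {
  return 4 * Math.ceil(bytes / 3);
}

// Relative size increase of base64 over raw binary.
function base64Overhead(bytes: number): number {
  return base64Length(bytes) / bytes - 1;
}
```

A 300,000-byte JPEG therefore travels as 400,000 base64 characters over REST, before any JSON wrapping. The binary WebRTC path sends the original 300,000 bytes.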

Auth endpoints (/api/v1/auth/*) and TTS (/api/tts/gemini) remain REST-only regardless of transport.

Quick Start

import { UltronLive } from 'ultron-live-sdk';

const ultron = new UltronLive({ apiKey: 'ulk_your_api_key' });

// Create a WebRTC client (inherits apiKey/token/baseUrl from the SDK)
const rtc = ultron.createWebRTC();

// Wire up callbacks
rtc.onConnected = () => {
  console.log('WebRTC ready');
  rtc.startSession();
};
rtc.onFrameResult = (result) => {
  console.log(result.response);       // AI commentary
  console.log(result.creditsRemaining);
};
rtc.onChatResult = (result) => {
  console.log(result.response);
};
rtc.onSessionStarted = (sessionId) => {
  console.log('Session:', sessionId);
};
rtc.onDisconnected = (reason) => {
  console.warn('Disconnected:', reason);
};

// Connect (resolves when all 3 data channels are open)
await rtc.connect();

// Send frames as binary (preferred)
canvas.toBlob((blob) => { if (blob) rtc.sendFrame(blob); }, 'image/jpeg', 0.4);

// Or send as base64 if you already have it
rtc.sendFrameBase64(base64String, Date.now(), 'gemini-2.5-flash');

// Chat
rtc.sendChat('What was I working on?', { enableAudio: true });

// Change model
rtc.setModel('gemini-2.5-flash');

// Disconnect
rtc.disconnect();

`ultron.createWebRTC(overrides?)`

Factory method that creates an UltronWebRTCClient pre-configured with the SDK's baseUrl, apiKey, and token.

// Uses SDK config
const rtc = ultron.createWebRTC();

// Or override specific options (a second client needs its own name)
const rtcWithTimeout = ultron.createWebRTC({
  connectionTimeout: 20000,
});

Standalone Usage

You can also use UltronWebRTCClient directly without the main UltronLive class:

import { UltronWebRTCClient } from 'ultron-live-sdk';

const rtc = new UltronWebRTCClient({
  backendUrl: 'https://live-api.ultronai.me',
  apiKey: 'ulk_your_api_key',
});

await rtc.connect();

`WebRTCConfig`

| Option | Type | Default | Description |
|---|---|---|---|
| backendUrl | string | https://live-api.ultronai.me | Backend URL. |
| apiKey | string | — | API key (ulk_...). |
| token | string | — | JWT token (alternative to apiKey). |
| connectionTimeout | number | 15000 | Connection timeout in ms. |

Callbacks

Set these on the client instance before calling connect().

| Callback | Signature | Description |
|---|---|---|
| onFrameResult | (result: WebRTCFrameResult) => void | AI commentary for a frame. |
| onFrameError | (error: WebRTCFrameError) => void | Frame processing error. |
| onChatResult | (result: WebRTCChatResult) => void | Chat response. |
| onChatError | (error: WebRTCChatError) => void | Chat error. |
| onSessionStarted | (sessionId: string, info?: SessionStartedInfo) => void | Session created. info carries the effective golden/assessment config (v2). |
| onSessionStopped | () => void | Session ended. |
| onConnected | () => void | All data channels open. |
| onDisconnected | (reason: string) => void | Connection lost. |
| onError | (error: string) => void | General error from server. |
| onConnectionStateChange | (state: RTCPeerConnectionState) => void | Peer connection state changed. |
| onFlowResult (v2) | (result: WebRTCFlowResult) => void | Tier-2 synthesis narrative tying recent frames together (golden pipeline). |
| onSessionStartError (v2) | (error: string) => void | Server rejected the requested session config. |
| onAssessmentProgress (v2) | (p: WebRTCAssessmentProgress) => void | Frame-collection progress during an assessment session. |
| onAssessmentResult (v2) | (r: WebRTCAssessmentResult) => void | Final assessment verdict. |
| onAssessmentError (v2) | (e: WebRTCAssessmentError) => void | Assessment failed. |

Methods

| Method | Signature | Description |
|---|---|---|
| connect() | () => Promise<void> | Connect and wait for all 3 data channels to open. |
| sendFrame(data) | (data: Blob \| ArrayBuffer) => void | Send a binary frame (zero encoding overhead). |
| sendFrameBase64(base64, meta?, model?) | (base64: string, meta?: number \| { seq?, captureTs?, model? }, model?: string) => void | Send a frame as base64 JSON. Pass { seq, captureTs } so the v2 golden pipeline can order frames. |
| sendChat(message, options?) | (message: string, options?: WebRTCChatOptions) => void | Send a chat message. |
| startSession(config?) | (config?: SessionConfig) => void | Start a session. Pass a SessionConfig (v2) for golden tuning or an assessment session. |
| stopSession() | () => void | Stop the current session (emits the final assessment verdict in assessment mode). |
| finalize() (v2) | () => void | Assess mid-stream without ending the session; keep streaming afterward. |
| setModel(model) | (model: string) => void | Change the AI model. |
| ping() | () => void | Ping for latency measurement. |
| disconnect() | () => void | Close all connections and clean up. |

Properties

| Property | Type | Description |
|---|---|---|
| isReady | boolean | Whether all 3 data channels are open. |
| connected | boolean | Whether the connection is established. |
| sessionId | string \| null | Current session ID. |

`WebRTCChatOptions`

| Field | Type | Required | Description |
|---|---|---|---|
| sessionId | string | No | Override session (defaults to current). |
| frames | string[] | No | Base64 frame data-URLs for visual context. |
| enableAudio | boolean | No | Include TTS audio in response. |
| voiceName | string | No | TTS voice name. |

Response Types

`WebRTCFrameResult`

| Field | Type | Description |
|---|---|---|
| type | 'frame:result' | Message type. |
| response | string | AI commentary. |
| latency | number | Server processing time (ms). |
| model | string | Model used. |
| usage | { inputTokens, outputTokens } | Token usage. |
| creditsUsed | number | Credits deducted. |
| creditsRemaining | number | Remaining balance. |
| sessionId | string \| null | Current session ID. |

`WebRTCChatResult`

| Field | Type | Description |
|---|---|---|
| type | 'chat:result' | Message type. |
| response | string | AI answer. |
| creditsRemaining | number | Remaining balance. |
| latency | number | Processing time (ms). |
| usage | { inputTokens, outputTokens } | Token usage. |
| usedFrameUrls | string[] | (optional) Referenced image URLs. |
| audio | string | (optional) Base64 MP3 audio. |
| audioMimeType | string | (optional) "audio/mpeg". |

Data Channels

The WebRTC connection uses three data channels:

| Channel | Mode | Purpose |
|---|---|---|
| frames | Unreliable, unordered | Frame data (binary or JSON). Dropped frames are expected. |
| chat | Reliable, ordered | Chat messages and responses. |
| control | Reliable, ordered | Session start/stop, model config, ping/pong. |
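In standard WebRTC terms, "unreliable, unordered" corresponds to a data channel created with `ordered: false` and no retransmissions. Reliable, ordered channels use the defaults. The sketch shows the `RTCDataChannelInit` options that would typically be passed to `RTCPeerConnection.createDataChannel()` for each channel. This mapping is an assumption, not the SDK's code, and the SDK creates these channels for you. The interface is declared locally so the snippet compiles without DOM typings.

```typescript
// Local mirror of the standard RTCDataChannelInit fields used here.
interface DataChannelInit {
  ordered?: boolean;
  maxRetransmits?: number;
}

// Assumed options matching the modes in the table above.
const CHANNELS: Record<'frames' | 'chat' | 'control', DataChannelInit> = {
  frames: { ordered: false, maxRetransmits: 0 }, // unreliable, unordered: stale frames are dropped
  chat: { ordered: true },                       // reliable, ordered (WebRTC defaults)
  control: { ordered: true },                    // reliable, ordered
};
```

In a browser, the peer that creates the channels would call, for example, `pc.createDataChannel('frames', CHANNELS.frames)`. Dropping a late frame is preferable to retransmitting it, because a newer frame will follow within one capture interval.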

Frame Loop Example

const rtc = ultron.createWebRTC();
rtc.onFrameResult = (r) => updateUI(r.response, r.creditsRemaining);
rtc.onConnected = () => rtc.startSession();
await rtc.connect();

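// Capture and send frames every 1.5s, scaled to fit 1280×720 without distortion
const canvas = document.createElement('canvas'); // reused across ticks
const interval = setInterval(() => {
  if (!rtc.isReady || !video.videoWidth) return; // wait for channels and video metadata
  const scale = Math.min(1, 1280 / video.videoWidth, 720 / video.videoHeight);
  canvas.width = Math.round(video.videoWidth * scale);
  canvas.height = Math.round(video.videoHeight * scale);
  canvas.getContext('2d')!.drawImage(video, 0, 0, canvas.width, canvas.height);
  canvas.toBlob((blob) => { if (blob) rtc.sendFrame(blob); }, 'image/jpeg', 0.4);
}, 1500);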

// Clean up
window.addEventListener('beforeunload', () => {
  clearInterval(interval);
  rtc.disconnect();
});

Reconnection

async function connectWithRetry(rtc, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      await rtc.connect();
      return;
    } catch (err) {
      console.warn(`Attempt ${i + 1} failed:`, err.message);
      if (i < maxRetries - 1) await new Promise(r => setTimeout(r, 2000 * (i + 1)));
    }
  }
  throw new Error('Failed to connect after retries');
}

Migration from REST

| REST | WebRTC |
|---|---|
| await ultron.analyseFrame({ image }) | rtc.sendFrame(blob) + rtc.onFrameResult |
| await ultron.chat({ message }) | rtc.sendChat(message) + rtc.onChatResult |
| await ultron.startSession() | rtc.startSession() + rtc.onSessionStarted |
| ultron.setModel('...') | rtc.setModel('...') |

The key difference: REST methods are request/response (await), while WebRTC methods are fire-and-forget with callbacks.
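If you prefer `await`-style code with WebRTC, you can wrap a one-shot callback in a promise. This is a generic sketch, not an SDK feature, and `nextEvent` is a hypothetical helper. The documented result types carry no request ID, so the sketch assumes only one request of a given kind is outstanding at a time.

```typescript
// Wait for the next invocation of a callback-style event, with a timeout.
// `attach` installs the handler and returns a function that removes it.
function nextEvent<T>(
  attach: (handler: (value: T) => void) => () => void,
  timeoutMs = 15_000,
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    let detach: () => void = () => {};
    const timer = setTimeout(() => {
      detach();
      reject(new Error(`No event within ${timeoutMs} ms`));
    }, timeoutMs);
    detach = attach((value) => {
      clearTimeout(timer);
      detach();
      resolve(value);
    });
  });
}
```

Attach the handler before sending so the reply cannot be missed. For chat:

1. Create the pending promise: `const pending = nextEvent((h) => { const prev = rtc.onChatResult; rtc.onChatResult = h; return () => { rtc.onChatResult = prev; }; });`
2. Send the message: `rtc.sendChat('What was I working on?');`
3. Wait for the reply: `const result = await pending;`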

v2: Golden Frame pipeline & live assessment

With apiVersion: 'v2', the WebRTC session runs the Golden Frame pipeline and can act as a live assessment session. Capture faster (~5–10 fps) and pass seq + captureTs so the pipeline can order frames.

const ultron = new UltronLive({ apiKey: 'ulk_...', apiVersion: 'v2' });
const rtc = ultron.createWebRTC();

// Live commentary with golden tuning + a Tier-2 narrative every N frames
rtc.onFrameResult = (r) => console.log('frame:', r.response);
rtc.onFlowResult  = (f) => console.log('summary:', f.response);

rtc.onConnected = () => {
  rtc.startSession({
    golden: { stages: { stage2: { scorers: { motion: true }, synthesis: true } } },
  });
};
await rtc.connect();

let seq = 0;
setInterval(() => {
  if (!rtc.isReady) return;
  // ...capture base64 jpeg...
  rtc.sendFrameBase64(base64, { seq: seq++, captureTs: Date.now() });
}, 150);

Assessment session (structured verdict instead of per-frame commentary):

rtc.onAssessmentProgress = (p) => console.log(`${p.framesCaptured} frames`);
rtc.onAssessmentResult   = (r) => console.log('verdict:', r.assessment); // JSON string or text

rtc.onConnected = () => {
  rtc.startSession({
    outputType: 'json',
    prompt: 'Verify the sample-collection procedure.',
    responseSchema: { type: 'object', properties: { verdict: { type: 'string' } }, required: ['verdict'] },
  });
};
await rtc.connect();

// ...stream frames... then:
rtc.stopSession();   // emits the final assessment:result
// or rtc.finalize() to get a verdict mid-stream and keep going
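`r.assessment` arrives as a string that is either JSON (when you request `outputType: 'json'`) or plain text, so parse it defensively. `parseAssessment` is a hypothetical helper. One caveat: a plain-text answer that happens to be valid JSON (such as `true` or a bare number) will be classified as JSON.

```typescript
// r.assessment may be a JSON string (outputType: 'json') or free text.
type Verdict =
  | { kind: 'json'; value: unknown }
  | { kind: 'text'; value: string };

function parseAssessment(assessment: string): Verdict {
  try {
    return { kind: 'json', value: JSON.parse(assessment) };
  } catch {
    return { kind: 'text', value: assessment }; // not valid JSON: keep the raw text
  }
}
```

Validate a `json` result against the same `responseSchema` you sent before trusting its fields.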

Read info.ingestFps from onSessionStarted(sessionId, info) and set your capture interval to 1000 / info.ingestFps ms.
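As a worked example, an `ingestFps` of 5 gives a 200 ms interval and 10 gives 100 ms. The sketch adds a fallback for when `info` or `ingestFps` is absent. The fallback value of 150 ms (taken from the golden example above) and the assumption that `ingestFps` is a plain number are both illustrative. `captureIntervalMs` is a hypothetical helper.

```typescript
// Hypothetical helper: derive a capture interval (ms) from the server's ingest rate.
// Falls back to `fallbackMs` when ingestFps is missing or not a positive number.
function captureIntervalMs(ingestFps: number | undefined, fallbackMs = 150): number {
  if (ingestFps === undefined || !Number.isFinite(ingestFps) || ingestFps <= 0) return fallbackMs;
  return Math.round(1000 / ingestFps);
}
```

Inside `rtc.onSessionStarted = (sessionId, info) => { ... }`, pass `captureIntervalMs(info?.ingestFps)` to the `setInterval` that captures frames.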

Running non-golden (legacy one-call-per-frame). Golden is on by default in v2 — omitting golden does not turn it off. Two ways to opt out:

// Per session: disable golden but stay on v2 (keeps tools/assessment available)
rtc.startSession({ golden: { enabled: false } });

// Or run the entire stable pipeline (no golden at all)
const ultron = new UltronLive({ apiKey: 'ulk_...', apiVersion: 'v1' });

Realtime Voice AI

The SDK supports a Realtime Voice AI mode that enables live, bidirectional voice conversations with the AI. The AI can see your screen in real-time (via the frame context from an active WebRTC or REST streaming session) and respond with natural speech.

This feature connects to either OpenAI Realtime (GPT-4o Realtime) or Gemini Live (Gemini 3.1 Flash Live) via a WebSocket relay at /ws/realtime (v1) or /ws/v2/realtime (v2). The relay keeps API keys server-side and translates between providers so the client uses a single protocol.

ultron.createRealtimeVoice() automatically uses the path for your configured apiVersion — the raw wss://.../ws/realtime URLs shown below are for reference if you connect manually.

When to Use Realtime Voice vs Chat

| | Chat (REST/WebRTC) | Realtime Voice |
|---|---|---|
| Interaction | Text-based, request/response | Voice-based, continuous |
| Latency | Per-request | Persistent low-latency stream |
| Screen context | Manual (pass frames) | Automatic (reads live frame store) |
| Audio | Optional TTS on response | Native bidirectional audio |
| Provider | Gemini 2.5 Flash / GPT-4o | Gemini 3.1 Flash Live / GPT-4o Realtime |

Quick Start

const ws = new WebSocket('wss://live-api.ultronai.me/ws/realtime');

// 1. Authenticate and choose provider
ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'auth',
    token: 'your_jwt_token',       // or apiKey: 'ulk_...'
    provider: 'gemini',             // 'openai' or 'gemini'
    sessionId: 'optional_session_id',
  }));
};

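// 2. Handle messages (playAudioChunk, showTranscript, showUserTranscript and
//    stopAudioPlayback are placeholders for your own app code)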
ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);

  switch (msg.type) {
    case 'auth:ok':
      console.log('Authenticated:', msg.email);
      break;
    case 'relay:ready':
      console.log('Voice AI ready, provider:', msg.provider);
      break;
    case 'response.audio.delta':
      playAudioChunk(msg.delta);
      break;
    case 'response.audio_transcript.delta':
      showTranscript(msg.delta);
      break;
    case 'conversation.item.input_audio_transcription.completed':
      showUserTranscript(msg.transcript);
      break;
    case 'response.done':
      break;
    case 'input_audio_buffer.speech_started':
      stopAudioPlayback();
      break;
    case 'relay:error':
      console.error('Relay error:', msg.error);
      break;
    case 'relay:upstream_closed':
      console.warn('Upstream closed:', msg.code);
      break;
  }
};

// 3. Send audio (OpenAI Realtime protocol — works for both providers)
function sendAudio(base64PcmChunk) {
  ws.send(JSON.stringify({
    type: 'input_audio_buffer.append',
    audio: base64PcmChunk,  // PCM16, 16kHz, mono
  }));
}

// 4. Send an image for Gemini to see (Gemini provider only)
function sendImage(base64Image) {
  ws.send(JSON.stringify({
    type: 'input_image',
    image: base64Image,
    mimeType: 'image/jpeg',
  }));
}

// 5. Reset the session with fresh context (call only after the socket is open)
function resetSession(newSessionId) {
  ws.send(JSON.stringify({ type: 'reset', sessionId: newSessionId }));
}

Protocol

The client uses the OpenAI Realtime protocol regardless of which provider is selected. The relay translates messages for Gemini automatically.

Client → Server:

| Message Type | Description |
|---|---|
| auth | Authenticate with token or apiKey, choose provider (openai or gemini). |
| input_audio_buffer.append | Send a chunk of PCM16 audio (base64). |
| input_image | Send an image for Gemini to see (Gemini provider only). |
| reset | Close and reopen the upstream connection with fresh context. |
| Any other | Forwarded verbatim to the upstream provider. |

Server → Client:

| Message Type | Description |
|---|---|
| auth:ok | Authentication successful. |
| relay:ready | Upstream connection open, ready for audio. |
| relay:error | Error from the relay. |
| relay:upstream_closed | Upstream provider disconnected. |
| session.created | Provider session initialized. |
| response.audio.delta | Audio chunk from AI (base64). |
| response.audio_transcript.delta | Text transcript of AI speech. |
| response.audio.done | AI finished sending audio. |
| response.done | AI turn complete. |
| input_audio_buffer.speech_started | User started speaking (barge-in). |
| conversation.item.input_audio_transcription.completed | Transcript of user speech. |

Providers

| Provider | Model | Audio Format | Features |
|---|---|---|---|
| openai | GPT-4o Realtime Preview | PCM16, 24kHz | Voice activity detection, whisper transcription |
| gemini | Gemini 3.1 Flash Live Preview | PCM16, 16kHz | Activity detection, video input, output transcription |

Screen Context Integration

The Realtime Voice AI automatically has access to your screen context when a WebRTC or REST streaming session is active:

Frame commentaries from the live in-memory store (updated in real-time by the frame handler)
Session summaries from past sessions
Recent chat messages from today

Context is refreshed on every conversational turn. For Gemini, the latest captured frame image is also sent as video input.

Audio Format

Input: PCM16, 16kHz, mono (both providers)
Output (OpenAI): PCM16, 24kHz, mono
Output (Gemini): PCM16, 24kHz, mono (normalized to OpenAI format by relay)
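The relay exchanges raw PCM16 as base64. As a minimal sketch (the helper names are hypothetical, not SDK exports), the functions below pack Float32 microphone samples into that format and unpack `response.audio.delta` payloads for Web Audio playback. They assume samples are already at the right rate (16 kHz for input); resampling is not shown.

```typescript
// Float32 samples in [-1, 1] -> base64 PCM16 little-endian, for the `audio`
// field of an input_audio_buffer.append message.
function floatToPcm16Base64(samples: Float32Array): string {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to the valid range
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // true = little-endian
  }
  let binary = '';
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// base64 PCM16 (e.g. msg.delta) -> Float32 samples, e.g. to copy into an
// AudioBuffer created at 24000 Hz for playback.
function pcm16Base64ToFloat(b64: string): Float32Array {
  const binary = atob(b64);
  const view = new DataView(new ArrayBuffer(binary.length));
  for (let i = 0; i < binary.length; i++) view.setUint8(i, binary.charCodeAt(i));
  const out = new Float32Array(binary.length >> 1); // 2 bytes per sample
  for (let i = 0; i < out.length; i++) {
    const v = view.getInt16(i * 2, true);
    out[i] = v < 0 ? v / 0x8000 : v / 0x7fff;
  }
  return out;
}
```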

Assessment (v2)

Requires apiVersion: 'v2'. Calling these on a v1 instance throws a clear error.

The assessment engine turns media (a video, frames, or documents) into a structured or text verdict — PASS/FAIL checklists, KYC checks, quality inspection, etc. Behind the scenes it's the same vision engine; an "assessment" is just config (outputType, prompt, optional responseSchema/templateId).

Upload & get a verdict

assessAndWait uploads the media and polls until the verdict is ready:

const ultron = new UltronLive({ apiKey: 'ulk_...', apiVersion: 'v2' });

const result = await ultron.assessment.assessAndWait({
  media: [fileInput.files[0]],          // a video, frames, or documents (≤30, ≤18 MB each)
  idImage: idFileInput.files[0],        // optional
  outputType: 'json',
  prompt: 'Is PPE worn correctly throughout?',
  responseSchema: {
    type: 'object',
    properties: { ppe: { type: 'boolean' }, notes: { type: 'string' } },
    required: ['ppe'],
  },
});

console.log(result.status);     // 'done'

// For json output, parse the calibrated verdict:
const v = JSON.parse(result.assessment); // VerdictResult
console.log(v.verdict);       // 'PASS' | 'FAIL' | 'UNCERTAIN'
console.log(v.confidence);    // 0–100 (the lowest per-check confidence)
console.log(v.needs_review);  // true → route to a human
console.log(v.checks);        // [{ name, result, confidence, observed }]
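Since `result.assessment` is a JSON string, a small guard can fail loudly if a guaranteed field is missing instead of silently routing a malformed verdict. This is a sketch: the field names come from this section, while the TypeScript types (and the optional `original_verdict`) are assumptions; the SDK may export its own `VerdictResult` type.

```typescript
type Verdict = 'PASS' | 'FAIL' | 'UNCERTAIN';

// Assumed per-check shape, based on the fields listed above.
interface VerdictCheck {
  name: string;
  result: string;
  confidence: number;
  observed: string;
}

interface VerdictResultShape {
  verdict: Verdict;
  confidence: number; // 0-100
  needs_review: boolean;
  original_verdict?: Verdict;
  checks: VerdictCheck[];
}

// Hypothetical helper: parse result.assessment and validate the guaranteed fields.
function parseVerdict(raw: string): VerdictResultShape {
  const v = JSON.parse(raw) as Partial<VerdictResultShape> | null;
  if (!v || typeof v !== 'object') throw new Error('assessment is not a JSON object');
  const verdicts: Verdict[] = ['PASS', 'FAIL', 'UNCERTAIN'];
  if (!verdicts.includes(v.verdict as Verdict)) throw new Error(`unexpected verdict: ${String(v.verdict)}`);
  if (typeof v.confidence !== 'number' || v.confidence < 0 || v.confidence > 100) {
    throw new Error('confidence must be a number in 0-100');
  }
  if (typeof v.needs_review !== 'boolean') throw new Error('needs_review must be a boolean');
  if (!Array.isArray(v.checks)) throw new Error('checks must be an array');
  return v as VerdictResultShape;
}
```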

Calibrated verdict & human-in-the-loop

For outputType: 'json', the result is a calibrated VerdictResult: the SDK/backend guarantees the fields verdict (PASS/FAIL/UNCERTAIN), confidence (0–100), needs_review, plus per-question checks[]. Confidence is deliberately conservative: an element that can't be directly verified is capped low and marked UNCERTAIN (absence of evidence is never a confident PASS/FAIL).

Whenever confidence is below the template's confidenceThreshold (default 70), the verdict is forced to UNCERTAIN and needs_review is set to true (the model's lean is kept in original_verdict), so a human is pulled in automatically:

const v = JSON.parse(result.assessment);
if (v.needs_review) routeToHumanReviewer(v);   // confidence below threshold or genuinely uncertain
else applyAutomatically(v.verdict);

Set the cutoff per template via confidenceThreshold, or globally with the ASSESSMENT_CONFIDENCE_THRESHOLD env var on the backend.
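To make the threshold rule concrete, here is an illustrative client-side sketch of it. The real calibration runs on the backend; the function name is hypothetical, and treating an already-UNCERTAIN verdict as needing review without setting original_verdict is an assumption.

```typescript
type Verdict = 'PASS' | 'FAIL' | 'UNCERTAIN';

interface Calibrated {
  verdict: Verdict;
  confidence: number;
  needs_review: boolean;
  original_verdict?: Verdict;
}

// Sketch of the documented rule: confidence below the threshold forces UNCERTAIN
// and needs_review, keeping the model's lean in original_verdict.
function applyThreshold(verdict: Verdict, confidence: number, threshold = 70): Calibrated {
  if (verdict === 'UNCERTAIN') {
    // Genuinely uncertain: always routed to a human.
    return { verdict, confidence, needs_review: true };
  }
  if (confidence < threshold) {
    return { verdict: 'UNCERTAIN', confidence, needs_review: true, original_verdict: verdict };
  }
  return { verdict, confidence, needs_review: false };
}
```

Note that the comparison is strictly "below", so a confidence exactly equal to the threshold keeps its verdict in this sketch.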

Add or skip questions per run

Adjust a template's question set for a single run without editing the stored template — append extraChecks or drop existing ones with skipChecks (by 1-based index or title):

await ultron.assessment.assessAndWait({
  media: [file],
  templateId: tpl.id,
  extraChecks: [{ title: 'Gloves worn', instruction: 'Were gloves worn the whole time?' }],
  skipChecks: [2, 'Identity Match'],   // drop question #2 and the "Identity Match" check
});

The same extraChecks/skipChecks work on a live session via rtc.startSession({ ... }). To change a test's questions permanently, edit the template's checks with ultron.assessment.templates.update(id, { checks: [...] }).

Off-camera collections. For privately collected types (urine, vaginal, rectal) where only the post-collection sealing/packaging is on camera, set collectionMode: 'off_camera'. This automatically skips every check marked onCameraOnly: true and tells the model not to penalize the unseen collection:

await ultron.assessment.assessAndWait({ media: [file], templateId, collectionMode: 'off_camera' });
// Mark collection-step questions so they drop automatically:
//   checks: [{ title: 'Proper Collection', instruction: '…', onCameraOnly: true }, …]
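To preview which questions a run will actually ask, the rules above (skipChecks by 1-based index or title, off_camera dropping onCameraOnly checks, extraChecks appended) can be sketched locally. This is a hypothetical helper, not an SDK API; exact title matching and applying skips before appending extras are assumptions.

```typescript
interface Check {
  title: string;
  instruction: string;
  onCameraOnly?: boolean;
}

interface RunOverrides {
  extraChecks?: Check[];
  skipChecks?: Array<number | string>;
  collectionMode?: string;
}

// Hypothetical preview of the effective question list for one run.
function effectiveChecks(templateChecks: Check[], run: RunOverrides = {}): Check[] {
  const skip = run.skipChecks ?? [];
  const kept = templateChecks.filter((check, i) => {
    if (skip.includes(i + 1) || skip.includes(check.title)) return false; // 1-based index or title
    if (run.collectionMode === 'off_camera' && check.onCameraOnly) return false;
    return true;
  });
  return [...kept, ...(run.extraChecks ?? [])]; // extras are appended, never skipped
}
```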

Prefer to manage polling yourself? Use the lower-level calls:

const { jobId } = await ultron.assessment.assess({ media: [file], outputType: 'text' });
const status = await ultron.assessment.get(jobId); // poll → { status: 'processing' | 'done' | ... }
const history = await ultron.assessment.list();     // the caller's past assessments
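If you poll yourself, a bounded loop avoids hanging forever on a stuck job. A minimal sketch, assuming `get(jobId)` resolves to an object with a `status` field as shown above; the helper name, the defaults, and treating every status other than 'processing' as final are assumptions (the docs only list 'processing' and 'done').

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper (not an SDK export): poll until the job leaves 'processing'
// or the timeout expires.
async function pollUntilDone<T extends { status: string }>(
  getStatus: () => Promise<T>,
  { intervalMs = 2000, timeoutMs = 5 * 60_000 } = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const current = await getStatus();
    if (current.status !== 'processing') return current; // 'done' or another final state
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`assessment still processing after ${timeoutMs} ms`);
    }
    await sleep(intervalMs);
  }
}

// Usage with the calls above:
// const { jobId } = await ultron.assessment.assess({ media: [file], outputType: 'text' });
// const final = await pollUntilDone(() => ultron.assessment.get(jobId));
```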

File inputs: in the browser pass File/Blob objects directly. In Node, pass { data: buffer, filename: 'clip.mp4', contentType: 'video/mp4' }.
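In Node, the `{ data, filename, contentType }` object can be built from a path. This is a sketch: the helper and its extension-to-MIME map are hypothetical (extend the map for your formats), and the interface only mirrors the documented shape of `AssessFile`.

```typescript
import { readFile } from 'node:fs/promises';
import { basename, extname } from 'node:path';

// Mirrors the documented Node upload shape; the SDK's own AssessFile type may differ.
interface AssessFileInput {
  data: Buffer;
  filename: string;
  contentType: string;
}

// Hypothetical extension -> MIME map.
const MIME_BY_EXT: Record<string, string> = {
  '.mp4': 'video/mp4',
  '.webm': 'video/webm',
  '.mov': 'video/quicktime',
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.png': 'image/png',
  '.pdf': 'application/pdf',
};

async function fileToAssessInput(path: string): Promise<AssessFileInput> {
  const contentType = MIME_BY_EXT[extname(path).toLowerCase()];
  if (!contentType) throw new Error(`unknown file type: ${path}`);
  return { data: await readFile(path), filename: basename(path), contentType };
}

// Usage sketch:
// const result = await ultron.assessment.assessAndWait({
//   media: [await fileToAssessInput('./clip.mp4')],
//   outputType: 'text',
// });
```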

Single frame

const r = await ultron.assessment.assessFrame(base64DataUrl, {
  outputType: 'json',
  prompt: 'Is the gauge in the green zone?',
  responseSchema: { type: 'object', properties: { ok: { type: 'boolean' } } },
});

Templates (saved presets / collection types)

Any authenticated user can create their own templates. The list returns the shared library plus the caller's own templates; each entry is tagged with mine, shared, and canEdit.

const { data: all }    = await ultron.assessment.templates.list();
const { data: mine }   = await ultron.assessment.templates.list({ scope: 'mine' });
const { data: shared } = await ultron.assessment.templates.list({ scope: 'shared' });
const { data: tpl }    = await ultron.assessment.templates.get(id);

// Create a private template owned by the caller
const { data: created } = await ultron.assessment.templates.create({
  name: 'My Inspection Test',
  outputType: 'json',
  promptTemplate: 'Inspect the unit. {{checks}}',
  checks: [{ title: 'Power LED on', instruction: 'Is the green LED visible?' }],
});

// Reference a template from any assessment with templateId + fields:
await ultron.assessment.assessAndWait({
  media: [file],
  templateId: tpl.id,
  fields: { collectionType: 'Oral swab' },
});
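Before running with a templateId, it can help to confirm that every {{placeholder}} has a value in `fields`. A hypothetical pre-flight check, not an SDK API: it assumes `{{checks}}` is filled by the backend from the template's own checks and therefore ignores it, and it tolerates whitespace inside the braces.

```typescript
// Hypothetical helper: placeholders used by promptTemplate but missing from `fields`.
function missingTemplateFields(
  promptTemplate: string,
  fields: Record<string, unknown> = {},
  filledByBackend: string[] = ['checks'],
): string[] {
  const names = new Set<string>();
  for (const match of promptTemplate.matchAll(/\{\{\s*([\w.]+)\s*\}\}/g)) {
    names.add(match[1]);
  }
  return Array.from(names).filter((name) => !filledByBackend.includes(name) && !(name in fields));
}

// Usage sketch (assumes the fetched template exposes promptTemplate):
// const missing = missingTemplateFields(tpl.promptTemplate, { collectionType: 'Oral swab' });
// if (missing.length) throw new Error(`missing fields: ${missing.join(', ')}`);
```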

Admins can publish to the shared library by passing shared: true on create/update.

Live (real-time) assessment over a screen/camera share uses the WebRTC client — see v2: Golden Frame pipeline & live assessment.

`AssessOptions`

| Field | Type | Required | Description |
|---|---|---|---|
| media | Array<File \| Blob \| AssessFile> | Yes | 1 video, many frames, or documents (≤30, ≤18 MB each). |
| idImage | File \| Blob \| AssessFile | No | Optional ID image. |
| outputType | 'text' \| 'json' | No | text (default) or structured json. |
| prompt | string | No | Use-case instructions. |
| promptMode | 'append' \| 'replace' | No | append (default) keeps the base prompt; replace swaps it. |
| responseSchema | string \| object | When json | JSON schema for structured output (verdict/confidence/needs_review fields are added automatically). |
| templateId | string | No | A saved preset id. |
| fields | object | No | Values for a template's {{placeholders}}. |
| extraChecks | AssessmentCheck[] | No | Extra questions appended for this run: { title, instruction }. |
| skipChecks | Array<number \| string> | No | Existing questions to skip this run, by 1-based index or title. |
| model | ModelValue | No | Model override. |

MCP Servers (v2)

Requires apiVersion: 'v2'. HTTP transport only.

Manage the authenticated user's own MCP ("web-MCP") servers — external tool endpoints the AI can call during live sessions. Secret header values are encrypted at rest and never returned (responses contain only a masked hint).

const ultron = new UltronLive({ apiKey: 'ulk_...', apiVersion: 'v2' });

// Add a server
const { data: server } = await ultron.mcp.servers.create({
  name: 'docs',
  url: 'https://example.com/mcp',
  headers: { 'X-Workspace': 'acme' },
  secrets: [{ key: 'Authorization', value: 'Bearer abc123' }], // encrypted at rest
});

// Validate it (connect + list tools)
await ultron.mcp.servers.test(server.id);

// List, update, remove
const { data: servers } = await ultron.mcp.servers.list();
await ultron.mcp.servers.update(server.id, { enabled: false });
await ultron.mcp.servers.remove(server.id);

// Effective tools available to the user (system + their MCP servers)
const { data: tools } = await ultron.mcp.tools();

| Method | Description |
|---|---|
| mcp.servers.list() | List the user's servers (secrets masked). |
| mcp.servers.get(id) | Fetch one server. |
| mcp.servers.create(input) | Add a server ({ name, url, headers?, secrets?, enabled? }). |
| mcp.servers.update(id, input) | Update (re-encrypts provided secrets). |
| mcp.servers.remove(id) | Delete a server. |
| mcp.servers.test(id) | Connect + list tools to validate config/credentials. |
| mcp.tools() | Effective tools available to the user. |
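Because only `secrets` are encrypted at rest, credential-bearing headers belong there rather than in plain `headers`. A hypothetical helper (not an SDK API) that splits a flat header map accordingly; which header names count as sensitive is an assumption, and the list must be lowercase.

```typescript
interface McpSecret {
  key: string;
  value: string;
}

// Assumed list of credential headers (lowercase).
const DEFAULT_SENSITIVE = ['authorization', 'proxy-authorization', 'x-api-key', 'cookie'];

function splitMcpHeaders(
  all: Record<string, string>,
  sensitive: string[] = DEFAULT_SENSITIVE,
): { headers: Record<string, string>; secrets: McpSecret[] } {
  const headers: Record<string, string> = {};
  const secrets: McpSecret[] = [];
  for (const [key, value] of Object.entries(all)) {
    if (sensitive.includes(key.toLowerCase())) secrets.push({ key, value }); // encrypted at rest
    else headers[key] = value; // stored as a plain header
  }
  return { headers, secrets };
}

// Usage sketch:
// const { headers, secrets } = splitMcpHeaders({ 'X-Workspace': 'acme', Authorization: 'Bearer abc123' });
// await ultron.mcp.servers.create({ name: 'docs', url: 'https://example.com/mcp', headers, secrets });
```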

Stream Preview

Show a live, zero-lag preview of the captured screen share or video stream. The preview uses the raw MediaStream directly (not re-encoded canvas frames), so there is no perceptible delay.

`ultron.attachPreview(videoElement)`

The easiest way to show a preview. Pass any <video> element and the SDK wires it up automatically.

await ultron.startScreenShare({
  onTranscript: (text) => console.log(text),
});

// Attach the live stream to a <video> element
const previewEl = document.getElementById('preview') as HTMLVideoElement;
ultron.attachPreview(previewEl);

The SDK sets autoplay, playsInline, and muted on the element. When ultron.stop() is called, all attached preview elements are automatically detached.

`ultron.detachPreview(videoElement)`

Manually detach a preview element before stopping the stream.

ultron.detachPreview(previewEl);

`ultron.getPreviewStream()`

Returns the active MediaStream or null if not streaming. Use this when you need full control over how the preview is rendered.

const stream = ultron.getPreviewStream();
if (stream) {
  const videoEl = document.createElement('video');
  videoEl.srcObject = stream;
  videoEl.autoplay = true;
  videoEl.muted = true;
  videoEl.playsInline = true; // matches attachPreview; needed for inline playback on iOS Safari
  document.body.appendChild(videoEl);
}

Preview Methods Reference

| Method | Signature | Returns | Description |
|---|---|---|---|
| attachPreview | (el: HTMLVideoElement) => void | void | Attach the live stream to a <video> element. Throws if no active stream. |
| detachPreview | (el: HTMLVideoElement) => void | void | Detach a previously attached preview element. |
| getPreviewStream | () => MediaStream \| null | MediaStream \| null | Get the raw active MediaStream, or null. |

Notes

Preview elements are automatically cleaned up when ultron.stop() is called.
You can attach multiple <video> elements simultaneously.
The preview shows the original stream, not the downscaled JPEG frames sent to the AI — so it's full resolution with zero lag.

Chat

Chat with the AI about the current or past screen sessions. The AI has access to all frames, commentaries, and summaries from up to 10 recent sessions.

When the user asks about a specific visual moment, the AI automatically uses a tool-based frame retrieval system to search through curated session frames by keyword, tags, or time range — and includes the actual screenshots in its reasoning.

`ultron.chat(options)`

const reply = await ultron.chat({
  sessionId: ultron.sessionId,  // optional — attach to current session
  message: 'What was I working on 5 minutes ago?',
  frames: [currentFrameDataUrl], // optional — include current screen
  enableAudio: true,             // optional — include TTS audio in response
  voiceName: 'Kore',            // optional — Gemini TTS voice name
});

console.log(reply.response);         // "You were editing the login component..."
console.log(reply.creditsRemaining); // 9500
console.log(reply.latency);          // 1200 (ms)
console.log(reply.usedFrameUrls);    // ['https://...frame1.jpg'] — referenced screenshots
console.log(reply.audio);            // base64 WAV audio (if enableAudio was true)

`ChatOptions`

| Field | Type | Required | Description |
|---|---|---|---|
| message | string | Yes | The user's question or message. |
| sessionId | string \| null | No | Session ID for context. Pass ultron.sessionId during streaming. |
| frames | string[] | No | Base64 data-URL screenshots to include as visual context. |
| enableAudio | boolean | No | If true, includes TTS audio of the AI response. |
| voiceName | string | No | Gemini TTS voice name (e.g. Kore, Zephyr). Default Kore. |

`ChatResult`

| Field | Type | Description |
|---|---|---|
| success | boolean | Whether the chat succeeded. |
| response | string | The AI's response. |
| creditsRemaining | number | Updated credit balance. |
| latency | number | Processing time in ms. |
| usage | { inputTokens: number; outputTokens: number } | Token usage. |
| usedFrameUrls | string[] | (optional) URLs of screenshots the AI referenced via tool call. |
| audio | string | (optional) Base64-encoded WAV audio. Present only when enableAudio: true. |
| audioMimeType | string | (optional) "audio/wav". Present only when enableAudio: true. |

Chat Examples

// Ask about current screen
const reply1 = await ultron.chat({
  message: 'What am I looking at right now?',
  frames: [captureCurrentFrame()],
});

// Ask about session history (no frames needed — AI searches automatically)
const reply2 = await ultron.chat({
  sessionId: ultron.sessionId,
  message: 'Give me a summary of this session',
});

// Ask with audio response
const reply3 = await ultron.chat({
  sessionId: ultron.sessionId,
  message: 'What happened in the last 2 minutes?',
  enableAudio: true,
  voiceName: 'Zephyr',
});

// Standalone chat (no session, no frames)
const reply4 = await ultron.chat({
  message: 'What AI models do you support?',
});
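When enableAudio is true, `reply.audio` is base64 WAV. A minimal sketch for turning it into a playable Blob; the helper is hypothetical, it uses the documented `audioMimeType` with an `audio/wav` fallback, and stripping an optional `data:` prefix is a defensive assumption.

```typescript
// Hypothetical helper (not an SDK API): ChatResult audio -> Blob, or null if absent.
function chatAudioToBlob(reply: { audio?: string; audioMimeType?: string }): Blob | null {
  if (!reply.audio) return null; // enableAudio was false, or no audio returned
  const b64 = reply.audio.replace(/^data:[^,]*,/, ''); // tolerate a data-URL prefix
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return new Blob([bytes], { type: reply.audioMimeType ?? 'audio/wav' });
}

// Browser usage with reply3 from above:
// const blob = chatAudioToBlob(reply3);
// if (blob) {
//   const url = URL.createObjectURL(blob);
//   const player = new Audio(url);
//   player.onended = () => URL.revokeObjectURL(url);
//   void player.play();
// }
```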

Text-to-Speech (TTS)

Convert any text to speech audio. Returns a Blob that can be played in the browser.

`ultron.tts(options)`

const blob = await ultron.tts({
  text: 'Hello from Ultron Live!',
  voiceName: 'Zephyr',  // optional — Gemini voice name
});

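// Play the audio, then release the object URL once playback ends
const url = URL.createObjectURL(blob);
const audio = new Audio(url);
audio.onended = () => URL.revokeObjectURL(url);
audio.play();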

`TtsOptions`

| Field | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | The text to convert to speech. |
| voiceName | string | No | Gemini TTS voice name. Defaults to 'Zephyr'. |
| apiKey | string | No | Override API key for this TTS call. |

Returns: Promise<Blob> — an audio/mpeg blob.

TTS with Download

const blob = await ultron.tts({ text: 'Save this narration' });

// Create a download link
const url = URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = 'narration.mp3';
a.click();
setTimeout(() => URL.revokeObjectURL(url), 0); // revoke after the click has started the download

TTS with Custom Audio Element

const blob = await ultron.tts({ text: 'Custom playback' });
const audioEl = document.getElementById('my-audio') as HTMLAudioElement;
audioEl.src = URL.createObjectURL(blob);
audioEl.play();

Audio Player & Voice Narration

The SDK includes a built-in audio player that automatically narrates AI commentary during streaming. It supports two voice providers: Gemini TTS and ElevenLabs.

Enable/Disable Audio During Streaming

// Start streaming first
await ultron.startScreenShare({
  model: 'gemini-2.5-flash',
  onTranscript: (text) => setPreviewText(text),
});

// Enable voice narration
ultron.setAudioEnabled(true);

// Disable voice narration (stops current audio)
ultron.setAudioEnabled(false);

Configure Gemini Voice

ultron.configureAudio({
  voiceProvider: 'gemini',
  geminiVoiceName: 'Zephyr',  // Gemini voice name
});
ultron.setAudioEnabled(true);

Available Gemini TTS Voices

Zephyr, Puck, Charon, Kore, Fenrir, Leda, Orus, Aoede, Callirrhoe, Autonoe, Enceladus, Iapetus, Umbriel, Algieba, Despina, Erinome, Algenib, Rasalgethi, Laomedeia, Achernar, Alnilam, Schedar, Gacrux, Pulcherrima, Achird, Zubenelgenubi, Vindemiatrix, Sadachbia, Sadaltager, Sulafat

Configure ElevenLabs Voice

ultron.configureAudio({
  voiceProvider: 'elevenlabs',
  elevenLabsApiKey: 'your_elevenlabs_api_key',
  elevenLabsVoiceId: '21m00Tcm4TlvDq8ikWAM', // Rachel (default)
});
ultron.setAudioEnabled(true);

`AudioPlayerOptions`

| Field | Type | Default | Description |
|---|---|---|---|
| voiceProvider | 'gemini' \| 'elevenlabs' | 'gemini' | Which TTS provider to use. |
| elevenLabsApiKey | string | '' | Required if using ElevenLabs. |
| elevenLabsVoiceId | string | '21m00Tcm4TlvDq8ikWAM' | ElevenLabs voice ID (Rachel by default). |
| geminiVoiceName | string | 'Zephyr' | Gemini TTS voice name. |
| skipPhrases | string[] | ['no visual change'] | Phrases to skip narrating (case-insensitive substring match). |

Audio Behavior

Audio uses a priority queue — if a new commentary arrives while the previous is still playing, it queues the new one and plays it next
Only one audio plays at a time (no overlap)
Identical consecutive commentaries are deduplicated (not spoken twice)
Phrases matching skipPhrases are silently skipped
Calling ultron.stop() stops all audio and clears the queue

Skip Phrases Example

ultron.configureAudio({
  voiceProvider: 'gemini',
  skipPhrases: [
    'no visual change',
    'nothing new',
    'same screen',
  ],
});

Any commentary containing these phrases (case-insensitive) will not be narrated.
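The queueing rules above can be modeled in a few lines, which helps when reasoning about what will be spoken. This is an illustrative sketch, not the SDK's internal audio player: the class and method names are hypothetical, FIFO order is an assumption, and "consecutive" is interpreted as "same as the last accepted commentary".

```typescript
class NarrationQueue {
  private queue: string[] = [];
  private lastAccepted: string | null = null;
  private readonly skip: string[];

  constructor(skipPhrases: string[] = ['no visual change']) {
    this.skip = skipPhrases.map((p) => p.toLowerCase());
  }

  // Returns false when the commentary is dropped (skip phrase or consecutive duplicate).
  enqueue(text: string): boolean {
    const lower = text.toLowerCase();
    if (this.skip.some((p) => lower.includes(p))) return false; // case-insensitive "contains"
    if (text === this.lastAccepted) return false; // identical consecutive commentary
    this.lastAccepted = text;
    this.queue.push(text);
    return true;
  }

  // Next commentary to speak; actual playback (one clip at a time) is left to the caller.
  next(): string | undefined {
    return this.queue.shift();
  }

  // Mirrors what ultron.stop() does to narration: drop everything still waiting.
  clear(): void {
    this.queue = [];
    this.lastAccepted = null;
  }
}
```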

Model Management

The SDK ships with a built-in model catalogue and helpers for browsing and switching models.

Available Models

| Value | Label | Provider | Badge |
|---|---|---|---|
| gpt-4o-realtime | GPT-4o Realtime | openai | OpenAI |
| gpt-4o | GPT-4o | openai | OpenAI |
| gpt-4o-mini | GPT-4o mini | openai | OpenAI |
| gemini-3.1-pro-preview | Gemini 3.1 Pro | gemini | Google |
| gemini-3-flash-preview | Gemini 3 Flash | gemini | Google |
| gemini-3.1-flash-lite-preview | Gemini 3.1 Flash-Lite | gemini | Google |
| gemini-2.5-pro | Gemini 2.5 Pro | gemini | Google |
| gemini-2.5-flash | Gemini 2.5 Flash | gemini | Google |
| gemini-2.5-flash-lite | Gemini 2.5 Flash-Lite | gemini | Google |
| gemini-2.5-flash-image | Gemini 2.5 Flash-Image | gemini | Google |
| gemini-2.0-flash | Gemini 2.0 Flash | gemini | Google |
| gemini-2.0-flash-lite | Gemini 2.0 Flash-Lite | gemini | Google |

Browse Models

// All models
const allModels = ultron.models; // ModelMeta[]

// Filter by provider
const geminiModels = ultron.getModelsByProvider('gemini');
const openaiModels = ultron.getModelsByProvider('openai');

// Get metadata for a specific model
const meta = ultron.getModelMeta('gemini-2.5-flash');
console.log(meta);
// { value: 'gemini-2.5-flash', label: 'Gemini 2.5 Flash', provider: 'gemini', badge: 'Google' }

Switch Model at Runtime

// Change the default model (takes effect on next frame)
ultron.setModel('gpt-4o');

// Or override per-call
await ultron.analyseFrame({
  image: frameData,
  model: 'gemini-2.5-pro', // just for this frame
});

`ModelMeta` Type

interface ModelMeta {
  value: ModelValue;      // e.g. 'gemini-2.5-flash'
  label: string;          // e.g. 'Gemini 2.5 Flash'
  provider: ModelProvider; // 'openai' | 'gemini'
  badge: string;          // 'OpenAI' | 'Google'
}

Custom / Pass-Through Models

The ModelValue type accepts any string, so you can pass custom model identifiers:

ultron.setModel('my-custom-model-v2');

Build a Model Selector UI

const select = document.getElementById('model-select') as HTMLSelectElement;

ultron.models.forEach((m) => {
  const option = document.createElement('option');
  option.value = m.value;
  option.textContent = `${m.label} (${m.badge})`;
  select.appendChild(option);
});

select.onchange = () => ultron.setModel(select.value);

Error Handling

The SDK provides three typed error classes for structured error handling.

Error Classes

| Class | Status Code | When Thrown |
|---|---|---|
| UltronAuthError | 401 | Invalid or expired API key / JWT. |
| UltronCreditsError | 402 | Insufficient credits to process the request. |
| UltronAPIError | Any | General API error (parent class of the above). |

Error Class Hierarchy

UltronAPIError (extends Error)
├── UltronAuthError (401)
└── UltronCreditsError (402)

All errors have a statusCode property and a message.

Catching Errors

import {
  UltronAPIError,
  UltronAuthError,
  UltronCreditsError,
} from 'ultron-live-sdk';

try {
  const result = await ultron.analyseFrame({ image: dataUrl });
} catch (err) {
  if (err instanceof UltronCreditsError) {
    // 402 — user needs to top up
    showTopUpDialog();
  } else if (err instanceof UltronAuthError) {
    // 401 — token expired, redirect to login
    redirectToLogin();
  } else if (err instanceof UltronAPIError) {
    // Other API error
    console.error(`API Error [${err.statusCode}]: ${err.message}`);
  } else {
    // Network error or unexpected
    console.error('Unexpected error:', err);
  }
}
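Transient failures are often worth retrying, while 401 and 402 are not (retrying cannot fix an expired key or an empty balance). A sketch of a retry wrapper that relies only on the documented `statusCode` property, so it works with UltronAPIError and its subclasses without importing them. The helper names, backoff values, and retrying 429/5xx and status-less (likely network) errors are assumptions.

```typescript
// Hypothetical classification: retry network errors, 429 and 5xx; never 401/402.
function isRetryable(err: unknown): boolean {
  const status = (err as { statusCode?: unknown } | null)?.statusCode;
  if (typeof status !== 'number') return true; // no HTTP status: likely a network failure
  return status === 429 || status >= 500;
}

// Exponential backoff: baseDelayMs, 2x, 4x, ... between attempts.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3, baseDelayMs = 500): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= attempts || !isRetryable(err)) throw err;
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}

// Usage:
// const result = await withRetry(() => ultron.analyseFrame({ image: dataUrl }));
```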

Error Handling in Streaming

During streaming, errors are delivered via the onError and onCreditsExhausted callbacks instead of throwing:

await ultron.startScreenShare({
  onTranscript: (text) => console.log(text),

  onCreditsExhausted: () => {
    // Stream auto-stops when credits run out
    showTopUpDialog();
  },

  onError: (err) => {
    // Non-credit API/network errors
    if (err instanceof UltronAuthError) {
      ultron.stop();
      redirectToLogin();
    } else {
      console.error('Stream error:', err.message);
    }
  },
});

Error Properties

class UltronAPIError extends Error {
  name: string;        // 'UltronAPIError'
  message: string;     // Error description
  statusCode: number;  // HTTP status code
}

Low-Level Modules

The SDK exports its internal building blocks for advanced use cases.

`UltronHttpClient`

Direct HTTP client for all API endpoints. Use when you need full control.

import { UltronHttpClient } from 'ultron-live-sdk';

const client = new UltronHttpClient('https://live-api.ultronai.me', 'ulk_your_key');

// Auth
await client.sendOtp({ email: 'user@example.com' }); // your email address
const auth = await client.verifyOtp({ email: 'user@example.com', otp: '123456' });
client.setApiKey(auth.token);
const me = await client.getMe();

// Session
const session = await client.startSession();

// Vision
const result = await client.analyseFrame({
  image: dataUrl,
  model: 'gemini-2.5-flash',
  history: [],
  sessionId: session.sessionId,
});

// Chat
const chat = await client.chat({
  sessionId: session.sessionId,
  message: 'What happened?',
  frames: [],
});

// TTS (returns Blob)
const audioBlob = await client.ttsGemini({ text: 'Hello' });

API Endpoints Used

Paths below are written as /api/{v}/..., where {v} is v1 by default; with apiVersion: 'v2' the SDK calls the /api/v2/*, /ws/v2, and /ws/v2/realtime equivalents automatically.

| Method | Endpoint | Description |
|---|---|---|
| POST | /api/{v}/auth/signup · /login · /send-otp | Request an OTP |
| POST | /api/{v}/auth/verify-otp | Verify OTP, get JWT |
| GET | /api/{v}/auth/me | Get user profile (protected) |
| POST | /api/{v}/auth/timezone · PUT /auth/profile | Update profile/timezone (protected) |
| GET | /api/{v}/auth/transactions · `/subscriptio
