ultron-live-sdk

v1.2.0

Official JavaScript/TypeScript SDK for Ultron Live — real-time AI vision, screen commentary, session management, and TTS, powered by GPT-4o and Gemini.

Ultron Live SDK — Complete Documentation

Version 1.2.0 | npm | Homepage

📧 For any confusion or queries, contact us at [email protected]

🔑 To get your API key, visit live.ultronai.me, log in with your email ID, and grab your API key from the Playground page.

The official JavaScript/TypeScript SDK for Ultron Live — real-time AI vision analysis, screen commentary, session chat, text-to-speech, realtime voice AI, and WebRTC low-latency streaming. Supports REST, WebRTC, and WebSocket transports.


Table of Contents

  1. Installation
  2. Quick Start
  3. SDK Configuration
  4. Authentication
  5. Screen Share Streaming
  6. Video Stream Analysis
  7. Frame Analysis (Low-Level)
  8. Session Management
  9. Chat
  10. WebRTC (Low-Latency Mode)
  11. Realtime Voice AI
  12. Stream Preview
  13. Text-to-Speech (TTS)
  14. Audio Player & Voice Narration
  15. Model Management
  16. Error Handling
  17. Low-Level Modules
  18. TypeScript Types Reference
  19. Framework Examples
  20. Constants & Defaults
  21. Full API Reference

Installation

npm install ultron-live-sdk

Import

// ESM (recommended)
import { UltronLive } from 'ultron-live-sdk';

// CommonJS
const { UltronLive } = require('ultron-live-sdk');

UMD (Browser)

<script src="https://cdn.skypack.dev/ultron-live-sdk"></script>

Quick Start

import { UltronLive } from 'ultron-live-sdk';

// 1. Initialize with your API key
const ultron = new UltronLive({ apiKey: 'ulk_your_api_key' });

// 2. Start screen share with AI commentary
await ultron.startScreenShare({
  model: 'gemini-2.5-flash',
  onTranscript: (text) => { /* show the text preview, or handle it another way */ },
  onCreditsUpdate: (remaining) => { /* show the remaining credits */ },
  onCreditsExhausted: () => { /* take action on credit exhaustion */ },
  onError: (err) => console.error(err),
  onStop: () => { /* perform some action when streaming stops */ },
});

// 3. Stop when done
ultron.stop();

SDK Configuration

Constructor: new UltronLive(config)

const ultron = new UltronLive({
  apiKey: 'ulk_your_api_key',           // Required — your API key (starts with ulk_)
  baseUrl: 'https://live-api.ultronai.me', // Optional — backend URL
  model: 'gemini-3.1-flash-lite-preview',  // Optional — default AI model
  historyWindow: 5,                      // Optional — context window size
  frameInterval: 1500,                   // Optional — ms between frame captures (choose a value from 300 to 3000)
  frameQuality: 0.4,                     // Optional — JPEG quality (0–1)
  maxFrameWidth: 1280,                   // Optional — max capture width in px
  maxFrameHeight: 720,                   // Optional — max capture height in px
});

UltronSDKConfig — All Options

| Option | Type | Default | Description |
|---|---|---|---|
| apiKey | string | required | Your Ultron API key (starts with ulk_). Obtain from your dashboard or via auth.getMe(). |
| baseUrl | string | https://live-api.ultronai.me | Base URL of the Ultron Live backend. |
| model | ModelValue | gemini-3.1-flash-lite-preview | Default AI model for vision analysis. |
| historyWindow | number | 5 | Number of recent commentary strings kept in context for continuity. |
| frameInterval | number | 1500 | Milliseconds between each frame capture during streaming. |
| frameQuality | number | 0.4 | JPEG compression quality for captured frames (0 = worst, 1 = best). |
| maxFrameWidth | number | 1280 | Maximum width in pixels for captured frames. Frames are scaled down if larger. |
| maxFrameHeight | number | 720 | Maximum height in pixels for captured frames. |

Reading Config at Runtime

const config = ultron.config;
console.log(config.apiKey);       // current API key
console.log(config.model);        // current default model
console.log(config.frameInterval); // 1500

ultron.config returns a Readonly<Required<UltronSDKConfig>> — all fields are guaranteed present.
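
Since every field of ultron.config is guaranteed present, the SDK presumably merges the options you pass over a set of defaults. The sketch below shows one way such a merge could work. SdkConfigSketch, DEFAULTS, and resolveConfig are hypothetical names for illustration, not SDK exports; the default values are copied from the options table, and ModelValue is simplified to string.

```typescript
// Hypothetical mirror of UltronSDKConfig; ModelValue is simplified to string.
interface SdkConfigSketch {
  apiKey: string;
  baseUrl?: string;
  model?: string;
  historyWindow?: number;
  frameInterval?: number;
  frameQuality?: number;
  maxFrameWidth?: number;
  maxFrameHeight?: number;
}

// Defaults copied from the options table in this document.
const DEFAULTS: Omit<Required<SdkConfigSketch>, 'apiKey'> = {
  baseUrl: 'https://live-api.ultronai.me',
  model: 'gemini-3.1-flash-lite-preview',
  historyWindow: 5,
  frameInterval: 1500,
  frameQuality: 0.4,
  maxFrameWidth: 1280,
  maxFrameHeight: 720,
};

// Merge the caller's options over the defaults so every field is present,
// then freeze the result to mimic a read-only config object.
function resolveConfig(config: SdkConfigSketch): Readonly<Required<SdkConfigSketch>> {
  return Object.freeze({ ...DEFAULTS, ...config });
}

const resolved = resolveConfig({ apiKey: 'ulk_your_api_key', frameInterval: 800 });
console.log(resolved.frameInterval); // 800 (overridden)
console.log(resolved.maxFrameWidth); // 1280 (default)
```

Note that an option passed explicitly as undefined would overwrite its default in this sketch; a real implementation would likely filter those out first.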


Authentication

The SDK provides a built-in OTP-based authentication flow via ultron.auth. Two authentication methods are supported:

  1. API Key: x-api-key header or Authorization: Bearer ulk_...
  2. JWT: Authorization: Bearer <jwt> (obtained via OTP verification)
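
If you call the backend directly rather than through the SDK, the two methods translate into different request headers. The helper below is a minimal sketch of that choice. buildAuthHeaders is a hypothetical name, and deciding by the ulk_ prefix is an assumption based on the key format described in this document, not documented SDK behavior.

```typescript
// Hypothetical helper: pick request headers for the two documented auth methods.
// Assumption: API keys always start with "ulk_"; anything else is treated as a JWT.
function buildAuthHeaders(credential: string): Record<string, string> {
  if (credential.startsWith('ulk_')) {
    return { 'x-api-key': credential };
  }
  return { Authorization: `Bearer ${credential}` };
}

console.log(buildAuthHeaders('ulk_your_api_key')); // { 'x-api-key': 'ulk_your_api_key' }
console.log(buildAuthHeaders('your_jwt_token'));   // { Authorization: 'Bearer your_jwt_token' }
```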

Step 1: Send OTP

const result = await ultron.auth.sendOtp({ email: '[email protected]' });
console.log(result); // { success: true, data: null, message: 'OTP sent successfully to your email' }

Parameters:

| Field | Type | Required | Description |
|---|---|---|---|
| email | string | Yes | The email address to send the OTP to. |

Returns: Promise<{ success: boolean; message: string }>

Step 2: Verify OTP

const response = await ultron.auth.verifyOtp({
  email: '[email protected]',
  otp: '123456',
  timezone: 'America/New_York', // optional — IANA timezone string
});

const { token, user } = response.data;

console.log(token);        // JWT token string
console.log(user.email);   // '[email protected]'
console.log(user.credits); // 10000

// Store and apply the token
ultron.setApiKey(token);

Parameters:

| Field | Type | Required | Description |
|---|---|---|---|
| email | string | Yes | The email used in Step 1. |
| otp | string | Yes | The 6-digit OTP received via email. |
| timezone | string | No | IANA timezone string (e.g. America/New_York). Stored for timezone-aware features. |

Returns: Promise<AuthResult>

interface AuthResult {
  success: boolean;
  data: {
    token: string;
    user: UltronUser;
  };
}

interface UltronUser {
  id: string;
  email: string;
  apiKey: string;
  credits: number;
  isVerified: boolean;
  lastLoginAt: string;
  createdAt: string;
  timezone: string | null;
  plan: 'free' | 'pro' | 'enterprise';
  planStartedAt: string | null;
  subscriptionStatus: string | null;
  subscriptionCurrentPeriodEnd: string | null;
  cancelAtPeriodEnd: boolean;
}

Step 3: Get Current User Profile

const me = await ultron.auth.getMe();
console.log(me.data.email);   // '[email protected]'
console.log(me.data.credits); // remaining credits

Returns: Promise<GetMeResult>

interface GetMeResult {
  success: boolean;
  data: UltronUser;
}

Step 4: Update Timezone

await ultron.auth.updateTimezone({ timezone: 'Asia/Kolkata' });

Validates the IANA timezone string and updates the user record. Safe to call on every page load.

| Field | Type | Required | Description |
|---|---|---|---|
| timezone | string | Yes | Valid IANA timezone string (e.g. Europe/London, Asia/Kolkata). |
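
The backend validates the timezone, but you may want to reject bad values before making the request. A minimal client-side check can lean on the standard Intl.DateTimeFormat constructor, which throws a RangeError for time zone identifiers it does not recognize. isValidTimeZone is a hypothetical helper, and the server's validation rules may differ from the runtime's.

```typescript
// Returns true when the runtime's Intl implementation recognizes the time zone.
// Intl.DateTimeFormat throws a RangeError for unknown time zone identifiers.
function isValidTimeZone(timeZone: string): boolean {
  try {
    new Intl.DateTimeFormat('en-US', { timeZone });
    return true;
  } catch {
    return false;
  }
}

console.log(isValidTimeZone('Asia/Kolkata')); // true
console.log(isValidTimeZone('Mars/Olympus')); // false
```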

Updating the API Key

After login or token refresh, update the SDK's key:

ultron.setApiKey('new_jwt_token');

This updates both the internal config and the HTTP client immediately.

Full Auth Flow Example

import { UltronLive } from 'ultron-live-sdk';

const ultron = new UltronLive({ apiKey: '' }); // empty key initially

// 1. Send OTP
await ultron.auth.sendOtp({ email: '[email protected]' });

// 2. User enters OTP from email...
const userVerification = await ultron.auth.verifyOtp({
  email: '[email protected]',
  otp: '123456',
  timezone: Intl.DateTimeFormat().resolvedOptions().timeZone,
});

const { token, user } = userVerification.data;

// 3. Apply the token
ultron.setApiKey(token);

// 4. Now use the SDK normally
await ultron.startScreenShare({
  onTranscript: (text) => console.log(text),
});

Screen Share Streaming

The primary feature of the SDK. Requests browser screen share access and runs a continuous AI commentary loop.

ultron.startScreenShare(options?)

await ultron.startScreenShare({
  // Model override (optional — uses SDK default if omitted)
  model: 'gemini-2.5-flash',

  // Frame capture settings (optional — uses SDK defaults)
  frameInterval: 1500, // ms between captures; choose a value from 300 to 3500
  frameQuality: 0.4,
  maxFrameWidth: 1280,
  maxFrameHeight: 720,
  historyWindow: 5,

  // Provide your own MediaStream instead of requesting getDisplayMedia()
  stream: existingMediaStream, // optional

  // Callbacks
  onTranscript: (text) => {
    // Called with each AI commentary string (≤20 words)
  },
  onCreditsUpdate: (remaining) => {
    // Called after each frame with updated credit balance
  },
  onLatency: (ms) => {
    // Called with processing latency in milliseconds
  },
  onFrame: (dataUrl) => {
    // Called with the raw captured frame as a JPEG data-URL
    // Useful for displaying a preview
    previewImg.src = dataUrl;
  },
  onCreditsExhausted: () => {
    // Called when credits reach zero — stream auto-stops
    // Alert the user that the usage limit has been reached
  },
  onError: (err) => {
    // Called on API/network errors (non-credit errors)
    console.error('Error:', err.message);
  },
  onStop: () => {
    // Called when the stream stops (user action or track ended)
    setStreamEnded(true);
  },
});

StartScreenShareOptions — All Fields

| Field | Type | Default | Description |
|---|---|---|---|
| stream | MediaStream | — | Supply an existing stream instead of requesting getDisplayMedia(). |
| model | ModelValue | SDK default | Override the AI model for this session. |
| frameInterval | number | 1500 | Ms between frame captures. |
| frameQuality | number | 0.4 | JPEG quality (0–1). |
| maxFrameWidth | number | 1280 | Max frame width in px. |
| maxFrameHeight | number | 720 | Max frame height in px. |
| historyWindow | number | 5 | Number of past commentaries to send as context. |
| onTranscript | (text: string) => void | — | New AI commentary. |
| onCreditsUpdate | (remaining: number) => void | — | Updated credit balance. |
| onLatency | (ms: number) => void | — | Frame processing latency. |
| onFrame | (dataUrl: string) => void | — | Raw captured frame data-URL. |
| onStop | () => void | — | Stream ended. |
| onCreditsExhausted | () => void | — | Credits hit zero. |
| onError | (err: Error) => void | — | API/network error. |
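
The historyWindow option caps how many past commentaries travel with each frame. One plausible way to maintain such a window is a simple sliding buffer, sketched below. pushCommentary is a hypothetical helper and the SDK's internal buffer may be implemented differently.

```typescript
// Hypothetical sliding window: keep only the most recent `windowSize` commentaries.
function pushCommentary(recent: readonly string[], text: string, windowSize = 5): string[] {
  if (windowSize <= 0) return []; // a zero-sized window keeps no context
  return [...recent, text].slice(-windowSize);
}

let recentCommentaries: string[] = [];
for (const line of ['a', 'b', 'c', 'd', 'e', 'f']) {
  recentCommentaries = pushCommentary(recentCommentaries, line, 5);
}
console.log(recentCommentaries); // [ 'b', 'c', 'd', 'e', 'f' ]
```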

Stopping the Stream

ultron.stop();

This:

  • Stops the frame capture interval
  • Stops all MediaStream tracks (ends the screen share)
  • Stops any playing audio
  • Clears the history buffer

Checking Stream State

if (ultron.isStreaming) {
  // Currently streaming, session id: ultron.sessionId
}

| Property | Type | Description |
|---|---|---|
| isStreaming | boolean | Whether a stream analysis loop is running. |
| sessionId | string \| null | The active session ID, or null when not streaming. |


Video Stream Analysis

Analyze any MediaStream — webcam, <video> element capture, canvas stream, etc.

ultron.startVideoStream(options)

// From a webcam
const stream = await navigator.mediaDevices.getUserMedia({ video: true });

await ultron.startVideoStream({
  stream,  // Required — any MediaStream
  model: 'gpt-4o',
  frameInterval: 2000,
  onTranscript: (text) => { /* show the transcript text */ },
  onCreditsUpdate: (remaining) => { /* show users their remaining credits */ },
  onError: (err) => console.error(err),
});

From a <video> Element

const videoEl = document.getElementById('my-video') as HTMLVideoElement;
const stream = videoEl.captureStream();

await ultron.startVideoStream({
  stream,
  model: 'gemini-2.5-flash',
  onTranscript: (text) => { /* perform your task */ },
});

From a Canvas

const canvas = document.getElementById('game-canvas') as HTMLCanvasElement;
const stream = canvas.captureStream(30); // 30 FPS

await ultron.startVideoStream({
  stream,
  model: 'gemini-2.5-flash',
  frameInterval: 1000,
  onTranscript: (text) => showCommentary(text),
});

StartVideoStreamOptions — All Fields

Same as StartScreenShareOptions except stream is required (not optional).

| Field | Type | Required | Description |
|---|---|---|---|
| stream | MediaStream | Yes | The video stream to analyze. |
| All other fields | — | No | Same as StartScreenShareOptions above. |


Frame Analysis (Low-Level)

Send a single frame for AI vision analysis without using the streaming loop. Useful for custom capture pipelines or one-off analysis.

ultron.analyseFrame(options)

const result = await ultron.analyseFrame({
  image: 'data:image/jpeg;base64,/9j/4AAQSkZJRg...',
  model: 'gemini-2.5-flash',           // optional — uses SDK default
  history: ['Player opened the menu.'], // optional — previous commentaries
  sessionId: 'sess_abc123',            // optional — attach to a session
  timestamp: Date.now(),               // optional — client timestamp
  enableAudio: true,                   // optional — include TTS audio in response
  voiceName: 'Kore',                   // optional — Gemini TTS voice name
});

console.log(result.response);          // "Player is navigating the inventory screen."
console.log(result.latency);           // 342 (ms)
console.log(result.creditsRemaining);  // 9842
console.log(result.creditsUsed);       // 3
console.log(result.usage);             // { inputTokens: 1200, outputTokens: 45 }
console.log(result.model);             // 'gemini-2.5-flash'
console.log(result.provider);          // 'gemini'
console.log(result.audio);             // base64 WAV audio (if enableAudio was true)

AnalyseFrameOptions

| Field | Type | Required | Description |
|---|---|---|---|
| image | string | Yes | Base-64 data-URL (data:image/jpeg;base64,...) or raw base-64 JPEG string. |
| model | ModelValue | No | Override the AI model. Falls back to SDK default. |
| history | string[] | No | Previous commentary strings for context continuity. |
| sessionId | string \| null | No | Session ID to attach this frame to. |
| timestamp | number | No | Client-side Unix timestamp. Defaults to Date.now(). |
| enableAudio | boolean | No | If true, includes TTS audio of the commentary in the response. |
| voiceName | string | No | Gemini TTS voice name (e.g. Kore, Zephyr, Puck). Default Kore. |
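
Because image accepts either a data-URL or a raw base-64 string, custom capture pipelines sometimes need to tell the two apart, for example to measure payload size or to log the MIME type. The sketch below splits either form into its parts. splitImageInput is a hypothetical helper, and treating a bare string as JPEG is an assumption taken from the field description.

```typescript
// Hypothetical helper: accept either a data-URL or a raw base-64 string and
// return the base-64 payload plus the MIME type (JPEG assumed when absent).
function splitImageInput(image: string): { mimeType: string; base64: string } {
  const match = /^data:([^;,]+);base64,(.*)$/.exec(image);
  if (match) {
    return { mimeType: match[1], base64: match[2] };
  }
  return { mimeType: 'image/jpeg', base64: image };
}

console.log(splitImageInput('data:image/png;base64,iVBORw0KGgo='));
// { mimeType: 'image/png', base64: 'iVBORw0KGgo=' }
console.log(splitImageInput('/9j/4AAQSkZJRg=='));
// { mimeType: 'image/jpeg', base64: '/9j/4AAQSkZJRg==' }
```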

AnalyseFrameResult

| Field | Type | Description |
|---|---|---|
| success | boolean | Whether the analysis succeeded. |
| response | string | The AI-generated commentary (≤20 words). |
| latency | number | Processing time in milliseconds. |
| receivedAt | string | ISO 8601 server timestamp. |
| model | string | Model that was used. |
| provider | string | Provider ('openai' or 'gemini'). |
| usage | { inputTokens: number; outputTokens: number } | Token usage for this frame. |
| creditsUsed | number | Credits consumed by this call. |
| creditsRemaining | number | Remaining credit balance. |
| audio | string | (optional) Base64-encoded WAV audio. Present only when enableAudio: true. |
| audioMimeType | string | (optional) "audio/wav". Present only when enableAudio: true. |


Session Management

Sessions group frames and chat messages together, giving the AI context about an ongoing viewing session. The backend automatically builds rich context from sessions including:

  • Frame commentaries — timestamped AI descriptions of each captured frame
  • Summaries — AI-generated summaries produced every 10 frames using Gemini 2.5 Pro
  • Curated high-context frames — deduplicated, tagged frames selected by AI for efficient retrieval

ultron.startSession()

const { success, sessionId } = await ultron.startSession();
console.log(sessionId); // 'sess_abc123def456'

Returns: Promise<StartSessionResult>

interface StartSessionResult {
  success: boolean;
  sessionId: string;
}

Sessions are created automatically when you call startScreenShare() or startVideoStream(). Use startSession() only when building a custom pipeline with analyseFrame() and chat().
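
A custom pipeline of this kind can be sketched as: start a session, analyse each frame against it, then chat about the result. The code below uses a small structural interface (PipelineClient, a hypothetical name) that lists only the documented methods it needs, so it compiles without the SDK installed. Whether an UltronLive instance satisfies this exact shape is an assumption; the option and result fields are taken from the tables in this document.

```typescript
// Minimal structural view of the documented methods used by this sketch.
interface PipelineClient {
  startSession(): Promise<{ success: boolean; sessionId: string }>;
  analyseFrame(options: {
    image: string;
    sessionId?: string | null;
    history?: string[];
  }): Promise<{ response: string }>;
  chat(options: { message: string; sessionId?: string | null }): Promise<{ response: string }>;
}

// Analyse a batch of frames inside one session, then ask a question about them.
async function runCustomPipeline(
  client: PipelineClient,
  frames: string[],
  question: string,
): Promise<string> {
  const { sessionId } = await client.startSession();
  const commentaries: string[] = [];
  for (const image of frames) {
    const result = await client.analyseFrame({
      image,
      sessionId,
      history: commentaries.slice(-5), // last 5 commentaries, mirroring the default historyWindow
    });
    commentaries.push(result.response);
  }
  const reply = await client.chat({ sessionId, message: question });
  return reply.response;
}
```

With the real SDK this would be called as runCustomPipeline(ultron, frameDataUrls, 'What happened in these frames?'), where frameDataUrls is your own array of captured JPEG data-URLs.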


WebRTC (Low-Latency Mode)

The SDK supports a WebRTC transport for lower latency, persistent connections, and binary frame transfer. All real-time data flows through WebRTC data channels instead of individual HTTP requests.

When to Use WebRTC vs REST

| | REST (default) | WebRTC |
|---|---|---|
| Latency | Higher (new TCP per request) | Lower (persistent P2P) |
| Frame format | Base64 in JSON (33% bloat) | Raw binary ArrayBuffer |
| Connection | Stateless | Persistent |
| Setup complexity | None | Requires signaling handshake |
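
The 33% figure for base64 follows from the encoding itself: every 3 input bytes become 4 output characters. A short worked example, using a hypothetical 90,000-byte JPEG frame:

```typescript
// Length in characters of the padded base-64 encoding of `byteLength` bytes.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

const jpegBytes = 90000; // hypothetical frame size
const encodedChars = base64Length(jpegBytes);
console.log(encodedChars);             // 120000
console.log(encodedChars - jpegBytes); // 30000 extra bytes per frame, about 33%
```

At the default 1500 ms frame interval this hypothetical frame would cost roughly 20 KB of extra upload per second over REST, before JSON and HTTP overhead.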

Auth endpoints (/api/v1/auth/*) and TTS (/api/tts/gemini) remain REST-only regardless of transport.

Quick Start

import { UltronLive } from 'ultron-live-sdk';

const ultron = new UltronLive({ apiKey: 'ulk_your_api_key' });

// Create a WebRTC client (inherits apiKey/token/baseUrl from the SDK)
const rtc = ultron.createWebRTC();

// Wire up callbacks
rtc.onConnected = () => {
  console.log('WebRTC ready');
  rtc.startSession();
};
rtc.onFrameResult = (result) => {
  console.log(result.response);       // AI commentary
  console.log(result.creditsRemaining);
};
rtc.onChatResult = (result) => {
  console.log(result.response);
};
rtc.onSessionStarted = (sessionId) => {
  console.log('Session:', sessionId);
};
rtc.onDisconnected = (reason) => {
  console.warn('Disconnected:', reason);
};

// Connect (resolves when all 3 data channels are open)
await rtc.connect();

// Send frames as binary (preferred)
canvas.toBlob((blob) => rtc.sendFrame(blob), 'image/jpeg', 0.4);

// Or send as base64 if you already have it
rtc.sendFrameBase64(base64String, Date.now(), 'gemini-2.5-flash');

// Chat
rtc.sendChat('What was I working on?', { enableAudio: true });

// Change model
rtc.setModel('gemini-2.5-flash');

// Disconnect
rtc.disconnect();

ultron.createWebRTC(overrides?)

Factory method that creates an UltronWebRTCClient pre-configured with the SDK's baseUrl, apiKey, and token.

// Uses SDK config
const rtc = ultron.createWebRTC();

// Or override specific options
const rtcWithTimeout = ultron.createWebRTC({
  connectionTimeout: 20000,
});

Standalone Usage

You can also use UltronWebRTCClient directly without the main UltronLive class:

import { UltronWebRTCClient } from 'ultron-live-sdk';

const rtc = new UltronWebRTCClient({
  backendUrl: 'https://live-api.ultronai.me',
  apiKey: 'ulk_your_api_key',
});

await rtc.connect();

WebRTCConfig

| Option | Type | Default | Description |
|---|---|---|---|
| backendUrl | string | https://live-api.ultronai.me | Backend URL. |
| apiKey | string | — | API key (ulk_...). |
| token | string | — | JWT token (alternative to apiKey). |
| connectionTimeout | number | 15000 | Connection timeout in ms. |

Callbacks

Set these on the client instance before calling connect().

| Callback | Signature | Description |
|---|---|---|
| onFrameResult | (result: WebRTCFrameResult) => void | AI commentary for a frame. |
| onFrameError | (error: WebRTCFrameError) => void | Frame processing error. |
| onChatResult | (result: WebRTCChatResult) => void | Chat response. |
| onChatError | (error: WebRTCChatError) => void | Chat error. |
| onSessionStarted | (sessionId: string) => void | Session created. |
| onSessionStopped | () => void | Session ended. |
| onConnected | () => void | All data channels open. |
| onDisconnected | (reason: string) => void | Connection lost. |
| onError | (error: string) => void | General error from server. |
| onConnectionStateChange | (state: RTCPeerConnectionState) => void | Peer connection state changed. |

Methods

| Method | Signature | Description |
|---|---|---|
| connect() | () => Promise<void> | Connect and wait for all 3 data channels to open. |
| sendFrame(data) | (data: Blob \| ArrayBuffer) => void | Send a binary frame (zero encoding overhead). |
| sendFrameBase64(base64, timestamp?, model?) | (base64: string, timestamp?: number, model?: string) => void | Send a frame as base64 JSON. |
| sendChat(message, options?) | (message: string, options?: WebRTCChatOptions) => void | Send a chat message. |
| startSession() | () => void | Start a recording session. |
| stopSession() | () => void | Stop the current session. |
| setModel(model) | (model: string) => void | Change the AI model. |
| ping() | () => void | Ping for latency measurement. |
| disconnect() | () => void | Close all connections and clean up. |

Properties

| Property | Type | Description |
|---|---|---|
| isReady | boolean | Whether all 3 data channels are open. |
| connected | boolean | Whether the connection is established. |
| sessionId | string \| null | Current session ID. |

WebRTCChatOptions

| Field | Type | Required | Description |
|---|---|---|---|
| sessionId | string | No | Override session (defaults to current). |
| frames | string[] | No | Base64 frame data-URLs for visual context. |
| enableAudio | boolean | No | Include TTS audio in response. |
| voiceName | string | No | TTS voice name. |

Response Types

WebRTCFrameResult

| Field | Type | Description |
|---|---|---|
| type | 'frame:result' | Message type. |
| response | string | AI commentary. |
| latency | number | Server processing time (ms). |
| model | string | Model used. |
| usage | { inputTokens, outputTokens } | Token usage. |
| creditsUsed | number | Credits deducted. |
| creditsRemaining | number | Remaining balance. |
| sessionId | string \| null | Current session ID. |

WebRTCChatResult

| Field | Type | Description |
|---|---|---|
| type | 'chat:result' | Message type. |
| response | string | AI answer. |
| creditsRemaining | number | Remaining balance. |
| latency | number | Processing time (ms). |
| usage | { inputTokens, outputTokens } | Token usage. |
| usedFrameUrls | string[] | (optional) Referenced image URLs. |
| audio | string | (optional) Base64 MP3 audio. |
| audioMimeType | string | (optional) "audio/mpeg". |

Data Channels

The WebRTC connection uses three data channels:

| Channel | Mode | Purpose |
|---|---|---|
| frames | Unreliable, unordered | Frame data (binary or JSON). Dropped frames are expected. |
| chat | Reliable, ordered | Chat messages and responses. |
| control | Reliable, ordered | Session start/stop, model config, ping/pong. |
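
In standard WebRTC terms, these modes correspond to the ordered and maxRetransmits fields of the options passed to RTCPeerConnection.createDataChannel(). The mapping below is an assumed reading of the table, not the SDK's confirmed configuration. ChannelInitSketch is a local stand-in for the browser's RTCDataChannelInit type so the snippet compiles outside a browser.

```typescript
// Local stand-in for the two RTCDataChannelInit fields used here.
interface ChannelInitSketch {
  ordered: boolean;
  maxRetransmits?: number;
}

// Assumed mapping from the documented channel modes to data channel options.
const CHANNEL_OPTIONS: Record<'frames' | 'chat' | 'control', ChannelInitSketch> = {
  frames: { ordered: false, maxRetransmits: 0 }, // unreliable, unordered: lost frames are never resent
  chat: { ordered: true },                       // reliable, ordered (the WebRTC default)
  control: { ordered: true },                    // reliable, ordered
};

// In a browser, each entry would be passed as the second argument of
// RTCPeerConnection.createDataChannel(label, options).
console.log(CHANNEL_OPTIONS.frames); // { ordered: false, maxRetransmits: 0 }
```

Leaving frames unreliable means a late frame is dropped instead of delaying newer ones, which suits live commentary where only the latest image matters.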

Frame Loop Example

const rtc = ultron.createWebRTC();
rtc.onFrameResult = (r) => updateUI(r.response, r.creditsRemaining);
rtc.onConnected = () => rtc.startSession();
await rtc.connect();

// Capture and send frames every 1.5s
const interval = setInterval(() => {
  if (!rtc.isReady) return;
  const canvas = document.createElement('canvas');
  canvas.width = Math.min(video.videoWidth, 1280);
  canvas.height = Math.min(video.videoHeight, 720);
  canvas.getContext('2d')!.drawImage(video, 0, 0, canvas.width, canvas.height);
  canvas.toBlob((blob) => rtc.sendFrame(blob!), 'image/jpeg', 0.4);
}, 1500);

// Clean up
window.addEventListener('beforeunload', () => {
  clearInterval(interval);
  rtc.disconnect();
});
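
The loop above clamps width and height independently, which stretches the image whenever the source is not 16:9 (a 1920×1200 display would be squeezed into 1280×720). If that matters for your use case, a helper like the following computes a target size that preserves aspect ratio. fitWithin is a hypothetical helper; whether the SDK's own capture loop preserves aspect ratio is not stated in this document.

```typescript
// Scale (width, height) down to fit inside (maxWidth, maxHeight) while keeping the
// aspect ratio. Sizes that already fit are returned unchanged (never upscaled).
function fitWithin(
  width: number,
  height: number,
  maxWidth = 1280,
  maxHeight = 720,
): { width: number; height: number } {
  const scale = Math.min(1, maxWidth / width, maxHeight / height);
  return { width: Math.round(width * scale), height: Math.round(height * scale) };
}

console.log(fitWithin(1920, 1080)); // { width: 1280, height: 720 }
console.log(fitWithin(2560, 1080)); // { width: 1280, height: 540 }
console.log(fitWithin(640, 480));   // { width: 640, height: 480 }
```

In the frame loop, the result would replace the two Math.min assignments for canvas.width and canvas.height.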

Reconnection

async function connectWithRetry(rtc, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      await rtc.connect();
      return;
    } catch (err) {
      console.warn(`Attempt ${i + 1} failed:`, err.message);
      if (i < maxRetries - 1) await new Promise(r => setTimeout(r, 2000 * (i + 1)));
    }
  }
  throw new Error('Failed to connect after retries');
}
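
The retry helper above uses linear backoff: the wait grows by 2000 ms after each failed attempt, and no wait follows the last one. The small function below reproduces that schedule so you can see the worst-case delay before changing maxRetries. retryDelays is a hypothetical helper written for this illustration.

```typescript
// Delay (ms) to wait after each failed attempt, matching 2000 * (i + 1) above.
// No delay follows the final attempt, so the list has maxRetries - 1 entries.
function retryDelays(maxRetries = 3, baseMs = 2000): number[] {
  const delays: number[] = [];
  for (let i = 0; i < maxRetries - 1; i++) {
    delays.push(baseMs * (i + 1));
  }
  return delays;
}

console.log(retryDelays());       // [ 2000, 4000 ]
console.log(retryDelays(5, 500)); // [ 500, 1000, 1500, 2000 ]
```

With the defaults, the waits add up to 6 seconds, on top of up to three connection timeouts (15000 ms each by default).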

Migration from REST

| REST | WebRTC |
|---|---|
| await ultron.analyseFrame({ image }) | rtc.sendFrame(blob) + rtc.onFrameResult |
| await ultron.chat({ message }) | rtc.sendChat(message) + rtc.onChatResult |
| await ultron.startSession() | rtc.startSession() + rtc.onSessionStarted |
| ultron.setModel('...') | rtc.setModel('...') |

The key difference: REST methods are request/response (await), while WebRTC methods are fire-and-forget with callbacks.
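
If existing code relies on await, the callback style can be wrapped in a promise during migration. The sketch below waits for the next frame result, with a timeout because the frames channel may drop messages. FrameClientLike and nextFrameResult are hypothetical names; the interface lists only the two documented callbacks, and the wrapper overwrites any handlers already set on them.

```typescript
// Minimal structural view of the two documented callbacks used here.
interface FrameResultLike {
  response: string;
}
interface FrameClientLike {
  onFrameResult?: (result: FrameResultLike) => void;
  onFrameError?: (error: unknown) => void;
}

// Resolve with the next frame result, or reject on error or timeout.
// Note: this replaces any existing onFrameResult/onFrameError handlers.
function nextFrameResult(client: FrameClientLike, timeoutMs = 5000): Promise<FrameResultLike> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error('Timed out waiting for a frame result')),
      timeoutMs,
    );
    client.onFrameResult = (result) => {
      clearTimeout(timer);
      resolve(result);
    };
    client.onFrameError = (error) => {
      clearTimeout(timer);
      reject(error);
    };
  });
}
```

A typical call would register the wait first and send afterwards: const pending = nextFrameResult(rtc); rtc.sendFrame(blob); const result = await pending;. Since results are not tied to a specific frame in this sketch, it suits one-frame-at-a-time flows rather than a fast capture loop.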


Realtime Voice AI

The SDK supports a Realtime Voice AI mode that enables live, bidirectional voice conversations with the AI. The AI can see your screen in real-time (via the frame context from an active WebRTC or REST streaming session) and respond with natural speech.

This feature connects to either OpenAI Realtime (GPT-4o Realtime) or Gemini Live (Gemini 3.1 Flash Live) via a WebSocket relay at /ws/realtime. The relay keeps API keys server-side and translates between providers so the client uses a single protocol.

When to Use Realtime Voice vs Chat

| | Chat (REST/WebRTC) | Realtime Voice |
|---|---|---|
| Interaction | Text-based, request/response | Voice-based, continuous |
| Latency | Per-request | Persistent low-latency stream |
| Screen context | Manual (pass frames) | Automatic (reads live frame store) |
| Audio | Optional TTS on response | Native bidirectional audio |
| Provider | Gemini 2.5 Flash / GPT-4o | Gemini 3.1 Flash Live / GPT-4o Realtime |

Quick Start

const ws = new WebSocket('wss://live-api.ultronai.me/ws/realtime');

// 1. Authenticate and choose provider
ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'auth',
    token: 'your_jwt_token',       // or apiKey: 'ulk_...'
    provider: 'gemini',             // 'openai' or 'gemini'
    sessionId: 'optional_session_id',
  }));
};

// 2. Handle messages
ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);

  switch (msg.type) {
    case 'auth:ok':
      console.log('Authenticated:', msg.email);
      break;
    case 'relay:ready':
      console.log('Voice AI ready, provider:', msg.provider);
      break;
    case 'response.audio.delta':
      playAudioChunk(msg.delta);
      break;
    case 'response.audio_transcript.delta':
      showTranscript(msg.delta);
      break;
    case 'conversation.item.input_audio_transcription.completed':
      showUserTranscript(msg.transcript);
      break;
    case 'response.done':
      break;
    case 'input_audio_buffer.speech_started':
      stopAudioPlayback();
      break;
    case 'relay:error':
      console.error('Relay error:', msg.error);
      break;
    case 'relay:upstream_closed':
      console.warn('Upstream closed:', msg.code);
      break;
  }
};

// 3. Send audio (OpenAI Realtime protocol — works for both providers)
function sendAudio(base64PcmChunk) {
  ws.send(JSON.stringify({
    type: 'input_audio_buffer.append',
    audio: base64PcmChunk,  // PCM16, 16kHz, mono
  }));
}

// 4. Send an image for Gemini to see (Gemini provider only)
function sendImage(base64Image) {
  ws.send(JSON.stringify({
    type: 'input_image',
    image: base64Image,
    mimeType: 'image/jpeg',
  }));
}

// 5. Reset the session (call only once the socket is open)
ws.send(JSON.stringify({ type: 'reset', sessionId: 'new_session_id' }));

Protocol

The client uses the OpenAI Realtime protocol regardless of which provider is selected. The relay translates messages for Gemini automatically.

Client → Server:

| Message Type | Description |
|---|---|
| auth | Authenticate with token or apiKey, choose provider (openai or gemini). |
| input_audio_buffer.append | Send a chunk of PCM16 audio (base64). |
| input_image | Send an image for Gemini to see (Gemini provider only). |
| reset | Close and reopen the upstream connection with fresh context. |
| Any other | Forwarded verbatim to the upstream provider. |

Server → Client:

| Message Type | Description |
|---|---|
| auth:ok | Authentication successful. |
| relay:ready | Upstream connection open, ready for audio. |
| relay:error | Error from the relay. |
| relay:upstream_closed | Upstream provider disconnected. |
| session.created | Provider session initialized. |
| response.audio.delta | Audio chunk from AI (base64). |
| response.audio_transcript.delta | Text transcript of AI speech. |
| response.audio.done | AI finished sending audio. |
| response.done | AI turn complete. |
| input_audio_buffer.speech_started | User started speaking (barge-in). |
| conversation.item.input_audio_transcription.completed | Transcript of user speech. |

Providers

| Provider | Model | Audio Format | Features |
|---|---|---|---|
| openai | GPT-4o Realtime Preview | PCM16, 24kHz | Voice activity detection, whisper transcription |
| gemini | Gemini 3.1 Flash Live Preview | PCM16, 16kHz | Activity detection, video input, output transcription |

Screen Context Integration

The Realtime Voice AI automatically has access to your screen context when a WebRTC or REST streaming session is active:

  1. Frame commentaries from the live in-memory store (updated in real-time by the frame handler)
  2. Session summaries from past sessions
  3. Recent chat messages from today

Context is refreshed on every conversational turn. For Gemini, the latest captured frame image is also sent as video input.

Audio Format

  • Input: PCM16, 16kHz, mono (both providers)
  • Output (OpenAI): PCM16, 24kHz, mono
  • Output (Gemini): PCM16, 24kHz, mono (normalized to OpenAI format by relay)
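
Browser audio APIs typically deliver samples as 32-bit floats in the range -1 to 1, so microphone input has to be converted to PCM16 before it is base64-encoded and sent. The sketch below covers only that conversion step; resampling to 16 kHz and base64 encoding are left out. floatToPcm16 and pcm16BytesPerSecond are hypothetical helpers, not SDK functions.

```typescript
// Convert float samples (range -1..1) to 16-bit signed PCM.
function floatToPcm16(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to avoid integer overflow
    // Negative values scale to -32768, positive values to 32767.
    pcm[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return pcm;
}

// Bytes per second for PCM16 mono at a given sample rate (2 bytes per sample).
function pcm16BytesPerSecond(sampleRate: number): number {
  return sampleRate * 2;
}

console.log(Array.from(floatToPcm16(new Float32Array([0, 1, -1])))); // [ 0, 32767, -32768 ]
console.log(pcm16BytesPerSecond(16000)); // 32000
```

At 16 kHz mono the input stream is 32,000 bytes per second before base64 encoding, which is useful when choosing how large each input_audio_buffer.append chunk should be.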

Stream Preview

Show a live, zero-lag preview of the captured screen share or video stream. The preview uses the raw MediaStream directly (not re-encoded canvas frames), so there is no perceptible delay.

ultron.attachPreview(videoElement)

The easiest way to show a preview. Pass any <video> element and the SDK wires it up automatically.

await ultron.startScreenShare({
  onTranscript: (text) => console.log(text),
});

// Attach the live stream to a <video> element
const previewEl = document.getElementById('preview') as HTMLVideoElement;
ultron.attachPreview(previewEl);

The SDK sets autoplay, playsInline, and muted on the element. When ultron.stop() is called, all attached preview elements are automatically detached.

ultron.detachPreview(videoElement)

Manually detach a preview element before stopping the stream.

ultron.detachPreview(previewEl);

ultron.getPreviewStream()

Returns the active MediaStream or null if not streaming. Use this when you need full control over how the preview is rendered.

const stream = ultron.getPreviewStream();
if (stream) {
  const videoEl = document.createElement('video');
  videoEl.srcObject = stream;
  videoEl.autoplay = true;
  videoEl.muted = true;
  document.body.appendChild(videoEl);
}

Preview Methods Reference

| Method | Signature | Returns | Description |
|---|---|---|---|
| attachPreview | (el: HTMLVideoElement) => void | void | Attach the live stream to a <video> element. Throws if no active stream. |
| detachPreview | (el: HTMLVideoElement) => void | void | Detach a previously attached preview element. |
| getPreviewStream | () => MediaStream \| null | MediaStream \| null | Get the raw active MediaStream, or null. |

Notes

  • Preview elements are automatically cleaned up when ultron.stop() is called.
  • You can attach multiple <video> elements simultaneously.
  • The preview shows the original stream, not the downscaled JPEG frames sent to the AI — so it's full resolution with zero lag.

Chat

Chat with the AI about the current or past screen sessions. The AI has access to all frames, commentaries, and summaries from up to 10 recent sessions.

When the user asks about a specific visual moment, the AI automatically uses a tool-based frame retrieval system to search through curated session frames by keyword, tags, or time range — and includes the actual screenshots in its reasoning.

ultron.chat(options)

const reply = await ultron.chat({
  sessionId: ultron.sessionId,  // optional — attach to current session
  message: 'What was I working on 5 minutes ago?',
  frames: [currentFrameDataUrl], // optional — include current screen
  enableAudio: true,             // optional — include TTS audio in response
  voiceName: 'Kore',            // optional — Gemini TTS voice name
});

console.log(reply.response);         // "You were editing the login component..."
console.log(reply.creditsRemaining); // 9500
console.log(reply.latency);          // 1200 (ms)
console.log(reply.usedFrameUrls);    // ['https://...frame1.jpg'] — referenced screenshots
console.log(reply.audio);            // base64 WAV audio (if enableAudio was true)

ChatOptions

| Field | Type | Required | Description |
|---|---|---|---|
| message | string | Yes | The user's question or message. |
| sessionId | string \| null | No | Session ID for context. Pass ultron.sessionId during streaming. |
| frames | string[] | No | Base-64 data-URL screenshots to include as visual context. |
| enableAudio | boolean | No | If true, includes TTS audio of the AI response. |
| voiceName | string | No | Gemini TTS voice name (e.g. Kore, Zephyr). Default Kore. |

ChatResult

| Field | Type | Description |
|---|---|---|
| success | boolean | Whether the chat succeeded. |
| response | string | The AI's response. |
| creditsRemaining | number | Updated credit balance. |
| latency | number | Processing time in ms. |
| usage | { inputTokens: number; outputTokens: number } | Token usage. |
| usedFrameUrls | string[] | (optional) URLs of screenshots the AI referenced via tool call. |
| audio | string | (optional) Base64-encoded WAV audio. Present only when enableAudio: true. |
| audioMimeType | string | (optional) "audio/wav". Present only when enableAudio: true. |
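Because audio arrives as a base64 string rather than a Blob, it needs a small conversion before it can be played. A minimal sketch, assuming the string is plain base64 without a data-URL prefix; chatAudioToDataUrl is a hypothetical helper, not part of the SDK:

```typescript
// Only the two audio-related fields of ChatResult are needed here.
interface ChatAudioFields {
  audio?: string;
  audioMimeType?: string;
}

// Builds a data URL suitable for an <audio> element's src.
// Returns null when the reply carries no audio (enableAudio was not set).
function chatAudioToDataUrl(result: ChatAudioFields): string | null {
  if (!result.audio) return null;
  // Fall back to the documented MIME type if the field is missing
  const mime = result.audioMimeType ?? 'audio/wav';
  return `data:${mime};base64,${result.audio}`;
}
```

In a browser, the returned string can be passed straight to new Audio(url) or assigned to an existing audio element's src.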

Chat Examples

// Ask about current screen
const reply1 = await ultron.chat({
  message: 'What am I looking at right now?',
  frames: [captureCurrentFrame()],
});

// Ask about session history (no frames needed — AI searches automatically)
const reply2 = await ultron.chat({
  sessionId: ultron.sessionId,
  message: 'Give me a summary of this session',
});

// Ask with audio response
const reply3 = await ultron.chat({
  sessionId: ultron.sessionId,
  message: 'What happened in the last 2 minutes?',
  enableAudio: true,
  voiceName: 'Zephyr',
});

// Standalone chat (no session, no frames)
const reply4 = await ultron.chat({
  message: 'What AI models do you support?',
});

Text-to-Speech (TTS)

Convert any text to speech audio. Returns a Blob that can be played in the browser.

ultron.tts(options)

const blob = await ultron.tts({
  text: 'Hello from Ultron Live!',
  voiceName: 'Zephyr',  // optional — Gemini voice name
});

// Play the audio
const audio = new Audio(URL.createObjectURL(blob));
audio.play();

TtsOptions

| Field | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | The text to convert to speech. |
| voiceName | string | No | Gemini TTS voice name. Defaults to 'Zephyr'. |
| apiKey | string | No | Override API key for this TTS call. |

Returns: Promise<Blob> — an audio/mpeg blob.

TTS with Download

const blob = await ultron.tts({ text: 'Save this narration' });

// Create a download link
const url = URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = 'narration.mp3';
a.click();
URL.revokeObjectURL(url);

TTS with Custom Audio Element

const blob = await ultron.tts({ text: 'Custom playback' });
const audioEl = document.getElementById('my-audio') as HTMLAudioElement;
audioEl.src = URL.createObjectURL(blob);
audioEl.play();

Audio Player & Voice Narration

The SDK includes a built-in audio player that automatically narrates AI commentary during streaming. It supports two voice providers: Gemini TTS and ElevenLabs.

Enable/Disable Audio During Streaming

// Start streaming first
await ultron.startScreenShare({
  model: 'gemini-2.5-flash',
  onTranscript: (text) => setPreviewText(text),
});

// Enable voice narration
ultron.setAudioEnabled(true);

// Disable voice narration (stops current audio)
ultron.setAudioEnabled(false);

Configure Gemini Voice

ultron.configureAudio({
  voiceProvider: 'gemini',
  geminiVoiceName: 'Zephyr',  // Gemini voice name
});
ultron.setAudioEnabled(true);

Available Gemini TTS Voices

Zephyr, Puck, Charon, Kore, Fenrir, Leda, Orus, Aoede, Callirrhoe, Autonoe, Enceladus, Iapetus, Umbriel, Algieba, Despina, Erinome, Algenib, Rasalgethi, Laomedeia, Achernar, Alnilam, Schedar, Gacrux, Pulcherrima, Achird, Zubenelgenubi, Vindemiatrix, Sadachbia, Sadaltager, Sulafat

Configure ElevenLabs Voice

ultron.configureAudio({
  voiceProvider: 'elevenlabs',
  elevenLabsApiKey: 'your_elevenlabs_api_key',
  elevenLabsVoiceId: '21m00Tcm4TlvDq8ikWAM', // Rachel (default)
});
ultron.setAudioEnabled(true);

AudioPlayerOptions

| Field | Type | Default | Description |
|---|---|---|---|
| voiceProvider | 'gemini' \| 'elevenlabs' | 'gemini' | Which TTS provider to use. |
| elevenLabsApiKey | string | '' | Required if using ElevenLabs. |
| elevenLabsVoiceId | string | '21m00Tcm4TlvDq8ikWAM' | ElevenLabs voice ID (Rachel by default). |
| geminiVoiceName | string | 'Zephyr' | Gemini TTS voice name. |
| skipPhrases | string[] | ['no visual change'] | Phrases to skip narrating (case-insensitive match). |

Audio Behavior

  • Audio uses a priority queue — if a new commentary arrives while the previous is still playing, it queues the new one and plays it next
  • Only one audio plays at a time (no overlap)
  • Identical consecutive commentaries are deduplicated (not spoken twice)
  • Phrases matching skipPhrases are silently skipped
  • Calling ultron.stop() stops all audio and clears the queue
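To make the "one at a time, no duplicates" behavior concrete, here is a simplified model of such a queue. It is a sketch, not the SDK's implementation: NarrationQueue and PlayFn are hypothetical names, the queue is plain FIFO (the SDK's priority rules are not documented here and are not modeled), and clear() only drops pending items rather than interrupting the clip that is currently playing.

```typescript
type PlayFn = (text: string) => Promise<void>;

class NarrationQueue {
  private play: PlayFn;
  private tail: Promise<void> = Promise.resolve();
  private lastQueued: string | null = null;
  private generation = 0;

  constructor(play: PlayFn) {
    this.play = play;
  }

  enqueue(text: string): void {
    // Identical consecutive commentary is dropped
    if (text === this.lastQueued) return;
    this.lastQueued = text;
    const generation = this.generation;
    // Chain onto the previous item so clips never overlap
    this.tail = this.tail
      .then(async () => {
        if (generation !== this.generation) return; // cleared while waiting
        await this.play(text);
      })
      .catch(() => {
        // A failed clip must not block the items queued after it
      });
  }

  clear(): void {
    // Invalidate everything queued so far
    this.generation += 1;
    this.lastQueued = null;
  }

  // Resolves once everything queued so far has finished or been dropped
  idle(): Promise<void> {
    return this.tail;
  }
}
```

Chaining each clip onto the previous promise is what guarantees serial playback without a separate "is playing" flag.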

Skip Phrases Example

ultron.configureAudio({
  voiceProvider: 'gemini',
  skipPhrases: [
    'no visual change',
    'nothing new',
    'same screen',
  ],
});

Any commentary containing these phrases (case-insensitive) will not be narrated.
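The matching rule described above (case-insensitive, substring) can be expressed in a few lines. This is a sketch of the rule as documented, useful if you want to apply the same filter to transcripts in your own UI; shouldSkip is a hypothetical helper, not an SDK export.

```typescript
// Returns true when the commentary contains any skip phrase, ignoring case.
function shouldSkip(commentary: string, skipPhrases: string[]): boolean {
  const haystack = commentary.toLowerCase();
  return skipPhrases.some(
    // Empty phrases are ignored; otherwise they would match everything
    (phrase) => phrase.length > 0 && haystack.includes(phrase.toLowerCase()),
  );
}
```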


Model Management

The SDK ships with a built-in model catalogue and helpers for browsing and switching models.

Available Models

| Value | Label | Provider | Badge |
|---|---|---|---|
| gpt-4o-realtime | GPT-4o Realtime | OpenAI | OpenAI |
| gpt-4o | GPT-4o | OpenAI | OpenAI |
| gpt-4o-mini | GPT-4o mini | OpenAI | OpenAI |
| gemini-3.1-pro-preview | Gemini 3.1 Pro | Google | Google |
| gemini-3-flash-preview | Gemini 3 Flash | Google | Google |
| gemini-3.1-flash-lite-preview | Gemini 3.1 Flash-Lite | Google | Google |
| gemini-2.5-pro | Gemini 2.5 Pro | Google | Google |
| gemini-2.5-flash | Gemini 2.5 Flash | Google | Google |
| gemini-2.5-flash-lite | Gemini 2.5 Flash-Lite | Google | Google |
| gemini-2.5-flash-image | Gemini 2.5 Flash-Image | Google | Google |
| gemini-2.0-flash | Gemini 2.0 Flash | Google | Google |
| gemini-2.0-flash-lite | Gemini 2.0 Flash-Lite | Google | Google |

Browse Models

// All models
const allModels = ultron.models; // ModelMeta[]

// Filter by provider
const geminiModels = ultron.getModelsByProvider('gemini');
const openaiModels = ultron.getModelsByProvider('openai');

// Get metadata for a specific model
const meta = ultron.getModelMeta('gemini-2.5-flash');
console.log(meta);
// { value: 'gemini-2.5-flash', label: 'Gemini 2.5 Flash', provider: 'gemini', badge: 'Google' }

Switch Model at Runtime

// Change the default model (takes effect on next frame)
ultron.setModel('gpt-4o');

// Or override per-call
await ultron.analyseFrame({
  image: frameData,
  model: 'gemini-2.5-pro', // just for this frame
});

ModelMeta Type

interface ModelMeta {
  value: ModelValue;      // e.g. 'gemini-2.5-flash'
  label: string;          // e.g. 'Gemini 2.5 Flash'
  provider: ModelProvider; // 'openai' | 'gemini'
  badge: string;          // 'OpenAI' | 'Google'
}

Custom / Pass-Through Models

The ModelValue type accepts any string, so you can pass custom model identifiers:

ultron.setModel('my-custom-model-v2');
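Since any string type-checks, a typo in a model name will not be caught at compile time. If you want to tell catalogue models apart from pass-through ones at runtime, a type guard is one option. The sketch below copies the model values from the table above into a local list; KNOWN_MODELS, KnownModel, and isKnownModel are hypothetical names, not SDK exports (in real code you could derive the list from ultron.models instead).

```typescript
// Local copy of the documented catalogue values
const KNOWN_MODELS = [
  'gpt-4o-realtime',
  'gpt-4o',
  'gpt-4o-mini',
  'gemini-3.1-pro-preview',
  'gemini-3-flash-preview',
  'gemini-3.1-flash-lite-preview',
  'gemini-2.5-pro',
  'gemini-2.5-flash',
  'gemini-2.5-flash-lite',
  'gemini-2.5-flash-image',
  'gemini-2.0-flash',
  'gemini-2.0-flash-lite',
] as const;

type KnownModel = (typeof KNOWN_MODELS)[number];

// Same pattern as the SDK's ModelValue: literal suggestions plus any string.
// `string & {}` keeps editor autocomplete for the literals.
type ModelValueSketch = KnownModel | (string & {});

function isKnownModel(value: string): value is KnownModel {
  return (KNOWN_MODELS as readonly string[]).includes(value);
}

// Both assignments compile; only the first passes the guard
const builtIn: ModelValueSketch = 'gemini-2.5-flash';
const custom: ModelValueSketch = 'my-custom-model-v2';
console.log(isKnownModel(builtIn), isKnownModel(custom));
```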

Build a Model Selector UI

const select = document.getElementById('model-select') as HTMLSelectElement;

ultron.models.forEach((m) => {
  const option = document.createElement('option');
  option.value = m.value;
  option.textContent = `${m.label} (${m.badge})`;
  select.appendChild(option);
});

select.onchange = () => ultron.setModel(select.value);

Error Handling

The SDK provides three typed error classes for structured error handling.

Error Classes

| Class | Status Code | When Thrown |
|---|---|---|
| UltronAuthError | 401 | Invalid or expired API key / JWT. |
| UltronCreditsError | 402 | Insufficient credits to process the request. |
| UltronAPIError | Any | General API error (parent class of the above). |

Error Class Hierarchy

UltronAPIError (extends Error)
├── UltronAuthError (401)
└── UltronCreditsError (402)

All errors have a statusCode property and a message.
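One practical consequence of this hierarchy is that instanceof checks must test the subclasses before the parent, otherwise every error lands in the generic branch. The sketch below shows that ordering with local stand-in classes so it runs on its own; ApiErrorSketch, AuthErrorSketch, CreditsErrorSketch, and actionFor are hypothetical names. In an application you would import the real classes from 'ultron-live-sdk' instead.

```typescript
// Stand-ins mirroring the documented hierarchy (not the SDK's own classes)
class ApiErrorSketch extends Error {
  statusCode: number;
  constructor(message: string, statusCode: number) {
    super(message);
    this.name = 'ApiErrorSketch';
    this.statusCode = statusCode;
    // Keeps instanceof working when compiled to older targets
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class AuthErrorSketch extends ApiErrorSketch {
  constructor(message = 'Unauthorized') {
    super(message, 401);
    this.name = 'AuthErrorSketch';
  }
}

class CreditsErrorSketch extends ApiErrorSketch {
  constructor(message = 'Insufficient credits') {
    super(message, 402);
    this.name = 'CreditsErrorSketch';
  }
}

type ErrorAction = 'top-up' | 'login' | 'report' | 'unknown';

function actionFor(err: unknown): ErrorAction {
  // Most specific classes first; the parent class would match all three
  if (err instanceof CreditsErrorSketch) return 'top-up';
  if (err instanceof AuthErrorSketch) return 'login';
  if (err instanceof ApiErrorSketch) return 'report';
  return 'unknown'; // network failures and anything else
}
```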

Catching Errors

import {
  UltronAPIError,
  UltronAuthError,
  UltronCreditsError,
} from 'ultron-live-sdk';

try {
  const result = await ultron.analyseFrame({ image: dataUrl });
} catch (err) {
  if (err instanceof UltronCreditsError) {
    // 402 — user needs to top up
    showTopUpDialog();
  } else if (err instanceof UltronAuthError) {
    // 401 — token expired, redirect to login
    redirectToLogin();
  } else if (err instanceof UltronAPIError) {
    // Other API error
    console.error(`API Error [${err.statusCode}]: ${err.message}`);
  } else {
    // Network error or unexpected
    console.error('Unexpected error:', err);
  }
}

Error Handling in Streaming

During streaming, errors are delivered via the onError and onCreditsExhausted callbacks instead of throwing:

await ultron.startScreenShare({
  onTranscript: (text) => console.log(text),

  onCreditsExhausted: () => {
    // Stream auto-stops when credits run out
    showTopUpDialog();
  },

  onError: (err) => {
    // Non-credit API/network errors
    if (err instanceof UltronAuthError) {
      ultron.stop();
      redirectToLogin();
    } else {
      console.error('Stream error:', err.message);
    }
  },
});

Error Properties

class UltronAPIError extends Error {
  name: string;        // 'UltronAPIError'
  message: string;     // Error description
  statusCode: number;  // HTTP status code
}

Low-Level Modules

The SDK exports its internal building blocks for advanced use cases.

UltronHttpClient

Direct HTTP client for all API endpoints. Use when you need full control.

import { UltronHttpClient } from 'ultron-live-sdk';

const client = new UltronHttpClient('https://live-api.ultronai.me', 'ulk_your_key');

// Auth
await client.sendOtp({ email: 'user@example.com' });
const auth = await client.verifyOtp({ email: 'user@example.com', otp: '123456' });
client.setApiKey(auth.data.token); // AuthResult nests the JWT under data
const me = await client.getMe();

// Session
const session = await client.startSession();

// Vision
const result = await client.analyseFrame({
  image: dataUrl,
  model: 'gemini-2.5-flash',
  history: [],
  sessionId: session.sessionId,
});

// Chat
const chat = await client.chat({
  sessionId: session.sessionId,
  message: 'What happened?',
  frames: [],
});

// TTS (returns Blob)
const audioBlob = await client.ttsGemini({ text: 'Hello' });

API Endpoints Used

| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/auth/send-otp | Send OTP email |
| POST | /api/v1/auth/verify-otp | Verify OTP, get JWT |
| GET | /api/v1/auth/me | Get user profile (protected) |
| POST | /api/v1/auth/timezone | Update user timezone (protected) |
| POST | /api/v1/vision/session/start | Create session (protected) |
| POST | /api/v1/vision | Analyze a frame (protected) |
| POST | /api/v1/vision/chat | Chat about session (protected) |
| POST | /api/tts/gemini | Google Cloud text-to-speech |
| GET | /api/v1/webrtc/config | Get ICE server configuration (protected) |
| GET | /api/health | Health check |
| WS | /ws | WebRTC signaling |
| WS | /ws/realtime | Realtime voice AI relay |
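The WS rows use the WebSocket protocol, so a client that connects to them directly needs a ws: or wss: URL rather than the http(s) base URL. A small sketch of that conversion, assuming the WebSocket endpoints are served from the same host as the REST API (the table suggests this but does not state it); toWebSocketUrl is a hypothetical helper, not an SDK export.

```typescript
function toWebSocketUrl(baseUrl: string, path: string): string {
  // http -> ws and https -> wss (the trailing "s" survives the replace)
  const wsBase = baseUrl.replace(/^http/i, 'ws');
  // URL resolves the path against the base and normalizes the result
  return new URL(path, wsBase).toString();
}

console.log(toWebSocketUrl('https://live-api.ultronai.me', '/ws'));
```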

UltronStreamer

The streaming orchestrator that ties together sessions, frame capture, vision API, and TTS.

import { UltronHttpClient, UltronStreamer } from 'ultron-live-sdk';

const client = new UltronHttpClient('https://live-api.ultronai.me', 'ulk_key');
const streamer = new UltronStreamer(client);

// Configure audio
streamer.configureAudio({
  voiceProvider: 'gemini',
  geminiVoiceName: 'Zephyr',
  enabled: true,
});

// Start screen share
await streamer.startScreenShare({
  model: 'gemini-2.5-flash',
  frameInterval: 1500,
  onTranscript: console.log,
});

// Check state
console.log(streamer.isRunning);       // true
console.log(streamer.currentSessionId); // 'sess_...'

// Toggle audio
streamer.setAudioEnabled(false);

// Stop
streamer.stop();

FrameCapturer

Captures JPEG frames from any MediaStream at a configurable interval.

import { FrameCapturer } from 'ultron-live-sdk';

const capturer = new FrameCapturer({
  frameInterval: 1000,   // capture every 1s
  frameQuality: 0.5,     // 50% JPEG quality
  maxFrameWidth: 1920,
  maxFrameHeight: 1080,
});

const stream = await navigator.mediaDevices.getDisplayMedia({ video: true });

// Start capturing
capturer.start(stream, (dataUrl) => {
  console.log('Captured frame:', dataUrl.length, 'chars');
  // Send to your own pipeline
});

// Capture a single frame on demand
const singleFrame = capturer.captureNow();

// Stop capturing
capturer.stop();

FrameCapturer Methods

| Method | Returns | Description |
|---|---|---|
| start(stream, onFrame) | void | Start capturing frames from a MediaStream. |
| stop() | void | Stop capturing and clean up resources. |
| captureNow() | string \| null | Capture a single frame immediately. Returns data-URL or null. |
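The example above logs dataUrl.length, which counts characters of base64 text, not bytes of JPEG. If you are tuning frameQuality or the maximum dimensions against a bandwidth budget, the decoded size is the more useful number. A sketch, assuming the frame is a standard padded base64 data URL; dataUrlByteSize is a hypothetical helper, not part of the SDK.

```typescript
// Computes the decoded payload size of a base64 data URL without decoding it.
function dataUrlByteSize(dataUrl: string): number {
  const comma = dataUrl.indexOf(',');
  if (comma === -1) throw new Error('Not a data URL');
  const base64 = dataUrl.slice(comma + 1);
  // Every 4 base64 characters encode 3 bytes; "=" padding marks missing bytes
  const padding = base64.endsWith('==') ? 2 : base64.endsWith('=') ? 1 : 0;
  return Math.floor((base64.length * 3) / 4) - padding;
}
```

Inside the onFrame callback this could be used as dataUrlByteSize(dataUrl) / 1024 to log kilobytes per frame.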

UltronAudioPlayer

Priority-queue audio player that prevents overlap. Supports Gemini and ElevenLabs TTS.

import { UltronAudioPlayer } from 'ultron-live-sdk';

const player = new UltronAudioPlayer({
  voiceProvider: 'gemini',
  geminiVoiceName: 'Zephyr',
  skipPhrases: ['no visual change'],
});

// Set backend URL (needed for Gemini TTS endpoint)
player.backendUrl = 'https://live-api.ultronai.me';

// Speak text (auto-deduplicates, skips quiet phrases)
await player.speak('Player just scored a goal!');
await player.speak('Player just scored a goal!'); // skipped — same as last

// Reconfigure at runtime
player.configure({
  voiceProvider: 'elevenlabs',
  elevenLabsApiKey: 'your_key',
  elevenLabsVoiceId: '21m00Tcm4TlvDq8ikWAM',
});

// Stop all audio and clear queue
player.stop();

UltronAudioPlayer Methods

| Method | Returns | Description |
|---|---|---|
| speak(text, lastSpoken?) | Promise<void> | Speak text. Deduplicates against lastSpoken. Skips skipPhrases. |
| stop() | void | Stop current audio and clear the queue. |
| configure(opts) | void | Update voice provider options at runtime. |


TypeScript Types Reference

All types are exported from the package and can be imported:

import type {
  UltronSDKConfig,
  ModelValue,
  ModelMeta,
  ModelProvider,
  VoiceProvider,
  SpeakingStyle,
  AnalyseFrameOptions,
  AnalyseFrameResult,
  StartSessionResult,
  ChatOptions,
  ChatResult,
  TtsOptions,
  GetMeResult,
  UltronUser,
  SendOtpOptions,
  VerifyOtpOptions,
  AuthResult,
  StartScreenShareOptions,
  StartVideoStreamOptions,
  StreamerCallbacks,
  StreamerOptions,
  AudioPlayerOptions,
  WebRTCConfig,
  WebRTCCallbacks,
  WebRTCChatOptions,
  WebRTCFrameResult,
  WebRTCFrameError,
  WebRTCChatResult,
  WebRTCChatError,
} from 'ultron-live-sdk';

Core Types

type ModelProvider = 'openai' | 'gemini';

type ModelValue =
  | 'gpt-4o-realtime'
  | 'gpt-4o'
  | 'gpt-4o-mini'
  | 'gemini-3.1-pro-preview'
  | 'gemini-3-flash-preview'
  | 'gemini-3.1-flash-lite-preview'
  | 'gemini-2.5-pro'
  | 'gemini-2.5-flash'
  | 'gemini-2.5-flash-lite'
  | 'gemini-2.5-flash-image'
  | 'gemini-2.0-flash'
  | 'gemini-2.0-flash-lite'
  | (string & {});  // allows custom model strings

type VoiceProvider = 'gemini' | 'elevenlabs';

type SpeakingStyle = 'Dramatic' | 'Neutral' | 'Energetic';

Config Types

interface UltronSDKConfig {
  apiKey: string;
  baseUrl?: string;
  model?: ModelValue;
  historyWindow?: number;
  frameInterval?: number;
  frameQuality?: number;
  maxFrameWidth?: number;
  maxFrameHeight?: number;
}

interface ModelMeta {
  value: ModelValue;
  label: string;
  provider: ModelProvider;
  badge: string;
}

API Request/Response Types

interface AnalyseFrameOptions {
  image: string;
  timestamp?: number;
  model?: ModelValue;
  history?: string[];
  sessionId?: string | null;
  enableAudio?: boolean;
  voiceName?: string;
}

interface AnalyseFrameResult {
  success: boolean;
  response: string;
  latency: number;
  receivedAt: string;
  model: string;
  provider: string;
  usage: { inputTokens: number; outputTokens: number };
  creditsUsed: number;
  creditsRemaining: number;
  audio?: string;
  audioMimeType?: string;
}

interface StartSessionResult {
  success: boolean;
  sessionId: string;
}

interface ChatOptions {
  sessionId?: string | null;
  message: string;
  frames?: string[];
  enableAudio?: boolean;
  voiceName?: string;
}

interface ChatResult {
  success: boolean;
  response: string;
  creditsRemaining: number;
  latency: number;
  usage: { inputTokens: number; outputTokens: number };
  usedFrameUrls?: string[];
  audio?: string;
  audioMimeType?: string;
}

interface TtsOptions {
  text: string;
  voiceName?: string;
  apiKey?: string;
}
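When calling analyseFrame yourself rather than through the streamer, you are responsible for the history array. One way to keep it bounded is a rolling window. This sketch assumes that history holds previous commentary strings and that historyWindow is a count of entries to retain; neither is spelled out in this reference. pushHistory is a hypothetical helper, not an SDK export.

```typescript
// Returns a new array with the entry appended, keeping only the newest N items.
function pushHistory(
  history: readonly string[],
  entry: string,
  windowSize: number,
): string[] {
  if (windowSize <= 0) return [];
  return [...history, entry].slice(-windowSize);
}

// Typical loop: feed each response back in as context for the next frame
let history: string[] = [];
history = pushHistory(history, 'User opened the editor', 3);
history = pushHistory(history, 'User is typing a function', 3);
console.log(history);
```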

Auth Types

interface SendOtpOptions {
  email: string;
}

interface VerifyOtpOptions {
  email: string;
  otp: string;
  timezone?: string;
}

interface AuthResult {
  success: boolean;
  data: {
    token: string;
    user: UltronUser;
  };
}

interface UltronUser {
  id: string;
  email: string;
  apiKey: string;
  credits: number;
  isVerified: boolean;
  lastLoginAt: string;
  createdAt: string;
  timezone: string | null;
  plan: 'free' | 'pro' | 'enterprise';
  planStartedAt: string | null;
  subscriptionStatus: string | null;
  subscriptionCurrentPeriodEnd: string | null;
  cancelAtPeriodEnd: boolean;
}

interface GetMeResult {
  success: boolean;
  data: UltronUser;
}

Streaming Types

interface StreamerCallbacks {
  onTranscript?: (text: string) => void;
  onCreditsUpdate?: (remaining: number) => void;
  onLatency?: (latencyMs: number) => void;
  onStop?: () => void;
  onCreditsExhausted?: () => void;
  onError?: (err: Error) => void;
  onFrame?: (frameDataUrl: string) => void;
}

interface StreamerOptions extends StreamerCallbacks {
  model?: ModelValue;
  frameInterval?: number;
  frameQuality?: number;
  maxFrameWidth?: number;
  maxFrameHeight?: number;
  historyWindow?: number;
}

interface StartScreenShareOptions extends StreamerOptions {
  stream?: MediaStream;
}

interface StartVideoStreamOptions extends StreamerOptions {
  stream: MediaStream; // required
}

interface AudioPlayerOptions {
  voiceProvider?: VoiceProvider;
  elevenLabsApiKey?: string;
  elevenLabsVoiceId?: string;
  geminiVoiceName?: string;
  skipPhrases?: string[];
}

WebRTC Types

interface WebRTCConfig {
  backendUrl?: string;
  apiKey?: string;
  token?: string;
  connectionTimeout?: number;
}

interface WebRTCFrameResult {
  type: 'frame:result';
  response: string;
  latency: number;
  model: string;
  usage: { inputTokens: number; outputTokens: number };
  creditsUsed: number;
  creditsRemaining: number;
  sessionId: string | null;
}

interface WebRTCFrameError {
  type: 'frame:error';
  error: string;
  credits?: number;
}

interface WebRTCChatResult {
  type: 'chat:result';
  response: string;
  creditsRemaining: number;
  latency: number;
  usage: { inputTokens: number; outputTokens: number };
  usedFrameUrls?: string[];
  audio?: string;
  audioMimeType?: string;
}

interface WebRTCChatError {
  type: 'chat:error';
  error: string;
}

interface WebRTCChatOptions {
  sessionId?: string;
  frames?: string[];
  enableAudio?: boolean;
  voiceName?: string;
}

interface WebRTCCallbacks {
  onFrameResult?: (result: WebRTCFrameResult) => void;
  onFrameError?: (error: WebRTCFrameError) => void;
  onChatResult?: (result: WebRTCChatResult) => void;
  onChatError?: (error: WebRTCChatError) => void;
  onSessionStarted?: (sessionId: string) => void;
  onSessionStopped?: () => void;
  onConnected?: () => void;
  onDisconnected?: (reason: string) => void;
  onError?: (error: string) => void;
  onConnectionStateChange?: (state: RTCPeerConnectionState) => void;
}
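Each WebRTC message type carries a literal type field, which makes the result and error shapes a discriminated union that TypeScript can narrow in a switch. The sketch below uses trimmed local copies of two of the interfaces so that it is self-contained; FrameResultSketch, FrameErrorSketch, and describeFrameMessage are hypothetical names, and in real code you would import WebRTCFrameResult and WebRTCFrameError from the package.

```typescript
// Trimmed copies of WebRTCFrameResult / WebRTCFrameError
interface FrameResultSketch {
  type: 'frame:result';
  response: string;
  creditsRemaining: number;
}

interface FrameErrorSketch {
  type: 'frame:error';
  error: string;
  credits?: number;
}

type FrameMessage = FrameResultSketch | FrameErrorSketch;

function describeFrameMessage(msg: FrameMessage): string {
  switch (msg.type) {
    case 'frame:result':
      // Narrowed: response and creditsRemaining are available here
      return `ok: ${msg.response} (${msg.creditsRemaining} credits left)`;
    case 'frame:error':
      // Narrowed: error is available, credits may be undefined
      return msg.credits !== undefined
        ? `error: ${msg.error} (credits: ${msg.credits})`
        : `error: ${msg.error}`;
    default: {
      // Compile-time check that every variant is handled
      const unreachable: never = msg;
      return unreachable;
    }
  }
}
```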

Framework Examples

React — Full Feature Example

import { useState, useRef, useCallback } from 'react';
import { UltronLive } from 'ultron-live-sdk';

export function UltronDashboard() {
  const ultronRef = useRef(new UltronLive({
    apiKey: localStorage.getItem('ultron_token') || '',
  }));
  const ultron = ultronRef.current;
  const previewRef = useRef<HTMLVideoElement>(null);

  const [transcript, setTranscript] = useState('');
  const [credits, setCredits] = useState<number | null>(null);
  const [latency, setLatency] = useState<number | null>(null);
  const [streaming, setStreaming] = useState(false);
  const [model, setModel] = useState(ultron.config.model);

  const startStream = useCallback(async () => {
    await ultron.startScreenShare({
      model,
      onTranscript: (text) => setTranscript(text),
      onCreditsUpdate: (n) => setCredits(n),
      onLatency: (ms) => setLatency(ms),
      onCreditsExhausted: () => {
        alert('Credits exhausted');
        setStreaming(false);
      },
      onError: (err) => console.error(err),
      onStop: () => setStreaming(false),
    });
    setStreaming(true);

    // Attach live preview
    if (previewRef.current) {
      ultron.attachPreview(previewRef.current);
    }
  }, [model]);

  const stopStream = useCallback(() => {
    ultron.stop();
    setStreaming(false);
  }, []);

  return (
    <div>
      <select value={model} onChange={(e) => {
        setModel(e.target.value);
        ultron.setModel(e.target.value);
      }}>
        {ultron.models.map((m) => (
          <option key={m.value} value={m.value}>
            {m.label} ({m.badge})
          </option>
        ))}
      </select>

      <button onClick={startStream} disabled={streaming}>Start</button>
      <button onClick={stopStream} disabled={!streaming}>Stop</button>

      <button onClick={() => ultron.setAudioEnabled(true)}>Audio On</button>
      <button onClick={() => ultron.setAudioEnabled(false)}>Audio Off</button>

      {/* Live stream preview */}
      <video ref={previewRef} style={{ width: '100%', maxWidth: 640, display: streaming ? 'block' : 'none' }} />

      <p>{transcript}</p>
      {credits !== null && <p>Credits: {credits}</p>}
      {latency !== null && <p>Latency: {latency}ms</p>}
    </div>
  );
}

React — Auth Flow

import { useState } from 'react';
import { UltronLive } from 'ultron-live-sdk';

const ultron = new UltronLive({ apiKey: '' });

export function LoginForm() {
  const [email, setEmail] = useState('');
  const [otp, setOtp] = useState('');
  const [step, setStep] = useState<'email' | 'otp' | 'done'>('email');

  const sendOtp = async () => {
    await ultron.auth.sendOtp({ email });
    setStep('otp');
  };

  const verifyOtp = async () => {
    const result = await ultron.auth.verifyOtp({
      email,
      otp,
      timezone: Intl.DateTimeFormat().resolvedOptions().timeZone,
    });
    ultron.setApiKey(result.data.token);
    localStorage.setItem('ultron_token', result.data.token);
    setStep('done');
  };

  if (step === 'done') return <p>Logged in!</p>;

  return (
    <div>
      {step === 'email' && (
        <>
          <input value={email} onChange={(e) => setEmail(e.target.value)} placeholder="Email" />
          <button onClick={sendOtp}>Send OTP</button>
        </>
      )}
      {step === 'otp' && (
        <>
          <input value={otp} onChange={(e) => setOtp(e.target.value)} placeholder="OTP" />
          <button onClick={verifyOtp}>Verify</button>
        </>
      )}
    </div>
  );
}

React — Chat Panel

import { useState } from 'react';
import { UltronLive } from 'ultron-live-sdk';

export function ChatPanel({ ultron }: { ultron: UltronLive }) {
  const [message, setMessage] = useState('');
  const [reply, setReply] = useState('');
  const [loading, setLoading] = useState(false);

  const send = async () => {
    setLoading(true);
    try {
      const result = await ultron.chat({
        sessionId: ultron.sessionId,
        message,
      });
      setReply(result.response);
      setMessage('');
    } finally {
      // Re-enable the Send button even if the request throws
      setLoading(false);
    }
  };

  return (
    <div>
      <input value={message} onChange={(e) => setMessage(e.target.value)} placeholder="Ask about the session..." />
      <button onClick={send} disabled={loading}>Send</button>
      {reply && <p>{reply}</p>}
    </div>
  );
}

Vanilla JS — Complete Example

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Ultron Live Demo</title>
</head>
<body>
  <h1>Ultron Live</h1>

  <div>
    <select id="model-select"></select>
    <button id="start-btn">Start Screen Share</button>
    <button id="stop-btn" disabled>Stop</button>
    <label><input type="checkbox" id="audio-toggle"> Audio</label>
  </div>

  <div>
    <p id="transcript">Waiting...</p>
    <p id="credits"></p>
    <p id="latency"></p>
  </div>

  <!-- Live stream preview -->
  <video id="preview" style="width:100%;max-width:640px;display:none" muted></video>

  <div>
    <input id="chat-input" placeholder="Ask about the session..." />
    <button id="chat-btn">Chat</button>
    <p id="chat-reply"></p>
  </div>

  <script type="module">
    import { UltronLive } from 'https://cdn.skypack.dev/ultron-live-sdk';

    const ultron = new UltronLive({ apiKey: 'ulk_your_api_key' });

    // Populate model selector
    const select = document.getElementById('model-select');
    ultron.models.forEach((m) => {
      const opt = document.createElement('option');
      opt.value = m.value;
      opt.textContent = `${m.label} (${m.badge})`;
      select.appendChild(opt);
    });
    select.onchange = () => ultron.setModel(select.value);

    // Start
    document.getElementById('start-btn').onclick = async () => {
      await ultron.startScreenShare({
        model: select.value,
        onTranscript: (t) => document.getElementById('transcript').textContent = t,
        onCreditsUpdate: (n) => document.getElementById('credits').textContent = `Credits: ${n}`,
        onLatency: (ms) => document.getElementById('latency').textContent = `Latency: ${ms}ms`,
        onCreditsExhausted: () => alert('Credits exhausted!'),
        onStop: () => {
          document.getElementById('start-btn').disabled = false;
          document.getElementById('stop-btn').disabled = true;
          document.getElementById('preview').style.display = 'none';
        },
      });

      // Attach live preview
      const preview = document.getElementById('preview');
      ultron.attachPreview(preview);
      preview.style.display = 'block';

      document.getElementById('start-btn').disabled = true;
      document.getElementById('stop-btn').disabled = false;
    };

    // Stop
    document.getElementById('stop-btn').onclick = () => {
      ultron.stop();
      document.getElementById('start-btn').disabled = false;
      document.getElementById('stop-btn').disabled = true;
    };

    // Audio toggle
    document.getElementById('audio-toggle').onchange = (e) => {
      // Configure the voice first, then toggle narration
      if (e.target.checked) {
        ultron.configureAudio({ voiceProvider: 'gemini', geminiVoiceName: 'Zephyr' });
      }
      ultron.setAudioEnabled(e.target.checked);
    };

    // Chat
    document.getElementById('chat-btn').onclick = async () => {
      const input = document.getElementById('chat-input');
      const reply = await ultron.chat({
        sessionId: ultron.sessionId,
        message: input.value,
      });
      document.getElementById('chat-reply').textContent = reply.response;
      input.value = '';
    };
  </script>
</body>
</html>

Node.js — Server-Side Usage

The SDK's HTTP client can be used server-side for frame analysis and chat (streaming/audio features require a browser).

import { UltronHttpClient } from 'ultron-live-sdk';
import fs from 'fs';

const client = new UltronHttpClient('https://live-api.ultronai.me', 'ulk_your_key');

// Analyze an image file
const imageBuffer = fs.readFileSync('screenshot.jpg');
const base64 = imageBuffer.toString('base64');
const dataUrl = `data:image/jpeg;base64,${base64}`;

const result = await client.analyseFrame({
  image: dataUrl,
  model: 'gemini-2.5-flash',
});

console.log(result.response);

// Chat
const chat = await client.chat({
  message: 'Describe this image',
  frames: [dataUrl],
});

console.log(chat.resp