gemini-live-react
React hook for real-time bidirectional voice communication with Google Gemini Live API.
Features
- Real-time voice streaming - Talk to Gemini and hear responses instantly
- Automatic audio handling - Mic capture, playback, and resampling are all handled for you
- Screen sharing support - Optional video frame streaming for visual context
- Live transcription - Both user and AI speech transcribed in real-time
- Auto-reconnection - Reconnects with exponential backoff when the connection drops
- Speaker mute control - Mute AI audio output independently from microphone
- Connection metrics - Track audio chunks, messages, reconnects, and uptime
- TypeScript - Full type definitions included
Installation
npm install gemini-live-react
Quick Start
import { useGeminiLive } from 'gemini-live-react';

function VoiceChat() {
  const {
    connect,
    disconnect,
    transcripts,
    isConnected,
    isSpeaking,
  } = useGeminiLive({
    proxyUrl: 'wss://your-project.supabase.co/functions/v1/gemini-live-proxy',
  });

  return (
    <div>
      <button onClick={() => (isConnected ? disconnect() : connect())}>
        {isConnected ? 'Disconnect' : 'Connect'}
      </button>
      {isSpeaking && <div>AI is speaking...</div>}
      <div>
        {transcripts.map((t) => (
          <div key={t.id}>
            <strong>{t.role}:</strong> {t.text}
          </div>
        ))}
      </div>
    </div>
  );
}
API Reference
useGeminiLive(options)
Options
| Option | Type | Required | Default | Description |
|--------|------|----------|---------|-------------|
| proxyUrl | string | Yes | - | WebSocket URL of your proxy server |
| sessionId | string | No | - | Passed to proxy as query param |
| onTranscript | (t: Transcript) => void | No | - | Called when transcript is finalized |
| onError | (error: string) => void | No | - | Called on errors |
| onConnectionChange | (connected: boolean) => void | No | - | Called when connection state changes |
| minBufferMs | number | No | 200 | Audio buffer before playback (ms) |
| transcriptDebounceMs | number | No | 1500 | Debounce time for transcripts (ms) |
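All options except proxyUrl are optional. A sketch with every callback wired up (placeholder URL and handlers):

import { useGeminiLive } from 'gemini-live-react';

// (call inside a React component)
const live = useGeminiLive({
  proxyUrl: 'wss://your-proxy.com',
  sessionId: 'user-123',              // forwarded to the proxy as a query param
  onTranscript: (t) => console.log(`${t.role}: ${t.text}`),
  onError: (message) => console.error('Gemini Live error:', message),
  onConnectionChange: (connected) => console.log('connected:', connected),
  minBufferMs: 300,                   // buffer slightly more for smoother playback
  transcriptDebounceMs: 1000,
});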
Return Value
| Property | Type | Description |
|----------|------|-------------|
| isConnected | boolean | Currently connected to proxy |
| isConnecting | boolean | Attempting to connect |
| connectionState | ConnectionState | Granular connection state |
| isSpeaking | boolean | AI audio is playing |
| isMuted | boolean | Microphone is muted |
| isSpeakerMuted | boolean | AI audio output is muted |
| error | string \| null | Current error message |
| transcripts | Transcript[] | All transcript entries |
| connect | (video?: HTMLVideoElement) => Promise<void> | Connect to proxy |
| disconnect | () => void | Disconnect and cleanup |
| retry | () => Promise<void> | Retry connection after error |
| sendText | (text: string) => void | Send text message |
| setMuted | (muted: boolean) => void | Set microphone mute state |
| setSpeakerMuted | (muted: boolean) => void | Set speaker mute state |
| clearTranscripts | () => void | Clear transcript history |
| getMetrics | () => ConnectionMetrics | Get connection quality metrics |
| startRecording | () => void | Start session recording |
| stopRecording | () => SessionRecording | Stop recording and get data |
| isRecording | boolean | Currently recording |
| exportRecording | () => Blob | Export recording as JSON blob |
| registerWorkflow | (workflow: Workflow) => void | Register a workflow |
| executeWorkflow | (id: string, vars?) => Promise<WorkflowExecution> | Run a workflow |
| pauseWorkflow | () => void | Pause current workflow |
| resumeWorkflow | () => void | Resume paused workflow |
| cancelWorkflow | () => void | Cancel current workflow |
| workflowExecution | WorkflowExecution \| null | Current workflow state |
| detectElements | () => Promise<DetectionResult> | Detect interactive elements |
| clickDetectedElement | (id: string) => Promise<BrowserControlResult> | Click detected element |
| detectedElements | DetectedElement[] | Latest detection results |
| isDetecting | boolean | Detection in progress |
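A sketch combining a few of these controls (mute toggle, the sendText side channel, and metrics; placeholder proxy URL):

import { useGeminiLive } from 'gemini-live-react';

function Controls() {
  const { isMuted, setMuted, sendText, getMetrics } = useGeminiLive({
    proxyUrl: 'wss://your-proxy.com',
  });

  return (
    <div>
      <button onClick={() => setMuted(!isMuted)}>
        {isMuted ? 'Unmute mic' : 'Mute mic'}
      </button>
      {/* Text can be sent alongside voice in the same session */}
      <button onClick={() => sendText('Summarize what you just heard.')}>
        Ask for a summary
      </button>
      <button onClick={() => console.log(getMetrics())}>
        Log connection metrics
      </button>
    </div>
  );
}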
Types
interface Transcript {
  id: string;
  role: 'user' | 'assistant';
  text: string;
  timestamp: Date;
}
type ConnectionState =
  | 'idle'
  | 'connecting'
  | 'connected'
  | 'reconnecting'
  | 'error'
  | 'disconnected';
interface ConnectionMetrics {
  audioChunksReceived: number;
  messagesReceived: number;
  reconnectCount: number;
  lastConnectedAt: number | null;
  totalConnectedTime: number;
}
// Session Recording
interface SessionRecording {
  id: string;
  startTime: number;
  endTime?: number;
  events: SessionEvent[];
}

interface SessionEvent {
  type: SessionEventType; // 'transcript' | 'audio_chunk' | 'tool_call' | ...
  timestamp: number;
  data: unknown;
}
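A sketch of recording and exporting a session (this assumes exportRecording can still return the data after stopRecording; placeholder proxy URL):

import { useGeminiLive } from 'gemini-live-react';

function RecordableChat() {
  const { startRecording, stopRecording, exportRecording, isRecording } =
    useGeminiLive({ proxyUrl: 'wss://your-proxy.com' });

  const finish = () => {
    const recording = stopRecording();      // SessionRecording with ordered events
    console.log(`Captured ${recording.events.length} events`);
    const url = URL.createObjectURL(exportRecording()); // JSON blob
    // e.g. point an <a download="session.json"> at this URL
  };

  return (
    <button onClick={() => (isRecording ? finish() : startRecording())}>
      {isRecording ? 'Stop & export' : 'Record session'}
    </button>
  );
}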
// Workflow Builder
interface Workflow {
  id: string;
  name: string;
  entryPoint: string;
  steps: Record<string, WorkflowStep>;
}

interface WorkflowStep {
  id: string;
  type: 'browser_control' | 'wait' | 'condition' | 'ai_prompt';
  action?: BrowserControlAction;
  args?: Record<string, unknown>;
  waitMs?: number;
  condition?: { selector: string; check: 'exists' | 'visible' | 'contains_text'; value?: string };
  prompt?: string;
  next?: string | string[];
  onError?: string;
}
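A sketch of a small workflow built from these shapes. The 'click' action value is an assumption here, since the BrowserControlAction union isn't shown above:

const loginFlow: Workflow = {
  id: 'login',
  name: 'Log in',
  entryPoint: 'open-form',
  steps: {
    'open-form': {
      id: 'open-form',
      type: 'browser_control',
      action: 'click',                      // assumed BrowserControlAction value
      args: { selector: '#login-button' },
      next: 'wait-for-form',
    },
    'wait-for-form': {
      id: 'wait-for-form',
      type: 'condition',
      condition: { selector: 'form.login', check: 'visible' },
      next: 'announce',
      onError: 'open-form',                 // fall back to re-opening the form
    },
    announce: {
      id: 'announce',
      type: 'ai_prompt',
      prompt: 'Tell the user the login form is ready.',
    },
  },
};

// registerWorkflow and executeWorkflow come from the useGeminiLive() return value.
registerWorkflow(loginFlow);
const execution = await executeWorkflow('login');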
// Smart Element Detection
interface DetectedElement {
  id: string;
  bounds: { x: number; y: number; width: number; height: number };
  type: 'button' | 'input' | 'link' | 'text' | 'image' | 'unknown';
  text?: string;
  selector?: string;
  confidence: number;
}
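A sketch of the detection flow using these names (placeholder proxy URL; the 0.7 confidence threshold is arbitrary):

import { useGeminiLive } from 'gemini-live-react';

function ElementPicker() {
  const { detectElements, clickDetectedElement, detectedElements, isDetecting } =
    useGeminiLive({ proxyUrl: 'wss://your-proxy.com' });

  return (
    <div>
      <button onClick={() => detectElements()} disabled={isDetecting}>
        {isDetecting ? 'Detecting...' : 'Detect elements'}
      </button>
      {detectedElements
        .filter((el) => el.type === 'button' && el.confidence > 0.7)
        .map((el) => (
          <button key={el.id} onClick={() => clickDetectedElement(el.id)}>
            Click "{el.text ?? el.id}"
          </button>
        ))}
    </div>
  );
}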
Screen Recording
For apps that need screen/camera capture with recordings and screenshots, use the useScreenRecording hook:
import { useGeminiLive, useScreenRecording, shouldUseCameraMode } from 'gemini-live-react';

function ScreenShareApp() {
  const {
    state,
    startRecording,
    stopRecording,
    getVideoElement,
  } = useScreenRecording({
    screenshotInterval: 2000, // Capture every 2 seconds
    maxScreenshots: 30,       // Keep last 30 screenshots
  });

  const {
    connect,
    disconnect,
    transcripts,
    isConnected,
  } = useGeminiLive({ proxyUrl: 'wss://your-proxy.com' });

  const handleStart = async () => {
    // Use camera on mobile devices that don't support screen capture
    const useCameraMode = shouldUseCameraMode();
    await startRecording(useCameraMode);

    // Get the video element and connect to Gemini
    const videoEl = getVideoElement();
    if (videoEl) {
      await connect(videoEl);
    }
  };

  const handleStop = async () => {
    disconnect();
    const result = await stopRecording();
    // result.videoBlob - recorded video
    // result.audioBlob - separate microphone audio
    // result.screenshots - array of timestamped screenshots
  };

  return (
    <div>
      <button onClick={state.isRecording ? handleStop : handleStart}>
        {state.isRecording ? 'Stop' : 'Start'} Recording
      </button>
      <div>Duration: {state.duration}s</div>
    </div>
  );
}
useScreenRecording(options?)
Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| screenshotInterval | number | 2000 | Interval between screenshot captures (ms) |
| maxScreenshots | number | 30 | Maximum screenshots to keep (rolling window) |
| screenshotQuality | number | 0.8 | JPEG quality for screenshots (0-1) |
| audioConstraints | MediaTrackConstraints | - | Custom mic audio constraints |
Return Value
| Property | Type | Description |
|----------|------|-------------|
| state | RecordingState | { isRecording, isPaused, duration, error } |
| startRecording | (useCameraMode?: boolean) => Promise<void> | Start recording |
| stopRecording | () => Promise<RecordingResult \| null> | Stop and get results |
| pauseRecording | () => void | Pause recording |
| resumeRecording | () => void | Resume recording |
| getVideoElement | () => HTMLVideoElement \| null | Get video element for connect() |
| getStream | () => MediaStream \| null | Get media stream |
| getLatestScreenshot | () => string \| null | Get latest auto-captured screenshot |
| captureScreenshotNow | () => string \| null | Capture screenshot immediately |
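A sketch of on-demand capture; the returned strings are presumably JPEG data URLs, given the screenshotQuality option:

import { useState } from 'react';
import { useScreenRecording } from 'gemini-live-react';

function SnapshotButton() {
  const { captureScreenshotNow } = useScreenRecording();
  const [shot, setShot] = useState<string | null>(null);

  return (
    <div>
      <button onClick={() => setShot(captureScreenshotNow())}>
        Capture screenshot
      </button>
      {shot && <img src={shot} alt="Latest capture" />}
    </div>
  );
}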
Mobile/iOS Support
Screen capture isn't available on iOS and some mobile browsers. Use the shouldUseCameraMode() helper to fall back to camera capture:
import { shouldUseCameraMode } from 'gemini-live-react';

// Returns true on mobile devices without screen capture support
if (shouldUseCameraMode()) {
  await startRecording(true); // Uses rear camera
} else {
  await startRecording(); // Uses screen capture
}
Manual Screen Sharing
For simple screen sharing without recording features, you can set up the stream manually:
import { useRef } from 'react';
import { useGeminiLive } from 'gemini-live-react';

function ScreenShareChat() {
  const videoRef = useRef<HTMLVideoElement>(null);
  const { connect, disconnect, isConnected } = useGeminiLive({
    proxyUrl: 'wss://your-proxy.com',
  });

  const startWithScreenShare = async () => {
    // Get screen capture stream
    const stream = await navigator.mediaDevices.getDisplayMedia({
      video: true,
      audio: false,
    });

    // Attach to video element
    if (videoRef.current) {
      videoRef.current.srcObject = stream;
      await videoRef.current.play();
    }

    // Connect with video element - frames will be sent at 1 FPS
    await connect(videoRef.current!);
  };

  return (
    <div>
      <video ref={videoRef} style={{ display: 'none' }} />
      <button onClick={startWithScreenShare}>
        Start with Screen Share
      </button>
    </div>
  );
}
How Audio Works
This library handles the complex audio format juggling that Gemini Live requires:
Input (Microphone → Gemini)
- Captures audio at 16kHz using getUserMedia
- Uses AudioWorklet for low-latency processing
- Converts Float32 → Int16 PCM with proper clamping
- Base64 encodes and sends via WebSocket
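The Float32 → Int16 step is the standard clamp-and-scale conversion. A sketch of the technique (not the library's exact internals):

// Convert Float32 samples in [-1, 1] to little-endian Int16 PCM, clamping overshoot.
function floatTo16BitPCM(float32: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(float32.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i]));              // clamp
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);  // true = little-endian
  }
  return buffer;
}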
Output (Gemini → Speakers)
- Receives 24kHz PCM16 audio (little-endian)
- Decodes base64 and parses with DataView (endianness matters!)
- Resamples to browser's native sample rate (44.1kHz or 48kHz)
- Buffers 200ms minimum before starting playback
- Chains audio buffers for seamless playback
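The endianness-safe decode looks roughly like this sketch (again, not the library's exact code):

// Decode base64 PCM16 (little-endian) into Float32 samples for the Web Audio API.
function decodePCM16(base64: string): Float32Array {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);

  const view = new DataView(bytes.buffer);              // byte order handled explicitly
  const samples = new Float32Array(bytes.length / 2);
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true) / 0x8000;   // true = little-endian
  }
  return samples;
}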
Why This Matters
Most tutorials get audio wrong because:
- They use Int16Array directly (ignores endianness)
- They force AudioContext to 24kHz (browsers often ignore this)
- They don't buffer enough (causes choppy audio)
- They don't chain playback (causes gaps)
This library handles all of this correctly.
Proxy Setup
You need a WebSocket proxy to keep your Google AI API key secure. See:
- Supabase Edge Functions proxy
- More platforms coming soon
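If you want to roll your own, the proxy is just a WebSocket relay that keeps the API key server-side. A minimal Node sketch using the ws package; the upstream URL, port, and env var name are assumptions, so verify against Google's current docs:

import { WebSocket, WebSocketServer } from 'ws';

// Assumed Gemini Live endpoint; check Google's documentation for the current URL.
const GEMINI_URL =
  'wss://generativelanguage.googleapis.com/ws/' +
  'google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent' +
  `?key=${process.env.GOOGLE_AI_API_KEY}`;

const server = new WebSocketServer({ port: 8080 });

server.on('connection', (client) => {
  const upstream = new WebSocket(GEMINI_URL);
  const pending: Array<[Buffer, boolean]> = [];

  // Hold client frames until the upstream socket opens, then flush them in order.
  upstream.on('open', () => {
    for (const [data, isBinary] of pending) upstream.send(data, { binary: isBinary });
    pending.length = 0;
  });

  client.on('message', (data: Buffer, isBinary: boolean) => {
    if (upstream.readyState === WebSocket.OPEN) upstream.send(data, { binary: isBinary });
    else pending.push([data, isBinary]);
  });

  // Relay upstream frames back to the browser, preserving text/binary frame type.
  upstream.on('message', (data: Buffer, isBinary: boolean) =>
    client.send(data, { binary: isBinary })
  );

  client.on('close', () => upstream.close());
  upstream.on('close', () => client.close());
});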
License
MIT
