@omarimai/agents-plugin-google

v1.1.13

Published

9 months ago

Support for Gemini, Gemini Live, Cloud Speech-to-Text, and Cloud Text-to-Speech.

omarimai

Google AI plugin for LiveKit Agents

Installation

npm install @omarimai/agents-plugin-google

Usage

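import { llm, multimodal } from '@livekit/agents';
import * as google from '@omarimai/agents-plugin-google';

// Realtime model backed by the Gemini Live API
const model = new google.realtime.RealtimeModel({
  apiKey: process.env.GOOGLE_API_KEY,
  voice: 'Puck',
});

// Functions the model may call; empty here, add entries as needed
const fncCtx: llm.FunctionContext = {};

const agent = new multimodal.MultimodalAgent({
  model,
  fncCtx,
});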

Configuration

Provide your Google API key in one of two ways:

Set the GOOGLE_API_KEY environment variable, or
Pass the apiKey parameter to the constructor

For Vertex AI, also set:

The GOOGLE_CLOUD_PROJECT environment variable
GOOGLE_APPLICATION_CREDENTIALS, pointing to your service account key file
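
The two ways of supplying the key can be pictured with a small lookup sketch. This is only an illustration: `resolveApiKey` is a hypothetical helper, not part of the plugin, and the precedence shown (constructor option first, then the environment variable) is an assumption.

```typescript
// Hypothetical helper, not part of the plugin.
// Assumed precedence: explicit apiKey option first, then GOOGLE_API_KEY.
type Env = Record<string, string | undefined>;

function resolveApiKey(explicit: string | undefined, env: Env): string {
  const key = explicit ?? env['GOOGLE_API_KEY'];
  if (!key) {
    throw new Error('Set GOOGLE_API_KEY or pass apiKey to the constructor');
  }
  return key;
}

// The constructor option wins over the environment variable.
const fromOption = resolveApiKey('key-from-option', { GOOGLE_API_KEY: 'key-from-env' });
// With no option given, the environment variable is used.
const fromEnv = resolveApiKey(undefined, { GOOGLE_API_KEY: 'key-from-env' });
```

In a real application the second argument would typically be `process.env`.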

Build and Test

Build the Project

pnpm build

Test the Integration

Create a simple test file to verify it works with MultimodalAgent:

// test.ts
import { multimodal, llm } from '@livekit/agents';
import * as google from './src/index.js';

const model = new google.realtime.RealtimeModel({
  apiKey: 'your-api-key',
  voice: 'Puck',
});

// FunctionContext is a plain object type in @livekit/agents, not a class
const fncCtx: llm.FunctionContext = {};

const agent = new multimodal.MultimodalAgent({
  model,
  fncCtx,
});

console.log('Google plugin integrated successfully!');

Next Steps

Implement Google Live API Connection: Research Google's Live API documentation and implement the actual WebSocket connection
Add Authentication: Implement proper Google Cloud authentication
Complete Audio Processing: Finish the audio streaming implementation
Add Function Calling: Implement function calling support in the realtime session
Add Error Handling: Implement robust error handling and reconnection logic
Add Tests: Create comprehensive tests
Add LLM/STT/TTS: Complete the standard service implementations

With these steps complete, the plugin structure should integrate with the existing MultimodalAgent.

Google Gemini Live API TypeScript Plugin

A TypeScript implementation of the Google Gemini Live API for real-time audio conversations with advanced features including function calling, conversation management, and turn detection.

Features

✅ Real-time audio streaming with Gemini Live API
✅ Function calling and tool integration
✅ Advanced conversation management with session.conversation.item.create()
✅ Response generation control with session.response.create()
✅ Server-side Voice Activity Detection (VAD) with adaptive thresholds
✅ Multi-feature speech detection (audio level, energy, zero crossing rate)
✅ Event-driven architecture with comprehensive event emission
✅ Session management with recovery and error handling

Installation

npm install

Environment Setup

Set your Google API key:

export GOOGLE_API_KEY="your-api-key-here"

Basic Usage

import { llm } from '@livekit/agents';
import { RealtimeModel } from './src/realtime/realtime_model.js';

// Create a realtime model with advanced features
const model = new RealtimeModel({
  model: 'gemini-2.0-flash-live-001',
  voice: 'Puck',
  instructions: 'You are a helpful AI assistant.',
  turnDetection: {
    type: 'server_vad',
    threshold: 0.1,
    silence_duration_ms: 1000
  }
});

// Create a session (ChatContext comes from the llm namespace of @livekit/agents)
const session = model.session({
  fncCtx: {},
  chatCtx: new llm.ChatContext()
});

// Advanced conversation management
session.conversation.item.create({
  role: 'user',
  text: 'Hello, how are you?'
});

// Start response generation
session.response.create();

// Enhanced conversation management
const items = session.conversation.item.list();
console.log('Conversation items:', items);

// Update a conversation item
session.conversation.item.update('msg_1', {
  content: 'Updated message content'
});

// Delete a conversation item
session.conversation.item.delete('msg_1');

// Clear all conversation items
session.conversation.item.clear();

Advanced Turn Detection

The plugin includes sophisticated turn detection with multiple features:

const model = new RealtimeModel({
  turnDetection: {
    type: 'server_vad',
    threshold: 0.1,           // Audio level threshold
    silence_duration_ms: 1000, // Silence duration before turn end
    prefix_padding_ms: 200     // Padding before speech start
  }
});

// Listen for turn detection events
session.on('turn_detected', (event) => {
  console.log('Turn detected:', event);
  // event.type: 'silence_threshold'
  // event.duration: silence duration in ms
  // event.timestamp: when the turn was detected
});

session.on('input_speech_started', (event) => {
  console.log('Speech started:', event);
  // event.audioLevel: current audio level
  // event.energyLevel: current energy level
  // event.threshold: adaptive threshold used
});
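
To make the `threshold` and `silence_duration_ms` options more concrete, here is a minimal sketch of silence-based turn detection. It is a simplified illustration, not the plugin's implementation: it uses only the RMS audio level with a fixed threshold, and `SilenceTurnDetector`, `rmsLevel`, and the option names are hypothetical.

```typescript
// Hypothetical sketch; the plugin's detector may work differently.
interface TurnDetectionOptions {
  threshold: number;         // normalized RMS level treated as speech (0..1)
  silenceDurationMs: number; // silence required before the turn ends
}

// Root-mean-square level of 16-bit PCM samples, normalized to 0..1.
function rmsLevel(samples: Int16Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  samples.forEach((s) => {
    const v = s / 32768;
    sumSquares += v * v;
  });
  return Math.sqrt(sumSquares / samples.length);
}

class SilenceTurnDetector {
  private readonly opts: TurnDetectionOptions;
  private speaking = false;
  private silentMs = 0;

  constructor(opts: TurnDetectionOptions) {
    this.opts = opts;
  }

  // Feed one mono frame; reports when speech starts and when the turn ends.
  push(samples: Int16Array, sampleRate: number): 'speech_started' | 'turn_detected' | null {
    const frameMs = (samples.length / sampleRate) * 1000;
    if (rmsLevel(samples) >= this.opts.threshold) {
      this.silentMs = 0;
      if (!this.speaking) {
        this.speaking = true;
        return 'speech_started';
      }
      return null;
    }
    if (!this.speaking) return null; // silence before any speech
    this.silentMs += frameMs;
    if (this.silentMs >= this.opts.silenceDurationMs) {
      this.speaking = false;
      this.silentMs = 0;
      return 'turn_detected';
    }
    return null;
  }
}

// Usage: 500 ms frames at 16 kHz, one loud frame followed by silence.
const detector = new SilenceTurnDetector({ threshold: 0.1, silenceDurationMs: 1000 });
const loudFrame = new Int16Array(8000).fill(16000);
const quietFrame = new Int16Array(8000);
const detected = [loudFrame, quietFrame, quietFrame, quietFrame].map((frame) =>
  detector.push(frame, 16000),
);
console.log(detected); // ['speech_started', null, 'turn_detected', null]
```

The turn ends on the second silent frame because the accumulated silence (2 × 500 ms) reaches the 1000 ms limit.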

Function Calling

// Register a tool
session.updateTools([
  {
    name: 'get_weather',
    description: 'Get current weather for a location',
    parameters: {
      type: 'object',
      properties: {
        location: { type: 'string' }
      }
    },
    handler: async (args) => {
      const { location } = args;
      return { temperature: '72°F', condition: 'sunny' };
    }
  }
]);

// Listen for tool calls
session.on('toolCall', (toolCall) => {
  console.log('Tool called:', toolCall);
});
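
The relationship between registered tools and incoming tool calls can be sketched as a lookup by name followed by a handler invocation. The types and the `ToolRegistry` class below are hypothetical and only loosely modeled on the tool object shown above; how the plugin routes calls internally is not documented here.

```typescript
// Hypothetical shapes, loosely modeled on the tool object in the example above.
interface ToolDefinition {
  name: string;
  description: string;
  handler: (args: Record<string, unknown>) => Promise<unknown>;
}

interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

class ToolRegistry {
  private tools = new Map<string, ToolDefinition>();

  // Replace the full set of available tools.
  update(tools: ToolDefinition[]): void {
    this.tools.clear();
    for (const tool of tools) this.tools.set(tool.name, tool);
  }

  // Names of the currently registered tools.
  names(): string[] {
    return Array.from(this.tools.keys());
  }

  // Look up the tool by name and run its handler with the call arguments.
  async dispatch(call: ToolCall): Promise<unknown> {
    const tool = this.tools.get(call.name);
    if (!tool) throw new Error(`Unknown tool: ${call.name}`);
    return tool.handler(call.args);
  }
}

// Usage
const registry = new ToolRegistry();
registry.update([
  {
    name: 'get_weather',
    description: 'Get current weather for a location',
    handler: async (args) => ({ location: args['location'], condition: 'sunny' }),
  },
]);
registry
  .dispatch({ name: 'get_weather', args: { location: 'Paris' } })
  .then((result) => console.log('Tool result:', result));
```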

Event System

The plugin emits comprehensive events:

// Transcript events
session.on('transcript', (event) => {
  console.log('Transcript:', event.transcript, 'Final:', event.isFinal);
});

// Generation events
session.on('generation_created', (event) => {
  console.log('Generation started:', event.messageId);
});

// Error handling
session.on('error', (error) => {
  console.error('Session error:', error);
});

// Metrics
session.on('metrics_collected', (metrics) => {
  console.log('Usage metrics:', metrics);
});

Session Management

Advanced session control features:

// Interrupt current generation
session.interrupt();

// Start user activity
session.startUserActivity();

// Truncate conversation at specific message
session.truncate('msg_5', 5000); // Truncate at message 5, audio end at 5s

// Update session options
session.updateOptions({
  temperature: 0.7,
  maxOutputTokens: 1000
});

// Update instructions
session.updateInstructions('You are now a coding assistant.');

// Clear audio buffer
session.clearAudio();

// Commit audio for processing
session.commitAudio();

Audio Processing

Handle audio frames with automatic resampling:

// Push audio frames (automatically resampled)
session.pushAudio(audioFrame);

// Push video frames
session.pushVideo(videoFrame);

// Get current audio buffer
const audioBuffer = session.inputAudioBuffer;
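
For readers unfamiliar with resampling, the sketch below shows one simple way to convert mono 16-bit PCM between sample rates, for example from 48 kHz down to 16 kHz. It uses plain linear interpolation and is an assumption made for illustration; `resampleLinear` is a hypothetical function and the plugin's resampler may use a different method.

```typescript
// Illustrative linear-interpolation resampler for mono 16-bit PCM.
// An assumption for explanation only; not the plugin's resampler.
function resampleLinear(input: Int16Array, fromRate: number, toRate: number): Int16Array {
  if (fromRate <= 0 || toRate <= 0) throw new RangeError('sample rates must be positive');
  if (input.length === 0 || fromRate === toRate) return input.slice();

  const outLength = Math.max(1, Math.round((input.length * toRate) / fromRate));
  const output = new Int16Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample

  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.min(Math.floor(pos), input.length - 1);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    const a = input[left] ?? 0;
    const b = input[right] ?? 0;
    // Interpolate between the two neighbouring input samples.
    output[i] = Math.round(a + (b - a) * frac);
  }
  return output;
}

// Halving the rate keeps every second sample.
const downsampled = resampleLinear(Int16Array.from([0, 100, 200, 300]), 48000, 24000);
console.log(Array.from(downsampled)); // [0, 200]

// Doubling the rate inserts interpolated samples.
const upsampled = resampleLinear(Int16Array.from([0, 100]), 8000, 16000);
console.log(Array.from(upsampled)); // [0, 50, 100, 100]
```

Linear interpolation applies no low-pass filter, so downsampling this way can introduce aliasing. A production resampler would normally filter first.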

Error Recovery

The plugin includes robust error recovery:

// Recover from text response
session.recoverFromTextResponse('item_123');

// Session automatically retries on connection failures
// Exponential backoff with configurable max retries
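
Exponential backoff with a retry limit can be sketched as follows. This is a generic illustration of the technique, not the plugin's retry code: `withRetries`, `backoffDelayMs`, and the option names are hypothetical.

```typescript
// Hypothetical retry helper; the option names are assumptions, not the plugin's API.
interface RetryOptions {
  maxRetries: number;  // retries allowed after the first attempt
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number;  // upper bound on any single delay
}

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, opts: RetryOptions): number {
  return Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
}

// Run `connect`, retrying with growing delays until it succeeds
// or the retry budget is exhausted.
async function withRetries<T>(
  connect: () => Promise<T>,
  opts: RetryOptions,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      if (attempt < opts.maxRetries) {
        await sleep(backoffDelayMs(attempt, opts));
      }
    }
  }
  throw lastError;
}

// With these options the delays are 500, 1000, 2000, then capped at 4000 ms.
const retryOptions: RetryOptions = { maxRetries: 3, baseDelayMs: 500, maxDelayMs: 4000 };
```

The `sleep` parameter is injectable so the delay schedule can be checked in tests without waiting.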

Configuration Options

const model = new RealtimeModel({
  // Model configuration
  model: 'gemini-2.0-flash-live-001',
  voice: 'Puck',
  instructions: 'Custom instructions',
  
  // Generation parameters
  temperature: 0.8,
  maxOutputTokens: 1000,
  topP: 0.9,
  topK: 40,
  
  // Turn detection
  turnDetection: {
    type: 'server_vad',
    threshold: 0.1,
    silence_duration_ms: 1000
  },
  
  // Language and location
  language: 'en-US',
  location: 'us-central1',
  
  // VertexAI (optional)
  vertexai: false,
  project: process.env.GOOGLE_CLOUD_PROJECT
});

API Reference

RealtimeModel

session(options): Create a new session
close(): Close all sessions

RealtimeSession

Conversation Management

conversation.item.create(message): Create conversation item
conversation.item.update(id, updates): Update conversation item
conversation.item.delete(id): Delete conversation item
conversation.item.list(): List all conversation items
conversation.item.get(id): Get specific conversation item
conversation.item.clear(): Clear all conversation items

Response Management

response.create(): Start response generation

Audio Processing

pushAudio(frame): Push audio frame
pushVideo(frame): Push video frame
commitAudio(): Commit audio for processing
clearAudio(): Clear audio buffer

Session Control

interrupt(): Interrupt current generation
startUserActivity(): Start user activity
truncate(messageId, audioEndMs): Truncate conversation
updateOptions(options): Update session options
updateInstructions(instructions): Update instructions
updateTools(tools): Update available tools

Events

on(event, listener): Listen for events
off(event, listener): Remove event listener
emit(event, ...args): Emit event

Available events:

transcript: Text transcript updates
error: Error events
toolCall: Tool call events
generation_created: New generation started
input_audio_transcription_completed: Audio transcription completed
input_speech_started: Speech started
metrics_collected: Usage metrics
turn_detected: Turn detection events

License

Apache-2.0
