@omarimai/agents-plugin-google

v1.1.13

Support for Gemini, Gemini Live, Cloud Speech-to-Text, and Cloud Text-to-Speech.

Google AI plugin for LiveKit Agents

Installation

npm install @omarimai/agents-plugin-google

Usage

import { llm, multimodal } from '@livekit/agents';
import * as google from '@omarimai/agents-plugin-google';

const model = new google.realtime.RealtimeModel({
  apiKey: process.env.GOOGLE_API_KEY,
  voice: 'Puck',
});

// Functions the model may call; empty here, so the agent has no tools
const fncCtx: llm.FunctionContext = {};

const agent = new multimodal.MultimodalAgent({
  model,
  fncCtx,
});

Configuration

Set your Google API key:

  • GOOGLE_API_KEY environment variable, or
  • Pass apiKey parameter to the constructor

For Vertex AI, also set:

  • GOOGLE_CLOUD_PROJECT environment variable
  • GOOGLE_APPLICATION_CREDENTIALS pointing to your service account key
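
The lookup rules above can be sketched as a small helper. This is a minimal sketch, not the plugin's code: `resolveGoogleAuth`, `GoogleAuthConfig`, and the `vertexai` option on this helper are hypothetical names, and letting the explicit `apiKey` parameter win over the environment variable is an assumption.

```typescript
// Hypothetical helper for illustration; not part of the plugin's API.
interface GoogleAuthConfig {
  apiKey?: string;
  vertexai: boolean;
  project?: string;
  credentialsPath?: string;
}

type Env = Record<string, string | undefined>;

function resolveGoogleAuth(
  env: Env,
  options: { apiKey?: string; vertexai?: boolean } = {},
): GoogleAuthConfig {
  // Assumption: an explicitly passed key takes precedence over GOOGLE_API_KEY.
  const apiKey = options.apiKey ?? env['GOOGLE_API_KEY'];

  if (options.vertexai) {
    const project = env['GOOGLE_CLOUD_PROJECT'];
    const credentialsPath = env['GOOGLE_APPLICATION_CREDENTIALS'];
    if (!project || !credentialsPath) {
      throw new Error(
        'Vertex AI needs GOOGLE_CLOUD_PROJECT and GOOGLE_APPLICATION_CREDENTIALS',
      );
    }
    return { apiKey, vertexai: true, project, credentialsPath };
  }

  if (!apiKey) {
    throw new Error('Set GOOGLE_API_KEY or pass apiKey to the constructor');
  }
  return { apiKey, vertexai: false };
}
```

Calling `resolveGoogleAuth(process.env)` at startup surfaces a missing key or project before any session is opened.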

Build and Test

Build the Project

pnpm build

Test the Integration

Create a simple test file to verify it works with MultimodalAgent:

// test.ts
import { multimodal, llm } from '@livekit/agents';
import * as google from './src/index.js';

const model = new google.realtime.RealtimeModel({
  apiKey: 'your-api-key',
  voice: 'Puck',
});

// FunctionContext is a plain object type, not a class, so start from an empty map of functions
const fncCtx: llm.FunctionContext = {};

const agent = new multimodal.MultimodalAgent({
  model,
  fncCtx,
});

console.log('Google plugin integrated successfully!');

Next Steps

  • Implement Google Live API Connection: Research Google's Live API documentation and implement the actual WebSocket connection
  • Add Authentication: Implement proper Google Cloud authentication
  • Complete Audio Processing: Finish the audio streaming implementation
  • Add Function Calling: Implement function calling support in the realtime session
  • Add Error Handling: Implement robust error handling and reconnection logic
  • Add Tests: Create comprehensive tests
  • Add LLM/STT/TTS: Complete the standard service implementations

With the build and test in place, the plugin structure is ready to use with the existing MultimodalAgent.

Google Gemini Live API TypeScript Plugin

A TypeScript implementation of the Google Gemini Live API for real-time audio conversations with advanced features including function calling, conversation management, and turn detection.

Features

  • Real-time audio streaming with Gemini Live API
  • Function calling and tool integration
  • Advanced conversation management with session.conversation.item.create()
  • Response generation control with session.response.create()
  • Server-side Voice Activity Detection (VAD) with adaptive thresholds
  • Multi-feature speech detection (audio level, energy, zero crossing rate)
  • Event-driven architecture with comprehensive event emission
  • Session management with recovery and error handling

Installation

npm install

Environment Setup

Set your Google API key:

export GOOGLE_API_KEY="your-api-key-here"

Basic Usage

// ChatContext is assumed to be the class exported by @livekit/agents
import { llm } from '@livekit/agents';
import { RealtimeModel } from './src/realtime/realtime_model.js';

// Create a realtime model with advanced features
const model = new RealtimeModel({
  model: 'gemini-2.0-flash-live-001',
  voice: 'Puck',
  instructions: 'You are a helpful AI assistant.',
  turnDetection: {
    type: 'server_vad',
    threshold: 0.1,
    silence_duration_ms: 1000
  }
});

// Create a session
const session = model.session({
  fncCtx: {},
  chatCtx: new llm.ChatContext()
});

// Advanced conversation management
session.conversation.item.create({
  role: 'user',
  text: 'Hello, how are you?'
});

// Start response generation
session.response.create();

// Enhanced conversation management
const items = session.conversation.item.list();
console.log('Conversation items:', items);

// Update a conversation item
session.conversation.item.update('msg_1', {
  content: 'Updated message content'
});

// Delete a conversation item
session.conversation.item.delete('msg_1');

// Clear all conversation items
session.conversation.item.clear();
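
To make the create, update, delete, list, and clear operations concrete, here is a minimal in-memory sketch of a conversation item store. It is not the plugin's implementation: `ConversationStore` and `ConversationItem` are hypothetical names, and the sequential `msg_N` id scheme is an assumption based on the ids used in the example above.

```typescript
// Hypothetical in-memory store for illustration; not the plugin's internals.
type Role = 'user' | 'assistant';

interface ConversationItem {
  id: string;
  role: Role;
  content: string;
}

class ConversationStore {
  private items = new Map<string, ConversationItem>();
  private nextId = 1;

  // Assumption: ids are sequential and follow the 'msg_N' pattern.
  create(role: Role, content: string): ConversationItem {
    const item: ConversationItem = { id: `msg_${this.nextId++}`, role, content };
    this.items.set(item.id, item);
    return item;
  }

  get(id: string): ConversationItem | undefined {
    return this.items.get(id);
  }

  // Merge the updates into the existing item; the id itself cannot change.
  update(id: string, updates: Partial<Omit<ConversationItem, 'id'>>): ConversationItem {
    const existing = this.items.get(id);
    if (!existing) throw new Error(`No conversation item with id ${id}`);
    const updated: ConversationItem = { ...existing, ...updates };
    this.items.set(id, updated);
    return updated;
  }

  // Returns false when the id was not present.
  delete(id: string): boolean {
    return this.items.delete(id);
  }

  // Map preserves insertion order, so items come back oldest first.
  list(): ConversationItem[] {
    return Array.from(this.items.values());
  }

  clear(): void {
    this.items.clear();
  }
}
```

Keeping items in a `Map` keyed by id gives constant-time lookup for update and delete while still listing items in the order they were created.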

Advanced Turn Detection

Turn detection uses server-side VAD, configured through the turnDetection option. The session emits an event when speech starts and another when a turn ends:

const model = new RealtimeModel({
  turnDetection: {
    type: 'server_vad',
    threshold: 0.1,           // Audio level threshold
    silence_duration_ms: 1000, // Silence duration before turn end
    prefix_padding_ms: 200     // Padding before speech start
  }
});

// Listen for turn detection events
session.on('turn_detected', (event) => {
  console.log('Turn detected:', event);
  // event.type: 'silence_threshold'
  // event.duration: silence duration in ms
  // event.timestamp: when the turn was detected
});

session.on('input_speech_started', (event) => {
  console.log('Speech started:', event);
  // event.audioLevel: current audio level
  // event.energyLevel: current energy level
  // event.threshold: adaptive threshold used
});
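
The sketch below illustrates the ideas behind these options: per-frame speech features (level and zero crossing rate) and a silence timer that ends a turn once the level stays below `threshold` for `silence_duration_ms`. It is an illustration under assumed definitions, not the plugin's detector; `frameFeatures` and `SilenceTurnDetector` are hypothetical names, and adaptive thresholds are left out.

```typescript
// Illustrative only: assumed feature definitions, not the plugin's internals.
interface FrameFeatures {
  rms: number; // root-mean-square level, normalized to 0..1
  zeroCrossingRate: number; // fraction of adjacent sample pairs that change sign
}

function frameFeatures(samples: Int16Array): FrameFeatures {
  if (samples.length === 0) return { rms: 0, zeroCrossingRate: 0 };
  let sumSquares = 0;
  let crossings = 0;
  for (let i = 0; i < samples.length; i++) {
    const x = samples[i] / 32768;
    sumSquares += x * x;
    if (i > 0 && (samples[i] >= 0) !== (samples[i - 1] >= 0)) crossings++;
  }
  return {
    rms: Math.sqrt(sumSquares / samples.length),
    zeroCrossingRate: crossings / Math.max(1, samples.length - 1),
  };
}

class SilenceTurnDetector {
  private threshold: number;
  private silenceDurationMs: number;
  private silentMs = 0;
  private speaking = false;

  constructor(threshold: number, silenceDurationMs: number) {
    this.threshold = threshold;
    this.silenceDurationMs = silenceDurationMs;
  }

  // Feed one frame's level and duration. Returns true once per turn, when
  // silence following speech has lasted at least silenceDurationMs.
  push(level: number, frameMs: number): boolean {
    if (level >= this.threshold) {
      this.speaking = true;
      this.silentMs = 0;
      return false;
    }
    if (!this.speaking) return false; // silence before any speech is ignored
    this.silentMs += frameMs;
    if (this.silentMs >= this.silenceDurationMs) {
      this.speaking = false;
      this.silentMs = 0;
      return true;
    }
    return false;
  }
}
```

A caller would compute `frameFeatures(frame).rms` for each incoming frame and pass it to `push` together with the frame duration.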

Function Calling

Register and use tools with the session:

// Register a tool
session.updateTools([
  {
    name: 'get_weather',
    description: 'Get current weather for a location',
    parameters: {
      type: 'object',
      properties: {
        location: { type: 'string' }
      }
    },
    handler: async (args) => {
      const { location } = args;
      // Placeholder result; a real handler would look up the weather for `location`
      return { location, temperature: '72°F', condition: 'sunny' };
    }
  }
]);

// Listen for tool calls
session.on('toolCall', (toolCall) => {
  console.log('Tool called:', toolCall);
});
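
One way to picture what happens between registering a tool and receiving a toolCall event is a name-to-handler lookup. The following is a minimal sketch under assumed shapes, not the plugin's dispatch code: `Tool`, `ToolCall`, `buildToolRegistry`, and `dispatchToolCall` are hypothetical names.

```typescript
// Hypothetical shapes modeled on the tool objects above; not the plugin's exported types.
type ToolArgs = Record<string, unknown>;

interface Tool {
  name: string;
  handler: (args: ToolArgs) => Promise<unknown>;
}

interface ToolCall {
  name: string;
  args: ToolArgs;
}

type ToolResult =
  | { ok: true; result: unknown }
  | { ok: false; error: string };

// Index tools by name, rejecting duplicates so a call always maps to one handler.
function buildToolRegistry(tools: Tool[]): Map<string, Tool> {
  const registry = new Map<string, Tool>();
  for (const tool of tools) {
    if (registry.has(tool.name)) {
      throw new Error(`Duplicate tool name: ${tool.name}`);
    }
    registry.set(tool.name, tool);
  }
  return registry;
}

// Run the handler for a call. Unknown tools and thrown errors become a
// result object so the caller can report them back to the model.
async function dispatchToolCall(
  registry: Map<string, Tool>,
  call: ToolCall,
): Promise<ToolResult> {
  const tool = registry.get(call.name);
  if (!tool) return { ok: false, error: `Unknown tool: ${call.name}` };
  try {
    return { ok: true, result: await tool.handler(call.args) };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

Returning a result object instead of throwing keeps a failing handler from tearing down the session loop that invoked it.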

Event System

The session emits events for transcripts, generation start, errors, and usage metrics:

// Transcript events
session.on('transcript', (event) => {
  console.log('Transcript:', event.transcript, 'Final:', event.isFinal);
});

// Generation events
session.on('generation_created', (event) => {
  console.log('Generation started:', event.messageId);
});

// Error handling
session.on('error', (error) => {
  console.error('Session error:', error);
});

// Metrics
session.on('metrics_collected', (metrics) => {
  console.log('Usage metrics:', metrics);
});

Session Management

Advanced session control features:

// Interrupt current generation
session.interrupt();

// Start user activity
session.startUserActivity();

// Truncate conversation at specific message
session.truncate('msg_5', 5000); // Truncate at message 5, audio end at 5s

// Update session options
session.updateOptions({
  temperature: 0.7,
  maxOutputTokens: 1000
});

// Update instructions
session.updateInstructions('You are now a coding assistant.');

// Clear audio buffer
session.clearAudio();

// Commit audio for processing
session.commitAudio();

Audio Processing

Handle audio frames with automatic resampling:

// Push audio frames (automatically resampled)
session.pushAudio(audioFrame);

// Push video frames
session.pushVideo(videoFrame);

// Get current audio buffer
const audioBuffer = session.inputAudioBuffer;
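
The resampling step mentioned above can be illustrated with linear interpolation over 16-bit PCM samples. This is a sketch of one common technique, not necessarily what the plugin does internally; `resampleLinear` is a hypothetical helper, and it applies no anti-aliasing filter when downsampling.

```typescript
// Illustrative linear-interpolation resampler for mono 16-bit PCM.
// Hypothetical helper; not the plugin's resampling code.
function resampleLinear(input: Int16Array, srcRate: number, dstRate: number): Int16Array {
  if (srcRate <= 0 || dstRate <= 0) {
    throw new RangeError('Sample rates must be positive');
  }
  if (input.length === 0 || srcRate === dstRate) return input.slice();

  const outLength = Math.max(1, Math.round((input.length * dstRate) / srcRate));
  const output = new Int16Array(outLength);
  const step = srcRate / dstRate; // input samples advanced per output sample

  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.min(Math.floor(pos), input.length - 1);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    // Interpolate between the two neighboring input samples.
    output[i] = Math.round(input[left] + (input[right] - input[left]) * frac);
  }
  return output;
}
```

For example, `resampleLinear(frameSamples, 48000, 16000)` would reduce a 48 kHz frame to 16 kHz; the rates here are placeholders, and a production resampler would low-pass filter before decimating.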

Error Recovery

The session can recover from a text response and retries failed connections automatically:

// Recover from text response
session.recoverFromTextResponse('item_123');

// Session automatically retries on connection failures
// Exponential backoff with configurable max retries
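
Retrying with exponential backoff and a maximum retry count can be sketched as follows. This is a generic illustration rather than the plugin's reconnection logic: `RetryOptions`, `backoffDelayMs`, and `withRetry` are hypothetical names, and the doubling schedule with a delay cap is an assumption.

```typescript
// Hypothetical retry helper for illustration; not the plugin's internals.
interface RetryOptions {
  maxRetries: number; // retries after the first attempt
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number; // upper bound on any single delay
}

// Delay doubles with each attempt (0-based) and is capped at maxDelayMs.
function backoffDelayMs(attempt: number, opts: RetryOptions): number {
  return Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Call connect() until it succeeds or the retries are exhausted, waiting
// between attempts. The sleep function is injectable for testing.
async function withRetry<T>(
  connect: () => Promise<T>,
  opts: RetryOptions,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      if (attempt < opts.maxRetries) {
        await sleep(backoffDelayMs(attempt, opts));
      }
    }
  }
  throw lastError;
}
```

With `baseDelayMs: 500` and `maxDelayMs: 8000`, the waits are 500 ms, 1000 ms, 2000 ms, 4000 ms, and then 8000 ms for every later retry. Adding random jitter to each delay is a common refinement when many clients reconnect at once.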

Configuration Options

const model = new RealtimeModel({
  // Model configuration
  model: 'gemini-2.0-flash-live-001',
  voice: 'Puck',
  instructions: 'Custom instructions',
  
  // Generation parameters
  temperature: 0.8,
  maxOutputTokens: 1000,
  topP: 0.9,
  topK: 40,
  
  // Turn detection
  turnDetection: {
    type: 'server_vad',
    threshold: 0.1,
    silence_duration_ms: 1000
  },
  
  // Language and location
  language: 'en-US',
  location: 'us-central1',
  
  // VertexAI (optional)
  vertexai: false,
  project: process.env.GOOGLE_CLOUD_PROJECT
});

API Reference

RealtimeModel

  • session(options): Create a new session
  • close(): Close all sessions

RealtimeSession

Conversation Management

  • conversation.item.create(message): Create conversation item
  • conversation.item.update(id, updates): Update conversation item
  • conversation.item.delete(id): Delete conversation item
  • conversation.item.list(): List all conversation items
  • conversation.item.get(id): Get specific conversation item
  • conversation.item.clear(): Clear all conversation items

Response Management

  • response.create(): Start response generation

Audio Processing

  • pushAudio(frame): Push audio frame
  • pushVideo(frame): Push video frame
  • commitAudio(): Commit audio for processing
  • clearAudio(): Clear audio buffer

Session Control

  • interrupt(): Interrupt current generation
  • startUserActivity(): Start user activity
  • truncate(messageId, audioEndMs): Truncate conversation
  • updateOptions(options): Update session options
  • updateInstructions(instructions): Update instructions
  • updateTools(tools): Update available tools

Events

  • on(event, listener): Listen for events
  • off(event, listener): Remove event listener
  • emit(event, ...args): Emit event

Available events:

  • transcript: Text transcript updates
  • error: Error events
  • toolCall: Tool call events
  • generation_created: New generation started
  • input_audio_transcription_completed: Audio transcription completed
  • input_speech_started: Speech started
  • metrics_collected: Usage metrics
  • turn_detected: Turn detection events

License

Apache-2.0