gemini-realtime-stream

v1.0.0

Google Gemini AI real-time streaming with audio processing capabilities

Downloads

9

Gemini Real-time Stream MCP Server

A Model Context Protocol (MCP) server that provides real-time streaming capabilities with Google's Gemini AI models, including live audio/video processing, function calling, and bidirectional WebSocket communication.

Features

Core Capabilities

  • Real-time Streaming: Bidirectional WebSocket communication with Gemini models
  • Live Audio Processing: Real-time audio input/output with voice activity detection
  • Live Video Processing: Screen capture and video stream processing
  • Function Calling: Dynamic tool discovery and execution with JSON schema validation
  • Multimodal Support: Text, image, audio, and video input/output processing
  • Session Management: Persistent conversation contexts and state management

Available Tools

start_realtime_session

Initialize a real-time streaming session with the Gemini Live API.

Parameters:

  • model (string, optional): Gemini model to use (default: "gemini-2.0-flash-exp")
  • voice (string, optional): Voice configuration for audio output
  • system_instruction (string, optional): System instructions for the model
  • tools (array, optional): Available tools for function calling
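
As a rough illustration, the parameters above could be modeled as a TypeScript type with the documented default applied. This is a minimal sketch: `StartSessionParams`, `DEFAULT_MODEL`, and `withSessionDefaults` are hypothetical names that do not come from the package, and only the `model` default is taken from the list above.

```typescript
// Hypothetical type mirroring the documented start_realtime_session parameters.
interface StartSessionParams {
  model?: string;
  voice?: string;
  system_instruction?: string;
  tools?: unknown[];
}

// The documented default for `model`.
const DEFAULT_MODEL = "gemini-2.0-flash-exp";

// Returns a copy of the parameters with the documented model default filled in.
// All other fields pass through unchanged.
function withSessionDefaults(
  params: StartSessionParams = {}
): StartSessionParams & { model: string } {
  return { ...params, model: params.model ?? DEFAULT_MODEL };
}
```

Calling `withSessionDefaults({ voice: "Aoede" })` would keep the voice and fill in the default model.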

send_realtime_message

Send a message to an active real-time session.

Parameters:

  • session_id (string): Active session identifier
  • content (string): Message content to send
  • content_type (string, optional): Content type (default: "text")

stream_audio_input

Stream audio input to the real-time session.

Parameters:

  • session_id (string): Active session identifier
  • audio_data (string): Base64-encoded audio data
  • format (string, optional): Audio format (default: "pcm16")
  • sample_rate (number, optional): Sample rate in Hz (default: 16000)
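
Since `audio_data` is expected as a Base64 string, a caller has to encode raw samples before sending them. The sketch below shows one way to do that without runtime-specific APIs. The helper names (`bytesToBase64`, `floatToPcm16Bytes`, `buildAudioArgs`) are hypothetical, and little-endian byte order is an assumption that the parameter list does not state.

```typescript
const B64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard Base64 (RFC 4648) encoding of raw bytes.
function bytesToBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i] ?? 0;
    const b1 = bytes[i + 1] ?? 0;
    const b2 = bytes[i + 2] ?? 0;
    out += B64_ALPHABET.charAt(b0 >> 2);
    out += B64_ALPHABET.charAt(((b0 & 0x03) << 4) | (b1 >> 4));
    // Pad with "=" when the final group has fewer than three bytes.
    out += i + 1 < bytes.length ? B64_ALPHABET.charAt(((b1 & 0x0f) << 2) | (b2 >> 6)) : "=";
    out += i + 2 < bytes.length ? B64_ALPHABET.charAt(b2 & 0x3f) : "=";
  }
  return out;
}

// Converts float samples in [-1, 1] to signed 16-bit PCM bytes.
// Little-endian output is an assumption, not something the README specifies.
function floatToPcm16Bytes(samples: Float32Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i] ?? 0));
    view.setInt16(i * 2, Math.round(clamped * 32767), true);
  }
  return bytes;
}

// Builds an argument object shaped like the stream_audio_input parameters.
function buildAudioArgs(sessionId: string, samples: Float32Array, sampleRate = 16000) {
  return {
    session_id: sessionId,
    audio_data: bytesToBase64(floatToPcm16Bytes(samples)),
    format: "pcm16",
    sample_rate: sampleRate,
  };
}
```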

capture_screen_stream

Capture and stream screen content to the session.

Parameters:

  • session_id (string): Active session identifier
  • region (object, optional): Screen region to capture
  • quality (string, optional): Capture quality ("high", "medium", "low")
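
A client might want to check these arguments before calling the tool. The following is a sketch only: the `region` shape (`x`, `y`, `width`, `height`) is taken from the usage example in this README, while `validateCaptureArgs` and the rule that width and height must be positive are assumptions for illustration.

```typescript
interface CaptureRegion {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface CaptureArgs {
  session_id: string;
  region?: CaptureRegion;
  quality?: string;
}

// The three quality values named in the parameter list.
const QUALITIES: readonly string[] = ["high", "medium", "low"];

// Returns a list of human-readable problems; an empty list means the arguments look valid.
function validateCaptureArgs(args: CaptureArgs): string[] {
  const problems: string[] = [];
  if (args.session_id.trim() === "") {
    problems.push("session_id must be a non-empty string");
  }
  if (args.quality !== undefined && QUALITIES.indexOf(args.quality) === -1) {
    problems.push(`quality must be one of: ${QUALITIES.join(", ")}`);
  }
  if (args.region !== undefined) {
    const { width, height } = args.region;
    // Assumed rule: a capture region needs a positive area.
    if (!(width > 0) || !(height > 0)) {
      problems.push("region width and height must be positive");
    }
  }
  return problems;
}
```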

get_session_status

Retrieve the current status of a real-time session.

Parameters:

  • session_id (string): Session identifier to check

end_realtime_session

Terminate an active real-time streaming session.

Parameters:

  • session_id (string): Session identifier to terminate

list_active_sessions

List all currently active real-time sessions.

Parameters: None
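
MCP clients invoke tools like these with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request by hand to show the wire shape. `buildToolCall` and the example session ID are hypothetical, and in practice an MCP client library would normally construct these messages for you.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextRequestId = 1;

// Builds a tools/call request for the named tool with the given arguments.
function buildToolCall(
  toolName: string,
  args: Record<string, unknown> = {}
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// list_active_sessions takes no parameters.
const listRequest = buildToolCall("list_active_sessions");

// get_session_status takes a session_id (placeholder value shown here).
const statusRequest = buildToolCall("get_session_status", {
  session_id: "example-session-id",
});
```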

Installation

  1. Install dependencies:
     npm install
  2. Build the TypeScript code:
     npm run build
  3. Configure your Gemini API key:
     export GEMINI_API_KEY="your-api-key-here"
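
A missing API key is an easy mistake to make at this step. As an illustration, a startup check along the following lines would fail fast with a clear message. `requireApiKey` is a hypothetical helper, and whether the server itself performs such a check is not stated in this README.

```typescript
// Reads GEMINI_API_KEY from an environment-like object and fails fast if it is absent.
// The environment is passed in so the function can be exercised without touching process.env.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env["GEMINI_API_KEY"]?.trim();
  if (!key) {
    throw new Error("GEMINI_API_KEY is not set; export it before starting the server");
  }
  return key;
}

// In a Node.js entry point this would typically be called as:
//   const apiKey = requireApiKey(process.env);
```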

Configuration

Add the server to your MCP client configuration:

{
  "mcpServers": {
    "gemini-realtime-stream": {
      "command": "node",
      "args": ["/path/to/gemini-realtime-stream/dist/gemini-realtime-stream.js"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}

Usage Examples

Basic Real-time Chat

// Start a new session
const session = await startRealtimeSession({
  model: "gemini-2.0-flash-exp",
  system_instruction: "You are a helpful AI assistant."
});

// Send a message
await sendRealtimeMessage({
  session_id: session.id,
  content: "Hello, how are you today?"
});

Audio Streaming

// Start session with voice capabilities
const session = await startRealtimeSession({
  model: "gemini-2.0-flash-exp",
  voice: "Aoede"
});

// Stream audio input
await streamAudioInput({
  session_id: session.id,
  audio_data: base64AudioData,
  format: "pcm16",
  sample_rate: 16000
});

Screen Sharing

// Capture and stream screen content
// (assumes `session` was created as in the examples above)
await captureScreenStream({
  session_id: session.id,
  region: { x: 0, y: 0, width: 1920, height: 1080 },
  quality: "high"
});

API Reference

Session Management

  • Sessions are automatically managed with unique identifiers
  • Each session maintains its own conversation context
  • Sessions can be terminated manually; otherwise they time out after a period of inactivity
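
The behavior described above (unique identifiers, per-session context, manual termination, and inactivity timeout) can be sketched as a small in-memory registry. This is an illustrative sketch, not the server's implementation: `SessionRegistry`, `SessionRecord`, the counter-based IDs, and the timeout value are all assumptions.

```typescript
interface SessionRecord {
  id: string;
  lastActive: number; // timestamp in milliseconds
  history: string[]; // per-session conversation context
}

class SessionRegistry {
  private readonly sessions = new Map<string, SessionRecord>();
  private counter = 0;
  private readonly timeoutMs: number;
  private readonly now: () => number;

  // The clock is injectable so timeouts can be exercised without waiting.
  constructor(timeoutMs: number, now: () => number = () => Date.now()) {
    this.timeoutMs = timeoutMs;
    this.now = now;
  }

  // Creates a session with a unique identifier.
  create(): SessionRecord {
    this.counter += 1;
    const session: SessionRecord = {
      id: `session-${this.counter}`,
      lastActive: this.now(),
      history: [],
    };
    this.sessions.set(session.id, session);
    return session;
  }

  // Returns the session and refreshes its activity timestamp,
  // or undefined if the session is unknown or has expired.
  touch(id: string): SessionRecord | undefined {
    this.evictExpired();
    const session = this.sessions.get(id);
    if (session) session.lastActive = this.now();
    return session;
  }

  // Manual termination; returns false if the session did not exist.
  end(id: string): boolean {
    return this.sessions.delete(id);
  }

  listActive(): string[] {
    this.evictExpired();
    return Array.from(this.sessions.keys());
  }

  // Removes sessions that have been idle for longer than the timeout.
  private evictExpired(): void {
    const cutoff = this.now() - this.timeoutMs;
    for (const [id, session] of Array.from(this.sessions.entries())) {
      if (session.lastActive < cutoff) this.sessions.delete(id);
    }
  }
}
```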

Audio Processing

  • Supports PCM16 audio at a configurable sample rate (default: 16000 Hz)
  • Real-time voice activity detection
  • Bidirectional audio streaming (input and output)
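
The README does not say how voice activity detection is implemented. To make the idea concrete, here is a deliberately simple energy-threshold detector over PCM16 frames. It is a stand-in technique for illustration, not the server's or Gemini's actual method, and `frameRms`, `isSpeech`, and the `0.02` threshold are assumptions.

```typescript
// Root-mean-square level of a PCM16 frame, normalized to the range [0, 1].
function frameRms(frame: Int16Array): number {
  if (frame.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) {
    const sample = (frame[i] ?? 0) / 32768;
    sumSquares += sample * sample;
  }
  return Math.sqrt(sumSquares / frame.length);
}

// Treats a frame as speech when its RMS level reaches the threshold.
// The default threshold is an arbitrary illustrative value and would need tuning.
function isSpeech(frame: Int16Array, threshold = 0.02): boolean {
  return frameRms(frame) >= threshold;
}
```

At 16000 Hz, a 10 ms frame holds 160 samples, so `isSpeech(new Int16Array(160))` evaluates a silent frame of that length. Production detectors usually add smoothing across frames so that brief pauses do not end an utterance.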

Video Processing

  • Screen capture with configurable regions and quality
  • Real-time video stream processing
  • Support for multiple video formats

Function Calling

  • Dynamic tool discovery and registration
  • JSON schema validation for tool parameters
  • Parallel function execution support
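
The three points above can be sketched together: a registry for tools, a parameter check before execution, and `Promise.all` to run several calls concurrently. This is a simplified sketch under stated assumptions. The `required` list is a stand-in for full JSON schema validation, and `ToolDefinition`, `registerTool`, `executeCall`, and `executeAll` are hypothetical names.

```typescript
interface ToolDefinition {
  name: string;
  required: string[]; // simplified stand-in for a JSON schema "required" list
  handler: (args: Record<string, unknown>) => Promise<unknown>;
}

interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface ToolOutcome {
  name: string;
  ok: boolean;
  value: unknown; // result on success, error message on failure
}

const toolRegistry = new Map<string, ToolDefinition>();

// Registers (or replaces) a tool at runtime.
function registerTool(tool: ToolDefinition): void {
  toolRegistry.set(tool.name, tool);
}

// Validates and runs one call; failures are returned as values, not thrown.
async function executeCall(call: ToolCall): Promise<ToolOutcome> {
  const tool = toolRegistry.get(call.name);
  if (!tool) {
    return { name: call.name, ok: false, value: `unknown tool: ${call.name}` };
  }
  const missing = tool.required.filter((key) => !(key in call.args));
  if (missing.length > 0) {
    return { name: call.name, ok: false, value: `missing parameters: ${missing.join(", ")}` };
  }
  try {
    return { name: call.name, ok: true, value: await tool.handler(call.args) };
  } catch (err) {
    return { name: call.name, ok: false, value: err instanceof Error ? err.message : String(err) };
  }
}

// Runs all calls concurrently; results come back in the same order as the input.
function executeAll(calls: ToolCall[]): Promise<ToolOutcome[]> {
  return Promise.all(calls.map(executeCall));
}
```

Because `executeCall` never rejects, one failing tool does not prevent the other results from being returned.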

Error Handling

The server handles the following error cases:

  • Invalid session IDs return appropriate error messages
  • Network connectivity issues are handled gracefully
  • Audio/video processing errors are logged and reported
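
In MCP, a tool reports a failure by returning a result with `isError` set to `true` and a description in `content`. The sketch below shows what the invalid-session case might look like in that shape. The message text, the status payload, and the `activeSessions` set are illustrative assumptions, since the README does not document the server's actual responses.

```typescript
// Subset of the MCP tool result shape used here.
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

// Stand-in for the server's session store (hypothetical).
const activeSessions = new Set<string>(["session-1"]);

// Returns a status result for a known session, or an error result for an unknown one.
function getSessionStatus(sessionId: string): ToolResult {
  if (!activeSessions.has(sessionId)) {
    return {
      content: [{ type: "text", text: `Unknown session_id: ${sessionId}` }],
      isError: true,
    };
  }
  return {
    content: [
      { type: "text", text: JSON.stringify({ session_id: sessionId, status: "active" }) },
    ],
  };
}
```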

Security Considerations

  • API keys should be stored securely as environment variables
  • Screen capture requires appropriate system permissions
  • Audio input requires microphone access permissions

Dependencies

  • @modelcontextprotocol/sdk: MCP SDK for server implementation
  • @google/generative-ai: Google Generative AI SDK
  • ws: WebSocket library for real-time communication
  • Additional dependencies for audio/video processing

License

This project is licensed under the MIT License.

Contributing

Contributions are welcome! Please read the contributing guidelines before submitting pull requests.

Support

For issues and questions, please use the GitHub issue tracker.