© 2026 – Pkg Stats / Ryan Hefner

@ariaflowagents/realtime-audio

v1.0.0

Published

Realtime audio pipeline for AriaFlow — multi-provider speech-to-speech and orchestration.

Downloads

201

Readme

@ariaflowagents/realtime-audio

Realtime audio pipeline for AriaFlow: the multi-provider foundation for speech-to-speech voice agents and their orchestration. It currently ships provider clients for Google Gemini Live and OpenAI Realtime, a provider-agnostic RealtimeAudioClient interface that other providers plug into, and a VoiceEngine / CallWorker pair that bridges any audio transport (WebSocket, LiveKit, etc.) to the chosen provider while handling tools, session state, and event logging via AriaFlow's Foundation primitives. (The package was renamed from @ariaflowagents/gemini-native-audio at v0.10.0. The sections below were written for the original Gemini-specific slice, "Gemini Live native audio", and remain accurate for that provider.)

What This Does

Unlike traditional voice pipelines (STT → LLM → TTS), Gemini Live accepts raw audio input and produces raw audio output from a single model over one streaming session, with no separate speech-to-text or text-to-speech stage. This package wraps that capability for AriaFlow agents:

  • VoiceEngine — Call acceptor. Accepts incoming audio connections and creates per-call workers.
  • CallWorker — Per-call lifecycle manager. Bridges your audio transport (WebSocket, LiveKit, etc.) to a Gemini Live session. Handles tool calls, session state, and event logging using AriaFlow's Foundation primitives.
  • GeminiLiveSession — Thin wrapper around @google/genai ai.live.connect(). Manages the WebSocket connection to Gemini, audio encoding (base64 PCM ↔ Uint8Array), tool dispatch, and session resumption.
  • toolSetToGeminiDeclarations — Converts AriaFlow/AI SDK tool definitions (Zod schemas) to Gemini's FunctionDeclaration format.
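The audio encoding step in GeminiLiveSession moves between the base64 strings carried in Live API messages and the Uint8Array buffers a transport works with. A minimal sketch of that conversion, assuming a Node.js runtime (it uses the global Buffer); pcmToBase64 and base64ToPcm are hypothetical names, not exports of this package:

```typescript
// Hypothetical helpers illustrating the base64 PCM <-> Uint8Array step.
// Assumes Node.js, where Buffer is a global. Not part of this package's API.

/** Encode raw PCM bytes as base64 for an outgoing Live API message. */
function pcmToBase64(pcm: Uint8Array): string {
  // View the exact byte range (respecting byteOffset) without copying,
  // so a subarray encodes only its own bytes.
  return Buffer.from(pcm.buffer, pcm.byteOffset, pcm.byteLength).toString("base64");
}

/** Decode base64 audio from a Live API message back into raw PCM bytes. */
function base64ToPcm(b64: string): Uint8Array {
  const buf = Buffer.from(b64, "base64");
  // Buffer may live inside a shared pool, so keep its offset and length
  // when exposing it as a plain Uint8Array (no copy).
  return new Uint8Array(buf.buffer, buf.byteOffset, buf.byteLength);
}
```

Respecting `byteOffset` matters because audio chunks are often `subarray` views into a larger buffer; encoding `pcm.buffer` directly would leak neighbouring bytes.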

Architecture

```
┌──────────────┐     ┌─────────────┐     ┌────────────────────┐
│   Client     │────>│ CallWorker  │────>│ GeminiLiveSession  │
│  (WebSocket) │     │             │     │                    │
│              │<────│  audio +    │<────│  Gemini Live API   │
│  audio in/out│     │  tool calls │     │  (native audio)    │
└──────────────┘     └─────────────┘     └────────────────────┘
                            │
                            ├── ToolExecutor (runs AriaFlow tools)
                            ├── ConversationState (persists transcripts)
                            └── ConversationEventLog (records events)
```
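Stripped of state and logging, the diagram reduces to three wires per call: caller audio to the model, model audio back to the caller, and model tool calls out to local tools. A minimal sketch of that wiring, assuming simplified hypothetical shapes (ProviderSession, ProviderEvent, ToolFn and bridgeCall are illustrative names, not this package's exports; the real CallWorker also persists transcripts, logs events, and applies tool timeouts):

```typescript
// Hypothetical, simplified shapes for illustration only; the package's real
// CallWorker, provider clients, and event types are richer than this.
interface TransportSession {
  sendAudio(data: Uint8Array): void;
  onAudio(handler: (data: Uint8Array) => void): void;
  onClose(handler: () => void): void;
  close(): void;
}

type ProviderEvent =
  | { type: "audio"; data: Uint8Array }
  | { type: "tool-call"; id: string; name: string; args: unknown };

interface ProviderSession {
  sendAudio(data: Uint8Array): void;
  sendToolResult(id: string, result: unknown): void;
  onEvent(handler: (event: ProviderEvent) => void): void;
  close(): void;
}

type ToolFn = (args: unknown) => Promise<unknown>;

/** Wire one call: caller audio in, model audio out, tool calls run locally. */
function bridgeCall(
  transport: TransportSession,
  provider: ProviderSession,
  tools: Record<string, ToolFn>,
): void {
  // Caller -> model: forward every audio chunk as-is.
  transport.onAudio((chunk) => provider.sendAudio(chunk));

  provider.onEvent((event) => {
    if (event.type === "audio") {
      // Model -> caller: play the model's audio back over the transport.
      transport.sendAudio(event.data);
      return;
    }
    // Tool call: run the named tool, then report its result (or error) to the model.
    const { id, name, args } = event;
    const fn = tools[name];
    // Promise.resolve().then(...) also turns a synchronous throw into a rejection.
    const run = fn
      ? Promise.resolve().then(() => fn(args))
      : Promise.reject(new Error(`unknown tool: ${name}`));
    run.then(
      (result) => provider.sendToolResult(id, result),
      (err: unknown) => provider.sendToolResult(id, { error: String(err) }),
    );
  });

  // Caller hung up: close the provider session too.
  transport.onClose(() => provider.close());
}
```

Reporting tool errors back as results, rather than throwing, keeps the model session alive so it can apologise or retry instead of the call dropping.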

Usage

```ts
import { randomUUID } from 'node:crypto';
import { VoiceEngine } from '@ariaflowagents/realtime-audio';
import { createFoundation } from '@ariaflowagents/core/foundation';

const foundation = createFoundation({ /* ... */ });

const engine = new VoiceEngine({
  foundation,
  agents: [
    {
      id: 'receptionist',
      name: 'Hospital Receptionist',
      prompt: 'You are a hospital receptionist. Help patients schedule appointments.',
      voice: 'Charon', // Gemini voice preset
      tools: { /* AriaFlow tools */ },
    },
  ],
  defaultAgentId: 'receptionist',
  gemini: {
    apiKey: process.env.GOOGLE_API_KEY!,
    model: 'gemini-2.5-flash-native-audio-preview', // default
  },
});

// Accept a call from any audio transport.
// myWebSocketTransport is your own object implementing TransportSession (see below).
const worker = await engine.acceptCall({
  callId: randomUUID(),
  transport: myWebSocketTransport,
});

await worker.start();
```

TransportSession Interface

Implement this to connect any audio source/sink:

```ts
interface TransportSession {
  sendAudio(data: Uint8Array): void;                   // Send audio to client
  onAudio(handler: (data: Uint8Array) => void): void;  // Receive audio from client
  onClose(handler: () => void): void;                  // Handle disconnect
  close(): void;                                       // Close the transport
}
```
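An in-memory implementation is a convenient way to exercise a worker without a network. A minimal sketch, assuming that calling close() locally should also fire the onClose handlers (exactly once); MemoryTransport and pushFromClient are hypothetical names, and the interface is repeated so the block stands alone:

```typescript
// Same shape as the TransportSession interface above, repeated for self-containment.
interface TransportSession {
  sendAudio(data: Uint8Array): void;
  onAudio(handler: (data: Uint8Array) => void): void;
  onClose(handler: () => void): void;
  close(): void;
}

/** Hypothetical in-memory transport, e.g. for unit-testing a worker offline. */
class MemoryTransport implements TransportSession {
  private audioHandlers: Array<(data: Uint8Array) => void> = [];
  private closeHandlers: Array<() => void> = [];
  private closed = false;
  /** Audio the worker has sent "to the client", in order. */
  readonly received: Uint8Array[] = [];

  sendAudio(data: Uint8Array): void {
    if (!this.closed) this.received.push(data); // drop audio after hangup
  }

  onAudio(handler: (data: Uint8Array) => void): void {
    this.audioHandlers.push(handler);
  }

  onClose(handler: () => void): void {
    this.closeHandlers.push(handler);
  }

  close(): void {
    if (this.closed) return; // idempotent: close handlers fire once
    this.closed = true;
    for (const h of this.closeHandlers) h();
  }

  /** Test helper: simulate the client speaking into the call. */
  pushFromClient(data: Uint8Array): void {
    if (this.closed) return;
    for (const h of this.audioHandlers) h(data);
  }
}
```

Making close() idempotent matters in practice: both the worker and the remote side may try to end the call, and double-firing close handlers tends to double-close provider sessions.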

Events

GeminiLiveSession emits RealtimeEvents:

| Event | Description |
|-------|-------------|
| `audio` | Raw PCM audio from Gemini (send to client) |
| `transcript` | Text transcript (user or assistant) |
| `tool-call` | Gemini wants to call a tool |
| `tool-result` | Tool execution result |
| `turn-complete` | Model finished speaking |
| `interrupted` | User interrupted the model |
| `session-resumed` | Session resumption handle updated |
| `error` | Error from Gemini |
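A consumer typically folds these events into per-call state. A minimal sketch, assuming a discriminated union keyed on the event names in the table; the payload fields (data, role, text, handle, and so on) are hypothetical, as are CallEventState and its policy of discarding unplayed audio on `interrupted`, which is one common approach rather than necessarily what CallWorker does:

```typescript
// Event names come from the table above; payload fields are assumptions,
// not this package's actual RealtimeEvent type.
type RealtimeEvent =
  | { type: "audio"; data: Uint8Array }
  | { type: "transcript"; role: "user" | "assistant"; text: string }
  | { type: "tool-call"; id: string; name: string; args: unknown }
  | { type: "tool-result"; id: string; result: unknown }
  | { type: "turn-complete" }
  | { type: "interrupted" }
  | { type: "session-resumed"; handle: string }
  | { type: "error"; error: Error };

/** Hypothetical per-call state built from the event stream. */
class CallEventState {
  pendingAudio: Uint8Array[] = []; // model audio queued but not yet played
  transcript: string[] = [];
  resumptionHandle?: string;
  errors: Error[] = [];

  handle(event: RealtimeEvent): void {
    switch (event.type) {
      case "audio":
        this.pendingAudio.push(event.data);
        break;
      case "transcript":
        this.transcript.push(`${event.role}: ${event.text}`);
        break;
      case "interrupted":
        // The user barged in: drop audio the caller has not heard yet.
        this.pendingAudio = [];
        break;
      case "session-resumed":
        // Only the latest handle is useful for reconnecting.
        this.resumptionHandle = event.handle;
        break;
      case "error":
        this.errors.push(event.error);
        break;
      case "tool-call":
      case "tool-result":
      case "turn-complete":
        // Left to tool execution and turn bookkeeping elsewhere in this sketch.
        break;
    }
  }
}
```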

Key Details

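  • Audio format: 16-bit PCM at 24 kHz (Gemini Live's output rate; the Live API's native input rate is 16 kHz, and it resamples input sent at other rates)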
  • Default model: gemini-2.5-flash-native-audio-preview
  • Session resumption: Automatic — GeminiLiveSession tracks resumption handles
  • Tool execution: Uses AriaFlow's ToolExecutor with timeout support
  • State persistence: Transcripts are saved to session via ConversationState
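The audio format fixes the buffer arithmetic: at 24 kHz with 2 bytes per sample, one second of mono audio is 24,000 × 2 = 48,000 bytes, so a 20 ms chunk is 960 bytes. A minimal sketch of those calculations plus sample serialization, assuming mono, signed little-endian 16-bit samples (the Live API documents little-endian PCM; mono is an assumption here); the helper names are hypothetical:

```typescript
// Assumptions: mono audio, 16-bit signed little-endian samples, 24 kHz.
// Helper names are hypothetical, not part of this package.
const SAMPLE_RATE_HZ = 24_000;
const BYTES_PER_SAMPLE = 2; // 16-bit PCM

/** Playback duration, in milliseconds, of a mono PCM16 buffer. */
function pcmDurationMs(byteLength: number, sampleRate = SAMPLE_RATE_HZ): number {
  // Multiply before dividing so common frame sizes come out as exact integers.
  return (byteLength * 1000) / (sampleRate * BYTES_PER_SAMPLE);
}

/** Byte size of a mono PCM16 chunk lasting `ms` milliseconds. */
function pcmBytesForMs(ms: number, sampleRate = SAMPLE_RATE_HZ): number {
  return (sampleRate * BYTES_PER_SAMPLE * ms) / 1000;
}

/** Serialize Int16 samples to little-endian bytes regardless of host endianness. */
function int16ToPcmBytes(samples: Int16Array): Uint8Array {
  const out = new Uint8Array(samples.length * BYTES_PER_SAMPLE);
  const view = new DataView(out.buffer);
  samples.forEach((s, i) => view.setInt16(i * BYTES_PER_SAMPLE, s, true)); // true = little-endian
  return out;
}
```

Using DataView with an explicit endianness flag avoids relying on the host's byte order, which is what a plain `new Uint8Array(int16Array.buffer)` would do.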

Peer Dependencies

  • @ariaflowagents/core — Foundation primitives (ToolExecutor, ConversationState, etc.)
  • ai (v6+) — Vercel AI SDK
  • zod — Schema definitions for tools