@ariaflowagents/realtime-audio
v1.0.0
Realtime audio pipeline for AriaFlow — multi-provider speech-to-speech and orchestration.
Realtime audio pipeline for AriaFlow — the multi-provider foundation for speech-to-speech voice agents and their orchestration. Ships provider clients for Google Gemini Live and OpenAI Realtime today, a provider-agnostic RealtimeAudioClient interface other providers plug into, and a VoiceEngine / CallWorker pair that bridges any audio transport (WebSocket, LiveKit, etc.) to the chosen provider while handling tools, session state, and event logging via AriaFlow's Foundation primitives. (Renamed from @ariaflowagents/gemini-native-audio at v0.10.0; the historical "Gemini Live native audio" docs below reflect the original Gemini-specific slice and remain accurate for that provider.)
What This Does
Unlike traditional voice pipelines (STT → LLM → TTS), Gemini Live accepts raw audio input and produces raw audio output in a single model call. This package wraps that capability for AriaFlow agents:
- VoiceEngine: Call acceptor. Accepts incoming audio connections and creates per-call workers.
- CallWorker: Per-call lifecycle manager. Bridges your audio transport (WebSocket, LiveKit, etc.) to a Gemini Live session. Handles tool calls, session state, and event logging using AriaFlow's Foundation primitives.
- GeminiLiveSession: Thin wrapper around ai.live.connect() from @google/genai. Manages the WebSocket connection to Gemini, audio encoding (base64 PCM ↔ Uint8Array), tool dispatch, and session resumption.
- toolSetToGeminiDeclarations: Converts AriaFlow/AI SDK tool definitions (Zod schemas) to Gemini's FunctionDeclaration format.
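The base64 PCM ↔ Uint8Array conversion mentioned for GeminiLiveSession can be pictured as a small round trip. The following is a minimal sketch using Node's Buffer, not the package's internal code; the helper names encodePcmToBase64 and decodeBase64ToPcm are hypothetical and chosen only for illustration.

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical helper: raw PCM bytes -> base64 string (the form audio takes inside JSON messages).
function encodePcmToBase64(pcm: Uint8Array): string {
  // View the same memory instead of copying; byteOffset/byteLength respect subarray views.
  return Buffer.from(pcm.buffer, pcm.byteOffset, pcm.byteLength).toString('base64');
}

// Hypothetical helper: base64 string -> raw PCM bytes.
function decodeBase64ToPcm(b64: string): Uint8Array {
  const buf = Buffer.from(b64, 'base64');
  // Copy into a standalone Uint8Array so callers do not hold on to a pooled Buffer.
  return new Uint8Array(buf);
}

// Four bytes of example data (two 16-bit samples if read as little-endian: 1 and -1).
const pcmBytes = new Uint8Array([0x01, 0x00, 0xff, 0xff]);
const encodedPcm = encodePcmToBase64(pcmBytes); // "AQD//w=="
const decodedPcm = decodeBase64ToPcm(encodedPcm);
```

The round trip is lossless, so the decoded bytes can be handed straight to a transport that expects Uint8Array chunks.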
Architecture
┌─────────────┐     ┌─────────────┐     ┌────────────────────┐
│ Client      │────>│ CallWorker  │────>│ GeminiLiveSession  │
│ (WebSocket) │     │             │     │                    │
│             │<────│ audio +     │<────│ Gemini Live API    │
│ audio in/out│     │ tool calls  │     │ (native audio)     │
└─────────────┘     └─────────────┘     └────────────────────┘
                           │
                           ├── ToolExecutor (runs AriaFlow tools)
                           ├── ConversationState (persists transcripts)
                           └── ConversationEventLog (records events)

Usage
import { VoiceEngine } from '@ariaflowagents/realtime-audio';
import { createFoundation } from '@ariaflowagents/core/foundation';

const foundation = createFoundation({ /* ... */ });

const engine = new VoiceEngine({
  foundation,
  agents: [
    {
      id: 'receptionist',
      name: 'Hospital Receptionist',
      prompt: 'You are a hospital receptionist. Help patients schedule appointments.',
      voice: 'Charon', // Gemini voice preset
      tools: { /* AriaFlow tools */ },
    },
  ],
  defaultAgentId: 'receptionist',
  gemini: {
    apiKey: process.env.GOOGLE_API_KEY!,
    model: 'gemini-2.5-flash-native-audio-preview', // default
  },
});

// Accept a call from any audio transport
const worker = await engine.acceptCall({
  callId: crypto.randomUUID(),
  transport: myWebSocketTransport, // implements TransportSession
});

await worker.start();

TransportSession Interface
Implement this to connect any audio source/sink:
interface TransportSession {
  sendAudio(data: Uint8Array): void;                  // Send audio to client
  onAudio(handler: (data: Uint8Array) => void): void; // Receive audio from client
  onClose(handler: () => void): void;                 // Handle disconnect
  close(): void;                                      // Close the transport
}

Events
GeminiLiveSession emits RealtimeEvents:
| Event | Description |
|-------|-------------|
| audio | Raw PCM audio from Gemini (send to client) |
| transcript | Text transcript (user or assistant) |
| tool-call | Gemini wants to call a tool |
| tool-result | Tool execution result |
| turn-complete | Model finished speaking |
| interrupted | User interrupted the model |
| session-resumed | Session resumption handle updated |
| error | Error from Gemini |
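One way to consume the events in the table above is a single routing function that forwards audio to the transport and logs everything else. This is a sketch under stated assumptions: the README lists only event names, so the discriminated-union shape, the `type` key, and every payload field (data, role, text, name, handle, error) are hypothetical, as is the routeEvent helper. The TransportSession shape is copied from the interface shown earlier.

```typescript
// Copied from the TransportSession interface in this README.
interface TransportSession {
  sendAudio(data: Uint8Array): void;
  onAudio(handler: (data: Uint8Array) => void): void;
  onClose(handler: () => void): void;
  close(): void;
}

// ASSUMPTION: event names come from the table; the payload fields are illustrative only.
type RealtimeEvent =
  | { type: 'audio'; data: Uint8Array }
  | { type: 'transcript'; role: 'user' | 'assistant'; text: string }
  | { type: 'tool-call'; name: string; args: unknown }
  | { type: 'tool-result'; name: string; result: unknown }
  | { type: 'turn-complete' }
  | { type: 'interrupted' }
  | { type: 'session-resumed'; handle: string }
  | { type: 'error'; error: Error };

// Hypothetical router: audio goes to the client, other events become log lines.
function routeEvent(event: RealtimeEvent, transport: TransportSession, log: string[]): void {
  switch (event.type) {
    case 'audio':
      transport.sendAudio(event.data);
      break;
    case 'transcript':
      log.push(`${event.role}: ${event.text}`);
      break;
    case 'tool-call':
    case 'tool-result':
      log.push(`${event.type}: ${event.name}`);
      break;
    case 'turn-complete':
    case 'interrupted':
      log.push(event.type);
      break;
    case 'session-resumed':
      log.push('resumption handle updated');
      break;
    case 'error':
      log.push(`error: ${event.error.message}`);
      break;
    default: {
      // Compile-time exhaustiveness check: adding a new event type fails to type-check here.
      const unreachable: never = event;
      throw new Error(`Unhandled event: ${JSON.stringify(unreachable)}`);
    }
  }
}

// In-memory stand-in for a real transport, recording what would be sent to the client.
const sentChunks: Uint8Array[] = [];
const fakeTransport: TransportSession = {
  sendAudio: (data) => { sentChunks.push(data); },
  onAudio: () => {},
  onClose: () => {},
  close: () => {},
};

const eventLog: string[] = [];
routeEvent({ type: 'audio', data: new Uint8Array([1, 2]) }, fakeTransport, eventLog);
routeEvent({ type: 'transcript', role: 'assistant', text: 'Hello' }, fakeTransport, eventLog);
routeEvent({ type: 'turn-complete' }, fakeTransport, eventLog);
```

Keeping the switch exhaustive means a newly added event name surfaces as a compile error rather than a silently dropped event.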
Key Details
- Audio format: 16-bit PCM at 24kHz
- Default model: gemini-2.5-flash-native-audio-preview
- Session resumption: Automatic; GeminiLiveSession tracks resumption handles
- Tool execution: Uses AriaFlow's ToolExecutor with timeout support
- State persistence: Transcripts are saved to session via ConversationState
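The audio format listed above fixes the byte rate, which is useful when sizing transport frames or buffers. The sketch below works through the arithmetic; it assumes mono audio and little-endian samples, neither of which this README states, and the helper names are hypothetical.

```typescript
const SAMPLE_RATE_HZ = 24_000; // from "Key Details"
const BYTES_PER_SAMPLE = 2;    // 16-bit PCM
const CHANNELS = 1;            // ASSUMPTION: mono; the README does not state the channel count

// Bytes needed to hold `ms` milliseconds of audio.
function pcmBytesForDuration(ms: number): number {
  return (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * ms) / 1000;
}

// Playback duration, in milliseconds, of a PCM buffer of `byteLength` bytes.
function pcmDurationMs(byteLength: number): number {
  return (byteLength * 1000) / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS);
}

// Reinterpret PCM bytes as signed 16-bit samples.
// ASSUMPTION: little-endian byte order.
function toInt16Samples(pcm: Uint8Array): Int16Array {
  const view = new DataView(pcm.buffer, pcm.byteOffset, pcm.byteLength);
  const out = new Int16Array(Math.floor(pcm.byteLength / 2));
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true); // true = little-endian
  }
  return out;
}

const frameBytes = pcmBytesForDuration(20);   // 24000 * 2 * 0.020 = 960 bytes per 20 ms frame
const oneSecondMs = pcmDurationMs(48_000);    // 48000 bytes = 1000 ms
const twoSamples = toInt16Samples(new Uint8Array([0x01, 0x00, 0xff, 0xff])); // samples 1 and -1
```

Under these assumptions one second of audio is 48,000 bytes, so a 20 ms frame carries 960 bytes.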
Peer Dependencies
- @ariaflowagents/core: Foundation primitives (ToolExecutor, ConversationState, etc.)
- ai (v6+): Vercel AI SDK
- zod: Schema definitions for tools
