@kuralle-agents/realtime-audio
v0.11.0
Published
Realtime audio pipeline for Kuralle — multi-provider speech-to-speech and orchestration.
Maintainers
Readme
@kuralle-agents/realtime-audio
⏸️ Paused for now. Kuralle is hardening text as the primary primitive. The realtime model code here is kept intact, but the realtime
VoiceDriveris off the framework's headline API (it lives behind@kuralle-agents/core/runtime) and is not part of the active stability program. For voice today, prefer cascaded voice over text via@kuralle-agents/livekit-plugin(STT → Kuralle text runtime → TTS). Provider-native speech-to-speech will resume later.
Provider-native realtime audio for Kuralle — speech-to-speech voice agents powered by Gemini Live, OpenAI Realtime, or xAI Grok Realtime, with Kuralle keeping tool, flow, and handoff authority.
Install
npm install @kuralle-agents/realtime-audioPeer dependencies:
npm install @kuralle-agents/core ai zodWhat it does
Unlike the cascaded path in @kuralle-agents/livekit-plugin (STT → LLM → TTS), provider-native realtime sends raw audio directly to the model and receives audio back in a single connection — lower latency, no transcript round-trip.
Whole-answer content gates are advisory on native realtime audio. The provider speaks audio as it generates it, before any Kuralle gate runs, so a blocking gate cannot un-speak audio already played — it emits a safety-* event and speaks a correction utterance (post-hoc). The reliable controls on native realtime are input-side gating (pre-turn policies) and tool authority (tools return data, not conversational text). On the cascaded substrate, by contrast, Kuralle controls the emission boundary and the gate is preventive.
VoiceEngine— call acceptor. Accepts incoming audio connections and creates per-callVoiceCallSessionworkers that bridge a transport to the chosen provider.VoiceCallSession/RealtimeCallWorker— per-call lifecycle: connects to the provider, routes tool calls through Kuralle runtime, manages session state.GeminiLiveSession— thin wrapper around@google/genailive.connect(); manages the WebSocket to Gemini, PCM audio encoding, tool dispatch, and session resumption.OpenAIRealtimeClient— OpenAI Realtime API client.CloudflareRealtimeAdapter— plugs anyRealtimeAudioClientinto Kuralle runtime authority inside a Cloudflare Durable Object.CloudflareGeminiLiveClient,CloudflareOpenAIRealtimeClient,CloudflareXAIGrokRealtimeClient— Cloudflare Workers variants.createGeminiClientFactory/createOpenAIClientFactory— provider client factories.voiceAgentToRuntimeAgent— converts aVoiceAgentConfigto a standard Kuralle agent config.
Usage
import { VoiceEngine, createGeminiClientFactory } from '@kuralle-agents/realtime-audio';
const engine = new VoiceEngine({
agents: [
{
id: 'support',
name: 'Support Agent',
instructions: 'You are a support agent.',
voice: 'Charon',
tools: { /* Kuralle tool definitions */ },
},
],
defaultAgentId: 'support',
modelClientFactory: createGeminiClientFactory({
apiKey: process.env.GOOGLE_API_KEY!,
model: 'gemini-2.5-flash-preview-native-audio',
}),
});
// Accept a call from any transport (WebSocket, LiveKit, etc.)
const session = await engine.acceptCall({
callId: crypto.randomUUID(),
transport: myTransportSession, // implements TransportSession
});
await session.start();Related
@kuralle-agents/livekit-plugin— cascaded STT → LLM → TTS voice path via LiveKit@kuralle-agents/livekit-plugin-transport-ws— WebSocket transport for audio connections@kuralle-agents/core— agents, flows, runtime
