@ariaflowagents/livekit-plugin
v1.0.0
AriaFlow plugin for LiveKit Agents — runtime adapter, transport foundation, Gemini STT/TTS
@ariaflow/livekit-plugin
AriaFlow plugin for LiveKit Agents — connects AriaFlow Runtime to LiveKit's voice infrastructure for building production voice agents.
Features
- Runtime Adapter — wraps AriaFlow Runtime as a LiveKit LLM provider (AriaRuntimeLLMAdapter)
- Voice Sessions — AriaFlowVoiceSession (WebSocket) and AriaFlowLivekitSession (LiveKit rooms)
- Two-Layer Turn Detection — Silero VAD + EOU text model + LLM turn markers
- Gemini STT/TTS — Google Gemini Live speech-to-text and text-to-speech
- Filler Coordinator — audio filler management during tool execution
- Recording — session recording with pluggable storage adapters
- Codec Utilities — G.711 (PCMU/PCMA) encode/decode, resampling
Install
bun add @ariaflow/livekit-plugin
Peer dependencies:
bun add @livekit/agents @livekit/rtc-node @ariaflowagents/core
Quick Start
import { WebSocketAgentServer } from '@ariaflow/livekit-plugin-transport-ws';
import { AriaFlowVoiceSession, TurnDetector } from '@ariaflow/livekit-plugin';
import { GeminiLiveSTT, GeminiLiveTTS } from '@ariaflow/livekit-plugin/gemini';
import { Runtime } from '@ariaflowagents/core';
import { openai } from '@ai-sdk/openai';
const runtime = new Runtime({
  agents: [{
    id: 'assistant',
    name: 'Voice Assistant',
    model: openai('gpt-4o-mini'),
    prompt: 'You are a helpful voice assistant.',
  }],
  defaultAgentId: 'assistant',
  defaultModel: openai('gpt-4o-mini'),
});
// Load turn detection models once at startup
const detector = new TurnDetector();
await detector.initialize();
const server = new WebSocketAgentServer({ port: 8080 });
server.onConnection(async (transport) => {
  const voiceSession = new AriaFlowVoiceSession({
    runtime,
    stt: new GeminiLiveSTT(),
    tts: new GeminiLiveTTS(),
    vad: detector.vad ?? undefined,
    turnDetection: detector.eouTurnDetector ?? undefined,
    turnMarkerConfig: detector.turnMarkerConfig,
    greeting: 'Hello! How can I help you?',
  });
  await server.startSession(transport, voiceSession);
});
await server.listen();
Turn Detection
The plugin implements a two-layer turn detection system:
Layer 1: VAD + EOU text model
- Silero VAD detects speech boundaries (start/end of speech)
- EOU (End-of-Utterance) ONNX model predicts whether the transcript looks like a complete turn
- These run at the audio/STT level inside LiveKit's AgentSession
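As a rough illustration of how the two Layer 1 signals could be combined, the sketch below picks an endpointing delay from a VAD end-of-speech flag and an EOU probability. The names `Layer1Signals`, `EndpointingOptions`, and `endpointDelayMs`, along with the threshold and delay values, are hypothetical and are not part of the plugin's API; the real logic lives inside LiveKit's AgentSession and may differ.

```typescript
// Hypothetical sketch: none of these names or numbers come from the plugin.
interface Layer1Signals {
  speechEnded: boolean;   // VAD reported end of speech
  eouProbability: number; // 0..1 score that the transcript looks like a complete turn
}

interface EndpointingOptions {
  eouThreshold: number; // assumed cut-off above which the turn counts as complete
  minDelayMs: number;   // wait applied when the transcript looks complete
  maxDelayMs: number;   // wait applied when the transcript looks unfinished
}

// Returns how long to wait after speech ends before responding,
// or null while the user is still speaking.
function endpointDelayMs(
  signals: Layer1Signals,
  opts: EndpointingOptions,
): number | null {
  if (!signals.speechEnded) return null;
  return signals.eouProbability >= opts.eouThreshold
    ? opts.minDelayMs
    : opts.maxDelayMs;
}
```

The point of the design is that VAD alone only says that the user stopped making sound, while the EOU score adjusts how patient the agent should be before it treats that silence as the end of the turn.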
Layer 2: LLM turn markers
- The LLM is instructed to prefix responses with a Unicode marker indicating turn completeness:
  - ✓ — user's turn is complete, respond normally
  - ○ — user will likely continue shortly, suppress response
  - ◐ — user needs more time to think, suppress and wait longer
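The marker handling described above could look roughly like the following. This is a minimal sketch under stated assumptions: `detectMarker`, `removeMarker`, `suppressionWaitMs`, and the `TurnState` names are hypothetical, and the plugin's own `detectTurnMarker()` and `stripMarker()` may use different signatures and behavior. Mapping ○ to a shorter wait and ◐ to a longer one is also an assumption drawn from the marker descriptions.

```typescript
// Hypothetical sketch of LLM turn marker handling; not the plugin's implementation.
type TurnState = 'complete' | 'incomplete_short' | 'incomplete_long';

const MARKERS: Record<string, TurnState> = {
  '✓': 'complete',
  '○': 'incomplete_short',
  '◐': 'incomplete_long',
};

// Returns the turn state if the text starts with a known marker, otherwise null.
function detectMarker(text: string): TurnState | null {
  const first = text.trimStart().charAt(0);
  return MARKERS[first] ?? null;
}

// Removes a leading marker and the whitespace after it.
// Text without a marker is returned unchanged.
function removeMarker(text: string): string {
  const trimmed = text.trimStart();
  return detectMarker(trimmed) === null ? text : trimmed.slice(1).trimStart();
}

// Maps a turn state to how long the response is held back.
// 0 means "respond immediately".
function suppressionWaitMs(
  state: TurnState,
  shortWaitMs: number,
  longWaitMs: number,
): number {
  if (state === 'complete') return 0;
  return state === 'incomplete_short' ? shortWaitMs : longWaitMs;
}
```

For example, a response of `○ Take your time.` would be classified as `incomplete_short`, and the text passed on after stripping would be `Take your time.`.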
const detector = new TurnDetector({
  // Disable individual layers:
  // disableVAD: true,
  // disableEOU: true,
  // Custom marker config:
  turnMarkerConfig: {
    enabled: true,
    incompleteShortWaitMs: 5000,
    incompleteLongWaitMs: 10000,
  },
});
await detector.initialize();
Exports
| Export Path | Contents |
|-------------|----------|
| @ariaflow/livekit-plugin | Core: sessions, LLM adapter, turn detection, transport, codecs |
| @ariaflow/livekit-plugin/gemini | Gemini Live STT/TTS |
| @ariaflow/livekit-plugin/recording | Recording manager and storage adapters |
| @ariaflow/livekit-plugin/codec/g711 | G.711 PCMU/PCMA encode/decode |
| @ariaflow/livekit-plugin/utils/resample | Audio resampling utilities |
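For readers unfamiliar with what the G.711 codec path does, the sketch below shows standard μ-law (PCMU) companding between 16-bit linear PCM and 8-bit samples. It is a generic textbook implementation for illustration only; the function names `linearToMulaw`, `mulawToLinear`, and `encodeMulaw` are hypothetical and are not the exports of `@ariaflow/livekit-plugin/codec/g711`. A-law (PCMA) is not shown.

```typescript
// Generic G.711 mu-law companding sketch; names are illustrative,
// not the plugin's API.
const BIAS = 0x84;  // 132, added before the segment search
const CLIP = 32635; // largest magnitude that survives adding BIAS

// Encodes one signed 16-bit PCM sample to one mu-law byte.
function linearToMulaw(sample: number): number {
  const sign = (sample >> 8) & 0x80;
  if (sign !== 0) sample = -sample;
  if (sample > CLIP) sample = CLIP;
  sample += BIAS;
  // Find the segment (exponent): position of the highest set bit, 7 down to 0.
  let exponent = 7;
  for (let mask = 0x4000; (sample & mask) === 0 && exponent > 0; mask >>= 1) {
    exponent--;
  }
  const mantissa = (sample >> (exponent + 3)) & 0x0f;
  // Mu-law bytes are stored bit-inverted.
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

// Decodes one mu-law byte back to a signed 16-bit PCM sample.
function mulawToLinear(byte: number): number {
  const u = ~byte & 0xff;
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS;
  return sign !== 0 ? -magnitude : magnitude;
}

// Encodes a whole buffer of 16-bit PCM samples.
function encodeMulaw(pcm: Int16Array): Uint8Array {
  return Uint8Array.from(pcm, (s) => linearToMulaw(s));
}
```

The encoding is lossy: quantization steps grow with amplitude, so quiet samples are preserved more precisely than loud ones.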
Examples
See examples/ for runnable examples:
- basic_voice_agent.ts — minimal agent with a tool
- multi_agent_handoff.ts — router + game agent handoff
- restaurant_agent.ts — 4-agent restaurant system
- turn_detection_demo.ts — turn detection configuration
- livekit_room_agent.ts — LiveKit room (WebRTC) agent
- livekit_room_with_tools.ts — room agent with tools
Changelog
0.1.0
Turn Detection (RFC-011, RFC-012)
- Added two-layer turn detection system (src/turn_detection/)
  - TurnDetector — orchestrator that loads VAD and EOU models in parallel
  - loadSileroVAD() — process-level singleton loader for Silero VAD with concurrent-call deduplication
  - EOUDetector — standalone port of LiveKit's EOU ONNX model, no getJobContext() dependency
  - EOUTurnDetectorAdapter — bridges EOUDetector to LiveKit's _TurnDetector interface
  - detectTurnMarker() / stripMarker() — LLM turn marker detection and stripping
  - TURN_MARKER_SYSTEM_PROMPT — system prompt fragment for LLM marker compliance
- Modified AriaRuntimeLLMAdapter to intercept turn markers in LLM responses when configured
- Modified AriaFlowVoiceSession to accept turnMarkerConfig option
- Modified AriaFlowLivekitSession to accept turnMarkerConfig option
- Added dependencies: @livekit/agents-plugin-silero, onnxruntime-node, @huggingface/hub, @huggingface/transformers
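The "process-level singleton loader with concurrent-call deduplication" pattern mentioned for loadSileroVAD() can be sketched by caching the in-flight promise rather than the resolved value. This is an illustration of the general technique, not the plugin's code: `Model`, `loadModel`, `getModel`, and `loadCount` are hypothetical stand-ins, and the timer merely simulates a slow model load.

```typescript
// Hypothetical sketch of a deduplicating singleton loader.
type Model = { name: string };

let cached: Promise<Model> | null = null;
let loadCount = 0; // counts real loads, for illustration only

// Stand-in for an expensive load (e.g. reading an ONNX file).
async function loadModel(): Promise<Model> {
  loadCount += 1;
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  return { name: 'vad' };
}

// Every caller gets the same promise, so concurrent calls trigger one load.
function getModel(): Promise<Model> {
  if (cached === null) {
    cached = loadModel().catch((err) => {
      cached = null; // clear the cache so a later call can retry
      throw err;
    });
  }
  return cached;
}
```

Because the promise itself is cached, two sessions that start at the same moment share a single load instead of racing to load the model twice.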
Examples
- Added basic_voice_agent.ts — minimal voice agent with turn detection
- Added multi_agent_handoff.ts — multi-agent routing demo
- Added restaurant_agent.ts — 4-agent restaurant system
- Added turn_detection_demo.ts — turn detection configuration demo
- Added livekit_room_with_tools.ts — room-based agent with tools
- Updated livekit_room_agent.ts — added turn detection, migrated to ServerOptions/defineAgent API
Initial Release
- AriaRuntimeLLMAdapter — wraps AriaFlow Runtime as a LiveKit LLM provider
- AriaFlowVoiceSession — WebSocket-based voice session
- AriaFlowLivekitSession — LiveKit room-based voice session
- FillerCoordinator — audio filler management during tool execution
- Gemini Live STT/TTS integration
- Recording with pluggable storage adapters (S3)
- G.711 codec utilities (PCMU/PCMA)
- Audio resampling utilities
