@sonoba/pipeline
v0.1.4
Voice Pipeline orchestration server — STT/LLM/TTS adapters, WebSocket session management, browser-driven and agent-driven flows
@sonoba/pipeline
Voice Pipeline orchestration server for the sonoba WebSocket voice protocol. Turns swappable STT / LLM / TTS providers into a session-managed pipeline that talks to @sonoba/voice-client browsers.
Looking for the frontend SDK? See @sonoba/voice-client and @sonoba/voice-client-react. This package is the server side of the same protocol.
Install

    npm install @sonoba/pipeline @sonoba/shared

@sonoba/shared is a peer of the protocol surface and a hard dependency. Install it explicitly so your TypeScript types resolve to the same version the server emits.
What's in the box
Two entry shapes for two integration styles:
| Entry | Use when... |
|---|---|
| startServer / createOwnershipHandler (ws-handler.ts) | You're building a new app on the protocol and want the canonical session:start → session:ack → typed-message flow with built-in ownership / grace / factory_failed envelopes. |
| BrowserWSServer + BrowserVoicePipeline (browser-ws-server.ts, browser-voice-pipeline.ts) | You're lifting a host that already drives the agent push-style (e.g. a Minecraft AI), where TTS/STT are wired through TTSAdapter / STTAdapter factories per session. |
Both honour the same @sonoba/shared envelope shape; both emit session:ack so the SDK's resume gate releases promptly.
Provider adapters
Pull-style providers (used by the LLM-driven VoicePipeline):
- STT: DeepgramSTTProvider, OpenAISTTProvider
- LLM: ExternalAgentLLMProvider (and AnthropicLLMProvider if you install the optional @anthropic-ai/sdk)
- TTS: AivisSpeechTTSProvider, ElevenLabsTTSProvider, AivisCloudTTSProvider
Push-style adapters (used by BrowserVoicePipeline):
- TTS: AivisSpeechAdapter, ElevenLabsAdapter, AivisCloudAdapter
- STT: PipelineSTTAdapter
Adapters / providers are constructed lazily per session via the factory you supply to createPipeline / startServer, so per-tenant configuration (API keys, voice IDs, locale, prompts) stays isolated.
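To illustrate the lazy, per-session construction described above, here is a minimal sketch of how a host might cache one provider per session while keeping tenant configuration separate. It does not show the package's internals: `SessionConfig`, `TTSLike`, `LazySessionProviders`, and the `tenantKeys` lookup are all hypothetical names invented for this example.

```typescript
// Hypothetical shapes for illustration only; not the package's real types.
interface SessionConfig {
  tenantId: string;
  voiceId?: number;
}

interface TTSLike {
  readonly apiKey: string;
  readonly voiceId: number;
}

type ProviderFactory<T> = (sessionId: string, config: SessionConfig) => T;

// Builds a provider the first time a session asks for one, then reuses it.
class LazySessionProviders<T> {
  private readonly factory: ProviderFactory<T>;
  private readonly instances = new Map<string, T>();

  constructor(factory: ProviderFactory<T>) {
    this.factory = factory;
  }

  getOrCreate(sessionId: string, config: SessionConfig): T {
    let instance = this.instances.get(sessionId);
    if (instance === undefined) {
      instance = this.factory(sessionId, config);
      this.instances.set(sessionId, instance);
    }
    return instance;
  }

  // Drops the cached provider when the session ends.
  release(sessionId: string): void {
    this.instances.delete(sessionId);
  }
}

// Assumed per-tenant secrets; a real host would load these from its own store.
const tenantKeys: Record<string, string> = { acme: 'key-acme', globex: 'key-globex' };

const ttsProviders = new LazySessionProviders<TTSLike>((_sessionId, config) => ({
  apiKey: tenantKeys[config.tenantId] ?? '',
  voiceId: config.voiceId ?? 1,
}));

const first = ttsProviders.getOrCreate('session-1', { tenantId: 'acme', voiceId: 42 });
const second = ttsProviders.getOrCreate('session-2', { tenantId: 'globex' });
console.log(first.apiKey, first.voiceId, second.apiKey, second.voiceId);
```

Because the factory receives the session's own config on every first call, nothing tenant-specific needs to live in module-level state.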
Quickstart: startServer

    import {
      startServer,
      OpenAISTTProvider,
      AivisSpeechTTSProvider,
      ExternalAgentLLMProvider,
    } from '@sonoba/pipeline';

    const server = await startServer({
      port: 8080,
      // Per-session factories, each called once per session:start envelope.
      sttFactory: (sessionId, config) => new OpenAISTTProvider({
        apiKey: process.env.OPENAI_API_KEY!,
        model: 'gpt-4o-mini-transcribe',
        language: config.language ?? 'ja',
      }),
      ttsFactory: (sessionId, config) => new AivisSpeechTTSProvider({
        baseUrl: process.env.AIVISSPEECH_URL ?? 'http://localhost:10101',
        voiceId: config.voiceId ?? 1999504576,
      }),
      // LLM factory wires whatever agent you have. Use `ExternalAgentLLMProvider`
      // to bridge to a separate agent process via WebSocket.
      llmFactory: (sessionId, config, transport) =>
        new ExternalAgentLLMProvider({ transport, sessionId }),
    });

The server now accepts SDK clients on ws://localhost:8080/. See packages/pipeline/src/ws-handler.ts:32 for the full options surface (verifyClient, transcriptGraceMs, ownership grace ms, ping interval, etc.).
Protocol parity
This package implements the server side of the protocol declared in @sonoba/shared protocol.ts:
- Accepts: session:start, audio:vad_start, audio:vad_end, audio:vad_cancel, audio:barge_in, text:input, plus binary mic frames between VAD envelopes.
- Emits: session:ack, transcript:user, transcript:agent (delta, see below), audio:tts_start, audio:tts_end, plus binary TTS frames.
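As a rough illustration of the inbound half of that list, the sketch below sorts raw frames into audio, known envelopes, and rejects. Only the envelope names come from the list above. The assumptions that envelopes arrive as JSON text frames with a `type` field, and the names `classifyFrame` and `Frame`, are hypothetical; check protocol.ts in @sonoba/shared for the real shapes.

```typescript
// Envelope names taken from the "Accepts" list; everything else is assumed.
const CLIENT_ENVELOPES = [
  'session:start',
  'audio:vad_start',
  'audio:vad_end',
  'audio:vad_cancel',
  'audio:barge_in',
  'text:input',
] as const;

type ClientEnvelopeType = (typeof CLIENT_ENVELOPES)[number];

function isClientEnvelopeType(value: string): value is ClientEnvelopeType {
  return (CLIENT_ENVELOPES as readonly string[]).includes(value);
}

type Frame =
  | { kind: 'envelope'; type: ClientEnvelopeType }
  | { kind: 'audio'; bytes: number }
  | { kind: 'rejected'; reason: string };

// Assumption: text frames carry JSON envelopes, binary frames carry mic audio.
function classifyFrame(data: string | Uint8Array): Frame {
  if (typeof data !== 'string') {
    return { kind: 'audio', bytes: data.byteLength };
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(data);
  } catch {
    return { kind: 'rejected', reason: 'invalid JSON' };
  }
  if (typeof parsed !== 'object' || parsed === null) {
    return { kind: 'rejected', reason: 'not an object' };
  }
  const type = (parsed as { type?: unknown }).type;
  if (typeof type !== 'string' || !isClientEnvelopeType(type)) {
    return { kind: 'rejected', reason: 'unknown envelope type' };
  }
  return { kind: 'envelope', type };
}

console.log(classifyFrame('{"type":"audio:vad_start"}'));
console.log(classifyFrame(new Uint8Array(320)));
```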
transcript:agent.text is a delta (per the @sonoba/shared 0.2.0 contract): the server emits each LLM token / inject chunk as the slice produced since the last emission for the same streamId, then a final { text: '', isFinal: true, streamId } marker. Consumers must accumulate the slices themselves; the React Provider in @sonoba/voice-client-react does this for you.
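For consumers that are not using the React Provider, the accumulation might look like the following sketch. The `text`, `isFinal`, and `streamId` fields come from the description above. The `type` discriminator, the choice of `string` for `streamId`, and the `TranscriptAccumulator` class are assumptions made for this example.

```typescript
// Assumed envelope shape: only `text`, `isFinal`, and `streamId` are documented;
// the `type` field and the string-typed streamId are guesses for illustration.
interface AgentTranscriptDelta {
  type: 'transcript:agent';
  text: string;
  isFinal: boolean;
  streamId: string;
}

// Hypothetical helper that joins delta slices per streamId.
class TranscriptAccumulator {
  private readonly buffers = new Map<string, string>();

  // Appends the slice and returns the text gathered so far for that stream.
  // On the final marker the buffer is released and the full text is returned.
  push(msg: AgentTranscriptDelta): { text: string; done: boolean } {
    const text = (this.buffers.get(msg.streamId) ?? '') + msg.text;
    if (msg.isFinal) {
      this.buffers.delete(msg.streamId);
      return { text, done: true };
    }
    this.buffers.set(msg.streamId, text);
    return { text, done: false };
  }
}

const accumulator = new TranscriptAccumulator();
accumulator.push({ type: 'transcript:agent', text: 'Hel', isFinal: false, streamId: 's1' });
accumulator.push({ type: 'transcript:agent', text: 'lo', isFinal: false, streamId: 's1' });
const finished = accumulator.push({ type: 'transcript:agent', text: '', isFinal: true, streamId: 's1' });
console.log(finished.text, finished.done); // prints "Hello true"
```

Keying the buffer by streamId keeps interleaved streams from bleeding into each other, and deleting on the final marker stops finished streams from accumulating in memory.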
License
MIT © sonoba
