@sonoba/pipeline

v0.1.4 · Published 3 days ago

Voice Pipeline orchestration server: STT/LLM/TTS adapters, WebSocket session management, browser-driven and agent-driven flows

Downloads: 729

Vulnerabilities: 0 high, 0 medium, 0 low

Maintainers: ryuichi_maniwa, tamoooon99

Keywords: voice, websocket, pipeline, stt, llm, tts, japanese, sonoba

@sonoba/pipeline

Voice Pipeline orchestration server for the sonoba WebSocket voice protocol. It turns swappable STT / LLM / TTS providers into a session-managed pipeline that serves browsers running @sonoba/voice-client.

Looking for the frontend SDK? See @sonoba/voice-client and @sonoba/voice-client-react. This package is the server side of the same protocol.

Install

npm install @sonoba/pipeline @sonoba/shared

@sonoba/shared defines the shared protocol surface and is a hard dependency. Install it explicitly so that your TypeScript types resolve to the same version the server uses for the messages it emits.

What's in the box

Two entry shapes for two integration styles:

| Entry | Use when... |
|---|---|
| startServer / createOwnershipHandler (ws-handler.ts) | You're building a new app on the protocol and want the canonical session:start → session:ack → typed-message flow with built-in ownership / grace / factory_failed envelopes. |
| BrowserWSServer + BrowserVoicePipeline (browser-ws-server.ts, browser-voice-pipeline.ts) | You're lifting a host that already drives the agent push-style (e.g. a Minecraft AI), where TTS/STT are wired through TTSAdapter / STTAdapter factories per session. |

Both honour the same @sonoba/shared envelope shape; both emit session:ack so the SDK's resume gate releases promptly.

Provider adapters

Pull-style providers (used by the LLM-driven VoicePipeline):

- STT: DeepgramSTTProvider, OpenAISTTProvider
- LLM: ExternalAgentLLMProvider (and AnthropicLLMProvider if you install the optional @anthropic-ai/sdk)
- TTS: AivisSpeechTTSProvider, ElevenLabsTTSProvider, AivisCloudTTSProvider

Push-style adapters (used by BrowserVoicePipeline):

- TTS: AivisSpeechAdapter, ElevenLabsAdapter, AivisCloudAdapter
- STT: PipelineSTTAdapter

Adapters / providers are constructed lazily per session via the factory you supply to createPipeline / startServer, so per-tenant configuration (API keys, voice IDs, locale, prompts) stays isolated.
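To make the per-session isolation concrete, here is a minimal sketch of lazy, per-session construction. It is not the package's internals: LazyPerSession, SessionConfig and ProviderFactory are hypothetical names, and the real factory signatures are the ones startServer accepts. Starting a session only records its config; the provider is built the first time that session asks for it and is dropped when the session ends, so one tenant's API key or voice ID is never visible to another session.

```typescript
// Hypothetical shapes for illustration only; not the @sonoba/pipeline types.
interface SessionConfig {
  language?: string;
  voiceId?: number;
}

type ProviderFactory<T> = (sessionId: string, config: SessionConfig) => T;

// Holds at most one provider per session, created on first use.
class LazyPerSession<T> {
  private readonly factory: ProviderFactory<T>;
  private readonly configs = new Map<string, SessionConfig>();
  private readonly instances = new Map<string, T>();

  constructor(factory: ProviderFactory<T>) {
    this.factory = factory;
  }

  // Called on session:start: remembers the config but builds nothing yet.
  start(sessionId: string, config: SessionConfig): void {
    this.configs.set(sessionId, config);
  }

  // Builds the provider on the first request for this session, then reuses it.
  get(sessionId: string): T {
    if (this.instances.has(sessionId)) return this.instances.get(sessionId) as T;
    const config = this.configs.get(sessionId);
    if (config === undefined) throw new Error(`unknown session: ${sessionId}`);
    const provider = this.factory(sessionId, config);
    this.instances.set(sessionId, provider);
    return provider;
  }

  // Drops both the config and any provider so nothing outlives the session.
  end(sessionId: string): void {
    this.configs.delete(sessionId);
    this.instances.delete(sessionId);
  }

  get activeProviders(): number {
    return this.instances.size;
  }
}

// Usage: a stand-in TTS "provider" that only records its voice ID.
let built = 0;
const tts = new LazyPerSession<{ voiceId: number }>((_sessionId, config) => {
  built += 1;
  return { voiceId: config.voiceId ?? 1999504576 };
});
tts.start('tenant-a', { voiceId: 1 });
tts.start('tenant-b', {});
```

Deferring construction until first use means a session that never speaks never pays for a TTS connection, while the per-session map keeps each tenant's configuration separate.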

Quickstart — `startServer`

```typescript
import {
  startServer,
  OpenAISTTProvider,
  AivisSpeechTTSProvider,
  ExternalAgentLLMProvider,
} from '@sonoba/pipeline';

const server = await startServer({
  port: 8080,
  // Per-session factories: each is called once per session:start envelope.
  sttFactory: (sessionId, config) => new OpenAISTTProvider({
    apiKey: process.env.OPENAI_API_KEY!,
    model: 'gpt-4o-mini-transcribe',
    language: config.language ?? 'ja',
  }),
  ttsFactory: (sessionId, config) => new AivisSpeechTTSProvider({
    baseUrl: process.env.AIVISSPEECH_URL ?? 'http://localhost:10101',
    voiceId: config.voiceId ?? 1999504576,
  }),
  // The LLM factory wires in whatever agent you have. ExternalAgentLLMProvider
  // bridges to a separate agent process over WebSocket.
  llmFactory: (sessionId, config, transport) =>
    new ExternalAgentLLMProvider({ transport, sessionId }),
});
```

The server now accepts SDK clients on ws://localhost:8080/. See packages/pipeline/src/ws-handler.ts:32 for the full options surface (verifyClient, transcriptGraceMs, the ownership grace period in ms, the ping interval, and so on).

Protocol parity

This package implements the server side of the protocol declared in protocol.ts of @sonoba/shared:

- Accepts: session:start, audio:vad_start, audio:vad_end, audio:vad_cancel, audio:barge_in, text:input, plus binary mic frames between VAD envelopes.
- Emits: session:ack, transcript:user, transcript:agent (a delta; see below), audio:tts_start, audio:tts_end, plus binary TTS frames.
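The rule that binary mic frames belong between VAD envelopes can be sketched as a small router. This is an illustration, not the server's implementation: MicFrameRouter is a hypothetical name, the `type` discriminator and the `text` field on text:input are assumed envelope shapes (only the message names come from the list above), and discarding buffered audio on audio:vad_cancel is an assumption about that message's meaning.

```typescript
// Assumed envelope shape: message name in a `type` field (hypothetical).
type ClientEnvelope =
  | { type: 'session:start' }
  | { type: 'audio:vad_start' }
  | { type: 'audio:vad_end' }
  | { type: 'audio:vad_cancel' }
  | { type: 'audio:barge_in' }
  | { type: 'text:input'; text: string }; // `text` field is an assumption

type RouterEvent =
  | { kind: 'utterance'; audio: Uint8Array }
  | { kind: 'text'; text: string }
  | { kind: 'other'; type: ClientEnvelope['type'] };

function concat(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}

class MicFrameRouter {
  private inSegment = false;
  private chunks: Uint8Array[] = [];
  droppedFrames = 0;

  // Binary frames are kept only while a VAD segment is open.
  onBinary(frame: Uint8Array): void {
    if (this.inSegment) this.chunks.push(frame);
    else this.droppedFrames += 1;
  }

  onEnvelope(msg: ClientEnvelope): RouterEvent | null {
    switch (msg.type) {
      case 'audio:vad_start':
        this.inSegment = true;
        this.chunks = [];
        return null;
      case 'audio:vad_end': {
        // Close the segment and hand the buffered audio to STT.
        this.inSegment = false;
        const audio = concat(this.chunks);
        this.chunks = [];
        return { kind: 'utterance', audio };
      }
      case 'audio:vad_cancel':
        // Assumption: a cancelled segment is discarded, not transcribed.
        this.inSegment = false;
        this.chunks = [];
        return null;
      case 'text:input':
        return { kind: 'text', text: msg.text };
      default:
        return { kind: 'other', type: msg.type };
    }
  }
}

// Usage: two frames inside a VAD segment become one utterance.
const router = new MicFrameRouter();
router.onEnvelope({ type: 'audio:vad_start' });
router.onBinary(new Uint8Array([1, 2]));
router.onBinary(new Uint8Array([3]));
const event = router.onEnvelope({ type: 'audio:vad_end' });
```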

transcript:agent.text is a delta (per the @sonoba/shared 0.2.0 contract). For each LLM token or inject chunk, the server emits only the text produced since its last emission for the same streamId, then closes the stream with a final { text: '', isFinal: true, streamId } marker. Consumers must accumulate the deltas themselves; the React Provider in @sonoba/voice-client-react does this for you.
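A minimal sketch of the accumulation a non-React consumer has to do, keyed by streamId and closed by the isFinal marker. The field names text, isFinal and streamId come from the contract described above; AgentTranscriptAccumulator is a hypothetical name, treating isFinal as optional on intermediate deltas is an assumption, and this is not the @sonoba/voice-client-react implementation.

```typescript
// Field names follow the transcript:agent contract; isFinal being optional
// on intermediate deltas is an assumption.
interface AgentTranscriptDelta {
  streamId: string;
  text: string;
  isFinal?: boolean;
}

interface AccumulatedTranscript {
  streamId: string;
  text: string; // full text received so far for this stream
  done: boolean; // true once the final marker has arrived
}

class AgentTranscriptAccumulator {
  private readonly partial = new Map<string, string>();

  push(msg: AgentTranscriptDelta): AccumulatedTranscript {
    // Append this delta to whatever the stream has accumulated so far.
    const text = (this.partial.get(msg.streamId) ?? '') + msg.text;
    if (msg.isFinal) {
      // Final marker: report the complete text and forget the stream.
      this.partial.delete(msg.streamId);
      return { streamId: msg.streamId, text, done: true };
    }
    this.partial.set(msg.streamId, text);
    return { streamId: msg.streamId, text, done: false };
  }
}

// Usage: two deltas followed by the empty final marker.
const acc = new AgentTranscriptAccumulator();
acc.push({ streamId: 's1', text: 'こん', isFinal: false });
acc.push({ streamId: 's1', text: 'にちは', isFinal: false });
const finished = acc.push({ streamId: 's1', text: '', isFinal: true });
// finished.text === 'こんにちは', finished.done === true
```

Keeping one buffer per streamId lets interleaved streams accumulate independently, and deleting the buffer on the final marker means a reused streamId starts from an empty string.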
