# Async Audio OpenAI

`@silyze/async-audio-openai` is a full-duplex wrapper around the OpenAI
Realtime API, built on `@silyze/async-audio-stream`. It streams µ-law
(G.711) audio to the `/v1/realtime` WebSocket endpoint, receives
delta/response frames, and exposes them through the familiar
`AudioStream` interface, so you can plug it straight into the rest of
the `@silyze/*` async-stream ecosystem.
- ✔️ 8 kHz µ-law audio codec via `@silyze/async-audio-format-ulaw`
- ✔️ Real-time speech transcripts (`transcript` and `transcriptDelta` streams)
- ✔️ `writeMessage()` helper for sending text prompts
- ✔️ Bridging OpenAI Function Calls with `@silyze/async-openai-function-stream`
- ✔️ DTMF helper for telephony integrations
- ✔️ Optional external TTS integration (`TextToSpeachModel`)
- ✔️ Abort-signal aware and back-pressure friendly
## Install

```sh
npm install @silyze/async-audio-openai \
  @silyze/async-openai-function-stream   # ← only if you need function calls
```

Requires Node ≥ 18 and the `ws` polyfill (a peer dependency) when used
outside browsers.
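On Node versions without a global `WebSocket`, a minimal sketch of one
common setup, assuming the library resolves `WebSocket` from the global
scope (this README does not pin down the exact mechanism):

```ts
import WebSocket from "ws";

// Assumption: the library looks up WebSocket on globalThis.
// Register the ws implementation before opening a session.
(globalThis as any).WebSocket ??= WebSocket;
```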
## Quick Start

```ts
import OpenAiStream, {
OpenAiVoice,
OpenAiSessionConfig,
} from "@silyze/async-audio-openai";
const key = process.env.OPENAI_API_KEY!;
const voice: OpenAiVoice = "alloy";
const session: OpenAiSessionConfig = {
voice,
model: "gpt-4o-audio-preview",
instructions: "You are a helpful voice assistant.",
};
// 1. Create the realtime connection & audio stream.
const { stream, socket } = await OpenAiStream.create(key, session, []);
// 2. Capture microphone in µ-law and pipe to OpenAI.
getMicrophoneStream().pipe(stream);
// 3. Handle transcript lines.
stream.transcript.forEach((line) => {
console.log(`[${line.source}] ${line.content}`);
});
```
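You can also push text into the conversation without audio. A minimal
sketch using the `writeMessage()` and `flush()` helpers documented in
the class reference below (prompt text is illustrative):

```ts
// Steer the assistant mid-conversation with a system message.
await stream.writeMessage("system", "Keep answers under two sentences.");

// Inject a user turn as text instead of audio.
await stream.writeMessage("user", "What can you help me with?");

// If server-side VAD is disabled, commit the input audio buffer manually.
await stream.flush();
```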
## External TTS Mode (Example: ElevenLabs)

Instead of using OpenAI’s built-in voices, you can plug in any
`TextToSpeachModel`. For example, using
`@silyze/async-audio-tts-elevenlabs`:

```ts
import ElevenLabsTextToSpeachModel, {
ElevenLabsRegion,
} from "@silyze/async-audio-tts-elevenlabs";
import OpenAiStream, { OpenAiSessionConfig } from "@silyze/async-audio-openai";
const tts = ElevenLabsTextToSpeachModel.connect({
region: ElevenLabsRegion.auto, // default
voice_id: "JBFqnCBsd6RMkjVDRZzb", // pick a voice ID
"xi-api-key": process.env.ELEVEN_LABS_API_KEY!, // supply your token
settings: {
model_id: "eleven_multilingual_v2",
output_format: "ulaw_8000",
enable_logging: false,
},
init: {
voice_settings: { speed: 1.2 },
},
});
const key = process.env.OPENAI_API_KEY!;
const session: OpenAiSessionConfig = {
model: "gpt-4o-audio-preview",
instructions: "You are a helpful voice assistant.",
};
// 1. Create with external TTS directly
const { stream, socket } = await OpenAiStream.create(key, session, [], tts);
// 2. Microphone → OpenAI
getMicrophoneStream().pipe(stream);
// 3. Transcript lines
stream.transcript.forEach((line) => {
console.log(`[${line.source}] ${line.content}`);
});
```

In this mode:
- `session.voice` is unset automatically.
- `modalities` is set to `["text"]` (no OpenAI audio).
- Assistant text deltas are streamed directly into `tts.speak()`.
- `OpenAiStream.read()` and `.format` automatically reflect the external TTS.
- Transcriptions (`transcript` / `transcriptDelta`) continue to work.
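Because assistant speech now comes from the external TTS, barge-in goes
through `cancelResponse()` (see the class reference below). A minimal
sketch, assuming a hypothetical `telephony` event emitter that reports
when the caller starts speaking:

```ts
// `telephony` is a hypothetical event source from your calling stack.
declare const telephony: { on(event: string, cb: () => void): void };

telephony.on("userStartedSpeaking", () => {
  // Interrupt the assistant; this also stops the external TTS output.
  void stream.cancelResponse?.();
});
```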
## OpenAI Function Calls

Combine `OpenAiStream` with `@silyze/async-openai-function-stream` to
transparently proxy function calls:

```ts
import {
createJsonStreamFunctions,
FunctionStream,
} from "@silyze/async-openai-function-stream";
// Declare the tools OpenAI may call and build the backing collection.
const { tools, collection } = createJsonStreamFunctions([
  {
    type: "function",
    name: "get_time",
    description: "Return the current ISO date/time.",
    parameters: { type: "object", properties: {}, required: [] },
  },
]);

const wire = new FunctionStream(collection); // JSON ⬅️➡️ OpenAI frames
wire.pipe(stream);
// Tell OpenAI about them:
await stream.update(session, tools);
// Later – handle calls:
wire.forEach((frame) => {
if (frame.name === "get_time") {
wire.write({ ...frame, value: new Date().toISOString() });
}
});
```

## Class Reference
```ts
class OpenAiStream implements AudioStream
```
| Property / Method                       | Type                                              | Description                                                                                            |
| --------------------------------------- | ------------------------------------------------- | ------------------------------------------------------------------------------------------------------ |
| `format`                                | `AudioFormat`                                     | µ-law 8 kHz by default, or the external TTS format when active.                                         |
| `ready`                                 | `Promise<void>`                                   | Resolves when the `session.updated` event arrives (and TTS is ready).                                   |
| `transcript`                            | `AsyncReadStream<Transcript>`                     | Emits `{source, content}` for finalized user & agent transcripts.                                       |
| `transcriptDelta`                       | `AsyncReadStream<Transcript>`                     | Emits incremental assistant transcript deltas (live captions).                                          |
| `write(chunk)`                          | `(Buffer) → Promise<void>`                        | Write raw µ-law audio to the session.                                                                   |
| `writeMessage(role, text)`              | `("user" \| "system", string) → Promise<void>`    | Send a text message to the conversation.                                                                |
| `handleFunctionCalls(functionStream)`   | `Promise<void>`                                   | Bidirectional pipe of OpenAI function-call frames.                                                      |
| `handleDtmf(dtmfStream, signal)`        | `Promise<void>`                                   | Converts DTMF digits into system messages.                                                              |
| `update(config, tools, [tts])`          | `Promise<void>`                                   | Hot-update the session (voice, temperature, tools). If `tts` is provided, switches to text-only mode.   |
| `flush()`                               | `Promise<void>`                                   | Commit the input audio buffer (needed if server-side VAD is disabled).                                  |
| `cancelResponse()` (optional)           | `Promise<void>`                                   | Interrupt the current assistant output and stop TTS.                                                    |
| `close()`                               | `Promise<void>`                                   | Gracefully closes the stream, socket, and TTS.                                                          |
| `static create(key, cfg, tools, [tts])` | `Promise<{stream, socket}>`                       | Opens the WebSocket & performs the initial `session.update`.                                            |
| `static from(ws)`                       | `Promise<OpenAiStream>`                           | Wrap an existing WebSocket.                                                                             |
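For telephony integrations, `handleDtmf` converts incoming DTMF digits
into system messages so the model can react to keypad input. A minimal
sketch, assuming a hypothetical `dtmfStream` supplied by your telephony
layer:

```ts
const controller = new AbortController();

// Hypothetical digit source; its concrete type comes from handleDtmf.
declare const dtmfStream: Parameters<typeof stream.handleDtmf>[0];

// Digits are converted into system messages until the signal aborts.
const done = stream.handleDtmf(dtmfStream, controller.signal);

// When the call hangs up:
controller.abort();
await done;
```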
## Session Config

```ts
interface OpenAiSessionConfig {
voice?: OpenAiVoice; // Built-in OpenAI voice (omit when using external TTS)
instructions: string; // System prompt
model: string; // e.g. "gpt-4o-audio-preview"
temperature?: number; // default 0.2
transcriptionModel?: string; // default "whisper-1"
functions?: FunctionTool[]; // OpenAI function-call tools
}
```
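All of these fields can be hot-updated mid-session via `update()` (see
the class reference above). A minimal sketch that swaps the
instructions and raises the temperature during a call (values are
illustrative):

```ts
await stream.update(
  {
    ...session,
    instructions: "From now on, answer only in French.",
    temperature: 0.7,
  },
  [] // pass your current tools here to keep them registered
);
```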
## Related Packages

| Package                                | Purpose                                    |
| -------------------------------------- | ------------------------------------------ |
| `@silyze/async-audio-stream`           | Core duplex audio interfaces.              |
| `@silyze/async-audio-format-ulaw`      | µ-law codec used internally.               |
| `@silyze/async-openai-function-stream` | JSON-RPC bridge for OpenAI Function Calls. |
| `@silyze/async-audio-tts`              | Unified interface for external TTS models. |
| `@silyze/async-audio-tts-elevenlabs`   | ElevenLabs `TextToSpeachModel` adapter.    |
