
azure-realtime-webrtc

v0.2.1

TypeScript SDK for Azure OpenAI Realtime API with WebRTC and WebSocket support


azure-realtime-webrtc is the missing npm package for Azure OpenAI's Realtime API. It handles the complex wiring of ephemeral tokens, SDP negotiation, WebRTC data channels, audio streams, and function calling — so you can build voice AI in minutes, not days.

// 5 lines to a working voice assistant
import { VoiceAssistant } from "azure-realtime-webrtc/sdk";

const assistant = new VoiceAssistant({
  resource: "my-resource",
  deployment: "gpt-4o-realtime-preview",
  tokenProvider: () => fetch("/api/token", { method: "POST" }).then(r => r.json()).then(d => d.token),
  instructions: "You are a helpful assistant.",
  voice: "alloy",
});

assistant.on("transcript", (entries) => renderConversation(entries));
assistant.on("stateChange", (state) => updateUI(state)); // "listening" | "thinking" | "speaking"
await assistant.start();

What's Inside

| Entry Point | Purpose |
|-------------|---------|
| azure-realtime-webrtc | Low-level WebRTC & WebSocket client, typed events, audio management |
| azure-realtime-webrtc/sdk | High-level classes: VoiceAssistant, TextChat, ToolAgent |
| azure-realtime-webrtc/streaming | Async iterators, ReadableStreams, Server-Sent Events |
| azure-realtime-webrtc/server | Express middleware: token server, SDP proxy, Entra ID auth |

Features

| Feature | Details |
|---------|---------|
| WebRTC + WebSocket | Both connection modes with a unified API |
| Full TypeScript | Strict types for all 32+ server events and 11 client events |
| SDK: VoiceAssistant | Complete voice chat with state machine (listening → thinking → speaking) |
| SDK: TextChat | Streaming text chat with message history |
| SDK: ToolAgent | Autonomous multi-step tool calling with execution trace |
| Streaming | for await iterators, Web ReadableStreams, SSE handler |
| Audio-Synced Text | Transcript streams in sync with the AI's voice playback |
| Function Calling | registerTool() with automatic call → execute → respond cycle |
| Express Middleware | Drop-in token server with rate limiting, CORS, input validation |
| Both Auth Methods | API Key and Microsoft Entra ID |
| Zero Runtime Deps | Only express as optional peer dep for the server module |
| Security First | API keys never reach the browser. Ephemeral tokens only. |

Install

npm install azure-realtime-webrtc

For the server module:

npm install azure-realtime-webrtc express

For Entra ID:

npm install azure-realtime-webrtc express @azure/identity

Prerequisites

You need three things from the Azure Portal:

| Value | Where to find it | Example |
|-------|------------------|---------|
| Resource name | Your Azure OpenAI resource URL: https://<THIS>.openai.azure.com | my-openai-resource |
| API Key | Azure Portal → Your OpenAI resource → Keys and Endpoint | abc123... |
| Deployment name | Azure AI Foundry → Deployments (must be a realtime model) | gpt-4o-realtime-preview |

Your deployment must be a realtime-capable model deployed in East US 2 or Sweden Central.
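Since all three values above must line up before anything connects, it can help to fail fast at startup. A minimal sketch, assuming standard environment variable names (the helper names `loadRealtimeEnv` and `endpointFor` are mine, not part of the package):

```typescript
// Hypothetical startup check: verify the three required Azure values are present
// and derive the resource endpoint shown in the table above.
interface RealtimeEnv {
  resource: string;
  deployment: string;
  apiKey: string;
}

function loadRealtimeEnv(env: Record<string, string | undefined>): RealtimeEnv {
  const resource = env.AZURE_RESOURCE;
  const deployment = env.AZURE_DEPLOYMENT;
  const apiKey = env.AZURE_OPENAI_API_KEY;
  if (!resource || !deployment || !apiKey) {
    throw new Error(
      "Missing AZURE_RESOURCE, AZURE_DEPLOYMENT, or AZURE_OPENAI_API_KEY"
    );
  }
  return { resource, deployment, apiKey };
}

// The resource name maps directly onto the Azure OpenAI endpoint URL.
function endpointFor(resource: string): string {
  return `https://${resource}.openai.azure.com`;
}
```

Calling `loadRealtimeEnv(process.env)` once at boot surfaces a missing key immediately instead of as a 401 on the first token request.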

Architecture

Browser                          Your Server                    Azure OpenAI
  │                                  │                              │
  │  POST /api/realtime/token        │                              │
  │─────────────────────────────────>│  POST /client_secrets        │
  │                                  │─────────────────────────────>│
  │                                  │  { value: ephemeral_token }  │
  │  { token: ephemeral_token }      │<─────────────────────────────│
  │<─────────────────────────────────│                              │
  │                                                                 │
  │  WebRTC SDP offer + ephemeral token                             │
  │────────────────────────────────────────────────────────────────>│
  │                                         SDP answer              │
  │<────────────────────────────────────────────────────────────────│
  │                                                                 │
  │  ◄══════════ Bidirectional Audio (WebRTC media) ═══════════►   │
  │  ◄══════════ JSON Events (WebRTC data channel)  ═══════════►   │

Your API key never leaves your server. Only short-lived ephemeral tokens reach the browser.


Quick Start

Step 1: Token Server (Node.js)

import express from "express";
import { createRealtimeMiddleware } from "azure-realtime-webrtc/server";

const app = express();
app.use(createRealtimeMiddleware({
  resource: "my-resource",
  deployment: "gpt-4o-realtime-preview",
  auth: { type: "api-key", apiKey: process.env.AZURE_OPENAI_API_KEY! },
  session: {
    instructions: "You are a helpful assistant.",
    audio: { output: { voice: "alloy" } },
  },
  express,
}));
app.listen(3001);

Creates: POST /api/realtime/token · POST /api/realtime/negotiate · GET /api/realtime/health

Step 2: Browser Client

import { RealtimeClient } from "azure-realtime-webrtc";

const res = await fetch("/api/realtime/token", { method: "POST" });
const { token } = await res.json();

const client = new RealtimeClient({
  resource: "my-resource",
  deployment: "gpt-4o-realtime-preview",
  ephemeralToken: token,
  webrtcFilter: true,
});

client.on("session.created", () => console.log("Ready!"));
client.on("response.audio_transcript.delta", (e) => console.log(e.delta));
client.on("error", (e) => console.error(e.error.message));

await client.connect(); // mic + audio playback handled automatically

client.addItem({
  type: "message", role: "user",
  content: [{ type: "input_text", text: "Hello!" }],
});
client.createResponse();

High-Level SDK

Import from azure-realtime-webrtc/sdk. These classes handle all event wiring, state management, and conversation lifecycle for you.

VoiceAssistant

Complete voice chat with automatic state machine: idle → connecting → listening → thinking → speaking → listening

import { VoiceAssistant } from "azure-realtime-webrtc/sdk";

const assistant = new VoiceAssistant({
  resource: "my-resource",
  deployment: "gpt-4o-realtime-preview",
  tokenProvider: async () => {
    const res = await fetch("/api/realtime/token", { method: "POST" });
    return (await res.json()).token;
  },
  instructions: "You are a travel advisor.",
  voice: "coral",
  transcriptionModel: "whisper-1",
});

assistant.on("transcript", (entries) => {
  // Full conversation — renders both user speech and AI responses
  entries.forEach((e) => console.log(`[${e.role}] ${e.text}${e.partial ? "..." : ""}`));
});

assistant.on("stateChange", (state) => updateStatusBadge(state));

await assistant.start();            // connects, requests mic, starts listening
assistant.sendText("Beach destinations in Europe?");
assistant.setMuted(true);           // mute mic
assistant.interrupt();              // stop AI mid-sentence
assistant.updateInstructions("Focus on budget options.");
assistant.stop();                   // disconnect

Events: transcript · stateChange · userSpeechStarted · userSpeechStopped · assistantAudioStarted · assistantAudioStopped · error · rawEvent
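The four speech/audio events above pair naturally into a single "who is talking" indicator for the UI. A sketch of one way to collapse them (the reducer and type names are mine, not part of the SDK):

```typescript
// Hypothetical reducer: fold VoiceAssistant's speech/audio events into one UI flag.
type SpeechEvent =
  | "userSpeechStarted"
  | "userSpeechStopped"
  | "assistantAudioStarted"
  | "assistantAudioStopped";

type Indicator = "user-speaking" | "assistant-speaking" | "quiet";

function nextIndicator(current: Indicator, event: SpeechEvent): Indicator {
  switch (event) {
    case "userSpeechStarted":
      return "user-speaking";
    case "assistantAudioStarted":
      return "assistant-speaking";
    case "userSpeechStopped":
      // Only clear the flag if the user was the one speaking.
      return current === "user-speaking" ? "quiet" : current;
    case "assistantAudioStopped":
      return current === "assistant-speaking" ? "quiet" : current;
  }
}

// Wiring (assumes `assistant` is a connected VoiceAssistant):
// let indicator: Indicator = "quiet";
// (["userSpeechStarted", "userSpeechStopped",
//   "assistantAudioStarted", "assistantAudioStopped"] as const)
//   .forEach((ev) => assistant.on(ev, () => {
//     indicator = nextIndicator(indicator, ev);
//   }));
```

Keeping the transition logic pure like this makes it trivial to unit-test, independent of the live connection.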

TextChat

Streaming text chat. No mic complexity. Ideal for chatbots.

import { TextChat } from "azure-realtime-webrtc/sdk";

const chat = new TextChat({
  resource: "my-resource",
  deployment: "gpt-4o-realtime-preview",
  tokenProvider,
  instructions: "You are customer support for Acme Corp.",
});

chat.on("message", (msg) => {
  if (msg.streaming) updateBubble(msg.id, msg.content);
  else finalizeBubble(msg.id, msg.content);
});

chat.on("responseStart", () => showTypingIndicator());
chat.on("responseEnd", () => hideTypingIndicator());

await chat.connect();
chat.send("How do I reset my password?");

Events: message · messages · responseStart · responseEnd · connected · error · rawEvent

ToolAgent

Autonomous agent that handles multi-turn tool calling loops. Send a task, get a result with full execution trace.

import { ToolAgent } from "azure-realtime-webrtc/sdk";

const agent = new ToolAgent({
  resource: "my-resource",
  deployment: "gpt-4o-realtime-preview",
  tokenProvider,
  instructions: "You are a research assistant. Use tools to find information.",
  maxToolRounds: 10,
});

agent.registerTool({
  definition: {
    type: "function", name: "web_search",
    description: "Search the web",
    parameters: { type: "object", properties: { query: { type: "string" } }, required: ["query"] },
  },
  handler: async (args) => JSON.stringify(await searchWeb(args.query)),
});

agent.on("step", (step) => console.log(`[${step.type}] ${step.content}`));

await agent.connect();
const result = await agent.run("Latest WebRTC developments in 2026");
console.log(result.response);        // Final answer
console.log(result.toolCallCount);   // How many tool calls were made
console.log(result.steps);           // Full execution trace

Events: step · toolCall · toolResult · textDelta · runComplete · connected · error · rawEvent


Streaming

Import from azure-realtime-webrtc/streaming. Works with any client — RealtimeClient, VoiceAssistant, TextChat, or ToolAgent.

Async Iterators

import { transcriptStream, audioStream, eventStream } from "azure-realtime-webrtc/streaming";

// Stream transcript word by word
for await (const chunk of transcriptStream(client)) {
  if (chunk.type === "delta") process.stdout.write(chunk.text);
  if (chunk.type === "done") console.log(`\n[${chunk.role}] Complete`);
}

// Stream all events
for await (const { type, event } of eventStream(client)) {
  console.log(type, event);
}

Web ReadableStreams

import { createTranscriptReadableStream } from "azure-realtime-webrtc/streaming";

const stream = createTranscriptReadableStream(client);
const reader = stream.getReader();

while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  document.getElementById("output").textContent += value.text;
}

// Or pipe to a Response (Next.js streaming route, Cloudflare Worker)
return new Response(stream.pipeThrough(new TransformStream({
  transform(chunk, ctrl) { ctrl.enqueue(`data: ${JSON.stringify(chunk)}\n\n`); },
})), { headers: { "Content-Type": "text/event-stream" } });

Server-Sent Events (SSE)

import { createSSEHandler } from "azure-realtime-webrtc/streaming";

// Express endpoint
app.get("/api/stream", async (req, res) => {
  const client = new RealtimeClient({ ... });
  await client.connect();
  createSSEHandler(client, { events: "transcript" })(req, res);
});

// Browser
const source = new EventSource("/api/stream");
source.addEventListener("response.output_audio_transcript.delta", (e) => {
  console.log(JSON.parse(e.data).text);
});

Audio-Synced Transcript

Text transcript arrives faster than audio plays. To show text in sync with the AI's voice:

let wordBuffer = [], displayedText = "", dripTimer = null;

client.on("output_audio_buffer.started", () => {
  dripTimer = setInterval(() => {
    const word = wordBuffer.shift();
    if (word) { displayedText += word; render(displayedText); }
  }, 285); // ~3.5 words/sec = natural speech pace
});

client.on("output_audio_buffer.stopped", () => {
  displayedText += wordBuffer.join(""); wordBuffer = [];
  clearInterval(dripTimer); render(displayedText);
});

for await (const chunk of transcriptStream(client)) {
  if (chunk.role === "assistant" && chunk.type === "delta") {
    wordBuffer.push(chunk.text); // buffered, not shown yet
  }
}
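The 285 ms interval above comes from targeting roughly 3.5 words per second. If you want to tune the pace rather than hard-code it, the conversion is just the reciprocal (helper name is mine):

```typescript
// Hypothetical helper: convert a words-per-second target into a drip interval.
function dripIntervalMs(wordsPerSecond: number): number {
  return Math.round(1000 / wordsPerSecond);
}
// dripIntervalMs(3.5) ≈ 286 ms, close to the 285 ms used above.
```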

Framework Guides

React

import { useRef, useState, useCallback, useEffect } from "react";
import { VoiceAssistant, TranscriptEntry, VoiceAssistantState } from "azure-realtime-webrtc/sdk";

export function useVoiceAssistant() {
  const ref = useRef<VoiceAssistant | null>(null);
  const [state, setState] = useState<VoiceAssistantState>("idle");
  const [transcript, setTranscript] = useState<TranscriptEntry[]>([]);

  const start = useCallback(async () => {
    const a = new VoiceAssistant({
      resource: process.env.NEXT_PUBLIC_AZURE_RESOURCE!,
      deployment: process.env.NEXT_PUBLIC_AZURE_DEPLOYMENT!,
      tokenProvider: async () => {
        const res = await fetch("/api/realtime/token", { method: "POST" });
        return (await res.json()).token;
      },
      instructions: "You are a helpful assistant.",
      voice: "alloy",
    });
    a.on("stateChange", setState);
    a.on("transcript", setTranscript);
    ref.current = a;
    await a.start();
  }, []);

  const stop = useCallback(() => { ref.current?.stop(); ref.current = null; }, []);
  const sendText = useCallback((t: string) => ref.current?.sendText(t), []);
  const toggleMute = useCallback(() => {
    const a = ref.current;
    if (a) a.setMuted(!a.isMuted);
  }, []);

  useEffect(() => () => { ref.current?.stop(); }, []);

  return { state, transcript, start, stop, sendText, toggleMute };
}

Next.js (App Router)

// app/api/realtime/token/route.ts
import { NextResponse } from "next/server";

export async function POST() {
  const res = await fetch(
    `https://${process.env.AZURE_RESOURCE}.openai.azure.com/openai/v1/realtime/client_secrets`,
    {
      method: "POST",
      headers: { "api-key": process.env.AZURE_OPENAI_API_KEY!, "Content-Type": "application/json" },
      body: JSON.stringify({
        session: { type: "realtime", model: process.env.AZURE_DEPLOYMENT! },
      }),
    }
  );
  const data = await res.json();
  return NextResponse.json({ token: data.value });
}

Vue / Nuxt

<script setup lang="ts">
import { ref, onUnmounted } from "vue";
import { VoiceAssistant } from "azure-realtime-webrtc/sdk";

const state = ref("idle");
const transcript = ref([]);
let assistant = null;

async function start() {
  assistant = new VoiceAssistant({
    resource: import.meta.env.VITE_AZURE_RESOURCE,
    deployment: import.meta.env.VITE_AZURE_DEPLOYMENT,
    tokenProvider: async () => (await fetch("/api/realtime/token", { method: "POST" }).then(r => r.json())).token,
  });
  assistant.on("stateChange", (s) => (state.value = s));
  assistant.on("transcript", (t) => (transcript.value = t));
  await assistant.start();
}

onUnmounted(() => assistant?.stop());
</script>

Angular

import { Injectable, OnDestroy } from "@angular/core";
import { BehaviorSubject } from "rxjs";
import { VoiceAssistant, TranscriptEntry, VoiceAssistantState } from "azure-realtime-webrtc/sdk";

@Injectable({ providedIn: "root" })
export class RealtimeService implements OnDestroy {
  private assistant: VoiceAssistant | null = null;
  state$ = new BehaviorSubject<VoiceAssistantState>("idle");
  transcript$ = new BehaviorSubject<TranscriptEntry[]>([]);

  async start() {
    this.assistant = new VoiceAssistant({ /* config */ });
    this.assistant.on("stateChange", (s) => this.state$.next(s));
    this.assistant.on("transcript", (t) => this.transcript$.next(t));
    await this.assistant.start();
  }

  stop() { this.assistant?.stop(); }
  ngOnDestroy() { this.stop(); }
}

Vanilla JavaScript

<script type="module">
  import { VoiceAssistant } from "https://cdn.jsdelivr.net/npm/azure-realtime-webrtc/dist/sdk.js";

  const assistant = new VoiceAssistant({
    resource: "my-resource",
    deployment: "gpt-4o-realtime-preview",
    tokenProvider: async () => (await fetch("/api/token", { method: "POST" }).then(r => r.json())).token,
  });
  assistant.on("transcript", (e) => { /* render */ });
  await assistant.start();
</script>

Node.js (Server-Only)

import { RealtimeClient } from "azure-realtime-webrtc";

const client = new RealtimeClient({
  resource: "my-resource",
  deployment: "gpt-4o-realtime-preview",
  mode: "websocket",
  auth: { type: "api-key", apiKey: process.env.AZURE_OPENAI_API_KEY! },
});

client.on("response.text.done", (e) => console.log("AI:", e.text));
await client.connect();

client.addItem({ type: "message", role: "user", content: [{ type: "input_text", text: "Explain WebRTC." }] });
client.createResponse();

API Reference

RealtimeClient

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| resource | string | required | Azure resource name |
| deployment | string | required | Model deployment name |
| mode | "webrtc" \| "websocket" | "webrtc" | Connection mode |
| ephemeralToken | string | — | Token from your server (browser) |
| auth | AuthConfig | — | Direct auth (server-side) |
| session | SessionConfig | — | Session config |
| webrtcFilter | boolean | false | Filter data channel events |
| autoMicrophone | boolean | true | Auto-request mic |
| channelTimeout | number | 10000 | Data channel open timeout (ms) |

| Method | Description |
|--------|-------------|
| connect() | Connect to Azure OpenAI Realtime |
| send(event) | Send any client event |
| createResponse() | Trigger model response |
| updateSession(config) | Update session mid-conversation |
| addItem(item) | Add a conversation item |
| registerTool(reg) | Register a function tool with auto-handling |
| setMicrophoneMuted(muted) | Mute/unmute mic (WebRTC only) |
| disconnect() | Disconnect and cleanup |

createRealtimeMiddleware(options)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| resource | string | required | Azure resource name |
| deployment | string | required | Deployment name |
| auth | AuthConfig | required | Server-side auth |
| session | SessionConfig | — | Default session config |
| prefix | string | "/api/realtime" | Route prefix |
| corsOrigin | string | "*" | CORS origin |
| rateLimit | number | 10 | Requests/min/IP |
| express | Express | — | Required in ESM |

createEntraAuth(options?)

import { createEntraAuth } from "azure-realtime-webrtc/server";
const auth = await createEntraAuth({ tenantId: "...", clientId: "..." });

Session Configuration

const session = {
  instructions: "You are a helpful assistant.",
  audio: {
    output: { voice: "alloy", format: "pcm16" },
    input: {
      format: "pcm16",
      transcription: { model: "whisper-1" },
      turn_detection: {
        type: "server_vad",
        threshold: 0.5,
        prefix_padding_ms: 300,
        silence_duration_ms: 200,
        create_response: true,
      },
    },
  },
  modalities: ["audio", "text"],
  temperature: 0.8,
  max_response_output_tokens: 4096,
  tools: [{ type: "function", name: "...", description: "...", parameters: { ... } }],
  tool_choice: "auto",
};

Voices: alloy · ash · ballad · coral · echo · sage · shimmer · verse · marin

Function Calling

client.registerTool({
  definition: {
    type: "function", name: "get_weather",
    description: "Get current weather for a city",
    parameters: { type: "object", properties: { city: { type: "string" } }, required: ["city"] },
  },
  handler: async (args: { city: string }) => {
    const data = await fetchWeather(args.city);
    return JSON.stringify(data);
  },
});
// Model calls get_weather → handler runs → result sent back → model continues

Events Reference

Server Events

| Category | Events |
|----------|--------|
| Session | session.created · session.updated |
| Conversation | conversation.created · conversation.item.created · conversation.item.deleted · conversation.item.truncated |
| User Speech | input_audio_buffer.speech_started · input_audio_buffer.speech_stopped · conversation.item.input_audio_transcription.completed |
| AI Transcript | response.audio_transcript.delta · response.audio_transcript.done · response.output_audio_transcript.delta · response.output_audio_transcript.done |
| AI Text | response.text.delta · response.text.done · response.output_text.delta · response.output_text.done |
| AI Audio | response.audio.delta · response.audio.done · output_audio_buffer.started · output_audio_buffer.stopped |
| Tool Calls | response.function_call_arguments.delta · response.function_call_arguments.done |
| Response | response.created · response.done · response.output_item.added · response.content_part.added |
| Other | error · rate_limits.updated |

Wildcard + Connection

client.on("*", (event) => console.log(event.type));    // all events
client.on("connected", () => { });                      // connected
client.on("disconnected", ({ reason }) => { });         // disconnected

Supported Models

| Model | Version |
|-------|---------|
| gpt-4o-mini-realtime-preview | 2024-12-17 |
| gpt-4o-realtime-preview | 2024-12-17 |
| gpt-realtime | 2025-08-28 |
| gpt-realtime-mini | 2025-10-06, 2025-12-15 |
| gpt-realtime-1.5 | 2026-02-23 |

Regions: East US 2 and Sweden Central only.

Security

| Measure | Details |
|---------|---------|
| Token isolation | API keys never reach the browser — only short-lived ephemeral tokens |
| Rate limiting | Built-in per-IP sliding window (configurable) |
| Input validation | SDP offers validated (format + 64KB max). JSON bodies validated. |
| Security headers | Cache-Control: no-store · X-Content-Type-Options: nosniff |
| No eval | All JSON parsed with JSON.parse. No eval() or Function(). |
| CORS | Configurable origin restrictions |

Production Checklist

  • [ ] Set corsOrigin to your specific domain(s)
  • [ ] Use environment variables for API keys
  • [ ] Deploy behind HTTPS
  • [ ] Use Entra ID instead of API keys (createEntraAuth())
  • [ ] Set up Azure billing alerts
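Most checklist items map directly onto middleware options. A hedged configuration sketch, assuming the `createEntraAuth` result plugs into the middleware's `auth` option and that the domain, tenant, and client IDs below are placeholders you replace with your own:

```typescript
import express from "express";
import { createRealtimeMiddleware, createEntraAuth } from "azure-realtime-webrtc/server";

const app = express();

// Entra ID instead of a raw API key (checklist item 4).
const auth = await createEntraAuth({
  tenantId: process.env.AZURE_TENANT_ID!,
  clientId: process.env.AZURE_CLIENT_ID!,
});

app.use(createRealtimeMiddleware({
  resource: process.env.AZURE_RESOURCE!,       // env vars, not literals (item 2)
  deployment: process.env.AZURE_DEPLOYMENT!,
  auth,
  corsOrigin: "https://app.example.com",       // lock CORS to your domain (item 1)
  rateLimit: 30,                               // requests/min/IP, tune to your traffic
  express,
}));

app.listen(3001); // serve behind an HTTPS-terminating proxy in production (item 3)
```

Billing alerts (the last item) live in the Azure Portal rather than in code.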

Troubleshooting

| Issue | Solution |
|-------|----------|
| Token request 500 | Voice must be nested: audio.output.voice. Transcription: audio.input.transcription.model. Not flat fields. |
| No transcript streaming | Listen to BOTH event names: response.audio_transcript.delta AND response.output_audio_transcript.delta |
| RTCPeerConnection not available | Use mode: "websocket" for server-side |
| Mic permission denied | Check browser permissions. HTTPS required in production. |
| Data channel timeout | Verify deployment is in East US 2 or Sweden Central |
| Auth 401 | Check API key belongs to the correct resource |
| Text arrives before audio | Use the Audio-Synced Transcript pattern |

Author & Maintainer

Komal Vardhan Lolugu Lead Product Engineer — Agentic AI & Generative Models

| Platform | Link |
|----------|------|
| Portfolio | komalsrinivas.vercel.app |
| LinkedIn | linkedin.com/in/komalvardhanlolugu |
| GitHub | github.com/komalSrinivasan |
| Medium | komalvardhan.medium.com |
| Topmate | topmate.io/komal_vardhan_lolugu |

For bugs, questions, or collaboration — reach out via LinkedIn or open an issue.

License

MIT