@kognitivedev/voice

v0.2.29

Browser voice agent layer for Kognitive with OpenAI Realtime and Gemini Live runtimes

Browser voice agents for Kognitive with built-in direct runtimes plus a backend-executed pipeline mode.

Installation

bun add @kognitivedev/voice @kognitivedev/tools

What It Provides

  • createVoiceAgent()
  • createVoiceAgentNetwork()
  • issueRealtimeClientSecret()
  • createBrowserVoiceSession()
  • @kognitivedev/voice/telephony for server-side telephony adapters and session management

Runtime Adapters

@kognitivedev/voice ships with three runtime modes:

  • openai-realtime via WebRTC
  • gemini-live via WebSocket
  • kognitive-voice via the backend pipeline runtime, bootstrapped with /session

createBrowserVoiceSession() selects the adapter automatically from prepare.runtime.provider, so the browser code stays the same after prepare().
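
As a rough illustration of that selection step, the sketch below maps a provider string to the transport listed above. The names PreparedLike, TRANSPORT_BY_PROVIDER, and selectTransport are hypothetical and are not part of @kognitivedev/voice; the package's real adapter resolution may look quite different.

```typescript
// Hypothetical illustration only: none of these names come from @kognitivedev/voice.
type Provider = "openai-realtime" | "gemini-live" | "kognitive-voice";
type Transport = "webrtc" | "websocket" | "backend-pipeline";

// Minimal stand-in for the part of a prepare result that drives adapter selection.
interface PreparedLike {
  runtime: { provider: Provider };
}

// Provider-to-transport pairs as listed in the Runtime Adapters section.
const TRANSPORT_BY_PROVIDER: Record<Provider, Transport> = {
  "openai-realtime": "webrtc",
  "gemini-live": "websocket",
  "kognitive-voice": "backend-pipeline",
};

function selectTransport(prepared: PreparedLike): Transport {
  return TRANSPORT_BY_PROVIDER[prepared.runtime.provider];
}
```

Keeping the lookup keyed on the provider string is what lets the calling code stay identical across runtimes.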

Runtime Differences

  • OpenAI Realtime supports short-lived client secrets via issueRealtimeClientSecret().
  • OpenAI Realtime supports live instruction updates and in-session hot handoff when the next specialist stays on the same provider/runtime.
  • Gemini Live uses credentialsEndpoint, credentials, or getCredentials() instead of OpenAI client secrets.
  • Gemini Live currently requires reconnects for instruction changes and cross-agent handoff.
  • Kognitive pipeline mode keeps the browser API the same, but the live STT/LLM/TTS execution runs in the separate Python apps/voice-runtime service.

Quick Start

OpenAI Realtime

import { createVoiceAgent, createBrowserVoiceSession } from "@kognitivedev/voice";
import { createTool } from "@kognitivedev/tools";
import { z } from "zod";

const weatherTool = createTool({
  id: "weather-lookup",
  description: "Look up weather",
  inputSchema: z.object({ city: z.string() }),
  execute: async ({ city }) => ({ city, temperature: 24 }),
});

const voiceAgent = createVoiceAgent({
  name: "voice-assistant",
  instructions: "Help the user in a concise spoken style.",
  runtime: {
    provider: "openai-realtime",
    model: "gpt-realtime",
    voice: "marin",
  },
  tools: [weatherTool],
});

const prepared = await voiceAgent.prepare({
  resourceId: { userId: "user_1" },
});

const session = createBrowserVoiceSession({
  prepare: prepared,
  clientSecretEndpoint: "/api/kognitive/voice/agents/voice-assistant/client-secret",
  toolEndpoint: "/api/kognitive/voice/agents/voice-assistant/tools/execute",
});

await session.connect();
session.sendText("What is the weather in Paris?");

VoiceSessionState.messages uses KognitiveUIMessage[], so existing @kognitivedev/ui tool registrations can render voice tool calls and results without a second registry.

Context Adapters

Voice agents accept contextAdapters under a contract that is structurally compatible with the one used by @kognitivedev/agents. Adapters resolve during prepare() and can append session instructions or add session-scoped tools:

import { createCloudKnowledgeBaseContextAdapter } from "@kognitivedev/cloud-knowledge-base";

const voiceAgent = createVoiceAgent({
  name: "voice-assistant",
  instructions: "Help the user in a concise spoken style.",
  contextAdapters: [
    createCloudKnowledgeBaseContextAdapter({
      pipelineId: "support-docs",
      autoRegisterSearchTool: true,
    }),
  ],
});

Because voice prepare() usually happens before the live conversation starts, context adapters are best for session bootstrap context such as account state, tenant configuration, preloaded knowledge, and tool registration. For turn-by-turn retrieval based on the user's spoken request, expose retrieval as a tool so the realtime model can call it during the session.
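
To make the bootstrap idea concrete, here is a minimal sketch of how adapter output could be folded into a session before it starts. The shapes ToolLike, AdapterResultLike, and SessionBootstrap and the function foldAdapterResults are assumptions made for illustration; the real VoiceContextAdapterResult type is not documented here and may differ.

```typescript
// Hypothetical shapes: the real VoiceContextAdapterResult may differ.
interface ToolLike {
  id: string;
}

interface AdapterResultLike {
  instructions?: string; // extra session instructions to append
  tools?: ToolLike[];    // session-scoped tools to register
}

interface SessionBootstrap {
  instructions: string;
  tools: ToolLike[];
}

// Appends adapter instructions in order and registers each tool id once.
function foldAdapterResults(
  base: SessionBootstrap,
  results: AdapterResultLike[],
): SessionBootstrap {
  const parts: string[] = [base.instructions];
  const toolsById = new Map<string, ToolLike>();
  for (const tool of base.tools) toolsById.set(tool.id, tool);

  for (const result of results) {
    if (result.instructions) parts.push(result.instructions);
    for (const tool of result.tools ?? []) {
      if (!toolsById.has(tool.id)) toolsById.set(tool.id, tool);
    }
  }

  return {
    instructions: parts.join("\n\n"),
    tools: Array.from(toolsById.values()),
  };
}
```

Because this folding happens once, before the call starts, anything that depends on what the user says later belongs in a tool rather than in an adapter.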

Gemini Live

import { createVoiceAgent, createBrowserVoiceSession } from "@kognitivedev/voice";

const voiceAgent = createVoiceAgent({
  name: "voice-assistant",
  instructions: "Help the user in a concise spoken style.",
  runtime: {
    provider: "gemini-live",
    model: "gemini-3.1-flash-live-preview",
    voice: "Aoede",
  },
});

const prepared = await voiceAgent.prepare({
  resourceId: { userId: "user_1" },
});

const session = createBrowserVoiceSession({
  prepare: prepared,
  credentialsEndpoint: "/api/kognitive/voice/agents/voice-assistant/credentials",
  toolEndpoint: "/api/kognitive/voice/agents/voice-assistant/tools/execute",
});

await session.connect();
session.sendText("Summarize what I said in one sentence.");

Credentials

  • Use clientSecretEndpoint or issueRealtimeClientSecret() for OpenAI Realtime sessions.
  • Use credentialsEndpoint, credentials, or getCredentials() for Gemini Live sessions.
  • Use sessionEndpoint, session, or getSession() for backend pipeline sessions.
  • The mounted Kognitive runtime now exposes /session as the canonical bootstrap endpoint. /client-secret and /credentials remain available for direct runtimes.
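
The pairing of provider and endpoint option can be sketched as a small helper. endpointOptionsFor is a hypothetical name, and the default base path is only an assumption taken from the URLs used in the examples in this README; adjust it to wherever your routes are mounted.

```typescript
// Hypothetical helper: not exported by @kognitivedev/voice.
type Provider = "openai-realtime" | "gemini-live" | "kognitive-voice";

type EndpointOptions =
  | { clientSecretEndpoint: string }
  | { credentialsEndpoint: string }
  | { sessionEndpoint: string };

function endpointOptionsFor(
  provider: Provider,
  agentName: string,
  basePath = "/api/kognitive/voice/agents", // assumed mount point, as in the examples
): EndpointOptions {
  const root = `${basePath}/${encodeURIComponent(agentName)}`;
  switch (provider) {
    case "openai-realtime":
      return { clientSecretEndpoint: `${root}/client-secret` };
    case "gemini-live":
      return { credentialsEndpoint: `${root}/credentials` };
    case "kognitive-voice":
      return { sessionEndpoint: `${root}/session` };
  }
}
```

The result could then be spread into the options passed to createBrowserVoiceSession() next to prepare.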

Pipeline Mode

Use pipeline mode when you want Kognitive to keep a single public voice API while a backend runtime owns the live STT/LLM/TTS execution.

const voiceAgent = createVoiceAgent({
  name: "voice-pipeline",
  instructions: "Help the user in a concise spoken style.",
  runtime: {
    provider: "kognitive-voice",
    mode: "pipeline",
    sessionEndpoint: "/api/kognitive/voice/agents/voice-pipeline/session",
    pipeline: {
      transport: { type: "websocket" },
      stt: { provider: "deepgram", model: "nova-3", language: "en" },
      llm: { provider: "xai", model: "grok-4.1-fast" },
      tts: { provider: "cartesia", model: "sonic-3", voice: "blake" },
      backgroundAudio: { preset: "none" },
    },
  },
});

const prepared = await voiceAgent.prepare({
  resourceId: { userId: "user_1" },
});

const session = createBrowserVoiceSession({
  prepare: prepared,
  sessionEndpoint: "/api/kognitive/voice/agents/voice-pipeline/session",
});

await session.connect();

Telephony

Use @kognitivedev/voice/telephony when the caller is not a browser microphone. This entry point is Node-only and is intended for PSTN, SIP, or provider WebSocket integrations.

The current implementation ships with:

  • a provider-neutral telephony session registry
  • normalized call/session types
  • a Twilio Media Streams adapter
  • Twilio request-signature validation
  • TwiML generation for <Connect><Stream>
  • Twilio bidirectional media message parsing and serialization
  • mu-law 8k codec helpers for Twilio audio payloads
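
For readers unfamiliar with the audio format in the last item, the sketch below decodes G.711 mu-law bytes into 16-bit PCM samples using the standard expansion. It is an independent illustration, not the package's codec helper, and the function names decodeMuLawSample and decodeMuLaw are hypothetical.

```typescript
// Standard G.711 mu-law expansion; illustrative, not the package's own helper.
function decodeMuLawSample(byte: number): number {
  const u = ~byte & 0xff;            // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;             // top bit carries the sign
  const exponent = (u >> 4) & 0x07;  // 3-bit segment number
  const mantissa = u & 0x0f;         // 4-bit step within the segment
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign !== 0 ? -magnitude : magnitude;
}

// Decodes a buffer of mu-law bytes (8 kHz for Twilio) into signed 16-bit PCM.
function decodeMuLaw(payload: Uint8Array): Int16Array {
  const pcm = new Int16Array(payload.length);
  for (let i = 0; i < payload.length; i++) {
    pcm[i] = decodeMuLawSample(payload[i]);
  }
  return pcm;
}
```

Twilio delivers these bytes base64-encoded, so a real bridge would decode the base64 payload first and then resample if the downstream STT expects a higher rate.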

Install

@kognitivedev/voice/telephony is exported from the same package:

import {
  createVoiceTelephonyService,
  createTwilioInboundCallResponse,
  handleTwilioMediaStreamMessage,
} from "@kognitivedev/voice/telephony";

Session Model

The telephony layer does not replace createVoiceAgent(). It resolves an existing voice agent, calls prepare(), and stores the resulting VoicePrepareResult alongside provider call metadata.

import { createVoiceTelephonyService } from "@kognitivedev/voice/telephony";

const telephony = createVoiceTelephonyService({
  resolveAgent: async ({ agentName }) => {
    if (agentName === "billing") return billingVoiceAgent;
    return conciergeVoiceAgent;
  },
});

Twilio Inbound Call Example

import { createTwilioInboundCallResponse } from "@kognitivedev/voice/telephony";

// Runs inside your inbound webhook handler, where `req` is the incoming HTTP request.
const response = await createTwilioInboundCallResponse({
  service: telephony,
  authToken: process.env.TWILIO_AUTH_TOKEN!,
  resourceId: { userId: "user_123" },
  buildStreamUrl: (session) =>
    `wss://example.com/api/twilio/media/${session.sessionId}`,
  customParameters: {
    projectId: "project_1",
  },
}, {
  url: "https://example.com/api/twilio/inbound",
  headers: {
    "x-twilio-signature": req.headers["x-twilio-signature"] as string,
  },
  // Sample values; in a real handler, pass the form parameters parsed from the webhook body.
  params: {
    CallSid: "CA123",
    AccountSid: "AC123",
    From: "+15550001111",
    To: "+15550002222",
    Direction: "inbound",
  },
});

return new Response(response.twiml, {
  headers: { "Content-Type": "text/xml" },
});

The generated TwiML uses <Connect><Stream> and injects sessionId, callId, and agentName as Twilio <Parameter> values so the subsequent WebSocket stream can be correlated back to your Kognitive session.
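
The shape of such a document can be sketched as follows. This is a hand-rolled illustration of the TwiML structure, not the output of createTwilioInboundCallResponse(); escapeXml and buildConnectStreamTwiml are hypothetical names, and the exact whitespace and attribute order the package emits may differ.

```typescript
// Escapes the five XML special characters for use in attribute values.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Builds a <Connect><Stream> response with one <Parameter> per entry.
function buildConnectStreamTwiml(
  streamUrl: string,
  parameters: Record<string, string>,
): string {
  const parameterTags = Object.keys(parameters)
    .map(
      (name) =>
        `<Parameter name="${escapeXml(name)}" value="${escapeXml(parameters[name])}"/>`,
    )
    .join("");
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    `<Response><Connect><Stream url="${escapeXml(streamUrl)}">` +
    `${parameterTags}</Stream></Connect></Response>`
  );
}
```

Twilio passes the parameter values back on the WebSocket, which is what makes the later correlation by sessionId possible.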

Twilio Media Stream Example

import {
  handleTwilioMediaStreamMessage,
  parseTwilioMediaStreamMessage,
} from "@kognitivedev/voice/telephony";

const parsed = parseTwilioMediaStreamMessage(rawMessage);
const event = handleTwilioMediaStreamMessage(telephony, sessionId, parsed);

if (event.type === "call.audio") {
  // event.audio.payload is base64 mu-law 8k
  // forward it into your STT / realtime / pipeline runtime here
}
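
Before any audio arrives, Twilio sends a start message on the stream whose start.customParameters object carries the values set through <Parameter>. The sketch below reads those values from the raw JSON so the socket can be tied back to a session. readCustomParameters and TwilioStartMessageLike are hypothetical names; the package's own parser may already expose this information in a different shape.

```typescript
// Subset of Twilio's "start" message that matters for correlation.
interface TwilioStartMessageLike {
  event: "start";
  start: {
    streamSid: string;
    callSid: string;
    customParameters?: Record<string, string>;
  };
}

// Returns the custom parameters of a "start" message, or undefined for any other message.
function readCustomParameters(raw: string): Record<string, string> | undefined {
  const message: unknown = JSON.parse(raw);
  if (typeof message !== "object" || message === null) return undefined;

  const candidate = message as Partial<TwilioStartMessageLike>;
  if (candidate.event !== "start" || !candidate.start) return undefined;

  return candidate.start.customParameters ?? {};
}
```

A handler would typically call this on the first few messages and look up the session by the returned sessionId value.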

Twilio Security

The helper validates the X-Twilio-Signature header against the exact webhook URL and form parameters. For Twilio WebSocket validation, the helper also supports the documented trailing-slash fallback used by Twilio's signature validation guidance.
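
For reference, Twilio's documented webhook scheme is an HMAC-SHA1 over the URL followed by each form parameter's name and value in sorted key order, base64-encoded. The sketch below implements that scheme with Node's crypto module. It is not the package's validator, the function names are hypothetical, and it deliberately omits the trailing-slash fallback mentioned above.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Computes the signature Twilio would send for a form-encoded webhook.
function computeTwilioSignature(
  authToken: string,
  url: string,
  params: Record<string, string>,
): string {
  // Append key + value for every parameter, sorted by key, to the full URL.
  const data = Object.keys(params)
    .sort()
    .reduce((acc, key) => acc + key + params[key], url);
  return createHmac("sha1", authToken).update(data, "utf8").digest("base64");
}

// Compares the expected signature with the header value in constant time.
function isValidTwilioSignature(
  authToken: string,
  url: string,
  params: Record<string, string>,
  headerValue: string,
): boolean {
  const expected = Buffer.from(computeTwilioSignature(authToken, url, params));
  const received = Buffer.from(headerValue);
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

The URL must match what Twilio requested byte for byte, including scheme and query string, which is why proxies that rewrite the host often break validation.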

What This Layer Does Not Do

The telephony entry point does not yet mount framework routes or run the live STT-LLM-TTS bridge for you. It gives you the provider-neutral session model and the first carrier adapter so your app can wire:

  • webhook ingress
  • WebSocket upgrades
  • audio bridging
  • call-control decisions
  • tracing/reporting

Multi-Agent Voice Networks

Use createVoiceAgentNetwork() when one live call should move between specialists instead of forcing a single prompt to do everything.

import {
  createVoiceAgent,
  createVoiceAgentNetwork,
  createBrowserVoiceSession,
} from "@kognitivedev/voice";

const supportAgent = createVoiceAgent({
  name: "support",
  instructions: "Handle general support questions.",
});

const billingAgent = createVoiceAgent({
  name: "billing",
  instructions: "Handle refunds, invoices, and payment failures.",
});

const network = createVoiceAgentNetwork({
  name: "support-network",
  agents: {
    support: supportAgent,
    billing: billingAgent,
  },
  defaultAgent: "support",
  maxHops: 5,
});

const prepared = await network.prepare({
  resourceId: { userId: "user_1" },
});

const session = createBrowserVoiceSession({
  prepare: prepared,
  credentialsEndpoint: "/api/kognitive/voice/agents/support-network/credentials",
  handoffDelayMs: 1500,
  handoff: async ({ handoff, currentPrepare }) => ({
    prepare: await network.handoff({
      targetAgentName: handoff.agent,
      currentAgentName: currentPrepare.network?.activeAgentName,
      reason: handoff.reason,
      resourceId: currentPrepare.resourceId,
      metadata: currentPrepare.metadata,
      hopCount: currentPrepare.network?.hopCount,
      sharedState: currentPrepare.network?.sharedState,
      transferState: handoff.transferState,
    }),
  }),
});

Network Behavior

  • A networked prepare result includes prepare.network with the active specialist, hop count, shared state, and handoff tool metadata.
  • The active specialist gets an internal handoff_to_agent tool by default.
  • OpenAI Realtime sessions can hot-swap specialists in-place when the next specialist uses the same provider, transport, and model.
  • Gemini Live currently emits a clear handoff failure because in-session specialist replacement is not yet supported by the runtime.
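
The hot-swap condition in the list above can be expressed as a small predicate. RuntimeLike and canHotSwap are hypothetical names used for illustration; the package decides this internally and its actual checks may include more than these three fields.

```typescript
// Hypothetical description of a specialist's runtime.
interface RuntimeLike {
  provider: string;
  transport: string;
  model: string;
}

// True only for OpenAI Realtime sessions whose next specialist shares
// the same provider, transport, and model, as described in the list above.
function canHotSwap(current: RuntimeLike, next: RuntimeLike): boolean {
  return (
    current.provider === "openai-realtime" &&
    next.provider === current.provider &&
    next.transport === current.transport &&
    next.model === current.model
  );
}
```

A check like this can be useful in your own handoff callback for logging or for choosing a longer handoffDelayMs when an in-place swap is not possible.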

Exports

  • createVoiceAgent()
  • createVoiceAgentNetwork()
  • VoiceContextAdapter
  • VoiceContextAdapterResult
  • createBrowserVoiceSession()
  • issueRealtimeClientSecret()
  • @kognitivedev/voice/telephony
  • OPENAI_REALTIME_RUNTIME
  • GEMINI_LIVE_RUNTIME
  • OPENAI_REALTIME_CAPABILITIES
  • GEMINI_LIVE_CAPABILITIES
  • resolveVoiceRuntimeAdapter()
  • resolvePreparedVoiceRuntime()
  • sanitizePreparedVoiceSession()
  • toSdkRealtimeSessionConfig()
  • voice session state reducers and telemetry/reporting helpers
  • telephony service, Twilio Media Streams helpers, and telephony codec utilities

Browser Handoff Resolution

Browser sessions can resolve handoffs in two ways:

  • Pass handoff(request) directly and return the next VoicePrepareResult.
  • Pass handoffEndpoint, or rely on auto-derivation from credentialsEndpoint / clientSecretEndpoint, and return JSON like:

{
  "prepare": {
    "...": "VoicePrepareResult"
  }
}

Optionally set handoffDelayMs to delay the in-session swap. This can be a fixed number or a function that computes the delay from the handoff request/result.
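
Since handoffDelayMs accepts either a number or a function, the resolution step can be sketched as below. HandoffDelay and resolveHandoffDelay are hypothetical names, and the context argument stands in for whatever request/result object the package actually passes to the function form.

```typescript
// Either a fixed delay in milliseconds or a function that computes one.
type HandoffDelay<C> = number | ((context: C) => number);

// Resolves the delay to a non-negative number of milliseconds (0 when unset).
function resolveHandoffDelay<C>(
  delay: HandoffDelay<C> | undefined,
  context: C,
): number {
  if (delay === undefined) return 0;
  const value = typeof delay === "function" ? delay(context) : delay;
  return value > 0 ? value : 0;
}
```

The function form is handy when, for example, a transfer to a human-facing specialist should pause longer than a transfer between two automated ones.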

The request body sent to handoffEndpoint includes:

  • handoff
  • resourceId
  • metadata
  • network
  • callId
  • sessionId
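
A server route behind handoffEndpoint needs to read these fields and answer with a JSON object that has a prepare key. The sketch below shows only that envelope handling. The field types are treated as unknown because this README does not specify them, and pickHandoffBody and toHandoffResponse are hypothetical names; producing the actual prepare value is left to your network's handoff() call.

```typescript
// Field names taken from the list above; their types are not specified here.
const HANDOFF_BODY_FIELDS = [
  "handoff",
  "resourceId",
  "metadata",
  "network",
  "callId",
  "sessionId",
] as const;

type HandoffField = (typeof HANDOFF_BODY_FIELDS)[number];
type HandoffBody = Partial<Record<HandoffField, unknown>>;

// Parses the raw request body and keeps only the documented fields.
function pickHandoffBody(raw: string): HandoffBody {
  const parsed: unknown = JSON.parse(raw);
  const body: HandoffBody = {};
  if (typeof parsed !== "object" || parsed === null) return body;

  const source = parsed as Record<string, unknown>;
  for (const field of HANDOFF_BODY_FIELDS) {
    if (field in source) body[field] = source[field];
  }
  return body;
}

// Wraps the next prepare result in the documented response envelope.
function toHandoffResponse(prepare: unknown): string {
  return JSON.stringify({ prepare });
}
```

In a real route you would pass the picked fields to network.handoff(), as in the browser example earlier, and send toHandoffResponse(result) with a JSON content type.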