

© 2026 – Pkg Stats / Ryan Hefner

@deepgram/agents

v0.1.1


Deepgram Voice Agent SDK — browser/Node WebSocket client with microphone, audio playback, and VAD

Readme

@deepgram/agents

Core SDK for the Deepgram Voice Agent API. Manages the WebSocket session, microphone capture, and audio playback with volume/frequency analysis.

Install

bun add @deepgram/agents

Quick Start

import { AgentSession, AgentMicrophone, AgentPlayer } from "@deepgram/agents";

const session = new AgentSession({
  auth: { tokenFactory: () => fetch("/api/deepgram-token").then((r) => r.text()) },
  agent: { think: { provider: { type: "open_ai" }, model: "gpt-4o-mini" } },
});

const player = new AgentPlayer();
session.on("audio", (chunk) => player.queue(chunk));
session.on("conversation-text", (msg) => {
  console.log(`${msg.role}: ${msg.content}`);
});

const mic = new AgentMicrophone((data) => session.sendAudio(data));

await session.connect();
await mic.start();

// Later:
// mic.stop();
// session.disconnect();
// player.dispose();

Authentication

AgentSessionConfig.auth accepts one of two authentication modes:

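import type { AuthConfig } from "@deepgram/agents";

// Server-side only: raw API key (never ship this to the browser)
const serverAuth: AuthConfig = { apiKey: "your-deepgram-api-key" };

// Browser-safe: token factory (recommended)
const browserAuth: AuthConfig = { tokenFactory: () => fetch("/api/token").then((r) => r.text()) };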

The token factory is called once before every connection and reconnection attempt. The token it returns is cached until a reconnect invalidates it, so the factory is not called per audio frame.
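The caching rule described above can be pictured as a small wrapper that asks the factory for a token at most once per connection and forgets it when a reconnect begins. This is a hedged sketch, not the SDK's internals: the class TokenCache and its getToken() and invalidate() methods are hypothetical names.

```typescript
// Hypothetical sketch of per-connection token caching; not the SDK's actual code.
type TokenFactory = () => Promise<string>;

class TokenCache {
  private pending: Promise<string> | null = null;
  private readonly factory: TokenFactory;

  constructor(factory: TokenFactory) {
    this.factory = factory;
  }

  // Returns the token for the current connection, calling the factory only
  // if nothing has been fetched since the last invalidate().
  getToken(): Promise<string> {
    if (this.pending !== null) return this.pending;
    const attempt = this.factory();
    this.pending = attempt;
    // Do not keep a failed fetch cached: the next attempt should retry.
    attempt.catch(() => {
      if (this.pending === attempt) this.pending = null;
    });
    return attempt;
  }

  // Hypothetically called when a reconnect starts, forcing fresh credentials.
  invalidate(): void {
    this.pending = null;
  }
}
```

Caching the promise rather than the resolved string means two callers that ask for a token while the fetch is still in flight share a single factory call.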

The browser WebSocket constructor cannot set custom headers, so the SDK passes the token as a subprotocol value in the Sec-WebSocket-Protocol header. This is handled internally.
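For readers curious about the mechanism, here is a minimal sketch of the subprotocol technique using a plain WebSocket. The endpoint URL and the "token"/"bearer" scheme names are assumptions based on Deepgram's public WebSocket conventions, not taken from this README. The socket constructor is injected so the sketch runs without a network; in a browser you would pass the global WebSocket.

```typescript
// Sketch: carry credentials in Sec-WebSocket-Protocol instead of a custom header.
// ASSUMPTIONS: AGENT_URL and the "token"/"bearer" scheme names are illustrative;
// verify them against Deepgram's API reference.
type SocketCtor<T> = new (url: string, protocols?: string | string[]) => T;

const AGENT_URL = "wss://agent.deepgram.com/v1/agent/converse"; // assumed endpoint

function buildAuthProtocols(credential: string, kind: "apiKey" | "accessToken"): string[] {
  // The browser sends the array entries as a comma-separated
  // Sec-WebSocket-Protocol request header, e.g. "bearer, <token>".
  const scheme = kind === "apiKey" ? "token" : "bearer";
  return [scheme, credential];
}

function openAuthenticatedSocket<T>(
  Ctor: SocketCtor<T>,
  credential: string,
  kind: "apiKey" | "accessToken",
): T {
  return new Ctor(AGENT_URL, buildAuthProtocols(credential, kind));
}
```

In a browser, openAuthenticatedSocket(WebSocket, token, "accessToken") would open the connection; with the SDK none of this is needed.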

AgentSession

Core WebSocket session. Wraps @deepgram/sdk's agent connection with:

  • Token factory auth (fresh credentials on every reconnect)
  • Typed EventEmitter events
  • Exponential-backoff reconnect with jitter
  • Automatic KeepAlive pings
  • Audio buffering until SettingsApplied
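The last item in the list, holding outgoing audio until the server confirms the settings, can be sketched as a small gate. The class AudioGate, its methods, and the send callback are hypothetical; they illustrate the idea rather than reproduce the SDK's code.

```typescript
// Hypothetical sketch: hold audio frames until SettingsApplied arrives,
// then flush them in order and forward later frames immediately.
class AudioGate {
  private ready = false;
  private queue: ArrayBuffer[] = [];
  private readonly send: (frame: ArrayBuffer) => void;

  constructor(send: (frame: ArrayBuffer) => void) {
    this.send = send;
  }

  // Similar in spirit to sendAudio(): buffer or forward one frame.
  push(frame: ArrayBuffer): void {
    if (this.ready) {
      this.send(frame);
    } else {
      this.queue.push(frame);
    }
  }

  // Call when the server's SettingsApplied message is received.
  open(): void {
    this.ready = true;
    const pending = this.queue;
    this.queue = [];
    for (const frame of pending) this.send(frame);
  }

  // Call on disconnect/reconnect: settings must be applied again, and this
  // sketch chooses to drop stale buffered audio rather than replay it.
  close(): void {
    this.ready = false;
    this.queue = [];
  }
}
```

Dropping the queue on close is a design choice for this sketch; audio captured before a reconnect is usually stale by the time the new session is ready.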

Constructor

const session = new AgentSession(config: AgentSessionConfig);

Config

interface AgentSessionConfig {
  auth: { apiKey: string } | { tokenFactory: () => Promise<string> };
  agent: AgentSettingsObject | string;  // inline config or pre-built agent UUID
  audio?: {
    input?: { encoding?: AudioEncoding; sampleRate?: number };   // default: linear16 @ 16kHz
    output?: { encoding?: OutputEncoding; sampleRate?: number }; // default: 24kHz
  };
  keepAliveInterval?: number;  // default: 10_000ms
  reconnect?: ReconnectConfig;
  experimental?: boolean;
  tags?: string[];
}
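The keepAliveInterval option drives the automatic KeepAlive pings mentioned in the feature list. Below is a minimal sketch of such a timer. The function name startKeepAlive is hypothetical, and the {"type":"KeepAlive"} payload is an assumption based on Deepgram's Voice Agent client messages rather than something this README states.

```typescript
// Hypothetical keepalive timer; the JSON payload shape is an assumption.
function startKeepAlive(
  send: (message: string) => void,
  intervalMs = 10_000, // mirrors the documented keepAliveInterval default
): () => void {
  const timer = setInterval(() => send(JSON.stringify({ type: "KeepAlive" })), intervalMs);
  // Return a stop function so pings can be cancelled on disconnect.
  return () => clearInterval(timer);
}
```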

Methods

| Method | Description |
|--------|-------------|
| connect() | Open WebSocket connection |
| disconnect() | Close connection (no reconnect) |
| sendAudio(data: ArrayBuffer) | Send PCM audio frame (queued until SettingsApplied) |
| injectUserMessage(content) | Send a text message as the user |
| injectAgentMessage(message) | Inject text as the agent |
| updateSpeak(speak) | Update TTS settings mid-session |
| updateThink(think) | Update LLM settings mid-session |
| updatePrompt(prompt) | Update system prompt mid-session |
| sendFunctionCallResponse(id, name, content) | Respond to a function call request |
| getId() | Returns session ID (available after Welcome) |
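A sketch of wiring function-call-request to sendFunctionCallResponse(id, name, content) follows. The message shape is an assumption modeled on Deepgram's FunctionCallRequest message and the exported FunctionCallItem type: a functions array whose items carry id, name, a JSON-encoded arguments string, and a client_side flag. Check the SDK's own types before relying on it. SessionLike and registerFunctionHandlers are hypothetical; SessionLike covers only the two session members the sketch needs.

```typescript
// ASSUMED message shape; the SDK exports the real FunctionCallRequestMessage type.
interface FunctionCallItemLike {
  id: string;
  name: string;
  arguments: string; // JSON-encoded arguments
  client_side?: boolean;
}

interface FunctionCallRequestLike {
  functions: FunctionCallItemLike[];
}

// Hypothetical subset of AgentSession used by this sketch.
interface SessionLike {
  on(event: "function-call-request", listener: (msg: FunctionCallRequestLike) => void): unknown;
  sendFunctionCallResponse(id: string, name: string, content: string): void;
}

type Handler = (args: Record<string, unknown>) => unknown;

function registerFunctionHandlers(session: SessionLike, handlers: Record<string, Handler>): void {
  session.on("function-call-request", (msg) => {
    for (const call of msg.functions) {
      // Assumption: items with client_side === false are executed server-side.
      if (call.client_side === false) continue;
      const handler = handlers[call.name];
      Promise.resolve()
        .then(() => {
          if (!handler) throw new Error(`No handler for ${call.name}`);
          return handler(JSON.parse(call.arguments || "{}"));
        })
        .then(
          // Handlers may be sync or async; results and errors are sent back as JSON text.
          (result) => JSON.stringify(result ?? null),
          (err: unknown) => JSON.stringify({ error: String(err) }),
        )
        .then((content) => session.sendFunctionCallResponse(call.id, call.name, content));
    }
  });
}
```

Reporting errors back as content, rather than staying silent, lets the agent tell the user that the call failed instead of waiting indefinitely.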

Events

session.on("welcome", (msg) => {});
session.on("settings-applied", (msg) => {});
session.on("conversation-text", (msg) => {});
session.on("user-started-speaking", (msg) => {});
session.on("agent-thinking", (msg) => {});
session.on("agent-started-speaking", (msg) => {});
session.on("agent-audio-done", (msg) => {});
session.on("function-call-request", (msg) => {});
session.on("function-call-response", (msg) => {});
session.on("prompt-updated", (msg) => {});
session.on("speak-updated", (msg) => {});
session.on("think-updated", (msg) => {});
session.on("injection-refused", (msg) => {});
session.on("error", (msg) => {});
session.on("warning", (msg) => {});

// Binary audio from the agent
session.on("audio", (chunk: ArrayBuffer) => {});

// SDK lifecycle
session.on("connecting", () => {});
session.on("connected", () => {});
session.on("reconnecting", (attempt, delayMs) => {});
session.on("disconnected", (reason) => {});
session.on("sdk-error", (err) => {});
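"Typed EventEmitter events" means each event name is bound to a listener signature, so a typo or a wrong argument type fails at compile time. A minimal sketch of that pattern follows, with a hypothetical event map covering a few of the lifecycle events above. The SDK's real map is the exported AgentSessionEvents type, and TypedEmitter is an illustrative name.

```typescript
// Hypothetical event map: event name -> listener argument tuple.
type LifecycleEvents = {
  connecting: [];
  connected: [];
  reconnecting: [attempt: number, delayMs: number];
  disconnected: [reason: string];
};

// Minimal typed emitter: on() and emit() are checked against the event map.
class TypedEmitter<E extends Record<string, unknown[]>> {
  private listeners: { [K in keyof E]?: Array<(...args: E[K]) => void> } = {};

  on<K extends keyof E>(event: K, listener: (...args: E[K]) => void): this {
    const existing = this.listeners[event];
    if (existing) {
      existing.push(listener);
    } else {
      this.listeners[event] = [listener];
    }
    return this;
  }

  emit<K extends keyof E>(event: K, ...args: E[K]): void {
    const existing = this.listeners[event];
    if (!existing) return;
    for (const listener of existing) listener(...args);
  }
}
```

With this map, emitter.emit("reconnecting", "soon") is a compile error, while emit("reconnecting", 1, 500) is accepted.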

State

session.state; // "idle" | "connecting" | "connected" | "reconnecting" | "disconnected"

Reconnect

Auto-reconnect is enabled by default and uses exponential backoff with jitter. Configure it through the reconnect field of AgentSessionConfig:

{
  enabled: true,       // default
  maxAttempts: 8,      // default
  baseDelay: 500,      // ms, default
  maxDelay: 30_000,    // ms, default
  jitter: true,        // default: +/-20%
}
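To build intuition for these defaults, here is one common way to compute an exponential-backoff delay with ±20% jitter. The exact formula (doubling from baseDelay, capping at maxDelay, then applying jitter) is an assumption, since the README does not specify it. backoffDelay is a hypothetical name, and the random source is injectable so results are reproducible.

```typescript
interface BackoffOptions {
  baseDelay: number; // ms
  maxDelay: number; // ms
  jitter: boolean; // apply +/-20% randomization
}

// attempt is zero-based in this sketch: attempt 0 is the first retry.
// ASSUMED formula: min(maxDelay, baseDelay * 2^attempt), then +/-20% jitter.
function backoffDelay(
  attempt: number,
  opts: BackoffOptions,
  random: () => number = Math.random,
): number {
  const exponential = Math.min(opts.maxDelay, opts.baseDelay * 2 ** attempt);
  if (!opts.jitter) return exponential;
  // random() in [0, 1) maps to a multiplier in [0.8, 1.2).
  const factor = 0.8 + random() * 0.4;
  return Math.round(exponential * factor);
}
```

Under this assumed formula, with jitter off, the eight default attempts wait 500, 1000, 2000, 4000, 8000, 16000, 30000 and 30000 ms, about 91.5 s in total. Jitter spreads out reconnects from many clients that dropped at the same moment.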

AgentMicrophone

Captures PCM audio from the user's microphone via AudioWorklet.
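An AudioWorklet delivers samples as Float32 values in [-1, 1], whereas linear16 is signed 16-bit PCM (little-endian in this sketch). A capture pipeline therefore typically converts each block before sending it. The sketch below shows that conversion. floatTo16BitPCM is a hypothetical name, and the SDK may perform this step elsewhere, for example inside the worklet itself.

```typescript
// Convert Float32 samples in [-1, 1] to little-endian signed 16-bit PCM.
function floatTo16BitPCM(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first: out-of-range values would otherwise wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale toward -32768, positive values toward 32767;
    // setInt16 truncates any fractional part.
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

The resulting ArrayBuffer is the kind of frame the Usage example below passes to session.sendAudio().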

Usage

const mic = new AgentMicrophone(
  (data: ArrayBuffer) => session.sendAudio(data),
  {
    sampleRate: 16_000,         // default
    echoCancellation: true,     // default
    noiseSuppression: true,     // default
    autoGainControl: true,      // default
  },
);

await mic.start();
mic.mute();
mic.unmute();
mic.stop();

Volume and Frequency Data

mic.getInputVolume();            // 0-1, RMS-based
mic.getInputByteFrequencyData(); // Uint8Array of frequency bin magnitudes (0-255)
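The 0-1 volume value is described as RMS-based. One way to compute it from a block of time-domain samples is shown below, for example from the array that an AnalyserNode's getFloatTimeDomainData() fills. The README does not say whether the SDK adds scaling or smoothing, so treat this as an approximation. rmsVolume is a hypothetical name.

```typescript
// Root-mean-square level of time-domain samples in [-1, 1], clamped to [0, 1].
function rmsVolume(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.min(1, Math.sqrt(sumSquares / samples.length));
}
```

RMS follows perceived loudness more closely than peak amplitude, which makes it a reasonable driver for a level meter or a pulsing visualizer.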

Events

mic.on("audio-frame", (data: ArrayBuffer) => {});
mic.on("error", (err: Error) => {});

AgentPlayer

Decodes and plays PCM Int16 audio from the agent. Provides volume/frequency analysis for visualizations and supports barge-in via interrupt().
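Decoding the agent's PCM Int16 output for Web Audio is the inverse of the capture conversion. Each 16-bit sample is scaled into a Float32 value in [-1, 1], which can then be copied into an AudioBuffer at the output sample rate. This is a hedged sketch: pcm16ToFloat32 is a hypothetical name, little-endian byte order is assumed, and a trailing odd byte is simply ignored rather than carried over to the next chunk.

```typescript
// Decode little-endian signed 16-bit PCM into Float32 samples in [-1, 1].
function pcm16ToFloat32(chunk: ArrayBuffer): Float32Array {
  const view = new DataView(chunk);
  const count = Math.floor(chunk.byteLength / 2); // ignore a trailing odd byte
  const out = new Float32Array(count);
  for (let i = 0; i < count; i++) {
    out[i] = view.getInt16(i * 2, true) / 0x8000;
  }
  return out;
}
```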

Usage

const player = new AgentPlayer({ sampleRate: 24_000 }); // default

// Queue audio from the session
session.on("audio", (chunk) => player.queue(chunk));

// Barge-in: interrupt when the user starts speaking
session.on("user-started-speaking", () => player.interrupt());

// Volume control
player.setVolume(0.8);
player.mute();
player.unmute();

// Cleanup
player.dispose();

Volume and Frequency Data

player.getOutputVolume();            // 0-1, RMS-based
player.getOutputByteFrequencyData(); // Uint8Array of frequency bin magnitudes (0-255)
player.getRemainingPlaybackTime();   // seconds of queued audio remaining
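A common approach for gapless playback is to schedule queued chunks back-to-back on the audio clock, which also reduces the remaining playback time to simple arithmetic. The sketch below models only that bookkeeping, with the clock passed in explicitly (in a browser it would be AudioContext.currentTime). PlaybackSchedule and its methods are hypothetical and not the SDK's implementation.

```typescript
// Bookkeeping for gapless playback: each chunk starts when the previous one ends.
class PlaybackSchedule {
  private nextStartTime = 0; // seconds on the audio clock
  private readonly sampleRate: number;

  constructor(sampleRate = 24_000) {
    this.sampleRate = sampleRate;
  }

  // Returns the clock time at which a chunk of sampleCount samples should start.
  schedule(sampleCount: number, now: number): number {
    // If the queue has drained, start now instead of at a time in the past.
    const start = Math.max(now, this.nextStartTime);
    this.nextStartTime = start + sampleCount / this.sampleRate;
    return start;
  }

  // Seconds of queued audio left after `now`.
  remaining(now: number): number {
    return Math.max(0, this.nextStartTime - now);
  }

  // Barge-in: discard the schedule (a real player would also stop its sources).
  interrupt(now: number): void {
    this.nextStartTime = now;
  }
}
```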

Exports

// Classes
export { AgentSession, AgentMicrophone, AgentPlayer };

// Types
export type {
  AgentState,
  AgentSessionConfig, AuthConfig, TokenFactory, ReconnectConfig,
  AgentSessionEvents,
  MicrophoneOptions,
  PlayerOptions,
  AgentSettingsObject, ThinkSettings, SpeakSettings,
  // Server messages
  WelcomeMessage, SettingsAppliedMessage, ConversationTextMessage,
  UserStartedSpeakingMessage, AgentThinkingMessage,
  FunctionCallRequestMessage, FunctionCallItem,
  AgentStartedSpeakingMessage, AgentAudioDoneMessage,
  AgentErrorMessage, AgentWarningMessage,
  InjectionRefusedMessage, ServerMessage,
};

License

MIT