@drawdream/livespeech

v0.1.16

Real-time speech-to-speech AI conversation SDK

LiveSpeech SDK for TypeScript

A TypeScript/JavaScript SDK for real-time speech-to-speech AI conversations.

Features

  • 🎙️ Real-time Voice Conversations - Natural, low-latency voice interactions
  • 🌐 Multi-language Support - Korean, English, Japanese, Chinese, and more
  • 🔊 Streaming Audio - Send and receive audio in real-time
  • ⏹️ Barge-in Support - Interrupt AI mid-speech by talking or programmatically
  • 🔄 Auto-reconnection - Automatic recovery from network issues
  • 🌐 Browser & Node.js - Works in both environments

Installation

npm install @drawdream/livespeech

Quick Start (5 minutes)

import { LiveSpeechClient } from '@drawdream/livespeech';

const client = new LiveSpeechClient({
  region: 'ap-northeast-2',
  apiKey: 'your-api-key',
});

// Handle only 4 essential events!
client.setAudioHandler((audioData) => {
  audioPlayer.queue(audioData);  // PCM16 — use event.sampleRate (24kHz Live, 16kHz Composed)
});

client.on('interrupted', () => {
  audioPlayer.clear();  // CRITICAL: Clear buffer on interrupt!
});

client.on('turnComplete', () => {
  console.log('AI finished');
});

client.setErrorHandler((error) => {
  console.error('Error:', error.message);
});

// Connect and start
await client.connect();
await client.startSession({ prePrompt: 'You are a helpful assistant.' });

// Send audio
client.audioStart();
client.sendAudioChunk(pcmData);  // PCM16 @ 16kHz
client.audioEnd();

// Cleanup
await client.endSession();
client.disconnect();

Core API

Everything you need for basic voice conversations.

Methods

| Method | Description |
|--------|-------------|
| connect() | Establish connection |
| disconnect() | Close connection |
| startSession(config) | Start conversation with system prompt |
| endSession() | End conversation |
| sendAudioChunk(data) | Send PCM16 audio (16kHz) |

Events

| Event | Description | Action Required |
|-------|-------------|-----------------|
| audio | AI's audio output | Play audio (PCM16 — check sampleRate) |
| turnComplete | AI finished speaking | Ready for next input |
| interrupted | User barged in | Clear audio buffer! |
| error | Error occurred | Handle/log error |

⚠️ Critical: Handle interrupted

When the user speaks while the AI is responding, you must clear your audio buffer:

client.on('interrupted', () => {
  audioPlayer.clear();  // Stop buffered audio immediately
  audioPlayer.stop();
});

Without this, 2-3 seconds of buffered audio continues playing after the user interrupts.

Audio Format

| Direction | Format | Sample Rate |
|-----------|--------|-------------|
| Input (mic) | PCM16 | 16,000 Hz |
| Output (AI) — Live mode | PCM16 | 24,000 Hz |
| Output (AI) — Composed mode | PCM16 | 16,000 Hz |

Important: The audio event includes a sampleRate field. Always use it to configure your audio decoder rather than hardcoding a rate.
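
For example, a small helper that resolves the decoder rate from the event, falling back per pipeline mode when the field is absent (the AudioEvent shape here is a simplified assumption, not the SDK's full type):

```typescript
// Simplified, assumed shape of the audio event payload.
interface AudioEvent {
  data: Uint8Array;
  sampleRate?: number;
}

// Prefer the event's own sampleRate; fall back to the documented
// per-mode defaults (24 kHz live, 16 kHz composed) only if missing.
function resolveSampleRate(event: AudioEvent, mode: 'live' | 'composed'): number {
  if (event.sampleRate) return event.sampleRate;
  return mode === 'live' ? 24000 : 16000;
}
```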

Configuration

const client = new LiveSpeechClient({
  region: 'ap-northeast-2',       // Required
  apiKey: 'your-api-key',         // Required
});

await client.startSession({
  prePrompt: 'You are a helpful assistant.',
  language: 'ko-KR',              // Optional: ko-KR, en-US, ja-JP, etc.
});

Composed Mode

Use composed mode for higher accuracy with slightly more latency. It runs a separate STT → LLM → TTS pipeline instead of direct audio-to-audio.

await client.startSession({
  prePrompt: 'You are a helpful assistant.',
  pipelineMode: 'composed',
  language: 'ko-KR',
});

client.audioStart();
// Send/receive audio the same way as live mode

Live vs Composed

| | Live | Composed |
|---|---|---|
| Latency | ~300ms | ~1-2s |
| Pipeline | Direct audio-to-audio (Gemini Live) | STT → LLM → TTS |
| Accuracy | Good | Higher |
| aiSpeaksFirst | ✅ Supported | ❌ Not supported |
| tools (function calling) | ✅ Supported | ❌ Not supported |
| Output sample rate | 24,000 Hz | 16,000 Hz |
| Barge-in | Automatic (Gemini VAD) | Automatic |

Note: All other SDK methods and events work identically in both modes. The only code change is adding pipelineMode: 'composed' to your session config.

Event Correlation (turnId)

In Composed mode, all events include a turnId field (monotonic counter starting from 0). Events sharing the same turnId belong to the same speech turn — use this to match userTranscript, response, audio, and turnComplete events together. In Live mode, turnId is not present.

client.on('userTranscript', (e) => {
    console.log(`Turn ${e.turnId}: User said '${e.text}'`);
});
client.on('response', (e) => {
    if (e.isFinal) console.log(`Turn ${e.turnId}: AI responded '${e.text}'`);
});
client.on('turnComplete', (e) => {
    console.log(`Turn ${e.turnId} complete`);
});
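
If you need to assemble complete turns from the event stream, a simple grouping helper might look like this (the TurnEvent shape is a simplified assumption; real SDK events carry more fields):

```typescript
// Simplified, assumed event shape for grouping by turn.
interface TurnEvent {
  turnId: number;
  type: string;
  text?: string;
}

// Bucket events by turnId so the userTranscript, response, audio, and
// turnComplete events for one speech turn stay together.
function groupByTurn(events: TurnEvent[]): Map<number, TurnEvent[]> {
  const turns = new Map<number, TurnEvent[]>();
  for (const e of events) {
    const bucket = turns.get(e.turnId) ?? [];
    bucket.push(e);
    turns.set(e.turnId, bucket);
  }
  return turns;
}
```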

Advanced API

Optional features for power users.

Additional Methods

| Method | Description |
|--------|-------------|
| audioStart() / audioEnd() | Manual audio stream control |
| interrupt() | Explicitly stop AI response (for Stop button) |
| sendSystemMessage(msg) | Inject context during conversation |
| sendToolResponse(id, result) | Reply to function calls |
| updateUserId(userId) | Migrate guest to authenticated user |

Additional Events

| Event | Description |
|-------|-------------|
| connected / disconnected | Connection lifecycle |
| sessionStarted / sessionEnded | Session lifecycle |
| ready | Session ready for audio |
| userTranscript | User's speech transcribed |
| response | AI's response text |
| toolCall | AI wants to call a function |
| reconnecting | Auto-reconnection attempt |
| userIdUpdated | Guest-to-user migration complete |
| sessionWarning | Session nearing duration limit |
| sessionGoodbye | Session about to end |


Explicit Interrupt (Stop Button)

For UI "Stop" buttons or programmatic control:

// User clicks Stop button
client.interrupt();

Note: Voice barge-in works automatically via Gemini's VAD. This method is for explicit control.


System Messages

Inject text context during live sessions (game events, app state, etc.):

// AI responds immediately
client.sendSystemMessage("User completed level 5. Congratulate them!");

// Context only, no response
client.sendSystemMessage({ text: "User is browsing", triggerResponse: false });

Requires an active live session (audioStart() called). Messages are capped at 500 characters.
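
A small guard can enforce the cap before sending (clampSystemMessage and the 500-character constant mirror the documented limit; the helper itself is hypothetical, not an SDK export):

```typescript
// Documented limit for sendSystemMessage payloads.
const MAX_SYSTEM_MESSAGE_CHARS = 500;

// Truncate oversized context rather than letting the call fail.
function clampSystemMessage(text: string): string {
  return text.length <= MAX_SYSTEM_MESSAGE_CHARS
    ? text
    : text.slice(0, MAX_SYSTEM_MESSAGE_CHARS);
}
```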


Function Calling (Tool Use)

Let the AI call functions in your app:

1. Define Tools

const tools = [{
  name: 'get_price',
  description: 'Gets product price by ID',
  parameters: {
    type: 'OBJECT',
    properties: { productId: { type: 'string' } },
    required: ['productId']
  }
}];

await client.startSession({
  prePrompt: 'You are helpful.',
  tools,
});

2. Handle toolCall Events

client.on('toolCall', (event) => {
  if (event.name === 'get_price') {
    const price = lookupPrice(event.args.productId);
    client.sendToolResponse(event.id, { price });
  }
});

Conversation Memory

Enable persistent memory across sessions:

const client = new LiveSpeechClient({
  region: 'ap-northeast-2',
  apiKey: 'your-api-key',
  userId: 'user-123',  // Enables memory
});

| Mode | Memory |
|------|--------|
| With userId | Permanent (entities, summaries) |
| Without userId | Session only (guest) |

Guest-to-User Migration

// User logs in during session
await client.updateUserId('authenticated-user-123');

// Listen for confirmation
client.on('userIdUpdated', (event) => {
  console.log(`Migrated ${event.migratedMessages} messages`);
});

AI Speaks First

AI initiates the conversation:

await client.startSession({
  prePrompt: 'Greet the customer warmly.',
  aiSpeaksFirst: true,
});

client.audioStart();  // AI speaks immediately

Session Options

| Option | Default | Description |
|--------|---------|-------------|
| prePrompt | - | System prompt |
| language | 'en-US' | Language code |
| outputLanguage | - | TTS voice language override (composed mode only) |
| pipelineMode | 'live' | 'live' (~300ms) or 'composed' (~1-2s) |
| aiSpeaksFirst | false | AI initiates (live mode only) |
| allowHarmCategory | false | Disable safety filters |
| tools | [] | Function definitions |
| sessionDuration | - | Enables session duration limits when provided |

Notes

  • Duration checks are disabled by default. They activate only when sessionDuration is provided.
  • If only sessionDuration.maxSeconds is provided, enableWarning/enableGoodbye default to false in the SDK.
  • Server limits take precedence in production.
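
Putting these notes together, a session with duration limits might be configured as follows (the exact shape of sessionDuration is inferred from the field names above and should be checked against the SDK types):

```typescript
// Sketch: enabling duration limits. Field names come from the notes above;
// the nested object shape is an assumption.
await client.startSession({
  prePrompt: 'You are a helpful assistant.',
  sessionDuration: {
    maxSeconds: 600,       // hard limit for the session
    enableWarning: true,   // emit sessionWarning before the limit
    enableGoodbye: true,   // emit sessionGoodbye when the limit is reached
  },
});

client.on('sessionWarning', () => console.log('Session ending soon'));
client.on('sessionGoodbye', () => console.log('Session ended by duration limit'));
```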

Browser Example

import { LiveSpeechClient, float32ToInt16, int16ToUint8 } from '@drawdream/livespeech';

// Capture microphone
const stream = await navigator.mediaDevices.getUserMedia({
  audio: { sampleRate: 16000, channelCount: 1 }
});

const audioContext = new AudioContext({ sampleRate: 16000 });
const source = audioContext.createMediaStreamSource(stream);
const processor = audioContext.createScriptProcessor(4096, 1, 1);  // deprecated; prefer AudioWorklet in production

processor.onaudioprocess = (e) => {
  const float32 = e.inputBuffer.getChannelData(0);
  const int16 = float32ToInt16(float32);
  const pcm = int16ToUint8(int16);
  client.sendAudioChunk(pcm);
};

source.connect(processor);
processor.connect(audioContext.destination);

Audio Utilities

import { float32ToInt16, int16ToUint8, wrapPcmInWav } from '@drawdream/livespeech';

const int16 = float32ToInt16(float32Data);
const bytes = int16ToUint8(int16);
const wav = wrapPcmInWav(bytes, { sampleRate: 16000, channels: 1, bitDepth: 16 });

Error Handling

client.on('error', (event) => {
  switch (event.code) {
    case 'authentication_failed': console.error('Invalid API key'); break;
    case 'connection_timeout': console.error('Timed out'); break;
    default: console.error(`Error: ${event.message}`);
  }
});

client.on('reconnecting', (event) => {
  console.log(`Reconnecting ${event.attempt}/${event.maxAttempts}`);
});

Regions

| Region | Code |
|--------|------|
| Seoul (Korea) | ap-northeast-2 |

License

MIT