npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2025 – Pkg Stats / Ryan Hefner

turncut

v0.0.3

Published

A real-time speech interruption detection library designed specifically for LLM voice agents and phone call applications. TurnCut enables AI assistants to detect when a caller starts speaking so they can immediately stop their own text-to-speech output an

Downloads

12

Readme

TurnCut

A real-time speech interruption detection library designed specifically for LLM voice agents and phone call applications. TurnCut enables AI assistants to detect when a caller starts speaking so they can immediately stop their own text-to-speech output and listen.

Purpose

In conversational AI systems, especially phone-based voice agents, natural conversation requires the ability to detect when a human user begins speaking during the AI's response. This "barge-in" or interruption detection is crucial for:

  • Natural conversation flow: Users expect to be able to interrupt the AI just like they would interrupt a human
  • Reduced latency: Immediate response to user speech instead of waiting for AI to finish speaking
  • Better user experience: Prevents the AI from talking over the user
  • Efficient bandwidth usage: Stops unnecessary TTS audio transmission

Features

  • Real-time detection: Optimized for 20ms audio frames (telephony standard)
  • Twilio-ready: Native support for 8kHz μ-law encoded audio streams
  • Adaptive noise floor: Automatically adjusts to background noise using rolling median
  • Multi-feature fusion: Combines speech-band energy, spectral flux, and zero-crossing rate
  • Hysteresis thresholding: Prevents false positives from brief noise spikes
  • Low CPU overhead: Efficient FFT-based processing with minimal memory allocation
  • Configurable: Tunable parameters for different environments and sensitivity requirements

Installation

# Using Bun (recommended)
bun add turncut

# Using npm
npm install turncut

# Using yarn
yarn add turncut

Quick Start

Basic Usage with Twilio Media Streams

import { SpeechDetector } from 'turncut'

// Initialize detector for Twilio's default format (8kHz μ-law)
const detector = new SpeechDetector({
  sampleRate: 8000,
  encoding: 'mulaw'
})

// Handle incoming audio chunks from Twilio
ws.on('message', (data) => {
  const message = JSON.parse(data)
  
  if (message.event === 'media') {
    // Decode base64 audio data
    const audioBuffer = Buffer.from(message.media.payload, 'base64')
    
    // Detect speech onset
    const speechStarted = detector.detectSpeechOnset(audioBuffer)
    
    if (speechStarted) {
      console.log('🎙️ User started speaking - stopping TTS')
      // Stop your TTS output here
      stopTextToSpeech()
      // Optionally clear audio output buffer
      clearAudioBuffer()
    }
  }
})

Advanced Configuration

import { SpeechDetector } from 'turncut'

const detector = new SpeechDetector({
  sampleRate: 16000,        // Higher quality audio
  encoding: 'pcm16',        // 16-bit PCM instead of μ-law
  medianWindowFrames: 75    // 1.5 second noise floor window
})

// Reset detector state for new calls
detector.reset()

// Process audio in a loop
while (audioStream.isActive) {
  const audioChunk = await audioStream.read()
  const interrupted = detector.detectSpeechOnset(audioChunk)
  
  if (interrupted) {
    await handleUserInterruption()
  }
}

API Reference

SpeechDetector

The main class for speech detection.

Constructor Options

interface SpeechDetectorOpts {
  sampleRate?: number        // Audio sample rate (default: 8000)
  encoding?: 'mulaw' | 'pcm16'  // Audio encoding (default: 'mulaw')
  medianWindowFrames?: number   // Frames for noise floor calculation (default: 50)
}

Methods

detectSpeechOnset(buffer: Buffer): boolean

Processes an audio chunk and returns true exactly when speech begins.

  • Parameters:
    • buffer: Raw audio data (μ-law bytes or 16-bit PCM-LE)
  • Returns: true on speech onset, false otherwise
  • Notes:
    • Only processes the first frame worth of data
    • Requires at least 20ms of audio data
    • Returns true only once per speech segment
reset(): void

Resets the detector's internal state. Use this when starting a new call or conversation.

Audio Format Support

| Format | Sample Rate | Encoding | Use Case | |--------|-------------|----------|----------| | Twilio Default | 8kHz | μ-law | Phone calls via Twilio | | High Quality | 16kHz | PCM-16 | Local/high-quality audio | | Custom | Any | μ-law/PCM-16 | Custom telephony systems |

How It Works

TurnCut uses a sophisticated multi-feature approach to detect speech onset:

1. Signal Preprocessing

  • Pre-emphasis: Boosts high-frequency content (1-4kHz) where speech intelligibility lives
  • Windowing: Applies Hann window to reduce spectral leakage
  • FFT: Converts time-domain signal to frequency domain for analysis

2. Feature Extraction

  • Speech-band Energy Ratio: Measures energy in 300-3400Hz range vs. total energy
  • Spectral Flux: Detects frame-to-frame changes in spectrum (onset-sensitive)
  • Zero-crossing Rate: Captures high-frequency activity patterns

3. Adaptive Thresholding

  • Rolling Median: Continuously estimates background noise floor
  • Hysteresis: Uses separate thresholds for speech start/stop to prevent jitter
  • Onset Confirmation: Requires multiple consecutive frames before triggering

4. Decision Logic

Speech Score = 0.6 × Band Ratio + 0.3 × Spectral Flux + 0.1 × ZCR
Speech Detected = Score > (Noise Floor + Hysteresis Threshold)

Performance Characteristics

  • Latency: 20-60ms (1-3 frames) detection delay
  • CPU Usage: ~1-2% on modern hardware for 8kHz audio
  • Memory: <1MB working set per detector instance (with default window size)

Integration Examples

Express.js + WebSocket Server

import express from 'express'
import WebSocket from 'ws'
import { SpeechDetector } from 'turncut'

const app = express()
const wss = new WebSocket.Server({ port: 8080 })

wss.on('connection', (ws) => {
  const detector = new SpeechDetector()
  
  ws.on('message', (data) => {
    const audioBuffer = Buffer.from(data)
    const speechStarted = detector.detectSpeechOnset(audioBuffer)
    
    if (speechStarted) {
      ws.send(JSON.stringify({ 
        event: 'speech_detected',
        timestamp: Date.now()
      }))
    }
  })
})

Node.js Twilio Function

import { SpeechDetector } from 'turncut'

const detector = new SpeechDetector({
  sampleRate: 8000,
  encoding: 'mulaw'
})

export const handler = (context, event, callback) => {
  const audioData = Buffer.from(event.media.payload, 'base64')
  
  if (detector.detectSpeechOnset(audioData)) {
    // Interrupt current TTS
    return callback(null, {
      event: 'interrupt',
      streamSid: event.streamSid
    })
  }
  
  callback(null, { event: 'continue' })
}

Troubleshooting

Common Issues

No speech detected despite audio input

  • Verify audio format matches detector configuration
  • Check if audio volume is sufficient (>40dB SNR)
  • Ensure audio chunks are at least 20ms worth of data

Too many false positives

  • Increase medianWindowFrames for longer noise floor averaging
  • Add additional pre-filtering for known noise sources
  • Consider adjusting hysteresis thresholds

Debug Mode

Enable debug logging to see internal detector state:

process.env.DEBUG_SPEECH = true

License

This project is licensed under the MIT License

Built with ❤️ by Mike Vegeto