

vocal-stack


High-performance utility library for Voice AI agents

Text sanitization • Flow control • Latency monitoring

Quick Start • Examples • Documentation • API Reference


Overview

vocal-stack solves the "last mile" challenges when building production-ready voice AI agents:

  • 🧹 Text Sanitization - Clean LLM output for TTS (remove markdown, URLs, code)
  • ⚡ Flow Control - Handle latency with smart filler injection ("um", "let me think")
  • 📊 Latency Monitoring - Track performance metrics (TTFT, duration, percentiles)

Key Features:

  • 🚀 Platform-agnostic (works with any LLM/TTS)
  • 📦 Composable modules (use independently or together)
  • 🌊 Streaming-first with minimal TTFT
  • 💪 TypeScript strict mode with 90%+ test coverage
  • 🎯 Production-ready with error handling
  • 🔌 Tree-shakeable imports

Why vocal-stack?

Without vocal-stack ❌

const stream = await openai.chat.completions.create({...});
let text = '';
for await (const chunk of stream) {
  text += chunk.choices[0]?.delta?.content || '';
}
await convertToSpeech(text); // Markdown, URLs included! 😱

Problems:

  • ❌ Awkward silences during LLM processing
  • ❌ Markdown symbols spoken aloud ("hash hello", "asterisk bold")
  • ❌ URLs spoken character by character
  • ❌ No performance tracking
  • ❌ Manual error handling

With vocal-stack ✅

import { SpeechSanitizer, FlowController, VoiceAuditor } from 'vocal-stack';

// llmStream is any AsyncIterable<string> from your LLM provider
const sanitizer = new SpeechSanitizer();
const flowController = new FlowController();
const auditor = new VoiceAuditor();

const pipeline = auditor.track(
  'req-123',
  flowController.wrap(
    sanitizer.sanitizeStream(llmStream)
  )
);

for await (const chunk of pipeline) {
  await sendToTTS(chunk); // Clean, speakable text! ✨
}

Benefits:

  • ✅ Natural fillers during stalls
  • ✅ Clean, speakable text
  • ✅ Automatic performance tracking
  • ✅ Composable pipeline
  • ✅ Production-ready

Comparison Table

| Feature | Without vocal-stack | With vocal-stack |
|---------|---------------------|------------------|
| Markdown handling | Spoken aloud | ✅ Stripped |
| URL handling | Spoken character-by-char | ✅ Removed |
| Awkward pauses | Silent stalls | ✅ Natural fillers |
| Performance tracking | Manual logging | ✅ Automatic metrics |
| Barge-in support | Complex state management | ✅ Built-in |
| Setup time | Hours of boilerplate | ✅ Minutes |


Installation

npm install vocal-stack
# or: yarn add vocal-stack
# or: pnpm add vocal-stack

Requirements: Node.js 18+


Quick Start

1️⃣ Text Sanitization

Clean LLM output for TTS:

import { sanitizeForSpeech } from 'vocal-stack';

const markdown = '## Hello World\nCheck out [this link](https://example.com)';
const speakable = sanitizeForSpeech(markdown);
// Output: "Hello World Check out this link"

2️⃣ Flow Control

Handle latency with natural fillers:

import { withFlowControl } from 'vocal-stack';

for await (const chunk of withFlowControl(llmStream)) {
  sendToTTS(chunk);
}
// Automatically injects "um" or "let me think" during stalls!

3️⃣ Latency Monitoring

Track performance metrics:

import { VoiceAuditor } from 'vocal-stack';

const auditor = new VoiceAuditor();

for await (const chunk of auditor.track('request-123', llmStream)) {
  sendToTTS(chunk);
}

console.log(auditor.getSummary());
// { avgTimeToFirstToken: 150ms, p95: 300ms, ... }

4️⃣ Full Pipeline (All Together)

Compose all three modules:

import { SpeechSanitizer, FlowController, VoiceAuditor } from 'vocal-stack';

const sanitizer = new SpeechSanitizer({ rules: ['markdown', 'urls'] });
const flowController = new FlowController({
  stallThresholdMs: 700,
  onFillerInjected: (filler) => sendToTTS(filler),
});
const auditor = new VoiceAuditor({ enableRealtime: true });

// LLM → Sanitize → Flow Control → Monitor → TTS
async function processVoiceStream(llmStream: AsyncIterable<string>) {
  const sanitized = sanitizer.sanitizeStream(llmStream);
  const controlled = flowController.wrap(sanitized);
  const monitored = auditor.track('req-123', controlled);

  for await (const chunk of monitored) {
    await sendToTTS(chunk);
  }

  console.log('Performance:', auditor.getSummary());
}

Examples

We've created 7 comprehensive examples to help you get started:

| Example | Description | Best For |
|---------|-------------|----------|
| 01-basic-sanitizer | Text sanitization basics | Getting started |
| 02-flow-control | Latency handling & fillers | Natural conversations |
| 03-monitoring | Performance tracking | Optimization |
| 04-full-pipeline | All modules together | Understanding composition |
| 05-openai-tts | Real OpenAI integration | Building with OpenAI |
| 06-elevenlabs-tts | Real ElevenLabs integration | Premium voice quality |
| 07-custom-voice-agent | Production-ready agent | Production apps |

View All Examples →


🎮 Try It Online

Play with vocal-stack in your browser - no installation needed!

| Demo | What it shows | Try it |
|------|---------------|--------|
| Text Sanitizer | Clean markdown, URLs for TTS | Open Demo → |
| Flow Control | Filler injection & latency handling | Open Demo → |
| Full Pipeline | All three modules together | Open Demo → |

View All Demos →


Quick Example: OpenAI Integration

import OpenAI from 'openai';
import { SpeechSanitizer, FlowController } from 'vocal-stack';

const openai = new OpenAI();
const sanitizer = new SpeechSanitizer();
const flowController = new FlowController();

async function* getLLMStream(prompt: string) {
  const stream = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: prompt }],
    stream: true,
  });

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content;
    if (content) yield content;
  }
}

// Process and send to TTS
const pipeline = flowController.wrap(
  sanitizer.sanitizeStream(getLLMStream('Hello!'))
);

let fullText = '';
for await (const chunk of pipeline) {
  fullText += chunk;
}

// Convert to speech with OpenAI TTS
const mp3 = await openai.audio.speech.create({
  model: 'tts-1',
  voice: 'alloy',
  input: fullText,
});
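The example above buffers the entire response before synthesizing audio, which is the simplest approach but delays the first audible word. A common variant, sketched below, flushes to TTS at sentence boundaries instead of accumulating the full text; the regex split and the per-sentence calls to openai.audio.speech.create are illustrative assumptions, not part of vocal-stack:

// Flush to TTS sentence-by-sentence to cut time-to-first-audio (illustrative sketch)
let buffer = '';
for await (const chunk of pipeline) {
  buffer += chunk;
  const match = buffer.match(/^([\s\S]*[.!?])\s*([\s\S]*)$/); // split at the last sentence end
  if (match) {
    const [, sentence, rest] = match;
    await openai.audio.speech.create({ model: 'tts-1', voice: 'alloy', input: sentence });
    buffer = rest;
  }
}
if (buffer.trim()) {
  // Speak whatever remains once the stream ends
  await openai.audio.speech.create({ model: 'tts-1', voice: 'alloy', input: buffer });
}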

Use Cases

vocal-stack is perfect for building:

🎙️ Voice Assistants

Build natural-sounding voice assistants (Alexa-like experiences)

💬 Customer Service Bots

AI phone agents that sound professional and natural

🎓 Educational AI Tutors

Interactive voice tutors for learning

🎮 Gaming NPCs

Voice-enabled game characters with realistic conversation flow

♿ Accessibility Tools

Screen readers and voice interfaces for disabled users

🎧 Content Creation

Convert blog posts, documentation to high-quality audio

🏠 Smart Home Devices

Custom voice assistants for IoT devices

📞 IVR Systems

Professional phone systems with AI voice agents


Features

🧹 Text Sanitizer

Transform LLM output into TTS-optimized strings

Built-in Rules:

  • ✅ Strip markdown (# Hello → Hello)
  • ✅ Remove URLs (https://example.com → ``)
  • ✅ Clean code blocks (```code``` → ``)
  • ✅ Normalize punctuation (Hello!!! → Hello!)

Features:

  • Sync and streaming APIs
  • Plugin-based extensibility
  • Custom replacements
  • Sentence boundary detection

const sanitizer = new SpeechSanitizer({
  rules: ['markdown', 'urls', 'code-blocks', 'punctuation'],
  customReplacements: new Map([['https://', 'link at ']]),
});

// Streaming
for await (const chunk of sanitizer.sanitizeStream(llmStream)) {
  console.log(chunk);
}

⚡ Flow Control

Manage latency with intelligent filler injection

Features:

  • 🕐 Detect stream stalls (default 700ms threshold)
  • 💬 Inject filler phrases ("um", "let me think", "hmm")
  • 🛑 Barge-in support (user interruption)
  • 🔄 State machine (idle → waiting → speaking → interrupted)
  • 📦 Buffer management for resume/replay
  • 🎛️ Dual API (high-level + low-level)

Important rule: fillers are only injected before the first chunk arrives. Once the first chunk has been sent, no further fillers are injected, so the output flows naturally.

const controller = new FlowController({
  stallThresholdMs: 700,
  fillerPhrases: ['um', 'let me think', 'hmm'],
  enableFillers: true,
  onFillerInjected: (filler) => sendToTTS(filler),
});

for await (const chunk of controller.wrap(llmStream)) {
  sendToTTS(chunk);
}

// Barge-in support
if (userInterrupted) controller.interrupt();
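In a real agent, barge-in is wired to whatever speech-detection signal your stack exposes. A minimal sketch, assuming a hypothetical `vad` event emitter that fires when the user starts talking and a `stopTTSPlayback()` helper on your side (neither is part of vocal-stack):

// `vad` and `stopTTSPlayback` are placeholders for your own voice-activity
// detection source and audio-output cancellation.
vad.on('speech-start', () => {
  controller.interrupt(); // flip the controller into the interrupted state
  stopTTSPlayback();      // cancel any audio already queued on your side
});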

📊 Latency Monitoring

Track and profile voice agent performance

Metrics Tracked:

  • ⏱️ Time to First Token (TTFT)
  • 📈 Total duration
  • 🔢 Token count
  • 📊 Average token latency

Statistics:

  • 📐 Percentiles (p50, p95, p99)
  • 📊 Averages across requests
  • 📁 Export (JSON, CSV)
  • 🔴 Real-time callbacks

const auditor = new VoiceAuditor({
  enableRealtime: true,
  onMetric: (metric) => {
    console.log(`TTFT: ${metric.metrics.timeToFirstToken}ms`);
  },
});

for await (const chunk of auditor.track('req-123', llmStream)) {
  sendToTTS(chunk);
}

const summary = auditor.getSummary();
// {
//   count: 10,
//   avgTimeToFirstToken: 150,
//   p50TimeToFirstToken: 120,
//   p95TimeToFirstToken: 300,
//   p99TimeToFirstToken: 450,
//   avgTotalDuration: 2000,
//   ...
// }

// Export for analysis
const json = auditor.export('json');
const csv = auditor.export('csv');

API Overview

Sanitizer Module

Quick API:

import { sanitizeForSpeech } from 'vocal-stack';

const clean = sanitizeForSpeech(text); // One-liner

Class API:

import { SpeechSanitizer } from 'vocal-stack';

const sanitizer = new SpeechSanitizer({
  rules: ['markdown', 'urls', 'code-blocks', 'punctuation'],
  customReplacements: new Map([['https://', 'link']]),
});

// Sync
const result = sanitizer.sanitize(text);

// Streaming
for await (const chunk of sanitizer.sanitizeStream(llmStream)) {
  console.log(chunk);
}

Subpath Import (Tree-shakeable):

import { SpeechSanitizer } from 'vocal-stack/sanitizer';

Flow Module

High-Level API:

import { FlowController, withFlowControl } from 'vocal-stack';

// Convenience function
for await (const chunk of withFlowControl(llmStream)) {
  sendToTTS(chunk);
}

// Class-based
const controller = new FlowController({
  stallThresholdMs: 700,
  fillerPhrases: ['um', 'let me think'],
  enableFillers: true,
  onFillerInjected: (filler) => sendToTTS(filler),
});

for await (const chunk of controller.wrap(llmStream)) {
  sendToTTS(chunk);
}

// Barge-in
controller.interrupt();

Low-Level API (Event-Based):

import { FlowManager } from 'vocal-stack';

const manager = new FlowManager({ stallThresholdMs: 700 });

manager.on((event) => {
  switch (event.type) {
    case 'stall-detected':
      console.log(`Stalled for ${event.durationMs}ms`);
      break;
    case 'filler-injected':
      sendToTTS(event.filler);
      break;
    case 'state-change':
      console.log(`${event.from} → ${event.to}`);
      break;
  }
});

manager.start();
for await (const chunk of llmStream) {
  manager.processChunk(chunk);
  sendToTTS(chunk);
}
manager.complete();

Subpath Import:

import { FlowController } from 'vocal-stack/flow';

Monitor Module

import { VoiceAuditor } from 'vocal-stack';

const auditor = new VoiceAuditor({
  enableRealtime: true,
  onMetric: (metric) => console.log(metric),
});

// Automatic tracking
for await (const chunk of auditor.track('req-123', llmStream)) {
  sendToTTS(chunk);
}

// Manual tracking
auditor.startTracking('req-456');
// ... processing ...
auditor.recordToken('req-456');
// ... more processing ...
const metric = auditor.completeTracking('req-456');

// Get statistics
const summary = auditor.getSummary();

// Export
const json = auditor.export('json');
const csv = auditor.export('csv');

Subpath Import:

import { VoiceAuditor } from 'vocal-stack/monitor';

Architecture

vocal-stack is built with three independent, composable modules:

┌─────────────────────────────────────────────────────────┐
│                    Voice Pipeline                       │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ┌──────┐   ┌──────────┐   ┌──────┐   ┌─────────┐    │
│  │ LLM  │ → │Sanitizer │ → │ Flow │ → │ Monitor │    │
│  │Stream│   │(clean    │   │(fill-│   │(metrics)│    │
│  └──────┘   │text)     │   │ers)  │   └─────────┘    │
│             └──────────┘   └──────┘        │          │
│                                             ↓          │
│                                          ┌─────┐      │
│                                          │ TTS │      │
│                                          └─────┘      │
└─────────────────────────────────────────────────────────┘

Each module:

  • ✅ Works standalone
  • ✅ Composes seamlessly
  • ✅ Fully typed (TypeScript)
  • ✅ Well-tested (90%+ coverage)
  • ✅ Production-ready

Use only what you need:

// Just sanitization
import { SpeechSanitizer } from 'vocal-stack/sanitizer';

// Just flow control
import { FlowController } from 'vocal-stack/flow';

// Just monitoring
import { VoiceAuditor } from 'vocal-stack/monitor';

// All together
import { SpeechSanitizer, FlowController, VoiceAuditor } from 'vocal-stack';

Platform Support

vocal-stack is platform-agnostic and works with any LLM or TTS provider:

Tested With

LLMs:

  • ✅ OpenAI (GPT-4, GPT-3.5)
  • ✅ Anthropic Claude
  • ✅ Google Gemini
  • ✅ Local LLMs (Ollama, LM Studio)
  • ✅ Any streaming text API (see the adapter sketch below)
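Since each module consumes a plain AsyncIterable<string>, adapting a non-OpenAI provider is just a matter of yielding its text deltas. A sketch for Anthropic's TypeScript SDK, following the @anthropic-ai/sdk streaming event shapes (the model name is an assumption; use whichever you prefer):

import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

// Adapt Claude's event stream into the AsyncIterable<string> vocal-stack expects
async function* claudeStream(prompt: string): AsyncIterable<string> {
  const stream = anthropic.messages.stream({
    model: 'claude-3-5-sonnet-latest',
    max_tokens: 1024,
    messages: [{ role: 'user', content: prompt }],
  });

  for await (const event of stream) {
    if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
      yield event.delta.text;
    }
  }
}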

TTS:

  • ✅ OpenAI TTS
  • ✅ ElevenLabs
  • ✅ Google Cloud TTS
  • ✅ Azure TTS
  • ✅ AWS Polly
  • ✅ Any TTS provider

Node.js:

  • ✅ Node.js 18+
  • ✅ Node.js 20+
  • ✅ Node.js 22+

Module Systems:

  • ✅ ESM (import/export)
  • ✅ CommonJS (require; see the sketch below)
  • ✅ TypeScript
  • ✅ JavaScript
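For CommonJS consumers, the same entry points should be reachable via require, assuming the dual ESM/CJS build advertised above:

// CommonJS usage sketch
const { sanitizeForSpeech } = require('vocal-stack');

console.log(sanitizeForSpeech('## Hello **world**'));
// markdown stripped, e.g. "Hello world"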

Performance

vocal-stack adds minimal overhead to your voice pipeline:

| Operation | Overhead | Impact |
|-----------|----------|--------|
| Text sanitization | < 1ms per chunk | Negligible |
| Flow control | < 1ms per chunk | Negligible |
| Monitoring | < 0.5ms per chunk | Negligible |
| Total | ~2-3ms per chunk | ✅ Negligible |

For a typical voice response (50 chunks), total overhead is ~100-150ms.

Benchmarks (see the measurement sketch after this list):

  • ✅ Handles 1000+ chunks/second
  • ✅ Memory efficient (streaming-based)
  • ✅ No blocking operations
  • ✅ Fully async/await compatible
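These figures will vary by machine, but they are easy to sanity-check. A minimal micro-benchmark sketch for the sanitizer alone, driven by a synthetic stream (the throughput you measure is yours, not a guarantee):

import { performance } from 'node:perf_hooks';
import { SpeechSanitizer } from 'vocal-stack';

// Synthetic LLM stream: 1,000 markdown-flavored chunks
async function* syntheticStream(): AsyncIterable<string> {
  for (let i = 0; i < 1000; i++) {
    yield `## Chunk ${i} with [a link](https://example.com) and **bold** text. `;
  }
}

const sanitizer = new SpeechSanitizer();
const start = performance.now();

let chunks = 0;
for await (const _ of sanitizer.sanitizeStream(syntheticStream())) {
  chunks++; // output chunk count may differ from input if the sanitizer re-segments text
}

const elapsed = performance.now() - start;
console.log(`${chunks} chunks in ${elapsed.toFixed(1)}ms (${(elapsed / chunks).toFixed(3)}ms/chunk)`);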

Documentation

Quick Links

Examples

| Example | Description | Code |
|---------|-------------|------|
| Basic Sanitizer | Text cleaning basics | View → |
| Flow Control | Latency & fillers | View → |
| Monitoring | Performance tracking | View → |
| Full Pipeline | All modules together | View → |
| OpenAI Integration | Real OpenAI usage | View → |
| ElevenLabs Integration | Real ElevenLabs usage | View → |
| Custom Agent | Production-ready agent | View → |


FAQ

When should I use vocal-stack?

Use vocal-stack when building voice AI applications that need:

  • Clean, speakable text from LLM output
  • Natural handling of streaming delays
  • Performance monitoring and optimization
  • Production-ready code patterns

Do I need to use all three modules?

No! Each module works independently (see the sketch after this list):

  • Use just Sanitizer if you only need text cleaning
  • Use just Flow Control if you only need latency handling
  • Use just Monitor if you only need metrics
  • Or use all three for complete functionality
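Because every module both accepts and returns a plain AsyncIterable<string>, a single module can wrap any stream on its own. A sketch using only the monitor via its subpath import:

import { VoiceAuditor } from 'vocal-stack/monitor';

// Any async iterable of strings works; this one is a stand-in for your LLM
async function* anyStream(): AsyncIterable<string> {
  yield 'hello ';
  yield 'world';
}

const auditor = new VoiceAuditor();

for await (const chunk of auditor.track('standalone-demo', anyStream())) {
  process.stdout.write(chunk);
}

console.log(auditor.getSummary());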

Does it work with my LLM/TTS provider?

Yes! vocal-stack is platform-agnostic and works with any:

  • LLM that provides streaming text (OpenAI, Claude, Gemini, local LLMs)
  • TTS provider (OpenAI, ElevenLabs, Google, Azure, AWS, custom)

How much overhead does it add?

Very minimal (~2-3ms per chunk). See Performance for details.

Is it production-ready?

Yes! vocal-stack is:

  • ✅ TypeScript strict mode
  • ✅ 90%+ test coverage
  • ✅ Used in production applications
  • ✅ Well-documented
  • ✅ Actively maintained

Can I customize sanitization rules?

Yes! You can:

  • Choose which built-in rules to apply
  • Add custom replacements
  • Create custom plugins (coming soon)

Contributing

Contributions are welcome! Here's how you can help:

Ways to Contribute

  • 🐛 Report bugs by opening an issue
  • 💡 Suggest features or improvements
  • 📖 Improve documentation
  • 🧪 Add tests
  • 💻 Submit pull requests
  • ⭐ Star the repo to show support

Development Setup

# Clone the repo
git clone https://github.com/gaurav890/vocal-stack.git
cd vocal-stack

# Install dependencies
npm install

# Run tests
npm test

# Run tests in watch mode
npm run test:watch

# Run tests with coverage
npm run test:coverage

# Lint code
npm run lint

# Type check
npm run typecheck

# Build
npm run build

Guidelines

  • Follow existing code style
  • Add tests for new features
  • Update documentation
  • Keep commits atomic and descriptive

License

MIT © [Your Name]

See LICENSE for details.


Made with ❤️ for the Voice AI community
