
@kond.studio/voicekit

v0.7.0

Published

Voice SDK for AI agents - STT, TTS, Turn Detection

Downloads

1,090

Readme

 ██╗  ██╗ ██████╗ ███╗   ██╗██████╗
 ██║ ██╔╝██╔═══██╗████╗  ██║██╔══██╗
 █████╔╝ ██║   ██║██╔██╗ ██║██║  ██║
 ██╔═██╗ ██║   ██║██║╚██╗██║██║  ██║
 ██║  ██╗╚██████╔╝██║ ╚████║██████╔╝
 ╚═╝  ╚═╝ ╚═════╝ ╚═╝  ╚═══╝╚═════╝

 ██╗   ██╗ ██████╗ ██╗ ██████╗███████╗██╗  ██╗██╗████████╗
 ██║   ██║██╔═══██╗██║██╔════╝██╔════╝██║ ██╔╝██║╚══██╔══╝
 ██║   ██║██║   ██║██║██║     █████╗  █████╔╝ ██║   ██║
 ╚██╗ ██╔╝██║   ██║██║██║     ██╔══╝  ██╔═██╗ ██║   ██║
  ╚████╔╝ ╚██████╔╝██║╚██████╗███████╗██║  ██╗██║   ██║
   ╚═══╝   ╚═════╝ ╚═╝ ╚═════╝╚══════╝╚═╝  ╚═╝╚═╝   ╚═╝

@kond.studio/voicekit

Voice I/O for AI agents — You bring your own LLM

VoiceKit handles STT, TTS, and turn detection. Your AI handles intelligence.

Zero LLM lock-in. Zero markup on your AI calls.


Installation

npm install @kond.studio/voicekit
# or
pnpm add @kond.studio/voicekit
# or
yarn add @kond.studio/voicekit

Why VoiceKit?

Most voice AI platforms (Vapi, Retell, Hume) want to own your entire stack — including your LLM. They proxy your AI calls and charge markup on every token.

VoiceKit is different:

┌─────────────────────────────────────────────────────────────────┐
│  OTHER PLATFORMS                                                 │
│                                                                  │
│  Your App ──► Platform ──► LLM (their proxy) ──► Platform ──► App│
│                    └── 15-30% markup on tokens ──┘               │
│                                                                  │
│  Lock-in: Limited LLM choices, migration required to switch     │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│  VOICEKIT                                                        │
│                                                                  │
│  Your App ──► VoiceKit ──► Your App ──► YOUR LLM (direct)       │
│                  │                           │                   │
│               Voice I/O                   Your choice            │
│          (STT, TTS, Turn)           (Claude/GPT/Gemini/...)     │
│                                                                  │
│  Freedom: Any LLM, switch anytime, pay your provider directly   │
└─────────────────────────────────────────────────────────────────┘

Early Access

VoiceKit is currently in early access.

Get your free API key at kond.studio/developers/voicekit/keys.

Free tier includes 10 minutes/month to test.


Quick Start

With Claude (Anthropic)

import { VoiceKit } from '@kond.studio/voicekit';
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

const voice = new VoiceKit({
  apiKey: 'vk_xxxxxxxxxxxx',
  locale: 'en',
  onTranscript: async (text) => {
    const response = await anthropic.messages.create({
      model: 'claude-sonnet-4-20250514',
      max_tokens: 1024,
      messages: [{ role: 'user', content: text }],
    });
    voice.speak(response.content[0].text);
  },
});

await voice.start();

With GPT (OpenAI)

import { VoiceKit } from '@kond.studio/voicekit';
import OpenAI from 'openai';

const openai = new OpenAI();

const voice = new VoiceKit({
  apiKey: 'vk_xxxxxxxxxxxx',
  locale: 'en',
  onTranscript: async (text) => {
    const response = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: text }],
    });
    voice.speak(response.choices[0].message.content);
  },
});

await voice.start();

With Gemini (Google)

import { VoiceKit } from '@kond.studio/voicekit';
import { GoogleGenerativeAI } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: 'gemini-pro' });

const voice = new VoiceKit({
  apiKey: 'vk_xxxxxxxxxxxx',
  locale: 'en',
  onTranscript: async (text) => {
    const result = await model.generateContent(text);
    voice.speak(result.response.text());
  },
});

await voice.start();

With Mistral

import { VoiceKit } from '@kond.studio/voicekit';
import { Mistral } from '@mistralai/mistralai';

const mistral = new Mistral({ apiKey: process.env.MISTRAL_API_KEY });

const voice = new VoiceKit({
  apiKey: 'vk_xxxxxxxxxxxx',
  locale: 'en',
  onTranscript: async (text) => {
    const response = await mistral.chat.complete({
      model: 'mistral-large-latest',
      messages: [{ role: 'user', content: text }],
    });
    voice.speak(response.choices[0].message.content);
  },
});

await voice.start();

With Ollama (Local LLM)

import { VoiceKit } from '@kond.studio/voicekit';
import { Ollama } from 'ollama';

const ollama = new Ollama();

const voice = new VoiceKit({
  apiKey: 'vk_xxxxxxxxxxxx',
  locale: 'en',
  onTranscript: async (text) => {
    const response = await ollama.chat({
      model: 'llama3',
      messages: [{ role: 'user', content: text }],
    });
    voice.speak(response.message.content);
  },
});

await voice.start();

Two API Keys

VoiceKit uses a simple two-key model:

| Key | Purpose | Who provides it |
|-----|---------|-----------------|
| vk_xxx | Voice I/O (STT, TTS, turn detection) | VoiceKit |
| ANTHROPIC_API_KEY / OPENAI_API_KEY / etc. | Your LLM | You (direct from provider) |

VoiceKit never sees your LLM calls. The onTranscript callback runs entirely in your application; VoiceKit only handles the voice side.
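
A minimal sketch of that boundary, using the constructor options shown in Quick Start; the /chat endpoint is a hypothetical backend of your own:

import { VoiceKit } from '@kond.studio/voicekit';

const voice = new VoiceKit({
  apiKey: process.env.VOICEKIT_API_KEY, // vk_... key: voice I/O only
  locale: 'en',
  onTranscript: async (text) => {
    // Plain application code: VoiceKit hands you the transcript and plays
    // what you pass to speak(); the request below goes straight to your
    // own backend (and from there to whatever LLM you chose).
    const res = await fetch('https://your-app.example.com/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text }),
    });
    const { reply } = await res.json();
    voice.speak(reply);
  },
});

await voice.start();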


Security Considerations

API Key Storage

DO:

  • Store VOICEKIT_API_KEY in environment variables (.env.local)
  • Store LLM API keys server-side when possible
  • Use short-lived tokens for client-side if needed

DON'T:

  • Hardcode API keys in source code
  • Store API keys in localStorage or cookies
  • Log API keys or transcripts in production
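
As a sketch (vk_live_abc123 is a made-up key, shown only to illustrate the anti-pattern):

// DO: read the key from the environment at startup
const voice = new VoiceKit({
  apiKey: process.env.VOICEKIT_API_KEY,
  locale: 'en',
  onTranscript: async (text) => { /* your LLM call */ },
});

// DON'T: hardcode the key; it ends up in version control and client bundles
// const voice = new VoiceKit({ apiKey: 'vk_live_abc123', ... });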

HTTPS

VoiceKit enforces HTTPS in production. HTTP is only allowed for localhost development.

// This will throw in production:
new VoiceKit({ baseUrl: 'http://insecure.example.com' }); // Error!

// OK in development:
new VoiceKit({ baseUrl: 'http://localhost:3000' }); // Works

Data Privacy

  • Transcripts are processed in real-time and not stored by VoiceKit
  • Audio is streamed to STT provider (Deepgram) and not retained
  • LLM calls go directly to your provider — VoiceKit never sees them

React

import { useVoiceKit } from '@kond.studio/voicekit/react';

function VoiceChat() {
  const voice = useVoiceKit({
    apiKey: process.env.NEXT_PUBLIC_VOICEKIT_API_KEY,
    locale: 'en',
    onTranscript: async (text) => {
      const reply = await myLLM.chat(text);  // Your LLM, your choice
      voice.speak(reply);
    },
  });

  return (
    <button onClick={voice.isActive ? voice.stop : voice.start}>
      {voice.isActive ? 'Listening...' : 'Start Voice'}
    </button>
  );
}

Premium React Components

VoiceKit includes pre-built UI components for voice conversations with:

  • Glow effects for listening state
  • Ripple effects for speaking state
  • Barge-in indicators for interruptions
  • Real-time waveform visualization

VoiceStatusIndicator

import { useVoiceKit, VoiceStatusIndicator } from '@kond.studio/voicekit/react';

function VoiceUI() {
  const voice = useVoiceKit({ ... });

  return (
    <VoiceStatusIndicator
      state={voice.state}
      isActive={voice.isActive}
      userSpeaking={voice.userSpeaking}
      showGlow={true}           // Glow effect for listening state
      wasInterrupted={false}    // Show interrupted state
    />
  );
}

VoiceWaveform

Real-time audio level visualization:

import { useState } from 'react';
import { useVoiceKit, VoiceWaveform } from '@kond.studio/voicekit/react';

function VoiceUI() {
  const [audioLevel, setAudioLevel] = useState(0);

  const voice = useVoiceKit({
    ...config,
    onAudioLevel: (level) => setAudioLevel(level),
  });

  return (
    <VoiceWaveform
      isActive={voice.isActive && voice.userSpeaking}
      audioLevel={audioLevel}
      color="#00ff88"
      size="md"            // "sm" | "md" | "lg" | number
      barCount={5}
      showGlow={true}
    />
  );
}

Barge-in (Interruption) Handling

import { useState } from 'react';
import { useVoiceKit } from '@kond.studio/voicekit/react';

function VoiceUI() {
  const [interrupted, setInterrupted] = useState(false);

  const voice = useVoiceKit({
    ...config,
    onInterruption: (context) => {
      // Called when user speaks while AI is talking
      console.log('User interrupted:', context?.interruptedText);
      setInterrupted(true);
      setTimeout(() => setInterrupted(false), 3000);
    },
  });

  return (
    <>
      {interrupted && <div className="badge">Interrupted!</div>}
      {/* ... */}
    </>
  );
}

VAD Meter

import { useState } from 'react';
import { useVoiceKit, AnimatedVADMeter } from '@kond.studio/voicekit/react';

function VoiceUI() {
  const [audioLevel, setAudioLevel] = useState(0);

  const voice = useVoiceKit({
    ...config,
    onAudioLevel: (level) => setAudioLevel(level),
  });

  return (
    <AnimatedVADMeter
      level={audioLevel}
      isActive={voice.isActive}
      barCount={10}
      activeColor="#00ff88"
    />
  );
}

What's Included

VoiceKit handles the hard parts of voice:

| Feature | Description |
|---------|-------------|
| Speech-to-Text | Optimized STT with streaming transcription |
| Text-to-Speech | Natural voices with gapless audio queue |
| Turn Detection | ML-powered end-of-utterance detection with local ONNX or cloud |
| Local ML | ONNX inference in browser (~25-50ms) for desktop devices |
| VAD | Local voice activity detection |
| Barge-in | Natural user interruption support |
| Backchannels | Filters "mh", "yeah", "ok" (no LLM call) |
| 9-state FSM | Battle-tested conversation orchestration |


Languages

SUPPORTED LOCALES
─────────────────
├─ "en"     English
├─ "fr"     French
└─ "multi"  Multilingual (auto-detects EN/FR/ES/DE/IT/PT/JA/NL/RU/HI)

Pricing

┌──────────────────────────────────────────────────────────────────────────┐
│  PLAN              │ MINUTES/MONTH      │ PRICE        │ OVERAGE         │
├────────────────────┼────────────────────┼──────────────┼─────────────────┤
│  Free              │ 10                 │ $0/mo        │ Blocked         │
│  Starter           │ 500                │ $49/mo       │ $0.08/min       │
│  Pro               │ 3000               │ $99/mo       │ $0.05/min       │
└──────────────────────────────────────────────────────────────────────────┘

Note: This is just VoiceKit pricing. Your LLM costs are separate and go directly to your provider.
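
For example, at the published Starter rates, 700 minutes in a month would cost $49 + (200 × $0.08) = $65, plus whatever your LLM provider bills you directly.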


vs Competition

┌─────────────────────────────────────────────────────────────────────────┐
│                           LLM COMPARISON                                 │
├──────────────┬───────────────┬───────────────┬───────────────┬──────────┤
│              │ VoiceKit      │ Vapi.ai       │ Retell.ai     │ Hume.ai  │
├──────────────┼───────────────┼───────────────┼───────────────┼──────────┤
│ LLM choice   │ ANY           │ Limited       │ Limited       │ EVI only │
│              │               │ (their proxy) │ (their proxy) │          │
├──────────────┼───────────────┼───────────────┼───────────────┼──────────┤
│ LLM markup   │ 0%            │ ~15-30%       │ ~10-25%       │ Included │
│              │ (pay direct)  │ on tokens     │ on tokens     │          │
├──────────────┼───────────────┼───────────────┼───────────────┼──────────┤
│ Local LLM    │ Yes           │ No            │ No            │ No       │
│ (Ollama)     │               │               │               │          │
├──────────────┼───────────────┼───────────────┼───────────────┼──────────┤
│ Switch LLM   │ 1 line change │ Migration     │ Migration     │ N/A      │
├──────────────┼───────────────┼───────────────┼───────────────┼──────────┤
│ See prompts? │ Never         │ Yes (proxy)   │ Yes (proxy)   │ Yes      │
└──────────────┴───────────────┴───────────────┴───────────────┴──────────┘

Turn Detection Modes

VoiceKit uses intelligent turn detection to know when you've finished speaking:

| Mode | Best For | Latency |
|------|----------|---------|
| auto (default) | Most apps — auto-selects based on device | ~25-200ms |
| local | Desktop apps wanting lowest latency | ~25-50ms |
| cloud | Mobile apps, consistent behavior | ~100-200ms |
| heuristic | Offline, testing | ~1-5ms |

Auto Mode (Default)

Device capable (desktop 4GB+ RAM, WebAssembly, IndexedDB)?
├── YES → Local ONNX (~25-50ms)
└── NO  → Cloud API (~100-200ms)

// Force local ONNX for lowest latency
const voice = new VoiceKit({
  apiKey: 'vk_xxx',
  turnDetection: { type: 'local' },
  onTranscript: ...
});

// Force heuristic for offline use
const voice = new VoiceKit({
  apiKey: 'vk_xxx',
  turnDetection: { type: 'heuristic' },
  onTranscript: ...
});
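
The cloud mode from the table can presumably be forced the same way; a sketch assuming the option shape matches the examples above:

// Force cloud turn detection for consistent behavior across devices
const voice = new VoiceKit({
  apiKey: 'vk_xxx',
  turnDetection: { type: 'cloud' },
  onTranscript: ...
});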

Use Cases

  • Voice chatbots — Customer support, virtual assistants
  • Educational apps — AI tutors, language learning
  • Accessibility — Voice interfaces for visually impaired users
  • Gaming — NPCs that talk with players
  • Autonomous agents — Agents that report back vocally

Documentation

Full documentation: kond.studio/developers/voicekit/docs


License

MIT — see LICENSE


Contributing

VoiceKit is extracted from KOND, a personal AI companion.

Issues and PRs welcome on GitHub.


┌──────────────────────────────────────────────────────────────────────────┐
│                                                                          │
│                           [ ^_^ ]                                        │
│                                                                          │
│             @kond.studio/voicekit — built with care                      │
│                                                                          │
│          Voice I/O for AI agents. You bring your own LLM.               │
│                                                                          │
│                            2025                                          │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘