voipi

v0.0.12

pi-extension pi-package text-to-speech tts voice voipi

Give your apps, CLIs, and agents a voice. VoiPi is a universal, zero-dependency, free text-to-speech library for JavaScript.

Pure JS with zero dependencies: less than 100 kB total install size and 10 kB for the bundled providers
No API keys required
Multiple providers: Browser TTS, macOS, Edge TTS, Google TTS, Piper, eSpeak NG
Auto fallback: Picks the best available provider per platform
Auto language detection: Detects script (Arabic, Farsi, CJK, Cyrillic, etc.) and Latin-script languages (French, Spanish, German, Portuguese, etc.) — picks the best voice automatically
MCP Server: Give AI agents a voice — auto install with Claude Code, Codex, Cursor, Windsurf, OpenCode and Pi.

CLI

You can use voipi directly with npx/pnpx/bunx.

# Speak text (auto-selects best available provider)
npx voipi 'The quick brown fox jumps over the lazy dog'
npx voipi speak 'Hello world'

# Choose a specific voice and speed
npx voipi 'Hi' -v en-US-BrianNeural -r 1.5

# Save to file instead of playing
npx voipi speak 'Hi' -o hello.mp3

# Use a specific provider
npx voipi 'Bonjour le monde' -p edge-tts -v fr-FR-DeniseNeural

# List available voices
npx voipi voices

# List voices for a specific provider
npx voipi voices -p edge-tts

# Start MCP server (stdio transport)
npx voipi mcp

MCP Server

VoiPi includes a built-in MCP server that exposes text-to-speech tools over the stdio transport. This lets AI agents and LLM clients speak text, save audio files, and list voices.

Auto-install to all detected agents:

npx voipi@latest mcp --install
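
For a client that is not auto-detected, the server can usually be registered by hand. The sketch below builds the kind of entry that several MCP clients accept under an `mcpServers` key. The helper `mcpServerEntry` and the entry name `voipi` are illustrative assumptions, and the config file location and exact schema differ per client, so check your client's documentation before copying the output.

```javascript
// Hypothetical helper: describes a stdio MCP server that is started with `npx voipi mcp`.
function mcpServerEntry(name = "voipi") {
  return {
    mcpServers: {
      [name]: { command: "npx", args: ["voipi", "mcp"] },
    },
  };
}

// Print the entry as JSON so it can be pasted into a client's config file.
console.log(JSON.stringify(mcpServerEntry(), null, 2));
```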

Programmatic Usage

VoiPi automatically picks the best available provider, using a fallback chain (macOS → Edge TTS → Google TTS → Piper → eSpeak NG):

import { VoiPi } from "voipi";

const voice = new VoiPi();

// Speak text
await voice.speak("Hello world!");

// With a prioritized voice list (first available wins)
await voice.speak("Hello!", { voice: ["Samantha", "en-US-AriaNeural"], rate: 1.5 });

// Save to file
await voice.save("Hello!", "output.mp3");

// Get audio data with duration
const audio = await voice.toAudio("Hello world!");
console.log(`Duration: ${audio.duration}s`);

// List available voices
const voices = await voice.listVoices();

You can also provide a custom provider chain using names, [name, options] tuples, or factory functions:

// MacOS and EdgeTTS are only needed for the factory-function form below
import { VoiPi, MacOS, EdgeTTS } from "voipi";

// Using provider names
const voice = new VoiPi({
  providers: ["edge-tts", "macos"],
});

// Using [name, options] tuples for provider configuration
const voice2 = new VoiPi({
  providers: [["edge-tts", { voice: "en-US-GuyNeural" }], "macos"],
});

// Using factory functions for full control
const voice3 = new VoiPi({
  providers: [() => new EdgeTTS({ voice: "en-US-GuyNeural" }), () => new MacOS()],
});
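
To make the idea of a provider chain concrete, here is a rough sketch of how a fallback loop over factory functions could behave. This is not VoiPi's implementation. `speakWithFallback` and the two stub providers are hypothetical, and the only assumption about a provider is that it exposes an async `speak(text)` method.

```javascript
// Sketch: try each provider factory in order; the first successful speak() wins.
async function speakWithFallback(factories, text) {
  const errors = [];
  for (const create of factories) {
    try {
      const provider = create();
      await provider.speak(text);
      return provider; // success: stop here
    } catch (err) {
      errors.push(err); // remember why this provider failed, then try the next
    }
  }
  throw new AggregateError(errors, "No provider could speak the text");
}

// Stub providers standing in for real ones.
const offline = () => ({ name: "offline", speak: async () => { throw new Error("unavailable"); } });
const local = () => ({ name: "local", speak: async () => {} });

speakWithFallback([offline, local], "Hello").then((p) => console.log(p.name)); // prints "local"
```

Collecting the individual errors makes it easier to see why every provider failed when the chain is exhausted.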

Language Detection

VoiPi automatically detects the language of input text and selects an appropriate voice. This works across all providers — no manual voice selection needed for non-English text:

await voice.speak("سلام دنیا"); // Farsi → picks a Farsi voice
await voice.speak("مرحبا بالعالم"); // Arabic → picks an Arabic voice
await voice.speak("こんにちは"); // Japanese → picks a Japanese voice
await voice.speak("你好世界"); // Chinese → picks a Chinese voice
await voice.speak("L'éducation française est très appréciée"); // French → picks a French voice
await voice.speak("Straßenbahn und Gemütlichkeit"); // German → picks a German voice
await voice.speak("¿Cómo estás?"); // Spanish → picks a Spanish voice

Detects 30+ languages: unique scripts (Arabic, Farsi, Urdu, CJK, Cyrillic, Devanagari, etc.) and Latin-script languages via diacritics analysis (French, Spanish, German, Portuguese, Turkish, Polish, Czech, Romanian, Vietnamese, and more). You can also use the detection utility directly:

import { detectLanguage } from "voipi";

detectLanguage("سلام دنیا"); // "fa"
detectLanguage("Hello world"); // "en"
detectLanguage("こんにちは世界"); // "ja"
detectLanguage("L'éducation française"); // "fr"
detectLanguage("Straßenbahn"); // "de"
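
For intuition, script-based detection can be approximated with Unicode property escapes. The sketch below is a simplified illustration and not the library's algorithm: `guessLanguageByScript` and its rule table are hypothetical, it maps each script to a single language (so it cannot tell Farsi or Urdu from Arabic), and it ignores the diacritics analysis used for Latin-script languages.

```javascript
// Sketch of script-based detection using Unicode property escapes.
// Order matters: kana is checked before Han because Japanese text mixes both.
const SCRIPT_RULES = [
  [/[\p{Script=Hiragana}\p{Script=Katakana}]/u, "ja"],
  [/\p{Script=Hangul}/u, "ko"],
  [/\p{Script=Han}/u, "zh"],
  [/\p{Script=Arabic}/u, "ar"],
  [/\p{Script=Cyrillic}/u, "ru"],
  [/\p{Script=Devanagari}/u, "hi"],
];

function guessLanguageByScript(text, fallback = "en") {
  for (const [pattern, lang] of SCRIPT_RULES) {
    if (pattern.test(text)) return lang;
  }
  return fallback; // no distinctive script found
}

console.log(guessLanguageByScript("こんにちは世界")); // prints "ja"
console.log(guessLanguageByScript("Hello world")); // prints "en"
```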

Duration Estimation

Estimate playback duration before or after synthesis:

import { estimateSpeechDuration, getAudioDuration } from "voipi";

// Pre-synthesis: estimate from text (~150 WPM heuristic)
const seconds = estimateSpeechDuration("Hello world!", 1.0);

// Post-synthesis: parse actual audio buffer (WAV/AIFF exact, MP3 estimated)
const audio = await voice.toAudio("Hello world!"); // duration auto-populated
console.log(audio.duration); // seconds
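
The pre-synthesis estimate is described as a roughly 150 WPM heuristic. A minimal sketch of that arithmetic follows, assuming whitespace-separated words and a rate multiplier that scales speed linearly. `estimateSeconds` is a hypothetical stand-in, and the library's actual formula may differ, particularly for languages that do not separate words with spaces.

```javascript
// Sketch of a words-per-minute estimate: seconds = words * 60 / (WPM * rate).
function estimateSeconds(text, rate = 1.0, wordsPerMinute = 150) {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return (words * 60) / (wordsPerMinute * rate);
}

console.log(estimateSeconds("Hello world!")); // 2 words at 150 WPM, prints 0.8
console.log(estimateSeconds("Hello world!", 2.0)); // twice as fast, prints 0.4
```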

Cancellation

Pass an AbortSignal to cancel synthesis, playback, downloads, and subprocesses:

const ctrl = new AbortController();
setTimeout(() => ctrl.abort(), 500);
await voice.speak("This will be cut off…", { signal: ctrl.signal });
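
A common pattern built on this is a per-call timeout. The sketch below wraps any object with a `speak(text, { signal })` method. `speakWithTimeout` is a hypothetical helper, and it assumes the call rejects once the signal is aborted; the text above does not say whether VoiPi rejects or resolves in that case, and if it resolves, the helper simply reports "done".

```javascript
// Sketch: abort a speak() call if it runs longer than `ms` milliseconds.
async function speakWithTimeout(speaker, text, ms) {
  const ctrl = new AbortController();
  const timer = setTimeout(() => ctrl.abort(), ms);
  try {
    await speaker.speak(text, { signal: ctrl.signal });
    return "done";
  } catch (err) {
    if (ctrl.signal.aborted) return "aborted"; // our timeout fired
    throw err; // unrelated failure: let the caller see it
  } finally {
    clearTimeout(timer); // avoid keeping the process alive after an early finish
  }
}
```

With a VoiPi instance this would be called as `await speakWithTimeout(voice, "Long text…", 5000)`.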

Providers

macOS

Uses the native say command. Only available on macOS.

import { MacOS } from "voipi/macos";

const voice = new MacOS({ voice: "Samantha", rate: 1.2 });
await voice.speak("Hello world!");

// Override defaults per call
await voice.speak("Hello!", { voice: "Daniel", rate: 1.5 });
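
For readers curious what wrapping `say` involves, the sketch below builds an argument list for it. The `-v` (voice), `-r` (words per minute), and `-o` (output file) flags belong to the `say` command itself. Everything else is an assumption for illustration: `buildSayArgs` is hypothetical, and the 175 WPM baseline used to convert a rate multiplier is a guess rather than VoiPi's actual mapping.

```javascript
// Assumed baseline speaking rate; `say -r` expects words per minute.
const BASE_WPM = 175;

// Sketch: turn provider-style options into arguments for the macOS `say` command.
function buildSayArgs(text, { voice, rate = 1.0, output } = {}) {
  const args = [];
  if (voice) args.push("-v", voice);
  args.push("-r", String(Math.round(BASE_WPM * rate)));
  if (output) args.push("-o", output);
  args.push(text);
  return args;
}

console.log(buildSayArgs("Hello!", { voice: "Daniel", rate: 2 }));
// prints [ '-v', 'Daniel', '-r', '350', 'Hello!' ]
```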

Edge TTS

Cross-platform online TTS using Microsoft Edge's neural speech service. 322+ voices with configurable rate, pitch, and volume.

import { EdgeTTS } from "voipi/edge-tts";

const voice = new EdgeTTS({ voice: "en-US-AriaNeural" });
await voice.speak("Hello world!");

// List all available voices
const voices = await voice.listVoices();
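
Speech services that accept SSML typically express rate as a signed percentage relative to normal speed (for example `+50%`). As an illustration of how a numeric multiplier such as `1.5` could be translated, here is a small sketch. `toProsodyPercent` is hypothetical, and whether VoiPi performs this exact conversion internally is an assumption.

```javascript
// Sketch: convert a speed multiplier (1.0 = normal) into a signed percent string.
function toProsodyPercent(multiplier) {
  const delta = Math.round((multiplier - 1) * 100);
  return `${delta >= 0 ? "+" : ""}${delta}%`;
}

console.log(toProsodyPercent(1.5)); // prints "+50%"
console.log(toProsodyPercent(0.8)); // prints "-20%"
```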

Google TTS

Cross-platform online TTS using Google Translate's speech endpoint. 55+ languages, zero config.

import { GoogleTTS } from "voipi/google-tts";

const voice = new GoogleTTS({ voice: "en" });
await voice.speak("Hello world!");

// Different language
const fr = new GoogleTTS({ voice: "fr" });
await fr.speak("Bonjour le monde!");

Piper

Local neural TTS powered by Piper. 40+ languages, fully offline after first download. Uses an existing piper install if found in PATH, otherwise auto-installs a standalone binary (Linux x86_64/aarch64) or pip venv (macOS/Windows). Voice models (ONNX) are downloaded on demand from HuggingFace and cached locally.

import { Piper } from "voipi/piper";

const voice = new Piper();
await voice.speak("Hello world!");

// Custom voice, speed, and speaker
const voice2 = new Piper({ voice: "en_US-lessac-medium", lengthScale: 0.8, speaker: 0 });
await voice2.speak("Hello!");

// List all available voices
const voices = await voice.listVoices();
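
Piper's length scale works in the opposite direction to a speed multiplier: values below 1 shorten phonemes and speed speech up, values above 1 slow it down. If you think in terms of rate, a conversion along these lines may help. `lengthScaleForRate` is a hypothetical helper, and the simple reciprocal mapping is an assumption rather than something the library documents.

```javascript
// Sketch: map a speed multiplier (1.25 = 25% faster) to a Piper-style length scale.
function lengthScaleForRate(rate) {
  if (!(rate > 0)) throw new RangeError("rate must be a positive number");
  return 1 / rate;
}

console.log(lengthScaleForRate(1.25)); // prints 0.8
```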

eSpeak NG

Local TTS using the eSpeak NG speech synthesizer. Requires espeak-ng to be installed on the system (it is already available in some environments, such as KDE). Supports 100+ languages with formant-based synthesis.

Note: eSpeak NG produces robotic-sounding output. For natural-sounding voices, prefer Piper, which uses neural TTS.

import { EspeakNG } from "voipi/espeak-ng";

const voice = await EspeakNG.create();
await voice.speak("Hello world!");

// Custom voice and speed
const voice2 = await EspeakNG.create({ voice: "en-us+f3", rate: 1.2 });
await voice2.speak("Hello!");

// List all available voices
const voices = await voice.listVoices();

Browser TTS

Uses the Web Speech API (speechSynthesis). Works in browsers only — speaks directly without producing audio files.

import { BrowserTTS } from "voipi/browser";

const voice = new BrowserTTS();
await voice.speak("Hello world!");

// Pick a specific voice
await voice.speak("Hello!", { voice: "Google US English", rate: 1.2 });

// List available voices (varies by browser/OS)
const voices = await voice.listVoices();

Note: Browser TTS plays audio directly and does not support save() or raw audio export.
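
Because this provider only works where the Web Speech API exists, it can be useful to feature-detect before choosing it. The check below is a minimal sketch: `supportsSpeechSynthesis` is a hypothetical helper, while `speechSynthesis` and `SpeechSynthesisUtterance` are the standard Web Speech API globals.

```javascript
// Sketch: detect Web Speech API support on a given global object.
function supportsSpeechSynthesis(scope = globalThis) {
  return "speechSynthesis" in scope && typeof scope.SpeechSynthesisUtterance === "function";
}

// In a browser with speech support this logs true; in Node.js it logs false.
console.log(supportsSpeechSynthesis());
```

Passing the global object as a parameter keeps the check easy to exercise outside a browser.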

Pi Extension

VoiPi also ships with a pi package that adds TTS tools and commands to pi.

pi install git:github.com/pithings/voipi

See packages/pi/README.md for usage details.

License

Published under the MIT license 💛.