multi-voice-sdk

v1.1.1

A universal Text-to-Speech (TTS) and Speech-to-Text (STT) SDK supporting multiple providers (OpenAI, Google Gemini, Deepgram, Groq PlayAI, Cartesia, AssemblyAI) with audio merging capabilities

Multi-Voice SDK

A universal Text-to-Speech (TTS) and Speech-to-Text (STT) SDK that supports multiple providers including Google Gemini, Deepgram, OpenAI, Groq PlayAI, Cartesia, and AssemblyAI. Easily generate audio content, transcribe speech, and manage audio files with a unified API.

Features

  • 🎵 Multi-Provider TTS: Gemini, Deepgram, OpenAI, Groq PlayAI, and Cartesia TTS
  • 🎙️ Speech-to-Text: Deepgram and AssemblyAI STT with advanced features
  • 🔧 Audio Merging: Combine multiple audio files seamlessly
  • 🎯 Simple API: Easy-to-use functions with consistent interface
  • 📦 ESM Ready: Modern ES modules support

Installation

npm install multi-voice-sdk

Quick Start

import { tts, stt, merge } from "multi-voice-sdk";

// Generate speech with OpenAI
tts({
  provider: "openai",
  apiKey: "your-api-key",
  text: "Hello, world!",
  voice: "nova",
  outputFile: "output.mp3",
});

// Transcribe audio with Deepgram
stt({
  provider: "deepgram",
  apiKey: "your-deepgram-key",
  audioFile: "https://example.com/audio.wav", // Can be URL or local file
});

// Merge multiple audio files
merge({
  inputFiles: ["file1.mp3", "file2.mp3"],
  outputFile: "combined.mp3",
});

API Reference

tts(options)

Generate speech from text using various TTS providers.

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| provider | string | ✅ | TTS provider: "gemini", "deepgram", "openai", "groq", or "cartesia" |
| apiKey | string | ✅ | API key for the chosen provider |
| text | string | ✅ | Text to convert to speech |
| voice | string | ✅ | Voice identifier (provider-specific, for Cartesia use voice ID) |
| outputFile | string | optional | Output file path (default: "output.mp3") |
| model | string | optional | Model to use (provider-specific) |
| prompt | string | optional | Additional instructions for speech generation |
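The required fields in the table above can be checked before a request is sent. The following is a minimal sketch, assuming nothing about the SDK's own validation; `validateTtsOptions`, `TTS_PROVIDERS`, and `TTS_REQUIRED` are hypothetical names that are not part of multi-voice-sdk.

```javascript
// Hypothetical helper, not part of multi-voice-sdk: checks an options
// object against the tts() parameter table before it is passed to tts().
const TTS_PROVIDERS = ["gemini", "deepgram", "openai", "groq", "cartesia"];
const TTS_REQUIRED = ["provider", "apiKey", "text", "voice"];

function validateTtsOptions(options) {
  const problems = [];
  // Every required parameter in the table is a non-empty string.
  for (const key of TTS_REQUIRED) {
    const value = options[key];
    if (typeof value !== "string" || value.length === 0) {
      problems.push(`missing required string: ${key}`);
    }
  }
  // Only report an unknown provider when one was actually supplied.
  if (
    typeof options.provider === "string" &&
    options.provider.length > 0 &&
    !TTS_PROVIDERS.includes(options.provider)
  ) {
    problems.push(`unknown provider: ${options.provider}`);
  }
  return problems;
}

const problems = validateTtsOptions({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY ?? "",
  text: "Hello, world!",
  voice: "nova",
});
console.log(problems.length === 0 ? "options look complete" : problems);
```

An empty array means the object has every required field and a provider from the list; the same object can then be handed to tts().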

Examples

OpenAI TTS

tts({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY,
  model: "gpt-4o-mini-tts",
  text: "Hello from OpenAI!",
  voice: "nova",
  prompt: "Speak in a cheerful tone",
  outputFile: "openai_output.mp3",
});

Google Gemini TTS

tts({
  provider: "gemini",
  apiKey: process.env.GEMINI_API_KEY,
  text: "Hello from Gemini!",
  voice: "iapetus",
  prompt: "In a pleasant and calm tone",
  outputFile: "gemini_output.mp3",
});

Deepgram TTS

tts({
  provider: "deepgram",
  apiKey: process.env.DEEPGRAM_API_KEY,
  text: "Hello from Deepgram!",
  voice: "aura-2-luna-en",
  outputFile: "deepgram_output.mp3",
});

Groq PlayAI TTS

tts({
  provider: "groq",
  apiKey: process.env.GROQ_API_KEY,
  text: "Hello from Groq PlayAI!",
  voice: "Arista-PlayAI",
  outputFile: "groq_output.wav",
});

Cartesia TTS

tts({
  provider: "cartesia",
  apiKey: process.env.CARTESIA_API_KEY,
  text: "Hello from Cartesia!",
  voice: "694f9389-aac1-45b6-b726-9d9369183238", // Voice ID
  outputFile: "cartesia_output.mp3",
});

stt(options)

Transcribe audio to text using Speech-to-Text providers.

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| provider | string | ✅ | STT provider: "deepgram" or "assemblyai" |
| apiKey | string | ✅ | API key for the chosen provider |
| audioFile | string | ✅ | Path to local audio file or URL of remote audio file to transcribe |
| outputFile | string | optional | Output file path for results (default: "transcription.json") |
| model | string | optional | Model to use (default: "nova-3") |
| smartFormat | boolean | optional | Enable smart formatting (default: true) |
| detect_language | boolean | optional | Automatic language detection (default: true) |
| punctuate | boolean | optional | Enable punctuation (default: true) |
| diarize | boolean | optional | Enable speaker diarization (default: false) |
| channels | number | optional | Number of audio channels (default: 1) |
| fullResponse | boolean | optional | Return full response object instead of just transcript (default: false) |

Returns

  • Default: returns the transcript as a string
  • With fullResponse: true: returns an object with transcript, confidence, words, and metadata
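Because the result is either a string or an object depending on fullResponse, calling code may want one place that handles both. This is a minimal sketch, assuming the object form exposes the text under a `transcript` field as the list above suggests; `transcriptOf` is a hypothetical helper, not part of multi-voice-sdk, and the exact shape of the full response is not documented here.

```javascript
// Hypothetical helper, not part of multi-voice-sdk. The field name
// `transcript` comes from the Returns list; the rest of the full
// response shape is an assumption.
function transcriptOf(result) {
  // Default mode: the result is already the transcript string.
  if (typeof result === "string") return result;
  // fullResponse: true, assumed to carry the text in `transcript`.
  if (result && typeof result.transcript === "string") {
    return result.transcript;
  }
  throw new TypeError("unrecognized stt() result");
}

console.log(transcriptOf("hello world"));
console.log(transcriptOf({ transcript: "hello world", confidence: 0.98 }));
```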

Examples

Deepgram: Basic Transcription (Remote URL)

stt({
  provider: "deepgram",
  apiKey: process.env.DEEPGRAM_API_KEY,
  audioFile: "https://example.com/audio.wav", // Remote URL
});

Deepgram: Local File Transcription

stt({
  provider: "deepgram",
  apiKey: process.env.DEEPGRAM_API_KEY,
  audioFile: "./my-audio.mp3", // Local file path
  outputFile: "transcription.json",
});

AssemblyAI: Basic Transcription (Remote URL)

stt({
  provider: "assemblyai",
  apiKey: process.env.ASSEMBLYAI_API_KEY,
  audioFile: "https://example.com/audio.wav", // Remote URL
  outputFile: "transcription.json",
});

AssemblyAI: Local File Transcription

stt({
  provider: "assemblyai",
  apiKey: process.env.ASSEMBLYAI_API_KEY,
  audioFile: "./my-audio.mp3", // Local file path
  outputFile: "transcription.json",
  fullResponse: true, // Get detailed response
});

merge(options)

Merge multiple audio files into a single file.

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| inputFiles | string[] | ✅ | Array of input file paths |
| outputFile | string | ✅ | Output file path |

Example

merge({
  inputFiles: ["intro.mp3", "main.mp3", "outro.mp3"],
  outputFile: "complete_audio.mp3",
});
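A common use of merge() is stitching together segments that were each produced by tts(). The sketch below only prepares the option objects for both steps and does not call the SDK; `planNarration` and the `segment_NN.mp3` naming scheme are hypothetical and not part of multi-voice-sdk.

```javascript
// Hypothetical helper, not part of multi-voice-sdk: turns a list of text
// segments into tts() option objects plus one matching merge() options object.
function planNarration(texts, shared, outputFile) {
  const ttsJobs = texts.map((text, i) => ({
    ...shared,
    text,
    // Zero-padded names keep the segments in order when listed on disk.
    outputFile: `segment_${String(i + 1).padStart(2, "0")}.mp3`,
  }));
  return {
    ttsJobs,
    mergeOptions: {
      inputFiles: ttsJobs.map((job) => job.outputFile),
      outputFile,
    },
  };
}

const plan = planNarration(
  ["Welcome to the show.", "Here is the main story.", "Thanks for listening."],
  { provider: "openai", apiKey: "your-api-key", voice: "nova" },
  "complete_audio.mp3"
);
console.log(plan.mergeOptions.inputFiles);
```

Each entry in `plan.ttsJobs` would be passed to tts(), and `plan.mergeOptions` to merge() once all segment files exist. This README does not say whether tts() returns a promise, so how to wait for the files is left to the caller.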

Supported Voices

OpenAI

  • alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse

Gemini

  • zephyr (Bright), puck (Upbeat), charon (Informative), kore (Firm), fenrir (Excitable), leda (Youthful), orus (Firm), aoede (Breezy), autonoe (Bright), enceladus (Breathy), iapetus (Clear)

For a complete list of available Gemini voices, see: Gemini Speech Generation Documentation

Deepgram

  • aura-2-luna-en, aura-2-stella-en, aura-2-arcas-en, and more

For a complete list of available Deepgram voices, see: Deepgram TTS Models Documentation

Groq PlayAI

  • Atlas-PlayAI, Arista-PlayAI, Basil-PlayAI, Briggs-PlayAI, and more

For a complete list of available Groq PlayAI voices, see: Groq TTS Documentation

Cartesia

Cartesia uses voice IDs instead of voice names. Example voice IDs:

  • 694f9389-aac1-45b6-b726-9d9369183238 (Default voice)
  • Use the Cartesia console to find available voice IDs for your account

For more information about Cartesia voices, see: Cartesia Console
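Since Cartesia expects an ID where other providers expect a name, a quick shape check can catch a voice name passed by mistake. This is a rough sketch that assumes Cartesia voice IDs are always UUID-shaped like the example above, which this README does not confirm; `looksLikeVoiceId` and `UUID_SHAPE` are hypothetical names.

```javascript
// Hypothetical check, not part of multi-voice-sdk. Assumes Cartesia voice
// IDs are UUID-shaped like the example ID in this README.
const UUID_SHAPE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function looksLikeVoiceId(voice) {
  return typeof voice === "string" && UUID_SHAPE.test(voice);
}

console.log(looksLikeVoiceId("694f9389-aac1-45b6-b726-9d9369183238")); // true
console.log(looksLikeVoiceId("nova")); // false
```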

Environment Variables

Create a .env file in your project root:

OPENAI_API_KEY=your_openai_api_key
GEMINI_API_KEY=your_gemini_api_key
DEEPGRAM_API_KEY=your_deepgram_api_key
GROQ_API_KEY=your_groq_api_key
CARTESIA_API_KEY=your_cartesia_api_key
ASSEMBLYAI_API_KEY=your_assemblyai_api_key
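With keys kept in environment variables, the right key can be looked up by provider name. This is a minimal sketch, assuming the variables are already present in `process.env` (this README does not say how the .env file gets loaded); `apiKeyFor` and `ENV_VAR_BY_PROVIDER` are hypothetical and not part of multi-voice-sdk, and the variable names follow the examples in this README.

```javascript
// Hypothetical helper, not part of multi-voice-sdk. Variable names follow
// the process.env usage shown in this README's examples.
const ENV_VAR_BY_PROVIDER = new Map([
  ["openai", "OPENAI_API_KEY"],
  ["gemini", "GEMINI_API_KEY"],
  ["deepgram", "DEEPGRAM_API_KEY"],
  ["groq", "GROQ_API_KEY"],
  ["cartesia", "CARTESIA_API_KEY"],
  ["assemblyai", "ASSEMBLYAI_API_KEY"],
]);

function apiKeyFor(provider, env = process.env) {
  const name = ENV_VAR_BY_PROVIDER.get(provider);
  if (!name) throw new Error(`unknown provider: ${provider}`);
  const key = env[name];
  // Fail early with the variable name so a missing key is easy to spot.
  if (!key) throw new Error(`${name} is not set`);
  return key;
}

// Passing a plain object instead of process.env keeps the example self-contained.
console.log(apiKeyFor("deepgram", { DEEPGRAM_API_KEY: "your_deepgram_api_key" }));
```

In application code the second argument can be omitted, so a call such as `apiKeyFor("openai")` reads from `process.env` and throws if the variable is missing.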

Requirements

  • Node.js 16.x or higher

License

ISC