multi-voice-sdk
v1.1.1
Published
A universal Text-to-Speech (TTS) and Speech-to-Text (STT) SDK supporting multiple providers (OpenAI, Google Gemini, Deepgram, Groq PlayAI, Cartesia, AssemblyAI) with audio merging capabilities
Maintainers
Readme
Multi-Voice SDK
A universal Text-to-Speech (TTS) and Speech-to-Text (STT) SDK that supports multiple providers including Google Gemini, Deepgram, OpenAI, Groq PlayAI, Cartesia, and AssemblyAI. Easily generate audio content, transcribe speech, and manage audio files with a unified API.
Features
- 🎵 Multi-Provider TTS: Gemini, Deepgram, OpenAI, Groq PlayAI, and Cartesia TTS
- 🎙️ Speech-to-Text: Deepgram and AssemblyAI STT with advanced features
- 🔧 Audio Merging: Combine multiple audio files seamlessly
- 🎯 Simple API: Easy-to-use functions with consistent interface
- 📦 ESM Ready: Modern ES modules support
Installation
npm install multi-voice-sdkQuick Start
import { tts, stt, merge } from "multi-voice-sdk";
// Generate speech with OpenAI
tts({
provider: "openai",
apiKey: "your-api-key",
text: "Hello, world!",
voice: "nova",
outputFile: "output.mp3",
});
// Transcribe audio with Deepgram
stt({
apiKey: "your-deepgram-key",
audioFile: "https://example.com/audio.wav", // Can be URL or local file
});
// Merge multiple audio files
merge({
inputFiles: ["file1.mp3", "file2.mp3"],
outputFile: "combined.mp3",
});API Reference
tts(options)
Generate speech from text using various TTS providers.
Parameters
| Parameter | Type | Required | Description |
| ------------ | -------- | -------- | ----------------------------------------------------------------------------- |
| provider | string | ✅ | TTS provider: "gemini", "deepgram", "openai", "groq", or "cartesia" |
| apiKey | string | ✅ | API key for the chosen provider |
| text | string | ✅ | Text to convert to speech |
| voice | string | ✅ | Voice identifier (provider-specific, for Cartesia use voice ID) |
| outputFile | string | optional | Output file path (default: "output.mp3") |
| model | string | optional | Model to use (provider-specific) |
| prompt | string | optional | Additional instructions for speech generation |
Examples
OpenAI TTS
tts({
provider: "openai",
apiKey: process.env.OPENAI_API_KEY,
model: "gpt-4o-mini-tts",
text: "Hello from OpenAI!",
voice: "nova",
prompt: "Speak in a cheerful tone",
outputFile: "openai_output.mp3",
});Google Gemini TTS
tts({
provider: "gemini",
apiKey: process.env.GEMINI_API_KEY,
text: "Hello from Gemini!",
voice: "iapetus",
prompt: "In a pleasant and calm tone",
outputFile: "gemini_output.mp3",
});Deepgram TTS
tts({
provider: "deepgram",
apiKey: process.env.DEEPGRAM_API_KEY,
text: "Hello from Deepgram!",
voice: "aura-2-luna-en",
outputFile: "deepgram_output.mp3",
});Groq PlayAI TTS
tts({
provider: "groq",
apiKey: process.env.GROQ_API_KEY,
text: "Hello from Groq PlayAI!",
voice: "Arista-PlayAI",
outputFile: "groq_output.wav",
});Cartesia TTS
tts({
provider: "cartesia",
apiKey: process.env.CARTESIA_API_KEY,
text: "Hello from Cartesia!",
voice: "694f9389-aac1-45b6-b726-9d9369183238", // Voice ID
outputFile: "cartesia_output.mp3",
});stt(options)
Transcribe audio to text using Speech-to-Text providers.
Parameters
| Parameter | Type | Required | Description |
| ----------------- | --------- | -------- | ------------------------------------------------------------------------- |
| provider | string | ✅ | STT provider: "deepgram" or "assemblyai" |
| apiKey | string | ✅ | API key for the chosen provider |
| audioFile | string | ✅ | Path to local audio file or URL of remote audio file to transcribe |
| outputFile | string | optional | Output file path for results (default: "transcription.json") |
| model | string | optional | Model to use (default: "nova-3") |
| smartFormat | boolean | optional | Enable smart formatting (default: true) |
| detect_language | boolean | optional | Automatic language detection (default: true) |
| punctuate | boolean | optional | Enable punctuation (default: true) |
| diarize | boolean | optional | Enable speaker diarization (default: false) |
| channels | number | optional | Number of audio channels (default: 1) |
| fullResponse | boolean | optional | Return full response object instead of just transcript (default: false) |
Returns
- Default: Returns transcript as a string
- With
fullResponse: true: Returns object with transcript, confidence, words, and metadata
Examples
Deepgram : Basic Transcription (Remote URL)
stt({
provider: "deepgram",
apiKey: process.env.DEEPGRAM_API_KEY,
audioFile: "https://example.com/audio.wav", // Remote URL
});Deepgram : Local File Transcription
stt({
provider: "deepgram",
apiKey: process.env.DEEPGRAM_API_KEY,
audioFile: "./my-audio.mp3", // Local file path
outputFile: "transcription.json",
});AssemblyAI : Basic Transcription (Remote URL)
stt({
provider: "assemblyai",
apiKey: process.env.ASSEMBLYAI_API_KEY,
audioFile: "https://example.com/audio.wav", // Remote URL
outputFile: "transcription.json",
});AssemblyAI : Local File Transcription
stt({
provider: "assemblyai",
apiKey: process.env.ASSEMBLYAI_API_KEY,
audioFile: "./my-audio.mp3", // Local file path
outputFile: "transcription.json",
fullResponse: true, // Get detailed response
});merge(options)
Merge multiple audio files into a single file.
Parameters
| Parameter | Type | Required | Description |
| ------------ | ---------- | -------- | ------------------------- |
| inputFiles | string[] | ✅ | Array of input file paths |
| outputFile | string | ✅ | Output file path |
Example
merge({
inputFiles: ["intro.mp3", "main.mp3", "outro.mp3"],
outputFile: "complete_audio.mp3",
});Supported Voices
OpenAI
alloy,ash,ballad,coral,echo,fable,onyx,nova,sage,shimmer,verse
Gemini
zephyr(Bright),puck(Upbeat),charon(Informative),kore(Firm),fenrir(Excitable),leda(Youthful),orus(Firm),aoede(Breezy),autonoe(Bright),enceladus(Breathy),iapetus(Clear)
For a complete list of available Gemini voices, see: Gemini Speech Generation Documentation
Deepgram
aura-2-luna-en,aura-2-stella-en,aura-2-arcas-en, and more
For a complete list of available Deepgram voices, see: Deepgram TTS Models Documentation
Groq PlayAI
Atlas-PlayAI,Arista-PlayAI,Basil-PlayAI,Briggs-PlayAI, and more
For a complete list of available Groq PlayAI voices, see: Groq TTS Documentation
Cartesia
Cartesia uses voice IDs instead of voice names. Example voice IDs:
694f9389-aac1-45b6-b726-9d9369183238(Default voice)- Use the Cartesia console to find available voice IDs for your account
For more information about Cartesia voices, see: Cartesia Console
Environment Variables
Create a .env file in your project root:
OPENAI_API_KEY=your_openai_api_key
GEMINI_API_KEY=your_gemini_api_key
DEEPGRAM_API_KEY=your_deepgram_api_key
GROQ_API_KEY=your_groq_api_key
CARTESIA_API_KEY=your_cartesia_api_keyRequirements
- Node.js 16.x or higher
License
ISC
