@zyphra/client
v1.0.5
Zyphra TypeScript Client
A TypeScript client library for interacting with Zyphra's text-to-speech API.
Installation
npm install @zyphra/client
# or
yarn add @zyphra/client

Quick Start
import { ZyphraClient } from '@zyphra/client';
// Initialize the client
const client = new ZyphraClient({ apiKey: 'your-api-key' });
// Generate speech
const audioBlob = await client.audio.speech.create({
text: 'Hello, world!',
speaking_rate: 15,
model: 'zonos-v0.1-transformer' // Default model
});
// Save to file (browser)
const url = URL.createObjectURL(audioBlob);
const a = document.createElement('a');
a.href = url;
a.download = 'output.webm';
a.click();
URL.revokeObjectURL(url);

Features
- Text-to-speech generation with customizable parameters
- Support for multiple languages and audio formats
- Voice cloning capabilities
- Multiple TTS models with specialized capabilities
- TypeScript types included
- Browser and Node.js support
- Returns audio as Blob for easy handling
- Support for default and custom voice selection
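Because the library returns audio as a Blob in both environments, saving to disk in Node.js (18+, where Blob is a global) can be sketched as follows. The helper name saveBlob is our own, not part of the client API:

```typescript
import { writeFile } from 'node:fs/promises';

// Convert a Blob (as returned by client.audio.speech.create) to a Buffer
// and write it to disk. Requires Node.js 18+, where Blob is a global.
async function saveBlob(blob: Blob, path: string): Promise<void> {
  const buffer = Buffer.from(await blob.arrayBuffer());
  await writeFile(path, buffer);
}

// Usage: await saveBlob(audioBlob, 'output.webm');
```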
Parameters
The text-to-speech API accepts the following parameters:
interface TTSParams {
text: string; // The text to convert to speech (required)
speaker_audio?: string; // Base64 audio for voice cloning
speaking_rate?: number; // Speaking rate (5-35, default: 15.0)
fmax?: number; // Frequency max (0-24000, default: 22050)
pitch_std?: number; // Pitch standard deviation (0-500, default: 45.0) (transformer model only)
emotion?: EmotionWeights; // Emotional weights (transformer model only)
language_iso_code?: string; // Language code (e.g., "en-us", "fr-fr")
mime_type?: string; // Output audio format (e.g., "audio/webm")
model?: SupportedModel; // TTS model (default: 'zonos-v0.1-transformer')
speaker_noised?: boolean; // Denoises to improve stability (hybrid model only, default: true)
default_voice_name?: string; // Name of a default voice to use
voice_name?: string; // Name of one of the user's voices to use
}
// Available models
type SupportedModel = 'zonos-v0.1-transformer' | 'zonos-v0.1-hybrid';
interface EmotionWeights {
happiness: number; // default: 0.6
sadness: number; // default: 0.05
disgust: number; // default: 0.05
fear: number; // default: 0.05
surprise: number; // default: 0.05
anger: number; // default: 0.05
other: number; // default: 0.5
neutral: number; // default: 0.6
}

Detailed Usage
Supported TTS Models
The API supports the following TTS models:
- zonos-v0.1-transformer (default): A standard transformer-based TTS model suitable for most applications.
  - Supports the pitch_std and emotion parameters
- zonos-v0.1-hybrid: An advanced model with:
  - Better support for certain languages (especially Japanese)
  - Support for the speaker_noised denoising parameter
  - Improved voice quality in some scenarios
Supported Languages
The text-to-speech API supports the following languages:
- English (US) - en-us
- French - fr-fr
- German - de
- Japanese - ja (recommended to use with the zonos-v0.1-hybrid model)
- Korean - ko
- Mandarin Chinese - cmn
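The per-language recommendation above (the hybrid model for Japanese) can be captured in a small helper. The function recommendedModel is a hypothetical convenience, not part of the client API:

```typescript
type SupportedModel = 'zonos-v0.1-transformer' | 'zonos-v0.1-hybrid';

// Pick a model based on the language notes above: the hybrid model is
// recommended for Japanese; the transformer model is the general default.
function recommendedModel(languageIsoCode: string): SupportedModel {
  return languageIsoCode === 'ja'
    ? 'zonos-v0.1-hybrid'
    : 'zonos-v0.1-transformer';
}
```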
Supported Audio Formats
The API supports multiple output formats through the mime_type parameter:
- WebM (default) - audio/webm
- Ogg - audio/ogg
- WAV - audio/wav
- MP3 - audio/mp3 or audio/mpeg
- MP4/AAC - audio/mp4 or audio/aac
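When saving the returned Blob, it helps to derive a file extension from the mime_type you requested. The mapping below is illustrative, not part of the library:

```typescript
// Map each supported mime_type to a sensible file extension for saving.
// This mapping is our own convenience, not part of @zyphra/client.
const EXTENSION_BY_MIME: Record<string, string> = {
  'audio/webm': 'webm',
  'audio/ogg': 'ogg',
  'audio/wav': 'wav',
  'audio/mp3': 'mp3',
  'audio/mpeg': 'mp3',
  'audio/mp4': 'm4a',
  'audio/aac': 'aac',
};

function fileExtensionFor(mimeType: string): string {
  return EXTENSION_BY_MIME[mimeType] ?? 'webm'; // WebM is the API default
}
```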
Language and Format Examples
// Generate French speech in MP3 format
const frenchAudio = await client.audio.speech.create({
text: 'Bonjour le monde!',
language_iso_code: 'fr-fr',
mime_type: 'audio/mp3',
speaking_rate: 15
});
// Generate Japanese speech with hybrid model (recommended)
const japaneseAudio = await client.audio.speech.create({
text: 'こんにちは世界!',
language_iso_code: 'ja',
mime_type: 'audio/wav',
speaking_rate: 15,
model: 'zonos-v0.1-hybrid' // Better for Japanese
});

Using Default and Custom Voices
You can use pre-defined default voices or your own custom voices:
// Using a default voice
const defaultVoiceAudio = await client.audio.speech.create({
text: 'This uses a default voice.',
default_voice_name: 'american_female',
speaking_rate: 15
});

Available Default Voices
The following default voices are available:
- american_female - Standard American English female voice
- american_male - Standard American English male voice
- anime_girl - Stylized anime girl character voice
- british_female - British English female voice
- british_male - British English male voice
- energetic_boy - Energetic young male voice
- energetic_girl - Energetic young female voice
- japanese_female - Japanese female voice
- japanese_male - Japanese male voice
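To catch typos in default voice names at compile time, the list above can be collected into a const array and union type. This typing is our own convenience and is not part of the published @zyphra/client typings:

```typescript
// The default voice names above, as a const array so misspellings can be
// caught at compile time. Not part of the published client typings.
const DEFAULT_VOICES = [
  'american_female', 'american_male', 'anime_girl',
  'british_female', 'british_male', 'energetic_boy',
  'energetic_girl', 'japanese_female', 'japanese_male',
] as const;

type DefaultVoiceName = typeof DEFAULT_VOICES[number];

// Runtime check with a type-narrowing guard
function isDefaultVoice(name: string): name is DefaultVoiceName {
  return (DEFAULT_VOICES as readonly string[]).includes(name);
}
```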
Using Custom Voices
You can use your own custom voices that have been created and stored in your account:
// Using a custom voice you've created and stored
const customVoiceAudio = await client.audio.speech.create({
text: 'This uses your custom voice.',
voice_name: 'my_custom_voice',
speaking_rate: 15
});

Note: When using custom voices, the voice_name parameter should exactly match the name as it appears in your voices list on playground.zyphra.com/audio. The name is case-sensitive.
Model-Specific Parameters
For the hybrid model (zonos-v0.1-hybrid), you can utilize additional parameters:
// Using the hybrid model with its specific parameters
const hybridModelAudio = await client.audio.speech.create({
text: 'This uses the hybrid model with special parameters.',
model: 'zonos-v0.1-hybrid',
speaker_noised: true, // Denoises to improve stability
speaking_rate: 15
});

Emotion Control
You can adjust the emotional tone of the speech:
const emotionalSpeech = await client.audio.speech.create({
text: 'This is a happy message!',
emotion: {
happiness: 0.8, // Increase happiness
neutral: 0.3, // Decrease neutrality
sadness: 0.05, // Keep other emotions at default values
disgust: 0.05,
fear: 0.05,
surprise: 0.05,
anger: 0.05,
other: 0.5
}
});

Voice Cloning
You can clone voices by providing a reference audio file as a base64 string:
// Node.js environment
const fs = require('fs');
const audio_base64 = fs.readFileSync('reference_voice.wav').toString('base64');
const audioBlob = await client.audio.speech.create({
text: 'This will use the cloned voice',
speaker_audio: audio_base64,
speaking_rate: 15
});
// Browser environment
const fileInput = document.querySelector('input[type="file"]');
const file = fileInput.files[0];
const reader = new FileReader();
reader.onload = async () => {
const base64 = (reader.result as string).split(',')[1];
const audioBlob = await client.audio.speech.create({
text: 'This will use the cloned voice',
speaker_audio: base64,
speaking_rate: 15
});
};
reader.readAsDataURL(file);

Streaming Support
For streaming audio directly:
const { stream, mimeType } = await client.audio.speech.createStream({
text: 'This will be streamed to the client',
speaking_rate: 15,
model: 'zonos-v0.1-transformer'
});
// Collect the streamed chunks as they arrive
const reader = stream.getReader();
const chunks: Uint8Array[] = [];
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  chunks.push(value);
}
// Build a single Blob from all chunks and play it in the browser
const audioElement = document.createElement('audio');
audioElement.src = URL.createObjectURL(new Blob(chunks, { type: mimeType }));
audioElement.controls = true;
document.body.appendChild(audioElement);

Callback Options
You can also use callbacks to track progress during audio generation:
const audioBlob = await client.audio.speech.create(
{
text: 'Audio with progress tracking',
speaking_rate: 15,
model: 'zonos-v0.1-transformer'
},
{
onChunk: (chunk) => {
console.log('Received chunk:', chunk.length, 'bytes');
},
onProgress: (totalBytes) => {
console.log('Total bytes received:', totalBytes);
},
onComplete: (blob) => {
console.log('Audio generation complete!', blob.size, 'bytes');
}
}
);

Error Handling
import { ZyphraError } from '@zyphra/client';
try {
const audioBlob = await client.audio.speech.create({
text: 'Hello, world!',
speaking_rate: 15,
model: 'zonos-v0.1-transformer'
});
} catch (error) {
if (error instanceof ZyphraError) {
console.error(`Error: ${error.statusCode} - ${error.response}`);
}
}

Available Models
Speech Models
- zonos-v0.1-transformer: Default transformer-based TTS model
- zonos-v0.1-hybrid: Advanced hybrid TTS model with enhanced language support
License
MIT License
