@zyphra/client
v1.0.5
Zyphra TypeScript Client
A TypeScript client library for interacting with Zyphra's text-to-speech API.
Installation
npm install @zyphra/client
# or
yarn add @zyphra/client

Quick Start
import { ZyphraClient } from '@zyphra/client';
// Initialize the client
const client = new ZyphraClient({ apiKey: 'your-api-key' });
// Generate speech
const audioBlob = await client.audio.speech.create({
text: 'Hello, world!',
speaking_rate: 15,
model: 'zonos-v0.1-transformer' // Default model
});
// Save to file (browser)
const url = URL.createObjectURL(audioBlob);
const a = document.createElement('a');
a.href = url;
a.download = 'output.webm';
a.click();
URL.revokeObjectURL(url);

Features
- Text-to-speech generation with customizable parameters
- Support for multiple languages and audio formats
- Voice cloning capabilities
- Multiple TTS models with specialized capabilities
- TypeScript types included
- Browser and Node.js support
- Returns audio as Blob for easy handling
- Support for default and custom voice selection
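Because the library returns audio as a Blob in both environments, saving to disk in Node.js (18+, where Blob is a global) can be sketched as follows. The helper name saveBlob is our own, not part of the client API:

```typescript
import { writeFile } from 'node:fs/promises';

// Convert a Blob (as returned by client.audio.speech.create) to a Buffer
// and write it to disk. Requires Node.js 18+, where Blob is a global.
async function saveBlob(blob: Blob, path: string): Promise<void> {
  const buffer = Buffer.from(await blob.arrayBuffer());
  await writeFile(path, buffer);
}

// Usage: await saveBlob(audioBlob, 'output.webm');
```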
Parameters
The text-to-speech API accepts the following parameters:
interface TTSParams {
text: string; // The text to convert to speech (required)
speaker_audio?: string; // Base64 audio for voice cloning
speaking_rate?: number; // Speaking rate (5-35, default: 15.0)
fmax?: number; // Frequency max (0-24000, default: 22050)
pitch_std?: number; // Pitch standard deviation (0-500, default: 45.0) (transformer model only)
emotion?: EmotionWeights; // Emotional weights (transformer model only)
language_iso_code?: string; // Language code (e.g., "en-us", "fr-fr")
mime_type?: string; // Output audio format (e.g., "audio/webm")
model?: SupportedModel; // TTS model (default: 'zonos-v0.1-transformer')
speaker_noised?: boolean; // Denoises to improve stability (hybrid model only, default: true)
default_voice_name?: string; // Name of a default voice to use
voice_name?: string; // Name of one of the user's voices to use
}
// Available models
type SupportedModel = 'zonos-v0.1-transformer' | 'zonos-v0.1-hybrid';
interface EmotionWeights {
happiness: number; // default: 0.6
sadness: number; // default: 0.05
disgust: number; // default: 0.05
fear: number; // default: 0.05
surprise: number; // default: 0.05
anger: number; // default: 0.05
other: number; // default: 0.5
neutral: number; // default: 0.6
}

Detailed Usage
Supported TTS Models
The API supports the following TTS models:
- zonos-v0.1-transformer (default): A standard transformer-based TTS model suitable for most applications.
  - Supports the pitch_std and emotion parameters
- zonos-v0.1-hybrid: An advanced model with:
  - Better support for certain languages (especially Japanese)
  - Support for the speaker_noised denoising parameter
  - Improved voice quality in some scenarios
Supported Languages
The text-to-speech API supports the following languages:
- English (US) - en-us
- French - fr-fr
- German - de
- Japanese - ja (recommended to use with the zonos-v0.1-hybrid model)
- Korean - ko
- Mandarin Chinese - cmn
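The per-language recommendation above (the hybrid model for Japanese) can be captured in a small helper. The function recommendedModel is a hypothetical convenience, not part of the client API:

```typescript
type SupportedModel = 'zonos-v0.1-transformer' | 'zonos-v0.1-hybrid';

// Pick a model based on the language notes above: the hybrid model is
// recommended for Japanese; the transformer model is the general default.
function recommendedModel(languageIsoCode: string): SupportedModel {
  return languageIsoCode === 'ja'
    ? 'zonos-v0.1-hybrid'
    : 'zonos-v0.1-transformer';
}
```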
Supported Audio Formats
The API supports multiple output formats through the mime_type parameter:
- WebM (default) - audio/webm
- Ogg - audio/ogg
- WAV - audio/wav
- MP3 - audio/mp3 or audio/mpeg
- MP4/AAC - audio/mp4 or audio/aac
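When saving the returned Blob, it helps to derive a file extension from the mime_type you requested. The mapping below is illustrative, not part of the library:

```typescript
// Map each supported mime_type to a sensible file extension for saving.
// This mapping is our own convenience, not part of @zyphra/client.
const EXTENSION_BY_MIME: Record<string, string> = {
  'audio/webm': 'webm',
  'audio/ogg': 'ogg',
  'audio/wav': 'wav',
  'audio/mp3': 'mp3',
  'audio/mpeg': 'mp3',
  'audio/mp4': 'm4a',
  'audio/aac': 'aac',
};

function fileExtensionFor(mimeType: string): string {
  return EXTENSION_BY_MIME[mimeType] ?? 'webm'; // WebM is the API default
}
```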
Language and Format Examples
// Generate French speech in MP3 format
const frenchAudio = await client.audio.speech.create({
text: 'Bonjour le monde!',
language_iso_code: 'fr-fr',
mime_type: 'audio/mp3',
speaking_rate: 15
});
// Generate Japanese speech with hybrid model (recommended)
const japaneseAudio = await client.audio.speech.create({
text: 'こんにちは世界!',
language_iso_code: 'ja',
mime_type: 'audio/wav',
speaking_rate: 15,
model: 'zonos-v0.1-hybrid' // Better for Japanese
});

Using Default and Custom Voices
You can use pre-defined default voices or your own custom voices:
// Using a default voice
const defaultVoiceAudio = await client.audio.speech.create({
text: 'This uses a default voice.',
default_voice_name: 'american_female',
speaking_rate: 15
});

Available Default Voices
The following default voices are available:
- american_female - Standard American English female voice
- american_male - Standard American English male voice
- anime_girl - Stylized anime girl character voice
- british_female - British English female voice
- british_male - British English male voice
- energetic_boy - Energetic young male voice
- energetic_girl - Energetic young female voice
- japanese_female - Japanese female voice
- japanese_male - Japanese male voice
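To catch typos in default voice names at compile time, the list above can be collected into a const array and union type. This typing is our own convenience and is not part of the published @zyphra/client typings:

```typescript
// The default voice names above, as a const array so misspellings can be
// caught at compile time. Not part of the published client typings.
const DEFAULT_VOICES = [
  'american_female', 'american_male', 'anime_girl',
  'british_female', 'british_male', 'energetic_boy',
  'energetic_girl', 'japanese_female', 'japanese_male',
] as const;

type DefaultVoiceName = typeof DEFAULT_VOICES[number];

// Runtime check with a type-narrowing guard
function isDefaultVoice(name: string): name is DefaultVoiceName {
  return (DEFAULT_VOICES as readonly string[]).includes(name);
}
```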
Using Custom Voices
You can use your own custom voices that have been created and stored in your account:
// Using a custom voice you've created and stored
const customVoiceAudio = await client.audio.speech.create({
text: 'This uses your custom voice.',
voice_name: 'my_custom_voice',
speaking_rate: 15
});

Note: When using custom voices, the voice_name parameter should exactly match the name as it appears in your voices list on playground.zyphra.com/audio. The name is case-sensitive.
Model-Specific Parameters
For the hybrid model (zonos-v0.1-hybrid), you can utilize additional parameters:
// Using the hybrid model with its specific parameters
const hybridModelAudio = await client.audio.speech.create({
text: 'This uses the hybrid model with special parameters.',
model: 'zonos-v0.1-hybrid',
speaker_noised: true, // Denoises to improve stability
speaking_rate: 15
});

Emotion Control
You can adjust the emotional tone of the speech:
const emotionalSpeech = await client.audio.speech.create({
text: 'This is a happy message!',
emotion: {
happiness: 0.8, // Increase happiness
neutral: 0.3, // Decrease neutrality
sadness: 0.05, // Keep other emotions at default values
disgust: 0.05,
fear: 0.05,
surprise: 0.05,
anger: 0.05,
other: 0.5
}
});

Voice Cloning
You can clone voices by providing a reference audio file as a base64 string:
// Node.js environment
const fs = require('fs');
const audio_base64 = fs.readFileSync('reference_voice.wav').toString('base64');
const audioBlob = await client.audio.speech.create({
text: 'This will use the cloned voice',
speaker_audio: audio_base64,
speaking_rate: 15
});
// Browser environment
const fileInput = document.querySelector('input[type="file"]');
const file = fileInput.files[0];
const reader = new FileReader();
reader.onload = async () => {
const base64 = (reader.result as string).split(',')[1];
const audioBlob = await client.audio.speech.create({
text: 'This will use the cloned voice',
speaker_audio: base64,
speaking_rate: 15
});
};
reader.readAsDataURL(file);

Streaming Support
For streaming audio directly:
const { stream, mimeType } = await client.audio.speech.createStream({
text: 'This will be streamed to the client',
speaking_rate: 15,
model: 'zonos-v0.1-transformer'
});
// Collect the streamed chunks as they arrive
const reader = stream.getReader();
const chunks: Uint8Array[] = [];
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  chunks.push(value);
}
// Build a single Blob from all chunks and play it in the browser
const audioElement = document.createElement('audio');
audioElement.src = URL.createObjectURL(new Blob(chunks, { type: mimeType }));
audioElement.controls = true;
document.body.appendChild(audioElement);

Callback Options
You can also use callbacks to track progress during audio generation:
const audioBlob = await client.audio.speech.create(
{
text: 'Audio with progress tracking',
speaking_rate: 15,
model: 'zonos-v0.1-transformer'
},
{
onChunk: (chunk) => {
console.log('Received chunk:', chunk.length, 'bytes');
},
onProgress: (totalBytes) => {
console.log('Total bytes received:', totalBytes);
},
onComplete: (blob) => {
console.log('Audio generation complete!', blob.size, 'bytes');
}
}
);

Error Handling
import { ZyphraError } from '@zyphra/client';
try {
const audioBlob = await client.audio.speech.create({
text: 'Hello, world!',
speaking_rate: 15,
model: 'zonos-v0.1-transformer'
});
} catch (error) {
if (error instanceof ZyphraError) {
console.error(`Error: ${error.statusCode} - ${error.response}`);
}
}

Available Models
Speech Models
- zonos-v0.1-transformer: Default transformer-based TTS model
- zonos-v0.1-hybrid: Advanced hybrid TTS model with enhanced language support
License
MIT License
