kugelaudio
v0.1.7
KugelAudio JavaScript/TypeScript SDK
Official JavaScript/TypeScript SDK for the KugelAudio Text-to-Speech API.
Installation
npm install kugelaudio
Or with yarn:
yarn add kugelaudio
Or with pnpm:
pnpm add kugelaudio
Quick Start
import { KugelAudio, createWavBlob } from 'kugelaudio';
// Initialize the client - just needs an API key!
const client = new KugelAudio({ apiKey: 'your_api_key' });
// Generate speech
const audio = await client.tts.generate({
  text: 'Hello, world!',
  model: 'kugel-1-turbo',
});
// audio.audio is raw PCM16, so wrap it in a WAV container for playback (browser)
const blob = createWavBlob(audio.audio, audio.sampleRate);
const url = URL.createObjectURL(blob);
const audioElement = new Audio(url);
audioElement.play();
Client Configuration
import { KugelAudio } from 'kugelaudio';
// Simple setup - single URL handles everything
const client = new KugelAudio({ apiKey: 'your_api_key' });
// Or with custom options
const customClient = new KugelAudio({
  apiKey: 'your_api_key', // Required: Your API key
  apiUrl: 'https://api.kugelaudio.com', // Optional: API base URL (default)
  timeout: 60000, // Optional: Request timeout in ms
});
Single URL Architecture
The SDK uses a single URL for both the REST API and WebSocket streaming. The TTS server exposes the REST endpoints (/v1/models, /v1/voices) and the WebSocket endpoint (/ws/tts) itself, so no proxy is needed and latency stays minimal.
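As a rough illustration of the single-URL idea, the sketch below derives the three endpoints from one base URL. The paths come from the paragraph above; the helper name `deriveEndpoints` and the scheme mapping (http to ws, https to wss) are assumptions for illustration, not the SDK's documented internals.

```typescript
// Hypothetical helper: derive REST and WebSocket endpoints from one base URL.
// The paths are taken from the README; the derivation logic is an assumption.
interface Endpoints {
  models: string;
  voices: string;
  ttsSocket: string;
}

function deriveEndpoints(apiUrl: string): Endpoints {
  const base = new URL(apiUrl);
  // Secure pages need a secure socket: https -> wss, anything else -> ws.
  const wsScheme = base.protocol === 'https:' ? 'wss:' : 'ws:';
  return {
    models: new URL('/v1/models', base).toString(),
    voices: new URL('/v1/voices', base).toString(),
    ttsSocket: `${wsScheme}//${base.host}/ws/tts`,
  };
}

console.log(deriveEndpoints('https://api.kugelaudio.com'));
```

The same function applied to 'http://localhost:8000' would yield a plain ws:// socket URL, which matches the local development setup described next.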
Local Development
For local development, point directly to your TTS server:
const client = new KugelAudio({
  apiKey: 'your_api_key',
  apiUrl: 'http://localhost:8000', // TTS server handles everything
});
Or, if you have separate backend and TTS servers:
const client = new KugelAudio({
  apiKey: 'your_api_key',
  apiUrl: 'http://localhost:8001', // Backend for REST API
  ttsUrl: 'http://localhost:8000', // TTS server for WebSocket streaming
});
Available Models
| Model ID | Name | Parameters | Description |
|----------|------|------------|-------------|
| kugel-1-turbo | Kugel 1 Turbo | 1.5B | Fast, low-latency model for real-time applications |
| kugel-1 | Kugel 1 | 7B | Premium quality model for pre-recorded content |
List Available Models
const models = await client.models.list();
for (const model of models) {
  console.log(`${model.id}: ${model.name}`);
  console.log(` Description: ${model.description}`);
  console.log(` Parameters: ${model.parameters}`);
  console.log(` Max Input: ${model.maxInputLength} characters`);
  console.log(` Sample Rate: ${model.sampleRate} Hz`);
}
Voices
List Available Voices
// List all available voices
const voices = await client.voices.list();
for (const voice of voices) {
  console.log(`${voice.id}: ${voice.name}`);
  console.log(` Category: ${voice.category}`);
  console.log(` Languages: ${voice.supportedLanguages.join(', ')}`);
}
// Filter by language
const germanVoices = await client.voices.list({ language: 'de' });
// Get only public voices
const publicVoices = await client.voices.list({ includePublic: true });
// Limit results
const first10 = await client.voices.list({ limit: 10 });
Get a Specific Voice
const voice = await client.voices.get(123);
console.log(`Voice: ${voice.name}`);
console.log(`Sample text: ${voice.sampleText}`);
Text-to-Speech Generation
Basic Generation (Non-Streaming)
Generate complete audio and receive it all at once:
const audio = await client.tts.generate({
  text: 'Hello, this is a test of the KugelAudio text-to-speech system.',
  model: 'kugel-1-turbo', // 'kugel-1-turbo' (fast) or 'kugel-1' (quality)
  voiceId: 123, // Optional: specific voice ID
  cfgScale: 2.0, // Guidance scale (1.0-5.0)
  maxNewTokens: 2048, // Maximum tokens to generate
  sampleRate: 24000, // Output sample rate
  speakerPrefix: true, // Add speaker prefix for better quality
  normalize: true, // Enable text normalization (see below)
  language: 'en', // Language for normalization
});
// Audio properties
console.log(`Duration: ${audio.durationMs}ms`);
console.log(`Samples: ${audio.samples}`);
console.log(`Sample rate: ${audio.sampleRate} Hz`);
console.log(`Generation time: ${audio.generationMs}ms`);
console.log(`RTF: ${audio.rtf}`); // Real-time factor
// audio.audio is an ArrayBuffer with PCM16 data
Playing Audio in Browser
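Raw PCM16 carries no container header, which is why playback goes through a WAV wrapper such as createWavBlob. As a rough illustration of what such a wrapper has to do, the sketch below prepends a minimal 44-byte RIFF/WAVE header. It assumes 16-bit little-endian PCM and defaults to mono; `pcm16ToWav` is a hypothetical name and this is not the SDK's actual implementation.

```typescript
// Sketch: wrap raw 16-bit little-endian PCM in a minimal 44-byte WAV header.
// Assumption: mono audio unless a channel count is passed explicitly.
function pcm16ToWav(pcm: ArrayBuffer, sampleRate: number, channels = 1): ArrayBuffer {
  const bytesPerSample = 2;
  const blockAlign = channels * bytesPerSample;
  const wav = new ArrayBuffer(44 + pcm.byteLength);
  const view = new DataView(wav);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + pcm.byteLength, true); // File size minus the first 8 bytes
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true); // Size of the fmt chunk
  view.setUint16(20, 1, true); // Audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // Byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, 16, true); // Bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, pcm.byteLength, true);

  // Copy the samples in after the header.
  new Uint8Array(wav, 44).set(new Uint8Array(pcm));
  return wav;
}
```

In a browser, the result could be passed to `new Blob([wav], { type: 'audio/wav' })` to obtain something an Audio element can play.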
import { createWavBlob } from 'kugelaudio';
const audio = await client.tts.generate({
  text: 'Hello, world!',
  model: 'kugel-1-turbo',
});
// Create WAV blob for playback
const wavBlob = createWavBlob(audio.audio, audio.sampleRate);
const url = URL.createObjectURL(wavBlob);
// Play with Audio element
const audioElement = new Audio(url);
audioElement.play();
// Or with Web Audio API
const audioContext = new AudioContext();
const arrayBuffer = await wavBlob.arrayBuffer();
const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
const source = audioContext.createBufferSource();
source.buffer = audioBuffer;
source.connect(audioContext.destination);
source.start();
Streaming Audio Output
Receive audio chunks as they are generated for lower latency:
await client.tts.stream(
  {
    text: 'Hello, this is streaming audio.',
    model: 'kugel-1-turbo',
  },
  {
    onOpen: () => {
      console.log('WebSocket connected');
    },
    onChunk: (chunk) => {
      console.log(`Chunk ${chunk.index}: ${chunk.samples} samples`);
      // chunk.audio is base64-encoded PCM16 data
      // Use base64ToArrayBuffer() to decode
      // playAudioChunk is your own playback function (see Processing Audio Chunks)
      playAudioChunk(chunk);
    },
    onFinal: (stats) => {
      console.log(`Total duration: ${stats.durationMs}ms`);
      console.log(`Time to first audio: ${stats.ttfaMs}ms`);
      console.log(`Generation time: ${stats.generationMs}ms`);
      console.log(`RTF: ${stats.rtf}`);
    },
    onError: (error) => {
      console.error('TTS error:', error);
    },
    onClose: () => {
      console.log('WebSocket closed');
    },
  }
);
Processing Audio Chunks
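Each chunk carries base64-encoded 16-bit little-endian PCM (pcm_s16le, per the AudioChunk type). The conversion to the float samples that the Web Audio API expects can be sketched as follows. This is an illustrative stand-in with a hypothetical name, not the SDK's decodePCM16 implementation, and it assumes a global `atob` (available in browsers and recent Node.js versions).

```typescript
// Sketch: decode base64 PCM16 (little-endian) into floats in the range [-1, 1).
function pcm16Base64ToFloat32(base64: string): Float32Array {
  // Step 1: base64 -> raw bytes.
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }

  // Step 2: read each pair of bytes as a signed 16-bit sample and scale it.
  const view = new DataView(bytes.buffer);
  const samples = new Float32Array(Math.floor(bytes.length / 2));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}

// 'AAAAQACA' encodes the three samples 0, 16384 and -32768.
console.log(pcm16Base64ToFloat32('AAAAQACA'));
```

Dividing by 32768 maps the most negative sample to exactly -1 and keeps every value inside the range Web Audio buffers use.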
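import { base64ToArrayBuffer, decodePCM16 } from 'kugelaudio';
const audioContext = new AudioContext();
let nextStartTime = 0; // When the next chunk should start playing
// In streaming callback:
onChunk: (chunk) => {
  // Decode base64 to a raw ArrayBuffer (useful for saving or concatenating)
  const pcmBuffer = base64ToArrayBuffer(chunk.audio);
  // Convert base64 PCM16 to Float32 for Web Audio API
  const float32Data = decodePCM16(chunk.audio);
  // Play with Web Audio API
  const audioBuffer = audioContext.createBuffer(1, float32Data.length, chunk.sampleRate);
  audioBuffer.copyToChannel(float32Data, 0);
  const source = audioContext.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(audioContext.destination);
  // Schedule each chunk right after the previous one so chunks do not overlap
  nextStartTime = Math.max(nextStartTime, audioContext.currentTime);
  source.start(nextStartTime);
  nextStartTime += audioBuffer.duration;
}
Text Normalization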
Text normalization converts numbers, dates, times, and other non-verbal text into spoken words. For example:
- "I have 3 apples" → "I have three apples"
- "The meeting is at 2:30 PM" → "The meeting is at two thirty PM"
- "€50.99" → "fifty euros and ninety-nine cents"
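To make the idea concrete, here is a deliberately tiny sketch that spells out standalone integers from 0 to 12 in English. It is a toy for illustration only: the service's normalizer is not shown in this README, and real normalization also covers dates, times, and currency, which this sketch leaves untouched. All names here are hypothetical.

```typescript
// Toy normalizer: spell out standalone integers 0-12, leave everything else as is.
const SMALL_NUMBERS: string[] = [
  'zero', 'one', 'two', 'three', 'four', 'five', 'six',
  'seven', 'eight', 'nine', 'ten', 'eleven', 'twelve',
];

function spellOutSmallIntegers(text: string): string {
  return text
    .split(' ')
    .map((token) => {
      // Only convert tokens that are pure digits, optionally followed by punctuation.
      const match = /^(\d+)([.,!?]?)$/.exec(token);
      if (!match) return token;
      const value = Number(match[1]);
      return value < SMALL_NUMBERS.length ? SMALL_NUMBERS[value] + match[2] : token;
    })
    .join(' ');
}

console.log(spellOutSmallIntegers('I have 3 apples'));
```

Tokens such as "2:30" or "€50.99" do not match the pattern and pass through unchanged, which shows how much more a production normalizer has to handle.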
Usage
// With explicit language (recommended - fastest)
const audio = await client.tts.generate({
  text: 'I bought 3 items for €50.99 on 01/15/2024.',
  normalize: true,
  language: 'en', // Specify language for best performance
});
// With auto-detection (adds ~150ms latency)
const germanAudio = await client.tts.generate({
  text: 'Ich habe 3 Artikel für 50,99€ gekauft.',
  normalize: true,
  // language not specified - will auto-detect
});
Supported Languages
| Code | Language | Code | Language |
|------|----------|------|----------|
| de | German | nl | Dutch |
| en | English | pl | Polish |
| fr | French | sv | Swedish |
| es | Spanish | da | Danish |
| it | Italian | no | Norwegian |
| pt | Portuguese | fi | Finnish |
| cs | Czech | hu | Hungarian |
| ro | Romanian | el | Greek |
| uk | Ukrainian | bg | Bulgarian |
| tr | Turkish | vi | Vietnamese |
| ar | Arabic | hi | Hindi |
| zh | Chinese | ja | Japanese |
| ko | Korean | | |
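Because an unspecified language triggers auto-detection, it can help to validate a language code on the client before sending a request. A minimal sketch, with the codes copied from the table above; the constant and function names are hypothetical and are not SDK exports.

```typescript
// Codes copied from the Supported Languages table.
const NORMALIZATION_LANGUAGES = [
  'de', 'en', 'fr', 'es', 'it', 'pt', 'cs', 'ro', 'uk', 'tr', 'ar', 'zh', 'ko',
  'nl', 'pl', 'sv', 'da', 'no', 'fi', 'hu', 'el', 'bg', 'vi', 'hi', 'ja',
] as const;

type NormalizationLanguage = (typeof NORMALIZATION_LANGUAGES)[number];

// Type guard: narrows an arbitrary string to one of the supported codes.
function isNormalizationLanguage(code: string): code is NormalizationLanguage {
  return (NORMALIZATION_LANGUAGES as readonly string[]).includes(code);
}

const requested = 'de';
if (isNormalizationLanguage(requested)) {
  console.log(`'${requested}' can be passed as the language option`);
} else {
  console.log(`'${requested}' is not in the supported list`);
}
```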
Performance Warning
⚠️ Latency Warning: Using normalize: true without specifying language adds approximately 150ms latency for language auto-detection. For best performance in latency-sensitive applications, always specify the language parameter.
Error Handling
import {
  KugelAudio,
  KugelAudioError,
  AuthenticationError,
  RateLimitError,
  InsufficientCreditsError,
  ValidationError,
  ConnectionError,
} from 'kugelaudio';
const client = new KugelAudio({ apiKey: 'your_api_key' });
try {
  const audio = await client.tts.generate({ text: 'Hello!' });
} catch (error) {
  // Check the specific error classes first, the KugelAudioError base class last
  if (error instanceof AuthenticationError) {
    console.error('Invalid API key');
  } else if (error instanceof RateLimitError) {
    console.error('Rate limit exceeded, please wait');
  } else if (error instanceof InsufficientCreditsError) {
    console.error('Not enough credits, please top up');
  } else if (error instanceof ValidationError) {
    console.error(`Invalid request: ${error.message}`);
  } else if (error instanceof ConnectionError) {
    console.error('Failed to connect to server');
  } else if (error instanceof KugelAudioError) {
    console.error(`API error: ${error.message}`);
  } else {
    throw error; // Not an SDK error, so let it propagate
  }
}
TypeScript Types
KugelAudioOptions
interface KugelAudioOptions {
  apiKey: string; // Required
  apiUrl?: string; // Default: 'https://api.kugelaudio.com'
  ttsUrl?: string; // Default: same as apiUrl (backend proxies to TTS)
  timeout?: number; // Default: 60000 (ms)
}
GenerateOptions
interface GenerateOptions {
  text: string; // Required: Text to synthesize
  model?: string; // Default: 'kugel-1-turbo'
  voiceId?: number; // Optional: Voice ID
  cfgScale?: number; // Default: 2.0
  maxNewTokens?: number; // Default: 2048
  sampleRate?: number; // Default: 24000
  speakerPrefix?: boolean; // Default: true
  normalize?: boolean; // Default: false - Enable text normalization
  language?: string; // ISO 639-1 code for normalization (e.g., 'en', 'de')
}
⚠️ Note: Using normalize: true without language adds ~150ms latency for auto-detection.
AudioChunk
interface AudioChunk {
  audio: string; // Base64-encoded PCM16 audio
  encoding: string; // 'pcm_s16le'
  index: number; // Chunk index (0-based)
  sampleRate: number; // Sample rate (24000)
  samples: number; // Number of samples in chunk
}
AudioResponse
interface AudioResponse {
  audio: ArrayBuffer; // Complete PCM16 audio
  sampleRate: number; // Sample rate (24000)
  samples: number; // Total samples
  durationMs: number; // Duration in milliseconds
  generationMs: number; // Generation time in milliseconds
  rtf: number; // Real-time factor
}
GenerationStats
interface GenerationStats {
  final: true;
  chunks: number; // Number of chunks generated
  totalSamples: number; // Total samples generated
  durationMs: number; // Audio duration in ms
  generationMs: number; // Generation time in ms
  ttfaMs: number; // Time to first audio in ms
  rtf: number; // Real-time factor
}
StreamCallbacks
interface StreamCallbacks {
  onOpen?: () => void;
  onChunk?: (chunk: AudioChunk) => void;
  onFinal?: (stats: GenerationStats) => void;
  onError?: (error: Error) => void;
  onClose?: () => void;
}
Model
interface Model {
  id: string; // 'kugel-1-turbo' or 'kugel-1'
  name: string; // Human-readable name
  description: string; // Model description
  parameters: string; // Parameter count ('1.5B', '7B')
  maxInputLength: number; // Maximum input characters
  sampleRate: number; // Output sample rate
}
Voice
interface Voice {
  id: number; // Voice ID
  name: string; // Voice name
  description?: string; // Description
  category?: VoiceCategory; // 'premade' | 'cloned' | 'generated'
  sex?: VoiceSex; // 'male' | 'female' | 'neutral'
  age?: VoiceAge; // 'young' | 'middle_aged' | 'old'
  supportedLanguages: string[]; // ['en', 'de', ...]
  sampleText?: string; // Sample text for preview
  avatarUrl?: string; // Avatar image URL
  sampleUrl?: string; // Sample audio URL
  isPublic: boolean; // Whether voice is public
  verified: boolean; // Whether voice is verified
}
Utility Functions
base64ToArrayBuffer
Convert base64 string to ArrayBuffer:
import { base64ToArrayBuffer } from 'kugelaudio';
const buffer = base64ToArrayBuffer(chunk.audio);
decodePCM16
Convert base64 PCM16 to Float32Array for Web Audio API:
import { decodePCM16 } from 'kugelaudio';
const floatData = decodePCM16(chunk.audio);
createWavFile
Create a WAV file from PCM16 data:
import { createWavFile } from 'kugelaudio';
const wavBuffer = createWavFile(pcmArrayBuffer, 24000);
createWavBlob
Create a playable Blob from PCM16 data:
import { createWavBlob } from 'kugelaudio';
const blob = createWavBlob(pcmArrayBuffer, 24000);
const url = URL.createObjectURL(blob);
Complete Example
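Streaming delivers audio as many small buffers, as in the example in this section. To save or replay the result as one clip, the buffers have to be joined first. A minimal sketch of that step, using a hypothetical helper name that is not an SDK export:

```typescript
// Sketch: join several PCM chunks into one contiguous ArrayBuffer.
function concatBuffers(chunks: ArrayBuffer[]): ArrayBuffer {
  const totalBytes = chunks.reduce((sum, chunk) => sum + chunk.byteLength, 0);
  const joined = new ArrayBuffer(totalBytes);
  const target = new Uint8Array(joined);
  let offset = 0;
  for (const chunk of chunks) {
    // Copy each chunk in arrival order, directly behind the previous one.
    target.set(new Uint8Array(chunk), offset);
    offset += chunk.byteLength;
  }
  return joined;
}

const first = new Uint8Array([1, 2]).slice().buffer as ArrayBuffer;
const second = new Uint8Array([3, 4, 5]).slice().buffer as ArrayBuffer;
console.log(new Uint8Array(concatBuffers([first, second])));
```

The joined buffer has the same PCM16 layout as the audio returned by non-streaming generation, so it could then be handed to a WAV helper such as createWavBlob together with the sample rate.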
import { KugelAudio, base64ToArrayBuffer } from 'kugelaudio';
async function main() {
  // Initialize client
  const client = new KugelAudio({ apiKey: 'your_api_key' });
  // List available models
  console.log('Available Models:');
  const models = await client.models.list();
  for (const model of models) {
    console.log(` - ${model.id}: ${model.name} (${model.parameters})`);
  }
  // List available voices
  console.log('\nAvailable Voices:');
  const voices = await client.voices.list({ limit: 5 });
  for (const voice of voices) {
    console.log(` - ${voice.id}: ${voice.name}`);
  }
  // Generate audio with streaming
  console.log('\nGenerating audio (streaming)...');
  const chunks: ArrayBuffer[] = [];
  let ttfa: number | undefined;
  const startTime = Date.now();
  await client.tts.stream(
    {
      text: 'Welcome to KugelAudio. This is an example of high-quality text-to-speech synthesis.',
      model: 'kugel-1-turbo',
    },
    {
      onChunk: (chunk) => {
        // Record time to first audio only once, on the first chunk
        if (ttfa === undefined) {
          ttfa = Date.now() - startTime;
          console.log(`Time to first audio: ${ttfa}ms`);
        }
        chunks.push(base64ToArrayBuffer(chunk.audio));
      },
      onFinal: (stats) => {
        console.log(`Generated ${stats.durationMs}ms of audio`);
        console.log(`Generation time: ${stats.generationMs}ms`);
        console.log(`RTF: ${stats.rtf}x`);
      },
    }
  );
}
main().catch(console.error);
Browser Support
The SDK works in modern browsers with WebSocket support. For Node.js, ensure you have a WebSocket implementation available.
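One way to check this up front is to test for a global WebSocket constructor before creating a client. The sketch below is an assumption about how such a check could look, not something the SDK is documented to do. Recent Node.js releases ship a global WebSocket, and on older versions a package such as `ws` is a common choice, but how the SDK picks up an implementation is not stated here.

```typescript
// Sketch: detect whether the current runtime exposes a global WebSocket constructor.
function hasGlobalWebSocket(): boolean {
  const candidate = (globalThis as { WebSocket?: unknown }).WebSocket;
  return typeof candidate === 'function';
}

if (hasGlobalWebSocket()) {
  console.log('Global WebSocket found, streaming should be possible');
} else {
  console.log('No global WebSocket, a WebSocket implementation needs to be provided');
}
```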
License
MIT
