kugelaudio

v0.1.7

Published

5 days ago

Official JavaScript/TypeScript SDK for KugelAudio TTS API

kajokr

tts text-to-speech audio streaming websocket kugelaudio

KugelAudio JavaScript/TypeScript SDK

Official JavaScript/TypeScript SDK for the KugelAudio Text-to-Speech API.

Installation

npm install kugelaudio

Or with yarn:

yarn add kugelaudio

Or with pnpm:

pnpm add kugelaudio

Quick Start

import { KugelAudio, createWavBlob } from 'kugelaudio';

// Initialize the client; only an API key is required
const client = new KugelAudio({ apiKey: 'your_api_key' });

// Generate speech
const audio = await client.tts.generate({
  text: 'Hello, world!',
  model: 'kugel-1-turbo',
});

// audio.audio is raw PCM16, so wrap it in a WAV container before playback (browser)
const blob = createWavBlob(audio.audio, audio.sampleRate);
const url = URL.createObjectURL(blob);
const audioElement = new Audio(url);
audioElement.play();

Client Configuration

import { KugelAudio } from 'kugelaudio';

// Simple setup - single URL handles everything
const client = new KugelAudio({ apiKey: 'your_api_key' });

// Or with custom options
const customClient = new KugelAudio({
  apiKey: 'your_api_key',           // Required: Your API key
  apiUrl: 'https://api.kugelaudio.com',  // Optional: API base URL (default)
  timeout: 60000,                    // Optional: Request timeout in ms
});

Single URL Architecture

The SDK uses a single URL for both REST API and WebSocket streaming. The TTS server provides both REST endpoints (/v1/models, /v1/voices) and WebSocket (/ws/tts) - no proxy needed, minimal latency.
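
As a rough illustration of how one base URL can serve both transports, the sketch below derives the REST and WebSocket endpoints from an `apiUrl`. The paths come from the paragraph above; the `deriveEndpoints` helper and the http-to-ws scheme mapping are assumptions made for illustration and are not confirmed SDK internals.

```typescript
// Hypothetical helper, not exported by the kugelaudio package.
interface Endpoints {
  models: string;
  voices: string;
  ttsSocket: string;
}

function deriveEndpoints(apiUrl: string): Endpoints {
  const base = new URL(apiUrl);
  // Assumption: the WebSocket scheme mirrors the HTTP scheme (https -> wss, http -> ws).
  const wsProtocol = base.protocol === 'https:' ? 'wss:' : 'ws:';
  return {
    models: `${base.origin}/v1/models`,
    voices: `${base.origin}/v1/voices`,
    ttsSocket: `${wsProtocol}//${base.host}/ws/tts`,
  };
}
```

Under these assumptions, a local `http://localhost:8000` base would map to `ws://localhost:8000/ws/tts` for streaming.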

Local Development

For local development, point directly to your TTS server:

const client = new KugelAudio({
  apiKey: 'your_api_key',
  apiUrl: 'http://localhost:8000',   // TTS server handles everything
});

Or if you have separate backend and TTS servers:

const client = new KugelAudio({
  apiKey: 'your_api_key',
  apiUrl: 'http://localhost:8001',   // Backend for REST API
  ttsUrl: 'http://localhost:8000',   // TTS server for WebSocket streaming
});
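
If you switch between these setups often, one option is to build the options object from environment variables. This is only a sketch: the variable names `KUGELAUDIO_API_KEY`, `KUGELAUDIO_API_URL`, and `KUGELAUDIO_TTS_URL` and the `optionsFromEnv` helper are hypothetical and are not read by the SDK itself.

```typescript
// Mirrors the documented client options; the env variable names are hypothetical.
interface ClientOptions {
  apiKey: string;
  apiUrl?: string;
  ttsUrl?: string;
}

function optionsFromEnv(env: Record<string, string | undefined>): ClientOptions {
  const apiKey = env['KUGELAUDIO_API_KEY'];
  if (!apiKey) {
    throw new Error('KUGELAUDIO_API_KEY is not set');
  }
  const options: ClientOptions = { apiKey };
  // Only set the URLs that are provided so the SDK defaults apply otherwise.
  const apiUrl = env['KUGELAUDIO_API_URL'];
  const ttsUrl = env['KUGELAUDIO_TTS_URL'];
  if (apiUrl) options.apiUrl = apiUrl;
  if (ttsUrl) options.ttsUrl = ttsUrl;
  return options;
}
```

In Node.js the result could then be passed to the constructor, for example `new KugelAudio(optionsFromEnv(process.env))`.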

Available Models

| Model ID | Name | Parameters | Description |
|----------|------|------------|-------------|
| kugel-1-turbo | Kugel 1 Turbo | 1.5B | Fast, low-latency model for real-time applications |
| kugel-1 | Kugel 1 | 7B | Premium quality model for pre-recorded content |

List Available Models

const models = await client.models.list();

for (const model of models) {
  console.log(`${model.id}: ${model.name}`);
  console.log(`  Description: ${model.description}`);
  console.log(`  Parameters: ${model.parameters}`);
  console.log(`  Max Input: ${model.maxInputLength} characters`);
  console.log(`  Sample Rate: ${model.sampleRate} Hz`);
}

Voices

List Available Voices

// List all available voices
const voices = await client.voices.list();

for (const voice of voices) {
  console.log(`${voice.id}: ${voice.name}`);
  console.log(`  Category: ${voice.category}`);
  console.log(`  Languages: ${voice.supportedLanguages.join(', ')}`);
}

// Filter by language
const germanVoices = await client.voices.list({ language: 'de' });

// Include public voices in the results
const publicVoices = await client.voices.list({ includePublic: true });

// Limit results
const first10 = await client.voices.list({ limit: 10 });

Get a Specific Voice

const voice = await client.voices.get(123);
console.log(`Voice: ${voice.name}`);
console.log(`Sample text: ${voice.sampleText}`);

Text-to-Speech Generation

Basic Generation (Non-Streaming)

Generate complete audio and receive it all at once:

const audio = await client.tts.generate({
  text: 'Hello, this is a test of the KugelAudio text-to-speech system.',
  model: 'kugel-1-turbo',  // 'kugel-1-turbo' (fast) or 'kugel-1' (quality)
  voiceId: 123,              // Optional: specific voice ID
  cfgScale: 2.0,             // Guidance scale (1.0-5.0)
  maxNewTokens: 2048,        // Maximum tokens to generate
  sampleRate: 24000,         // Output sample rate
  speakerPrefix: true,       // Add speaker prefix for better quality
  normalize: true,           // Enable text normalization (see below)
  language: 'en',            // Language for normalization
});

// Audio properties
console.log(`Duration: ${audio.durationMs}ms`);
console.log(`Samples: ${audio.samples}`);
console.log(`Sample rate: ${audio.sampleRate} Hz`);
console.log(`Generation time: ${audio.generationMs}ms`);
console.log(`RTF: ${audio.rtf}`);  // Real-time factor

// audio.audio is an ArrayBuffer with PCM16 data
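
The audio properties above are related by simple arithmetic. The sketch below recomputes them from the raw buffer size, assuming mono 16-bit samples (2 bytes each) and assuming RTF means generation time divided by audio duration, which is a common convention but is not spelled out here. The `describePcm16` helper is hypothetical.

```typescript
interface Pcm16Timing {
  samples: number;
  durationMs: number;
  rtf: number;
}

// Hypothetical helper: derives timing figures from a mono PCM16 buffer size.
function describePcm16(byteLength: number, sampleRate: number, generationMs: number): Pcm16Timing {
  const samples = byteLength / 2; // 16-bit samples take 2 bytes each
  const durationMs = (samples / sampleRate) * 1000;
  // Assumed definition: values below 1 mean faster than real time.
  const rtf = generationMs / durationMs;
  return { samples, durationMs, rtf };
}

// 96,000 bytes at 24 kHz generated in 500 ms:
// 48,000 samples, 2,000 ms of audio, RTF 0.25
const timing = describePcm16(96000, 24000, 500);
```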

Playing Audio in Browser

import { createWavBlob } from 'kugelaudio';

const audio = await client.tts.generate({
  text: 'Hello, world!',
  model: 'kugel-1-turbo',
});

// Create WAV blob for playback
const wavBlob = createWavBlob(audio.audio, audio.sampleRate);
const url = URL.createObjectURL(wavBlob);

// Play with Audio element
const audioElement = new Audio(url);
audioElement.play();

// Or with Web Audio API
const audioContext = new AudioContext();
const arrayBuffer = await wavBlob.arrayBuffer();
const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
const source = audioContext.createBufferSource();
source.buffer = audioBuffer;
source.connect(audioContext.destination);
source.start();

Streaming Audio Output

Receive audio chunks as they are generated for lower latency:

await client.tts.stream(
  {
    text: 'Hello, this is streaming audio.',
    model: 'kugel-1-turbo',
  },
  {
    onOpen: () => {
      console.log('WebSocket connected');
    },
    onChunk: (chunk) => {
      console.log(`Chunk ${chunk.index}: ${chunk.samples} samples`);
      // chunk.audio is base64-encoded PCM16 data
      // Use base64ToArrayBuffer() to decode
      playAudioChunk(chunk);
    },
    onFinal: (stats) => {
      console.log(`Total duration: ${stats.durationMs}ms`);
      console.log(`Time to first audio: ${stats.ttfaMs}ms`);
      console.log(`Generation time: ${stats.generationMs}ms`);
      console.log(`RTF: ${stats.rtf}`);
    },
    onError: (error) => {
      console.error('TTS error:', error);
    },
    onClose: () => {
      console.log('WebSocket closed');
    },
  }
);

Processing Audio Chunks

import { decodePCM16 } from 'kugelaudio';

const audioContext = new AudioContext();
let nextStartTime = 0; // When the next chunk should begin, in AudioContext time

// In streaming callback:
onChunk: (chunk) => {
  // Decode base64 PCM16 directly to Float32 for the Web Audio API
  // (use base64ToArrayBuffer(chunk.audio) instead if you need the raw PCM16 bytes)
  const float32Data = decodePCM16(chunk.audio);

  const audioBuffer = audioContext.createBuffer(1, float32Data.length, chunk.sampleRate);
  audioBuffer.copyToChannel(float32Data, 0);

  const source = audioContext.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(audioContext.destination);

  // Schedule chunks back to back so they do not overlap
  nextStartTime = Math.max(nextStartTime, audioContext.currentTime);
  source.start(nextStartTime);
  nextStartTime += audioBuffer.duration;
}
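
To make the PCM16-to-Float32 step less opaque, here is a minimal sketch of the conversion on an already-decoded buffer. It assumes little-endian signed 16-bit samples (matching the `pcm_s16le` encoding) scaled into the [-1, 1) range; `pcm16ToFloat32` is a hypothetical stand-in, and the SDK's `decodePCM16` may differ in detail.

```typescript
// Hypothetical stand-in for illustration; operates on raw bytes, not base64.
function pcm16ToFloat32(pcm: ArrayBuffer): Float32Array {
  const view = new DataView(pcm);
  const sampleCount = Math.floor(pcm.byteLength / 2);
  const out = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    // Read a signed 16-bit little-endian sample and scale it by 1 / 32768.
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}
```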

Text Normalization

Text normalization converts numbers, dates, times, and other non-verbal text into spoken words. For example:

"I have 3 apples" → "I have three apples"
"The meeting is at 2:30 PM" → "The meeting is at two thirty PM"
"€50.99" → "fifty euros and ninety-nine cents"

Usage

// With explicit language (recommended - fastest)
const audio = await client.tts.generate({
  text: 'I bought 3 items for €50.99 on 01/15/2024.',
  normalize: true,
  language: 'en',  // Specify language for best performance
});

// With auto-detection (adds ~150ms latency)
const autoDetectedAudio = await client.tts.generate({
  text: 'Ich habe 3 Artikel für 50,99€ gekauft.',
  normalize: true,
  // language not specified - will auto-detect
});

Supported Languages

| Code | Language | Code | Language |
|------|----------|------|----------|
| de | German | nl | Dutch |
| en | English | pl | Polish |
| fr | French | sv | Swedish |
| es | Spanish | da | Danish |
| it | Italian | no | Norwegian |
| pt | Portuguese | fi | Finnish |
| cs | Czech | hu | Hungarian |
| ro | Romanian | el | Greek |
| uk | Ukrainian | bg | Bulgarian |
| tr | Turkish | vi | Vietnamese |
| ar | Arabic | hi | Hindi |
| zh | Chinese | ja | Japanese |
| ko | Korean | | |

Performance Warning

⚠️ Latency Warning: Using normalize: true without specifying language adds approximately 150ms latency for language auto-detection. For best performance in latency-sensitive applications, always specify the language parameter.
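
One way to avoid the auto-detection cost is to resolve a language code up front, for example from a browser locale such as `en-US`. The sketch below checks a code against the table of supported languages; the `resolveLanguage` helper and its locale handling are assumptions for illustration, not part of the SDK.

```typescript
// Codes copied from the supported-languages table.
const SUPPORTED_LANGUAGES = new Set<string>([
  'de', 'nl', 'en', 'pl', 'fr', 'sv', 'es', 'da', 'it', 'no',
  'pt', 'fi', 'cs', 'hu', 'ro', 'el', 'uk', 'bg', 'tr', 'vi',
  'ar', 'hi', 'zh', 'ja', 'ko',
]);

// Hypothetical helper: returns a supported two-letter code, or undefined.
function resolveLanguage(locale: string | undefined): string | undefined {
  if (!locale) return undefined;
  // Keep only the primary subtag, so 'en-US' and 'en_GB' both become 'en'.
  const primary = locale.trim().toLowerCase().split(/[-_]/)[0] ?? '';
  return SUPPORTED_LANGUAGES.has(primary) ? primary : undefined;
}
```

When the result is `undefined`, you can either omit `language` and accept the detection latency or fall back to a fixed default for your application.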

Error Handling

import {
  KugelAudio,
  KugelAudioError,
  AuthenticationError,
  RateLimitError,
  InsufficientCreditsError,
  ValidationError,
  ConnectionError,
} from 'kugelaudio';

const client = new KugelAudio({ apiKey: 'your_api_key' });

try {
  const audio = await client.tts.generate({ text: 'Hello!' });
} catch (error) {
  if (error instanceof AuthenticationError) {
    console.error('Invalid API key');
  } else if (error instanceof RateLimitError) {
    console.error('Rate limit exceeded, please wait');
  } else if (error instanceof InsufficientCreditsError) {
    console.error('Not enough credits, please top up');
  } else if (error instanceof ValidationError) {
    console.error(`Invalid request: ${error.message}`);
  } else if (error instanceof ConnectionError) {
    console.error('Failed to connect to server');
  } else if (error instanceof KugelAudioError) {
    console.error(`API error: ${error.message}`);
  }
}
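
Rate-limit and connection errors are often transient, so callers may want to retry them. The following is a generic sketch of retrying with exponential backoff; `backoffDelayMs`, `retryWithBackoff`, and the chosen delays are illustrative assumptions, and nothing here says whether the SDK already retries internally.

```typescript
// Delay doubles with each attempt and is capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical helper: retries fn while shouldRetry(error) is true.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  shouldRetry: (error: unknown) => boolean,
  maxAttempts = 4,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const isLastAttempt = attempt >= maxAttempts - 1;
      if (isLastAttempt || !shouldRetry(error)) throw error;
      const delay = backoffDelayMs(attempt);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

A call might wrap `client.tts.generate(...)` in `fn` and pass a predicate such as `(e) => e instanceof RateLimitError || e instanceof ConnectionError`, leaving authentication and validation errors to fail immediately.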

TypeScript Types

KugelAudioOptions

interface KugelAudioOptions {
  apiKey: string;      // Required
  apiUrl?: string;     // Default: 'https://api.kugelaudio.com'
  ttsUrl?: string;     // Default: same as apiUrl (backend proxies to TTS)
  timeout?: number;    // Default: 60000 (ms)
}

GenerateOptions

interface GenerateOptions {
  text: string;            // Required: Text to synthesize
  model?: string;          // Default: 'kugel-1-turbo'
  voiceId?: number;        // Optional: Voice ID
  cfgScale?: number;       // Default: 2.0
  maxNewTokens?: number;   // Default: 2048
  sampleRate?: number;     // Default: 24000
  speakerPrefix?: boolean; // Default: true
  normalize?: boolean;     // Default: false - Enable text normalization
  language?: string;       // ISO 639-1 code for normalization (e.g., 'en', 'de')
}

⚠️ Note: Using normalize: true without language adds ~150ms latency for auto-detection.

AudioChunk

interface AudioChunk {
  audio: string;       // Base64-encoded PCM16 audio
  encoding: string;    // 'pcm_s16le'
  index: number;       // Chunk index (0-based)
  sampleRate: number;  // Sample rate (24000)
  samples: number;     // Number of samples in chunk
}

AudioResponse

interface AudioResponse {
  audio: ArrayBuffer;     // Complete PCM16 audio
  sampleRate: number;     // Sample rate (24000)
  samples: number;        // Total samples
  durationMs: number;     // Duration in milliseconds
  generationMs: number;   // Generation time in milliseconds
  rtf: number;           // Real-time factor
}

GenerationStats

interface GenerationStats {
  final: true;
  chunks: number;         // Number of chunks generated
  totalSamples: number;   // Total samples generated
  durationMs: number;     // Audio duration in ms
  generationMs: number;   // Generation time in ms
  ttfaMs: number;         // Time to first audio in ms
  rtf: number;           // Real-time factor
}

StreamCallbacks

interface StreamCallbacks {
  onOpen?: () => void;
  onChunk?: (chunk: AudioChunk) => void;
  onFinal?: (stats: GenerationStats) => void;
  onError?: (error: Error) => void;
  onClose?: () => void;
}

Model

interface Model {
  id: string;             // 'kugel-1-turbo' or 'kugel-1'
  name: string;           // Human-readable name
  description: string;    // Model description
  parameters: string;     // Parameter count ('1.5B', '7B')
  maxInputLength: number; // Maximum input characters
  sampleRate: number;     // Output sample rate
}

Voice

interface Voice {
  id: number;                    // Voice ID
  name: string;                  // Voice name
  description?: string;          // Description
  category?: VoiceCategory;      // 'premade' | 'cloned' | 'generated'
  sex?: VoiceSex;               // 'male' | 'female' | 'neutral'
  age?: VoiceAge;               // 'young' | 'middle_aged' | 'old'
  supportedLanguages: string[]; // ['en', 'de', ...]
  sampleText?: string;          // Sample text for preview
  avatarUrl?: string;           // Avatar image URL
  sampleUrl?: string;           // Sample audio URL
  isPublic: boolean;            // Whether voice is public
  verified: boolean;            // Whether voice is verified
}

Utility Functions

base64ToArrayBuffer

Convert base64 string to ArrayBuffer:

import { base64ToArrayBuffer } from 'kugelaudio';

const buffer = base64ToArrayBuffer(chunk.audio);
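
For readers curious what this conversion involves, a minimal sketch using the standard `atob` global (available in browsers and recent Node.js versions) is shown below. `decodeBase64` is a hypothetical stand-in; the SDK's own implementation may differ.

```typescript
// Hypothetical stand-in for illustration only.
function decodeBase64(base64: string): ArrayBuffer {
  const binary = atob(base64); // Each character code holds one byte (0-255)
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes.buffer;
}
```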

decodePCM16

Convert base64 PCM16 to Float32Array for Web Audio API:

import { decodePCM16 } from 'kugelaudio';

const floatData = decodePCM16(chunk.audio);

createWavFile

Create a WAV file from PCM16 data:

import { createWavFile } from 'kugelaudio';

const wavBuffer = createWavFile(pcmArrayBuffer, 24000);
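
For context, a WAV file for PCM16 audio is the raw samples preceded by a 44-byte RIFF header. The sketch below builds that header by hand to show what the container holds; `pcm16ToWav` is a hypothetical illustration of the format, not the SDK's `createWavFile` implementation, and it assumes mono audio by default.

```typescript
// Hypothetical illustration of the canonical 44-byte PCM WAV header.
function pcm16ToWav(pcm: ArrayBuffer, sampleRate: number, channels = 1): ArrayBuffer {
  const bytesPerSample = 2;
  const blockAlign = channels * bytesPerSample;
  const wav = new ArrayBuffer(44 + pcm.byteLength);
  const view = new DataView(wav);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + pcm.byteLength, true); // File size minus the first 8 bytes
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true); // Size of the fmt chunk
  view.setUint16(20, 1, true); // Audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // Byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, 16, true); // Bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, pcm.byteLength, true);

  // Copy the samples in after the header.
  new Uint8Array(wav, 44).set(new Uint8Array(pcm));
  return wav;
}
```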

createWavBlob

Create a playable Blob from PCM16 data:

import { createWavBlob } from 'kugelaudio';

const blob = createWavBlob(pcmArrayBuffer, 24000);
const url = URL.createObjectURL(blob);

Complete Example

import { KugelAudio, base64ToArrayBuffer, createWavBlob } from 'kugelaudio';

async function main() {
  // Initialize client
  const client = new KugelAudio({ apiKey: 'your_api_key' });

  // List available models
  console.log('Available Models:');
  const models = await client.models.list();
  for (const model of models) {
    console.log(`  - ${model.id}: ${model.name} (${model.parameters})`);
  }

  // List available voices
  console.log('\nAvailable Voices:');
  const voices = await client.voices.list({ limit: 5 });
  for (const voice of voices) {
    console.log(`  - ${voice.id}: ${voice.name}`);
  }

  // Generate audio with streaming
  console.log('\nGenerating audio (streaming)...');
  const chunks: ArrayBuffer[] = [];
  let ttfa: number | undefined;
  const startTime = Date.now();

  await client.tts.stream(
    {
      text: 'Welcome to KugelAudio. This is an example of high-quality text-to-speech synthesis.',
      model: 'kugel-1-turbo',
    },
    {
      onChunk: (chunk) => {
        if (ttfa === undefined) {
          ttfa = Date.now() - startTime;
          console.log(`Time to first audio: ${ttfa}ms`);
        }
        chunks.push(base64ToArrayBuffer(chunk.audio));
      },
      onFinal: (stats) => {
        console.log(`Generated ${stats.durationMs}ms of audio`);
        console.log(`Generation time: ${stats.generationMs}ms`);
        console.log(`RTF: ${stats.rtf}x`);
      },
    }
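  );

  // Concatenate the PCM16 chunks and wrap them in a WAV container
  const totalBytes = chunks.reduce((sum, c) => sum + c.byteLength, 0);
  const pcm = new Uint8Array(totalBytes);
  let offset = 0;
  for (const c of chunks) {
    pcm.set(new Uint8Array(c), offset);
    offset += c.byteLength;
  }
  const wavBlob = createWavBlob(pcm.buffer, 24000);
  console.log(`WAV size: ${wavBlob.size} bytes`);
}

main().catch(console.error);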

Browser Support

The SDK works in modern browsers with WebSocket support. For Node.js, ensure you have a WebSocket implementation available.
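
A quick way to check the current runtime is to look for a global `WebSocket` constructor, as in the sketch below. How the SDK locates its WebSocket implementation is not stated here, so treat the `hasGlobalWebSocket` helper and the idea of checking the global as assumptions.

```typescript
// Hypothetical helper: reports whether a global WebSocket constructor exists.
function hasGlobalWebSocket(): boolean {
  return typeof Reflect.get(globalThis, 'WebSocket') === 'function';
}

if (!hasGlobalWebSocket()) {
  console.warn('No global WebSocket found; streaming may need a WebSocket implementation.');
}
```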

License

MIT