qt-ai-gateway-sdk

v1.0.2

Published

8 months ago

A WebSocket-based TTS client with real-time audio streaming and playback


qt-zw

tts websocket audio speech real-time streaming

TTS WebSocket Client

A high-performance JavaScript/TypeScript package for real-time Text-to-Speech (TTS) over WebSocket connections with advanced audio streaming capabilities.

Features

🚀 Real-time TTS: WebSocket-based communication for low-latency text-to-speech
🎵 Advanced Audio Playback: AudioWorklet-based PCM audio streaming with beat control
🔄 Auto-reconnection: Robust WebSocket connection management with automatic reconnection
📱 Cross-platform: Works in modern browsers that support the Web Audio API and AudioWorklet
🎛️ Latency Control: Adaptive playback rate adjustment for optimal audio quality
🔒 JWT Authentication: Secure WebSocket connections with JWT token support
📊 Real-time Statistics: Audio buffer and connection monitoring
🎯 TypeScript Support: Full TypeScript definitions included

Installation

npm install qt-ai-gateway-sdk

Quick Start

import TTSClient from 'qt-ai-gateway-sdk';

// Initialize the client
const ttsClient = new TTSClient({
  websocket: {
    url: 'wss://your-tts-server.com/ws',
    jwtToken: 'your-jwt-token'
  },
  onTextMessage: (content) => {
    console.log('Text message:', content);
  },
  onBSMessage: (content) => {
    console.log('BS message:', content);
  },
  onError: (error) => {
    console.error('TTS Error:', error);
  }
});

// Enable audio and send TTS (must be called from a user gesture such as a click or touch)
document.getElementById('speakBtn').addEventListener('click', async () => {
  await ttsClient.enableAudio(); // Enable audio first
  await ttsClient.tts('Hello, this is a test message!', { role: 'role', speed: 1.0 }); // Auto-initializes if needed
  console.log('TTS request sent!');
});

Configuration

TTSClientConfig

interface TTSClientConfig {
  websocket: WebSocketConfig;
  audio?: AudioConfig;
  onTextMessage?: TextCallback;
  onBSMessage?: BSCallback;
  onError?: (error: Error) => void;
  onConnect?: () => void;
  onDisconnect?: () => void;
}

WebSocketConfig

interface WebSocketConfig {
  url: string;                    // WebSocket server URL
  jwtToken: string;              // JWT authentication token
  reconnectAttempts?: number;    // Max reconnection attempts (default: 5)
  reconnectDelay?: number;       // Delay between reconnections in ms (default: 3000)
}
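
The two reconnection options can be read as a bounded retry policy. The sketch below illustrates that reading only; it is not the SDK's internal code. `ReconnectPolicy` and `nextReconnectDelay` are hypothetical names, and a fixed (non-growing) delay between attempts is an assumption based on the option descriptions above.

```typescript
// Hypothetical helper showing how reconnectAttempts and reconnectDelay
// could bound a retry loop. Not part of the SDK's public API.
interface ReconnectPolicy {
  reconnectAttempts?: number; // max attempts (documented default: 5)
  reconnectDelay?: number;    // ms between attempts (documented default: 3000)
}

// Returns the delay in ms to wait before the given attempt (1-based),
// or null once the attempt budget is exhausted.
function nextReconnectDelay(attempt: number, policy: ReconnectPolicy = {}): number | null {
  const maxAttempts = policy.reconnectAttempts ?? 5;
  const delayMs = policy.reconnectDelay ?? 3000;
  return attempt >= 1 && attempt <= maxAttempts ? delayMs : null;
}

console.log(nextReconnectDelay(1)); // 3000
console.log(nextReconnectDelay(6)); // null
```

With the defaults, this reading gives up after five attempts spaced three seconds apart.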

AudioConfig

interface AudioConfig {
  sampleRate?: number;    // Audio sample rate (default: 16000)
  channels?: number;      // Number of audio channels (default: 1)
  bufferSize?: number;    // Audio buffer size (default: 4096)
}
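
To make the documented defaults concrete, here is a small sketch of how a partial audio configuration could be completed. `AudioConfigSketch`, `resolveAudioConfig`, and `bufferDurationMs` are hypothetical names, and the duration helper assumes `bufferSize` is measured in sample frames, which the section above does not state.

```typescript
// Mirrors the documented AudioConfig fields; the name is illustrative.
interface AudioConfigSketch {
  sampleRate?: number;
  channels?: number;
  bufferSize?: number;
}

// Fill in the documented defaults (16000 Hz, 1 channel, 4096).
function resolveAudioConfig(partial: AudioConfigSketch = {}): Required<AudioConfigSketch> {
  return {
    sampleRate: partial.sampleRate ?? 16000,
    channels: partial.channels ?? 1,
    bufferSize: partial.bufferSize ?? 4096,
  };
}

// Assumption: bufferSize counts sample frames, so one buffer spans
// bufferSize / sampleRate seconds.
function bufferDurationMs(config: Required<AudioConfigSketch>): number {
  return (config.bufferSize * 1000) / config.sampleRate;
}

console.log(bufferDurationMs(resolveAudioConfig())); // 256
```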

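Role List

172956 - 32 - Clever schoolkid (小学机灵鬼)
172946 - 32 - Girlish-voiced girlfriend (萝莉女友)
095622 - 16 - Mischievous girl (调皮女孩)
095706 - 8 - Cute little girl (可爱小女孩)
095747 - 8 - Mature woman (成熟女性)
095852 - 8 - Middle-aged auntie (大妈)
100056 - 8 - Cute female elf (可爱的女精灵)
100157 - 16 - Cute male elf (可爱男精灵)
100837 - 16 - Sleazy middle-aged man (猥琐大叔)
102130 - 16 - Little fox spirit 1 (小狐妖1)
102147 - 32 - Alluring woman (妩媚女人)
102210 - 32 - Coy "green tea" girl (小绿茶)
102555 - 8 - Fresh-faced handsome guy (清爽帅哥)
102640 - 8 - Magnetic-voiced heartthrob (磁性男神)
103059 - 16 - Cheeky handsome guy (贱贱的帅哥)
103200 - 16 - Evil arch-villain (邪恶大反派)
105217 - 32 - Sunny, cheerful young man (阳光开朗大男孩)
105233 - 16 - Slow-to-warm gentle guy (慢热暖男)
105323 - 16 - Kindly old grandpa (慈祥老公公)
105350 - 32 - Old eunuch (老太监)
170710 - 16 - A Fei (阿飞)
171105 - 32 - Xiao Dangjia (小当家)
171309 - 32 - Xie Jian Xian (邪剑仙)
172011 - 16 - Taiwanese tsundere girl (台湾傲娇妹)
172241 - 16 - Taiwanese sweet voice (台湾甜美)
172510 - 16 - Guangxi cousin (广西表妹)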

API Reference

TTSClient

Methods

`initialize(): Promise<void>`

Initializes the WebSocket connection and audio system. The client calls this automatically when needed, so calling it manually is optional.

`tts(content: string, role: string): Promise<void>`

Sends a TTS request. Automatically interrupts any currently playing audio.

await ttsClient.tts('Your text to speak', 'user');

`enableAudio(): Promise<void>`

Enables audio playback. Must be called from a user gesture (click, touch, etc.).

button.addEventListener('click', async () => {
  await ttsClient.enableAudio();
});

`disconnect(): void`

Disconnects the WebSocket connection.

`dispose(): Promise<void>`

Cleans up all resources including WebSocket and audio context.

`setTextCallback(callback: TextCallback): void`

Sets the callback for text messages (type 1000).

`setBSCallback(callback: BSCallback): void`

Sets the callback for BS messages (type 1001).

Status Methods

`getConnectionState(): ConnectionState`

Returns the current WebSocket connection state.

`getAudioState(): AudioState`

Returns the current audio playback state.

`isConnected(): boolean`

Returns true if WebSocket is connected.

`isPlaying(): boolean`

Returns true if audio is currently playing.

`getAudioStats(): Promise<AudioStats>`

Returns detailed audio statistics.

`isAudioEnabled(): boolean`

Returns true if audio is enabled and ready for playback.
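
The boolean status methods combine naturally into a single readiness check, for example to drive a UI indicator. The sketch below is one possible way to do that; `StatusSource`, `ClientStatus`, and `describeClientStatus` are hypothetical names, and the interface models only the three boolean methods listed above.

```typescript
// Structural view of the boolean status methods documented above.
interface StatusSource {
  isConnected(): boolean;
  isPlaying(): boolean;
  isAudioEnabled(): boolean;
}

type ClientStatus = 'needs-user-gesture' | 'disconnected' | 'speaking' | 'idle';

// Collapse the three flags into one label, checking the most blocking
// condition first.
function describeClientStatus(client: StatusSource): ClientStatus {
  if (!client.isAudioEnabled()) return 'needs-user-gesture';
  if (!client.isConnected()) return 'disconnected';
  return client.isPlaying() ? 'speaking' : 'idle';
}
```

Any object exposing those three methods, including a TTSClient instance, could be passed in.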

Message Protocol

Outgoing Messages (Client → Server)

TTS Request

{
  "type": "tts",
  "content": "Text to convert to speech"
}

Authentication

{
  "type": "auth",
  "token": "your-jwt-token"
}
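
For readers implementing a compatible server or a test double, the two outgoing payloads can be modeled as a small tagged union. This is a sketch that covers only the fields shown above; `OutgoingMessage` and `encodeOutgoing` are hypothetical names, not SDK exports.

```typescript
// The two client-to-server payload shapes documented above.
type OutgoingMessage =
  | { type: 'tts'; content: string }
  | { type: 'auth'; token: string };

// Serialize a payload to the JSON text that would be sent over the socket.
function encodeOutgoing(message: OutgoingMessage): string {
  return JSON.stringify(message);
}

console.log(encodeOutgoing({ type: 'tts', content: 'Hello' }));
// {"type":"tts","content":"Hello"}
```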

Incoming Messages (Server → Client)

Audio Data

Type: ArrayBuffer
Format: PCM, 1 channel, 16000 Hz, 16-bit signed integers
Usage: Automatically played through AudioWorklet
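
The Web Audio API works with 32-bit float samples, so 16-bit PCM has to be rescaled before playback. The SDK does this for you; the sketch below only shows what such a conversion typically looks like. `pcm16ToFloat32` is a hypothetical name, and little-endian byte order is an assumption, since the format line above does not specify endianness.

```typescript
// Convert 16-bit signed PCM (assumed little-endian) into Float32 samples
// in the range [-1, 1).
function pcm16ToFloat32(buffer: ArrayBuffer): Float32Array {
  const view = new DataView(buffer);
  const sampleCount = Math.floor(buffer.byteLength / 2); // 2 bytes per sample
  const samples = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}
```

At 16000 Hz mono, a chunk of N bytes holds N / 2 samples, or N / 32 milliseconds of audio.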

Text Messages

Format: String starting with "1000"
Example: "1000Your text message here"
Callback: onTextMessage(content)

BS (Business Service) Messages

Format: String starting with "1001"
Example: "1001Your BS data here"
Callback: onBSMessage(content)
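
The three incoming message kinds can be told apart by payload type and prefix. The sketch below shows one way to express that dispatch; `IncomingMessage` and `classifyIncoming` are hypothetical names, and treating everything after the four-character prefix as the content is an assumption drawn from the examples above.

```typescript
type IncomingMessage =
  | { kind: 'audio'; pcm: ArrayBuffer }
  | { kind: 'text'; content: string }
  | { kind: 'bs'; content: string }
  | { kind: 'unknown'; raw: string };

// Classify a raw WebSocket payload according to the protocol above.
function classifyIncoming(data: ArrayBuffer | string): IncomingMessage {
  if (typeof data !== 'string') return { kind: 'audio', pcm: data };
  if (data.startsWith('1000')) return { kind: 'text', content: data.slice(4) };
  if (data.startsWith('1001')) return { kind: 'bs', content: data.slice(4) };
  return { kind: 'unknown', raw: data };
}

console.log(classifyIncoming('1000Hello')); // { kind: 'text', content: 'Hello' }
```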

Audio Features

AudioWorklet Processing

Low Latency: Direct audio processing in dedicated thread
Smooth Playback: Advanced buffering with underrun protection
Beat Control: Adaptive playback rate for latency management
Interruption Support: Seamless audio interruption for new TTS requests

Streaming Audio Support

Continuous Playback: Handles rapid audio stream chunks without interruption
Smart Buffering: Automatically appends new audio data to existing stream
Buffer Management: Intelligent cleanup of played audio data
Stream Detection: Distinguishes between new TTS requests and streaming chunks
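
The buffering behavior described above (append incoming chunks, discard what has been played, clear on interruption) can be pictured as a simple sample queue. This is an illustrative sketch under those assumptions, not the SDK's AudioWorklet code; `PcmStreamQueue` is a hypothetical name.

```typescript
class PcmStreamQueue {
  private chunks: Float32Array[] = [];
  private offset = 0;   // read position inside chunks[0]
  private buffered = 0; // total unread samples

  // Append a newly received chunk to the end of the stream.
  append(chunk: Float32Array): void {
    if (chunk.length === 0) return;
    this.chunks.push(chunk);
    this.buffered += chunk.length;
  }

  // Fill `out` with buffered samples, padding with silence on underrun.
  // Returns the number of real (non-silence) samples written.
  read(out: Float32Array): number {
    let written = 0;
    while (written < out.length && this.chunks.length > 0) {
      const head = this.chunks[0];
      const count = Math.min(head.length - this.offset, out.length - written);
      out.set(head.subarray(this.offset, this.offset + count), written);
      written += count;
      this.offset += count;
      if (this.offset === head.length) { // chunk fully played: drop it
        this.chunks.shift();
        this.offset = 0;
      }
    }
    out.fill(0, written);
    this.buffered -= written;
    return written;
  }

  // Discard everything, e.g. when a new TTS request interrupts playback.
  clear(): void {
    this.chunks = [];
    this.offset = 0;
    this.buffered = 0;
  }

  get bufferedSamples(): number {
    return this.buffered;
  }
}
```

Padding with zeros when the queue runs dry is one simple form of underrun protection: the output stays silent instead of replaying stale data.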

Latency Management

Target Latency: 100ms default target
Max Latency: 300ms before rate adjustment
Adaptive Rate: Automatic playback speed adjustment (0.9x - 1.1x)
Smoothing: Gradual rate changes to avoid audio artifacts
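
The numbers above suggest a small control loop: pick a desired playback rate from the current buffered latency, then approach it gradually. The sketch below is one plausible reading, not the SDK's actual algorithm. The function names, the smoothing factor of 0.1, and the choice to slow down whenever latency falls below the target are all assumptions.

```typescript
const TARGET_LATENCY_MS = 100;
const MAX_LATENCY_MS = 300;
const MIN_RATE = 0.9;
const MAX_RATE = 1.1;

// Rate we would like to reach for a given buffered latency.
function desiredRate(latencyMs: number): number {
  if (latencyMs > MAX_LATENCY_MS) return MAX_RATE;    // too far behind: catch up
  if (latencyMs < TARGET_LATENCY_MS) return MIN_RATE; // buffer running low: stretch it
  return 1.0;                                         // within bounds: play normally
}

// Move a fraction of the way toward the desired rate on each update,
// so the listener never hears an abrupt speed change.
function smoothRate(currentRate: number, latencyMs: number, smoothing = 0.1): number {
  const next = currentRate + (desiredRate(latencyMs) - currentRate) * smoothing;
  return Math.min(MAX_RATE, Math.max(MIN_RATE, next));
}
```

Called once per audio callback, `smoothRate` converges on the desired rate over a few dozen updates rather than jumping there at once.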

Error Handling

const ttsClient = new TTSClient({
  // ... config
  onError: (error) => {
    switch (error.message) {
      case 'WebSocket is not connected':
        // Handle connection issues
        break;
      case 'Failed to initialize audio':
        // Handle audio system issues
        break;
      default:
        console.error('TTS Error:', error);
    }
  }
});

Important: User Gesture Requirement

⚠️ Modern browsers require user interaction before audio can be played. You must call enableAudio() from a user gesture (click, touch, keypress) before using TTS functionality.

// ✅ Correct - called from user event
button.addEventListener('click', async () => {
  await ttsClient.enableAudio();
  await ttsClient.tts('Now I can speak!', 'assistant');
});

// ❌ Wrong - called without user gesture
await ttsClient.tts('This will fail!', 'user'); // AudioContext error

Browser Compatibility

Chrome: 66+ (AudioWorklet support)
Firefox: 76+ (AudioWorklet support)
Safari: 14.1+ (AudioWorklet support)
Edge: 79+ (AudioWorklet support)
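
Rather than checking browser versions, an application can detect AudioWorklet support at runtime before creating the client. A minimal sketch follows; `AudioGlobals` and `supportsAudioWorklet` are hypothetical names, while `AudioContext` and `AudioWorkletNode` are the standard Web Audio globals.

```typescript
// The subset of browser globals this check looks at.
interface AudioGlobals {
  AudioContext?: unknown;
  AudioWorkletNode?: unknown;
}

// True when both Web Audio constructors are present on the given scope.
function supportsAudioWorklet(scope: AudioGlobals): boolean {
  return typeof scope.AudioContext === 'function'
    && typeof scope.AudioWorkletNode === 'function';
}

// In a browser, pass the global object (for example `window`).
console.log(supportsAudioWorklet({})); // false
```

Taking the scope as a parameter keeps the check easy to exercise outside a browser.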

Examples

Basic Usage

import TTSClient from 'qt-ai-gateway-sdk';

const client = new TTSClient({
  websocket: {
    url: 'wss://api.example.com/tts',
    jwtToken: 'eyJhbGciOiJIUzI1NiIs...'
  },
  onTextMessage: (content) => {
    console.log('Server message:', content);
  },
  onBSMessage: (content) => {
    console.log('Business service message:', content);
  }
});

// Direct usage - auto-initializes when needed
// Must be called from a user gesture for audio to work
button.addEventListener('click', async () => {
  await client.enableAudio(); // Enable audio first
  await client.tts('Hello World!', 'user'); // Auto-connects and initializes
});

Advanced Configuration

const client = new TTSClient({
  websocket: {
    url: 'wss://api.example.com/tts',
    jwtToken: 'your-token',
    reconnectAttempts: 10,
    reconnectDelay: 5000
  },
  audio: {
    sampleRate: 22050,
    channels: 1,
    bufferSize: 8192
  },
  onConnect: () => console.log('Connected!'),
  onDisconnect: () => console.log('Disconnected!'),
  onTextMessage: (msg) => console.log('Text:', msg),
  onBSMessage: (msg) => console.log('BS:', msg),
  onError: (err) => console.error('Error:', err)
});

Monitoring Audio Statistics

setInterval(async () => {
  const stats = await client.getAudioStats();
  console.log('Buffer size:', stats.bufferSize);
  console.log('Playback rate:', stats.playbackRate);
  console.log('Buffered duration:', stats.bufferedDuration);
}, 1000);

Development

Building

npm run build

Testing

npm test

Development Mode

npm run dev

License

MIT License - see LICENSE file for details.

Contributing

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Support

For issues and questions, please open an issue on GitHub or contact support by email.