qt-ai-gateway-sdk
v1.0.2
A WebSocket-based TTS client with real-time audio streaming and playback
TTS WebSocket Client
A high-performance JavaScript/TypeScript package for real-time Text-to-Speech (TTS) over WebSocket connections with advanced audio streaming capabilities.
Features
- 🚀 Real-time TTS: WebSocket-based communication for low-latency text-to-speech
- 🎵 Advanced Audio Playback: AudioWorklet-based PCM audio streaming with beat control
- 🔄 Auto-reconnection: Robust WebSocket connection management with automatic reconnection
- 📱 Cross-platform: Works in all modern browsers with Web Audio API support
- 🎛️ Latency Control: Adaptive playback rate adjustment for optimal audio quality
- 🔒 JWT Authentication: Secure WebSocket connections with JWT token support
- 📊 Real-time Statistics: Audio buffer and connection monitoring
- 🎯 TypeScript Support: Full TypeScript definitions included
Installation
npm install qt-ai-gateway-sdk
Quick Start
import TTSClient from 'qt-ai-gateway-sdk';
// Initialize the client
const ttsClient = new TTSClient({
  websocket: {
    url: 'wss://your-tts-server.com/ws',
    jwtToken: 'your-jwt-token'
  },
  onTextMessage: (content) => {
    console.log('Text message:', content);
  },
  onBSMessage: (content) => {
    console.log('BS message:', content);
  },
  onError: (error) => {
    console.error('TTS Error:', error);
  }
});
// Enable audio and send TTS (must be called from a user gesture: click, touch, etc.)
document.getElementById('speakBtn').addEventListener('click', async () => {
  await ttsClient.enableAudio(); // Enable audio first
  await ttsClient.tts('Hello, this is a test message!', {role:'role', speed:1.0}); // Auto-initializes if needed
  console.log('TTS request sent!');
});
Configuration
TTSClientConfig
interface TTSClientConfig {
  websocket: WebSocketConfig;
  audio?: AudioConfig;
  onTextMessage?: TextCallback;
  onBSMessage?: BSCallback;
  onError?: (error: Error) => void;
  onConnect?: () => void;
  onDisconnect?: () => void;
}
WebSocketConfig
interface WebSocketConfig {
  url: string; // WebSocket server URL
  jwtToken: string; // JWT authentication token
  reconnectAttempts?: number; // Max reconnection attempts (default: 5)
  reconnectDelay?: number; // Delay between reconnections in ms (default: 3000)
}
AudioConfig
interface AudioConfig {
  sampleRate?: number; // Audio sample rate (default: 16000)
  channels?: number; // Number of audio channels (default: 1)
  bufferSize?: number; // Audio buffer size (default: 4096)
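}
Role List
172956 - 32 - Clever Schoolkid
172946 - 32 - Loli Girlfriend
095622 - 16 - Mischievous Girl
095706 - 8 - Cute Little Girl
095747 - 8 - Mature Woman
095852 - 8 - Middle-aged Auntie
100056 - 8 - Cute Female Elf
100157 - 16 - Cute Male Elf
100837 - 16 - Sleazy Uncle
102130 - 16 - Little Fox Spirit 1
102147 - 32 - Alluring Woman
102210 - 32 - Little Green Tea
102555 - 8 - Fresh-faced Handsome Guy
102640 - 8 - Magnetic-voiced Heartthrob
103059 - 16 - Cheeky Handsome Guy
103200 - 16 - Evil Villain
105217 - 32 - Sunny, Cheerful Big Boy
105233 - 16 - Slow-to-warm Gentle Guy
105323 - 16 - Kindly Old Grandpa
105350 - 32 - Old Eunuch
170710 - 16 - A Fei
171105 - 32 - Little Master Chef
171309 - 32 - Evil Sword Immortal
172011 - 16 - Taiwanese Tsundere Girl
172241 - 16 - Taiwanese Sweet Voice
172510 - 16 - Guangxi Cousin
API Reference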
TTSClient
Methods
initialize(): Promise<void>
Initializes the WebSocket connection and audio system. Note: This is called automatically when needed, so manual calling is optional.
tts(content: string, role: string): Promise<void>
Sends a TTS request. Automatically interrupts any currently playing audio.
await ttsClient.tts('Your text to speak', 'user');
enableAudio(): Promise<void>
Enables audio playback. Must be called from a user gesture (click, touch, etc.).
button.addEventListener('click', async () => {
  await ttsClient.enableAudio();
});
disconnect(): void
Disconnects the WebSocket connection.
dispose(): Promise<void>
Cleans up all resources including WebSocket and audio context.
setTextCallback(callback: TextCallback): void
Sets the callback for text messages (type 1000).
setBSCallback(callback: BSCallback): void
Sets the callback for BS messages (type 1001).
Status Methods
getConnectionState(): ConnectionState
Returns the current WebSocket connection state.
getAudioState(): AudioState
Returns the current audio playback state.
isConnected(): boolean
Returns true if WebSocket is connected.
isPlaying(): boolean
Returns true if audio is currently playing.
getAudioStats(): Promise<AudioStats>
Returns detailed audio statistics.
isAudioEnabled(): boolean
Returns true if audio is enabled and ready for playback.
Message Protocol
Outgoing Messages (Client → Server)
TTS Request
{
  "type": "tts",
  "content": "Text to convert to speech"
}
Authentication
{
  "type": "auth",
  "token": "your-jwt-token"
}
Incoming Messages (Server → Client)
Audio Data
- Type: ArrayBuffer
- Format: PCM, 1 channel, 16000 Hz, 16-bit signed integers
- Usage: Automatically played through AudioWorklet
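The client performs this step internally, but as an illustration of the format above, here is a minimal sketch of turning a 16-bit PCM chunk into the floating-point samples the Web Audio API works with. The function names `pcm16ToFloat32` and `chunkDurationMs` are hypothetical and not part of this package, and little-endian byte order is an assumption, since the README does not state it.

```typescript
// Hypothetical helper (not part of qt-ai-gateway-sdk).
// Converts 16-bit signed PCM into Float32 samples in [-1, 1).
// Assumption: samples are little-endian.
function pcm16ToFloat32(buffer: ArrayBuffer): Float32Array {
  const view = new DataView(buffer);
  const sampleCount = Math.floor(buffer.byteLength / 2);
  const out = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    // 32768 = 2^15, so -32768 maps to -1 and 32767 to just under 1.
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}

// Duration of a mono chunk in milliseconds at the documented 16000 Hz rate.
function chunkDurationMs(buffer: ArrayBuffer, sampleRate = 16000): number {
  const sampleCount = Math.floor(buffer.byteLength / 2);
  return (sampleCount / sampleRate) * 1000;
}
```

At 16000 Hz mono with 2 bytes per sample, one second of audio is 32000 bytes, which is a handy figure when estimating buffered latency from chunk sizes.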
Text Messages
- Format: String starting with "1000"
- Example: "1000Your text message here"
- Callback: onTextMessage(content)
BS (Business Service) Messages
- Format: String starting with "1001"
- Example: "1001Your BS data here"
- Callback: onBSMessage(content)
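To make the message shapes in this section concrete, the following sketch builds the two outgoing JSON messages and classifies incoming frames by their type prefix. All names here (`IncomingMessage`, `classifyIncoming`, `buildTtsRequest`, `buildAuthMessage`) are hypothetical and do not come from the package. Stripping the 4-character prefix before handing `content` to a callback is an assumption, as is the handling of unrecognized strings.

```typescript
// Hypothetical types and helpers illustrating the documented protocol.
type IncomingMessage =
  | { kind: "audio"; pcm: ArrayBuffer }
  | { kind: "text"; content: string }
  | { kind: "bs"; content: string }
  | { kind: "unknown"; raw: string };

// Binary frames are audio; strings are routed by their 4-character prefix.
function classifyIncoming(data: ArrayBuffer | string): IncomingMessage {
  if (typeof data !== "string") {
    return { kind: "audio", pcm: data };
  }
  const prefix = data.slice(0, 4);
  if (prefix === "1000") {
    return { kind: "text", content: data.slice(4) }; // assumed: prefix stripped
  }
  if (prefix === "1001") {
    return { kind: "bs", content: data.slice(4) }; // assumed: prefix stripped
  }
  return { kind: "unknown", raw: data };
}

// Outgoing TTS request, matching the JSON shape shown above.
function buildTtsRequest(content: string): string {
  return JSON.stringify({ type: "tts", content });
}

// Outgoing authentication message, matching the JSON shape shown above.
function buildAuthMessage(token: string): string {
  return JSON.stringify({ type: "auth", token });
}
```

A discriminated union keeps the dispatch exhaustive: a `switch` on `kind` lets the compiler flag any message type that is not handled.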
Audio Features
AudioWorklet Processing
- Low Latency: Direct audio processing in dedicated thread
- Smooth Playback: Advanced buffering with underrun protection
- Beat Control: Adaptive playback rate for latency management
- Interruption Support: Seamless audio interruption for new TTS requests
Streaming Audio Support
- Continuous Playback: Handles rapid audio stream chunks without interruption
- Smart Buffering: Automatically appends new audio data to existing stream
- Buffer Management: Intelligent cleanup of played audio data
- Stream Detection: Distinguishes between new TTS requests and streaming chunks
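The buffering behavior listed above can be pictured with a small queue of sample chunks. This is only a sketch of the general technique under stated assumptions, not the package's internal implementation: the class name `PcmStreamBuffer` and its methods are hypothetical. New chunks are appended to the tail, the playback side reads from the head, fully played chunks are dropped, and `clear()` models a new TTS request interrupting the current stream.

```typescript
// Hypothetical chunk queue illustrating append, read, and cleanup.
class PcmStreamBuffer {
  private chunks: Float32Array[] = [];
  private offset = 0; // read position inside the first chunk

  // Streaming chunk: append to the existing stream.
  append(chunk: Float32Array): void {
    if (chunk.length > 0) {
      this.chunks.push(chunk);
    }
  }

  // New TTS request: discard whatever is still queued.
  clear(): void {
    this.chunks = [];
    this.offset = 0;
  }

  // Number of samples not yet played.
  get bufferedSamples(): number {
    const total = this.chunks.reduce((n, c) => n + c.length, 0);
    return total - this.offset;
  }

  // Fill `out` with queued samples; pad with silence on underrun.
  // Returns how many real samples were written.
  read(out: Float32Array): number {
    let written = 0;
    while (written < out.length && this.chunks.length > 0) {
      const head = this.chunks[0];
      const n = Math.min(head.length - this.offset, out.length - written);
      out.set(head.subarray(this.offset, this.offset + n), written);
      written += n;
      this.offset += n;
      if (this.offset === head.length) {
        this.chunks.shift(); // chunk fully played, release it
        this.offset = 0;
      }
    }
    for (let i = written; i < out.length; i++) {
      out[i] = 0; // underrun protection: output silence instead of stale data
    }
    return written;
  }
}
```

In an AudioWorklet, `read` would typically be called once per render quantum (128 frames), and `bufferedSamples` divided by the sample rate gives the buffered latency.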
Latency Management
- Target Latency: 100ms default target
- Max Latency: 300ms before rate adjustment
- Adaptive Rate: Automatic playback speed adjustment (0.9x - 1.1x)
- Smoothing: Gradual rate changes to avoid audio artifacts
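One way to read the numbers above is as a small controller that nudges the playback rate toward a desired value on every update. The sketch below uses the documented defaults (100 ms target, 300 ms maximum, 0.9x to 1.1x range); everything else is an assumption made for illustration, including the `nextPlaybackRate` name, the rule of slowing down when less than half the target is buffered, and the smoothing factor of 0.1.

```typescript
// Hypothetical controller; thresholds marked "assumed" are not from the README.
interface LatencyOptions {
  targetMs: number;  // documented default: 100
  maxMs: number;     // documented default: 300
  minRate: number;   // documented lower bound: 0.9
  maxRate: number;   // documented upper bound: 1.1
  smoothing: number; // assumed: fraction of the gap closed per update
}

const defaultLatencyOptions: LatencyOptions = {
  targetMs: 100,
  maxMs: 300,
  minRate: 0.9,
  maxRate: 1.1,
  smoothing: 0.1,
};

function nextPlaybackRate(
  currentRate: number,
  bufferedMs: number,
  opts: LatencyOptions = defaultLatencyOptions
): number {
  let desired = 1.0;
  if (bufferedMs > opts.maxMs) {
    desired = opts.maxRate; // too much buffered audio: play faster to catch up
  } else if (bufferedMs < opts.targetMs / 2) {
    desired = opts.minRate; // assumed rule: nearly empty buffer, play slower
  }
  // Move only part of the way toward the desired rate to avoid audible jumps.
  const next = currentRate + (desired - currentRate) * opts.smoothing;
  return Math.min(opts.maxRate, Math.max(opts.minRate, next));
}
```

Called once per statistics update, this converges gradually on the bound instead of switching rates abruptly, which is the point of the smoothing step.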
Error Handling
const ttsClient = new TTSClient({
  // ... config
  onError: (error) => {
    switch (error.message) {
      case 'WebSocket is not connected':
        // Handle connection issues
        break;
      case 'Failed to initialize audio':
        // Handle audio system issues
        break;
      default:
        console.error('TTS Error:', error);
    }
  }
});
Important: User Gesture Requirement
⚠️ Modern browsers require user interaction before audio can be played. You must call enableAudio() from a user gesture (click, touch, keypress) before using TTS functionality.
// ✅ Correct - called from user event
button.addEventListener('click', async () => {
  await ttsClient.enableAudio();
  await ttsClient.tts('Now I can speak!', 'assistant');
});
// ❌ Wrong - called without user gesture
await ttsClient.tts('This will fail!', 'user'); // AudioContext error
Browser Compatibility
- Chrome: 66+ (AudioWorklet support)
- Firefox: 76+ (AudioWorklet support)
- Safari: 14.1+ (AudioWorklet support)
- Edge: 79+ (AudioWorklet support)
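Rather than checking browser versions, an application can test for the required globals before creating a client. The helper below is a sketch with a hypothetical name, `supportsAudioWorklet`; it is not part of the package. It only checks that `AudioContext` and `AudioWorkletNode` exist on the given scope, which is an assumption about what the client needs at minimum.

```typescript
// Hypothetical feature check (not part of qt-ai-gateway-sdk).
// `scope` defaults to the global object; passing it explicitly makes testing easy.
function supportsAudioWorklet(
  scope: Record<string, unknown> = globalThis as unknown as Record<string, unknown>
): boolean {
  return (
    typeof scope["AudioContext"] === "function" &&
    typeof scope["AudioWorkletNode"] === "function"
  );
}
```

A page could call `supportsAudioWorklet()` on load and show a fallback message when it returns false, instead of letting the first TTS request fail.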
Examples
Basic Usage
import TTSClient from 'qt-ai-gateway-sdk';
const client = new TTSClient({
  websocket: {
    url: 'wss://api.example.com/tts',
    jwtToken: 'eyJhbGciOiJIUzI1NiIs...'
  },
  onTextMessage: (content) => {
    console.log('Server message:', content);
  },
  onBSMessage: (content) => {
    console.log('Business service message:', content);
  }
});
// Direct usage - auto-initializes when needed
// Must be called from user gesture for audio to work
button.addEventListener('click', async () => {
  await client.enableAudio(); // Enable audio first
  await client.tts('Hello World!', 'user'); // Auto-connects and initializes
});
Advanced Configuration
const client = new TTSClient({
  websocket: {
    url: 'wss://api.example.com/tts',
    jwtToken: 'your-token',
    reconnectAttempts: 10,
    reconnectDelay: 5000
  },
  audio: {
    sampleRate: 22050,
    channels: 1,
    bufferSize: 8192
  },
  onConnect: () => console.log('Connected!'),
  onDisconnect: () => console.log('Disconnected!'),
  onTextMessage: (msg) => console.log('Text:', msg),
  onBSMessage: (msg) => console.log('BS:', msg),
  onError: (err) => console.error('Error:', err)
});
Monitoring Audio Statistics
setInterval(async () => {
  const stats = await client.getAudioStats();
  console.log('Buffer size:', stats.bufferSize);
  console.log('Playback rate:', stats.playbackRate);
  console.log('Buffered duration:', stats.bufferedDuration);
}, 1000);
Development
Building
npm run build
Testing
npm test
Development Mode
npm run dev
License
MIT License - see LICENSE file for details.
Contributing
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Support
For issues and questions, please open an issue on GitHub or contact support at [email protected].
