@tamnt-work/tts-stream
v1.0.0
Framework-agnostic text-to-speech library with streaming support and multilingual capabilities
🎙️ TTS Stream
The ultimate AI-powered real-time text-to-speech library for streaming LLM responses with near-zero latency
Perfect for OpenAI, Claude, Gemini, and any streaming AI assistant
🎯 Client-Side Library: TTS Stream runs in the browser only. Speech synthesis requires Web Audio API. Your server streams text responses, and the browser speaks them in real-time.
Quick Start • Examples • API Reference • Demo
🚀 Why Choose TTS Stream for AI Applications?
⚡ Zero-Latency AI Speech
Transform your AI assistants with instant voice feedback. No more waiting for complete responses - users hear speech as your AI generates text, creating natural conversational experiences.
🎯 Built for Modern AI Workflows
- ✅ OpenAI GPT-4 streaming responses
- ✅ Claude real-time conversations
- ✅ Gemini voice interactions
- ✅ Custom LLM streaming outputs
- ✅ Chatbot voice integration
✨ Core Features
- 🤖 AI-First Design: Specifically optimized for streaming LLM responses
- 📡 Real-time Streaming: Speak text as it arrives - perfect for AI chat applications
- 🌍 Multilingual AI: Auto voice selection for 30+ languages with AI context switching
- ⚡ Smart Queueing: Prevents overlapping speech with intelligent buffer management
- 🧹 Auto Cleanup: Automatic cleanup on page unload/reload
- 🛡️ Error Handling: Robust error handling with AI response fallback strategies
- ⚛️ Framework Ready: Ready-to-use hooks/composables for React, Vue, Svelte
- 🎛️ AI-Tuned: Optimal speech settings for conversational AI (rate, pitch, timing)
- 📦 Zero Dependencies: Lightweight and fast - no external API calls needed
- 🔄 Event-driven: Subscribe to speech events for better UX integration
📦 Installation
Choose your preferred package manager:
npm install @tamnt-work/tts-stream
yarn add @tamnt-work/tts-stream
pnpm add @tamnt-work/tts-stream
bun add @tamnt-work/tts-stream
🚀 Quick Start
🤖 Client-Side AI Streaming (Real Architecture)
// 🎯 CLIENT-SIDE ONLY - Browser speech synthesis
import { TextToSpeechStream } from '@tamnt-work/tts-stream';
// Initialize TTS in browser
const tts = new TextToSpeechStream({
defaultLanguage: 'en-US',
rate: 1.1, // Optimal for conversation
pitch: 1.0,
volume: 1.0,
bufferTimeout: 800 // Reduced for faster AI response
});
// ✨ Fetch streaming AI response from your API
async function streamAIResponse(userMessage: string) {
try {
// Call your backend API that proxies to OpenAI/Claude
const response = await fetch('/api/ai/chat', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
message: userMessage,
stream: true
})
});
if (!response.body) throw new Error('No response body');
const reader = response.body.getReader();
const decoder = new TextDecoder();
    // 🔥 Read and speak each chunk immediately!
    let finished = false;
    while (!finished) {
      const { done, value } = await reader.read();
      if (done) break;
      // { stream: true } keeps multi-byte characters intact across chunks
      const chunk = decoder.decode(value, { stream: true });
      // Parse SSE or JSON chunks from your API
      const lines = chunk.split('\n');
      for (const line of lines) {
        if (!line.startsWith('data: ')) continue;
        const data = line.slice(6);
        if (data === '[DONE]') {
          finished = true; // a plain `break` here would only exit the inner loop
          break;
        }
        try {
          const parsed = JSON.parse(data);
          if (parsed.content) {
            // 🎤 Speak immediately as text arrives
            tts.speakStream(parsed.content, "en-US");
          }
        } catch (e) {
          // Handle text chunks directly
          if (data.trim()) {
            tts.speakStream(data, "en-US");
          }
        }
      }
    }
} catch (error) {
console.error('Streaming error:', error);
tts.speak("Sorry, there was an error with the AI response.", "en-US");
}
}
// Usage - runs in browser only
streamAIResponse("Tell me about space exploration");
// User immediately hears: "Space exploration is fascinating..."
🌐 Universal Client Pattern
// Works with any streaming AI API endpoint
async function streamFromAnyAI(endpoint: string, prompt: string) {
const tts = new TextToSpeechStream({
defaultLanguage: 'en-US',
rate: 1.1,
bufferTimeout: 500
});
const response = await fetch(endpoint, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
prompt,
stream: true,
// Add your API specific parameters
})
});
const reader = response.body?.getReader();
if (!reader) return;
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Reuse one decoder with { stream: true } so multi-byte
    // characters split across chunks are decoded correctly
    const text = decoder.decode(value, { stream: true });
// 🚀 Speak any text chunk immediately
if (text.trim()) {
tts.speakStream(text, "en-US");
}
}
}
// Examples of different API endpoints
streamFromAnyAI('/api/openai/stream', 'What is machine learning?');
streamFromAnyAI('/api/claude/stream', 'Explain quantum computing');
streamFromAnyAI('/api/gemini/stream', 'Tell me about space');
📱 Basic Usage
import { TextToSpeechStream } from '@tamnt-work/tts-stream';
const tts = new TextToSpeechStream({
defaultLanguage: 'en-US',
rate: 1.2,
pitch: 1.0,
volume: 1.0
});
// Basic speech
await tts.speak("Hello world!", "en-US");
// Streaming speech (perfect for any realtime text)
tts.speakStream("Hello ", "en-US");
tts.speakStream("from ", "en-US");
tts.speakStream("streaming!", "en-US");
// Event listeners for AI integration
tts.on('start', () => console.log('🎤 AI is speaking'));
tts.on('end', () => console.log('✅ AI finished speaking'));
tts.on('error', (error) => console.error('❌ Speech error:', error));
React Hook
import React from 'react';
import { useTextToSpeech } from '@tamnt-work/tts-stream/react';
function MyComponent() {
const { isSpeaking, speak, speakStream, stop, voices } = useTextToSpeech({
defaultLanguage: 'en-US',
rate: 1.1
});
const handleSpeak = async () => {
await speak("Hello from React! 🚀", "en-US");
};
const handleStream = () => {
speakStream("Streaming ", "en-US");
speakStream("text ", "en-US");
speakStream("works great! ⚡", "en-US");
};
return (
<div className="flex gap-4 p-6">
<button
onClick={handleSpeak}
className="px-4 py-2 bg-blue-500 text-white rounded"
>
Speak
</button>
<button
onClick={handleStream}
className="px-4 py-2 bg-green-500 text-white rounded"
>
Stream
</button>
<button
onClick={stop}
disabled={!isSpeaking}
className="px-4 py-2 bg-red-500 text-white rounded disabled:opacity-50"
>
Stop
</button>
<div className="flex items-center gap-2">
<span className={`w-2 h-2 rounded-full ${isSpeaking ? 'bg-green-400 animate-pulse' : 'bg-gray-400'}`} />
<p>{isSpeaking ? 'Speaking...' : 'Ready'}</p>
<p className="text-sm text-gray-500">({voices.length} voices available)</p>
</div>
</div>
);
}
Vue Composable
<template>
<div class="flex gap-4 p-6">
<button @click="handleSpeak" class="px-4 py-2 bg-blue-500 text-white rounded">
Speak
</button>
<button @click="handleStream" class="px-4 py-2 bg-green-500 text-white rounded">
Stream
</button>
<button @click="stop" :disabled="!isSpeaking" class="px-4 py-2 bg-red-500 text-white rounded disabled:opacity-50">
Stop
</button>
<div class="flex items-center gap-2">
<span :class="`w-2 h-2 rounded-full ${isSpeaking ? 'bg-green-400 animate-pulse' : 'bg-gray-400'}`" />
<p>{{ isSpeaking ? 'Speaking...' : 'Ready' }}</p>
<p class="text-sm text-gray-500">({{ voices.length }} voices available)</p>
</div>
</div>
</template>
<script setup>
import { useTextToSpeech } from '@tamnt-work/tts-stream/vue';
const { isSpeaking, speak, speakStream, stop, voices } = useTextToSpeech({
defaultLanguage: 'en-US',
rate: 1.1
});
const handleSpeak = async () => {
await speak("Hello from Vue! 🌟", "en-US");
};
const handleStream = () => {
speakStream("Streaming ", "en-US");
speakStream("text ", "en-US");
speakStream("in Vue! 🔥", "en-US");
};
</script>
Svelte Store
<script>
import { createTextToSpeechStore } from '@tamnt-work/tts-stream/svelte';
import { onDestroy } from 'svelte';
const ttsStore = createTextToSpeechStore({
defaultLanguage: 'en-US',
rate: 1.1
});
const handleSpeak = async () => {
await ttsStore.speak("Hello from Svelte! ⚡", "en-US");
};
const handleStream = () => {
ttsStore.speakStream("Streaming ", "en-US");
ttsStore.speakStream("text ", "en-US");
ttsStore.speakStream("in Svelte! 🎯", "en-US");
};
// Cleanup on destroy
onDestroy(() => {
ttsStore.destroy();
});
</script>
<div class="flex gap-4 p-6">
<button on:click={handleSpeak} class="px-4 py-2 bg-blue-500 text-white rounded">
Speak
</button>
<button on:click={handleStream} class="px-4 py-2 bg-green-500 text-white rounded">
Stream
</button>
<button
on:click={() => ttsStore.stop()}
disabled={!$ttsStore.isSpeaking}
class="px-4 py-2 bg-red-500 text-white rounded disabled:opacity-50"
>
Stop
</button>
<div class="flex items-center gap-2">
<span class="w-2 h-2 rounded-full {$ttsStore.isSpeaking ? 'bg-green-400 animate-pulse' : 'bg-gray-400'}" />
<p>{$ttsStore.isSpeaking ? 'Speaking...' : 'Ready'}</p>
<p class="text-sm text-gray-500">({$ttsStore.voices.length} voices available)</p>
</div>
</div>
💡 AI Integration Examples
🚀 Complete Client-Side AI Chatbot
// 🎯 CLIENT-SIDE ONLY - All speech happens in browser
import { TextToSpeechStream } from '@tamnt-work/tts-stream';
class VoiceAIChatbot {
private tts: TextToSpeechStream;
private apiEndpoint: string;
constructor(apiEndpoint: string = '/api/ai/chat') {
this.apiEndpoint = apiEndpoint;
// Optimized settings for AI conversation
this.tts = new TextToSpeechStream({
defaultLanguage: 'en-US',
rate: 1.15, // Natural conversation speed
pitch: 1.0,
volume: 1.0,
bufferTimeout: 600, // Quick response time
maxBufferLength: 50 // Shorter buffers for immediate speech
});
// Set up AI speech events
this.tts.on('start', () => this.onAISpeechStart());
this.tts.on('end', () => this.onAISpeechEnd());
}
async chat(userMessage: string): Promise<void> {
console.log('🎤 User:', userMessage);
try {
// Call your backend API (which handles OpenAI/Claude/etc)
const response = await fetch(this.apiEndpoint, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
message: userMessage,
stream: true,
model: "gpt-4" // Your backend decides which AI to use
})
});
if (!response.body) throw new Error('No response stream');
const reader = response.body.getReader();
const decoder = new TextDecoder();
// 🔥 Real-time speech generation in browser
while (true) {
const { done, value } = await reader.read();
if (done) break;
        const chunk = decoder.decode(value, { stream: true }); // keep multi-byte chars intact
// Handle different streaming formats
if (chunk.includes('data: ')) {
const lines = chunk.split('\n');
for (const line of lines) {
if (line.startsWith('data: ')) {
const data = line.slice(6).trim();
if (data && data !== '[DONE]') {
try {
const parsed = JSON.parse(data);
if (parsed.content || parsed.text || parsed.delta) {
const text = parsed.content || parsed.text || parsed.delta;
this.tts.speakStream(text, "en-US");
}
} catch (e) {
// Handle plain text chunks
if (data.trim()) {
this.tts.speakStream(data, "en-US");
}
}
}
}
}
} else if (chunk.trim()) {
// Direct text streaming
this.tts.speakStream(chunk, "en-US");
}
}
} catch (error) {
console.error('Chat error:', error);
this.tts.speak("Sorry, I encountered an error. Please try again.", "en-US");
}
}
private onAISpeechStart() {
console.log('🤖 AI started speaking...');
// Show speaking indicator in UI
document.querySelector('.ai-speaking')?.classList.add('active');
}
private onAISpeechEnd() {
console.log('✅ AI finished speaking');
// Hide speaking indicator, enable user input
document.querySelector('.ai-speaking')?.classList.remove('active');
document.querySelector('input')?.removeAttribute('disabled');
}
stopSpeaking() {
this.tts.stop();
}
destroy() {
this.tts.destroy();
}
}
// Usage in browser
const chatbot = new VoiceAIChatbot('/api/openai/stream');
chatbot.chat("What's the weather like today?");
// User immediately hears AI response as it streams from your API!
🌐 Multi-Model Client-Side Assistant
// 🎯 CLIENT-SIDE ONLY - Speech synthesis in browser
import { TextToSpeechStream } from '@tamnt-work/tts-stream';
class MultiAIVoiceAssistant {
private tts: TextToSpeechStream;
constructor() {
this.tts = new TextToSpeechStream({
defaultLanguage: 'en-US',
rate: 1.1,
bufferTimeout: 500 // Ultra-fast response
});
}
// OpenAI GPT-4 with voice (via your API)
async askGPT(question: string) {
const response = await fetch('/api/openai/stream', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
message: question,
model: 'gpt-4',
stream: true
})
});
await this.processStreamResponse(response);
}
// Claude with voice (via your API)
async askClaude(question: string) {
const response = await fetch('/api/claude/stream', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
message: question,
model: 'claude-3-sonnet-20240229',
stream: true
})
});
await this.processStreamResponse(response);
}
// Gemini with voice (via your API)
async askGemini(question: string) {
const response = await fetch('/api/gemini/stream', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
message: question,
model: 'gemini-pro',
stream: true
})
});
await this.processStreamResponse(response);
}
// Universal streaming processor
private async processStreamResponse(response: Response) {
if (!response.body) return;
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
      const chunk = decoder.decode(value, { stream: true }); // keep multi-byte chars intact
// Handle SSE format
if (chunk.includes('data: ')) {
const lines = chunk.split('\n');
for (const line of lines) {
if (line.startsWith('data: ')) {
const data = line.slice(6);
if (data.trim() && data !== '[DONE]') {
try {
const parsed = JSON.parse(data);
const text = parsed.content || parsed.text || parsed.delta || parsed.message;
if (text) {
this.tts.speakStream(text, "en-US");
}
} catch (e) {
// Handle plain text
if (data.trim()) {
this.tts.speakStream(data, "en-US");
}
}
}
}
}
} else {
// Direct text streaming
if (chunk.trim()) {
this.tts.speakStream(chunk, "en-US");
}
}
}
}
stop() {
this.tts.stop();
}
destroy() {
this.tts.destroy();
}
}
// Usage in browser
const assistant = new MultiAIVoiceAssistant();
assistant.askGPT("Explain machine learning");
assistant.askClaude("What is quantum computing?");
assistant.askGemini("Tell me about space exploration");
🎯 React AI Chat Component
import React, { useState, useCallback } from 'react';
import { useTextToSpeech } from '@tamnt-work/tts-stream/react';
function AIChatInterface() {
  const [messages, setMessages] = useState<Array<{ role: 'user' | 'ai'; content: string }>>([]);
const [isAISpeaking, setIsAISpeaking] = useState(false);
const [userInput, setUserInput] = useState('');
const { tts, isSpeaking } = useTextToSpeech({
defaultLanguage: 'en-US',
rate: 1.1,
bufferTimeout: 500
});
const streamAIResponse = useCallback(async (userMessage: string) => {
setMessages(prev => [...prev, { role: 'user', content: userMessage }]);
setIsAISpeaking(true);
try {
const response = await fetch('/api/ai/stream', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ message: userMessage })
});
const reader = response.body?.getReader();
if (!reader) return;
      let aiResponse = '';
      const decoder = new TextDecoder();
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        // Reuse one decoder with { stream: true } for correct multi-byte decoding
        const chunk = decoder.decode(value, { stream: true });
aiResponse += chunk;
// 🎯 Speak immediately as text arrives
tts?.speakStream(chunk, "en-US");
}
setMessages(prev => [...prev, { role: 'ai', content: aiResponse }]);
} finally {
setIsAISpeaking(false);
}
}, [tts]);
return (
<div className="ai-chat-interface">
<div className="messages">
{messages.map((msg, idx) => (
<div key={idx} className={`message ${msg.role}`}>
{msg.content}
</div>
))}
</div>
<div className="input-area">
<input
value={userInput}
onChange={(e) => setUserInput(e.target.value)}
placeholder="Ask AI anything..."
disabled={isAISpeaking}
/>
<button
onClick={() => {
streamAIResponse(userInput);
setUserInput('');
}}
disabled={isAISpeaking}
>
{isAISpeaking ? 'AI Speaking...' : 'Send'}
</button>
<button onClick={() => tts?.stop()}>Stop Speech</button>
</div>
{isSpeaking && (
<div className="speaking-indicator">
🎤 AI is speaking...
</div>
)}
</div>
);
}
🏗️ Client-Server Architecture
┌─────────────────┐ HTTP Request ┌──────────────────┐ API Call
│ │ ──────────────────► │ │ ──────────────► OpenAI/Claude
│ Browser │ │ Your Server │ Gemini/etc
│ (TTS Stream) │ ◄────────────────── │ (API Proxy) │ ◄──────────────
│ │ Streaming Text │ │ Streaming Text
└─────────────────┘ └──────────────────┘
🎤 Speech synthesis happens ONLY in browser
📡 Your server just proxies/streams AI responses
🔐 API keys stay secure on your server
💡 Any Streaming AI Integration
// 🎯 CLIENT-SIDE ONLY - Universal pattern for any AI
async function integrateAnyStreamingAI(apiEndpoint: string, prompt: string) {
const tts = new TextToSpeechStream({
defaultLanguage: 'en-US',
rate: 1.0,
bufferTimeout: 400
});
try {
// Your server handles the AI API calls
const response = await fetch(apiEndpoint, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
prompt: prompt,
stream: true,
// Add any AI-specific parameters
})
});
if (!response.body) return;
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
      const chunk = decoder.decode(value, { stream: true }); // keep multi-byte chars intact
// 🚀 Speak immediately - works with any streaming format
if (chunk.trim()) {
// Handle JSON streaming
if (chunk.startsWith('{')) {
try {
const data = JSON.parse(chunk);
const text = data.response || data.content || data.text || data.message;
if (text) tts.speakStream(text, "en-US");
} catch (e) {
tts.speakStream(chunk, "en-US");
}
} else {
// Handle plain text streaming
tts.speakStream(chunk, "en-US");
}
}
}
} catch (error) {
console.error('AI streaming error:', error);
tts.speak("Sorry, there was an error connecting to the AI.", "en-US");
}
}
// Examples - all run in browser, call your server APIs
integrateAnyStreamingAI('/api/openai/stream', 'What is AI?');
integrateAnyStreamingAI('/api/claude/stream', 'Explain quantum physics');
integrateAnyStreamingAI('/api/ollama/stream', 'Tell me a story'); // Local LLM
integrateAnyStreamingAI('/api/custom-llm/stream', 'Help me code'); // Your custom AI
🔧 Server-Side Example (Node.js)
// Example server endpoint that your client calls
// This runs on your server and proxies to AI APIs
app.post('/api/openai/stream', async (req, res) => {
try {
const { message } = req.body;
// Server calls OpenAI (API key stays secure)
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const stream = await openai.chat.completions.create({
model: "gpt-4",
messages: [{ role: "user", content: message }],
stream: true,
});
res.writeHead(200, {
'Content-Type': 'text/plain',
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
});
// Stream AI response back to browser
for await (const chunk of stream) {
if (chunk.choices[0]?.delta?.content) {
res.write(chunk.choices[0].delta.content);
}
}
res.end();
} catch (error) {
res.status(500).json({ error: 'AI request failed' });
}
});
// Browser receives text chunks and speaks them immediately!
Multi-language Support
const tts = new TextToSpeechStream();
// Mix different languages seamlessly
const multilingualGreeting = [
{ text: "Hello! ", language: "en-US" },
{ text: "Bonjour! ", language: "fr-FR" },
{ text: "¡Hola! ", language: "es-ES" },
{ text: "こんにちは!", language: "ja-JP" },
{ text: "Xin chào!", language: "vi-VN" }
];
multilingualGreeting.forEach(({ text, language }) => {
tts.speakStream(text, language);
});
Custom Voice Selection
const tts = new TextToSpeechStream();
// Wait for voices to load
tts.on('voicesloaded', (voices) => {
console.log('📢 Available voices:', voices.length);
// Find specific voices
const femaleVoices = voices.filter(v => v.name.toLowerCase().includes('female'));
const maleVoices = voices.filter(v => v.name.toLowerCase().includes('male'));
console.log('👩 Female voices:', femaleVoices.length);
console.log('👨 Male voices:', maleVoices.length);
// Group by language
const voicesByLang = voices.reduce((acc, voice) => {
const lang = voice.lang.split('-')[0];
if (!acc[lang]) acc[lang] = [];
acc[lang].push(voice);
return acc;
}, {});
console.log('🌍 Languages available:', Object.keys(voicesByLang));
});
Advanced Error Handling
const tts = new TextToSpeechStream({
defaultLanguage: 'en-US',
rate: 1.0
});
tts.on('error', (error) => {
console.error('🚨 Speech error:', error.error);
switch (error.error) {
case 'voice-unavailable':
console.log('🔄 Voice not available, trying fallback...');
tts.speak(error.utterance?.text || '', 'en-US');
break;
case 'synthesis-unavailable':
console.log('❌ Speech synthesis not supported in this browser');
// Show visual fallback (text display, etc.)
break;
case 'synthesis-failed':
console.log('🔄 Synthesis failed, retrying...');
setTimeout(() => {
tts.speak(error.utterance?.text || '', 'en-US');
}, 1000);
break;
default:
console.log('❓ Unknown speech error:', error.error);
}
});
// Graceful degradation
if (!window.speechSynthesis) {
console.log('⚠️ Speech synthesis not supported, using visual fallback');
// Implement visual text display or other fallback
}
📋 API Reference
TextToSpeechStream Class
Constructor Options
interface TextToSpeechOptions {
defaultLanguage?: string; // Default: 'vi-VN'
rate?: number; // Speech rate (0.1-10), Default: 1.1
pitch?: number; // Speech pitch (0-2), Default: 1.0
volume?: number; // Speech volume (0-1), Default: 1.0
bufferTimeout?: number; // Stream buffer timeout (ms), Default: 1000
maxBufferLength?: number; // Max buffer length, Default: 100
}
Methods
| Method | Description | Returns |
|--------|-------------|---------|
| speak(text, language?) | Speak text immediately | Promise<void> |
| speakStream(text, language?) | Add text to streaming buffer | void |
| stop() | Stop current speech and clear queue | void |
| pause() | Pause current speech | void |
| resume() | Resume paused speech | void |
| getVoices() | Get available voices | SpeechSynthesisVoice[] |
| isCurrentlySpeaking() | Check if currently speaking | boolean |
| isPaused() | Check if speech is paused | boolean |
| on(event, listener) | Add event listener | void |
| off(event, listener) | Remove event listener | void |
| destroy() | Cleanup and remove all listeners | void |
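One detail worth noting about `on`/`off`: like most emitters, removal presumably works by function identity, so keep a named reference to any handler you plan to remove. This sketch uses a plain `Set` as a stand-in for the emitter's listener list, purely to illustrate the pitfall (not the library's internals):

```typescript
// Removal by function identity: a Set stands in for the emitter's
// internal listener list here.
const listeners = new Set<() => void>();

const onEnd = () => { /* hide the speaking indicator, etc. */ };
listeners.add(onEnd);
listeners.delete(onEnd);      // removed: same reference

listeners.add(() => {});      // anonymous handler…
listeners.delete(() => {});   // …NOT removed: a new, different function
```

With the real API that means pairing `tts.on('end', onEnd)` with `tts.off('end', onEnd)`, never `tts.off('end', () => {})`.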
Events
| Event | Payload | Description |
|-------|---------|-------------|
| start | void | Speech started |
| end | void | Speech ended |
| pause | void | Speech paused |
| resume | void | Speech resumed |
| error | SpeechSynthesisErrorEvent | Speech error occurred |
| voicesloaded | SpeechSynthesisVoice[] | Voices loaded/changed |
| boundary | SpeechSynthesisEvent | Word/sentence boundary |
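The `boundary` payload carries a `charIndex` into the utterance text (a standard `SpeechSynthesisEvent` field). A small helper, sketched here and not part of the library, can turn that index into the word currently being spoken, e.g. for live captions:

```typescript
// Boundary events fire at word starts, so slicing at charIndex and
// taking the leading non-whitespace run yields the spoken word.
function wordAt(text: string, charIndex: number): string {
  const match = text.slice(charIndex).match(/^\S+/);
  return match ? match[0] : '';
}

// Hypothetical usage (highlight() is your own UI code):
// tts.on('boundary', (e) => highlight(wordAt(fullText, e.charIndex)));
```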
React Hook
interface UseTextToSpeechReturn {
isSpeaking: boolean;
isPaused: boolean;
voices: SpeechSynthesisVoice[];
speak: (text: string, language?: string) => Promise<void>;
speakStream: (text: string, language?: string) => void;
stop: () => void;
pause: () => void;
resume: () => void;
tts: TextToSpeechStream | null;
}
const useTextToSpeech = (options?: TextToSpeechOptions): UseTextToSpeechReturn
Vue Composable
interface UseTextToSpeechReturn {
isSpeaking: Ref<boolean>;
isPaused: Ref<boolean>;
voices: Ref<SpeechSynthesisVoice[]>;
speak: (text: string, language?: string) => Promise<void>;
speakStream: (text: string, language?: string) => void;
stop: () => void;
pause: () => void;
resume: () => void;
tts: Ref<TextToSpeechStream | null>;
}
const useTextToSpeech = (options?: TextToSpeechOptions): UseTextToSpeechReturn
Svelte Store
interface TextToSpeechStore {
isSpeaking: boolean;
isPaused: boolean;
voices: SpeechSynthesisVoice[];
tts: TextToSpeechStream | null;
}
interface TextToSpeechStoreReturn {
subscribe: (fn: (value: TextToSpeechStore) => void) => () => void;
speak: (text: string, language?: string) => Promise<void>;
speakStream: (text: string, language?: string) => void;
stop: () => void;
pause: () => void;
resume: () => void;
destroy: () => void;
}
const createTextToSpeechStore = (options?: TextToSpeechOptions): TextToSpeechStoreReturn
🌍 Supported Languages
The library automatically selects appropriate voices for 30+ languages:
| Language | Code | Language | Code |
|----------|------|----------|------|
| English (US) | en-US | Japanese | ja-JP |
| English (UK) | en-GB | Korean | ko-KR |
| Spanish | es-ES | Chinese (Simplified) | zh-CN |
| French | fr-FR | Chinese (Traditional) | zh-TW |
| German | de-DE | Vietnamese | vi-VN |
| Italian | it-IT | Thai | th-TH |
| Portuguese | pt-PT | Russian | ru-RU |
| Dutch | nl-NL | Arabic | ar-SA |
| Polish | pl-PL | Hindi | hi-IN |
| Swedish | sv-SE | Indonesian | id-ID |
And many more! The exact voices available depend on your operating system and browser.
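Automatic selection can be sketched like this (a plausible strategy, not necessarily the library's exact logic): prefer an exact BCP 47 match, fall back to the base language, then to the platform default voice.

```typescript
// Minimal shape shared with SpeechSynthesisVoice, so the helper is
// testable outside the browser.
interface VoiceLike { lang: string; name: string; default?: boolean }

// Exact tag ('en-US') → base language ('en') → platform default.
function pickVoice(voices: VoiceLike[], lang: string): VoiceLike | undefined {
  const base = lang.split('-')[0];
  return (
    voices.find(v => v.lang === lang) ??
    voices.find(v => v.lang.split('-')[0] === base) ??
    voices.find(v => v.default)
  );
}
```

In the browser you would feed it `speechSynthesis.getVoices()`.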
🌐 Browser Support
Uses the Web Speech API, supported in:
| Browser | Support | Notes |
|---------|---------|-------|
| ✅ Chrome | 33+ | Full support |
| ✅ Safari | 7+ | Full support |
| ✅ Edge | 14+ | Full support |
| ⚠️ Firefox | Limited | Basic support |
| ❌ IE | None | Not supported |
Note: On mobile devices, speech synthesis may require user interaction to start.
🎯 AI-Specific Benefits
⚡ Instant User Feedback
- Traditional TTS: Wait for the complete AI response (often several seconds), then speak
- TTS Stream: Speech begins within a fraction of a second, as soon as the first chunks arrive
- Result: A dramatically faster perceived response time
🧠 Enhanced AI Conversations
- Natural Flow: Speech starts immediately, feels like talking to a human
- Reduced Latency: No awkward pauses waiting for complete responses
- Better UX: Users can interrupt or respond while AI is still thinking
- Engagement: Voice keeps users engaged during long AI responses
💰 Cost & Performance Benefits
- Bandwidth Efficient: Stream text immediately, no waiting for full response
- Memory Friendly: Process text chunks instead of storing full responses
- Battery Optimized: Continuous small operations vs large batch processing
- Scalable: Works with any streaming AI service without modifications
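One caveat behind the chunk-at-a-time points above: fetch chunks are arbitrary slices of the stream, so an SSE `data:` line can arrive split across two chunks. A small line buffer (a hypothetical helper, not part of the library) makes the parsing loops in the earlier examples robust:

```typescript
// Accumulate chunks, emit only complete `data:` lines, and hold the
// unfinished tail until the next chunk arrives.
function createSSELineBuffer(): (chunk: string) => string[] {
  let pending = '';
  return (chunk) => {
    pending += chunk;
    const lines = pending.split('\n');
    pending = lines.pop() ?? ''; // unfinished tail stays buffered
    return lines
      .filter(l => l.startsWith('data: '))
      .map(l => l.slice(6));
  };
}
```

Inside a read loop: `for (const data of push(chunk)) tts.speakStream(data, "en-US");`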
🔧 Perfect For These AI Use Cases
1. 🤖 Conversational AI Assistants
OpenAI GPT, Claude, Gemini voice interfaces with zero-delay response.
2. 📞 AI Customer Support
Real-time voice responses for support bots and virtual agents.
3. 🎓 AI Tutors & Education
Interactive learning with immediate voice feedback from AI teachers.
4. 🏥 Healthcare AI
Medical assistant AI with natural voice interactions for patient care.
5. 🏠 Smart Home Voice
IoT devices with streaming AI responses for home automation.
6. 🎮 Gaming AI NPCs
Game characters with real-time voice generation from AI dialogue systems.
7. 📰 AI News Readers
Streaming news summaries with immediate voice narration.
8. 🌐 Multilingual AI
AI translators with instant voice output in multiple languages.
🎯 Best Practices
Performance Tips
// ✅ Good: Reuse TTS instance
const tts = new TextToSpeechStream();
// ❌ Avoid: Creating new instances repeatedly
function speakText(text) {
const tts = new TextToSpeechStream(); // Don't do this
tts.speak(text);
}
// ✅ Good: Clean up when done
useEffect(() => {
return () => {
tts.destroy();
};
}, []);
Streaming Best Practices
// ✅ Good: Small chunks for natural flow
tts.speakStream("Hello ");
tts.speakStream("world ");
tts.speakStream("today!");
// ❌ Avoid: Very small fragments
tts.speakStream("H");
tts.speakStream("e");
tts.speakStream("l");
// ✅ Good: Handle language switches gracefully
const sentences = [
{ text: "Hello there! ", lang: "en-US" },
{ text: "Comment allez-vous? ", lang: "fr-FR" },
{ text: "¿Cómo está usted?", lang: "es-ES" }
];
sentences.forEach(({text, lang}) => {
tts.speakStream(text, lang);
});
🤝 Contributing
We welcome contributions! Please see our Contributing Guide for details.
Development Setup
# Clone the repository
git clone https://github.com/tamnt-work/tts-stream.git
cd tts-stream
# Install dependencies (choose your preferred manager)
npm install
# or
yarn install
# or
pnpm install
# or
bun install
# Build the project
npm run build
# Run tests
npm test
📄 License
MIT License - feel free to use this project commercially.
🙏 Acknowledgments
- Built on the Web Speech API
- Inspired by the need for better AI voice interfaces
- Thanks to all contributors and users!
Made with ❤️ by tamnt-work
⭐ Star this project if you find it useful!
