tz-voice-chat

v1.0.2, published 17 days ago

React hook for a full voice-chat pipeline: VAD → Whisper STT → custom AI processor → Kokoro/OpenAI/WebSpeech TTS.

Vulnerabilities: 0 high, 0 medium, 0 low

Maintainer: yuriyo

Keywords: react hook voice chat speech stt tts whisper kokoro vad ai

tz-voice-chat

A self-contained React hook and UI component that wire together a full voice-chat pipeline:

Microphone → VAD → Whisper STT → your AI function → Kokoro / OpenAI / WebSpeech TTS

You provide the AI logic. The hook handles everything else.

Install

npm install tz-voice-chat

Peer dependencies (install separately):

npm install react @huggingface/transformers @ricky0123/vad-web
# Optional — only for ttsProvider: 'kokoro' (browser-side synthesis):
npm install kokoro-js

Quick start

import { TZVoiceChat, AIAssistantButton } from 'tz-voice-chat';
import 'tz-voice-chat/style.css';

function MyApp() {
  const {
    status,          // 'idle' | 'listening' | 'transcribing' | 'thinking' | 'speaking'
    loading,         // true while Whisper + TTS models are loading
    progress,        // { whisper: 0–1, kokoro: 0–1 }
    error,
    transcript,      // [{ role, content, time }]
    streamingMessage,// partial assistant response, or null
    micEnabled,
    startListening,
    stopListening,
    speak,           // speak(text) — directly synthesise any text
    liveAnalyser,    // AnalyserNode for waveform UI, or null
    devices,         // MediaDeviceInfo[] — audio inputs
    selectedMicId,
    setSelectedMicId,
  } = TZVoiceChat({
    // ── Required ─────────────────────────────────────────────────────────
    // Async generator that receives the user utterance, full conversation
    // history, and an AbortSignal. Yield response tokens/chunks.
    processInput: async function* (text, history, signal) {
      const res = await fetch('/api/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ messages: history }),
        signal,
      });
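      // Fail early on HTTP errors instead of trying to parse an error page as a stream.
      if (!res.ok || !res.body) throw new Error(`Chat request failed: ${res.status}`);
      const reader = res.body.getReader();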
      const dec = new TextDecoder();
      let buf = '';
      while (true) {
        const { value, done } = await reader.read();
        if (done) break;
        buf += dec.decode(value, { stream: true });
        const lines = buf.split('\n');
        buf = lines.pop();
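        for (const line of lines) {
          // Strip the SSE "data:" prefix and any trailing \r from CRLF streams.
          const data = line.replace(/^data:\s*/, '').trim();
          if (!data) continue;            // blank separator between events
          if (data === '[DONE]') return;
          try {
            const token = JSON.parse(data).choices?.[0]?.delta?.content;
            if (token) yield token;
          } catch (_) {
            // Not JSON (e.g. a keep-alive line): skip it.
          }
        }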
      }
    },

    // ── Optional ─────────────────────────────────────────────────────────
    ttsProvider: 'server-kokoro',   // 'server-kokoro' | 'kokoro' | 'openai' | 'webspeech'
    ttsVoice: 'af_bella',
    ttsServerUrl: '/api/tts',       // POST endpoint for server-side TTS
    whisperModel: 'onnx-community/whisper-tiny.en',
    whisperDtype: 'q8',
    whisperDevice: 'cpu',           // 'cpu' | 'webgpu'
    modelProxyUrl: '/api/models/',  // proxies HuggingFace model downloads
  });

  return (
    <div>
      <p>Status: {status}</p>
      <button onClick={micEnabled ? stopListening : startListening} disabled={loading}>
        {micEnabled ? 'Stop' : 'Start'}
      </button>
      <button onClick={() => speak('Hello! How can I help you?')}>Say hello</button>
      {transcript.map((msg, i) => (
        <p key={i}><b>{msg.role}</b>: {msg.content}</p>
      ))}
    </div>
  );
}
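
The line-splitting logic inside processInput above can be factored into a small pure function, which makes it testable without a network. The sketch below is one way to do that, assuming the /api/chat endpoint streams OpenAI-style `data: {...}` lines as in the quick start. `extractTokens` is a hypothetical helper name, not part of tz-voice-chat.

```javascript
// Hypothetical helper (not part of tz-voice-chat): pulls tokens out of
// buffered SSE text and returns whatever partial line is left over.
function extractTokens(buffer) {
  const lines = buffer.split('\n');
  const rest = lines.pop();          // last piece may be an incomplete line
  const tokens = [];
  let done = false;
  for (const line of lines) {
    const data = line.replace(/^data:\s*/, '').trim();
    if (!data) continue;             // blank separator between events
    if (data === '[DONE]') { done = true; break; }
    try {
      const token = JSON.parse(data).choices?.[0]?.delta?.content;
      if (token) tokens.push(token);
    } catch (_) {
      // Not JSON (e.g. a comment or keep-alive line): skip it.
    }
  }
  return { tokens, rest, done };
}
```

Inside the read loop, the generator would call `extractTokens(buf)`, assign `rest` back to `buf`, yield each entry of `tokens`, and return once `done` is true.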

API

`TZVoiceChat(options)`

Required option

| Option | Type | Description |
|---|---|---|
| processInput | async function*(text, history, signal) | Your AI processor. Called after each user utterance. history is the full [{role,content}] array. Yield response tokens. |
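
For trying the hook without a backend, processInput can be any async generator that accepts the same three parameters. The following is a minimal sketch of a canned-reply stub. `echoProcessor` and its artificial delay are illustrative assumptions, and because it is not stated here whether history already includes the current utterance, the stub reads only `text`.

```javascript
// Hypothetical stand-in for a real AI backend: yields a canned reply
// word by word and stops early if the AbortSignal has fired.
async function* echoProcessor(text, history, signal) {
  const reply = `You said: ${text}`;
  for (const word of reply.split(' ')) {
    if (signal?.aborted) return;                              // pipeline was cancelled
    await new Promise((resolve) => setTimeout(resolve, 10));  // simulate latency
    yield word + ' ';
  }
}
```

It can be passed directly as `processInput: echoProcessor` while wiring up the UI.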

Optional options

| Option | Default | Description |
|---|---|---|
| systemPrompt | friendly voice assistant prompt | System message prepended to all conversations |
| ttsProvider | 'server-kokoro' | 'server-kokoro' \| 'kokoro' \| 'openai' \| 'webspeech' |
| ttsVoice | 'af_bella' | Voice name for Kokoro / OpenAI TTS |
| ttsServerUrl | '/api/tts' | POST endpoint for server-side TTS (receives {text, voice}, returns audio) |
| ttsSpeechRate | 1 | WebSpeech rate |
| ttsSpeechPitch | 1 | WebSpeech pitch |
| whisperModel | 'onnx-community/whisper-tiny.en' | HuggingFace model ID |
| whisperDtype | 'q8' | 'q4' \| 'q8' \| 'fp32' |
| whisperDevice | 'cpu' | 'cpu' \| 'webgpu' |
| kokoroDtype | 'q8' | Browser Kokoro dtype (when ttsProvider: 'kokoro') |
| kokoroBrowserDevice | 'wasm' | 'wasm' \| 'webgpu' |
| modelProxyUrl | '/api/models/' | Base URL for model downloads (proxies HuggingFace) |
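
The two device options can be chosen at runtime rather than hard-coded. As a sketch, the helper below selects 'webgpu' only when the browser exposes `navigator.gpu` and otherwise falls back to the documented defaults. `pickDeviceOptions` is a hypothetical name, and the presence of `navigator.gpu` alone does not guarantee that a usable adapter exists.

```javascript
// Hypothetical helper: maps a navigator-like object to device options.
// The object is a parameter so the function can be tested outside a browser.
function pickDeviceOptions(nav) {
  const hasWebGpu = Boolean(nav && 'gpu' in nav && nav.gpu);
  return {
    whisperDevice: hasWebGpu ? 'webgpu' : 'cpu',
    kokoroBrowserDevice: hasWebGpu ? 'webgpu' : 'wasm',
  };
}
```

In a component this would be spread into the options, for example `TZVoiceChat({ processInput, ...pickDeviceOptions(navigator) })`.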

Return value

| Field | Type | Description |
|---|---|---|
| status | string | 'idle' \| 'listening' \| 'transcribing' \| 'thinking' \| 'speaking' |
| loading | boolean | true while Whisper + TTS are loading |
| progress | {whisper:number, kokoro:number} | Per-model load progress (0–1) |
| error | string\|null | Last error message |
| transcript | {role,content,time}[] | Completed conversation turns |
| streamingMessage | string\|null | Partial assistant response being streamed |
| micEnabled | boolean | Whether microphone is open |
| startListening() | () => Promise<void> | Open microphone and start pipeline |
| stopListening() | () => void | Close microphone and cancel in-progress pipeline |
| speak(text) | (text:string) => void | Synthesise and play text immediately |
| liveAnalyser | AnalyserNode\|null | Real-time frequency data for waveform UI |
| devices | MediaDeviceInfo[] | Available audio input devices |
| selectedMicId | string | Selected device ID ('' = system default) |
| setSelectedMicId | (id:string) => void | Select an audio input device |
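
Since liveAnalyser is a standard Web Audio AnalyserNode, its time-domain samples can drive a simple level meter. A minimal sketch follows. `rmsLevel` is a hypothetical helper, and its input is the byte array filled by `getByteTimeDomainData` (values 0 to 255, with 128 representing silence).

```javascript
// Hypothetical helper: root-mean-square level of byte time-domain samples,
// normalised to roughly 0 (silence) .. 1 (full scale).
function rmsLevel(bytes) {
  if (bytes.length === 0) return 0;
  let sumSquares = 0;
  for (const b of bytes) {
    const centred = (b - 128) / 128;   // map 0..255 to about -1..1
    sumSquares += centred * centred;
  }
  return Math.sqrt(sumSquares / bytes.length);
}

// In a component, with liveAnalyser from the hook (browser only):
//   const buf = new Uint8Array(liveAnalyser.fftSize);
//   liveAnalyser.getByteTimeDomainData(buf);
//   const level = rmsLevel(buf);
```

Because liveAnalyser may be null, the calling code should check for it before sampling.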

Server requirements

For ttsProvider: 'server-kokoro' or 'openai':

POST /api/tts — accepts { text: string, voice?: string }, returns audio (audio/wav or audio/mpeg)
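
The contract above is small enough to sketch as a framework-agnostic handler. The example assumes a runtime with the Fetch API `Request` and `Response` globals (Node 18+ and most edge runtimes). `handleTts` and the injected `synthesize` function are hypothetical; `synthesize` stands in for whichever TTS engine the server actually runs.

```javascript
// Hypothetical handler for POST /api/tts. `synthesize(text, voice)` is an
// assumed function that resolves to WAV bytes (Uint8Array).
async function handleTts(request, synthesize) {
  if (request.method !== 'POST') {
    return new Response('Method not allowed', { status: 405 });
  }
  let body;
  try {
    body = await request.json();
  } catch (_) {
    return new Response('Invalid JSON', { status: 400 });
  }
  // `voice` is optional; fall back to the hook's documented default.
  const { text, voice = 'af_bella' } = body ?? {};
  if (typeof text !== 'string' || text.trim() === '') {
    return new Response('Missing "text"', { status: 400 });
  }
  const audio = await synthesize(text, voice);
  return new Response(audio, {
    status: 200,
    headers: { 'Content-Type': 'audio/wav' },
  });
}
```

Keeping the synthesiser as a parameter lets the same handler serve either a Kokoro or an OpenAI backend, and makes it testable with a stub.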

For model downloads (Whisper STT, browser Kokoro TTS):

GET /api/models/* — proxies HuggingFace Hub downloads (or set modelProxyUrl to a direct URL)
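
One way to implement the proxy is to strip the local prefix and forward the remainder to the Hub. The mapping below is a sketch that assumes the hook requests files under modelProxyUrl using the same path layout the Hub uses (for example `<model-id>/resolve/main/<file>`). That layout and the `upstreamUrlFor` name are assumptions, so it is worth confirming the actual request paths in the browser's network tab.

```javascript
// Hypothetical helper: translate a proxied request path into a
// HuggingFace Hub URL, or return null if the path is not acceptable.
function upstreamUrlFor(pathname, prefix = '/api/models/') {
  if (!pathname.startsWith(prefix)) return null;
  const rest = pathname.slice(prefix.length);
  // Refuse empty paths and anything that tries to climb out of the prefix.
  if (rest === '' || rest.split('/').includes('..')) return null;
  return 'https://huggingface.co/' + rest;
}
```

A server route for GET /api/models/* would fetch the returned URL and stream the response back, answering 404 when the helper returns null.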

The included example app (src/App.jsx) demonstrates a complete Express server setup for all of the above.

Infrastructure diagram

TZVoiceChat hook
├── VAD             (@ricky0123/vad-web, IIFE — loaded via <script> in index.html)
├── Whisper Worker  (Web Worker — @huggingface/transformers, automatic-speech-recognition)
├── TTS Queue       (server-kokoro | kokoro | openai | webspeech)
└── processInput    ← YOU provide this (async generator)

AIAssistantButton component
└── 5-state animated button with waveform visualiser and mic selector
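
The diagram does not say how streamed tokens reach the TTS queue. A common approach, shown here purely as an illustration and not as the package's actual implementation, is to buffer tokens until a sentence boundary and hand each complete sentence to the synthesiser. `sentences` is a hypothetical helper.

```javascript
// Illustrative only: groups a stream of tokens into sentences, so each
// sentence can be synthesised while later tokens are still arriving.
async function* sentences(tokens) {
  let buf = '';
  for await (const token of tokens) {
    buf += token;
    // Emit every complete sentence currently in the buffer. A sentence ends
    // at '.', '!' or '?' followed by whitespace.
    let match;
    while ((match = buf.match(/^[\s\S]*?[.!?](?=\s)/))) {
      yield match[0].trim();
      buf = buf.slice(match[0].length);
    }
  }
  if (buf.trim()) yield buf.trim();   // flush whatever is left at the end
}
```

The boundary rule is deliberately naive: abbreviations such as "e.g." would be split, so a production queue would likely need a more careful segmenter.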
