tz-voice-chat
v1.0.2
React hook for a full voice-chat pipeline: VAD → Whisper STT → custom AI processor → Kokoro/OpenAI/WebSpeech TTS.
A self-contained React hook and UI component that wire together a full voice-chat pipeline:
Microphone → VAD → Whisper STT → your AI function → Kokoro / OpenAI / WebSpeech TTS
You provide the AI logic. The hook handles everything else.
Install
npm install tz-voice-chat
Peer dependencies (install separately):
npm install react @huggingface/transformers @ricky0123/vad-web
# Optional — only for ttsProvider: 'kokoro' (browser-side synthesis):
npm install kokoro-js
Quick start
import { TZVoiceChat, AIAssistantButton } from 'tz-voice-chat';
import 'tz-voice-chat/style.css';
function MyApp() {
  const {
    status,           // 'idle' | 'listening' | 'transcribing' | 'thinking' | 'speaking'
    loading,          // true while Whisper + TTS models are loading
    progress,         // { whisper: 0–1, kokoro: 0–1 }
    error,
    transcript,       // [{ role, content, time }]
    streamingMessage, // partial assistant response, or null
    micEnabled,
    startListening,
    stopListening,
    speak,            // speak(text): directly synthesise any text
    liveAnalyser,     // AnalyserNode for waveform UI, or null
    devices,          // MediaDeviceInfo[] of audio inputs
    selectedMicId,
    setSelectedMicId,
  } = TZVoiceChat({
    // ── Required ─────────────────────────────────────────────────────────
    // Async generator that receives the user utterance, full conversation
    // history, and an AbortSignal. Yield response tokens/chunks.
    processInput: async function* (text, history, signal) {
      const res = await fetch('/api/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ messages: history }),
        signal,
      });
      const reader = res.body.getReader();
      const dec = new TextDecoder();
      let buf = '';
      while (true) {
        const { value, done } = await reader.read();
        if (done) break;
        buf += dec.decode(value, { stream: true });
        const lines = buf.split('\n');
        buf = lines.pop();
        for (const line of lines) {
          const data = line.replace(/^data:\s*/, '');
          if (data === '[DONE]') return;
          try {
            const token = JSON.parse(data).choices?.[0]?.delta?.content;
            if (token) yield token;
          } catch (_) {}
        }
      }
    },
    // ── Optional ─────────────────────────────────────────────────────────
    ttsProvider: 'server-kokoro',   // 'server-kokoro' | 'kokoro' | 'openai' | 'webspeech'
    ttsVoice: 'af_bella',
    ttsServerUrl: '/api/tts',       // POST endpoint for server-side TTS
    whisperModel: 'onnx-community/whisper-tiny.en',
    whisperDtype: 'q8',
    whisperDevice: 'cpu',           // 'cpu' | 'webgpu'
    modelProxyUrl: '/api/models/',  // proxies HuggingFace model downloads
  });
  return (
    <div>
      <p>Status: {status}</p>
      <button onClick={micEnabled ? stopListening : startListening} disabled={loading}>
        {micEnabled ? 'Stop' : 'Start'}
      </button>
      <button onClick={() => speak('Hello! How can I help you?')}>Say hello</button>
      {transcript.map((msg, i) => (
        <p key={i}><b>{msg.role}</b>: {msg.content}</p>
      ))}
    </div>
  );
}
API
TZVoiceChat(options)
Required option
| Option | Type | Description |
|---|---|---|
| processInput | async function*(text, history, signal) | Your AI processor. Called after each user utterance. history is the full [{role,content}] array. Yield response tokens. |
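A stub processor can stand in for a real backend while the UI is being wired up. The sketch below assumes only the documented `(text, history, signal)` signature; the name `echoProcessor` and the word-by-word chunking are illustrative and not part of the package.

```javascript
// Hypothetical stand-in for a real AI backend: echoes the utterance back
// one word at a time, matching the documented processInput signature.
async function* echoProcessor(text, history, signal) {
  const words = `You said: ${text}`.split(' ');
  for (let i = 0; i < words.length; i++) {
    // Stop yielding as soon as the caller's AbortSignal fires.
    if (signal?.aborted) return;
    // Re-attach the space that split() removed, except before the first word.
    yield i === 0 ? words[i] : ' ' + words[i];
  }
}
```

It can then be passed directly, for example `TZVoiceChat({ processInput: echoProcessor })`.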
Optional options
| Option | Default | Description |
|---|---|---|
| systemPrompt | friendly voice assistant prompt | System message prepended to all conversations |
| ttsProvider | 'server-kokoro' | 'server-kokoro' \| 'kokoro' \| 'openai' \| 'webspeech' |
| ttsVoice | 'af_bella' | Voice name for Kokoro / OpenAI TTS |
| ttsServerUrl | '/api/tts' | POST endpoint for server-side TTS (receives {text, voice}, returns audio) |
| ttsSpeechRate | 1 | WebSpeech rate |
| ttsSpeechPitch | 1 | WebSpeech pitch |
| whisperModel | 'onnx-community/whisper-tiny.en' | HuggingFace model ID |
| whisperDtype | 'q8' | 'q4' \| 'q8' \| 'fp32' |
| whisperDevice | 'cpu' | 'cpu' \| 'webgpu' |
| kokoroDtype | 'q8' | Browser Kokoro dtype (when ttsProvider: 'kokoro') |
| kokoroBrowserDevice | 'wasm' | 'wasm' \| 'webgpu' |
| modelProxyUrl | '/api/models/' | Base URL for model downloads (proxies HuggingFace) |
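As one way to combine these options, the sketch below switches TTS to browser-side Kokoro and picks device values from the documented sets depending on WebGPU availability. The helper name `inBrowserTtsOptions` is hypothetical, and whether WebGPU is the better choice on a given machine is an assumption to verify. Model downloads still go through `modelProxyUrl`.

```javascript
// Hypothetical helper: build an options object using only documented
// option names and values, preferring WebGPU when it is available.
function inBrowserTtsOptions(hasWebGPU) {
  return {
    ttsProvider: 'kokoro', // browser-side synthesis; requires kokoro-js
    whisperDevice: hasWebGPU ? 'webgpu' : 'cpu',
    kokoroBrowserDevice: hasWebGPU ? 'webgpu' : 'wasm',
  };
}

// navigator.gpu is only present in browsers that expose WebGPU.
const hasWebGPU = typeof navigator !== 'undefined' && 'gpu' in navigator;
const options = inBrowserTtsOptions(hasWebGPU);
```

The result can be spread into the hook call, for example `TZVoiceChat({ processInput, ...options })`.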
Return value
| Field | Type | Description |
|---|---|---|
| status | string | 'idle' \| 'listening' \| 'transcribing' \| 'thinking' \| 'speaking' |
| loading | boolean | true while Whisper + TTS are loading |
| progress | {whisper:number, kokoro:number} | Per-model load progress (0–1) |
| error | string\|null | Last error message |
| transcript | {role,content,time}[] | Completed conversation turns |
| streamingMessage | string\|null | Partial assistant response being streamed |
| micEnabled | boolean | Whether microphone is open |
| startListening() | () => Promise<void> | Open microphone and start pipeline |
| stopListening() | () => void | Close microphone and cancel in-progress pipeline |
| speak(text) | (text:string) => void | Synthesise and play text immediately |
| liveAnalyser | AnalyserNode\|null | Real-time frequency data for waveform UI |
| devices | MediaDeviceInfo[] | Available audio input devices |
| selectedMicId | string | Selected device ID ('' = system default) |
| setSelectedMicId | (id:string) => void | Select an audio input device |
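For a custom waveform, `liveAnalyser` can be read with the standard Web Audio `AnalyserNode` API. The following is a minimal sketch, assuming the node is a regular `AnalyserNode`; the helper name `readLevels` and the bar-averaging scheme are illustrative, not something the package provides.

```javascript
// Hypothetical helper: reduce an AnalyserNode's frequency bins to
// `barCount` averaged levels in the range 0..1, ready to draw as bars.
function readLevels(analyser, barCount) {
  const bins = new Uint8Array(analyser.frequencyBinCount);
  analyser.getByteFrequencyData(bins); // fills bins with values 0..255
  const perBar = Math.floor(bins.length / barCount) || 1;
  const levels = [];
  for (let b = 0; b < barCount; b++) {
    let sum = 0;
    let count = 0;
    for (let i = b * perBar; i < (b + 1) * perBar && i < bins.length; i++) {
      sum += bins[i];
      count++;
    }
    // Bars that fall past the last bin report silence.
    levels.push(count ? sum / count / 255 : 0);
  }
  return levels;
}
```

A typical use is to call `readLevels(liveAnalyser, 16)` inside a `requestAnimationFrame` loop while `liveAnalyser` is non-null.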
Server requirements
For ttsProvider: 'server-kokoro' or 'openai':
POST /api/tts accepts { text: string, voice?: string } and returns audio (audio/wav or audio/mpeg)
For model downloads (Whisper STT, browser Kokoro TTS):
GET /api/models/* proxies HuggingFace Hub downloads (or set modelProxyUrl to a direct URL)
The included example app (src/App.jsx) demonstrates a complete Express server setup for all of the above.
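To illustrate the model proxy requirement, here is a rough sketch of a GET handler that forwards requests to the HuggingFace Hub. It is written against Node's built-in `http` request/response objects rather than Express so that it needs no dependencies. It assumes the hook requests paths that mirror the Hub's URL layout (`<repo>/resolve/<revision>/<file>`) under the proxy prefix. The names `upstreamUrl` and `handleModelRequest` are hypothetical.

```javascript
const UPSTREAM = 'https://huggingface.co/';
const PREFIX = '/api/models/';

// Map an incoming request path to the upstream Hub URL, or return null
// when the path is outside the proxy prefix. (Assumed path layout.)
function upstreamUrl(requestPath) {
  if (!requestPath.startsWith(PREFIX)) return null;
  const rest = requestPath.slice(PREFIX.length);
  if (rest.includes('..')) return null; // refuse path traversal
  return UPSTREAM + rest;
}

// Request handler compatible with Node's http.createServer.
async function handleModelRequest(req, res) {
  const target = req.method === 'GET' ? upstreamUrl(req.url) : null;
  if (!target) {
    res.writeHead(404).end();
    return;
  }
  try {
    const upstream = await fetch(target); // global fetch, Node 18+
    res.writeHead(upstream.status, {
      'Content-Type': upstream.headers.get('content-type') ?? 'application/octet-stream',
    });
    // Buffers the whole file in memory; acceptable for a sketch only.
    res.end(Buffer.from(await upstream.arrayBuffer()));
  } catch (err) {
    res.writeHead(502).end();
  }
}

// Usage: require('node:http').createServer(handleModelRequest).listen(3001);
```

Because model files can be large, a production proxy would stream the upstream body and add caching instead of buffering each response.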
Infrastructure diagram
TZVoiceChat hook
├── VAD (@ricky0123/vad-web, IIFE — loaded via <script> in index.html)
├── Whisper Worker (Web Worker: @huggingface/transformers, automatic-speech-recognition)
├── TTS Queue (server-kokoro | kokoro | openai | webspeech)
└── processInput ← YOU provide this (async generator)
AIAssistantButton component
└── 5-state animated button with waveform visualiser and mic selector