voice-flow-x
v0.2.0
An environment-agnostic voice stream processing framework
🎙️ Environment-agnostic voice stream processing — a small, testable API for consuming audio chunks → async ASR text streams, merging segments, debouncing deltas, and matching commands.
- 🌊 Streaming-first: `Voice` consumes incremental recognition as `AsyncIterable` / `ReadableStream`, handles segment switches, and merges candidates.
- 🎯 UI-friendly: dedupes snapshots, throttles over a short window, and keeps a recent-output window to tame ASR flicker.
- 🧩 Pluggable commands: match on `delta`/`final` with strings, predicates, or regex; supports `stop`, `clear`, and a custom `handler`.
- 📘 TypeScript-native: generic `Chunk<T>` and `VoiceOptions<T>` for any audio payload type.
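The streaming-first and UI-friendly behaviors above add up to a small text state machine. The sketch below is a hypothetical `TextState` class, not voice-flow-x's implementation. It assumes each update carries a full snapshot for a segment `id`, commits the previous segment's last snapshot to a prefix when the `id` changes, and suppresses any merged output already emitted within a short recent window (the flicker case).

```typescript
// Hypothetical sketch: NOT voice-flow-x's internals, just one way to model
// segment merging plus a recent-output dedupe window.
class TextState {
  private prefix = '' // committed text from earlier segments
  private current = '' // latest snapshot of the active segment
  private segmentId: string | number | undefined = undefined
  private recent: string[] = [] // last few emitted outputs
  private readonly separator: string
  private readonly windowSize: number

  constructor(separator = '', windowSize = 3) {
    this.separator = separator
    this.windowSize = windowSize
  }

  // Joins two pieces, inserting the separator only when both are non-empty.
  private join(a: string, b: string): string {
    return a && b ? a + this.separator + b : a + b
  }

  // Takes a full snapshot for segment `id`; returns the merged text, or null
  // when it matches something emitted recently (ASR flicker).
  update(id: string | number, snapshot: string): string | null {
    if (this.segmentId !== undefined && id !== this.segmentId) {
      // Segment switch: commit the last snapshot of the previous segment.
      this.prefix = this.join(this.prefix, this.current)
    }
    this.segmentId = id
    this.current = snapshot
    const merged = this.join(this.prefix, this.current)
    if (this.recent.includes(merged)) return null
    this.recent.push(merged)
    if (this.recent.length > this.windowSize) this.recent.shift()
    return merged
  }

  // Resets prefix, current text, and the dedupe window.
  clear(): void {
    this.prefix = ''
    this.current = ''
    this.segmentId = undefined
    this.recent = []
  }
}
```

A separator of `' '` suits space-delimited languages; the empty default suits languages such as Chinese that do not put spaces between words.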
📌 Note: This library is only a stream + text state machine. It does not ship recording, WebSocket, or a specific ASR SDK; wire `stream` to your own service.
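Many ASR SDKs report partial results through callbacks rather than async iterators. A minimal adapter can buffer those events and expose them as the `AsyncIterable<string>` that `stream` may return. This sketch assumes a hypothetical client with `onPartial`/`onEnd` registration methods; real SDKs name and shape these differently.

```typescript
// Hypothetical event-based ASR client shape; real SDKs differ.
interface CallbackAsr {
  onPartial(cb: (text: string) => void): void
  onEnd(cb: () => void): void
}

// Buffers callback events and replays them as an async iterable of snapshots.
function fromCallbacks(client: CallbackAsr): AsyncIterable<string> {
  const buffered: string[] = []
  let ended = false
  let wake: (() => void) | null = null // resolves the consumer's pending wait

  client.onPartial((text) => {
    buffered.push(text)
    wake?.()
  })
  client.onEnd(() => {
    ended = true
    wake?.()
  })

  return {
    async *[Symbol.asyncIterator]() {
      while (true) {
        if (buffered.length > 0) {
          yield buffered.shift()!
        } else if (ended) {
          return
        } else {
          // Nothing buffered yet: sleep until the next event arrives.
          await new Promise<void>((resolve) => {
            wake = () => resolve()
          })
          wake = null
        }
      }
    },
  }
}
```

Your `stream` callback would start the SDK request for the chunk and return `fromCallbacks(client)`.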
📦 Install
```sh
pnpm add voice-flow-x
# or
npm install voice-flow-x
```

🚀 Usage
Minimal example
Use `createVoice` (or `new Voice`) and provide `stream`: for each `Chunk`, return the async text stream for that audio segment.
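For tests, or before a real ASR service is wired up, `stream` can return a fake async text stream. The hypothetical `fakeAsrStream` below treats `data` as already-recognized text and yields growing snapshots word by word. This mimics how many streaming recognizers refine a line.

```typescript
// Hypothetical chunk shape mirroring the `{ data, id }` objects passed to feed().
interface TextChunk {
  data: string
  id: string | number
}

// Test double for `stream`: yields cumulative snapshots of `data`.
async function* fakeAsrStream({ data }: TextChunk): AsyncGenerator<string> {
  const words = data.split(' ')
  for (let i = 1; i <= words.length; i++) {
    // Simulate latency between partial results.
    await new Promise<void>((resolve) => setTimeout(() => resolve(), 10))
    yield words.slice(0, i).join(' ')
  }
}
```

Assuming your chunks carry text in `data`, `stream: async (chunk) => fakeAsrStream(chunk)` lets you exercise `onDelta`, `onFinal`, and commands without a microphone.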
```ts
import { createVoice } from 'voice-flow-x'

const voice = createVoice({
  stream: async ({ data, id }) => {
    // Your ASR: return AsyncIterable<string> or ReadableStream<string>
    // (`yourAsrStream` is a placeholder for your own client)
    return yourAsrStream(data, id)
  },
  onDelta: (text) => {
    // Debounced full snapshot (good for the current recognition line)
  },
  onFinal: (text) => {
    // Fires when `done()` completes (e.g. silence timeout with `finalIdleMs`, or manual `done()`)
  },
  onClear: () => {
    // Clear or reset your VAD recording
  },
  deltaIdleMs: 50, // debounce for onDelta, default 50
  finalIdleMs: 2000, // optional: silence window before auto-finalize when no new chunk
  debug: false,
})

// After you get audio from the mic or elsewhere
// (wavBase64_* and segmentId* are placeholders for your audio data and segment IDs):
voice.feed({ data: wavBase64_1, id: segmentId })
voice.feed({ data: wavBase64_2, id: segmentId })
voice.feed({ data: wavBase64_3, id: segmentId })
voice.feed({ data: wavBase64_4, id: segmentId_2 })
voice.feed({ data: wavBase64_5, id: segmentId_2 })
voice.feed({ data: wavBase64_6, id: segmentId_2 })
```

Register match-and-run rules on streaming text; they are handy for keyword interrupts or clearing the UI.
```ts
voice.addCommand({
  match: ['stop', '停止'], // '停止' is Chinese for "stop"
  stage: 'delta', // 'delta' or 'final'
  stop: true, // skip further onDelta/onFinal handling when matched
  clear: true, // pass an empty string to callbacks to clear the view
  handler: (text) => { /* side effects */ },
})
```

`match` can be a `string[]` (matches when the text includes any entry), a `(text: string) => boolean` predicate, or a `RegExp`.
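The three `match` forms and the `stop`/`clear` flags can be modeled as follows. This is one plausible reading of the semantics described above, not the library's code. `matchesCommand`, `applyCommands`, and `CommandSketch` are hypothetical names, and the return value stands for what would be passed on to `onDelta`/`onFinal` (`null` meaning nothing is passed).

```typescript
// Hypothetical types mirroring the three `match` forms.
type Matcher = string[] | ((text: string) => boolean) | RegExp

interface CommandSketch {
  match: Matcher
  stage: 'delta' | 'final'
  stop?: boolean
  clear?: boolean
  handler?: (text: string) => void
}

// Assumes string[] means "text includes any of the strings".
function matchesCommand(match: Matcher, text: string): boolean {
  if (Array.isArray(match)) return match.some((word) => text.includes(word))
  if (match instanceof RegExp) {
    match.lastIndex = 0 // reset state in case the regex has the g or y flag
    return match.test(text)
  }
  return match(text)
}

// Runs matching commands for one stage; returns the text to hand to
// onDelta/onFinal, '' to clear the view, or null when `stop` swallowed it.
function applyCommands(
  commands: CommandSketch[],
  stage: 'delta' | 'final',
  text: string,
): string | null {
  let out = text
  for (const cmd of commands) {
    if (cmd.stage !== stage || !matchesCommand(cmd.match, text)) continue
    cmd.handler?.(text)
    if (cmd.clear) out = ''
    if (cmd.stop) return cmd.clear ? '' : null
  }
  return out
}
```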
Web Speech API (browser)
Use the Web Speech API so the browser does ASR; on each result, feed the transcript as `chunk.data` and implement `stream` to turn that payload into an async text stream (e.g. one yield per chunk). This requires a secure context (HTTPS or localhost) and usually a user gesture before `start()`.
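```ts
import { createVoice } from 'voice-flow-x'

// Some browsers (e.g. Chrome) expose only the webkit-prefixed constructor,
// and TypeScript's DOM lib may not declare either, hence the casts
const SpeechRecognition = (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition

const voice = createVoice({
  // The browser already produced the text, so yield the transcript once per chunk
  stream: async ({ data }) => (async function* () { yield data })(),
  onDelta: (text) => { /* UI */ },
  onFinal: (text) => { /* committed line */ },
  finalIdleMs: 2000,
})

const recognition = new SpeechRecognition()
recognition.continuous = true // keep listening across pauses
recognition.interimResults = true // report partial transcripts, not only final ones
recognition.onresult = (event: any) => {
  // Read the result that changed, not always the first one
  const result = event.results[event.resultIndex][0].transcript
  voice.feed({ data: result, id: event.resultIndex })
}
recognition.start()
```

State & concurrency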
- `feed(chunk)`: queues the chunk; processing continues with the next chunk once the current run finishes.
- `lock(ms)` / `unlock()`: pause processing; the lock releases automatically after `ms`, and `unlock()` clears the internal text state.
- `finalize()`: when `finalIdleMs` is set, schedules `done()` after a silence window with no new audio.
- `clear()`: resets the prefix, merged text, and dedupe window (usually triggered internally or through the `onClear` path).
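The `feed` queue and the `finalIdleMs` silence window can be sketched with two small primitives: a promise chain that runs one chunk at a time, and a restartable timer that fires only after a quiet period. Both classes are hypothetical illustrations of the behavior described above, not voice-flow-x's internals, and `lock`/`unlock` are left out.

```typescript
// Runs async jobs strictly one after another, in push order.
class SerialQueue<T> {
  private tail: Promise<void> = Promise.resolve()
  private readonly run: (item: T) => Promise<void>

  constructor(run: (item: T) => Promise<void>) {
    this.run = run
  }

  // Chains each item after the previous one, so runs never overlap.
  push(item: T): Promise<void> {
    this.tail = this.tail
      .then(() => this.run(item))
      .catch((err) => console.error('chunk failed', err)) // keep the chain alive
    return this.tail
  }
}

// Restartable timer: each touch() pushes the deadline back by `ms`,
// so `onIdle` fires only after `ms` of silence.
class IdleTimer {
  private timer: ReturnType<typeof setTimeout> | undefined
  private readonly ms: number
  private readonly onIdle: () => void

  constructor(ms: number, onIdle: () => void) {
    this.ms = ms
    this.onIdle = onIdle
  }

  touch(): void {
    this.cancel()
    this.timer = setTimeout(() => {
      this.timer = undefined
      this.onIdle()
    }, this.ms)
  }

  cancel(): void {
    if (this.timer !== undefined) clearTimeout(this.timer)
    this.timer = undefined
  }
}
```

Calling `touch()` on every incoming chunk and `done()` from `onIdle` gives the auto-finalize behavior; the same restartable-timer pattern also describes `deltaIdleMs` debouncing for `onDelta`.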
Types
Exports from `voice-flow-x` include `Voice`, `createVoice`, `Chunk`, `Command`, `VoiceOptions`, `AsyncIterableStream`, and more; see the JSDoc comments for details.
🔧 For package maintainers
If you use npm Trusted Publisher, run `pnpm publish` once locally to create the package, then link the GitHub repo on npm; after that, `pnpm run release` can drive releases via CI. See the npm docs and the scripts in this repo.
