@tekyzinc/stt-component
v0.3.3
Framework-agnostic speech-to-text with real-time streaming transcription and mid-recording Whisper correction
# STT-Component
A framework-agnostic, browser-first speech-to-text package with real-time streaming transcription and mid-recording Whisper correction, powered by @huggingface/transformers.
## Features
- Streaming transcription -- real-time interim text as you speak
- Mid-recording Whisper correction -- automatic correction cycles triggered by speech pauses or forced intervals
- Configurable Whisper models -- tiny, base, small, medium (ONNX via transformers.js)
- WebGPU + WASM -- GPU-accelerated inference in Chrome/Edge with automatic WASM fallback for Firefox/Safari
- Event-driven API -- subscribe to `transcript`, `correction`, `error`, and `status` events
- Framework-agnostic -- works with React, Vue, Svelte, vanilla JS, or any framework
- Web Worker inference -- non-blocking model loading and transcription via dedicated worker thread
- Configurable correction timing -- pause threshold, forced interval, or disable entirely
- Audio chunking -- configurable chunk length and stride for long-form audio
- Node.js support -- compatible with Node.js >= 18 via @huggingface/transformers
## Quick Start
```bash
npm install @tekyzinc/stt-component
```

```ts
import { STTEngine } from '@tekyzinc/stt-component';

const engine = new STTEngine({ model: 'tiny' });

engine.on('transcript', (text) => console.log('Interim:', text));
engine.on('correction', (text) => console.log('Corrected:', text));

await engine.init();
await engine.start();
// ... user speaks ...
const finalText = await engine.stop();
```

## API Reference
### STTEngine

The main class. Extends `TypedEventEmitter<STTEvents>`.

#### `constructor(config?: STTConfig, workerUrl?: URL)`

Creates a new engine instance. All config fields are optional -- sensible defaults are applied.

#### `init(): Promise<void>`

Spawns the Web Worker and loads the Whisper model. Emits `status` events with download progress. Throws on model load failure.

#### `start(): Promise<void>`

Requests microphone access and begins recording, enabling mid-recording correction cycles. The engine must be in the `ready` state (call `init()` first).

#### `stop(): Promise<string>`

Stops recording, runs a final Whisper transcription on the full audio, emits the final `correction` event, and returns the transcribed text.

#### `destroy(): void`

Terminates the worker, releases the microphone and `AudioContext`, and removes all event listeners. Call this when you are done with the engine.

#### `getState(): Readonly<STTState>`

Returns a snapshot of the current engine state.

#### `notifyPause(): void`

Manually signals a speech pause to the correction orchestrator, which may trigger an early correction cycle.
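If you drive `notifyPause()` from your own audio pipeline, a small debounce helper can translate activity callbacks into pause signals. This is a sketch: `makePauseDetector` and the 1000 ms threshold are illustrative, not part of the package.

```ts
// Hypothetical helper (not part of the package): invokes `onPause` once
// `onActivity` has not been called for `thresholdMs` milliseconds.
function makePauseDetector(onPause: () => void, thresholdMs = 1000) {
  let timer: ReturnType<typeof setTimeout> | null = null;
  return function onActivity() {
    if (timer !== null) clearTimeout(timer);
    timer = setTimeout(onPause, thresholdMs);
  };
}

// Usage sketch (assumes `engine` from Quick Start):
// const onActivity = makePauseDetector(() => engine.notifyPause());
// Call onActivity() whenever your pipeline detects speech energy.
```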
#### `on(event, listener): void`

Subscribes to an event. Type-safe: TypeScript enforces the correct callback signature for each event.

#### `off(event, listener): void`

Unsubscribes a specific listener.
### Events
| Event | Callback Signature | Description |
|-------|-------------------|-------------|
| transcript | (text: string) => void | Real-time streaming text via Web Speech API (display in italics) |
| correction | (text: string) => void | Whisper-corrected text replacing interim text (display in normal style) |
| error | (error: STTError) => void | Actionable error ({ code: string, message: string }) |
| status | (state: STTState) => void | Engine state changes |
| debug | (message: string) => void | Internal diagnostic logs (Speech API lifecycle, errors, results) |
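One way to honor the display hints in the table (interim `transcript` text in italics, `correction` text in normal style) is a small view-model that lets corrected text replace the interim tail. `TranscriptView` is hypothetical, and this sketch assumes each `correction` event carries the full corrected text so far:

```ts
// Hypothetical view-model: `correction` replaces the interim text rather
// than appending to it, matching the event semantics in the table above.
class TranscriptView {
  private confirmed = '';
  private interim = '';

  onTranscript(text: string) {
    this.interim = text;
  }

  onCorrection(text: string) {
    this.confirmed = text;
    this.interim = '';
  }

  // Markdown-style rendering: *italics* mark uncorrected interim text.
  render(): string {
    return this.interim ? `${this.confirmed} *${this.interim}*`.trim() : this.confirmed;
  }
}

// Wiring sketch (assumes `engine` and an `output` element):
// const view = new TranscriptView();
// engine.on('transcript', (t) => { view.onTranscript(t); output.innerHTML = view.render(); });
// engine.on('correction', (t) => { view.onCorrection(t); output.innerHTML = view.render(); });
```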
### Error Codes
| Code | When |
|------|------|
| MIC_DENIED | Microphone access denied or unavailable |
| MODEL_LOAD_FAILED | Whisper model download or initialization failed |
| TRANSCRIPTION_FAILED | Whisper inference failed (recording continues) |
| WORKER_ERROR | Web Worker encountered an error |
| STREAMING_ERROR | Web Speech API streaming error |
### Engine States (STTStatus)

`idle -> loading -> ready -> recording -> processing -> ready`
| Status | Meaning |
|--------|---------|
| idle | Engine created but not initialized |
| loading | Model downloading / initializing |
| ready | Model loaded, ready to record |
| recording | Actively capturing audio |
| processing | Running final transcription after stop |
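The status lifecycle maps naturally onto a UI indicator. `statusLabel` below is a hypothetical helper, and `STTStatus` is redeclared locally so the sketch is self-contained:

```ts
// Local redeclaration of the documented status union, so this sketch
// compiles without importing the package.
type STTStatus = 'idle' | 'loading' | 'ready' | 'recording' | 'processing';

// Hypothetical helper mapping each documented status to a display string.
function statusLabel(status: STTStatus, loadProgress: number): string {
  switch (status) {
    case 'idle':       return 'Not initialized';
    case 'loading':    return `Loading model (${loadProgress}%)`;
    case 'ready':      return 'Ready to record';
    case 'recording':  return 'Listening...';
    case 'processing': return 'Transcribing...';
  }
}

// Wiring sketch (assumes `engine` and an `indicator` element):
// engine.on('status', (s) => { indicator.textContent = statusLabel(s.status, s.loadProgress); });
```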
## Configuration
All fields are optional. Defaults shown in the table.
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| model | 'tiny' \| 'base' \| 'small' \| 'medium' | 'tiny' | Whisper model size |
| backend | 'webgpu' \| 'wasm' \| 'auto' | 'auto' | Compute backend (auto = WebGPU with WASM fallback) |
| language | string | 'en' | Transcription language |
| dtype | string | 'q4' | Model quantization dtype |
| correction.enabled | boolean | true | Enable mid-recording Whisper correction |
| correction.provider | 'whisper' | 'whisper' | Correction engine provider |
| correction.pauseThreshold | number (ms) | 3000 | Silence duration before triggering correction |
| correction.forcedInterval | number (ms) | 5000 | Maximum interval between forced corrections |
| streaming.enabled | boolean | true | Enable real-time streaming transcript via Web Speech API |
| streaming.provider | 'web-speech-api' | 'web-speech-api' | Streaming provider (Chrome/Edge) |
| chunking.chunkLengthS | number (seconds) | 30 | Chunk length for Whisper processing |
| chunking.strideLengthS | number (seconds) | 5 | Stride length for overlapping chunks |
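Spelling every documented option out with its default looks like this (the object mirrors the table above; in real code you pass only the fields you want to override):

```ts
// Every documented option with its default value, per the table above.
// In practice, pass only overrides, e.g. new STTEngine({ model: 'base' }).
const config = {
  model: 'tiny',
  backend: 'auto',
  language: 'en',
  dtype: 'q4',
  correction: {
    enabled: true,
    provider: 'whisper',
    pauseThreshold: 3000, // ms of silence before a correction cycle
    forcedInterval: 5000, // ms between forced corrections
  },
  streaming: {
    enabled: true,
    provider: 'web-speech-api',
  },
  chunking: {
    chunkLengthS: 30, // seconds per Whisper chunk
    strideLengthS: 5, // seconds of overlap between chunks
  },
} as const;

// const engine = new STTEngine(config);
```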
### STTState

Returned by `getState()` and emitted with `status` events.

```ts
interface STTState {
  status: STTStatus;
  isModelLoaded: boolean;
  loadProgress: number; // 0-100
  backend: 'webgpu' | 'wasm' | null;
  error: string | null;
}
```

## Usage Examples
### Vanilla JavaScript

```html
<script type="module">
  import { STTEngine } from '@tekyzinc/stt-component';

  const engine = new STTEngine({ model: 'tiny' });
  const output = document.getElementById('output');

  engine.on('correction', (text) => {
    output.textContent = text;
  });
  engine.on('error', (err) => {
    console.error(`[${err.code}] ${err.message}`);
  });

  await engine.init();

  document.getElementById('start').onclick = () => engine.start();
  document.getElementById('stop').onclick = async () => {
    const final = await engine.stop();
    output.textContent = final;
  };
</script>
```

### React Pattern
No React dependency is required -- this just shows the integration pattern.

```tsx
import { useEffect, useRef, useState } from 'react';
import { STTEngine } from '@tekyzinc/stt-component';

function VoiceInput() {
  const engineRef = useRef<STTEngine | null>(null);
  const [text, setText] = useState('');

  useEffect(() => {
    const engine = new STTEngine({ model: 'tiny' });
    engineRef.current = engine;
    engine.on('correction', setText);
    engine.on('error', (err) => console.error(err.code, err.message));
    engine.init();
    return () => engine.destroy();
  }, []);

  return (
    <div>
      <button onClick={() => engineRef.current?.start()}>Record</button>
      <button onClick={() => engineRef.current?.stop()}>Stop</button>
      <p>{text}</p>
    </div>
  );
}
```

### Error Handling
```ts
import { STTEngine } from '@tekyzinc/stt-component';

const engine = new STTEngine();

engine.on('error', (err) => {
  switch (err.code) {
    case 'MIC_DENIED':
      alert('Please allow microphone access.');
      break;
    case 'MODEL_LOAD_FAILED':
      console.error('Model failed to load:', err.message);
      break;
    case 'TRANSCRIPTION_FAILED':
      // Non-fatal: recording continues and correction will retry
      console.warn('Transcription error:', err.message);
      break;
  }
});

engine.on('status', (state) => {
  console.log(`Status: ${state.status}, progress: ${state.loadProgress}%`);
});

await engine.init();
```

## Browser Compatibility
| Browser | Backend | Notes |
|---------|---------|-------|
| Chrome 113+ | WebGPU | Full GPU acceleration |
| Edge 113+ | WebGPU | Full GPU acceleration |
| Firefox | WASM | Automatic fallback, slower inference |
| Safari 18+ | WASM | Automatic fallback, slower inference |
When `backend` is set to `'auto'` (the default), the engine attempts WebGPU first and silently falls back to WASM.
## Node.js
Compatible with Node.js >= 18 via `@huggingface/transformers`. In Node.js the engine uses the WASM backend (no WebGPU). Audio capture (`startCapture`) requires browser APIs (`navigator.mediaDevices`), so in Node.js you would provide pre-recorded audio to the worker directly or use a Node.js audio library for capture.
## Exports
The package exports all public types and utilities:
```ts
// Main API
import { STTEngine } from '@tekyzinc/stt-component';

// Types
import type {
  STTConfig,
  STTState,
  STTEvents,
  STTError,
  STTModelSize,
  STTBackend,
  STTStatus,
  STTCorrectionProvider,
  STTStreamingProvider,
  STTStreamingConfig,
} from '@tekyzinc/stt-component';

// Utilities (advanced usage)
import {
  DEFAULT_STT_CONFIG,
  resolveConfig,
  TypedEventEmitter,
  WorkerManager,
  CorrectionOrchestrator,
  SpeechStreamingManager,
} from '@tekyzinc/stt-component';
```

## Troubleshooting
### Vite: New features not working after upgrading
Symptom: After running `npm install @tekyzinc/stt-component@latest`, new features (such as streaming transcription) don't appear; the old behavior persists despite the upgrade.

Cause: Vite pre-bundles dependencies into `node_modules/.vite/deps/` for faster dev-server startup. When you upgrade a package, Vite may keep serving the stale cached bundle instead of the updated code. The files in `node_modules/@tekyzinc/stt-component/dist/` are correct, but Vite's dev server never reads them -- it serves the pre-bundled copy from `.vite/deps/`.
Fix (immediate):

```bash
# Delete the Vite dependency cache
rm -rf node_modules/.vite

# Restart the dev server
npm run dev
```

Fix (permanent -- recommended):
Add this to your `vite.config.ts` to exclude the package from pre-bundling entirely. Since `@tekyzinc/stt-component` ships as ESM, pre-bundling is unnecessary:

```ts
// vite.config.ts
import { defineConfig } from 'vite';

export default defineConfig({
  optimizeDeps: {
    exclude: ['@tekyzinc/stt-component'],
  },
  // ... rest of your config
});
```

How to verify: After clearing the cache and restarting, open the browser DevTools Network tab and confirm the module is loaded directly from `node_modules/@tekyzinc/stt-component/dist/` rather than from `.vite/deps/`.
Why this happens: Vite's dependency pre-bundling uses esbuild to convert packages into optimized ESM bundles on first run. The cache is keyed by a hash of your lockfile, but certain upgrade scenarios (especially with scoped private packages) may not trigger cache invalidation. As a result, `npm install` updates the source files but Vite keeps serving the old pre-bundled version.
### Web Speech API streaming not working

Symptom: `correction` events fire (Whisper is working) but `transcript` events never fire (no real-time streaming text).
Check these in order:

1. Vite cache (most common) -- see the section above. If you recently upgraded the package, this is almost certainly the issue.

2. Browser support -- Web Speech API streaming requires Chrome or Edge. Firefox and Safari do not support `SpeechRecognition`. Check with:

   ```ts
   import { SpeechStreamingManager } from '@tekyzinc/stt-component';
   console.log('Supported:', SpeechStreamingManager.isSupported());
   ```

3. Streaming disabled -- Streaming is enabled by default, but verify your config:

   ```ts
   const engine = new STTEngine({
     streaming: { enabled: true }, // default: true
   });
   ```

4. Debug events -- Subscribe to `debug` for internal diagnostics:

   ```ts
   engine.on('debug', (msg) => console.log(msg));
   ```

   Look for messages starting with `[SSM]` (SpeechStreamingManager) -- they show whether the Speech API initialized, received results, or encountered errors.
### Other bundlers (Webpack, Rollup, esbuild)

If you use a bundler other than Vite, similar caching issues can occur:

- Webpack: delete `.cache/` or the `node_modules/.cache` directory and restart
- Turbopack: delete `.next/cache` and restart
- General rule: if an upgrade doesn't seem to take effect, clear your bundler's cache directory and rebuild
## License
MIT
