@tekyzinc/stt-component
v0.3.3
Framework-agnostic speech-to-text with real-time streaming transcription and mid-recording Whisper correction
# STT-Component
A framework-agnostic, browser-first speech-to-text package with real-time streaming transcription and mid-recording Whisper correction, powered by @huggingface/transformers.
## Features
- Streaming transcription -- real-time interim text as you speak
- Mid-recording Whisper correction -- automatic correction cycles triggered by speech pauses or forced intervals
- Configurable Whisper models -- tiny, base, small, medium (ONNX via transformers.js)
- WebGPU + WASM -- GPU-accelerated inference in Chrome/Edge with automatic WASM fallback for Firefox/Safari
- Event-driven API -- subscribe to `transcript`, `correction`, `error`, and `status` events
- Framework-agnostic -- works with React, Vue, Svelte, vanilla JS, or any framework
- Web Worker inference -- non-blocking model loading and transcription via dedicated worker thread
- Configurable correction timing -- pause threshold, forced interval, or disable entirely
- Audio chunking -- configurable chunk length and stride for long-form audio
- Node.js support -- compatible with Node.js >= 18 via @huggingface/transformers
## Quick Start
```bash
npm install @tekyzinc/stt-component
```

```ts
import { STTEngine } from '@tekyzinc/stt-component';

const engine = new STTEngine({ model: 'tiny' });

engine.on('transcript', (text) => console.log('Interim:', text));
engine.on('correction', (text) => console.log('Corrected:', text));

await engine.init();
await engine.start();
// ... user speaks ...
const finalText = await engine.stop();
```

## API Reference
### STTEngine

The main class. Extends `TypedEventEmitter<STTEvents>`.

#### `constructor(config?: STTConfig, workerUrl?: URL)`

Creates a new engine instance. All config fields are optional -- sensible defaults are applied.

#### `init(): Promise<void>`

Spawns the Web Worker and loads the Whisper model. Emits `status` events with download progress. Throws on model load failure.

#### `start(): Promise<void>`

Requests microphone access and begins recording, enabling mid-recording correction cycles. The engine must be in the `ready` state (call `init()` first).

#### `stop(): Promise<string>`

Stops recording, runs a final Whisper transcription on the full audio, emits the final `correction` event, and returns the transcribed text.

#### `destroy(): void`

Terminates the worker, releases the microphone and `AudioContext`, and removes all event listeners. Call this when you are done with the engine.

#### `getState(): Readonly<STTState>`

Returns a snapshot of the current engine state.

#### `notifyPause(): void`

Manually signals a speech pause to the correction orchestrator, which may trigger an early correction cycle.
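If you drive `notifyPause()` from your own audio pipeline, a small debounce helper can translate activity callbacks into pause signals. This is a sketch: `makePauseDetector` and the 1000 ms threshold are illustrative, not part of the package.

```ts
// Hypothetical helper (not part of the package): invokes `onPause` once
// `onActivity` has not been called for `thresholdMs` milliseconds.
function makePauseDetector(onPause: () => void, thresholdMs = 1000) {
  let timer: ReturnType<typeof setTimeout> | null = null;
  return function onActivity() {
    if (timer !== null) clearTimeout(timer);
    timer = setTimeout(onPause, thresholdMs);
  };
}

// Usage sketch (assumes `engine` from Quick Start):
// const onActivity = makePauseDetector(() => engine.notifyPause());
// Call onActivity() whenever your pipeline detects speech energy.
```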
#### `on(event, listener): void`

Subscribes to an event. Type-safe: TypeScript enforces the correct callback signature for each event.

#### `off(event, listener): void`

Unsubscribes a specific listener.
### Events
| Event | Callback Signature | Description |
|-------|-------------------|-------------|
| transcript | (text: string) => void | Real-time streaming text via Web Speech API (display in italics) |
| correction | (text: string) => void | Whisper-corrected text replacing interim text (display in normal style) |
| error | (error: STTError) => void | Actionable error ({ code: string, message: string }) |
| status | (state: STTState) => void | Engine state changes |
| debug | (message: string) => void | Internal diagnostic logs (Speech API lifecycle, errors, results) |
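One way to honor the display hints in the table (interim `transcript` text in italics, `correction` text in normal style) is a small view-model that lets corrected text replace the interim tail. `TranscriptView` is hypothetical, and this sketch assumes each `correction` event carries the full corrected text so far:

```ts
// Hypothetical view-model: `correction` replaces the interim text rather
// than appending to it, matching the event semantics in the table above.
class TranscriptView {
  private confirmed = '';
  private interim = '';

  onTranscript(text: string) {
    this.interim = text;
  }

  onCorrection(text: string) {
    this.confirmed = text;
    this.interim = '';
  }

  // Markdown-style rendering: *italics* mark uncorrected interim text.
  render(): string {
    return this.interim ? `${this.confirmed} *${this.interim}*`.trim() : this.confirmed;
  }
}

// Wiring sketch (assumes `engine` and an `output` element):
// const view = new TranscriptView();
// engine.on('transcript', (t) => { view.onTranscript(t); output.innerHTML = view.render(); });
// engine.on('correction', (t) => { view.onCorrection(t); output.innerHTML = view.render(); });
```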
### Error Codes
| Code | When |
|------|------|
| MIC_DENIED | Microphone access denied or unavailable |
| MODEL_LOAD_FAILED | Whisper model download or initialization failed |
| TRANSCRIPTION_FAILED | Whisper inference failed (recording continues) |
| WORKER_ERROR | Web Worker encountered an error |
| STREAMING_ERROR | Web Speech API streaming error |
### Engine States (STTStatus)

`idle -> loading -> ready -> recording -> processing -> ready`
| Status | Meaning |
|--------|---------|
| idle | Engine created but not initialized |
| loading | Model downloading / initializing |
| ready | Model loaded, ready to record |
| recording | Actively capturing audio |
| processing | Running final transcription after stop |
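The status lifecycle maps naturally onto a UI indicator. `statusLabel` below is a hypothetical helper, and `STTStatus` is redeclared locally so the sketch is self-contained:

```ts
// Local redeclaration of the documented status union, so this sketch
// compiles without importing the package.
type STTStatus = 'idle' | 'loading' | 'ready' | 'recording' | 'processing';

// Hypothetical helper mapping each documented status to a display string.
function statusLabel(status: STTStatus, loadProgress: number): string {
  switch (status) {
    case 'idle':       return 'Not initialized';
    case 'loading':    return `Loading model (${loadProgress}%)`;
    case 'ready':      return 'Ready to record';
    case 'recording':  return 'Listening...';
    case 'processing': return 'Transcribing...';
  }
}

// Wiring sketch (assumes `engine` and an `indicator` element):
// engine.on('status', (s) => { indicator.textContent = statusLabel(s.status, s.loadProgress); });
```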
## Configuration
All fields are optional. Defaults shown in the table.
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| model | 'tiny' \| 'base' \| 'small' \| 'medium' | 'tiny' | Whisper model size |
| backend | 'webgpu' \| 'wasm' \| 'auto' | 'auto' | Compute backend (auto = WebGPU with WASM fallback) |
| language | string | 'en' | Transcription language |
| dtype | string | 'q4' | Model quantization dtype |
| correction.enabled | boolean | true | Enable mid-recording Whisper correction |
| correction.provider | 'whisper' | 'whisper' | Correction engine provider |
| correction.pauseThreshold | number (ms) | 3000 | Silence duration before triggering correction |
| correction.forcedInterval | number (ms) | 5000 | Maximum interval between forced corrections |
| streaming.enabled | boolean | true | Enable real-time streaming transcript via Web Speech API |
| streaming.provider | 'web-speech-api' | 'web-speech-api' | Streaming provider (Chrome/Edge) |
| chunking.chunkLengthS | number (seconds) | 30 | Chunk length for Whisper processing |
| chunking.strideLengthS | number (seconds) | 5 | Stride length for overlapping chunks |
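Spelling every documented option out with its default looks like this (the object mirrors the table above; in real code you pass only the fields you want to override):

```ts
// Every documented option with its default value, per the table above.
// In practice, pass only overrides, e.g. new STTEngine({ model: 'base' }).
const config = {
  model: 'tiny',
  backend: 'auto',
  language: 'en',
  dtype: 'q4',
  correction: {
    enabled: true,
    provider: 'whisper',
    pauseThreshold: 3000, // ms of silence before a correction cycle
    forcedInterval: 5000, // ms between forced corrections
  },
  streaming: {
    enabled: true,
    provider: 'web-speech-api',
  },
  chunking: {
    chunkLengthS: 30, // seconds per Whisper chunk
    strideLengthS: 5, // seconds of overlap between chunks
  },
} as const;

// const engine = new STTEngine(config);
```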
### STTState

Returned by `getState()` and emitted with `status` events.

```ts
interface STTState {
  status: STTStatus;
  isModelLoaded: boolean;
  loadProgress: number; // 0-100
  backend: 'webgpu' | 'wasm' | null;
  error: string | null;
}
```

## Usage Examples
### Vanilla JavaScript

```html
<script type="module">
  import { STTEngine } from '@tekyzinc/stt-component';

  const engine = new STTEngine({ model: 'tiny' });
  const output = document.getElementById('output');

  engine.on('correction', (text) => {
    output.textContent = text;
  });
  engine.on('error', (err) => {
    console.error(`[${err.code}] ${err.message}`);
  });

  await engine.init();

  document.getElementById('start').onclick = () => engine.start();
  document.getElementById('stop').onclick = async () => {
    const final = await engine.stop();
    output.textContent = final;
  };
</script>
```

### React Pattern
No React dependency is required -- this just shows the integration pattern.

```tsx
import { useEffect, useRef, useState } from 'react';
import { STTEngine } from '@tekyzinc/stt-component';

function VoiceInput() {
  const engineRef = useRef<STTEngine | null>(null);
  const [text, setText] = useState('');

  useEffect(() => {
    const engine = new STTEngine({ model: 'tiny' });
    engineRef.current = engine;
    engine.on('correction', setText);
    engine.on('error', (err) => console.error(err.code, err.message));
    engine.init();
    return () => engine.destroy();
  }, []);

  return (
    <div>
      <button onClick={() => engineRef.current?.start()}>Record</button>
      <button onClick={() => engineRef.current?.stop()}>Stop</button>
      <p>{text}</p>
    </div>
  );
}
```

### Error Handling
```ts
import { STTEngine } from '@tekyzinc/stt-component';

const engine = new STTEngine();

engine.on('error', (err) => {
  switch (err.code) {
    case 'MIC_DENIED':
      alert('Please allow microphone access.');
      break;
    case 'MODEL_LOAD_FAILED':
      console.error('Model failed to load:', err.message);
      break;
    case 'TRANSCRIPTION_FAILED':
      // Non-fatal: recording continues and correction will retry
      console.warn('Transcription error:', err.message);
      break;
  }
});

engine.on('status', (state) => {
  console.log(`Status: ${state.status}, progress: ${state.loadProgress}%`);
});

await engine.init();
```

## Browser Compatibility
| Browser | Backend | Notes |
|---------|---------|-------|
| Chrome 113+ | WebGPU | Full GPU acceleration |
| Edge 113+ | WebGPU | Full GPU acceleration |
| Firefox | WASM | Automatic fallback, slower inference |
| Safari 18+ | WASM | Automatic fallback, slower inference |
When `backend` is set to `'auto'` (the default), the engine attempts WebGPU first and silently falls back to WASM.
## Node.js
Compatible with Node.js >= 18 via `@huggingface/transformers`. In Node.js the engine uses the WASM backend (no WebGPU). Audio capture (`startCapture`) requires browser APIs (`navigator.mediaDevices`), so in Node.js you would provide pre-recorded audio to the worker directly or use a Node.js audio library for capture.
## Exports
The package exports all public types and utilities:
```ts
// Main API
import { STTEngine } from '@tekyzinc/stt-component';

// Types
import type {
  STTConfig,
  STTState,
  STTEvents,
  STTError,
  STTModelSize,
  STTBackend,
  STTStatus,
  STTCorrectionProvider,
  STTStreamingProvider,
  STTStreamingConfig,
} from '@tekyzinc/stt-component';

// Utilities (advanced usage)
import {
  DEFAULT_STT_CONFIG,
  resolveConfig,
  TypedEventEmitter,
  WorkerManager,
  CorrectionOrchestrator,
  SpeechStreamingManager,
} from '@tekyzinc/stt-component';
```

## Troubleshooting
### Vite: New features not working after upgrading
Symptom: After running `npm install @tekyzinc/stt-component@latest`, new features (such as streaming transcription) don't appear; the old behavior persists despite the upgrade.

Cause: Vite pre-bundles dependencies into `node_modules/.vite/deps/` for faster dev-server startup. When you upgrade a package, Vite may keep serving the stale cached bundle instead of the updated code. The files in `node_modules/@tekyzinc/stt-component/dist/` are correct, but Vite's dev server never reads them -- it serves the pre-bundled copy from `.vite/deps/`.
Fix (immediate):

```bash
# Delete the Vite dependency cache
rm -rf node_modules/.vite

# Restart the dev server
npm run dev
```

Fix (permanent -- recommended):
Add this to your `vite.config.ts` to exclude the package from pre-bundling entirely. Since `@tekyzinc/stt-component` ships as ESM, pre-bundling is unnecessary:

```ts
// vite.config.ts
import { defineConfig } from 'vite';

export default defineConfig({
  optimizeDeps: {
    exclude: ['@tekyzinc/stt-component'],
  },
  // ... rest of your config
});
```

How to verify: After clearing the cache and restarting, open the browser DevTools Network tab and confirm the module is loaded directly from `node_modules/@tekyzinc/stt-component/dist/` rather than from `.vite/deps/`.
Why this happens: Vite's dependency pre-bundling uses esbuild to convert packages into optimized ESM bundles on first run. The cache is keyed by a hash of your lockfile, but certain upgrade scenarios (especially with scoped private packages) may not trigger cache invalidation. As a result, `npm install` updates the source files but Vite keeps serving the old pre-bundled version.
### Web Speech API streaming not working

Symptom: `correction` events fire (Whisper is working) but `transcript` events never fire (no real-time streaming text).
Check these in order:

1. Vite cache (most common) -- see the section above. If you recently upgraded the package, this is almost certainly the issue.

2. Browser support -- Web Speech API streaming requires Chrome or Edge. Firefox and Safari do not support `SpeechRecognition`. Check with:

   ```ts
   import { SpeechStreamingManager } from '@tekyzinc/stt-component';
   console.log('Supported:', SpeechStreamingManager.isSupported());
   ```

3. Streaming disabled -- Streaming is enabled by default, but verify your config:

   ```ts
   const engine = new STTEngine({
     streaming: { enabled: true }, // default: true
   });
   ```

4. Debug events -- Subscribe to `debug` for internal diagnostics:

   ```ts
   engine.on('debug', (msg) => console.log(msg));
   ```

   Look for messages starting with `[SSM]` (SpeechStreamingManager) -- they show whether the Speech API initialized, received results, or encountered errors.
### Other bundlers (Webpack, Rollup, esbuild)

If you use a bundler other than Vite, similar caching issues can occur:

- Webpack: delete `.cache/` or the `node_modules/.cache` directory and restart
- Turbopack: delete `.next/cache` and restart
- General rule: if an upgrade doesn't seem to take effect, clear your bundler's cache directory and rebuild
## License
MIT
