@davidmokos/react-use-transcription
v0.0.4
A zero-configuration React hook that captures microphone audio, streams it to a Cloudflare Worker, and delivers real-time partial and final transcripts powered by OpenAI Whisper.
@scope/react-use-transcription
- ✅ Works over a single secure WebSocket
- ✅ Handles microphone permissions, buffering, and clean-up for you
- ✅ Ships with sensible fallbacks and human-friendly error messages
- ✅ Pairs with the included Cloudflare Worker for drop-in backend transcription
Installation
```bash
# using bun
bun add @scope/react-use-transcription
# using npm
npm install @scope/react-use-transcription
# using pnpm
pnpm add @scope/react-use-transcription
```

The audio worklet is bundled with the package, so there are no extra files to copy. The hook injects it automatically via a blob URL when you call startTranscribing.
Quick Start
```tsx
import { useTranscription } from '@scope/react-use-transcription';

export function TranscriptionDemo() {
  const {
    transcriptionStatus,
    isRecording,
    isProcessing,
    startTranscribing,
    stopTranscribing,
    transcription,
    partial,
    permissionState,
    error,
    levels, // 10 normalized audio levels for visualisations
  } = useTranscription({ wsUrl: 'wss://your-worker.example.com/ws' });

  const busy = transcriptionStatus === 'connecting' || isProcessing;

  return (
    <div>
      <button
        onClick={isRecording ? stopTranscribing : startTranscribing}
        disabled={busy || permissionState === 'denied' || permissionState === 'unsupported'}
      >
        {isRecording ? 'Stop Recording' : busy ? 'Processing…' : 'Start Recording'}
      </button>
      <p>Status: {transcriptionStatus}</p>
      <pre>{transcription}{partial ? `\n${partial}` : ''}</pre>
      <div style={{ display: 'flex', gap: 4, alignItems: 'flex-end', height: 40 }}>
        {levels.map((value, index) => (
          <div
            key={index}
            style={{
              flex: 1,
              background: '#0af',
              opacity: 0.6,
              height: `${Math.max(value, 0.05) * 100}%`,
              transition: 'height 80ms ease-out',
            }}
          />
        ))}
      </div>
      {error && (
        <p style={{ color: 'crimson' }}>
          {error.userMessage} <small>({error.code})</small>
        </p>
      )}
    </div>
  );
}
```

Serve the page over HTTPS or from localhost; otherwise browsers block microphone access.
Hook API
useTranscription(options) accepts:
| Option | Type | Default | Description |
|-------------|----------------------|---------|----------------------------------------------|
| wsUrl | string | — | WebSocket endpoint exposed by the worker. |
| sampleRate| 16000 \| 48000 | 16000 | Desired PCM sample rate; must match backend. |
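The sampleRate value has to match what the backend expects. Browsers commonly capture at 48 kHz, so asking for 16000 implies resampling somewhere in the pipeline. This README does not say how the package resamples; the sketch below only illustrates the arithmetic for an integer ratio such as 48000 to 16000 (average every block of three samples). downsampleByAveraging is a hypothetical helper, not part of the package API, and averaging is a crude low-pass filter rather than production-quality resampling.

```typescript
// Hypothetical helper, not part of @scope/react-use-transcription.
// Naively downsamples mono Float32 audio by an integer factor by averaging
// each block of `factor` samples (crude low-pass filter plus decimation).
function downsampleByAveraging(
  input: Float32Array,
  inputRate: number,
  outputRate: number,
): Float32Array {
  if (outputRate <= 0 || outputRate > inputRate || inputRate % outputRate !== 0) {
    throw new Error(`Unsupported ratio ${inputRate} -> ${outputRate}`);
  }
  const factor = inputRate / outputRate; // e.g. 48000 / 16000 = 3
  const outLength = Math.floor(input.length / factor);
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    let sum = 0;
    for (let j = 0; j < factor; j++) {
      sum += input[i * factor + j];
    }
    output[i] = sum / factor;
  }
  return output;
}
```

Whatever the client sends, the worker still has to be told (or assume) the same rate, which is why the option must match the backend.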
It returns:
| Field | Type | Notes |
|------------------|-----------------------------------------|---------------------------------------------------------------------------------------|
| transcriptionStatus | 'idle' \| 'connecting' \| 'recording' \| 'processing' | High-level phase for UI state machines. |
| status | 'idle' \| 'connecting' \| 'recording' \| 'processing' | Alias for transcriptionStatus (kept for backwards compatibility). |
| isTranscribing | boolean | true while a session is active (connecting, recording, or finalising). |
| isRecording | boolean | true while the microphone is open and frames are being streamed. |
| isProcessing | boolean | true after stopTranscribing until the backend sends its final transcript. |
| startTranscribing | () => Promise<void> | Opens the microphone, the audio worklet, and the WebSocket connection. Safe to call repeatedly. |
| stopTranscribing | () => Promise<void> | Flushes buffers and lets the worker close the socket after the final transcript. |
| transcription | string | Accumulated final transcripts. |
| partial | string \| undefined | Latest interim status ("Listening…" or "Processing transcription…"). |
| permissionState| 'granted' \| 'denied' \| … | Mirrors PermissionStatus. |
| error | TranscriptionError \| null | Rich error with type, code, message, and userMessage for UI display. |
| levels | number[] | 10-sample rolling audio intensity (0–1) for animated meters or visualisers. |
transcriptionStatus, isRecording, and isProcessing make it easy to tailor your UI (e.g. show a spinner while finalising or disable buttons during setup) without guessing from partial transcript strings.
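As a sketch of that idea, the pure function below maps the four documented transcriptionStatus values to a button label and disabled flag. The TranscriptionStatus alias is declared locally (the package may or may not export an equivalent type), and statusToButton and its labels are hypothetical.

```typescript
// The four phases documented for transcriptionStatus (local alias).
type TranscriptionStatus = 'idle' | 'connecting' | 'recording' | 'processing';

interface ButtonState {
  label: string;
  disabled: boolean;
}

// Hypothetical helper: derive button UI from the status alone,
// instead of inspecting the partial transcript string.
function statusToButton(status: TranscriptionStatus): ButtonState {
  switch (status) {
    case 'idle':
      return { label: 'Start Recording', disabled: false };
    case 'connecting':
      return { label: 'Connecting…', disabled: true };
    case 'recording':
      return { label: 'Stop Recording', disabled: false };
    case 'processing':
      return { label: 'Processing…', disabled: true };
  }
}
```

Because the switch is exhaustive over a union type, TypeScript flags any new status value that the UI forgets to handle. In a component you would still combine the result with the permissionState check from the Quick Start.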
Error Handling
The hook normalises browser-specific microphone errors. Display error.userMessage to end users and inspect error.code for programmatic flows (retry prompts, custom tooltips, etc.).
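A minimal sketch of that split, assuming only the fields the Hook API table lists for TranscriptionError (type, code, message, userMessage) plus the documented permissionState. TranscriptionErrorLike is a local stand-in, not the package's exported type, and describeError is hypothetical. The concrete code values the hook emits are not listed in this README, so the sketch deliberately never branches on a specific code.

```typescript
// Local stand-in for the hook's TranscriptionError; field names come from
// the Hook API table, the string types are assumptions.
interface TranscriptionErrorLike {
  type: string;
  code: string;
  message: string;
  userMessage: string;
}

interface ErrorView {
  text: string;      // safe to show to end users
  detail: string;    // e.g. for a tooltip or logging
  canRetry: boolean; // whether a "Try again" button makes sense
}

// Hypothetical helper: build what the UI shows for an error.
function describeError(
  error: TranscriptionErrorLike | null,
  permissionState: string,
): ErrorView | null {
  if (!error) return null;
  return {
    text: error.userMessage,
    detail: `${error.type}/${error.code}: ${error.message}`,
    // Retrying cannot help while mic access is denied or unavailable.
    canRetry: permissionState !== 'denied' && permissionState !== 'unsupported',
  };
}
```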
Deploying the Cloudflare Worker
The worker in apps/worker exposes the /ws endpoint consumed by the hook. Deploy it to your Cloudflare account with the following steps:
1. Install dependencies (once per repo)

   ```bash
   bun install
   ```

2. Configure Wrangler

   ```bash
   cd apps/worker
   wrangler login
   ```

3. Provide secrets. Use Wrangler secrets for production and apps/worker/.dev.vars for local development:

   ```bash
   wrangler secret put OPENAI_API_KEY     # required for OpenAI Whisper
   wrangler secret put ELEVEN_API_KEY     # required when TRANSCRIBER=elevenlabs
   wrangler secret put SILENCE_THRESHOLD  # optional (defaults to 0.012)
   wrangler secret put ELEVEN_STT_MODEL   # optional (defaults to eleven_multilingual_v2)
   ```

   In the Cloudflare dashboard: Workers → your worker → Settings → Variables → Add variable → choose Secret and enter the same keys.

   For local development, create apps/worker/.dev.vars with matching entries:

   ```
   OPENAI_API_KEY=sk-...
   ELEVEN_API_KEY=sk-...
   SILENCE_THRESHOLD=0.01
   ```

   Wrangler automatically loads .dev.vars when you run bun run dev inside apps/worker.

4. Deploy

   ```bash
   bun run deploy
   ```

   That script invokes wrangler deploy with the bundled worker.

5. Verify locally (optional)

   ```bash
   bun run dev
   ```

   Wrangler serves the worker on http://localhost:8787; point the hook at ws://localhost:8787/ws to test locally with the example app.
Once deployed, grab the live WebSocket URL from the Wrangler output (something like wss://asr-ws.your-account.workers.dev/ws) and feed it to the hook's wsUrl option.
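One way to switch between the local and the deployed worker is to resolve wsUrl from the environment. This is a sketch: resolveWsUrl and its isDev flag are hypothetical, and the deployed URL is the placeholder from the Wrangler output example above. It also rewrites http(s) URLs to ws(s), since wrangler dev prints an http:// address while WebSocket endpoints use the ws:// and wss:// schemes.

```typescript
// Hypothetical helper: pick the worker endpoint for the current environment.
function resolveWsUrl(isDev: boolean, deployedUrl: string): string {
  const raw = isDev ? 'http://localhost:8787/ws' : deployedUrl;
  const url = new URL(raw);
  // WebSocket endpoints use ws:// (plain) or wss:// (TLS).
  if (url.protocol === 'http:') url.protocol = 'ws:';
  else if (url.protocol === 'https:') url.protocol = 'wss:';
  return url.toString();
}
```

With Vite, for example, you could pass import.meta.env.DEV as isDev; other bundlers expose their own development flag.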
Silence Trimming
The worker removes frames that fall below a configurable energy threshold before batching audio for Whisper. Tune it via the SILENCE_THRESHOLD secret:
```bash
wrangler secret put SILENCE_THRESHOLD  # e.g. 0.01 keeps quiet speech, 0 disables trimming
```

- Default: 0.012 (≈1.2 % of full-scale amplitude).
- Set to 0 or false to disable trimming entirely.
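The worker's exact energy measure is not documented here. As a hedged sketch, assuming "energy" means the RMS of samples normalised to [-1, 1] (so 0.012 is about 1.2 % of full scale), parsing the secret and trimming PCM16 frames could look like this. parseSilenceThreshold, frameRms, and trimSilentFrames are hypothetical names, and falling back to the default on an unparseable value is an assumption.

```typescript
const DEFAULT_SILENCE_THRESHOLD = 0.012;

// Hypothetical: interpret the SILENCE_THRESHOLD secret.
// Returns null when trimming is disabled ("0" or "false").
function parseSilenceThreshold(raw: string | undefined): number | null {
  if (raw === undefined || raw.trim() === '') return DEFAULT_SILENCE_THRESHOLD;
  const value = raw.trim().toLowerCase();
  if (value === 'false') return null;
  const parsed = Number(value);
  // Assumption: invalid or negative values fall back to the default.
  if (!Number.isFinite(parsed) || parsed < 0) return DEFAULT_SILENCE_THRESHOLD;
  return parsed === 0 ? null : parsed;
}

// Root-mean-square energy of a PCM16 frame, normalised to 0..1.
function frameRms(frame: Int16Array): number {
  if (frame.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) {
    const x = frame[i] / 32768;
    sumSquares += x * x;
  }
  return Math.sqrt(sumSquares / frame.length);
}

// Keep only frames whose energy reaches the threshold (null = keep all).
function trimSilentFrames(frames: Int16Array[], threshold: number | null): Int16Array[] {
  if (threshold === null) return frames;
  return frames.filter((frame) => frameRms(frame) >= threshold);
}
```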
Serving the Audio Worklet
There is nothing extra to host: the package registers the audio worklet dynamically and streams PCM16 frames straight to the worker.
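The package's worklet internals are not documented here. For illustration only: an AudioWorklet receives Float32 samples nominally in [-1, 1], and turning them into 16-bit signed PCM before sending usually means clamping and scaling, as in this hypothetical floatToPcm16 helper.

```typescript
// Hypothetical helper: convert Web Audio Float32 samples (nominally -1..1)
// to signed 16-bit integers.
function floatToPcm16(input: Float32Array): Int16Array {
  const output = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    // Clamp first: Web Audio can produce values slightly outside -1..1.
    const s = Math.max(-1, Math.min(1, input[i]));
    // Scale asymmetrically so -1 maps to -32768 and 1 maps to 32767;
    // Int16Array truncates the fractional part on assignment.
    output[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return output;
}
```

The resulting output.buffer can be passed to WebSocket.send. Typed arrays use the platform's byte order, which is little-endian on mainstream hardware.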
Transcription Providers
Set the TRANSCRIBER variable (defaults to openai) in wrangler.toml or as a plain text variable in the Cloudflare dashboard to switch between providers. When you specify a provider, the corresponding API key must be present; otherwise the worker throws a configuration error. If you leave it blank, the worker prefers OpenAI when that key is available and otherwise uses ElevenLabs.
| Provider value | Requirements | Notes |
|------------------|---------------------------------------|----------------------------------------------------|
| openai (default)| OPENAI_API_KEY secret | Uses Whisper (model=whisper-1). |
| elevenlabs | ELEVEN_API_KEY secret | Uses ElevenLabs STT (model_id=eleven_multilingual_v2 by default). |
Optional secrets:
- DEFAULT_LANG: hint language for both providers.
- ELEVEN_STT_MODEL: override the ElevenLabs model ID if needed.
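The provider rules above fit in a small function. This is a sketch of the documented behaviour, not the worker's source: the Env shape and the selectProvider name are assumptions, and throwing for an unknown TRANSCRIBER value or when no key is configured at all are assumed failure modes that this README does not specify.

```typescript
type Provider = 'openai' | 'elevenlabs';

// Assumed shape of the worker's environment bindings.
interface Env {
  TRANSCRIBER?: string;
  OPENAI_API_KEY?: string;
  ELEVEN_API_KEY?: string;
}

// Sketch of the documented rules for choosing a transcription provider.
function selectProvider(env: Env): Provider {
  const requested = (env.TRANSCRIBER ?? '').trim().toLowerCase();
  if (requested === 'openai' || requested === 'elevenlabs') {
    const key = requested === 'openai' ? env.OPENAI_API_KEY : env.ELEVEN_API_KEY;
    if (!key) {
      // An explicitly chosen provider must have its API key configured.
      throw new Error(`TRANSCRIBER=${requested} but its API key is missing`);
    }
    return requested;
  }
  if (requested !== '') {
    throw new Error(`Unknown TRANSCRIBER value: ${requested}`); // assumption
  }
  // Blank: prefer OpenAI when its key exists, otherwise ElevenLabs.
  if (env.OPENAI_API_KEY) return 'openai';
  if (env.ELEVEN_API_KEY) return 'elevenlabs';
  throw new Error('No transcription API key configured'); // assumption
}
```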
Development Tips
- Use bun run build at the repo root to rebuild all packages, including this hook.
- The example app in examples/textarea-basic demonstrates a minimal integration and makes a great starting point for UI experiments.
- If you extend the protocol, update both this hook and the worker to keep the frame schema aligned.
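One way to keep the hook and worker aligned is a shared module holding the message types plus a runtime guard that both sides import. The frame schema is not documented in this README, so every message type and field below (partial, final, error, text, code, message) is hypothetical and only illustrates the pattern.

```typescript
// Hypothetical server-to-client messages; the real protocol may differ.
type ServerMessage =
  | { type: 'partial'; text: string }
  | { type: 'final'; text: string }
  | { type: 'error'; code: string; message: string };

// Runtime guard: reject malformed or unknown messages instead of trusting JSON.
function parseServerMessage(raw: string): ServerMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== 'object' || data === null) return null;
  const { type, text, code, message } = data as Record<string, unknown>;
  if (type === 'partial' && typeof text === 'string') return { type: 'partial', text };
  if (type === 'final' && typeof text === 'string') return { type: 'final', text };
  if (type === 'error' && typeof code === 'string' && typeof message === 'string') {
    return { type: 'error', code, message };
  }
  return null;
}
```

If both apps/worker and the hook import ServerMessage from one place, a schema change becomes a compile error on whichever side was not updated.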
License
MIT © Scope
