@davidmokos/react-use-transcription

v0.0.4

Published

9 months ago

davidmokos

@davidmokos/react-use-transcription

A zero-configuration React hook that captures microphone audio, streams it to a Cloudflare Worker, and delivers real-time partial and final transcripts powered by OpenAI Whisper.

✅ Works over a single secure WebSocket
✅ Handles microphone permissions, buffering, and clean-up for you
✅ Ships with sensible fallbacks and human-friendly error messages
✅ Pairable with the included Cloudflare Worker for drop-in backend transcription

Installation

# using bun
bun add @davidmokos/react-use-transcription

# using npm
npm install @davidmokos/react-use-transcription

# using pnpm
pnpm add @davidmokos/react-use-transcription

The audio worklet is bundled with the package—no extra files to copy. The hook automatically injects it via a blob URL when you call startTranscribing.

Quick Start

import { useTranscription } from '@davidmokos/react-use-transcription';

export function TranscriptionDemo() {
  const {
    transcriptionStatus,
    isRecording,
    isProcessing,
    startTranscribing,
    stopTranscribing,
    transcription,
    partial,
    permissionState,
    error,
    levels, // 10 normalized audio levels for visualisations
  } = useTranscription({ wsUrl: 'wss://your-worker.example.com/ws' });

  const busy = transcriptionStatus === 'connecting' || isProcessing;

  return (
    <div>
      <button
        onClick={isRecording ? stopTranscribing : startTranscribing}
        disabled={busy || permissionState === 'denied' || permissionState === 'unsupported'}
      >
        {isRecording ? 'Stop Recording' : busy ? 'Processing…' : 'Start Recording'}
      </button>

      <p>Status: {transcriptionStatus}</p>

      <pre>{transcription}{partial ? `\n${partial}` : ''}</pre>

      <div style={{ display: 'flex', gap: 4, alignItems: 'flex-end', height: 40 }}>
        {levels.map((value, index) => (
          <div
            key={index}
            style={{
              flex: 1,
              background: '#0af',
              opacity: 0.6,
              height: `${Math.max(value, 0.05) * 100}%`,
              transition: 'height 80ms ease-out'
            }}
          />
        ))}
      </div>

      {error && (
        <p style={{ color: 'crimson' }}>
          {error.userMessage} <small>({error.code})</small>
        </p>
      )}
    </div>
  );
}

Serve the page over HTTPS or localhost, otherwise browsers will block microphone access.
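
If you want to warn users before they press the button, a rough origin check can be sketched as follows. This is only an approximation of the browser's secure-context rule, and the function name `isLikelyMicCapableOrigin` is hypothetical (it is not part of this package). In a real page, `window.isSecureContext` is the authoritative signal.

```typescript
// Approximates the "HTTPS or localhost" rule for a page URL.
// Assumption: only https: pages and loopback hosts count as allowed.
function isLikelyMicCapableOrigin(pageUrl: string): boolean {
  const url = new URL(pageUrl);
  if (url.protocol === 'https:') return true;

  const host = url.hostname;
  return (
    host === 'localhost' ||
    host.endsWith('.localhost') ||
    host === '127.0.0.1' ||
    host === '[::1]' // URL.hostname keeps the brackets for IPv6 literals
  );
}
```

For example, a dev server reached through a LAN address such as `http://192.168.1.20:5173` would be flagged, while `http://localhost:5173` would not.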

Hook API

useTranscription(options) accepts:

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| wsUrl | string | none | WebSocket endpoint exposed by the worker. |
| sampleRate | 16000 \| 48000 | 16000 | Desired PCM sample rate; must match backend. |

It returns:

| Field | Type | Notes |
|-------|------|-------|
| transcriptionStatus | 'idle' \| 'connecting' \| 'recording' \| 'processing' | High-level phase for UI state machines. |
| status | 'idle' \| 'connecting' \| 'recording' \| 'processing' | Alias for transcriptionStatus (kept for backwards compatibility). |
| isTranscribing | boolean | true while a session is active (connecting, recording, or finalising). |
| isRecording | boolean | true while the microphone is open and frames are being streamed. |
| isProcessing | boolean | true after stopTranscribing until the backend sends its final transcript. |
| startTranscribing | () => Promise<void> | Opens mic, worklet, WebSocket connection. Safe to call repeatedly. |
| stopTranscribing | () => Promise<void> | Flushes buffers and lets the worker close the socket after the final transcript. |
| transcription | string | Accumulated final transcripts. |
| partial | string \| undefined | Latest interim status ("Listening…" or "Processing transcription…"). |
| permissionState | 'granted' \| 'denied' \| … | Mirrors PermissionStatus. |
| error | TranscriptionError \| null | Rich error with type, code, message, and userMessage for UI display. |
| levels | number[] | 10-sample rolling audio intensity (0–1) for animated meters or visualisers. |

transcriptionStatus, isRecording, and isProcessing make it easy to tailor your UI (e.g. show a spinner while finalising or disable buttons during setup) without guessing from partial transcript strings.
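
As an illustration of driving the UI from the status field alone, the mapping below is a minimal sketch. The `TranscriptionStatus` union is copied from the table above; `ButtonState` and `buttonStateFor` are hypothetical helpers, and the labels are only examples.

```typescript
// Status values as listed in the Hook API table.
type TranscriptionStatus = 'idle' | 'connecting' | 'recording' | 'processing';

// Hypothetical view model for a single record/stop button.
interface ButtonState {
  label: string;
  disabled: boolean;
  showSpinner: boolean;
}

function buttonStateFor(status: TranscriptionStatus): ButtonState {
  switch (status) {
    case 'idle':
      return { label: 'Start Recording', disabled: false, showSpinner: false };
    case 'connecting':
      // Setup in progress: block clicks until the session is live.
      return { label: 'Connecting…', disabled: true, showSpinner: true };
    case 'recording':
      return { label: 'Stop Recording', disabled: false, showSpinner: false };
    case 'processing':
      // Waiting for the final transcript from the backend.
      return { label: 'Processing…', disabled: true, showSpinner: true };
  }
}
```

In a component you would call `buttonStateFor(transcriptionStatus)` and spread the result onto your button, rather than inspecting the partial string.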

Error Handling

The hook normalises browser-specific microphone errors. Display error.userMessage to end users and inspect error.code for programmatic flows (retry prompts, custom tooltips, etc.).
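
One way to split the two concerns (text for the user, code for program logic) is sketched below. The error shape follows the Hook API table, with the field types assumed to be strings. The package's actual error codes are not listed here, so the caller supplies the set of codes it considers retryable; `describeError` and `ErrorView` are hypothetical names.

```typescript
// Shape taken from the Hook API table; string field types are an assumption.
interface TranscriptionErrorLike {
  type: string;
  code: string;
  message: string;
  userMessage: string;
}

// Hypothetical view model for rendering an error banner.
interface ErrorView {
  text: string;
  offerRetry: boolean;
}

function describeError(
  error: TranscriptionErrorLike | null,
  retryableCodes: ReadonlySet<string>,
): ErrorView | null {
  if (error === null) return null;
  return {
    // userMessage is meant for end users; code is appended for support purposes.
    text: `${error.userMessage} (${error.code})`,
    // Programmatic decision based on the code, never on the message text.
    offerRetry: retryableCodes.has(error.code),
  };
}
```

Populate `retryableCodes` with whichever codes you observe from the hook and want to treat as transient.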

Deploying the Cloudflare Worker

The worker in apps/worker exposes the /ws endpoint consumed by the hook. Deploy it to your Cloudflare account with the following steps:

Install dependencies (once per repo)
```
bun install
```
Configure Wrangler
```
cd apps/worker
wrangler login
```

Provide secrets

Use Wrangler secrets for production and apps/worker/.dev.vars for local development:

wrangler secret put OPENAI_API_KEY        # required for OpenAI Whisper
wrangler secret put ELEVEN_API_KEY       # required when TRANSCRIBER=elevenlabs
wrangler secret put SILENCE_THRESHOLD    # optional (defaults to 0.012)
wrangler secret put ELEVEN_STT_MODEL     # optional (defaults to eleven_multilingual_v2)

Alternatively, add them in the Cloudflare dashboard: Workers → your worker → Settings → Variables → Add variable, choose Secret, and enter the same keys.

For local development create apps/worker/.dev.vars with matching entries:

OPENAI_API_KEY=sk-...
ELEVEN_API_KEY=sk-...
SILENCE_THRESHOLD=0.01

Wrangler automatically loads .dev.vars when you run bun run dev inside apps/worker.

Deploy
```
bun run deploy
```
That script invokes wrangler deploy with the bundled worker.
Verify locally (optional)
```
bun run dev
```
Wrangler serves the worker at http://localhost:8787, so the WebSocket endpoint for local testing with the example app is ws://localhost:8787/ws.

Once deployed, grab the live WebSocket URL from the Wrangler output (something like wss://asr-ws.your-account.workers.dev/ws) and feed it to the hook's wsUrl option.
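
If you only have the worker's HTTP(S) address at hand, the matching WebSocket URL can be derived as in this sketch. `toWebSocketUrl` is a hypothetical helper, and the `/ws` default path is taken from the endpoint described above.

```typescript
// Maps an http(s) worker address to the ws(s) endpoint expected by wsUrl.
function toWebSocketUrl(workerUrl: string, path: string = '/ws'): string {
  const url = new URL(workerUrl);
  // https pages must use wss; plain http (local dev) maps to ws.
  const secure = url.protocol === 'https:' || url.protocol === 'wss:';
  return `${secure ? 'wss' : 'ws'}://${url.host}${path}`;
}
```

For example, `toWebSocketUrl('http://localhost:8787')` yields `ws://localhost:8787/ws`, which you can pass to the hook during local development.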

Silence Trimming

The worker removes frames that fall below a configurable energy threshold before batching audio for Whisper. Tune it via the SILENCE_THRESHOLD secret:

wrangler secret put SILENCE_THRESHOLD  # e.g. 0.01 keeps quiet speech, 0 disables trimming

Default: 0.012 (≈1.2 % of full-scale amplitude).
Set to 0 or false to disable trimming entirely.
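
To build intuition for what the threshold means, here is a minimal sketch of energy-based trimming. It assumes the energy measure is the RMS of each PCM16 frame normalised to full scale; the worker's actual metric may differ, and `frameRms` and `trimSilence` are hypothetical names.

```typescript
// RMS level of a PCM16 frame, normalised so that full scale is 1.0.
function frameRms(frame: Int16Array): number {
  if (frame.length === 0) return 0;
  const sumSquares = frame.reduce((sum, sample) => {
    const normalised = sample / 32768;
    return sum + normalised * normalised;
  }, 0);
  return Math.sqrt(sumSquares / frame.length);
}

// Drops frames whose level falls below the threshold.
// A threshold of 0 (or less) disables trimming.
function trimSilence(frames: Int16Array[], threshold: number = 0.012): Int16Array[] {
  if (threshold <= 0) return frames;
  return frames.filter((frame) => frameRms(frame) >= threshold);
}
```

Under this assumption, a frame of samples around ±100 has a level of roughly 0.003 and is dropped at the default threshold, while speech-level frames pass through untouched.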

Serving the Audio Worklet

Nothing extra to host—the package registers the audio worklet dynamically and streams PCM16 frames straight to the worker.
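
For readers extending the protocol: Web Audio delivers samples as 32-bit floats in the range -1 to 1, so producing PCM16 involves a conversion along these lines. This is a generic sketch of the common technique, not necessarily the code bundled in the package, and `floatToPcm16` is a hypothetical name.

```typescript
// Converts Web Audio float samples (-1..1) to signed 16-bit PCM.
function floatToPcm16(input: Float32Array): Int16Array {
  const output = new Int16Array(input.length);
  input.forEach((sample, i) => {
    // Clamp first so out-of-range samples do not wrap around.
    const clamped = Math.max(-1, Math.min(1, sample));
    // Negative values scale to -32768, positive values to 32767.
    output[i] = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
  });
  return output;
}
```

The resulting `Int16Array` can be sent over a WebSocket as binary data via its underlying `buffer`.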

Transcription Providers

Set the TRANSCRIBER variable in wrangler.toml, or as a plain-text variable in the Cloudflare dashboard, to switch between providers. When you specify a provider, the corresponding API key must be present; otherwise the worker throws a configuration error. If you leave TRANSCRIBER blank, the worker uses OpenAI when OPENAI_API_KEY is available and falls back to ElevenLabs otherwise.

| Provider value | Requirements | Notes |
|----------------|--------------|-------|
| openai (default) | OPENAI_API_KEY secret | Uses Whisper (model=whisper-1). |
| elevenlabs | ELEVEN_API_KEY secret | Uses ElevenLabs STT (model_id=eleven_multilingual_v2 by default). |
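
The selection rules described in this section can be sketched as a small function. This is an illustration of the documented behaviour, not the worker's source: `resolveProvider` and `WorkerEnvLike` are hypothetical names, and the handling of unknown values and of a missing key for both providers is an assumption.

```typescript
type Provider = 'openai' | 'elevenlabs';

// Subset of the worker environment relevant to provider selection.
interface WorkerEnvLike {
  TRANSCRIBER?: string;
  OPENAI_API_KEY?: string;
  ELEVEN_API_KEY?: string;
}

function resolveProvider(env: WorkerEnvLike): Provider {
  const requested = (env.TRANSCRIBER ?? '').trim().toLowerCase();

  // Explicit choice: the matching key must be present.
  if (requested === 'openai') {
    if (!env.OPENAI_API_KEY) throw new Error('TRANSCRIBER=openai requires OPENAI_API_KEY');
    return 'openai';
  }
  if (requested === 'elevenlabs') {
    if (!env.ELEVEN_API_KEY) throw new Error('TRANSCRIBER=elevenlabs requires ELEVEN_API_KEY');
    return 'elevenlabs';
  }
  // Assumption: any other non-empty value is treated as a configuration error.
  if (requested !== '') throw new Error(`Unknown TRANSCRIBER value: ${requested}`);

  // Blank: prefer OpenAI, then fall back to ElevenLabs.
  if (env.OPENAI_API_KEY) return 'openai';
  if (env.ELEVEN_API_KEY) return 'elevenlabs';
  // Assumption: having no key at all is also a configuration error.
  throw new Error('No transcription API key configured');
}
```

With both keys set and TRANSCRIBER blank, this resolves to `openai`; with only ELEVEN_API_KEY set, it resolves to `elevenlabs`.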

Optional secrets:

DEFAULT_LANG – hint language for both providers.
ELEVEN_STT_MODEL – override the ElevenLabs model ID if needed.

Development Tips

Use bun run build at the repo root to rebuild all packages, including this hook.
The example app in examples/textarea-basic demonstrates a minimal integration and makes a great starting point for UI experiments.
If you extend the protocol, update both this hook and the worker to keep the frame schema aligned.
