
@gdnaio/react-transcribe-streaming

v0.2.0


A lightweight React hook for real-time speech-to-text using AWS Transcribe Streaming. Captures microphone audio, streams it to AWS Transcribe, and returns a live-updating transcript — all in one hook.

Features

  • Real-time transcription — partial results update as you speak, final results accumulate
  • Single hook API — useTranscribe() gives you everything: transcript, listening, startListening, stopListening
  • Cognito authentication — exchanges a Cognito ID token for temporary AWS credentials automatically, no backend needed
  • Multi-language support — 17+ languages with built-in BCP-47 code mapping
  • Auto cleanup — microphone and AWS resources are released automatically when you stop or when the component unmounts
  • Minimal footprint — just two runtime dependencies (@aws-sdk/client-transcribe-streaming, @aws-sdk/credential-providers), tree-shakeable
  • TypeScript-first — full type definitions included out of the box
  • React 18+ — works with React 18 and React 19
  • Framework agnostic — works with Vite, Next.js, Create React App, and any React setup

Installation

npm install @gdnaio/react-transcribe-streaming
# or
pnpm add @gdnaio/react-transcribe-streaming
# or
yarn add @gdnaio/react-transcribe-streaming

Both ESM and CJS builds are included. TypeScript definitions ship with the package — no separate @types/ install needed.

AWS Setup (One-Time)

Before using the hook, you need three things in AWS: a User Pool, an Identity Pool, and an IAM role. You likely already have a User Pool if your app has Cognito auth. The Identity Pool and IAM role are new.

Step 1: Create a Cognito Identity Pool

  1. Open the Amazon Cognito console
  2. Click Identity pools in the left sidebar, then Create identity pool
  3. Give it a name (e.g., my-app-identity-pool)
  4. Under Authentication providers > Cognito, enter:
    • User Pool ID — your existing User Pool ID (e.g., us-east-1_XXXXXXXXX)
    • App Client ID — the app client ID from your User Pool
  5. Click Create pool
  6. AWS will prompt you to create two IAM roles (authenticated and unauthenticated). Accept the defaults
  7. Note the Identity Pool ID — you'll need it in your app (e.g., us-east-1:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)

Step 2: Add Transcribe Permission to the Authenticated Role

  1. Open the IAM console
  2. Find the authenticated role that was created with the Identity Pool (e.g., Cognito_MyAppAuth_Role)
  3. Click Add permissions > Create inline policy
  4. Switch to the JSON tab and paste:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "transcribe:StartStreamTranscription",
      "Resource": "*"
    }
  ]
}
  5. Name the policy (e.g., TranscribeStreamingAccess) and save

Step 3: Verify Your User Pool

Ensure your app is already obtaining a Cognito ID token for authenticated users. This package needs the ID token (not the access token) to exchange for temporary AWS credentials.

Common auth libraries that provide this:

  • AWS Amplify — fetchAuthSession() returns tokens.idToken
  • @gdnaio/cognito-auth — getIdToken() returns the ID token
  • amazon-cognito-identity-js — getIdToken().getJwtToken()
  • Any OIDC library — the id_token from the token response
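
If you are unsure which token you are holding, Cognito's `token_use` claim distinguishes the two. A minimal, hypothetical check (not part of this package) that decodes the JWT payload without verifying the signature:

```typescript
// Hypothetical helper: returns true only if the JWT looks like a Cognito
// ID token. Decodes the payload; does NOT verify the signature.
function isCognitoIdToken(jwt: string): boolean {
  const parts = jwt.split('.')
  if (parts.length !== 3) return false
  try {
    // base64url -> base64; atob in browsers, Buffer as a Node fallback
    const b64 = parts[1].replace(/-/g, '+').replace(/_/g, '/')
    const g: any = globalThis
    const json: string = g.atob ? g.atob(b64) : g.Buffer.from(b64, 'base64').toString('utf8')
    // Cognito sets token_use to "id" on ID tokens and "access" on access tokens
    return JSON.parse(json).token_use === 'id'
  } catch {
    return false
  }
}
```

Passing an access token here is a common source of credential-exchange failures, so a check like this can save a debugging round-trip.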

Step 4: HTTPS (Production)

Microphone access (getUserMedia) requires a secure context. Your app must be served over HTTPS in production. localhost is exempt for development.
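
In the browser, `window.isSecureContext` is the authoritative signal. As a sketch of the rule it applies (HTTPS, plus localhost during development), here is a pure helper you could use in server-side diagnostics or tests; the name and exact localhost handling are this example's assumptions:

```typescript
// Hypothetical helper mirroring the secure-context rule for an origin string.
// In browser code, prefer checking window.isSecureContext directly.
function isLikelySecureOrigin(origin: string): boolean {
  const { protocol, hostname } = new URL(origin)
  if (protocol === 'https:') return true
  // localhost and loopback are treated as secure for development
  return hostname === 'localhost' || hostname === '127.0.0.1' || hostname.endsWith('.localhost')
}
```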

Quick Start

import { useTranscribe } from '@gdnaio/react-transcribe-streaming'

function SpeechInput({ idToken }: { idToken: string }) {
  const { transcript, listening, startListening, stopListening } = useTranscribe({
    config: {
      region: 'us-east-1',
      identityPoolId: 'us-east-1:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx',
      userPoolId: 'us-east-1_XXXXXXXXX',
      idToken,
    },
    languageCode: 'en-US',
  })

  return (
    <div>
      <button onClick={listening ? stopListening : startListening}>
        {listening ? 'Stop' : 'Start'} Listening
      </button>
      <p>{transcript || 'Click the button and start speaking...'}</p>
    </div>
  )
}

Framework Guides

Vite

Vite exposes environment variables via import.meta.env with a VITE_ prefix.

1. Add environment variables to your .env or .env.local:

VITE_AWS_REGION=us-east-1
VITE_AWS_USER_POOL_ID=us-east-1_XXXXXXXXX
VITE_AWS_IDENTITY_POOL_ID=us-east-1:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

2. Create a voice input component:

// src/components/VoiceInput.tsx
import { useState, useEffect } from 'react'
import { useTranscribe } from '@gdnaio/react-transcribe-streaming'
import { useAuth } from 'your-auth-library'

export default function VoiceInput() {
  const { getIdToken } = useAuth()
  const [idToken, setIdToken] = useState('')

  useEffect(() => {
    getIdToken().then((token) => {
      if (token) setIdToken(token)
    })
  }, [getIdToken])

  const { transcript, listening, startListening, stopListening, resetTranscript, isMicrophoneAvailable } =
    useTranscribe({
      config: {
        region: import.meta.env.VITE_AWS_REGION,
        identityPoolId: import.meta.env.VITE_AWS_IDENTITY_POOL_ID,
        userPoolId: import.meta.env.VITE_AWS_USER_POOL_ID,
        idToken,
      },
      languageCode: 'en-US',
    })

  return (
    <div>
      <textarea value={transcript} readOnly rows={4} placeholder="Speak into your microphone..." />
      <div>
        <button onClick={listening ? stopListening : startListening} disabled={!isMicrophoneAvailable}>
          {listening ? 'Stop' : 'Record'}
        </button>
        <button onClick={resetTranscript}>Clear</button>
      </div>
    </div>
  )
}

3. Use it in your app:

// src/App.tsx
import VoiceInput from './components/VoiceInput'

function App() {
  return (
    <div>
      <h1>Voice Input Demo</h1>
      <VoiceInput />
    </div>
  )
}

Next.js

This package uses browser-only APIs (getUserMedia, AudioContext). In Next.js, you must mark the component as client-side.

1. Add environment variables to your .env.local:

NEXT_PUBLIC_AWS_REGION=us-east-1
NEXT_PUBLIC_AWS_USER_POOL_ID=us-east-1_XXXXXXXXX
NEXT_PUBLIC_AWS_IDENTITY_POOL_ID=us-east-1:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

2. Create a client component (note the 'use client' directive):

// components/VoiceInput.tsx
'use client'

import { useState, useEffect } from 'react'
import { useTranscribe } from '@gdnaio/react-transcribe-streaming'

interface VoiceInputProps {
  idToken: string
  languageCode?: string
}

export default function VoiceInput({ idToken, languageCode = 'en-US' }: VoiceInputProps) {
  const { transcript, listening, startListening, stopListening, resetTranscript, isMicrophoneAvailable } =
    useTranscribe({
      config: {
        region: process.env.NEXT_PUBLIC_AWS_REGION!,
        identityPoolId: process.env.NEXT_PUBLIC_AWS_IDENTITY_POOL_ID!,
        userPoolId: process.env.NEXT_PUBLIC_AWS_USER_POOL_ID!,
        idToken,
      },
      languageCode,
    })

  return (
    <div>
      <p>{transcript || 'Click the button and start speaking...'}</p>
      <button onClick={listening ? stopListening : startListening} disabled={!isMicrophoneAvailable}>
        {listening ? 'Stop Recording' : 'Start Recording'}
      </button>
      <button onClick={resetTranscript}>Clear</button>
    </div>
  )
}

3. Use it in a page (App Router). Note that with the App Router, dynamic(..., { ssr: false }) is only allowed inside Client Components, so the page (or a wrapper component) needs the 'use client' directive as well:

// app/page.tsx
'use client'

import dynamic from 'next/dynamic'

// Dynamic import with SSR disabled — the hook uses browser APIs
const VoiceInput = dynamic(() => import('@/components/VoiceInput'), { ssr: false })

export default function Page() {
  const idToken = '...' // Get from your auth layer (context, cookie, etc.)

  return (
    <main>
      <h1>Voice Input</h1>
      <VoiceInput idToken={idToken} />
    </main>
  )
}

Important: If you see ReferenceError: navigator is not defined or AudioContext is not defined, it means the component is being rendered on the server. Either:

  • Add 'use client' at the top of the file, or
  • Use dynamic(() => import(...), { ssr: false }) to disable SSR for that component

Create React App

CRA exposes environment variables via process.env with a REACT_APP_ prefix.

1. Add environment variables to your .env:

REACT_APP_AWS_REGION=us-east-1
REACT_APP_AWS_USER_POOL_ID=us-east-1_XXXXXXXXX
REACT_APP_AWS_IDENTITY_POOL_ID=us-east-1:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

2. Create the component:

// src/components/VoiceInput.tsx
import { useTranscribe } from '@gdnaio/react-transcribe-streaming'

export default function VoiceInput({ idToken }: { idToken: string }) {
  const { transcript, listening, startListening, stopListening, resetTranscript, isMicrophoneAvailable } =
    useTranscribe({
      config: {
        region: process.env.REACT_APP_AWS_REGION!,
        identityPoolId: process.env.REACT_APP_AWS_IDENTITY_POOL_ID!,
        userPoolId: process.env.REACT_APP_AWS_USER_POOL_ID!,
        idToken,
      },
      languageCode: 'en-US',
    })

  return (
    <div>
      <p>{transcript || 'Click the button and start speaking...'}</p>
      <button onClick={listening ? stopListening : startListening} disabled={!isMicrophoneAvailable}>
        {listening ? 'Stop' : 'Record'}
      </button>
      <button onClick={resetTranscript}>Clear</button>
    </div>
  )
}

API Reference

useTranscribe(options)

The main hook. Call it at the top level of your React component.

Parameters

interface UseTranscribeOptions {
  config: TranscribeConfig
  languageCode?: string  // Default: "en-US"
}

interface TranscribeConfig {
  region: string          // AWS region, e.g. "us-east-1"
  identityPoolId: string  // Cognito Identity Pool ID
  userPoolId: string      // Cognito User Pool ID
  idToken: string         // Cognito ID token from your auth layer
}

| Param | Type | Required | Description |
|-------|------|----------|-------------|
| config.region | string | Yes | AWS region where your Identity Pool and Transcribe are available |
| config.identityPoolId | string | Yes | Cognito Identity Pool ID (e.g., us-east-1:xxxxxxxx-xxxx-...) |
| config.userPoolId | string | Yes | Cognito User Pool ID (e.g., us-east-1_XXXXXXXXX) |
| config.idToken | string | Yes | A valid Cognito ID token for the authenticated user |
| languageCode | string | No | BCP-47 language code. Defaults to "en-US". See Supported Languages |
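
As an illustration (not part of the package's API), a small sanity check can catch swapped or malformed IDs before they reach AWS, where the resulting errors are cryptic. The regexes encode the ID formats shown above and are this example's assumptions:

```typescript
// Hypothetical pre-flight validation for the config object.
interface TranscribeConfig {
  region: string
  identityPoolId: string
  userPoolId: string
  idToken: string
}

function validateConfig(c: TranscribeConfig): string[] {
  const errors: string[] = []
  // Identity Pool IDs look like "<region>:<guid>", e.g. "us-east-1:xxxxxxxx-xxxx-..."
  if (!/^[a-z]{2}-[a-z]+-\d:[0-9a-f-]{36}$/i.test(c.identityPoolId))
    errors.push('identityPoolId does not look like "<region>:<guid>"')
  // User Pool IDs look like "<region>_<id>", e.g. "us-east-1_XXXXXXXXX"
  if (!/^[a-z]{2}-[a-z]+-\d_\w+$/i.test(c.userPoolId))
    errors.push('userPoolId does not look like "<region>_<id>"')
  // An ID token is a JWT: three dot-separated segments
  if (c.idToken.split('.').length !== 3)
    errors.push('idToken does not look like a JWT')
  return errors
}
```

A check like this is most useful in development builds, where a swapped Identity Pool / User Pool ID is the typical mistake.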

Return Value

interface UseTranscribeReturn {
  transcript: string
  listening: boolean
  isMicrophoneAvailable: boolean
  startListening: () => Promise<void>
  stopListening: () => Promise<void>
  abortListening: () => void
  resetTranscript: () => void
}

| Property | Type | Description |
|----------|------|-------------|
| transcript | string | The current transcription text. Updates in real-time with partial (interim) results and accumulates final results. Resets when startListening is called again |
| listening | boolean | true while the microphone is active and audio is being streamed to Transcribe |
| isMicrophoneAvailable | boolean | Starts as true. Becomes false if the user denies microphone permission |
| startListening | () => Promise<void> | Requests mic access, starts audio capture, connects to Transcribe, and begins streaming. Clears any previous transcript. If already listening, this is a no-op |
| stopListening | () => Promise<void> | Gracefully stops audio capture, closes the Transcribe WebSocket stream, and releases the microphone. Safe to call even if not listening |
| abortListening | () => void | Synchronous, fire-and-forget version of stopListening. Useful in cleanup code or event handlers where you can't await |
| resetTranscript | () => void | Clears the transcript to an empty string without stopping the mic |

Exported Types

import type {
  TranscribeConfig,
  UseTranscribeOptions,
  UseTranscribeReturn,
} from '@gdnaio/react-transcribe-streaming'

Usage Examples

Getting the ID Token

The hook requires a Cognito ID token. Here's how to get it from common auth libraries:

AWS Amplify v6:

import { fetchAuthSession } from 'aws-amplify/auth'

const session = await fetchAuthSession()
const idToken = session.tokens?.idToken?.toString() ?? ''

AWS Amplify v5:

import { Auth } from 'aws-amplify'

const session = await Auth.currentSession()
const idToken = session.getIdToken().getJwtToken()

@gdnaio/cognito-auth:

import { useAuth } from '@gdnaio/cognito-auth'

const { getIdToken } = useAuth()
const idToken = await getIdToken()

amazon-cognito-identity-js:

const cognitoUser = userPool.getCurrentUser()
cognitoUser.getSession((err, session) => {
  const idToken = session.getIdToken().getJwtToken()
})
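
Since getSession uses a Node-style callback, you may want to promisify it before feeding the token to the hook. A hypothetical wrapper, generic over any (err, session) callback:

```typescript
// Hypothetical helper: turns a Node-style (err, session) callback API,
// like cognitoUser.getSession, into a Promise you can await.
function sessionToPromise<S>(
  getSession: (cb: (err: Error | null, session: S | null) => void) => void
): Promise<S> {
  return new Promise((resolve, reject) => {
    getSession((err, session) => {
      if (err || !session) reject(err ?? new Error('No session'))
      else resolve(session)
    })
  })
}
```

Usage might look like `const session = await sessionToPromise((cb) => cognitoUser.getSession(cb))`, followed by `session.getIdToken().getJwtToken()` as above.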

Complete Working Example

A full component with token fetching, error states, and visual feedback:

import { useState, useEffect } from 'react'
import { useTranscribe } from '@gdnaio/react-transcribe-streaming'

const CONFIG = {
  region: 'us-east-1',
  identityPoolId: 'us-east-1:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx',
  userPoolId: 'us-east-1_XXXXXXXXX',
}

interface Props {
  getIdToken: () => Promise<string | null>
  language?: string
}

export default function VoiceTranscriber({ getIdToken, language = 'en-US' }: Props) {
  const [idToken, setIdToken] = useState('')

  useEffect(() => {
    getIdToken().then((token) => {
      if (token) setIdToken(token)
    })
  }, [getIdToken])

  const {
    transcript,
    listening,
    isMicrophoneAvailable,
    startListening,
    stopListening,
    resetTranscript,
  } = useTranscribe({
    config: { ...CONFIG, idToken },
    languageCode: language,
  })

  if (!isMicrophoneAvailable) {
    return <p>Microphone access was denied. Please allow microphone access and reload the page.</p>
  }

  return (
    <div>
      <div style={{ minHeight: 60, padding: 12, border: '1px solid #ccc', borderRadius: 8 }}>
        {transcript || <span style={{ color: '#999' }}>Click Record and start speaking...</span>}
      </div>

      <div style={{ marginTop: 8, display: 'flex', gap: 8 }}>
        <button
          onClick={listening ? stopListening : startListening}
          style={{
            padding: '8px 16px',
            background: listening ? '#ea4335' : '#1a73e8',
            color: 'white',
            border: 'none',
            borderRadius: 4,
            cursor: 'pointer',
          }}
        >
          {listening ? 'Stop Recording' : 'Start Recording'}
        </button>
        <button onClick={resetTranscript} style={{ padding: '8px 16px' }}>
          Clear
        </button>
      </div>

      {listening && (
        <p style={{ color: '#ea4335', marginTop: 8 }}>
          Listening...
        </p>
      )}
    </div>
  )
}

Dynamic Language Switching

import { useState } from 'react'
import { useTranscribe } from '@gdnaio/react-transcribe-streaming'

function MultiLanguageInput() {
  const [language, setLanguage] = useState('en-US')
  const { transcript, listening, startListening, stopListening } = useTranscribe({
    config: { /* ... */ },
    languageCode: language,
  })

  return (
    <div>
      <select value={language} onChange={(e) => setLanguage(e.target.value)}>
        <option value="en-US">English</option>
        <option value="ar-SA">Arabic</option>
        <option value="fr-FR">French</option>
        <option value="es-US">Spanish</option>
        <option value="hi-IN">Hindi</option>
      </select>
      <button onClick={listening ? stopListening : startListening}>
        {listening ? 'Stop' : 'Record'}
      </button>
      <p>{transcript}</p>
    </div>
  )
}

Note: Changing languageCode while actively listening does not restart the stream automatically. Stop and start listening again to switch languages mid-session.

Appending Speech to Existing Text

import { useState, useRef, useEffect } from 'react'
import { useTranscribe } from '@gdnaio/react-transcribe-streaming'

function ChatInput() {
  const [text, setText] = useState('')
  const baseTextRef = useRef(text)
  const { transcript, listening, startListening, stopListening } =
    useTranscribe({ config: { /* ... */ } })

  // When starting, capture the current text as the base
  const handleStart = async () => {
    baseTextRef.current = text
    await startListening()
  }

  // Append transcript to the base text
  useEffect(() => {
    if (listening && transcript) {
      setText(baseTextRef.current + transcript)
    }
  }, [transcript, listening])

  return (
    <div>
      <input value={text} onChange={(e) => setText(e.target.value)} />
      <button onClick={listening ? stopListening : handleStart}>
        {listening ? 'Stop' : 'Mic'}
      </button>
    </div>
  )
}

Supported Languages

The hook includes a built-in language code mapper. Pass any of these BCP-47 codes as languageCode:

| Code | Language | Transcribe Code |
|------|----------|-----------------|
| en-US | English (US) | en-US |
| en-GB | English (UK) | en-GB |
| en-AU | English (Australia) | en-AU |
| ar-SA | Arabic (Saudi Arabia) | ar-SA |
| ar-AE | Arabic (UAE) | ar-AE |
| fr-FR | French (France) | fr-FR |
| fr-CA | French (Canada) | fr-CA |
| es-ES | Spanish (Spain) | es-ES |
| es-US | Spanish (US) | es-US |
| es-LA | Spanish (Latin America) | es-US * |
| de-DE | German | de-DE |
| it-IT | Italian | it-IT |
| pt-BR | Portuguese (Brazil) | pt-BR |
| ja-JP | Japanese | ja-JP |
| ko-KR | Korean | ko-KR |
| zh-CN | Chinese (Mandarin) | zh-CN |
| hi-IN | Hindi | hi-IN |

* es-LA is mapped to es-US since AWS Transcribe doesn't have a dedicated Latin American Spanish code.

Any unlisted code is passed through to Transcribe as-is. See AWS Transcribe Streaming supported languages for the full list of supported codes.
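
The alias-then-pass-through behavior described above can be sketched as a plain lookup. This is an assumed shape, not the package's actual source:

```typescript
// Sketch of the language mapper: known aliases are remapped, anything
// else is passed through to Transcribe unchanged.
const LANGUAGE_ALIASES: Record<string, string> = {
  // AWS Transcribe has no dedicated Latin American Spanish code
  'es-LA': 'es-US',
}

function toTranscribeLanguageCode(code: string): string {
  return LANGUAGE_ALIASES[code] ?? code
}
```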

How It Works

User clicks "Start"
      |
      v
getUserMedia() ---- requests microphone access
      |
      v
AudioContext + ScriptProcessorNode ---- captures raw audio at native sample rate
      |
      v
Float32 -> Int16 PCM (little-endian) ---- converts to Transcribe-compatible format
      |
      v
fromCognitoIdentityPool() ---- exchanges Cognito ID token for temporary AWS credentials
      |
      v
TranscribeStreamingClient.send() ---- opens WebSocket to AWS Transcribe
      |
      v
Audio chunks streamed as async generator ---- yields { AudioEvent: { AudioChunk } }
      |
      v
TranscriptResultStream (async iterable) ---- receives partial and final transcript events
      |
      v
React state updates ---- transcript updates in real-time

  1. Microphone capture — calls getUserMedia with echo cancellation and noise suppression enabled, creates an AudioContext and ScriptProcessorNode to capture raw audio frames
  2. PCM encoding — converts Float32 audio samples to 16-bit signed integer PCM in little-endian byte order, the format AWS Transcribe expects
  3. Credential exchange — uses fromCognitoIdentityPool from @aws-sdk/credential-providers to exchange the Cognito ID token for temporary AWS credentials scoped to transcribe:StartStreamTranscription
  4. WebSocket streaming — creates a TranscribeStreamingClient and sends a StartStreamTranscriptionCommand with the audio stream as an async generator. Transcribe opens a WebSocket connection under the hood
  5. Transcript processing — iterates the TranscriptResultStream async iterable. Partial results update the transcript state immediately (giving real-time feedback). Final results are accumulated so the full transcript builds up over the session
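
Step 2, the PCM encoding, can be sketched as a standalone function. The exact implementation inside the hook is an assumption here; the output format (16-bit signed little-endian PCM) is what Transcribe expects:

```typescript
// Convert Float32 audio samples (range [-1, 1], as produced by the Web
// Audio API) to 16-bit signed little-endian PCM bytes.
function floatTo16BitPCM(samples: Float32Array): Uint8Array {
  const out = new DataView(new ArrayBuffer(samples.length * 2))
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range
    const s = Math.max(-1, Math.min(1, samples[i]))
    out.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true) // true = little-endian
  }
  return new Uint8Array(out.buffer)
}
```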

Cleanup and Resource Management

The hook automatically cleans up resources when:

  • You call stopListening() or abortListening()
  • An error occurs (network drop, credential expiry, etc.)
  • The Transcribe stream ends naturally

Cleanup includes:

  • Disconnecting the ScriptProcessorNode and AudioContext
  • Stopping all microphone MediaStream tracks (the browser's mic indicator turns off)
  • Aborting the Transcribe WebSocket stream
  • Setting listening to false

Note: The hook does not auto-stop on component unmount. If you need that, call abortListening in a cleanup effect:

useEffect(() => {
  return () => abortListening()
}, [abortListening])

Browser Compatibility

| Browser | Minimum Version | Notes |
|---------|----------------|-------|
| Chrome | 66+ | Full support |
| Firefox | 76+ | Full support |
| Safari | 14.1+ | Full support |
| Edge | 79+ | Full support (Chromium-based) |
| Mobile Chrome | 66+ | Full support |
| Mobile Safari | 14.5+ | Full support |

Requires navigator.mediaDevices.getUserMedia and AudioContext APIs. Both are available in all modern browsers. HTTPS is required in production (localhost is exempt for development).

Error Handling

The hook handles errors internally and never throws. All errors are logged to console.error with a [useTranscribe] prefix.

| Scenario | What Happens |
|----------|-------------|
| Microphone permission denied | isMicrophoneAvailable becomes false. You can use this to show a message or disable the button |
| Microphone already in use | startListening fails silently, error logged. listening stays false |
| Network disconnection | The Transcribe WebSocket closes. listening becomes false. Last transcript is preserved. User can click to restart |
| Expired or invalid ID token | Credential exchange fails, error logged. listening becomes false. Refresh the token and try again |
| AWS service error | Error logged. listening becomes false. Check the console for details |
| Unsupported browser | getUserMedia throws, caught and logged. listening stays false |

Bundle Size

The package itself is small (~6 KB). However, the AWS SDK dependencies add to the bundle:

| Dependency | Approximate Size (gzipped) |
|------------|---------------------------|
| @aws-sdk/client-transcribe-streaming | ~30 KB |
| @aws-sdk/credential-providers | ~20 KB |
| Total addition | ~50 KB gzipped |

The AWS SDK v3 is tree-shakeable — only the Transcribe Streaming client and Cognito credential provider are included in your bundle, not the entire SDK.

Troubleshooting

"Microphone not available" / isMicrophoneAvailable is false

  • The user denied microphone permission in the browser
  • Fix: Click the lock/camera icon in the browser's address bar, allow microphone access, and reload the page

"NotAllowedError" in console

  • Your app is served over HTTP (not HTTPS) in production
  • Fix: Serve your app over HTTPS. localhost is exempt for development

"No audio is being captured" / transcript stays empty

  • Check that your AWS credentials are valid — look for errors in the browser console
  • Verify the Identity Pool ID, User Pool ID, and region are correct
  • Ensure the IAM role has transcribe:StartStreamTranscription permission
  • Try a different microphone or check system audio settings

"CredentialsProviderError" or "Not authorized"

  • The Cognito ID token may be expired. Refresh it before calling startListening
  • The Identity Pool may not be linked to your User Pool. Verify the authentication provider in the Identity Pool settings
  • The authenticated IAM role may be missing the Transcribe permission
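
To rule out the expired-token case before calling startListening, you can decode the JWT's exp claim (seconds since epoch). A hypothetical helper, with no signature verification; Node's Buffer is used as a fallback when atob is unavailable:

```typescript
// Hypothetical pre-check: true if the token is malformed, missing exp,
// or past its expiry. Decodes the payload only; does NOT verify the JWT.
function isTokenExpired(jwt: string, nowMs: number = Date.now()): boolean {
  const parts = jwt.split('.')
  if (parts.length !== 3) return true
  try {
    // base64url -> base64; atob in browsers, Buffer as a Node fallback
    const b64 = parts[1].replace(/-/g, '+').replace(/_/g, '/')
    const g: any = globalThis
    const json: string = g.atob ? g.atob(b64) : g.Buffer.from(b64, 'base64').toString('utf8')
    const exp = JSON.parse(json).exp
    // exp is seconds since epoch; treat a missing claim as expired
    return typeof exp !== 'number' || exp * 1000 <= nowMs
  } catch {
    return true
  }
}
```

If this returns true, refresh the session with your auth library before starting the stream.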

Next.js: "navigator is not defined" or "AudioContext is not defined"

  • The component is being server-side rendered. Browser APIs don't exist on the server
  • Fix: Add 'use client' at the top of the file, or use dynamic(() => import(...), { ssr: false })

Transcript has long delays

  • This is usually a network latency issue. AWS Transcribe Streaming requires a stable connection
  • Partial results should appear within 200-500ms of speaking. If they don't, check your network connection
  • The sample rate is auto-detected from your AudioContext. Higher sample rates (48kHz) produce better quality but more data

Language not recognized correctly

  • Ensure you're passing the correct BCP-47 code (e.g., ar-SA, not ar or arabic)
  • Some languages require region-specific codes. Check the Supported Languages table
  • Changing languageCode while listening does not take effect until you stop and restart

License

MIT