

@gdnaio/react-polly-text-to-speech

React hook for text-to-speech using Amazon Polly with secure Cognito-based authentication.

No API keys in the browser. No third-party TTS services. Just AWS Polly called securely through Cognito Identity Pool credentials — the same pattern used by @gdnaio/react-transcribe-streaming.

Features

  • Single-hook API — usePollyTextToSpeech returns speak/stop controls, loading/playing state, and audio data
  • Secure by default — Uses Cognito Identity Pool for temporary AWS credentials (no keys in frontend)
  • All Polly engines — Standard, Neural, Long-form, and Generative
  • SSML builder — Built-in ssml utility for pauses, emphasis, prosody, whispering, and more
  • Voice catalogue — Curated voice map with getVoicesByLanguage() and getVoiceInfo() helpers
  • Configurable output — MP3, OGG, PCM with custom sample rates
  • No hidden audio — The hook does NOT create its own Audio object; you control playback via your own <audio> element and the ref callback
  • Client caching — The PollyClient and AWS credentials are cached at module level across all hook instances; only the very first call triggers Cognito round-trips
  • Audio Blob exposed — Use audioUrl with your own <audio> player or download the blob
  • TypeScript — Full type definitions included
  • Lightweight — Only depends on @aws-sdk/client-polly and @aws-sdk/credential-providers

Installation

npm install @gdnaio/react-polly-text-to-speech

Both ESM and CommonJS builds are included. TypeScript declarations ship with the package.

AWS Prerequisites

You need three things (identical setup to @gdnaio/react-transcribe-streaming):

1. Cognito User Pool

Your existing User Pool for authenticating users.

2. Cognito Identity Pool

Link it to your User Pool. This provides temporary AWS credentials to the browser.

3. IAM Role for Authenticated Users

Attach this inline policy to the Identity Pool's authenticated role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["polly:SynthesizeSpeech"],
      "Resource": "*"
    }
  ]
}

Tip: If you already have a Transcribe Identity Pool stack, you can add the polly:SynthesizeSpeech permission to the same IAM role and reuse the Identity Pool.
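For example, a merged inline policy covering both packages might look like the following. (The transcribe action shown is an assumption based on that package's streaming use case — check its docs for the exact permissions it needs.)

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "polly:SynthesizeSpeech",
        "transcribe:StartStreamTranscription"
      ],
      "Resource": "*"
    }
  ]
}
```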

Quick Start

import { usePollyTextToSpeech } from '@gdnaio/react-polly-text-to-speech'

function TextToSpeechButton({ idToken }: { idToken: string }) {
  const { speak, stop, ref, loading, playing, error, audioUrl } = usePollyTextToSpeech({
    config: {
      region: 'us-east-1',
      identityPoolId: 'us-east-1:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx',
      userPoolId: 'us-east-1_XXXXXXXXX',
      idToken, // from your auth provider
    },
  })

  return (
    <div>
      <button onClick={() => speak('Hello! Welcome to our application.')} disabled={loading}>
        {loading ? 'Generating...' : 'Speak'}
      </button>
      {playing && <button onClick={stop}>Stop</button>}
      {error && <p style={{ color: 'red' }}>{error}</p>}

      {/* IMPORTANT: attach ref so the hook can track play/pause/duration */}
      {audioUrl && <audio ref={ref} autoPlay controls src={audioUrl} />}
    </div>
  )
}

API Reference

usePollyTextToSpeech(options)

Options

{
  // Required — AWS Cognito credentials
  config: {
    region: string            // AWS region (e.g. "us-east-1")
    identityPoolId: string    // Cognito Identity Pool ID
    userPoolId: string        // Cognito User Pool ID
    idToken: string           // JWT ID token from your auth provider
  },

  // Optional — voice settings
  voice: {
    voiceId?: string          // Polly voice (default: "Joanna")
    engine?: PollyEngine      // "standard" | "neural" | "long-form" | "generative" (default: "neural")
    languageCode?: string     // Only needed for bilingual voices (e.g. "hi-IN" for Aditi)
  },

  // Optional — audio output settings
  audio: {
    format?: PollyOutputFormat    // "mp3" | "ogg_vorbis" | "pcm" (default: "mp3")
    sampleRate?: PollySampleRate  // "8000" | "16000" | "22050" | "24000" | "44100" | "48000"
    lexiconNames?: string[]       // Custom pronunciation lexicons (max 5)
    speechMarkTypes?: PollySpeechMarkType[]  // "sentence" | "ssml" | "viseme" | "word"
  }
}

Return Value

{
  speak: (text: string, textType?: 'text' | 'ssml') => Promise<void>
  stop: () => void
  ref: (el: HTMLAudioElement | null) => void  // callback ref for your <audio> element
  loading: boolean          // true while API call is in-flight
  playing: boolean          // true while audio is playing
  error: string | null      // last error message, or null
  audioBlob: Blob | null    // raw audio Blob from last synthesis
  audioUrl: string | null   // Object URL for <audio src> usage
  duration: number | null   // audio duration in seconds (after metadata loads)
}

Configuring Voices

// Neural voice (default) — natural sounding
const tts = usePollyTextToSpeech({
  config,
  voice: { voiceId: 'Matthew', engine: 'neural' },
})

// Generative voice — most expressive
const tts = usePollyTextToSpeech({
  config,
  voice: { voiceId: 'Ruth', engine: 'generative' },
})

// Long-form voice — optimised for articles/stories
const tts = usePollyTextToSpeech({
  config,
  voice: { voiceId: 'Danielle', engine: 'long-form' },
})

// Spanish voice
const tts = usePollyTextToSpeech({
  config,
  voice: { voiceId: 'Lupe', engine: 'neural', languageCode: 'es-US' },
})

Using SSML

For fine-grained control over speech output, use the built-in ssml builder:

import { usePollyTextToSpeech, ssml } from '@gdnaio/react-polly-text-to-speech'

function SsmlExample() {
  const { speak } = usePollyTextToSpeech({ config })

  const handleSpeak = () => {
    const text = ssml.speak(
      ssml.sentence('Hello there!') +
      ssml.pause('500ms') +
      ssml.emphasis('This is really important.', 'strong') +
      ssml.pause('300ms') +
      ssml.prosody('And this part is spoken slowly.', { rate: 'slow' }) +
      ssml.pause('200ms') +
      ssml.whisper('This is a secret.')
    )

    speak(text, 'ssml')
  }

  return <button onClick={handleSpeak}>Speak with SSML</button>
}

SSML Builder Methods

| Method | Description | Example |
|--------|-------------|---------|
| ssml.speak(content) | Wrap in <speak> root | ssml.speak('Hello') |
| ssml.pause(time) | Insert a break | ssml.pause('500ms') |
| ssml.emphasis(text, level) | Emphasise text | ssml.emphasis('wow', 'strong') |
| ssml.prosody(text, opts) | Control rate/pitch/volume | ssml.prosody('slow', { rate: 'slow' }) |
| ssml.paragraph(text) | Paragraph with natural pause | ssml.paragraph('First para.') |
| ssml.sentence(text) | Sentence boundary | ssml.sentence('A sentence.') |
| ssml.sayAs(text, type) | Interpret as date/number/etc | ssml.sayAs('2025', 'cardinal') |
| ssml.phoneme(text, ph) | Phonemic pronunciation | ssml.phoneme('pecan', 'pɪˈkɑːn') |
| ssml.sub(text, alias) | Substitution | ssml.sub('AWS', 'Amazon Web Services') |
| ssml.lang(text, lang) | Switch language mid-speech | ssml.lang('Bonjour', 'fr-FR') |
| ssml.whisper(text) | Whispering voice | ssml.whisper('secret') |
| ssml.amazonEffect(text, name) | Polly-specific effects | ssml.amazonEffect('news', 'drc') |
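As a mental model, these helpers are thin string builders over standard SSML tags. A toy sketch of that idea (illustrative only — not the package's source, and the real output may differ in detail):

```typescript
// Toy SSML builder mirroring the documented helper names.
// The tag and attribute names are standard SSML; treat the rest as a sketch.
const ssml = {
  speak: (content: string) => `<speak>${content}</speak>`,
  pause: (time: string) => `<break time="${time}"/>`,
  emphasis: (text: string, level: string = 'moderate') =>
    `<emphasis level="${level}">${text}</emphasis>`,
  whisper: (text: string) =>
    `<amazon:effect name="whispered">${text}</amazon:effect>`,
}

const doc = ssml.speak(
  ssml.emphasis('Hi', 'strong') + ssml.pause('300ms') + ssml.whisper('secret')
)
console.log(doc)
// → <speak><emphasis level="strong">Hi</emphasis><break time="300ms"/><amazon:effect name="whispered">secret</amazon:effect></speak>
```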

Voice Discovery

Browse available voices with the included catalogue:

import { getVoicesByLanguage, getVoiceInfo, POLLY_VOICES } from '@gdnaio/react-polly-text-to-speech'

// Get all English (US) voices
const usVoices = getVoicesByLanguage('en-US')
// → [{ voiceId: 'Joanna', name: 'Joanna', gender: 'Female', engines: [...] }, ...]

// Look up a specific voice
const info = getVoiceInfo('Matthew')
// → { voiceId: 'Matthew', name: 'Matthew', gender: 'Male', engines: ['neural', 'standard'] }

// Access the full catalogue
console.log(Object.keys(POLLY_VOICES))
// → ['en-US', 'en-GB', 'en-AU', 'en-IN', 'es-US', 'es-ES', 'fr-FR', ...]
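Building on the catalogue's documented shape, you could pick a voice that supports a particular engine. (firstVoiceFor is a hypothetical helper, not a package export; the sample data below copies the README's en-US table.)

```typescript
// Shape of a catalogue entry as documented above.
type VoiceInfo = { voiceId: string; name: string; gender: string; engines: string[] }

// Return the first voice in the list that supports the requested engine, or null.
function firstVoiceFor(voices: VoiceInfo[], engine: string): string | null {
  const match = voices.find((v) => v.engines.includes(engine))
  return match ? match.voiceId : null
}

// Sample data matching the README's en-US voice table:
const enUS: VoiceInfo[] = [
  { voiceId: 'Joanna', name: 'Joanna', gender: 'Female', engines: ['neural', 'standard', 'long-form'] },
  { voiceId: 'Ruth', name: 'Ruth', gender: 'Female', engines: ['neural', 'long-form', 'generative'] },
]

console.log(firstVoiceFor(enUS, 'generative')) // → Ruth
```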

Using ref with Your Own Audio Player

The hook does not play audio internally. It synthesises audio and returns audioUrl — you render the <audio> element and pass the ref callback so the hook can track playing, duration, and stop() state.

function CustomPlayer() {
  const { speak, stop, ref, audioUrl, playing, duration, loading } = usePollyTextToSpeech({ config })

  return (
    <div>
      <button onClick={() => speak('Hello world')} disabled={loading}>
        Generate Audio
      </button>
      {playing && <button onClick={stop}>Stop</button>}

      {/* ref is required — without it, playing/duration/stop() won't work */}
      {audioUrl && <audio ref={ref} autoPlay controls src={audioUrl} />}

      {duration && <p>Duration: {duration.toFixed(1)}s</p>}
    </div>
  )
}

Why ref matters

The ref callback is how the hook connects to the actual <audio> DOM element. Without it:

  • playing will always be false
  • duration will always be null
  • stop() will have no effect (there's no element to pause)

// WRONG — hook can't track the audio element
{audioUrl && <audio autoPlay controls src={audioUrl} />}

// CORRECT — hook is connected to the element
{audioUrl && <audio ref={ref} autoPlay controls src={audioUrl} />}

Using with Vite

const { speak } = usePollyTextToSpeech({
  config: {
    region: import.meta.env.VITE_AWS_REGION,
    identityPoolId: import.meta.env.VITE_AWS_IDENTITY_POOL_ID,
    userPoolId: import.meta.env.VITE_AWS_USER_POOL_ID,
    idToken: token, // from your auth hook
  },
  voice: { voiceId: 'Joanna', engine: 'neural' },
  audio: { format: 'mp3' },
})

Using with Next.js

Since this hook uses browser APIs (Audio, URL.createObjectURL), keep it in a client component. With the App Router, mark the component with the 'use client' directive (as below); with the Pages Router, load the component via next/dynamic with ssr: false.

// components/TtsButton.tsx
'use client'

import { usePollyTextToSpeech } from '@gdnaio/react-polly-text-to-speech'

export function TtsButton({ idToken }: { idToken: string }) {
  const { speak, ref, audioUrl, loading } = usePollyTextToSpeech({
    config: {
      region: process.env.NEXT_PUBLIC_AWS_REGION!,
      identityPoolId: process.env.NEXT_PUBLIC_AWS_IDENTITY_POOL_ID!,
      userPoolId: process.env.NEXT_PUBLIC_AWS_USER_POOL_ID!,
      idToken,
    },
  })

  return (
    <div>
      <button onClick={() => speak('Hello from Next.js')} disabled={loading}>
        Speak
      </button>
      {audioUrl && <audio ref={ref} autoPlay controls src={audioUrl} />}
    </div>
  )
}

Token Retrieval

The hook needs a Cognito ID token (not an access token). Common patterns:

// @gdnaio/cognito-auth
const { getIdToken } = useAuth()
const token = await getIdToken()

// AWS Amplify v6
import { fetchAuthSession } from 'aws-amplify/auth'
const { tokens } = await fetchAuthSession()
const token = tokens?.idToken?.toString()

// amazon-cognito-identity-js
cognitoUser.getSession((err, session) => {
  const token = session.getIdToken().getJwtToken()
})
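Whichever provider you use, an expired token is a common failure mode. A pre-flight check can decode the JWT payload and see how much lifetime the token has left before handing it to the hook. (This helper is hypothetical, not part of the package; it does not verify the signature, only reads the exp claim.)

```typescript
// Hypothetical helper: seconds until the ID token's exp claim.
// A JWT is header.payload.signature, each part base64url-encoded.
function tokenSecondsRemaining(idToken: string, nowMs: number = Date.now()): number {
  const payloadB64 = idToken.split('.')[1]
  // Convert base64url to standard base64 before decoding.
  const json = Buffer.from(
    payloadB64.replace(/-/g, '+').replace(/_/g, '/'),
    'base64'
  ).toString('utf8')
  const { exp } = JSON.parse(json) as { exp: number }
  return exp - Math.floor(nowMs / 1000)
}

// Usage sketch: refresh before passing a nearly-expired token.
// if (tokenSecondsRemaining(token) < 60) token = await getIdToken()
```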

Audio Output Formats

| Format | MIME Type | Use Case |
|--------|-----------|----------|
| mp3 (default) | audio/mpeg | Best browser compatibility, small file size |
| ogg_vorbis | audio/ogg | Open format, good quality-to-size ratio |
| pcm | audio/pcm | Raw audio for processing pipelines |

Supported Voices

The built-in POLLY_VOICES catalogue includes the following voices. You can pass any valid Polly VoiceId directly to the hook — the catalogue is a convenience helper, not a restriction.

For the full and most up-to-date list of all voices, engines, and languages, see the Amazon Polly Voice List.

English (US) — en-US

| Voice | Gender | Engines |
| --- | --- | --- |
| Joanna | Female | neural, standard, long-form |
| Matthew | Male | neural, standard |
| Ruth | Female | neural, long-form, generative |
| Stephen | Male | neural, long-form, generative |
| Danielle | Female | neural, long-form, generative |
| Gregory | Male | neural, long-form, generative |
| Ivy | Female | neural, standard |
| Kendra | Female | neural, standard |
| Kimberly | Female | neural, standard |
| Salli | Female | neural, standard |
| Joey | Male | neural, standard |
| Justin | Male | neural, standard |
| Kevin | Male | neural, standard |

English (GB) — en-GB

| Voice | Gender | Engines |
| --- | --- | --- |
| Amy | Female | neural, standard |
| Emma | Female | neural, standard |
| Brian | Male | neural, standard |
| Arthur | Male | neural |

English (AU) — en-AU

| Voice | Gender | Engines |
| --- | --- | --- |
| Olivia | Female | neural |
| Nicole | Female | standard |
| Russell | Male | standard |

English (IN) — en-IN

| Voice | Gender | Engines |
| --- | --- | --- |
| Kajal | Female | neural |
| Aditi | Female | standard |
| Raveena | Female | standard |

Spanish (US) — es-US

| Voice | Gender | Engines |
| --- | --- | --- |
| Lupe | Female | neural, standard |
| Pedro | Male | neural |
| Penelope | Female | standard |
| Miguel | Male | standard |

Spanish (ES) — es-ES

| Voice | Gender | Engines |
| --- | --- | --- |
| Lucia | Female | neural, standard |
| Sergio | Male | neural |
| Enrique | Male | standard |
| Conchita | Female | standard |

Spanish (MX) — es-MX

| Voice | Gender | Engines |
| --- | --- | --- |
| Mia | Female | standard |
| Andres | Male | neural |

French (FR) — fr-FR

| Voice | Gender | Engines |
| --- | --- | --- |
| Léa | Female | neural, standard |
| Rémi | Male | neural |
| Mathieu | Male | standard |
| Céline | Female | standard |

French (CA) — fr-CA

| Voice | Gender | Engines |
| --- | --- | --- |
| Gabrielle | Female | neural |
| Chantal | Female | standard |

German — de-DE

| Voice | Gender | Engines |
| --- | --- | --- |
| Vicki | Female | neural, standard |
| Daniel | Male | neural |
| Hans | Male | standard |
| Marlene | Female | standard |

Italian — it-IT

| Voice | Gender | Engines |
| --- | --- | --- |
| Bianca | Female | neural, standard |
| Adriano | Male | neural |
| Carla | Female | standard |
| Giorgio | Male | standard |

Portuguese (BR) — pt-BR

| Voice | Gender | Engines |
| --- | --- | --- |
| Camila | Female | neural, standard |
| Vitória | Female | neural, standard |
| Thiago | Male | neural |
| Ricardo | Male | standard |

Japanese — ja-JP

| Voice | Gender | Engines |
| --- | --- | --- |
| Kazuha | Female | neural, long-form |
| Tomoko | Female | neural, long-form |
| Takumi | Male | neural, standard |
| Mizuki | Female | standard |

Korean — ko-KR

| Voice | Gender | Engines |
| --- | --- | --- |
| Seoyeon | Female | neural, standard |

Chinese (Mandarin) — cmn-CN

| Voice | Gender | Engines |
| --- | --- | --- |
| Zhiyu | Female | neural, standard |

Hindi — hi-IN

| Voice | Gender | Engines |
| --- | --- | --- |
| Kajal | Female | neural |
| Aditi | Female | standard |

Arabic (UAE) — ar-AE

| Voice | Gender | Engines |
| --- | --- | --- |
| Hala | Female | neural |
| Zayd | Male | neural |

Arabic (Standard) — arb

| Voice | Gender | Engines |
| --- | --- | --- |
| Zeina | Female | standard |

Browser Support

| Browser | Minimum Version |
|---------|----------------|
| Chrome | 66+ |
| Firefox | 76+ |
| Safari | 14.1+ |
| Edge | 79+ |

Gotchas and Common Mistakes

1. Forgetting ref on the <audio> element

This is the most common mistake. Without ref={ref}, the hook has no connection to the DOM audio element. playing, duration, and stop() will all be broken. Always pass the ref.

2. Multiple components = one shared client

The PollyClient and AWS credentials are cached at module level. If you render multiple usePollyTextToSpeech instances (e.g. one per chat message), they all share the same client. The Cognito GetId + GetCredentialsForIdentity API calls only happen once — every subsequent speak() call goes directly to Polly.

3. Pass idToken as state, not a stale string

The hook needs a valid Cognito ID token. If you pass an empty string or a stale token, the credential provider will fail. Fetch the token asynchronously and pass it via React state:

// WRONG — token is empty on first render, hook will error
const token = '' // or some stale value
const { speak } = usePollyTextToSpeech({ config: { ...rest, idToken: token } })

// CORRECT — fetch token, set state, then call speak()
const [idToken, setIdToken] = useState('')

useEffect(() => {
  getIdToken().then(setIdToken)
}, [])

const { speak } = usePollyTextToSpeech({ config: { ...rest, idToken } })

// Only call speak() after idToken is set
useEffect(() => {
  if (idToken) speak('Hello')
}, [idToken])

4. Conditionally rendered <audio> loses the ref

If your <audio> element is inside a conditional ({audioUrl && <audio ref={ref} ... />}), the ref detaches when audioUrl becomes null (e.g. on a new speak() call). This is expected — the hook handles ref attach/detach cleanly.

5. SSML text must use textType: 'ssml'

If your text contains SSML tags like <break> or <prosody>, you must pass 'ssml' as the second argument to speak(). Otherwise Polly will read the tags as literal text.

// WRONG — tags read out loud as text
speak('<speak>Hello <break time="500ms"/> world</speak>')

// CORRECT
speak('<speak>Hello <break time="500ms"/> world</speak>', 'ssml')
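If callers may pass either plain text or SSML, a tiny guard can choose the textType automatically. (This helper is hypothetical, not exported by the package; it only checks for a leading <speak> tag.)

```typescript
// Hypothetical guard: treat input starting with <speak as SSML, else plain text.
function textTypeFor(text: string): 'text' | 'ssml' {
  return text.trim().startsWith('<speak') ? 'ssml' : 'text'
}

// Usage sketch:
// speak(input, textTypeFor(input))
```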

Error Handling

The hook catches errors and exposes them via the error state. It never throws. Common scenarios:

  • Invalid credentials — expired token or misconfigured Identity Pool
  • Invalid voice/engine combo — e.g. using long-form engine with a voice that doesn't support it
  • Text too long — Polly has a 3,000 character limit for SynthesizeSpeech (6,000 for SSML including tags)
  • Autoplay blocked — some browsers block audio.play() without user interaction
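One way to stay under the character limit is to split long plain text at sentence boundaries and call speak() once per chunk. A sketch of that approach (hypothetical helper, not part of the package; a sentence longer than maxLen still produces an oversized chunk):

```typescript
// Split plain text into chunks of at most maxLen characters,
// breaking at sentence boundaries (., !, ?) where possible.
function chunkText(text: string, maxLen: number = 3000): string[] {
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [text]
  const chunks: string[] = []
  let current = ''
  for (const s of sentences) {
    // Start a new chunk when adding this sentence would exceed the limit.
    if (current && current.length + s.length > maxLen) {
      chunks.push(current.trim())
      current = ''
    }
    current += s
  }
  if (current.trim()) chunks.push(current.trim())
  return chunks
}

console.log(chunkText('One. Two. Three.', 10))
// → [ 'One. Two.', 'Three.' ]
```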

License

MIT