@mazka/react-speech-to-text

v1.1.0

Published

a year ago

A powerful, TypeScript-first React hook for speech recognition using the Web Speech API. This library provides a simple yet comprehensive interface for converting speech to text in React applications.

0High
0Medium
0Low

mazka

react speech-to-text speech-recognition web-speech-api hooks typescript react-hooks react-speech-to-text react-speech-recognition react-web-speech-api

React Speech-to-Text

Features

🎤 Easy-to-use React hook for speech recognition
🔧 TypeScript support with full type definitions
🌐 Browser compatibility detection with detailed error messages
⏱️ Auto-stop on silence with customizable duration and callback
🎯 Real-time interim results for better user experience
🔄 Continuous recognition support
🚨 Comprehensive error handling with specific error codes
🎨 Multiple language support
📱 Cross-platform compatibility (Chrome, Edge, Safari, Opera)

Important Note

The Web Speech API implementation varies by browser, with important implications for privacy and functionality:

⚠️ Server-based processing: On some browsers, such as Chrome, using Speech Recognition involves a server-based recognition engine where your audio is sent to a web service for recognition processing, so it won't work offline
🔒 Chrome/Chromium implementation: If you are using Chrome or Chromium-based browsers, the Web Speech API recognition feature currently relies on Google's servers
🌐 Internal browser detail: Chrome (and some other Chromium-based browsers) implement the recognition half of the Web Speech API by sending audio to Google's online speech service unless the user is offline. This is an internal detail of Chrome, not a requirement of the Web Speech API specification
📡 Internet connection required for most implementations
🛡️ Privacy considerations: Not all browsers support this API due to privacy and implementation concerns

Installation

npm install @mazka/react-speech-to-text

yarn add @mazka/react-speech-to-text

pnpm add @mazka/react-speech-to-text

Quick Start

import React from 'react';
import { useSpeechToText } from 'react-speech-to-text';

function App() {
  const {
    isListening,
    isSupported,
    transcript,
    startListening,
    stopListening,
    resetTranscript,
  } = useSpeechToText();

  if (!isSupported) {
    return <div>Speech recognition is not supported in your browser.</div>;
  }

  return (
    <div>
      <h1>Speech to Text Demo</h1>
      
      <div>
        <button 
          onClick={startListening} 
          disabled={isListening}
        >
          {isListening ? 'Listening...' : 'Start Listening'}
        </button>
        
        <button 
          onClick={stopListening} 
          disabled={!isListening}
        >
          Stop
        </button>
        
        <button onClick={resetTranscript}>
          Reset
        </button>
      </div>
      
      <div>
        <p><strong>Transcript:</strong> {transcript}</p>
      </div>
    </div>
  );
}

export default App;

API Reference

`useSpeechToText(options?)`

The main hook that provides speech-to-text functionality.

Parameters

options (optional): Partial<SpeechToTextOptions> - Configuration options

Returns

Returns a UseSpeechToTextReturn object with the following properties:

State Properties

| Property | Type | Description | |----------|------|-------------| | isListening | boolean | Whether speech recognition is currently active | | isSupported | boolean | Whether speech recognition is supported in the browser | | isInitializing | boolean | Whether the speech recognition is initializing | | transcript | string | Combined final and interim transcript | | interimTranscript | string | Current interim (temporary) transcript | | finalTranscript | string | Accumulated final transcript | | results | SpeechToTextResult[] | Array of all recognition results | | error | SpeechToTextStateError \| null | Current error state |

Action Methods

| Method | Type | Description | |--------|------|-------------| | startListening | (options?: Partial<SpeechToTextOptions>) => void | Start speech recognition | | stopListening | () => void | Stop speech recognition | | abortListening | () => void | Abort speech recognition immediately | | resetTranscript | () => void | Clear all transcripts and results | | clearError | () => void | Clear the current error state |

Configuration Options

`SpeechToTextOptions`

interface SpeechToTextOptions {
  continuous?: boolean;           // Default: true
  interimResults?: boolean;       // Default: true
  maxAlternatives?: number;       // Default: 1
  language?: string;              // Default: "en-US"
  autoStopOnSilence?: {
    enabled: boolean;             // Default: false
    silenceDuration?: number;     // Default: 3000 (ms)
    onAutoStop?: (transcript: string) => void; // Callback when auto-stop occurs
  };
}

Option Details

continuous: Whether to continue listening after the user stops speaking
interimResults: Whether to return interim results while the user is speaking
maxAlternatives: Maximum number of alternative transcripts to return
language: Language code for speech recognition (e.g., "en-US", "es-ES", "fr-FR")
autoStopOnSilence: Automatically stop listening after a period of silence
- enabled: Enable/disable the feature
- silenceDuration: Duration in milliseconds to wait before stopping
- onAutoStop: Callback function called when auto-stop is triggered

Advanced Examples

With Custom Options

import { useSpeechToText } from 'react-speech-to-text';

function AdvancedExample() {
  const {
    isListening,
    transcript,
    interimTranscript,
    finalTranscript,
    error,
    startListening,
    stopListening,
  } = useSpeechToText({
    continuous: true,
    interimResults: true,
    language: 'en-US',
    autoStopOnSilence: {
      enabled: true,
      silenceDuration: 5000, // 5 seconds
      onAutoStop: (transcript) => {
        console.log('Auto-stopped with transcript:', transcript);
      },
    },
  });

  return (
    <div>
      <button onClick={() => startListening()}>
        Start Listening
      </button>
      
      <button onClick={stopListening}>
        Stop Listening
      </button>
      
      {error && (
        <div style={{ color: 'red' }}>
          Error: {error.message}
        </div>
      )}
      
      <div>
        <p><strong>Final:</strong> {finalTranscript}</p>
        <p><strong>Interim:</strong> {interimTranscript}</p>
        <p><strong>Combined:</strong> {transcript}</p>
      </div>
    </div>
  );
}

Dynamic Language Switching

import { useSpeechToText } from 'react-speech-to-text';

function MultiLanguageExample() {
  const { startListening, stopListening, transcript, isListening } = useSpeechToText();

  const languages = [
    { code: 'en-US', name: 'English (US)' },
    { code: 'es-ES', name: 'Spanish (Spain)' },
    { code: 'fr-FR', name: 'French (France)' },
    { code: 'de-DE', name: 'German (Germany)' },
  ];

  const handleLanguageChange = (language: string) => {
    if (isListening) {
      stopListening();
    }
    startListening({ language });
  };

  return (
    <div>
      <div>
        {languages.map((lang) => (
          <button
            key={lang.code}
            onClick={() => handleLanguageChange(lang.code)}
          >
            {lang.name}
          </button>
        ))}
      </div>
      
      <div>
        <p>Transcript: {transcript}</p>
      </div>
    </div>
  );
}

Auto-Stop on Silence with Callback

import { useSpeechToText } from 'react-speech-to-text';

function AutoStopExample() {
  const { 
    isListening, 
    transcript, 
    startListening, 
    stopListening,
    resetTranscript 
  } = useSpeechToText({
    autoStopOnSilence: {
      enabled: true,
      silenceDuration: 2000, // Stop after 2 seconds of silence
      onAutoStop: (finalTranscript) => {
        console.log('Auto-stopped! Final transcript:', finalTranscript);
        // You can perform actions here like saving the transcript
      },
    },
  });

  return (
    <div>
      <h2>Auto-Stop Demo</h2>
      <p>Listening will automatically stop after 2 seconds of silence.</p>
      
      <button onClick={() => startListening()}>
        Start Listening
      </button>
      
      <button onClick={stopListening} disabled={!isListening}>
        Stop Manually
      </button>
      
      <button onClick={resetTranscript}>
        Clear
      </button>
      
      <div>
        <p>Status: {isListening ? 'Listening...' : 'Not listening'}</p>
        <p>Transcript: {transcript}</p>
      </div>
    </div>
  );
}

Browser Compatibility

The Web Speech API has limited browser support because it relies on their web speech API integration (chrome/chromium integrates with Google Cloud Service). Here's the current support status:

| Browser | Support | Notes | |---------|---------|-------| | Chrome | ✅ Full | Best support, all features available | | Edge | ✅ Full | Full Web Speech API support | | Safari | ✅ Full | iOS 14.5+, macOS Safari 14.1+ | | Opera | ⚠️ Limited | Basic support, some features may be limited | | Firefox | ❌ No | Does not support Web Speech API | | Brave | ❌ No | Disabled for privacy reasons |

📖 For detailed browser compatibility information, see the MDN Web Speech API Browser Compatibility page.

Why Limited Support?

Privacy Concerns: Some browsers (like Brave and Firefox) don't support the API due to privacy implications of server-based audio processing or lack of integration to Web Speech API.
Browser Implementation Differences: Chrome and Chromium-based browsers send audio to Google's servers as an internal implementation detail, while other browsers may handle it differently or not at all
Offline Limitations: Server-based recognition engines require internet connectivity, making the feature unavailable offline
Vendor Dependencies: Some browser vendors prefer to avoid dependencies on external speech services

The library automatically detects browser compatibility and provides detailed error messages for unsupported browsers.

Error Handling

The library provides comprehensive error handling with specific error codes:

interface SpeechToTextStateError {
  code: string;
  message: string;
  name: string;
  browserInfo?: {
    browserName: string;
    reason: string;
  };
}

Common Error Codes

NOT_SUPPORTED: Browser doesn't support Web Speech API
NOT_ALLOWED: Microphone permission denied
NO_SPEECH: No speech detected
AUDIO_CAPTURE: Audio capture failed
NETWORK: Network error occurred (server-based recognition requires internet connection)
INITIALIZATION_ERROR: Failed to initialize speech recognition
START_ERROR: Failed to start speech recognition

Privacy and Security Considerations

⚠️ Important Privacy Information:

Server-based processing: In Chrome and Chromium-based browsers, audio is sent to Google's servers for processing as an internal implementation detail
No offline support: Server-based recognition engines require an active internet connection
Browser-specific behavior: Different browsers may implement the Web Speech API differently, with varying privacy implications
User awareness: Always inform users that their audio may be processed by external services (particularly Google's servers in Chrome)
Consent mechanisms: Consider implementing user consent mechanisms before enabling speech recognition
Data handling: Be aware that audio data transmission and processing policies may vary by browser implementation

TypeScript Support

This library is written in TypeScript and provides full type definitions. All interfaces and types are exported for use in your application:

import { 
  useSpeechToText,
  SpeechToTextOptions,
  SpeechToTextResult,
  SpeechToTextState,
  UseSpeechToTextReturn 
} from 'react-speech-to-text';

Requirements

React 18.0.0 or higher
Modern browser with Web Speech API support
Internet connection (required for server-based speech recognition implementations)
User permission for microphone access

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Built with the Web Speech API
Chrome implementation utilizes Google's cloud-based speech recognition service
Inspired by the need for a robust, TypeScript-first speech recognition solution for React

Support

If you encounter any issues or have questions, please open an issue on GitHub.