@superapp_men/speech-to-text
v1.0.19
Real-time speech recognition for SuperApp Partner Apps. Convert speech to text with high accuracy, multi-language support, and real-time results.
Features
✅ Real-time Transcription - Get partial results while speaking
✅ Multi-language Support - 20+ languages supported
✅ High Accuracy - Confidence scores for each result
✅ Alternative Transcriptions - Multiple possible interpretations
✅ Event-driven Architecture - React to speech events in real-time
✅ Rich Utilities - Clean, format, analyze, and store transcripts
✅ TypeScript - Full type safety and IntelliSense
✅ Cross-platform - Works on Web, iOS, and Android
✅ Zero Dependencies - Lightweight and efficient
Installation
npm install @superapp_men/speech-to-text
or
yarn add @superapp_men/speech-to-text
Quick Start
import { SpeechToText, RecognitionState } from "@superapp_men/speech-to-text";
// Initialize with configuration
const speech = new SpeechToText({
timeout: 10000,
debug: true,
});
// Check availability
const available = await speech.isAvailable();
if (!available) {
console.log("Speech recognition not available");
return;
}
// Request permission
const permission = await speech.requestPermission();
if (permission !== "granted") {
console.log("Permission denied");
return;
}
// Set up event listeners BEFORE starting
speech.on("stateChange", ({ state }) => {
console.log("State changed:", state);
if (state === RecognitionState.LISTENING) {
console.log("🎤 Listening...");
}
});
// Get final result when recognition completes (auto-stops when user stops talking)
speech.on("result", ({ result }) => {
console.log("Final:", result.transcript);
console.log("Confidence:", result.confidence);
});
// Handle errors
speech.on("error", ({ message }) => {
console.error("Error:", message);
});
// Start listening
await speech.startListening({
language: "ar-MA", // Arabic (Morocco)
partialResults: false, // Auto-stop when user stops talking
popup: false, // Partner app manages its own UI
});
// Recognition automatically stops when user finishes speaking
// The 'result' event will fire with the final transcript
Table of Contents
- Basic Usage
- Configuration
- API Reference
- Events
- React Integration
- Vue Integration
- Supported Languages
- Error Handling
- Best Practices
Basic Usage
Simple Speech Recognition
import { SpeechToText, RecognitionState } from "@superapp_men/speech-to-text";
async function recordSpeech() {
const speech = new SpeechToText({
timeout: 10000,
debug: true,
});
// Check if available
const available = await speech.isAvailable();
if (!available) {
console.log("Speech recognition not available");
return;
}
// Request permission
const permission = await speech.requestPermission();
if (permission !== "granted") {
console.log("Permission denied");
return;
}
// Set up event listeners
speech.on("result", ({ result }) => {
console.log("Final transcript:", result.transcript);
console.log("Confidence:", result.confidence);
});
speech.on("stateChange", ({ state }) => {
console.log("State:", state);
});
speech.on("error", ({ message }) => {
console.error("Error:", message);
});
// Start listening
await speech.startListening({
language: "ar-MA", // Arabic (Morocco)
partialResults: false, // Auto-stop when user stops talking
popup: false, // Partner app manages UI
});
// Recognition automatically stops when user finishes speaking
// The 'result' event will fire with the final transcript
// No need to call stopListening() - it's automatic!
}
With Event Listeners (React Example)
import { useState, useEffect } from "react";
import {
SpeechToText,
RecognitionState,
Language,
} from "@superapp_men/speech-to-text";
function SpeechToTextComponent() {
const [speech] = useState(
() =>
new SpeechToText({
timeout: 10000,
debug: true,
})
);
const [state, setState] = useState<RecognitionState>(RecognitionState.IDLE);
const [isListening, setIsListening] = useState(false);
const [transcript, setTranscript] = useState("");
const [partialTranscript, setPartialTranscript] = useState("");
const [error, setError] = useState<string | null>(null);
useEffect(() => {
// State changes
const unsubState = speech.on("stateChange", ({ state }) => {
setState(state);
setIsListening(state === RecognitionState.LISTENING);
});
// Real-time partial results (if partialResults: true)
const unsubPartial = speech.on("partialResult", ({ result }) => {
setPartialTranscript(result.transcript);
});
// Final result with confidence
const unsubResult = speech.on("result", ({ result }) => {
setTranscript(result.transcript);
setPartialTranscript(""); // Clear partial when final arrives
});
// Error handling
const unsubError = speech.on("error", ({ message }) => {
setError(message);
});
// Listening events
const unsubStarted = speech.on("listeningStarted", () => {
console.log("🎤 Listening started");
});
const unsubStopped = speech.on("listeningStopped", ({ duration }) => {
console.log("⏹️ Listening stopped, duration:", duration);
});
return () => {
unsubState();
unsubPartial();
unsubResult();
unsubError();
unsubStarted();
unsubStopped();
speech.destroy();
};
}, [speech]);
const handleStartListening = async () => {
try {
setError(null);
setTranscript("");
setPartialTranscript("");
const permission = await speech.requestPermission();
if (permission !== "granted") {
setError("Microphone permission is required");
return;
}
await speech.startListening({
language: Language.EN_US,
partialResults: true, // Enable real-time updates
popup: false, // Partner app manages UI
});
} catch (err) {
setError(
err instanceof Error ? err.message : "Failed to start listening"
);
}
};
return (
<div>
<button onClick={handleStartListening} disabled={isListening}>
{isListening ? "🎤 Listening..." : "🎤 Start Listening"}
</button>
{error && <p style={{ color: "red" }}>{error}</p>}
{partialTranscript && (
<p style={{ fontStyle: "italic" }}>{partialTranscript}...</p>
)}
{transcript && <p>{transcript}</p>}
</div>
);
}
Configuration
SpeechToText Configuration
interface SpeechToTextConfig {
timeout?: number; // Request timeout in ms (default: 5000)
debug?: boolean; // Enable debug logging (default: false)
}
const speech = new SpeechToText({
timeout: 10000,
debug: true,
});
Recognition Configuration
interface SpeechRecognitionConfig {
language?: string; // Language code (default: 'en-US')
maxAlternatives?: number; // Max alternative results (default: 1)
partialResults?: boolean; // Enable partial results (default: false)
popup?: boolean; // Show native UI popup (default: false)
timeout?: number; // Recognition timeout (default: 30000)
metadata?: Record<string, unknown>; // Custom metadata
}
await speech.startListening({
language: Language.ES_ES,
maxAlternatives: 3,
partialResults: true, // true = real-time updates, false = auto-stop when user stops talking
popup: false, // false = partner app manages UI, true = show native popup
});
Important Notes:
- partialResults: false (default): Recognition automatically stops when the user finishes speaking. The result event fires with the final transcript. This is the recommended setting for most use cases.
- partialResults: true: Provides real-time partial results while speaking. You must call stopListening() manually to get the final result.
- popup: false (default): The partner app is responsible for its own UI. Recommended for embedded partner apps.
- popup: true: Shows the native platform UI for recognition. Use this when you want a platform-native experience.
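To make the two modes concrete, here is a small sketch. The SpeechRecognitionConfig shape is copied from this README; buildRecognitionConfig is a hypothetical helper, not part of the package:

```typescript
// Local copy of the config shape documented in this README (illustration only).
interface SpeechRecognitionConfig {
  language?: string;
  maxAlternatives?: number;
  partialResults?: boolean;
  popup?: boolean;
  timeout?: number;
}

// Hypothetical helper: map "auto-stop" vs "live" onto the two documented modes.
function buildRecognitionConfig(
  mode: "auto-stop" | "live",
  language = "en-US"
): SpeechRecognitionConfig {
  return {
    language,
    // "live": partialResult events stream in and you call stopListening() yourself.
    // "auto-stop": recognition ends when the user stops talking and 'result' fires.
    partialResults: mode === "live",
    popup: false, // partner app manages its own UI
  };
}

// Usage (assuming a SpeechToText instance named `speech`):
// await speech.startListening(buildRecognitionConfig("live"));
```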
API Reference
SpeechToText Class
Constructor
new SpeechToText(config?: SpeechToTextConfig)
Methods
isAvailable(): Promise<boolean>
Check if speech recognition is available on the device.
const available = await speech.isAvailable();
if (!available) {
console.log("Speech recognition not supported");
}
getSupportedLanguages(): Promise<string[]>
Get list of supported language codes.
const languages = await speech.getSupportedLanguages();
console.log("Supported:", languages);
// ['en-US', 'es-ES', 'fr-FR', ...]
checkPermission(): Promise<PermissionStatus>
Check current microphone permission status.
Returns: 'granted' | 'denied' | 'prompt' | 'unknown'
const status = await speech.checkPermission();
if (status === "prompt") {
await speech.requestPermission();
}
requestPermission(): Promise<PermissionStatus>
Request microphone permission from the user.
const permission = await speech.requestPermission();
if (permission !== "granted") {
alert("Microphone permission is required");
}
startListening(config?: SpeechRecognitionConfig): Promise<void>
Start listening for speech. Set up event listeners before calling this method.
// Set up listeners first
speech.on("result", ({ result }) => {
console.log("Final:", result.transcript);
});
speech.on("stateChange", ({ state }) => {
console.log("State:", state);
});
// Then start listening
await speech.startListening({
language: Language.EN_US,
partialResults: false, // Auto-stop when user finishes speaking
popup: false, // Partner app manages UI
});
Behavior:
- With partialResults: false: Recognition automatically stops when the user stops talking. The result event fires with the final transcript.
- With partialResults: true: Provides real-time partialResult events. Call stopListening() manually to get the final result.
stopListening(): Promise<RecognitionResult | null>
Stop listening and return final result. Note: Only needed when partialResults: true. With partialResults: false, recognition auto-stops when the user finishes speaking.
Returns:
interface RecognitionResult {
transcript: string;
confidence: number;
isFinal: boolean;
alternatives?: TranscriptAlternative[];
timestamp: number;
}
// Only needed when partialResults: true
if (speech.isListening()) {
const result = await speech.stopListening();
if (result) {
console.log("Transcript:", result.transcript);
console.log("Confidence:", (result.confidence * 100).toFixed(0) + "%");
}
}
When to use:
- partialResults: false (default): Don't call stopListening() - recognition auto-stops when the user finishes speaking. Listen for the result event instead.
- partialResults: true: Call stopListening() when you want to stop manually and get the final result.
getStatus(): Promise<StatusResponse>
Get current recognition status.
const status = await speech.getStatus();
console.log("Is listening:", status.isListening);
console.log("State:", status.state);
console.log("Permission:", status.permissionStatus);
getState(): RecognitionState
Get current recognition state.
Returns: 'idle' | 'starting' | 'listening' | 'processing' | 'stopped' | 'error'
const state = speech.getState();
isListening(): boolean
Check if currently listening.
if (speech.isListening()) {
await speech.stopListening();
}
getCurrentTranscript(): string
Get the current transcript (partial or final).
const transcript = speech.getCurrentTranscript();
getDuration(): number
Get duration of current session in milliseconds.
const duration = speech.getDuration();
console.log("Recording for:", duration, "ms");
on<T>(event: SpeechEventType, callback: EventListener<T>): () => void
Add event listener. Returns unsubscribe function.
const unsubscribe = speech.on("result", ({ result }) => {
console.log(result.transcript);
});
// Later, unsubscribe
unsubscribe();
off<T>(event: SpeechEventType, callback: EventListener<T>): void
Remove event listener.
speech.off("result", myCallback);
removeAllListeners(event?: SpeechEventType): void
Remove all listeners for an event, or all events if none specified.
speech.removeAllListeners("result"); // Remove all 'result' listeners
speech.removeAllListeners(); // Remove all listeners
setDebug(enabled: boolean): void
Enable or disable debug logging.
speech.setDebug(true);
destroy(): void
Cleanup and destroy the instance.
speech.destroy();
Events
Event Types
type SpeechEventType =
| "stateChange" // Recognition state changed
| "listeningStarted" // Started listening
| "listeningStopped" // Stopped listening
| "partialResult" // Partial result while speaking
| "result" // Final result
| "error" // Error occurred
| "soundStart" // Sound detected
| "soundEnd" // Sound ended
| "speechStart" // Speech detected
| "speechEnd"; // Speech ended
stateChange
Fired when recognition state changes.
speech.on("stateChange", ({ state, previousState }) => {
console.log(`${previousState} → ${state}`);
// Update UI based on state
if (state === "listening") {
button.textContent = "🛑 Stop";
} else if (state === "idle") {
button.textContent = "🎤 Start";
}
});
listeningStarted
Fired when listening starts.
speech.on("listeningStarted", ({ sessionId, config }) => {
console.log("Session started:", sessionId);
console.log("Language:", config.language);
});
listeningStopped
Fired when listening stops.
speech.on("listeningStopped", ({ sessionId, duration }) => {
console.log("Session ended:", sessionId);
console.log("Duration:", duration, "ms");
});
partialResult
Fired continuously while speaking (real-time updates).
speech.on("partialResult", ({ result }) => {
// Update UI in real-time
document.getElementById("transcript").textContent = result.transcript;
console.log("Partial:", result.transcript);
console.log("Is final:", result.isFinal);
});
result
Fired when final result is available.
speech.on("result", ({ result }) => {
console.log("Transcript:", result.transcript);
console.log("Confidence:", result.confidence);
// Show alternatives
result.alternatives?.forEach((alt, i) => {
console.log(`Alt ${i + 1}:`, alt.transcript, `(${alt.confidence})`);
});
});
error
Fired when an error occurs.
speech.on("error", ({ code, message, details }) => {
console.error(`Error [${code}]: ${message}`);
switch (code) {
case "PERMISSION_DENIED":
alert("Microphone access is required");
break;
case "NO_SPEECH":
alert("No speech detected");
break;
case "TIMEOUT":
alert("Recognition timeout");
break;
}
});
soundStart / soundEnd
Fired when sound is detected/ended.
speech.on("soundStart", () => {
console.log("🔊 Sound detected");
// Show visual indicator
});
speech.on("soundEnd", () => {
console.log("🔇 Sound ended");
// Hide visual indicator
});
speechStart / speechEnd
Fired when speech is detected/ended.
speech.on("speechStart", () => {
console.log("🗣️ Speech started");
});
speech.on("speechEnd", () => {
console.log("🤐 Speech ended");
});
React Integration
Basic Hook
import { useState, useEffect } from "react";
import { SpeechToText, RecognitionState } from "@superapp_men/speech-to-text";
function useSpeechToText() {
const [speech] = useState(() => new SpeechToText());
const [isListening, setIsListening] = useState(false);
const [transcript, setTranscript] = useState("");
const [state, setState] = useState<RecognitionState>(RecognitionState.IDLE);
useEffect(() => {
const unsubscribers = [
speech.on("stateChange", ({ state }) => {
setState(state);
setIsListening(state === RecognitionState.LISTENING);
}),
speech.on("partialResult", ({ result }) => {
setTranscript(result.transcript);
}),
speech.on("result", ({ result }) => {
setTranscript(result.transcript);
}),
];
return () => {
unsubscribers.forEach((unsub) => unsub());
speech.destroy();
};
}, [speech]);
return { speech, isListening, transcript, state };
}
Complete Component
import React, { useState, useEffect } from "react";
import {
SpeechToText,
RecognitionState,
Language,
} from "@superapp_men/speech-to-text";
function VoiceRecorder() {
const [speech] = useState(
() =>
new SpeechToText({
timeout: 10000,
debug: true,
})
);
const [state, setState] = useState<RecognitionState>(RecognitionState.IDLE);
const [isListening, setIsListening] = useState(false);
const [transcript, setTranscript] = useState("");
const [partialTranscript, setPartialTranscript] = useState("");
const [error, setError] = useState<string | null>(null);
const [permission, setPermission] = useState<string>("unknown");
useEffect(() => {
// Set up all event listeners
const unsubState = speech.on("stateChange", ({ state }) => {
setState(state);
setIsListening(state === RecognitionState.LISTENING);
});
const unsubPartial = speech.on("partialResult", ({ result }) => {
setPartialTranscript(result.transcript);
});
const unsubResult = speech.on("result", ({ result }) => {
setTranscript(result.transcript);
setPartialTranscript("");
});
const unsubError = speech.on("error", ({ message }) => {
setError(message);
});
return () => {
unsubState();
unsubPartial();
unsubResult();
unsubError();
speech.destroy();
};
}, [speech]);
const handleStartListening = async () => {
try {
setError(null);
setTranscript("");
setPartialTranscript("");
if (permission !== "granted") {
const status = await speech.requestPermission();
setPermission(status);
if (status !== "granted") {
setError("Microphone permission is required");
return;
}
}
await speech.startListening({
language: Language.EN_US,
partialResults: true, // Enable real-time partial results
popup: false, // Partner app manages its own UI
});
} catch (err: any) {
setError(err.message);
}
};
return (
<div>
<button
onClick={handleStartListening}
disabled={isListening}
>
{isListening ? "🎤 Listening..." : "🎤 Start Listening"}
</button>
{error && <p style={{ color: "red" }}>{error}</p>}
<div style={{ marginTop: 20 }}>
<h3>Transcript:</h3>
{partialTranscript && (
<p style={{ fontStyle: "italic", color: "#666" }}>
{partialTranscript}...
</p>
)}
{transcript && <p>{transcript}</p>}
{!partialTranscript && !transcript && (
<p style={{ color: "#999" }}>Say something...</p>
)}
</div>
</div>
);
}
Vue Integration
Composable
import { ref, onMounted, onUnmounted } from "vue";
import { SpeechToText, RecognitionState } from "@superapp_men/speech-to-text";
export function useSpeechToText() {
const speech = new SpeechToText();
const isListening = ref(false);
const transcript = ref("");
const state = ref<RecognitionState>(RecognitionState.IDLE);
const error = ref<string | null>(null);
onMounted(() => {
speech.on("stateChange", ({ state: newState }) => {
state.value = newState;
isListening.value = newState === RecognitionState.LISTENING;
});
speech.on("partialResult", ({ result }) => {
transcript.value = result.transcript;
});
speech.on("result", ({ result }) => {
transcript.value = result.transcript;
});
speech.on("error", ({ message }) => {
error.value = message;
});
});
onUnmounted(() => {
speech.destroy();
});
const startListening = async (config?: any) => {
error.value = null;
try {
const permission = await speech.requestPermission();
if (permission !== "granted") {
error.value = "Permission denied";
return;
}
await speech.startListening(config);
} catch (err: any) {
error.value = err.message;
}
};
const stopListening = async () => {
try {
await speech.stopListening();
} catch (err: any) {
error.value = err.message;
}
};
return {
isListening,
transcript,
state,
error,
startListening,
stopListening,
speech,
};
}
Component
<template>
<div>
<button @click="toggleListening">
{{ isListening ? "🛑 Stop" : "🎤 Start" }}
</button>
<p v-if="error" class="error">{{ error }}</p>
<div class="transcript">
<h3>Transcript:</h3>
<p>{{ transcript || "Say something..." }}</p>
</div>
</div>
</template>
<script setup>
import { useSpeechToText } from "./useSpeechToText";
const { isListening, transcript, error, startListening, stopListening } =
useSpeechToText();
const toggleListening = async () => {
if (isListening.value) {
await stopListening();
} else {
await startListening({ language: "en-US" });
}
};
</script>
Supported Languages
Full List
enum Language {
EN_US = "en-US", // English (US)
ES_ES = "es-ES", // Spanish (Spain)
FR_FR = "fr-FR", // French
AR_SA = "ar-SA", // Arabic (Standard)
AR_MA = "ar-MA", // Arabic (Morocco)
}
Note: Actual language availability depends on the device and platform. Use getSupportedLanguages() to check what's available.
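Since availability varies by device, it can help to validate the preferred language before starting. pickLanguage below is a hypothetical helper built on the getSupportedLanguages() call documented above; it is not part of the package:

```typescript
// Hypothetical helper (not part of the package): choose a usable language code,
// falling back when the preferred one is not in the supported list.
function pickLanguage(
  supported: string[],
  preferred: string,
  fallback = "en-US"
): string {
  return supported.includes(preferred) ? preferred : fallback;
}

// Usage with the documented API:
// const supported = await speech.getSupportedLanguages();
// await speech.startListening({ language: pickLanguage(supported, "ar-MA") });
```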
Error Handling
Error Codes
enum SpeechRecognitionError {
PERMISSION_DENIED = "PERMISSION_DENIED",
NOT_SUPPORTED = "NOT_SUPPORTED",
ALREADY_LISTENING = "ALREADY_LISTENING",
NOT_LISTENING = "NOT_LISTENING",
RECOGNITION_FAILED = "RECOGNITION_FAILED",
NO_SPEECH = "NO_SPEECH",
ABORTED = "ABORTED",
AUDIO_CAPTURE = "AUDIO_CAPTURE",
NETWORK = "NETWORK",
TIMEOUT = "TIMEOUT",
SUPERAPP_NOT_AVAILABLE = "SUPERAPP_NOT_AVAILABLE",
LANGUAGE_NOT_SUPPORTED = "LANGUAGE_NOT_SUPPORTED",
}
Handling Errors
import {
SpeechToText,
SpeechRecognitionError,
} from "@superapp_men/speech-to-text";
const speech = new SpeechToText();
// Try-catch for methods
try {
await speech.startListening();
} catch (error: any) {
switch (error.code) {
case SpeechRecognitionError.PERMISSION_DENIED:
alert("Please allow microphone access");
break;
case SpeechRecognitionError.NOT_SUPPORTED:
alert("Speech recognition not supported on this device");
break;
case SpeechRecognitionError.ALREADY_LISTENING:
console.warn("Already listening");
break;
default:
console.error("Error:", error.message);
}
}
// Event listener for errors
speech.on("error", ({ code, message, details }) => {
console.error(`Error [${code}]: ${message}`);
if (code === SpeechRecognitionError.NO_SPEECH) {
console.log("No speech detected, trying again...");
setTimeout(() => speech.startListening(), 1000);
}
});
Best Practices
1. Always Check Availability
const available = await speech.isAvailable();
if (!available) {
// Show alternative input method
showTextInput();
return;
}
2. Request Permission First
const permission = await speech.requestPermission();
if (permission !== "granted") {
showPermissionExplanation();
return;
}
3. Handle Partial Results
let lastUpdate = 0;
speech.on("partialResult", ({ result }) => {
// Throttle updates to avoid overwhelming UI
const now = Date.now();
if (now - lastUpdate > 100) {
updateUI(result.transcript);
lastUpdate = now;
}
});
4. Show Visual Feedback
speech.on("stateChange", ({ state }) => {
switch (state) {
case "listening":
micButton.classList.add("recording");
break;
case "processing":
micButton.classList.add("processing");
break;
default:
micButton.classList.remove("recording", "processing");
}
});
speech.on("speechStart", () => {
visualizer.start(); // Show audio waveform
});
speech.on("speechEnd", () => {
visualizer.stop();
});
5. Set Timeouts
await speech.startListening({
language: "en-US",
timeout: 30000, // 30 seconds max
});
// Or manually stop after time
setTimeout(async () => {
if (speech.isListening()) {
await speech.stopListening();
}
}, 30000);
6. Clean Up Resources
// In React
useEffect(() => {
const speech = new SpeechToText();
// ... use speech
return () => {
speech.destroy();
};
}, []);
// In Vue
onUnmounted(() => {
speech.destroy();
});
// Vanilla JS
window.addEventListener("beforeunload", () => {
speech.destroy();
});
7. Handle Background/Foreground
document.addEventListener("visibilitychange", async () => {
if (document.hidden && speech.isListening()) {
// App went to background, stop listening
await speech.stopListening();
}
});
8. Provide Fallbacks
async function getInput() {
  try {
    // Try speech recognition; unsubscribe after the first result so repeated
    // calls don't accumulate listeners
    return await new Promise((resolve, reject) => {
      const unsubscribe = speech.on("result", ({ result }) => {
        unsubscribe();
        resolve(result.transcript);
      });
      speech.startListening().catch((err) => {
        unsubscribe();
        reject(err);
      });
    });
  } catch (error) {
    // Fall back to text input
    return prompt("Please type your input:");
  }
}
Browser Support
- Chrome/Edge 80+
- Firefox 75+ (limited support)
- Safari 14+
- iOS Safari 14+ (via Capacitor)
- Android WebView (via Capacitor)
Note: Speech recognition quality and feature availability varies by platform. iOS and Android provide the best experience via Capacitor.
Troubleshooting
"Speech recognition not available"
Cause: Device doesn't support speech recognition
Solution: Check with isAvailable() and provide alternative input
"Permission denied"
Cause: User denied microphone access
Solution: Explain why permission is needed and ask again
"No speech detected"
Cause: No speech detected within timeout period
Solution: Increase timeout or add visual feedback
Partial results not working
Cause: Platform doesn't support partial results
Solution: Set partialResults: false and rely on final results only
Low accuracy
Solution:
- Check microphone quality
- Reduce background noise
- Use correct language setting
- Speak clearly and at moderate pace
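For the timeout-related issues above, one option is to retry with a longer recognition window after a NO_SPEECH error. The error code and the timeout option come from this README; retryTimeout is a hypothetical helper:

```typescript
// Hypothetical backoff helper: double the recognition window, capped at a maximum.
function retryTimeout(previousMs: number, maxMs = 60000): number {
  return Math.min(previousMs * 2, maxMs);
}

// Sketch of wiring it to the documented error event:
// let windowMs = 30000;
// speech.on("error", ({ code }) => {
//   if (code === "NO_SPEECH") {
//     windowMs = retryTimeout(windowMs);
//     speech.startListening({ language: "en-US", timeout: windowMs });
//   }
// });
```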
Performance Tips
- Reuse instances - Don't create new SpeechToText instances frequently
- Debounce partial results - Avoid updating UI on every partial result
- Stop when not needed - Always stop listening when done
- Use appropriate timeout - Don't set unnecessarily long timeouts
- Clean transcripts - Use utility functions to clean up results
Security & Privacy
- All speech processing happens on-device (iOS/Android) or via browser APIs (Web)
- No audio is sent to third-party servers by this library
- Partner Apps cannot access microphone directly
- SuperApp controls all permissions
- Use HTTPS in production
License
MIT
Support
- Email: [email protected]
Ready to add speech recognition to your app? Install the package and start transcribing! 🎤📝
