
azure-speech-utilities

Provides a convenient abstraction layer over the Microsoft Cognitive Services Speech SDK, simplifying the integration of speech-to-text and text-to-speech functionality into client/browser applications. Using this package, developers can quickly integrate basic STT and TTS capabilities into their applications without the need to write intricate code.

Features:

  • Single-shot speech recognition with a single call.
  • Continuous speech recognition for real-time applications.
  • Multilingual speech recognition.
  • Text-to-speech synthesis.
  • SSML or plain-text input for TTS.

Installing

Using npm:

npm install azure-speech-utilities

Function Descriptions

CreateRecognizer

Creates a new speech recognizer instance.

| Parameter | Type | Default Value | Description |
| :-------: | :--: | :-----------: | :----------: |
| cogSvcSubKey | string | "" | The Cognitive Services subscription key for Speech Services. (Required) |
| cogSvcRegion | string | "" | The region of the Cognitive Services subscription. (Required) |
| recognitionLang | string[] | ["en-US"] | An array of recognition language codes. (Optional) |
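A minimal creation sketch (the key and region strings are placeholders, not real values):

// Omitting recognitionLang falls back to the default ["en-US"].
const recognizer = CreateRecognizer("<your-speech-key>", "<your-region>")

// Or pass one or more language codes explicitly.
const multiRecognizer = CreateRecognizer("<your-speech-key>", "<your-region>", ["hi-IN", "en-US"])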

RecognizeOnceAsync

Used for single-shot recognition, which recognizes a single utterance. The end of a single utterance is determined by listening for silence at the end or until a maximum of 15 seconds of audio is processed.

| Parameter | Type | Default Value | Description | | :-------: | :--: | :-----------: | :----------: | | recognizer | sdk.SpeechRecognizer | undefined | undefined | The speech recognizer instance to use. |

ContinuousRecognitionAsync

RecognizeOnceAsync() performs single-shot recognition of a single utterance. In contrast, continuous recognition gives you a real-time stream of recognized text. Call StopContinuousRecognitionAsync() at some point to stop recognition.

| Parameter | Type | Default Value | Description |
| :-------: | :--: | :-----------: | :----------: |
| recognizer | sdk.SpeechRecognizer | undefined | The speech recognizer instance to use. |
| callbackRecognized | (value: string) => void | (value) => console.log(value) | A callback function called with each finalized recognition result. |
| callbackRecognizing | (value: string) => void | (value) => console.log(value) | A callback function called with interim results while speech is being recognized. |

StopContinuousRecognitionAsync

Stops ongoing continuous speech recognition.

| Parameter | Type | Default Value | Description |
| :-------: | :--: | :-----------: | :----------: |
| recognizer | sdk.SpeechRecognizer | undefined | The speech recognizer instance to use. |

Note: Pass the same recognizer instance that you are using with ContinuousRecognitionAsync() as the argument to this function.

CreateSynthesizer

Creates a new speech synthesizer instance.

| Parameter | Type | Default Value | Description |
| :-------: | :--: | :-----------: | :----------: |
| cogSvcSubKey | string | "" | The Cognitive Services subscription key for Speech Services. (Required) |
| cogSvcRegion | string | "" | The region of the Cognitive Services subscription. (Required) |
| synthesisLang | string | "" | The language code for the speech synthesizer. (Required) |
| synthesisVoiceName | string | "" | The name of the voice to use for speech synthesis. (Optional) |
| createAudioConfig | boolean | false | Whether to create an audio config so synthesized speech plays on the active output device. (Optional) |
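A minimal creation sketch (placeholder key and region; the voice name is only an example):

// Pass true as the last argument to play synthesized audio automatically
// on the active output device; false leaves playback up to you.
const synthesizer = CreateSynthesizer("<your-speech-key>", "<your-region>", "en-US", "en-US-JennyNeural", false)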

Note: If createAudioConfig is false, synthesized audio is not played automatically on the currently active output device.

The voice that speaks is determined in order of priority as follows:

  • If you only set synthesisLang, the default voice for the specified locale speaks.
  • If both synthesisVoiceName and synthesisLang are set, the synthesisLang setting is ignored; the voice that you specify with synthesisVoiceName speaks.
  • If the voice element is set by using Speech Synthesis Markup Language (SSML), the synthesisVoiceName and synthesisLang settings are both ignored (see the sketch below).
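As a rough illustration of that last rule, assuming a synthesizer created as in the sketch above (the voice name in the SSML is only an example), the SSML voice element wins over whatever was configured on the synthesizer:

const ssml = `
<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="en-US">
    <voice name="en-US-GuyNeural">
        This voice takes priority over synthesisVoiceName and synthesisLang.
    </voice>
</speak>
`

// SpeakAsync is described below; note the "ssml" input type.
SpeakAsync(synthesizer, ssml, "ssml", (result, error) => { /* handle result */ })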

SpeakAsync

Performs speech synthesis and returns the result (the synthesized audio) as an ArrayBuffer.

| Parameter | Type | Default Value | Description |
| :-------: | :--: | :-----------: | :----------: |
| synthesizer | sdk.SpeechSynthesizer | undefined | The speech synthesizer instance to use. |
| inputString | string | "I'm excited to try text to speech" | The text to be synthesized. |
| inputType | string | "text" | The format of the input: "text" or "ssml". (Optional) |
| callback | (result: sdk.SynthesisResult, error?: Error) => void | (result, error) => {} | A callback function called with the synthesis result or an error. |

Example

Recognize Once

import { CreateRecognizer, RecognizeOnceAsync } from "azure-speech-utilities"

const CGV_KEY = "AZURE_SPEECH_SERVICE_KEY"
const CGV_REGION = "AZURE_SPEECH_SERVICE_REGION"

async function recognizeSpeech() {
    const recognizer = CreateRecognizer(CGV_KEY, CGV_REGION, ["hi-IN"])
    try {
        const recognizedText = await RecognizeOnceAsync(recognizer)
        if (recognizedText.type === "text") {
            console.log(recognizedText.message)
        } else {
            // Anything other than "text" indicates a recognition problem.
            console.error(recognizedText.message)
        }
    } catch (error) {
        console.error(error)
    }
}

Continuous Recognition

import { CreateRecognizer, ContinuousRecognitionAsync, StopContinuousRecognitionAsync } from "azure-speech-utilities"

const CGV_KEY = "AZURE_SPEECH_SERVICE_KEY"
const CGV_REGION = "AZURE_SPEECH_SERVICE_REGION"

// Passing two or more recognition languages ("hi-IN" and "en-US" here) enables multilingual recognition.
const recognizer = CreateRecognizer(CGV_KEY, CGV_REGION, ["hi-IN", "en-US"])

function callbackRecognized(text) {
    console.log("RECOGNIZED: ", text)
}

function callbackRecognizing(text) {
    console.log("RECOGNIZING: ", text)
}

async function recognizeSpeech() {
    try {
        const response = await ContinuousRecognitionAsync(recognizer, callbackRecognized, callbackRecognizing)
        if (response.type === "success") {
            console.log(response.message)
        } else {
            console.error(response.message)
        }
    } catch (error) {
        console.error(error)
    }
}

function stopContinuousRecognition() {
    StopContinuousRecognitionAsync(recognizer)
}

Speak Async

import { CreateSynthesizer, SpeakAsync } from "azure-speech-utilities"

const CGV_KEY = "AZURE_SPEECH_SERVICE_KEY"
const CGV_REGION = "AZURE_SPEECH_SERVICE_REGION"
const SYNTHESIS_LANGUAGE = "en-US"
const SYNTHESIS_VOICE_NAME = "en-US-JennyNeural"

// Keep a single Audio element so playback can be controlled (played, paused, reset) later.
const audio = new Audio()

function handleSpeak() {
    // By default, the input type is 'text'. If you change the input type to 'ssml', the input string should be in the following SSML format:
    // const ssml = `
    // <speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="${SYNTHESIS_LANGUAGE}">
    //     <voice name="${SYNTHESIS_VOICE_NAME}">
    //         When you're on the freeway, it's a good idea to use a GPS.
    //     </voice>
    // </speak>
    // `

    const text = "When you're on the freeway, it's a good idea to use a GPS."

    // Note that 'createAudioConfig' is set to false, meaning audio will not play by default on the currently active output device.
    const synthesizer = CreateSynthesizer(CGV_KEY, CGV_REGION, SYNTHESIS_LANGUAGE, SYNTHESIS_VOICE_NAME, false)

    SpeakAsync(synthesizer, text, "text", (result, error) => {
        if (error) {
            console.error(error)
        } else {
            console.log(result)
            const audioBlob = new Blob([result.audioData], { type: "audio/wav" })

            // Use the blob URL as the audio source, which allows easy user control such as starting, stopping, resetting, etc.
            audio.src = URL.createObjectURL(audioBlob)
            audio.play()
        }
    })
}

function stopSpeaking() {
    audio.pause()
}

Note: If you do not wish to play audio through an audio source, you can set createAudioConfig to true. The audio will then play on the currently active output device by default. However, this method does not give the user the ability to play, pause, or reset the spoken audio.
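A rough sketch of that mode, reusing the constants from the example above; with createAudioConfig set to true, the SDK handles playback itself and the callback only needs to report the result:

const autoPlaySynthesizer = CreateSynthesizer(CGV_KEY, CGV_REGION, SYNTHESIS_LANGUAGE, SYNTHESIS_VOICE_NAME, true)

SpeakAsync(autoPlaySynthesizer, "This plays on the active output device.", "text", (result, error) => {
    if (error) {
        console.error(error)
    } else {
        // Audio is already playing on the active output device; nothing else to do.
        console.log(result)
    }
})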

Contributing

This project welcomes contributions and suggestions.