@vonage/extend-voice-transcription

v0.1.1

Published

9 days ago

A library to help wire up incoming speech to be converted to text through various services

Downloads

150

Vonage Voice Transcription for NodeJS

This is a small wrapper around various voice transcription services that makes it easier to transcribe audio from the Vonage Voice API. To use it, you'll need a Vonage account. Sign up for free at nexmo.com.

This module is currently in development/beta status, so there may be bugs.

Installation

Open a command console, enter your project directory, and run the following command to install the latest published version of this module:

$ npm install @vonage/extend-voice-transcription

Usage

Configuring Transcription Services

This module relies on external services to perform the actual transcription. We currently support:

Google Cloud Speech
Azure Cognitive Services

Each service takes a configuration object that is passed to the underlying service. To enable a service, just pass in the appropriate configuration object.

Google Cloud Speech

const { SpeechToText } = require("@vonage/extend-voice-transcription");

const STTConnector = new SpeechToText({
  audioRate: "audio/l16;rate=16000",
  handler: (data) => {
    console.log(`Vonage Transcription: ${data}`);
  },
  gCloudSpeech: {
    keyFilename: "./keys.json",
    projectId: "project-name",
  },
});

Azure Cognitive Services

const { SpeechToText } = require("@vonage/extend-voice-transcription");

const STTConnector = new SpeechToText({
  audioRate: "audio/l16;rate=16000",
  handler: (data) => {
    console.log(`Vonage Transcription: ${data}`);
  },
  azureCognitiveSpeech: {
    key: "azure-key",
    region: "region",
  },
});
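
Because enabling a service only requires passing its configuration object, the choice can be made at startup. The following is a minimal sketch, not part of this module: the helper name and the environment variable names are assumptions chosen for illustration, and only the gCloudSpeech and azureCognitiveSpeech keys come from the examples above.

```javascript
// Hypothetical helper: picks a transcription service configuration from
// environment variables. The variable names are assumptions, not part of
// this module.
function serviceConfigFromEnv(env) {
  if (env.AZURE_SPEECH_KEY && env.AZURE_SPEECH_REGION) {
    return {
      azureCognitiveSpeech: {
        key: env.AZURE_SPEECH_KEY,
        region: env.AZURE_SPEECH_REGION,
      },
    };
  }
  if (env.GCLOUD_KEY_FILE && env.GCLOUD_PROJECT_ID) {
    return {
      gCloudSpeech: {
        keyFilename: env.GCLOUD_KEY_FILE,
        projectId: env.GCLOUD_PROJECT_ID,
      },
    };
  }
  throw new Error("No transcription service credentials found in environment");
}

// Possible usage: spread the result into the SpeechToText options, e.g.
// new SpeechToText({ audioRate, handler, ...serviceConfigFromEnv(process.env) });
```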

Integration with Voice API Web Sockets

This module is designed to work directly with incoming audio frames from the Vonage Voice API web sockets. Audio can be streamed through the web socket and passed straight to the transcription service. You define a handler that receives the transcribed text as the service returns it.
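
To make the idea of an audio frame concrete, the sketch below computes the size of one frame from the audio format string. It assumes the Voice API sends 16-bit mono linear PCM in 20 ms frames, as described in the Voice API web socket documentation. Both function names are hypothetical and are not part of this module.

```javascript
// Parses a format string such as "audio/l16;rate=16000" and returns the
// expected size in bytes of one 20 ms frame of 16-bit mono linear PCM.
function expectedFrameBytes(audioRate) {
  const match = /^audio\/l16;rate=(\d+)$/i.exec(audioRate.trim());
  if (!match) {
    throw new Error(`Unsupported audio format: ${audioRate}`);
  }
  const samplesPerSecond = Number(match[1]);
  const samplesPerFrame = (samplesPerSecond * 20) / 1000; // 20 ms per frame
  return samplesPerFrame * 2; // 2 bytes per 16-bit sample
}

// Returns true when a web socket message is a Buffer of exactly one frame.
function isAudioFrame(msg, audioRate) {
  return Buffer.isBuffer(msg) && msg.length === expectedFrameBytes(audioRate);
}

console.log(expectedFrameBytes("audio/l16;rate=16000")); // prints 640
```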

When configuring the SpeechToText object, you will need to pass in the audioRate that is being used by the web socket, a handler which will accept a single string parameter (the transcribed text), and the configuration data for the service you are using.
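
Since the handler only receives a string, it can do more than log. As a rough illustration, the sketch below builds a handler that collects each piece of transcribed text so the full transcript can be read back later. The createTranscriptCollector helper is hypothetical and does not depend on this module.

```javascript
// Hypothetical helper: builds a handler that keeps every non-empty transcript.
function createTranscriptCollector() {
  const lines = [];
  return {
    // Matches the handler shape described above: one string parameter.
    handler: (text) => {
      const trimmed = String(text).trim();
      if (trimmed.length > 0) {
        lines.push(trimmed);
      }
    },
    // Joins everything collected so far into a single string.
    transcript: () => lines.join(" "),
  };
}

const collector = createTranscriptCollector();
collector.handler("hello there");
collector.handler("   "); // ignored, because it is blank
collector.handler("how can I help");
console.log(collector.transcript()); // prints "hello there how can I help"
```

The collector's handler function could then be passed as the handler option when constructing the SpeechToText object.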

Sample Usage with Express

This sample application sets up a small Express web socket application. The socket listens on the /echo route and passes the audio directly to the Azure Cognitive Speech API. Once the text has been transcribed and returned, it is passed to the handler function, which writes the text to the application's console log.

const express = require("express");
const app = express();
const expressWs = require("express-ws")(app);
const port = 3000;
const { SpeechToText } = require("@vonage/extend-voice-transcription");

const STTConnector = new SpeechToText({
  audioRate: "audio/l16;rate=16000",
  handler: (data) => {
    console.log(`Vonage Transcription: ${data}`);
  },
  azureCognitiveSpeech: {
    key: "azure-key",
    region: "region",
  },
});

app.get("/", (req, res) => {
  res.setHeader("Content-Type", "application/json");
  res.send(
    JSON.stringify(
      STTConnector.createNCCO(`${req.protocol}://${req.hostname}/echo`)
    )
  );
});

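app.get("/events", (req, res) => {
  // Log the event details, then acknowledge the request so it does not hang.
  console.log(req.query);
  res.sendStatus(204);
});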

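app.ws("/echo", (ws, req) => {
  ws.on("message", (msg) => {
    if (typeof msg === "string") {
      // Text messages carry metadata rather than audio, so just log them.
      console.log(msg);
    } else {
      // Binary messages carry audio; forward them to the transcription service.
      STTConnector.stream(msg);
    }
  });

  ws.on("close", () => {
    // Tear down the connector when the socket closes.
    STTConnector.destroy();
  });
});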

app.listen(port, () => {
  console.log(`Listening on port ${port}`);
});
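
The sample above passes a URL built from req.protocol to createNCCO. This README does not show what createNCCO returns or whether it adjusts the URL scheme, so the following is only a sketch of what a hand-written equivalent might look like, using the standard Voice API connect action with a websocket endpoint. Both function names are hypothetical.

```javascript
// Builds a web socket URL from the details of an incoming HTTP request:
// http maps to ws, and https maps to wss.
function webSocketUrl(protocol, host, path) {
  const scheme = protocol === "https" ? "wss" : "ws";
  return `${scheme}://${host}${path}`;
}

// Hand-written NCCO that connects the call's audio to a web socket.
// This is an assumption about the general shape, not necessarily what
// createNCCO produces.
function buildWebSocketNcco(uri, audioRate) {
  return [
    {
      action: "connect",
      endpoint: [
        {
          type: "websocket",
          uri,
          "content-type": audioRate,
        },
      ],
    },
  ];
}

const ncco = buildWebSocketNcco(
  webSocketUrl("https", "example.com", "/echo"),
  "audio/l16;rate=16000"
);
console.log(JSON.stringify(ncco, null, 2));
```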

Contributing

This library is actively developed, and we would love to hear from you! Please feel free to create an issue or open a pull request with your questions, comments, suggestions, and feedback.
