
@fciannella/nvidia-easy-ar-tts-client

v1.0.15


Streaming TTS client for NVIDIA's API, shipped as both ESM and CJS


NVIDIA Easy AR TTS – TypeScript/JavaScript Streaming Client

@fciannella/nvidia-easy-ar-tts-client is a tiny, zero‑dependency helper library that makes it trivial to talk to NVIDIA's Easy AR Text‑To‑Speech HTTP endpoint and play the audio back while it is still streaming.

The package is delivered as both ESM and CommonJS, ships its own TypeScript declarations (no @types needed) and can be used from the browser or Node.js.


Installation

# with npm
npm install @fciannella/nvidia-easy-ar-tts-client

# or with yarn
yarn add @fciannella/nvidia-easy-ar-tts-client

Quick start (browser)

import { TTSClient } from "@fciannella/nvidia-easy-ar-tts-client";

// Create the client once and reuse it for all requests
const tts = new TTSClient({
  apiUrl: "https://riva.nvidia.com/tts",  // <-- your Easy AR TTS endpoint
  apiKey: "YOUR_SECRET_TOKEN"             // optional – only if your endpoint is protected
});

await tts.play({
  text: "Hello there – I'm streaming while I speak!", // required
  voice: "English-US.Female-1",                       // required
  emotion: "neutral",                                // optional
  description: "friendly voice"                      // optional (display name shown in UI)
});

Under the hood the library will:

  1. POST the synthesis request and keep the connection open.
  2. Parse server‑sent‑events (SSE) coming back from the service.
  3. Convert each audio_chunk to Float32Array samples.
  4. Feed the samples to an AudioWorklet so you can listen in real time.

AudioWorklet requirement – streaming playback relies on the Web‑Audio API, therefore the quick‑start above needs to run in a modern browser (Chrome, Edge, Firefox, Safari, …).
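Step 3 above boils down to turning little-endian 16-bit PCM bytes into normalised floats. A minimal sketch of that conversion (the exact wire format of each audio_chunk payload is an assumption here, not documented by the package):

```typescript
// Decode little-endian PCM16 bytes into Float32Array samples in [-1, 1).
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Float32Array(bytes.byteLength >> 1);
  for (let i = 0; i < out.length; i++) {
    // Each sample is a signed 16-bit integer; divide by 32768 to normalise.
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}
```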


Usage from Node.js

You can still stream and save the audio in Node.js – you just won't hear it in real time.

import { TTSClient } from "@fciannella/nvidia-easy-ar-tts-client";
import { writeFileSync } from "node:fs";

const tts = new TTSClient({ apiUrl: "https://riva.nvidia.com/tts" });

const chunks = [];
for await (const chunk of tts.synthesize({
  text: "This file has been assembled in Node.js!",
  voice: "English-US.Male-1",
})) {
  if (chunk.samples.length) chunks.push(chunk);
}

const wavBlob = tts.assembleWav(chunks);
writeFileSync("out.wav", Buffer.from(await wavBlob.arrayBuffer()));
console.log("Saved → out.wav");

API reference

new TTSClient(options)

| option | type   | required | description |
|--------|--------|----------|-------------|
| apiUrl | string | yes      | Base URL of the Easy AR TTS endpoint (e.g. https://riva.nvidia.com/tts). Do not include a trailing slash – the library will strip it anyway. |
| apiKey | string | no       | Bearer token that will be sent as Authorization: Bearer <token> if provided. |

tts.play(synthOptions, onChunk?) → Promise<void>

Streams audio, plays it immediately using an AudioWorklet and resolves when synthesis is complete.

| synthOptions field | type   | required | description |
|--------------------|--------|----------|-------------|
| text               | string | yes      | Text to be spoken. |
| voice              | string | yes      | Actor/voice identifier as expected by your service. |
| emotion            | string | no       | Optional emotion code accepted by the API. |
| description        | string | no       | Free‑form description (shown in dashboards, logs, …). |

onChunk (optional) is a callback that will be invoked for every AudioChunk that is played. An AudioChunk looks like this:

interface AudioChunk {
  samples: Float32Array; // PCM samples in the range -1…+1
  isFirstChunk: boolean;
  isLastChunk: boolean;
}
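The per-chunk callback makes it easy to drive a live level meter. As an illustration, a small RMS helper (my own sketch, not part of the library) that you could feed each chunk's samples to:

```typescript
// Root-mean-square level of one chunk's samples, in [0, 1].
function rmsLevel(samples: Float32Array): number {
  let sum = 0;
  for (const s of samples) sum += s * s;
  return samples.length ? Math.sqrt(sum / samples.length) : 0;
}

// Usage with tts.play (assuming the onChunk signature documented above):
// await tts.play(opts, chunk => console.log(rmsLevel(chunk.samples).toFixed(2)));
```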

tts.synthesize(synthOptions) → AsyncIterable<AudioChunk>

Low‑level method that yields chunks as soon as they arrive from the network. Useful when you need manual control (e.g. saving to disk, visualising a waveform, custom DSP, …).
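For example, the chunks from any AsyncIterable<AudioChunk> can be gathered into one contiguous buffer. A sketch of such a helper (my own code, shown with a local AudioChunk type so it is self-contained — with the real client you would pass tts.synthesize(opts) straight in):

```typescript
interface AudioChunk {
  samples: Float32Array;
  isFirstChunk: boolean;
  isLastChunk: boolean;
}

// Drain the async iterable and concatenate every chunk's samples.
async function collectSamples(chunks: AsyncIterable<AudioChunk>): Promise<Float32Array> {
  const parts: Float32Array[] = [];
  let total = 0;
  for await (const c of chunks) {
    parts.push(c.samples);
    total += c.samples.length;
  }
  const out = new Float32Array(total);
  let offset = 0;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}
```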

tts.assembleWav(chunks) → Blob

Utility that concatenates the received Float32Arrays and returns a WAV file in a Blob. In the browser you can generate an object URL with URL.createObjectURL(blob); in Node.js convert it to Buffer as shown above.
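A WAV container is just a 44-byte RIFF header in front of the PCM data. For orientation, a sketch of that header layout (my own illustration of what a utility like assembleWav presumably produces; 16-bit mono output is an assumption, the library's actual sample format is not documented here):

```typescript
// Build a minimal 44-byte RIFF/WAVE header for 16-bit mono PCM.
function wavHeader(dataBytes: number, sampleRate: number): Uint8Array {
  const h = new DataView(new ArrayBuffer(44));
  const ascii = (off: number, s: string) =>
    [...s].forEach((c, i) => h.setUint8(off + i, c.charCodeAt(0)));
  ascii(0, "RIFF");
  h.setUint32(4, 36 + dataBytes, true);   // total file size minus 8
  ascii(8, "WAVE");
  ascii(12, "fmt ");
  h.setUint32(16, 16, true);              // fmt chunk size
  h.setUint16(20, 1, true);               // audio format: PCM
  h.setUint16(22, 1, true);               // channels: mono
  h.setUint32(24, sampleRate, true);
  h.setUint32(28, sampleRate * 2, true);  // byte rate = rate * channels * 2
  h.setUint16(32, 2, true);               // block align
  h.setUint16(34, 16, true);              // bits per sample
  ascii(36, "data");
  h.setUint32(40, dataBytes, true);
  return new Uint8Array(h.buffer);
}
```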


Building from source

git clone https://github.com/fciannella/nvidia-easy-ar-tts-client.git
cd nvidia-easy-ar-tts-client
npm install
npm run build

The build step uses tsup to generate:

dist/
  ├── index.js         # ESM (imports)
  ├── index.cjs.js     # CommonJS (requires)
  ├── index.d.ts       # Types
  └── …

License

MIT © 2024 Francesco Ciannella


Acknowledgements

This library is an independent, open‑source project and is not affiliated with NVIDIA in any way. All trademarks belong to their respective owners.

Streaming Chat (text + audio)

The repository also ships AudioChatClient — a thin wrapper around the OpenAI SDK that hits an NVIDIA Cloud Function endpoint which returns both text and audio chunks in real-time.

Prerequisites

npm install speaker          # required only in Node.js to play audio
# set three env vars used by the client / CLI script
export NVCF_KEY=<your_ngc_token>
export OPENAI_PROXY_KEY=<inner_openai_key_expected_by_the_service>
export NVCF_CHAT_BASE_URL=<full_invocation_url>   # optional (defaults to sample URL)

1. Programmatic usage

import { AudioChatClient } from "@fciannella/nvidia-easy-ar-tts-client";
import Speaker from "speaker";

const speaker = new Speaker({
  channels: 1,
  sampleRate: 44_100,
  bitDepth: 32,
  signed: true,
  float: true,
});

const chat = new AudioChatClient({
  systemPrompt: "You are a helpful assistant.",
  actorName:    "Emma World Class",
  emotion:      "Narrative",
  ngcKey:       process.env.NVCF_KEY!,
  proxyKey:     process.env.OPENAI_PROXY_KEY!,
  baseURL:      process.env.NVCF_CHAT_BASE_URL, // optional override
});

await chat.chat("Hi there!", {
  onText:  chunk => process.stdout.write(chunk),
  onAudio: pcm   => speaker.write(Buffer.from(pcm.buffer)),
});

2. Built-in CLI helper

npm run chat                   # REPL — press Enter on an empty line to quit
npm run chat -- "Hello!"       # one-off single turn

The script streams the assistant reply to stdout while simultaneously playing audio through your default output device.

Additional CLI flags

| flag | description |
|------|-------------|
| --ulaw | Convert the 44.1 kHz PCM stream returned by NVIDIA into 8 kHz G.711 μ-law. Handy if you need to forward the audio to Twilio. |
| --ulawFile <path> | In combination with --ulaw, dumps the raw μ-law bytes to the given file for further inspection / integration tests. |

Example:

# Interactive chat with μ-law output + dump to tmp.raw
npm run chat -- --ulaw --ulawFile tmp.raw

3. Streaming straight into Twilio (μ-law example)

import { AudioChatClient } from "@fciannella/nvidia-easy-ar-tts-client";
import WebSocket from "ws";                            // npm install ws

// Outbound Media Stream coming from the <Stream> TwiML verb
const twilioSocket = new WebSocket("wss://<your-twilio-stream-url>");

const chat = new AudioChatClient({
  systemPrompt: "You are Chris, the upbeat assistant…",
  actorName:    "Emma World Class",
  emotion:      "Narrative",
  ngcKey:       process.env.NVCF_KEY!,
  proxyKey:     process.env.OPENAI_PROXY_KEY!,
  // ↓ ask the client to down-sample + μ-law encode on-the-fly
  outputEncoding: "ulaw",
});

await chat.chat("Hi Twilio!", {
  onAudio: ulaw => twilioSocket.send(ulaw)   // Uint8Array (PCMU 8 kHz)
});

The helper will take care of:

  1. Converting NVIDIA's 44.1 kHz PCM16 → Float-32.
  2. Down-sampling to 8 kHz using linear interpolation.
  3. Encoding to G.711 μ-law (Uint8Array).

The resulting bytes can be sent straight to a Twilio Programmable Voice / Media Stream without any extra transcoding.
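Steps 2 and 3 are standard telephony plumbing. For reference, a self-contained sketch of linear-interpolation resampling and G.711 μ-law encoding (my own reference implementation, not the library's code):

```typescript
// Down-sample Float32 PCM by linear interpolation (e.g. 44_100 → 8_000 Hz).
function resample(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  const ratio = fromRate / toRate;
  const out = new Float32Array(Math.floor(input.length / ratio));
  for (let i = 0; i < out.length; i++) {
    const pos = i * ratio;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    out[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return out;
}

// Encode one float sample in [-1, 1] as a G.711 μ-law byte.
function linearToUlaw(sample: number): number {
  const BIAS = 0x84, CLIP = 32635;
  let s = Math.round(Math.max(-1, Math.min(1, sample)) * 32767);
  const sign = s < 0 ? 0x80 : 0;
  if (s < 0) s = -s;
  if (s > CLIP) s = CLIP;
  s += BIAS;
  // Find the segment (exponent) from the highest set bit.
  let exponent = 7;
  for (let mask = 0x4000; (s & mask) === 0 && exponent > 0; mask >>= 1) exponent--;
  const mantissa = (s >> (exponent + 3)) & 0x0f;
  // G.711 stores the byte bit-inverted.
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}
```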