@fciannella/nvidia-asr-client

v0.1.9

Published

9 months ago

Minimal cross-platform wrapper around NVIDIA/Riva streaming ASR WebSocket API with optional client-side silence detection.

fciannella

nvidia riva asr speech-recognition websocket typescript browser

NVIDIA ASR Client

Features

Works in both Node.js and browsers (Node.js needs the optional dependencies described under Usage (Node.js))
Built-in audio resampling
Support for several input formats: 32-bit float (f32), signed 16-bit PCM (pcm_s16), and G.711 μ-law (g711_ulaw)
Client-side silence detection to determine when utterances are complete
Minimal footprint with no external dependencies in browser
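
To illustrate what audio resampling involves, here is a minimal sketch of linear-interpolation resampling for mono Float32 audio. It is not taken from this package: resampleLinear is a hypothetical helper, and the resampler the client actually uses may differ (for example, it may apply a low-pass filter before downsampling).

```javascript
// Hypothetical helper, NOT part of @fciannella/nvidia-asr-client.
// Resamples mono Float32 audio with plain linear interpolation.
// No anti-aliasing filter is applied, so this is only a rough sketch.
function resampleLinear(input, fromRate, toRate) {
  if (fromRate === toRate) return Float32Array.from(input);

  const ratio = fromRate / toRate;               // input samples per output sample
  const outLength = Math.floor(input.length / ratio);
  const output = new Float32Array(outLength);

  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;                       // fractional position in the input
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    // Blend the two neighbouring input samples.
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}
```

For example, a browser AudioContext commonly runs at 44100 or 48000 Hz, so resampleLinear(chunk, 48000, 16000) would keep roughly every third sample.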

Installation

npm install @fciannella/nvidia-asr-client

Usage (Browser)

Modern ES Modules Approach

<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <!-- Prevent browser caching during development -->
  <meta http-equiv="Cache-Control" content="no-cache, no-store, must-revalidate">
  <meta http-equiv="Pragma" content="no-cache">
  <meta http-equiv="Expires" content="0">
</head>
<body>
  <div id="transcript"></div>
  <button id="startBtn">Start</button>
  <button id="stopBtn" disabled>Stop</button>

  <script type="module">
    // Dynamic import with cache-busting during development
    const moduleUrl = './node_modules/@fciannella/nvidia-asr-client/dist/index.js?' + Date.now();
    const { NvidiaAsrClient } = await import(moduleUrl);
    
    let asr = null;
    let stopFn = null;
    
    async function startASR() {
      // Setup ASR client
      asr = new NvidiaAsrClient({
        websocketUrl: 'wss://your-riva-endpoint/v1/speech_recognition/streaming_multi',
        languageCode: 'en-US', // Change to 'it-IT', 'es-ES', etc. if supported by server
        silenceTimeout: 1.5,
        closeOnSilence: false,
      });
      
      asr.on('partial', (e) => {
        document.getElementById('transcript').textContent = e.text;
      });
      
      asr.on('final', (e) => {
        document.getElementById('transcript').textContent = e.text;
      });
      
      // Connect and setup WebAudio
      await asr.connect();
      const audioContext = new (window.AudioContext || window.webkitAudioContext)();
      const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
      const source = audioContext.createMediaStreamSource(stream);
      // Note: ScriptProcessorNode is deprecated in favor of AudioWorklet, but is still widely supported
      const processor = audioContext.createScriptProcessor(4096, 1, 1);
      
      processor.onaudioprocess = (e) => {
        const float32Data = new Float32Array(e.inputBuffer.getChannelData(0));
        asr.write(float32Data, audioContext.sampleRate);
      };
      
      source.connect(processor);
      processor.connect(audioContext.destination);
      
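      // Return cleanup function
      return () => {
        // Capture the current client: `asr` may be reassigned if the user restarts quickly
        const client = asr;
        processor.disconnect();
        source.disconnect();
        stream.getTracks().forEach(track => track.stop());
        audioContext.close(); // Release the context; a new one is created on the next start
        client.finish();
        // Give the server time to return the last result before closing the socket
        setTimeout(() => client.end(), 1500);
      };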
    }
    
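    document.getElementById('startBtn').addEventListener('click', async () => {
      document.getElementById('startBtn').disabled = true;
      try {
        stopFn = await startASR();
        document.getElementById('stopBtn').disabled = false;
      } catch (err) {
        // Connection failure or microphone permission denied: restore the UI
        console.error('Failed to start ASR', err);
        document.getElementById('startBtn').disabled = false;
      }
    });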
    
    document.getElementById('stopBtn').addEventListener('click', () => {
      if (stopFn) {
        stopFn();
        stopFn = null;
        document.getElementById('startBtn').disabled = false;
        document.getElementById('stopBtn').disabled = true;
      }
    });
  </script>
</body>
</html>

A complete example is available in examples/browser-example.html.

Notes on Browser Usage

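WebSocket Endpoint: WebSocket connections are not subject to CORS, but a server or proxy may reject connections based on the Origin header, so ensure your Riva deployment accepts your web application's origin. Pages served over HTTPS must use a wss:// URL.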
Caching: During development, use cache-busting techniques as shown in the example.
Language Selection: The server must support the language code you specify. Not all deployments support all languages.
Audio Context: Modern browsers keep an AudioContext suspended until a user gesture (like a button click) occurs, and microphone capture requires user permission and a secure context (HTTPS or localhost).

Usage (Node.js)

For Node.js usage, install the optional dependencies. ws is a WebSocket implementation for Node.js, and mic provides the microphone capture used in the example below:

npm install ws mic

import { NvidiaAsrClient } from '@fciannella/nvidia-asr-client';
import mic from 'mic';

const SAMPLE_RATE = 16000;

const asr = new NvidiaAsrClient({
  websocketUrl: 'wss://your-riva-endpoint/v1/speech_recognition/streaming_multi',
  languageCode: 'en-US',
  silenceTimeout: 1.5,
  closeOnSilence: false,
});

asr.on('partial', (e) => {
  process.stdout.write(`\r[${e.serverFinal ? 'FINAL' : 'PARTIAL'}] ${e.text}        `);
});

asr.on('final', (e) => {
  console.log(`\n[USER_FINAL] ${e.text}`);
});

asr.on('silence', () => {
  console.log('\n--- silence detected ---');
});

asr.on('error', (err) => console.error('ASR error', err));

(async () => {
  await asr.connect();

  const micInstance = mic({
    rate: String(SAMPLE_RATE),
    channels: '1',
    encoding: 'signed-integer',
    bitwidth: 16,
    endian: 'little',
    fileType: 'raw',
  });

  const stream = micInstance.getAudioStream();
  stream.on('data', (buf) => {
    // Convert little-endian Int16 PCM -> Float32 in [-1, 1).
    // readInt16LE avoids the RangeError that an Int16Array view throws
    // when the Buffer has an odd byteOffset or an odd byte length.
    const sampleCount = Math.floor(buf.length / 2);
    const float32 = new Float32Array(sampleCount);
    for (let i = 0; i < sampleCount; i++) float32[i] = buf.readInt16LE(i * 2) / 0x8000;
    asr.write(float32, SAMPLE_RATE);
  });

  micInstance.start();

  process.on('SIGINT', () => {
    micInstance.stop();
    asr.finish();
    setTimeout(() => process.exit(0), 1500);
  });
})();

API

Constructor

new NvidiaAsrClient(options: NvidiaAsrOptions)

Options

interface NvidiaAsrOptions {
  websocketUrl: string;             // Required: Your Riva endpoint URL
  languageCode?: string;            // Default: 'en-US'
  silenceTimeout?: number;          // Seconds of inactivity before finalizing
  closeOnSilence?: boolean;         // Default: true
  inputFormat?: 'f32' | 'pcm_s16' | 'g711_ulaw'; // Default: 'f32'
  inputSampleRate?: number;         // Default: 16000
  targetSampleRate?: number;        // Default: 16000
}
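
For readers unfamiliar with the g711_ulaw input format, the sketch below shows how standard G.711 μ-law bytes expand to linear samples. It follows the usual G.711 decoding rule and is not taken from this package; ulawByteToInt16 and ulawToFloat32 are hypothetical names used only for illustration.

```javascript
// Hypothetical helpers, NOT part of @fciannella/nvidia-asr-client.
// Standard G.711 mu-law expansion: each byte holds a sign bit,
// a 3-bit exponent and a 4-bit mantissa, all stored bit-inverted.
function ulawByteToInt16(byte) {
  const u = ~byte & 0xff;                 // undo the bit inversion
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  // 0x84 (132) is the mu-law bias added during encoding.
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Decodes a Uint8Array of mu-law bytes to Float32 samples in [-1, 1).
function ulawToFloat32(bytes) {
  const out = new Float32Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) {
    out[i] = ulawByteToInt16(bytes[i]) / 32768;
  }
  return out;
}
```

With inputFormat set to 'g711_ulaw' the client presumably performs an equivalent step internally, so a manual conversion like this should only be needed if you prefer to pass f32 data yourself.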

Methods

connect(): Promise - Opens the WebSocket and sends the configuration packet
write(chunk, sampleRate?): void - Sends audio data to the ASR service
finish(): void - Signals end-of-audio but keeps the socket open
end(): void - Flushes the EOS marker and closes the WebSocket immediately
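
Both usage examples call finish() and then wait a fixed 1500 ms before closing. A possible alternative, sketched below, is to close as soon as a final event arrives and fall back to the timeout otherwise. finishAndClose is a hypothetical helper, and it assumes the client emits final after finish() is called, which this README does not state explicitly.

```javascript
// Hypothetical helper, NOT part of @fciannella/nvidia-asr-client.
// Assumption: a 'final' event may arrive after finish(); if it does not,
// the timeout closes the socket anyway.
function finishAndClose(asr, timeoutMs = 1500) {
  return new Promise((resolve) => {
    let done = false;

    const close = (reason) => {
      if (done) return;          // make sure end() runs only once
      done = true;
      clearTimeout(timer);
      asr.end();                 // documented: flushes EOS and closes the WebSocket
      resolve(reason);           // 'final' or 'timeout'
    };

    const timer = setTimeout(() => close('timeout'), timeoutMs);
    // Only on() is documented, so the listener is left registered;
    // the `done` flag makes later calls a no-op.
    asr.on('final', () => close('final'));
    asr.finish();                // documented: signals end-of-audio, socket stays open
  });
}
```

In the Node.js example this could replace the setTimeout in the SIGINT handler: finishAndClose(asr).then(() => process.exit(0)).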

Events

partial: { text: string, serverFinal: boolean }
final: { text: string }
silence: Emitted when silence is detected
error: Error event
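
As a sketch of how the partial and final events might be combined, the helper below keeps a running transcript: completed utterances are stored, and the latest partial text is shown after them until it is replaced or finalized. createTranscriptCollector is a hypothetical name, and the code relies only on the on() method and the event payloads listed above.

```javascript
// Hypothetical helper, NOT part of @fciannella/nvidia-asr-client.
// Works with any object exposing on(eventName, handler).
function createTranscriptCollector(asr) {
  const finals = [];   // completed utterances, in order
  let partial = '';    // text of the utterance still in progress

  asr.on('partial', (e) => {
    partial = e.text;
  });

  asr.on('final', (e) => {
    finals.push(e.text);
    partial = '';      // the in-progress text is now committed
  });

  return {
    finals,
    // Completed utterances followed by the current partial, if any.
    text() {
      return [...finals, partial].filter(Boolean).join(' ');
    },
  };
}
```

In the browser example, both handlers could then set the transcript element's textContent to collector.text() so that earlier utterances are not overwritten.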

Troubleshooting

Language Support

If specifying a non-English language code (e.g., 'it-IT', 'es-ES') doesn't result in transcription in that language, the issue is likely on the server side:

The server may not have that language model loaded
The server may be configured to ignore client language settings
The specific language may not be supported by your Riva deployment

Contact your Riva server administrator to confirm which languages are available.

Browser Caching

When developing or updating the client, use cache-busting techniques:

Add a timestamp query parameter to the module URL of dynamic imports: import(`./path/to/index.js?v=${Date.now()}`)
Use cache control meta tags in your HTML
Run your development server with cache disabled (e.g., http-server -c-1)
Use browser developer tools to clear cache and perform hard reloads
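
The timestamp technique can be wrapped in a small helper. The following is a minimal sketch; withCacheBust is a hypothetical name, and it does not handle URLs that contain a # fragment.

```javascript
// Hypothetical helper, NOT part of @fciannella/nvidia-asr-client.
// Appends a version query parameter so the browser treats the URL as new.
function withCacheBust(url, version = Date.now()) {
  const separator = url.includes('?') ? '&' : '?';
  return `${url}${separator}v=${version}`;
}

// Possible use inside a <script type="module"> during development:
// const { NvidiaAsrClient } = await import(
//   withCacheBust('./node_modules/@fciannella/nvidia-asr-client/dist/index.js')
// );
```

Remove the cache-busting parameter in production builds, since it defeats caching on every page load.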

License

MIT