# AppTek Streaming JS SDK
A TypeScript SDK for integrating bidirectional streaming ASR (Automatic Speech Recognition) and Translation with AppTek's services.
## Features
- **JavaScript SDK Documentation**: Read the Type Definition Docs
- **Full GRPC Documentation**: Read the GRPC Docs
- **High-Level SDK**: Easy-to-use `AppTekSDK` class for managing sessions.
- **Microphone Streaming**: Built-in support for capturing and streaming microphone audio.
- **Generic MediaStream Support**: Process audio from `<video>` or `<audio>` elements or any WebRTC stream.
- **Get Available Languages**: Access available languages for live transcription and translation.
## Installation
```bash
npm install @apptek/streaming-js-sdk
```

## Getting Started
### 1. Initialize the SDK
You can initialize the SDK with your proxy URL and license key.
```ts
import { AppTekSDK } from "@apptek/streaming-js-sdk";
// The proxyUrl defaults to "https://accessibility.apptek.com/grpc-proxy" if omitted.
const sdk = new AppTekSDK(
"https://accessibility.apptek.com/grpc-proxy",
"YOUR_APPTEK_LICENSE_KEY"
);
// Validate the license key with the server
await sdk.init();
```

### 2. Get available languages for transcription and translation
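The SDK exposes `getLanguages()` and `getTranslateLanguages()` for this (see the API Reference below). A minimal sketch; the exact shape of the returned `Recognize2AvailableResponse` and `TranslateAvailableResponse` objects is not reproduced here:

```ts
// Fetch the languages available for live transcription.
const recognitionLanguages = await sdk.getLanguages();
console.log("Transcription languages:", recognitionLanguages);

// Fetch the target languages available for translation.
const translationLanguages = await sdk.getTranslateLanguages();
console.log("Translation languages:", translationLanguages);
```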
Use one of the returned language codes as the `langCode` in your stream configuration. Example config:
```ts
const config = {
audioConfiguration: {
sampleType: "INT16",
sampleRateHz: 16000,
channels: 1
},
speechConfiguration: {
langCode: "en-US"
}
};
```

### 3. Stream from Microphone
To start recording from the user's microphone:
```ts
const config = {
audioConfiguration: {
sampleType: "INT16",
sampleRateHz: 16000,
channels: 1
},
speechConfiguration: {
langCode: "en-US"
}
};
await sdk.startMicrophone(config, {
onData: (data) => {
// Handle partial or final transcriptions
if (data.transcription?.stableTranscriptions) {
console.log("Partial:", data.transcription.stableTranscriptions);
}
if (data.segment) {
console.log("Final:", data.segment.text);
}
},
onError: (err) => {
console.error("Streaming Error:", err);
},
onStatusChange: (status) => {
console.log("SDK Status:", status); // Idle, Connecting, Recording, etc.
}
});
```
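To end the session, call `stopMicrophone()` (documented in the API Reference below):

```ts
// Stops the current streaming session and releases the microphone.
sdk.stopMicrophone();
```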
### 4. Stream from Video/Audio Element

You can process audio from an existing `MediaStream` (e.g., from a `<video>` element):
```ts
const videoElement = document.querySelector('video');
const stream = videoElement.captureStream(); // or valid MediaStream
await sdk.startFromStream(stream, config, {
onData: (data) => console.log(data),
onError: (err) => console.error(err)
});
```
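Note that `captureStream()` is not uniformly available; Firefox has historically shipped it as `mozCaptureStream()`. A defensive sketch (the fallback property name is a browser detail, not part of this SDK):

```ts
const videoElement = document.querySelector("video")!;

// Feature-detect captureStream(); fall back to Firefox's prefixed variant.
const capture =
  (videoElement as any).captureStream ?? (videoElement as any).mozCaptureStream;
if (typeof capture !== "function") {
  throw new Error("This browser cannot capture a MediaStream from a media element.");
}
const stream: MediaStream = capture.call(videoElement);
```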
### 5. Waveform Visualization

You can receive raw audio data (`Float32Array`) to draw a real-time waveform.
```ts
await sdk.startMicrophone(config, {
onData: handleTranscription,
onError: handleError,
onMicrophoneData: (audioData) => {
// audioData is a Float32Array (-1.0 to 1.0)
// can be used to visualize audio data
}
});
```
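As an illustration of what `onMicrophoneData` enables, the handler below renders each incoming buffer as a simple oscilloscope trace. The `<canvas>` element and drawing style are illustrative, not part of the SDK:

```ts
const canvas = document.querySelector("canvas")!;
const ctx = canvas.getContext("2d")!;

// Draw one Float32Array chunk (values in -1.0..1.0) across the canvas width.
function drawWaveform(audioData: Float32Array): void {
  const { width, height } = canvas;
  ctx.clearRect(0, 0, width, height);
  ctx.beginPath();
  for (let i = 0; i < audioData.length; i++) {
    const x = (i / audioData.length) * width;
    const y = ((1 - audioData[i]) / 2) * height; // map [-1, 1] to [height, 0]
    if (i === 0) ctx.moveTo(x, y);
    else ctx.lineTo(x, y);
  }
  ctx.stroke();
}
```

Pass `drawWaveform` as the `onMicrophoneData` callback in the snippet above.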
## API Reference

### `AppTekSDK`
#### Constructor
```ts
new AppTekSDK(proxyUrl?: string, licenseKey?: string)
```

- `proxyUrl`: Base URL of the proxy server (default: `https://accessibility.apptek.com/grpc-proxy`).
- `licenseKey`: AppTek license key.
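Since both arguments are optional, a sketch relying on the default proxy URL and passing the license key at `init()` time instead:

```ts
// Uses the default proxy URL; the license key is supplied to init(),
// which returns Promise<boolean> per the signature below.
const sdk = new AppTekSDK();
const ok = await sdk.init("YOUR_APPTEK_LICENSE_KEY");
if (!ok) {
  throw new Error("AppTek license validation failed");
}
```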
#### Methods
- `init(licenseKey?: string): Promise<boolean>`: Validates the license key with the server.
- `startMicrophone(config, callbacks, stream?): Promise<void>`: Starts an audio session.
  - `config`: `Recognize2StreamConfig` object.
  - `callbacks`: Event handlers (`onData`, `onError`, `onStatusChange`, `onMicrophoneData`).
  - `stream`: Optional `MediaStream` to use (overrides the microphone).
- `startFromStream(stream, config, callbacks): Promise<void>`: Alias for `startMicrophone`, specifically for external streams.
- `stopMicrophone(): void`: Stops the current streaming session.
- `getLanguages(): Promise<Recognize2AvailableResponse>`: Fetches available recognition languages.
- `getTranslateLanguages(): Promise<TranslateAvailableResponse>`: Fetches available translation languages.
### Configuration Object
```ts
interface Recognize2StreamConfig {
audioConfiguration: {
sampleType: "INT16";
sampleRateHz: number; // e.g., 16000
channels: 1;
};
speechConfiguration: {
langCode: string; // e.g., "en-US"
model?: string;
};
translateConfigurations?: Array<{
domain: string;
targetLangCode: string; // e.g., "es"
}>;
diarizerConfiguration?: {
enable: boolean;
};
}
```
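For instance, a config that also requests Spanish translation and speaker diarization could look like the following; the `"general"` domain value is an assumption, so check AppTek's documentation for the domains your license supports:

```ts
const config: Recognize2StreamConfig = {
  audioConfiguration: {
    sampleType: "INT16",
    sampleRateHz: 16000,
    channels: 1,
  },
  speechConfiguration: {
    langCode: "en-US",
  },
  translateConfigurations: [
    // "general" is an assumed domain name, not confirmed by these docs.
    { domain: "general", targetLangCode: "es" },
  ],
  diarizerConfiguration: {
    enable: true,
  },
};
```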
## Troubleshooting & Advanced Configuration

### AudioWorklet & CORS / CSP Issues
The SDK uses an AudioWorklet to process microphone audio efficiently. By default, it attempts to load this worklet via an inline Blob URL to make getting started easier. However, this can cause issues in two scenarios:
- **Content Security Policy (CSP)**: Your site blocks `worker-src blob:`.
- **CORS**: In some strict browser environments, loading worklets from cross-origin Blobs can fail.

Symptoms:

- Error: `DOMException: The user aborted a request.`
- Error: `SecurityError: The operation is insecure.`
- Network error loading `blob:...`
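If you control your site's CSP and want the default inline Blob loading to keep working, allowing `blob:` workers is the quick fix. A sketch of the relevant header directive (adapt it to your existing policy):

```text
Content-Security-Policy: worker-src 'self' blob:
```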
**Workaround (Recommended for Production):**
Host the processor file yourself to avoid Blob/CSP issues.
- There is a file named `pcm-processor.js` (or similar audio processor code) in the SDK. You can extract the code from `Microphone.ts` explicitly or serve a static file.
- Host this file on your server (e.g., `/assets/pcm-processor.js`).
- Pass the `workletUrl` to `startMicrophone`:
```ts
const micConfig = {
audioConfiguration: { ... },
speechConfiguration: { ... }
};
// Use your own hosted processor file
await sdk.startMicrophone(micConfig, callbacks, undefined, {
workletUrl: "/assets/pcm-processor.js"
});
```

Note: You may need to access the internal `MicrophoneRecorder` directly if the high-level SDK doesn't expose `workletUrl` in the top-level config yet, or ensure your config object is passed correctly down to the microphone.
