@livefantasia/speechengine-client

v0.9.3

Published

10 days ago

Node.js client library for LiveFantasia SpeechEngine streaming API

dragonhunter

speech transcription streaming websocket audio livefantasia speechengine

LiveFantasia SpeechEngine Client for Node.js

A powerful Node.js client library for the LiveFantasia SpeechEngine platform, providing real-time speech recognition capabilities through WebSocket streaming.

Features

🎤 Real-time Speech Recognition: Stream audio data and receive live transcription results
🌐 WebSocket Streaming: Efficient real-time communication with the SpeechEngine API
🔄 Multiple Sessions: Support for concurrent streaming sessions
🎯 TypeScript Support: Full TypeScript definitions included
📊 Session Management: Built-in session lifecycle management and statistics
🛠️ Utility Classes: Helper classes like TranscriptionManager for easy result handling
🎵 Audio Format Support: Supports 16 kHz, 16-bit, mono WAV audio
📤 HTTP Transcription: Support for file-based transcription with real-time SSE updates
⏳ Async Transcription: Support for long-running transcription jobs via URL
🌍 Multi-language: Support for multiple languages
📝 Comprehensive Examples: Rich set of examples for different use cases

Installation

npm install @livefantasia/speechengine-client

Optional Dependencies

For real-time microphone examples, you may want to install an audio capture package such as mic or naudiodon, depending on your platform:

# For Apple Silicon compatibility (recommended)
npm install mic

Note: These packages are only required if you want to run the real-time microphone examples. They are not needed for the core library functionality.

Platform Compatibility Notes

Apple Silicon (M1/M2/M3) Macs:

✅ Recommended: Use mic module for real-time microphone examples
❌ Avoid: naudiodon can cause segmentation faults and build failures on ARM architecture

Intel/x86 Systems:

✅ Both mic and naudiodon should work
💡 Tip: mic is more universally compatible across platforms

CI/CD Environments:

⚠️ Important: naudiodon requires native compilation and may fail in containerized environments (Ubuntu, Alpine Linux)
✅ Solution: Use mic or exclude audio dependencies from CI builds if not needed

Build Issues with naudiodon: If you encounter build failures related to naudiodon, this is typically due to:

Missing system audio libraries (ALSA, PulseAudio on Linux)
Incompatible architecture (ARM vs x86)
Missing build tools (node-gyp, Python, C++ compiler)

Recommended approach: Use the real-time-microphone-node-mic.ts example with the mic module for better cross-platform compatibility.

Usage

Using SpeechEngineClient

import { SpeechEngineClient } from '@livefantasia/speechengine-client';

const config = {
  apiKey: process.env.SPEECHENGINE_API_KEY!,
  baseUrl: process.env.SPEECHENGINE_BASE_URL!,
};

const client = new SpeechEngineClient(config);

// Create Streaming Session
const session = await client.createStreamingSession({
  language: 'en-US',
  model: 'general_stt_en_latest'
});

// Submit Async Job
const asyncSession = await client.submitAsyncJob('https://example.com/audio.wav', {
  language: 'en-US',
  model: 'general_stt_en_latest'
});

Async Transcription

For long audio files (e.g., longer than 1 minute), submit an async job and work with the returned async session.

const asyncSession = await client.submitAsyncJob('https://example.com/audio.wav', {
  language: 'en-US',
  model: 'general_stt_en_latest'
});

console.log('Job submitted:', asyncSession.getSessionId());

const result = await asyncSession.waitForCompletion({ pollingIntervalMs: 5000 });
console.log('Transcription:', result.text);
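
waitForCompletion hides the polling loop from the caller. As a rough illustration of what such a loop involves, the sketch below polls a status function at a fixed interval until the job reaches a terminal state or a timeout elapses. It is not the library's implementation: the JobStatus shape, its status strings, and the pollUntilDone helper are all hypothetical names chosen for this example.

```typescript
// Hypothetical status shape; the real AsyncJobStatusResponse may differ.
type JobStatus = {
  status: 'pending' | 'processing' | 'completed' | 'failed';
  text?: string;
};

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls getStatus every intervalMs until the job completes or fails,
// and gives up once another wait would pass the timeoutMs deadline.
async function pollUntilDone(
  getStatus: () => Promise<JobStatus>,
  intervalMs: number,
  timeoutMs: number,
): Promise<JobStatus> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const current = await getStatus();
    if (current.status === 'completed') return current;
    if (current.status === 'failed') throw new Error('Async job failed');
    if (Date.now() + intervalMs > deadline) {
      throw new Error('Timed out waiting for async job');
    }
    await sleep(intervalMs);
  }
}
```

With the real client, the getStatus argument could be a thin wrapper around asyncSession.getStatus(), assuming its response can be mapped onto the status values used here.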

To receive a callback when the job completes or fails, provide a callback URL and optional headers:

const asyncSession = await client.submitAsyncJob(process.env.ASYNC_AUDIO_URL!, {
  language: 'en-US',
  model: 'general_stt_en_latest',
  callback_url: process.env.ASYNC_CALLBACK_URL!,
  callback_headers: process.env.ASYNC_CALLBACK_HEADERS
    ? JSON.parse(process.env.ASYNC_CALLBACK_HEADERS)
    : undefined
});
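
The snippet above passes the result of JSON.parse straight through, so a malformed ASYNC_CALLBACK_HEADERS value would either throw an unhelpful error or send non-string values. A minimal sketch of a stricter parser follows; parseCallbackHeaders is a hypothetical helper, and the assumption that header values must be strings is ours rather than something the library documents.

```typescript
// Parses a JSON object mapping header names to string values.
// Returns undefined when the variable is unset or blank.
function parseCallbackHeaders(
  raw: string | undefined,
): Record<string, string> | undefined {
  if (raw === undefined || raw.trim() === '') return undefined;

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error('ASYNC_CALLBACK_HEADERS is not valid JSON');
  }
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    throw new Error('ASYNC_CALLBACK_HEADERS must be a JSON object');
  }

  const headers: Record<string, string> = {};
  for (const [name, value] of Object.entries(parsed)) {
    if (typeof value !== 'string') {
      throw new Error(`Header "${name}" must have a string value`);
    }
    headers[name] = value;
  }
  return headers;
}
```

The return value can be used directly as callback_headers, for example callback_headers: parseCallbackHeaders(process.env.ASYNC_CALLBACK_HEADERS).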

Quick Start

1. Set up environment variables

export SPEECHENGINE_API_KEY="your-api-key-here"
export SPEECHENGINE_BASE_URL="https://api.livefantasia.com"
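
The examples in this README read these variables with a non-null assertion (!), which only silences the compiler. A small sketch of failing fast with a clear message instead is shown below; loadClientConfig and ClientEnvConfig are hypothetical names, and the scheme check simply mirrors the note that baseUrl should include a scheme.

```typescript
interface ClientEnvConfig {
  apiKey: string;
  baseUrl: string;
}

// Reads the two variables from an environment map and validates them
// before any client is constructed.
function loadClientConfig(
  env: Record<string, string | undefined>,
): ClientEnvConfig {
  const apiKey = env['SPEECHENGINE_API_KEY'];
  const baseUrl = env['SPEECHENGINE_BASE_URL'];
  if (!apiKey) throw new Error('SPEECHENGINE_API_KEY is not set');
  if (!baseUrl) throw new Error('SPEECHENGINE_BASE_URL is not set');
  if (!/^https?:\/\//i.test(baseUrl)) {
    throw new Error('SPEECHENGINE_BASE_URL must start with http:// or https://');
  }
  return { apiKey, baseUrl };
}
```

In application code this would typically be called as loadClientConfig(process.env), and the result passed to the SpeechEngineClient constructor.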

2. Basic streaming example

import { SpeechEngineClient, TranscriptionManager, TranscriptionUpdateMessage } from '@livefantasia/speechengine-client';
import * as fs from 'fs';

async function basicExample() {
  // Initialize the client
  const client = new SpeechEngineClient({
    baseUrl: process.env.SPEECHENGINE_BASE_URL!,
    apiKey: process.env.SPEECHENGINE_API_KEY!,
  });

  let session;
  try {
    // Create a streaming session (model is required)
    session = await client.createStreamingSession({
      language: 'en-US',
      model: 'general_stt_en_latest',
    });

    // Use TranscriptionManager for easy result handling
    const transcriptionManager = new TranscriptionManager();

    // Set up event handlers
    session.on('sessionReady', () => {
      console.log('Session ready, starting audio stream...');
    });

    session.on('transcriptionUpdate', (message: TranscriptionUpdateMessage) => {
      transcriptionManager.handleTranscriptionUpdate(message);
      console.log('Live transcription:', transcriptionManager.getConcatenatedTranscription());
    });

    session.on('sessionEnd', () => {
      console.log('Final transcription:', transcriptionManager.getFinalTranscription());
    });

    // Connect and start the stream
    await session.connect();
    await session.startStream({
      wordTimestamp: true,
    });

    // Stream audio data
    const audioData = fs.readFileSync('path/to/your/audio.wav');
    await session.sendAudio(audioData);
    await session.endStream(3); // 3-second grace period

  } catch (error) {
    console.error('Error:', error);
  } finally {
    await client.closeAllSessions();
  }
}

basicExample();
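
The example above sends the whole file in a single sendAudio call. For live or large inputs it is common to send fixed-duration chunks instead. The sketch below splits raw PCM bytes into chunks sized for the 16 kHz, 16-bit, mono format listed under Features. The splitPcm helper and the 100 ms chunk duration used in the note are illustrative assumptions, not library requirements.

```typescript
const SAMPLE_RATE_HZ = 16000;
const BYTES_PER_SAMPLE = 2; // 16-bit samples
const CHANNELS = 1; // mono

// Number of bytes that hold chunkMs milliseconds of audio.
function bytesPerChunk(chunkMs: number): number {
  return (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * chunkMs) / 1000;
}

// Splits PCM bytes into consecutive chunks; the last chunk may be shorter.
// Chunks are views over the input, not copies.
function splitPcm(pcm: Uint8Array, chunkMs: number): Uint8Array[] {
  const size = bytesPerChunk(chunkMs);
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error('chunkMs must yield a positive whole number of bytes');
  }
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < pcm.length; offset += size) {
    chunks.push(pcm.subarray(offset, Math.min(offset + size, pcm.length)));
  }
  return chunks;
}
```

At 100 ms per chunk each chunk is 3200 bytes. Because Buffer is a Uint8Array subclass, a Buffer read from disk can be passed in as-is. Whether the server expects chunks to be paced in real time is not stated here, so that is left to the caller.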

Conventions

All message payloads emitted to your handlers use camelCase, consistent with Node.js conventions: segmentId, text, startMs, endMs, isFinal, utteranceOrder, and words[] (each entry with word, startMs, endMs).
Stream start options are provided in camelCase via startStream(options) and are converted internally to the server’s snake_case.
Configure defaults at session creation using SessionConfig camelCase fields.
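
To make the camelCase to snake_case mapping concrete, here is a minimal sketch of a shallow key conversion. It only illustrates the naming convention; camelToSnake and toSnakeCaseKeys are hypothetical helpers and the library's internal conversion may work differently.

```typescript
// Converts a single camelCase key to snake_case.
function camelToSnake(key: string): string {
  return key.replace(/[A-Z]/g, (ch) => '_' + ch.toLowerCase());
}

// Returns a copy of options with top-level keys converted to snake_case.
// Nested objects are left untouched (shallow conversion).
function toSnakeCaseKeys(
  options: Record<string, unknown>,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(options)) {
    out[camelToSnake(key)] = value;
  }
  return out;
}
```

For example, { wordTimestamp: true } becomes { word_timestamp: true }.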

Stream Start Options

Use startStream(options) to enable word timestamps. VAD configuration is set when creating the session.

await session.startStream({
  wordTimestamp: true,
});

These options are validated locally; invalid values throw a ClientErrorCode.INVALID_PARAMETER error before any network call.

API Reference

SpeechEngineClient

The main client class for interacting with the SpeechEngine API.

Constructor

const client = new SpeechEngineClient(config: SpeechEngineClientConfig);

Configuration Options:

apiKey: string - Your SpeechEngine API key
baseUrl: string - Base URL for the API (include scheme, e.g., https://api.livefantasia.com)
defaultLanguage?: string - Default language ('en-US' by default)
connectionTimeoutMs?: number - Connection timeout in milliseconds (default: 10000)
maxConcurrentSessions?: number - Max concurrent sessions (default: unlimited)
debug?: boolean - Shortcut that sets logger level to debug when logger.level is not set
logger?: LoggerConfig - Customize logging behavior (level, enableConsole, enableStructured, customHandler)

Methods

`createStreamingSession(config: SessionConfig): Promise<StreamingSession>`

Creates a new real-time WebSocket streaming session.

Session Configuration (fields without a trailing ? are required):

language: Language - Language code (e.g., 'en-US', 'es-ES')
model: string - Model ID for recognition (required)
vadMinSilenceDuration?: number - Minimum silence duration in milliseconds (300 to 1500)
productCode?: string - Product code for billing (default: 'STT_STREAMING')
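
If session settings come from user input, it can help to check vadMinSilenceDuration before calling createStreamingSession. The sketch below assumes the documented 300 to 1500 range is inclusive at both ends, which the text above does not state explicitly; checkVadMinSilenceDuration is a hypothetical helper.

```typescript
// Assumed inclusive bounds, in milliseconds.
const VAD_MIN_SILENCE_MS = { min: 300, max: 1500 };

// Throws a RangeError when the value is outside the documented range.
// undefined is accepted because the field is optional.
function checkVadMinSilenceDuration(value: number | undefined): void {
  if (value === undefined) return;
  const { min, max } = VAD_MIN_SILENCE_MS;
  if (!Number.isFinite(value) || value < min || value > max) {
    throw new RangeError(
      `vadMinSilenceDuration must be between ${min} and ${max} ms, got ${value}`,
    );
  }
}
```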

`createHttpSession(config: HttpTranscriptionConfig): Promise<HttpSession>`

Creates a new HTTP session for file transcription (SSE).

`submitAsyncJob(audioUrl: string, options?: SubmitJobOptions, config?: Partial<AsyncClientConfig>): Promise<AsyncSession>`

Submits an audio URL for async transcription via POST /api/v1/sessions/async and returns a session handle.

`requestSessionToken(config: SessionConfig): Promise<SessionInitResponse>`

Requests a Control Plane token without creating a session.

`createStreamingSessionFromToken(sessionInitData: SessionInitResponse, config: SessionConfig): StreamingSession`

Creates a streaming session from an existing token.

`createHttpSessionFromToken(sessionInitData: SessionInitResponse, config: HttpTranscriptionConfig): HttpSession`

Creates an HTTP session from an existing token.

`getAsyncSession(sessionId: string, config?: Partial<AsyncClientConfig>): AsyncSession`

Retrieves an existing async session by ID for status tracking.

`closeSession(sessionId: string): void`

Closes a single session.

`closeAllSessions(): Promise<void>`

Closes all active sessions created by this client.

HttpSession

Represents an active HTTP transcription session.

Methods

`transcribe(audio: string | Uint8Array | Buffer | Readable, mediaType?: string): AsyncGenerator<HttpTranscriptionEvent, void, unknown>`

Transcribe an audio file or data using the HTTP SSE endpoint.

If audio is a string, it's treated as a file path (Node.js only).
If audio is binary data, mediaType is required.

`getSessionId(): string`

Returns the session ID.

`getConfig(): HttpSessionConfig`

Returns the session configuration used for this HTTP session.

AsyncSession

Represents an asynchronous transcription job.

Methods

`waitForCompletion(options?: PollingOptions): Promise<AsyncTranscriptionResult>`

Polls for job completion and returns the final result.

`getStatus(): Promise<AsyncJobStatusResponse>`

Retrieves the current status of the async job.

`getSessionId(): string`

Returns the session ID.

StreamingSession

Represents an active streaming session.

Events

sessionReady - Session is ready to receive audio
transcriptionUpdate - New transcription data received
sessionEnd - Session ended
error - Error occurred

Methods

`connect(): Promise<void>`

Connects to the session (establishes WebSocket connection).

`startStream(options?: { wordTimestamp?: boolean }): Promise<void>`

Starts the stream; emits sessionReady when streaming can begin.

`sendAudio(audioData: Buffer): Promise<void>`

Sends audio data while in streaming state.

`endStream(graceWaitingTime?: number): Promise<void>`

Ends the stream and finalizes transcription, waiting up to graceWaitingTime seconds.

`disconnect(): Promise<void>`

Disconnects the session and closes the WebSocket.
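
Since the supported input is 16 kHz, 16-bit, mono WAV, it may be worth checking a file before streaming it with sendAudio. The sketch below reads the format fields from a canonical 44-byte WAV header. It assumes the fmt chunk immediately follows the RIFF header, which is common but not guaranteed for every WAV file, and readWavFormat and isSupportedWav are hypothetical helpers rather than part of the library.

```typescript
interface WavFormat {
  audioFormat: number; // 1 means uncompressed PCM
  channels: number;
  sampleRate: number;
  bitsPerSample: number;
}

// Reads format fields from a canonical WAV header (fmt chunk at byte 12).
function readWavFormat(bytes: Uint8Array): WavFormat {
  if (bytes.length < 44) throw new Error('Too short for a canonical WAV header');
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (offset: number): string =>
    String.fromCharCode(
      view.getUint8(offset),
      view.getUint8(offset + 1),
      view.getUint8(offset + 2),
      view.getUint8(offset + 3),
    );
  if (tag(0) !== 'RIFF' || tag(8) !== 'WAVE' || tag(12) !== 'fmt ') {
    throw new Error('Not a canonical RIFF/WAVE header');
  }
  return {
    audioFormat: view.getUint16(20, true), // little-endian fields
    channels: view.getUint16(22, true),
    sampleRate: view.getUint32(24, true),
    bitsPerSample: view.getUint16(34, true),
  };
}

// True when the header describes 16 kHz, 16-bit, mono PCM.
function isSupportedWav(bytes: Uint8Array): boolean {
  const f = readWavFormat(bytes);
  return (
    f.audioFormat === 1 &&
    f.channels === 1 &&
    f.sampleRate === 16000 &&
    f.bitsPerSample === 16
  );
}
```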

TranscriptionManager

Utility class for managing transcription results.

Methods

`handleTranscriptionUpdate(message: TranscriptionUpdateMessage): void`

Processes a transcription update message with interim replacement and deduplication.

`getConcatenatedTranscription(): string`

Gets the assembled transcription (interim + final as appropriate).

`getFinalTranscription(): string`

Gets the final transcription result.

`getSegments(): TranscriptionSegment[]`

Gets all transcription segments.
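
To illustrate what interim replacement and deduplication can look like, here is a simplified sketch that keeps the latest update per segment and never lets a late interim result overwrite a final one. It is not the TranscriptionManager implementation. SegmentAssembler is a hypothetical class, the field names come from the Conventions section, and treating segmentId as a string is an assumption.

```typescript
// Subset of the documented message fields; segmentId type is assumed.
interface SegmentUpdate {
  segmentId: string;
  text: string;
  isFinal: boolean;
  utteranceOrder: number;
}

class SegmentAssembler {
  private readonly segments = new Map<string, SegmentUpdate>();

  // Stores the newest update for a segment, replacing earlier interims.
  handleUpdate(update: SegmentUpdate): void {
    const existing = this.segments.get(update.segmentId);
    // A final result is authoritative; ignore interims that arrive after it.
    if (existing !== undefined && existing.isFinal && !update.isFinal) return;
    this.segments.set(update.segmentId, update);
  }

  private ordered(): SegmentUpdate[] {
    return Array.from(this.segments.values()).sort(
      (a, b) => a.utteranceOrder - b.utteranceOrder,
    );
  }

  // Interim and final text, in utterance order.
  getConcatenated(): string {
    return this.ordered().map((s) => s.text).join(' ');
  }

  // Final segments only, in utterance order.
  getFinal(): string {
    return this.ordered().filter((s) => s.isFinal).map((s) => s.text).join(' ');
  }
}
```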

Examples

The library comes with comprehensive examples in the examples/ directory:

Basic Examples

simple-streaming.ts - Recommended streaming workflow using TranscriptionManager
minimal-streaming.ts - Direct event handling without utilities

Advanced Examples

multiple-sessions.ts - Managing multiple concurrent sessions
error-handling.ts - Comprehensive error handling patterns

Streaming Examples

real-time-microphone-node-mic.ts - Real-time microphone streaming with mic module (Apple Silicon compatible)
file-streaming.ts - Streaming audio from files

Running Examples

# Basic streaming example
npx ts-node examples/basic/simple-streaming.ts

# Streaming with VAD options
npx ts-node examples/basic/simple-streaming-vad.ts

# Real-time microphone (Apple Silicon compatible)
npm install mic
npx ts-node examples/streaming/real-time-microphone-node-mic.ts

# Multiple sessions
npx ts-node examples/advanced/multiple-sessions.ts

Apple Silicon Compatibility

The real-time microphone example uses the mic module which provides excellent compatibility with Apple Silicon Macs (M1/M2/M3). This avoids the known issues with naudiodon/PortAudio that can cause segmentation faults on ARM-based Macs.

npm install mic
npx ts-node examples/streaming/real-time-microphone-node-mic.ts

Why `mic` over `naudiodon`?

Cross-Platform Stability:

mic works reliably across macOS (Intel & Apple Silicon), Linux, and Windows
naudiodon has known compatibility issues with ARM architecture and CI/CD environments

Build Reliability:

mic has fewer native dependencies and simpler build requirements
naudiodon requires PortAudio and can fail in containerized environments (Docker, CI/CD)

Development Experience:

mic provides a simpler API for basic microphone access
mic is less prone to segmentation faults and memory issues on Apple Silicon

If you encounter build issues with audio dependencies in your CI/CD pipeline, consider excluding them from your production dependencies or using the file-based streaming examples instead.

Supported Languages

English (en)
Spanish (es)
French (fr)
German (de)
Italian (it)
Portuguese (pt)
And more...

Error Handling

The library provides comprehensive error handling with specific error types:

import { SpeechEngineError } from '@livefantasia/speechengine-client';

try {
  await session.connect();
} catch (error) {
  if (error instanceof SpeechEngineError) {
    console.error('SpeechEngine Error:', error.code, error.message);
    console.error('Category:', error.category);
    console.error('Retryable:', error.retryable);
  }
}
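
The retryable flag lends itself to a retry wrapper. The sketch below retries an async operation with exponential backoff, but only when the thrown error carries retryable === true. withRetry is a hypothetical helper, and the backoff schedule is an arbitrary choice rather than a library recommendation.

```typescript
// Minimal view of the error shape used here; SpeechEngineError has more fields.
interface MaybeRetryable {
  retryable?: boolean;
}

// Runs operation up to maxAttempts times, doubling the delay after each
// retryable failure. Non-retryable errors are rethrown immediately.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts: number,
  baseDelayMs: number,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      const retryable =
        typeof error === 'object' &&
        error !== null &&
        (error as MaybeRetryable).retryable === true;
      if (!retryable || attempt >= maxAttempts) throw error;
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

A call might look like await withRetry(() => session.connect(), 3, 500). Whether a session can be reconnected after a failed attempt is not documented here, so creating a fresh session inside the operation may be the safer choice.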

Logging

The client’s logs can be routed into your application’s logger. By default, logs print to the console at info level.

Winston integration example

import { SpeechEngineClient } from '@livefantasia/speechengine-client';
import winston from 'winston';

const appLogger = winston.createLogger({
  level: 'info',
  transports: [new winston.transports.Console()],
});

const client = new SpeechEngineClient({
  baseUrl: 'https://api.livefantasia.com',
  apiKey: process.env.SPEECHENGINE_API_KEY!,
  logger: {
    level: 'info',
    enableConsole: false,
    customHandler: (entry) => {
      const level = entry.level.toLowerCase();
      const prefix = `${entry.component}${entry.sessionId ? ':' + entry.sessionId : ''}`;
      const message = `${prefix} - ${entry.message}`;
      const ts = entry.timestamp.toISOString();
      const meta = entry.data ? { data: entry.data, ts } : { ts };
      appLogger.log({ level, message, ...meta });
    },
  },
});

Notes:

Set enableConsole: false to prevent duplicate console output.
customHandler receives structured entries; you control formatting and routing.
Sensitive auth data (JWTs and Bearer tokens) is redacted before logging.
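
If your customHandler forwards entries to a third-party sink, you may want an additional redaction pass of your own. The sketch below shows one way to mask Bearer tokens and JWT-shaped strings with regular expressions. It is an approximation written for this example, not the library's redaction logic, and redactSecrets and both patterns are hypothetical.

```typescript
// Matches "Bearer <token>" where the token uses common token characters.
const BEARER_PATTERN = /Bearer\s+[A-Za-z0-9._~+\/=-]+/g;
// Matches three dot-separated base64url parts starting with "eyJ",
// which is how a JWT header beginning with {" is encoded.
const JWT_PATTERN = /\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g;

// Masks Bearer tokens first, then any remaining bare JWTs.
function redactSecrets(text: string): string {
  return text
    .replace(BEARER_PATTERN, 'Bearer [REDACTED]')
    .replace(JWT_PATTERN, '[REDACTED_JWT]');
}
```

Inside customHandler this could be applied as redactSecrets(entry.message) before the message is handed to the application logger.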

Development

Building

npm run build

Testing

npm test
npm run test:coverage

Linting

npm run lint
npm run lint:fix

Type Checking

npm run type-check

Requirements

Node.js >= 20.0.0
TypeScript >= 5.1.0 (for development)

License

MIT License - see the LICENSE file for details.

Support

Documentation: API Documentation
Issues: GitHub Issues
Examples: See the examples/ directory for comprehensive usage examples

Contributing

We welcome contributions! Please see our contributing guidelines for more information.

Made with ❤️ by LiveFantasia
