@runanywhere/onnx
ONNX Runtime backend for the RunAnywhere React Native SDK. Provides on-device Speech-to-Text (STT), Text-to-Speech (TTS), and Voice Activity Detection (VAD) powered by Sherpa-ONNX.
Overview
@runanywhere/onnx provides the ONNX Runtime backend for on-device voice AI capabilities:
Speech-to-Text (STT)
- Whisper Models — Multi-language speech recognition
- Batch Transcription — Transcribe audio files
- Real-time Streaming — Live transcription support
- Word Timestamps — Precise timing information
- Confidence Scores — Per-segment reliability metrics
Text-to-Speech (TTS)
- Piper TTS — Natural neural voice synthesis
- Multiple Voices — Various languages and accents
- Customizable — Speed, pitch, and volume control
- Streaming Output — Chunked audio generation
Voice Activity Detection (VAD)
- Silero VAD — High-accuracy speech detection
- Real-time Processing — Low-latency audio analysis
- Configurable Sensitivity — Adjustable thresholds
Requirements
- @runanywhere/core (peer dependency)
- React Native 0.71+
- iOS 15.1+ / Android API 24+
- Microphone permission (for live recording)
Installation
npm install @runanywhere/core @runanywhere/onnx
# or
yarn add @runanywhere/core @runanywhere/onnx
iOS Setup
cd ios && pod install && cd ..
Add microphone permission to Info.plist:
<key>NSMicrophoneUsageDescription</key>
<string>Required for speech recognition</string>
Android Setup
Add microphone permission to AndroidManifest.xml:
<uses-permission android:name="android.permission.RECORD_AUDIO" />
Quick Start
import { RunAnywhere, SDKEnvironment, ModelCategory } from '@runanywhere/core';
import { ONNX, ModelArtifactType } from '@runanywhere/onnx';
// 1. Initialize SDK
await RunAnywhere.initialize({
environment: SDKEnvironment.Development,
});
// 2. Register ONNX backend
ONNX.register();
// 3. Add STT model (Whisper)
await ONNX.addModel({
id: 'sherpa-onnx-whisper-tiny.en',
name: 'Sherpa Whisper Tiny (English)',
url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/sherpa-onnx-whisper-tiny.en.tar.gz',
modality: ModelCategory.SpeechRecognition,
artifactType: ModelArtifactType.TarGzArchive,
memoryRequirement: 75_000_000,
});
// 4. Add TTS model (Piper)
await ONNX.addModel({
id: 'vits-piper-en_US-lessac-medium',
name: 'Piper TTS (US English)',
url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/vits-piper-en_US-lessac-medium.tar.gz',
modality: ModelCategory.SpeechSynthesis,
artifactType: ModelArtifactType.TarGzArchive,
memoryRequirement: 65_000_000,
});
// 5. Download models
await RunAnywhere.downloadModel('sherpa-onnx-whisper-tiny.en');
await RunAnywhere.downloadModel('vits-piper-en_US-lessac-medium');
// 6. Load STT model
const sttModel = await RunAnywhere.getModelInfo('sherpa-onnx-whisper-tiny.en');
await RunAnywhere.loadSTTModel(sttModel.localPath, 'whisper');
// 7. Transcribe audio
const result = await RunAnywhere.transcribeFile(audioFilePath, {
language: 'en',
});
console.log('Transcription:', result.text);
// 8. Load TTS model
const ttsModel = await RunAnywhere.getModelInfo('vits-piper-en_US-lessac-medium');
await RunAnywhere.loadTTSModel(ttsModel.localPath, 'piper');
// 9. Synthesize speech
const audio = await RunAnywhere.synthesize('Hello world.', {
rate: 1.0,
pitch: 1.0,
});
console.log('Audio duration:', audio.duration, 'seconds');
API Reference
ONNX Module
import { ONNX, ModelArtifactType } from '@runanywhere/onnx';
ONNX.register()
Register the ONNX backend with the SDK. Must be called before using STT/TTS features.
ONNX.register(): void
Example:
await RunAnywhere.initialize({ ... });
ONNX.register(); // Now STT/TTS features are available
ONNX.addModel(options)
Add an ONNX model (STT or TTS) to the model registry.
await ONNX.addModel(options: ONNXModelOptions): Promise<ModelInfo>
Parameters:
interface ONNXModelOptions {
/**
* Unique model ID.
* If not provided, generated from the URL filename.
*/
id?: string;
/** Display name for the model */
name: string;
/** Download URL for the model */
url: string;
/**
* Model category.
* Required: ModelCategory.SpeechRecognition or ModelCategory.SpeechSynthesis
*/
modality: ModelCategory;
/**
* How the model is packaged.
* If not provided, inferred from URL extension.
*/
artifactType?: ModelArtifactType;
/** Memory requirement in bytes */
memoryRequirement?: number;
}
enum ModelArtifactType {
SingleFile = 'singleFile', // Single .onnx file
TarGzArchive = 'tarGzArchive', // .tar.gz archive
TarBz2Archive = 'tarBz2Archive', // .tar.bz2 archive
ZipArchive = 'zipArchive', // .zip archive
}
Returns: Promise<ModelInfo> — The registered model info
Example:
// STT Model (Whisper)
await ONNX.addModel({
id: 'sherpa-onnx-whisper-tiny.en',
name: 'Sherpa Whisper Tiny (English)',
url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/.../sherpa-onnx-whisper-tiny.en.tar.gz',
modality: ModelCategory.SpeechRecognition,
artifactType: ModelArtifactType.TarGzArchive,
memoryRequirement: 75_000_000,
});
// Larger STT Model
await ONNX.addModel({
id: 'sherpa-onnx-whisper-small.en',
name: 'Sherpa Whisper Small (English)',
url: 'https://github.com/k2-fsa/sherpa-onnx/releases/.../sherpa-onnx-whisper-small.en.tar.bz2',
modality: ModelCategory.SpeechRecognition,
artifactType: ModelArtifactType.TarBz2Archive,
memoryRequirement: 250_000_000,
});
// TTS Model (Piper)
await ONNX.addModel({
id: 'vits-piper-en_US-lessac-medium',
name: 'Piper TTS (US English - Lessac)',
url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/.../vits-piper-en_US-lessac-medium.tar.gz',
modality: ModelCategory.SpeechSynthesis,
artifactType: ModelArtifactType.TarGzArchive,
memoryRequirement: 65_000_000,
});
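Because id and artifactType are optional, a minimal registration can rely on inference from the URL. A sketch (the URL below is hypothetical):
// Minimal registration: id is derived from the URL filename,
// artifactType from the .tar.gz extension (per ONNXModelOptions above)
await ONNX.addModel({
  name: 'Whisper Tiny (English)',
  url: 'https://example.com/models/sherpa-onnx-whisper-tiny.en.tar.gz', // hypothetical URL
  modality: ModelCategory.SpeechRecognition,
});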
Module Properties
ONNX.moduleId // 'onnx'
ONNX.moduleName // 'ONNX Runtime'
ONNX.inferenceFramework // LLMFramework.ONNX
ONNX.capabilities // ['stt', 'tts']
ONNX.defaultPriority // 100
Speech-to-Text API
Use the RunAnywhere API for STT operations:
Load STT Model
await RunAnywhere.loadSTTModel(
modelPath: string,
modelType?: string // 'whisper' (default)
): Promise<boolean>
Check Model Status
const isLoaded = await RunAnywhere.isSTTModelLoaded(): Promise<boolean>
Unload Model
await RunAnywhere.unloadSTTModel(): Promise<void>
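For example, when switching between Whisper models, unload the current one first to free native memory. A minimal sketch (nextModelPath is a placeholder for a path obtained via getModelInfo):
if (await RunAnywhere.isSTTModelLoaded()) {
  await RunAnywhere.unloadSTTModel(); // release the current model's memory
}
await RunAnywhere.loadSTTModel(nextModelPath, 'whisper');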
Transcribe Audio File
const result = await RunAnywhere.transcribeFile(
audioPath: string,
options?: STTOptions
): Promise<STTResult>
STT Options:
interface STTOptions {
language?: string; // e.g., 'en', 'es', 'fr'
punctuation?: boolean; // Enable punctuation
diarization?: boolean; // Enable speaker diarization
wordTimestamps?: boolean; // Enable word-level timestamps
sampleRate?: number; // Audio sample rate
}
STT Result:
interface STTResult {
text: string; // Full transcription
segments: STTSegment[]; // Segments with timing
language?: string; // Detected language
confidence: number; // Overall confidence (0-1)
duration: number; // Audio duration in seconds
alternatives: STTAlternative[];
}
interface STTSegment {
text: string;
startTime: number; // seconds
endTime: number; // seconds
confidence: number;
}
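Segment timing makes it easy to render captions. A short sketch using the fields defined above (audioFilePath is a placeholder):
const result = await RunAnywhere.transcribeFile(audioFilePath, { wordTimestamps: true });
for (const segment of result.segments) {
  // startTime/endTime are in seconds, per the STTSegment interface
  console.log(`[${segment.startTime.toFixed(2)}s - ${segment.endTime.toFixed(2)}s] ${segment.text}`);
}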
Transcribe Raw Audio
// From base64-encoded audio
const result = await RunAnywhere.transcribe(
audioData: string, // base64 float32 PCM
options?: STTOptions
): Promise<STTResult>
// From float32 samples
const result = await RunAnywhere.transcribeBuffer(
samples: number[],
sampleRate: number,
options?: STTOptions
): Promise<STTResult>
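For example, raw samples from a recording library can be transcribed after converting the typed array to a plain array. A sketch, assuming recordedPcm is 16 kHz mono float32 audio:
// recordedPcm: Float32Array of mono samples (placeholder for your recorder's output)
const samples = Array.from(recordedPcm);
const result = await RunAnywhere.transcribeBuffer(samples, 16000, { language: 'en' });
console.log('Raw-buffer transcription:', result.text);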
Text-to-Speech API
Use the RunAnywhere API for TTS operations:
Load TTS Model
await RunAnywhere.loadTTSModel(
modelPath: string,
modelType?: string // 'piper' (default)
): Promise<boolean>
Check Model Status
const isLoaded = await RunAnywhere.isTTSModelLoaded(): Promise<boolean>
Unload Model
await RunAnywhere.unloadTTSModel(): Promise<void>
Synthesize Speech
const result = await RunAnywhere.synthesize(
text: string,
options?: TTSConfiguration
): Promise<TTSResult>
TTS Configuration:
interface TTSConfiguration {
voice?: string; // Voice identifier
rate?: number; // Speed (0.5-2.0, default: 1.0)
pitch?: number; // Pitch (0.5-2.0, default: 1.0)
volume?: number; // Volume (0.0-1.0, default: 1.0)
}
TTS Result:
interface TTSResult {
audio: string; // Base64-encoded float32 PCM
sampleRate: number; // Audio sample rate (typically 22050)
numSamples: number; // Total sample count
duration: number; // Duration in seconds
}
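Since audio is base64-encoded float32 PCM, it must be decoded before playback. A minimal sketch using the npm buffer polyfill (an assumption; any base64 decoder works):
import { Buffer } from 'buffer'; // npm 'buffer' polyfill, assumed installed

function decodeTTSAudio(base64Audio: string): Float32Array {
  const bytes = Buffer.from(base64Audio, 'base64');
  // Copy into a fresh buffer to guarantee 4-byte alignment, then view as float32
  const aligned = new Uint8Array(bytes.length);
  aligned.set(bytes);
  return new Float32Array(aligned.buffer);
}

const samples = decodeTTSAudio(result.audio); // play back at result.sampleRate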
Streaming Synthesis
await RunAnywhere.synthesizeStream(
text: string,
options?: TTSConfiguration,
onChunk?: (chunk: TTSOutput) => void
): Promise<TTSResult>
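A usage sketch collecting chunks as they arrive (the exact shape of TTSOutput is not documented here; treating it as a chunk-sized TTSResult is an assumption):
const chunks: string[] = [];
const finalResult = await RunAnywhere.synthesizeStream(
  'A longer passage that benefits from chunked generation.',
  { rate: 1.0 },
  (chunk) => {
    // Assumption: each chunk carries base64 PCM like TTSResult.audio
    chunks.push(chunk.audio);
  }
);
console.log('Chunks:', chunks.length, 'total duration:', finalResult.duration);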
System TTS (Platform Native)
// Speak using AVSpeechSynthesizer (iOS) or Android TTS
await RunAnywhere.speak(text: string, options?: TTSConfiguration): Promise<void>
// Control playback
const isSpeaking = await RunAnywhere.isSpeaking(): Promise<boolean>
await RunAnywhere.stopSpeaking(): Promise<void>
// List available voices
const voices = await RunAnywhere.availableTTSVoices(): Promise<TTSVoiceInfo[]>
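A quick usage sketch combining the calls above:
const systemVoices = await RunAnywhere.availableTTSVoices();
console.log('System voices:', systemVoices.length);
await RunAnywhere.speak('Hello from the platform speech engine.');
// Later (e.g. from a stop button handler), interrupt playback:
if (await RunAnywhere.isSpeaking()) {
  await RunAnywhere.stopSpeaking();
}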
Voice Activity Detection API
Initialize VAD
await RunAnywhere.initializeVAD(config?: VADConfiguration): Promise<boolean>
VAD Configuration:
interface VADConfiguration {
energyThreshold?: number; // Speech detection threshold
sampleRate?: number; // Audio sample rate
frameLength?: number; // Frame length in ms
autoCalibration?: boolean; // Enable auto-calibration
}
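For example (the values below are illustrative, not recommended defaults):
await RunAnywhere.initializeVAD({
  sampleRate: 16000,     // match your recorder's sample rate
  frameLength: 30,       // frame length in ms (illustrative)
  autoCalibration: true, // let the detector adapt to ambient noise
});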
Load VAD Model
await RunAnywhere.loadVADModel(modelPath: string): Promise<boolean>
Process Audio
const result = await RunAnywhere.processVAD(
audioSamples: number[]
): Promise<VADResult>
VAD Result:
interface VADResult {
isSpeech: boolean; // Whether speech is detected
confidence: number; // Confidence score (0-1)
startTime?: number; // Speech segment start
endTime?: number; // Speech segment end
}
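For example, gating a recorder on detected speech (a sketch; frame is a placeholder for one buffer of float32 samples from your audio source):
const vad = await RunAnywhere.processVAD(frame);
if (vad.isSpeech && vad.confidence > 0.8) {
  // e.g., start buffering audio for transcription
  console.log('Speech detected at', vad.startTime, 'seconds');
}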
Continuous VAD
// Start/stop continuous processing
await RunAnywhere.startVAD(): Promise<void>
await RunAnywhere.stopVAD(): Promise<void>
// Set callbacks
RunAnywhere.setVADSpeechActivityCallback((event) => {
if (event.type === 'speechStarted') {
console.log('Speech started');
} else if (event.type === 'speechEnded') {
console.log('Speech ended');
}
});
Supported Models
Speech-to-Text (Whisper)
| Model | Size | Memory | Languages | Description |
|-------|------|--------|-----------|-------------|
| whisper-tiny.en | ~75MB | 100MB | English | Fastest, English-only |
| whisper-base.en | ~150MB | 200MB | English | Better accuracy |
| whisper-small.en | ~250MB | 350MB | English | High quality |
| whisper-tiny | ~75MB | 100MB | 99+ | Multilingual |
Text-to-Speech (Piper)
| Voice | Size | Language | Description |
|-------|------|----------|-------------|
| en_US-lessac-medium | ~65MB | English (US) | Natural, clear |
| en_US-amy-medium | ~65MB | English (US) | Female voice |
| en_GB-alba-medium | ~65MB | English (UK) | British accent |
| de_DE-thorsten-medium | ~65MB | German | German voice |
| es_ES-mls-medium | ~65MB | Spanish | Spanish voice |
| fr_FR-siwis-medium | ~65MB | French | French voice |
Voice Activity Detection
| Model | Size | Description |
|-------|------|-------------|
| silero-vad | ~2MB | High accuracy, real-time |
Usage Examples
Complete STT Example
import { RunAnywhere, SDKEnvironment, ModelCategory } from '@runanywhere/core';
import { ONNX, ModelArtifactType } from '@runanywhere/onnx';
async function transcribeAudio(audioPath: string): Promise<string> {
// Initialize
await RunAnywhere.initialize({ environment: SDKEnvironment.Development });
ONNX.register();
// Add model
await ONNX.addModel({
id: 'whisper-tiny-en',
name: 'Whisper Tiny English',
url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/.../sherpa-onnx-whisper-tiny.en.tar.gz',
modality: ModelCategory.SpeechRecognition,
artifactType: ModelArtifactType.TarGzArchive,
});
// Download if needed
if (!(await RunAnywhere.isModelDownloaded('whisper-tiny-en'))) {
await RunAnywhere.downloadModel('whisper-tiny-en', (p) => {
console.log(`Download: ${(p.progress * 100).toFixed(1)}%`);
});
}
// Load and transcribe
const model = await RunAnywhere.getModelInfo('whisper-tiny-en');
await RunAnywhere.loadSTTModel(model.localPath, 'whisper');
const result = await RunAnywhere.transcribeFile(audioPath, {
language: 'en',
wordTimestamps: true,
});
console.log('Transcription:', result.text);
console.log('Confidence:', result.confidence);
console.log('Duration:', result.duration, 'seconds');
return result.text;
}
Complete TTS Example
async function synthesizeSpeech(text: string): Promise<string> {
// Initialize
await RunAnywhere.initialize({ environment: SDKEnvironment.Development });
ONNX.register();
// Add model
await ONNX.addModel({
id: 'piper-lessac',
name: 'Piper Lessac Voice',
url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/.../vits-piper-en_US-lessac-medium.tar.gz',
modality: ModelCategory.SpeechSynthesis,
artifactType: ModelArtifactType.TarGzArchive,
});
// Download if needed
if (!(await RunAnywhere.isModelDownloaded('piper-lessac'))) {
await RunAnywhere.downloadModel('piper-lessac');
}
// Load and synthesize
const model = await RunAnywhere.getModelInfo('piper-lessac');
await RunAnywhere.loadTTSModel(model.localPath, 'piper');
const result = await RunAnywhere.synthesize(text, {
rate: 1.0,
pitch: 1.0,
volume: 0.8,
});
console.log('Duration:', result.duration, 'seconds');
console.log('Sample rate:', result.sampleRate);
// result.audio is base64-encoded float32 PCM
return result.audio;
}
Voice Pipeline (STT → TTS)
async function voiceEcho(audioPath: string): Promise<string> {
// Transcribe input audio
const transcription = await RunAnywhere.transcribeFile(audioPath);
console.log('You said:', transcription.text);
// Synthesize echo
const audio = await RunAnywhere.synthesize(
`You said: ${transcription.text}`
);
return audio.audio;
}
Native Integration
iOS
This package uses RABackendONNX.xcframework, which includes:
- ONNX Runtime compiled for iOS
- Sherpa-ONNX (Whisper, Piper, Silero VAD)
- Optimized for Apple Silicon
Dependencies:
- onnxruntime.xcframework — ONNX Runtime core
Android
Native libraries include:
- librunanywhere_onnx.so — ONNX backend
- libonnxruntime.so — ONNX Runtime
- libsherpa-onnx-*.so — Sherpa-ONNX libraries
Package Structure
packages/onnx/
├── src/
│ ├── index.ts # Package exports
│ ├── ONNX.ts # Module API (register, addModel)
│ ├── ONNXProvider.ts # Service provider
│ ├── native/
│ │ └── NativeRunAnywhereONNX.ts
│ └── specs/
│ └── RunAnywhereONNX.nitro.ts
├── cpp/
│ ├── HybridRunAnywhereONNX.cpp
│ ├── HybridRunAnywhereONNX.hpp
│ └── bridges/
├── ios/
│ ├── RunAnywhereONNX.podspec
│ └── Frameworks/
│ ├── RABackendONNX.xcframework
│ └── onnxruntime.xcframework
├── android/
│ ├── build.gradle
│ └── src/main/jniLibs/
│ └── arm64-v8a/
│ ├── librunanywhere_onnx.so
│ ├── libonnxruntime.so
│ └── libsherpa-onnx-*.so
└── nitrogen/
└── generated/
Troubleshooting
STT model fails to load
Symptoms: modelLoadFailed error when loading STT model
Solutions:
- Verify the model directory contains all required files
- Check that archive extraction completed successfully
- Ensure the correct model type is specified ('whisper')
Poor transcription quality
Symptoms: Transcription has many errors
Solutions:
- Use a larger model (small instead of tiny)
- Ensure audio is clear with minimal background noise
- Check audio sample rate matches model expectations
- Try specifying the language explicitly
TTS audio is silent or distorted
Symptoms: No sound or garbled audio
Solutions:
- Verify audio data is being decoded correctly
- Check sample rate matches playback device
- Ensure volume is not zero
- Try a different TTS voice
Permission denied (microphone)
Symptoms: Audio recording fails
Solutions:
- Add microphone permission to Info.plist (iOS)
- Add RECORD_AUDIO permission to AndroidManifest.xml
- Request runtime permission before recording (see the sketch below)
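On Android, the runtime request can be made with React Native's built-in PermissionsAndroid API:
import { PermissionsAndroid, Platform } from 'react-native';

async function ensureMicPermission(): Promise<boolean> {
  if (Platform.OS !== 'android') return true; // iOS prompts via Info.plist
  const status = await PermissionsAndroid.request(
    PermissionsAndroid.PERMISSIONS.RECORD_AUDIO,
  );
  return status === PermissionsAndroid.RESULTS.GRANTED;
}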
See Also
- Main SDK README — Full SDK documentation
- API Reference — Complete API docs
- @runanywhere/core — Core SDK
- @runanywhere/llamacpp — LLM backend
- Sherpa-ONNX — Underlying engine
- ONNX Runtime — ONNX inference engine
License
MIT License
