react-native-sherpa-onnx
v0.3.9
React Native SDK for sherpa-onnx – offline and streaming speech processing
⚠️ SDK 0.3.0 – Breaking changes from 0.2.0
Since the last release I have restructured and improved the SDK significantly: full iOS support, smoother behaviour, fewer failure points, and a much smaller footprint (~95% size reduction). As a result, internal logic and the public API have changed. If you are upgrading from 0.2.x, please follow the Breaking changes (upgrading to 0.3.0) section and the updated API documentation.
A React Native TurboModule that provides offline and streaming speech processing capabilities using sherpa-onnx. The SDK aims to support all functionalities that sherpa-onnx offers, including offline and online (streaming) speech-to-text, text-to-speech (batch and streaming), speaker diarization, speech enhancement, source separation, and VAD (Voice Activity Detection).
Installation
```sh
npm install react-native-sherpa-onnx
```

If your project uses Yarn (v3+) or Plug'n'Play, configure Yarn to use the Node Modules linker to avoid postinstall issues:

```yaml
# .yarnrc.yml
nodeLinker: node-modules
```

Alternatively, set the environment variable during install:

```sh
YARN_NODE_LINKER=node-modules yarn install
```

Android
No additional setup required. The library automatically handles native dependencies via Gradle. For execution provider support (CPU, NNAPI, XNNPACK, QNN) and optional QNN setup, see Execution provider support. For building Android native libs yourself, see sherpa-onnx-prebuilt.
iOS
The sherpa-onnx XCFramework is not shipped in the repo or npm (size ~80MB). It is downloaded automatically when you run pod install; no manual steps are required. The version used is pinned in third_party/sherpa-onnx-prebuilt/IOS_RELEASE_TAG (format: sherpa-onnx-ios-vX.Y.Z or sherpa-onnx-ios-vX.Y.Z-N with optional build number) and the archive is fetched from GitHub Releases.
Setup
```sh
cd your-app/ios
bundle install
bundle exec pod install
```

The podspec runs scripts/setup-ios-framework.sh, which downloads the XCFramework (and, if needed, libarchive sources) so the Pod builds correctly. Libarchive is compiled from source as part of the Pod; its version is pinned in third_party/libarchive_prebuilt/IOS_RELEASE_TAG.
Building the iOS framework
To build the sherpa-onnx iOS XCFramework yourself (e.g. custom version or patches), see third_party/sherpa-onnx-prebuilt/README.md and the Framework - Sherpa-Onnx (iOS) Release workflow.
Model download (optional)
If you use the download manager to fetch models at runtime, add the following to your AppDelegate so background downloads can finish when the app is in the background or after it was terminated. Without it, downloads only work reliably while the app is in the foreground.
- Swift (RN 0.77+): In your bridging header, add `#import <RNBackgroundDownloader.h>`. In AppDelegate.swift, implement:

```swift
func application(_ application: UIApplication,
                 handleEventsForBackgroundURLSession identifier: String,
                 completionHandler: @escaping () -> Void) {
  RNBackgroundDownloader.setCompletionHandlerWithIdentifier(identifier, completionHandler: completionHandler)
}
```

- Objective-C: In AppDelegate.m, add `#import <RNBackgroundDownloader.h>` and implement `application:handleEventsForBackgroundURLSession:completionHandler:` so that it calls `[RNBackgroundDownloader setCompletionHandlerWithIdentifier:identifier completionHandler:completionHandler]`.
Full step-by-step: Download manager – Setup (iOS & Android). Expo users can use the library’s config plugin to apply this automatically.
Android: Foreground service permissions (Play Console), visible download notifications, and POST_NOTIFICATIONS (API 33+) are covered in Download manager – Android: foreground service & notifications.
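With the native setup above in place, runtime model downloads are driven from JavaScript. The sketch below is entirely illustrative: the download manager's real function names, options, and progress payloads are documented in the linked guide and may differ from what is assumed here.

```typescript
// Illustrative shapes only; the SDK's actual download manager API may differ.
interface DownloadProgress {
  bytesWritten: number;
  totalBytes: number;
}

interface DownloadManager {
  // Resolves with the path of the downloaded file.
  download(
    url: string,
    destDir: string,
    onProgress: (p: DownloadProgress) => void,
  ): Promise<string>;
}

// Fetch a model archive once and report coarse progress percentages,
// e.g. to drive a progress bar in the UI.
async function fetchModel(
  dm: DownloadManager,
  url: string,
  destDir: string,
  report: (percent: number) => void,
): Promise<string> {
  return dm.download(url, destDir, ({ bytesWritten, totalBytes }) => {
    if (totalBytes > 0) report(Math.round((100 * bytesWritten) / totalBytes));
  });
}
```

After downloading, archives can be unpacked with the Extraction API and the resulting folder passed to model loading (see Model Setup).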
Table of contents
- Bundled sherpa-onnx version
- Installation
- Feature Support
- Platform Support Status
- Known issues
- Supported Model Types
- Documentation
- Requirements
- Breaking changes (upgrading to 0.3.0)
- Example Apps
- Contributing
- License
Bundled sherpa-onnx version
| Platform | Version |
|----------|---------|
| Android  | 1.12.31 |
| iOS      | 1.12.31 |
Feature Support
| Feature | Status | Docs | Notes |
|---------|--------|------|-------|
| Offline Speech-to-Text | ✅ Supported | STT | No internet required; multiple model types (Zipformer, Paraformer, Whisper, etc.). See Supported Model Types. |
| Online (streaming) Speech-to-Text | ✅ Supported | Streaming STT | Real-time recognition from microphone or stream; partial results, endpoint detection. Use streaming-capable models (e.g. transducer, paraformer). |
| Live capture API | ✅ Supported | PCM live stream | Native microphone capture with resampling for live transcription (use with streaming STT). |
| Text-to-Speech | ✅ Supported | TTS | Multiple model types (VITS, Matcha, Kokoro, etc.). See Supported Model Types. |
| Streaming Text-to-Speech | ✅ Supported | Streaming TTS | Incremental speech generation for low time-to-first-byte and playback while generating. |
| Execution providers (CPU, NNAPI, XNNPACK, Core ML, QNN) | ✅ Supported | Execution providers | CPU default; optional accelerators per platform. |
| Play Asset Delivery (PAD) | ✅ Supported | Model setup | Android only. Archives: Extraction API. |
| Automatic Model type detection | ✅ Supported | Model detection | detectSttModel() and detectTtsModel() for a path. |
| Model quantization | ✅ Supported | Model setup | Automatic detection and preference for quantized (int8) models. |
| Flexible model loading | ✅ Supported | Model setup | Asset models, file system models, or auto-detection. |
| TypeScript | ✅ Supported | — | Full type definitions included. |
| Speech Enhancement | ❌ Not yet supported | Enhancement | Scheduled for release 0.4.0 |
| Speaker Diarization | ❌ Not yet supported | Diarization | Scheduled for release 0.5.0 |
| Source Separation | ❌ Not yet supported | Separation | Scheduled for release 0.6.0 |
| VAD (Voice Activity Detection) | ❌ Not yet supported | VAD | Scheduled for release 0.7.0 |
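As a rough picture of how the offline STT feature above fits together, here is a hedged sketch. Only detectSttModel() and the 'auto' model type are named in this README; the engine factory, transcribeFile, and release names below are illustrative assumptions, not the SDK's documented API.

```typescript
// Hypothetical shapes for illustration; consult the STT docs for real names.
type SttModelType = 'auto' | 'transducer' | 'paraformer' | 'whisper';

interface SttEngine {
  transcribeFile(path: string): Promise<{ text: string }>;
  release(): Promise<void>;
}

// Typical offline flow: point the SDK at a model folder, let it detect the
// model type ('auto'), transcribe one file, then free native resources.
async function transcribeOnce(
  createEngine: (opts: { modelDir: string; modelType: SttModelType }) => Promise<SttEngine>,
  modelDir: string,
  wavPath: string,
): Promise<string> {
  const engine = await createEngine({ modelDir, modelType: 'auto' });
  try {
    const result = await engine.transcribeFile(wavPath);
    return result.text;
  } finally {
    await engine.release(); // native models hold significant memory
  }
}
```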
Platform Support Status
| Platform | Status | Notes |
|----------|--------|-------|
| Android  | ✅ Production Ready | CI/CD automated, multiple models supported |
| iOS      | ✅ Production Ready | CI/CD automated, multiple models supported |
Known issues
- Pocket TTS voice cloning: supported on Android; experimental on iOS. Known quirks: heuristic end-of-speech (EOS) detection, and output drift between iOS and Android (length/quality). This is not specific to React Native. Full notes: investigation doc.
Supported Model Types
Speech-to-Text (STT) Models
| Model Type | modelType Value | Description | Download Links |
| ------------------------ | ----------------- | ---------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------ |
| Auto Detect | 'auto' | Automatically detects model layout/type from files in the model folder and picks the best supported STT type. | n/a |
| Zipformer/Transducer | 'transducer' | Encoder–decoder–joiner (e.g. icefall). Good balance of speed and accuracy. Folder name should contain zipformer or transducer for auto-detection. | Download |
| LSTM Transducer | 'transducer' | Same layout as Zipformer (encoder–decoder–joiner). LSTM-based streaming ASR; detected as transducer. Folder name may contain lstm. | Download |
| Paraformer | 'paraformer' | Single-model non-autoregressive ASR; fast and accurate. Detected by model.onnx; no folder token required. | Download |
| NeMo CTC | 'nemo_ctc' | NeMo CTC; good for English and streaming. Folder name should contain nemo or parakeet. | Download |
| Whisper | 'whisper' | Multilingual, encoder–decoder; strong zero-shot. Detected by encoder+decoder (no joiner); folder token optional. | Download |
| WeNet CTC | 'wenet_ctc' | CTC from WeNet; compact. Folder name should contain wenet. | Download |
| SenseVoice | 'sense_voice' | Multilingual with emotion/punctuation. Folder name should contain sense or sensevoice. | Download |
| FunASR Nano | 'funasr_nano' | Lightweight LLM-based ASR. Folder name should contain funasr or funasr-nano. | Download |
| Moonshine (v1) | 'moonshine' | Four-part streaming-capable ASR (preprocess, encode, uncached/cached decode). Folder name should contain moonshine. | Download |
| Moonshine (v2) | 'moonshine_v2' | Two-part Moonshine (encoder + merged decoder); .onnx or .ort. Folder name should contain moonshine (v2 preferred if both layouts present). | Download |
| Fire Red ASR | 'fire_red_asr' | Fire Red encoder–decoder ASR. Folder name should contain fire_red or fire-red. | Download |
| Dolphin | 'dolphin' | Single-model CTC. Folder name should contain dolphin. | Download |
| Canary | 'canary' | NeMo Canary multilingual. Folder name should contain canary. | Download |
| Omnilingual | 'omnilingual' | Omnilingual CTC. Folder name should contain omnilingual. | Download |
| MedASR | 'medasr' | Medical ASR CTC. Folder name should contain medasr. | Download |
| Telespeech CTC | 'telespeech_ctc' | Telespeech CTC. Folder name should contain telespeech. | Download |
| Tone CTC (t-one) | 'tone_ctc' | Lightweight streaming CTC (e.g. t-one). Folder name should contain t-one, t_one, or tone (as word). | Download |
For real-time (streaming) recognition from a microphone or audio stream, use streaming-capable model types: transducer, paraformer, zipformer2_ctc, nemo_ctc, or tone_ctc. See Streaming (Online) Speech-to-Text.
Text-to-Speech (TTS) Models
| Model Type | modelType Value | Description | Download Links |
| ---------------- | ----------------- | ---------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------- |
| Auto Detect | 'auto' | Automatically detects the TTS model layout from files in the model folder and selects the matching supported type. | n/a |
| VITS | 'vits' | Fast, high-quality TTS (Piper, Coqui, MeloTTS, MMS). Folder name should contain vits if used with other voice models. | Download |
| Matcha | 'matcha' | High-quality acoustic model + vocoder. Detected by acoustic_model + vocoder; no folder token required. | Download |
| Kokoro | 'kokoro' | Multi-speaker, multi-language. Folder name should contain kokoro (not kitten) for auto-detection. | Download |
| KittenTTS | 'kitten' | Lightweight, multi-speaker. Folder name should contain kitten (not kokoro) for auto-detection. | Download |
| Zipvoice | 'zipvoice' | Standard TTS with sid. Voice cloning (reference audio + referenceText): batch via generateSpeech only—streaming TTS does not support reference audio for Zipvoice. Default numSteps when omitted is 5 on Android and iOS (matches sherpa-onnx GenerationConfig / Kotlin helper). Cloning is supported on Android & iOS. Encoder + decoder + vocoder. | Download |
| Pocket | 'pocket' | Flow-matching TTS. Voice cloning on Android: batch and streaming TTS. iOS: cloning is experimental. Detected by lm_flow, lm_main, text_conditioner, vocab/token_scores. | Download |
| Supertonic | 'supertonic' | Lightning-fast, on-device text-to-speech system designed for extreme performance with minimal computational overhead. | Download |
For streaming TTS (incremental generation, low latency), use createStreamingTTS() with supported model types. See Streaming Text-to-Speech.
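The incremental-generation idea can be sketched like this. createStreamingTTS() is named in this README, but the option and callback shapes below are assumptions for illustration; the Streaming TTS docs define the real API.

```typescript
// Assumed chunk shape; the SDK's actual payload may differ.
interface StreamingTtsChunk {
  samples: Float32Array;
  sampleRate: number;
}

interface StreamingTts {
  generate(text: string, onChunk: (c: StreamingTtsChunk) => void): Promise<void>;
  release(): Promise<void>;
}

// Hand each chunk to an audio player as soon as it arrives, so playback
// starts on the first chunk instead of waiting for the full utterance
// (this is what keeps time-to-first-byte low).
async function synthesizeIncrementally(
  tts: StreamingTts,
  text: string,
  play: (c: StreamingTtsChunk) => void,
): Promise<number> {
  let totalSamples = 0;
  await tts.generate(text, (chunk) => {
    totalSamples += chunk.samples.length;
    play(chunk);
  });
  return totalSamples;
}
```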
Documentation
- Known issues – SDK-facing notes (e.g. Pocket TTS cloning / cross-platform behavior)
- Speech-to-Text (STT) – Offline transcription (file or samples)
- Streaming (Online) Speech-to-Text – Real-time recognition, partial results, endpoint detection
- PCM Live Stream – Native microphone capture with resampling for live transcription (use with streaming STT)
- Text-to-Speech (TTS) – Offline and streaming generation
- Streaming Text-to-Speech – Incremental TTS (createStreamingTTS)
- Execution provider support (QNN, NNAPI, XNNPACK, Core ML) – Checking and using acceleration backends
- Voice Activity Detection (VAD)
- Speaker Diarization
- Speech Enhancement
- Source Separation
- Model Setup – Bundled assets, Play Asset Delivery (PAD), model discovery APIs, and troubleshooting
- Model Download Manager
- Extraction API
- Disable FFMPEG
- Disable LIBARCHIVE
Note: For when to use listAssetModels() vs listModelsAtPath() and how to combine bundled and PAD/file-based models, see Model Setup.
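Combining the two discovery calls can look roughly like this. listAssetModels(), listModelsAtPath(), and detectSttModel() are named in this README, but their return shapes below are assumptions; Model Setup documents the real signatures.

```typescript
// Assumed return shapes for the discovery APIs named in this README.
interface ModelApi {
  listAssetModels(): Promise<string[]>;              // models bundled in the app
  listModelsAtPath(dir: string): Promise<string[]>;  // downloaded / PAD models
  detectSttModel(dir: string): Promise<{ modelType: string } | null>;
}

// Merge bundled and file-system models, keeping only folders the SDK
// recognizes as a supported STT layout.
async function findUsableSttModels(api: ModelApi, downloadDir: string): Promise<string[]> {
  const candidates = [
    ...(await api.listAssetModels()),
    ...(await api.listModelsAtPath(downloadDir)),
  ];
  const usable: string[] = [];
  for (const dir of candidates) {
    if (await api.detectSttModel(dir)) usable.push(dir);
  }
  return usable;
}
```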
Requirements
- React Native >= 0.70
- Android API 24+ (Android 7.0+)
- iOS 13.0+
Example Apps
We provide example applications to help you get started with react-native-sherpa-onnx:
Example App (Audio to Text)
The example app included in this repository demonstrates audio-to-text transcription, text-to-speech, and streaming features. It includes:
- Multiple model type support (Zipformer, Paraformer, NeMo CTC, Whisper, WeNet CTC, SenseVoice, FunASR Nano, Moonshine, and more)
- Model selection and configuration
- Offline audio file transcription
- Online (streaming) STT – live transcription from the microphone with partial results
- Streaming TTS – incremental speech generation and playback
- Test audio files for different languages
Getting started:
```sh
cd example
yarn install
yarn android # or yarn ios
```

Video to Text Comparison App
A comprehensive comparison app that demonstrates video-to-text transcription using react-native-sherpa-onnx alongside other speech-to-text solutions:
Repository: mobile-videototext-comparison
Features:
- Video to audio conversion (using native APIs)
- Audio to text transcription
- Video to text (video --> WAV --> text)
- Comparison between different STT providers
- Performance benchmarking
This app showcases how to integrate react-native-sherpa-onnx into a real-world application that processes video files and converts them to text.
Contributing
License
MIT
Third-Party Libraries
This SDK includes the following open source components:
sherpa-onnx (Apache License 2.0): https://github.com/k2-fsa/sherpa-onnx
ONNX Runtime (MIT License): https://github.com/microsoft/onnxruntime
FFmpeg (LGPL v2.1): https://ffmpeg.org
Shine MP3 Encoder (LGPL): https://github.com/toots/shine
Opus Codec (BSD License): https://opus-codec.org
Zstandard (zstd) (BSD License): https://github.com/facebook/zstd
libarchive (BSD License): https://github.com/libarchive/libarchive
Full license texts are available in the THIRD_PARTY_LICENSES directory.
LGPL Notice
This SDK includes LGPL-licensed components such as FFmpeg and Shine.
Applications using this SDK must ensure compliance with LGPL requirements when distributing binaries.
FFmpeg source code can be obtained at: https://ffmpeg.org
Qualcomm QNN Support
This SDK supports optional integration with Qualcomm AI Runtime (QNN).
QNN is proprietary software provided by Qualcomm and is not included in this SDK.
To use QNN acceleration, users must obtain and include the required QNN libraries separately and comply with Qualcomm's license terms:
https://softwarecenter.qualcomm.com/
Responsibility
By using this SDK, you are responsible for complying with all third-party licenses included in this project.
Made with create-react-native-library
