react-native-sherpa-onnx
v0.2.0
Offline speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD with sherpa-onnx for React Native
React Native SDK for sherpa-onnx - providing offline speech processing capabilities
A React Native TurboModule that provides offline speech processing capabilities using sherpa-onnx. The SDK aims to support all functionalities that sherpa-onnx offers, including offline speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD (Voice Activity Detection).
Feature Support
| Feature | Status |
|---------|--------|
| Offline Speech-to-Text | ✅ Supported |
| Text-to-Speech | ✅ Supported |
| Speaker Diarization | ❌ Not yet supported |
| Speech Enhancement | ❌ Not yet supported |
| Source Separation | ❌ Not yet supported |
| VAD (Voice Activity Detection) | ❌ Not yet supported |
Platform Support Status
| Platform | Status | Notes |
|----------|--------|-------|
| Android | ✅ Production Ready | Fully tested, CI/CD automated, multiple models supported |
| iOS | 🟡 Beta / Experimental | XCFramework + Podspec ready; GitHub Actions builds pass; no local Xcode testing (Windows-only dev) |
🔧 iOS Contributors WANTED!
Full iOS support is a priority! Help bring sherpa-onnx to iOS devices.
What's ready:
- ✅ XCFramework integration
- ✅ Podspec configuration
- ✅ GitHub Actions CI (macOS runner)
- ✅ TypeScript bindings
What's needed:
- Local Xcode testing (Simulator + Device)
- iOS example app (beyond CI)
- TurboModule iOS testing
- Edge case testing
Supported Model Types
Speech-to-Text (STT) Models
| Model Type | modelType Value | Description | Download Links |
| ------------------------ | ----------------- | ---------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------ |
| Zipformer/Transducer | 'transducer' | Requires encoder.onnx, decoder.onnx, joiner.onnx, and tokens.txt | Download |
| Paraformer | 'paraformer' | Requires model.onnx (or model.int8.onnx) and tokens.txt | Download |
| NeMo CTC | 'nemo_ctc' | Requires model.onnx (or model.int8.onnx) and tokens.txt | Download |
| Whisper | 'whisper' | Requires encoder.onnx, decoder.onnx, and tokens.txt | Download |
| WeNet CTC | 'wenet_ctc' | Requires model.onnx (or model.int8.onnx) and tokens.txt | Download |
| SenseVoice | 'sense_voice' | Requires model.onnx (or model.int8.onnx) and tokens.txt | Download |
| FunASR Nano | 'funasr_nano' | Requires encoder_adaptor.onnx, llm.onnx, embedding.onnx, and tokenizer directory | Download |
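The required files in the table above can be captured in a small lookup table, handy for validating a model directory before handing it to the SDK. This is an illustrative sketch only: the `SttModelType` union mirrors the documented `modelType` values, but `sttRequiredFiles` and `missingSttFiles` are not part of the library's API.

```typescript
// The `modelType` values documented in the STT table above.
type SttModelType =
  | 'transducer'
  | 'paraformer'
  | 'nemo_ctc'
  | 'whisper'
  | 'wenet_ctc'
  | 'sense_voice'
  | 'funasr_nano';

// Files each STT model type expects, per the table. Where the table lists
// alternatives (model.onnx or model.int8.onnx), the non-quantized name is
// shown; the library prefers the int8 variant when one is present.
const sttRequiredFiles: Record<SttModelType, string[]> = {
  transducer: ['encoder.onnx', 'decoder.onnx', 'joiner.onnx', 'tokens.txt'],
  paraformer: ['model.onnx', 'tokens.txt'],
  nemo_ctc: ['model.onnx', 'tokens.txt'],
  whisper: ['encoder.onnx', 'decoder.onnx', 'tokens.txt'],
  wenet_ctc: ['model.onnx', 'tokens.txt'],
  sense_voice: ['model.onnx', 'tokens.txt'],
  funasr_nano: ['encoder_adaptor.onnx', 'llm.onnx', 'embedding.onnx', 'tokenizer'], // 'tokenizer' is a directory
};

// Report which required files are absent from a directory listing.
function missingSttFiles(modelType: SttModelType, dirEntries: string[]): string[] {
  return sttRequiredFiles[modelType].filter((f) => !dirEntries.includes(f));
}
```

Running a check like this before initialization makes "model failed to load" errors much easier to diagnose.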
Text-to-Speech (TTS) Models
| Model Type | modelType Value | Description | Download Links |
| ---------------- | ----------------- | ---------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------- |
| VITS | 'vits' | Fast, high-quality TTS. Includes Piper, Coqui, MeloTTS, MMS variants. Requires model.onnx, tokens.txt | Download |
| Matcha | 'matcha' | High-quality acoustic model + vocoder. Requires acoustic_model.onnx, vocoder.onnx, tokens.txt | Download |
| Kokoro | 'kokoro' | Multi-speaker, multi-language. Requires model.onnx, voices.bin, tokens.txt, espeak-ng-data/ | Download |
| KittenTTS | 'kitten' | Lightweight, multi-speaker. Requires model.onnx, voices.bin, tokens.txt, espeak-ng-data/ | Download |
| ZipVoice | 'zipvoice' | Voice cloning capable. Requires encoder.onnx, decoder.onnx, vocoder.onnx, tokens.txt | Download |
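The TTS table can be encoded the same way for pre-flight validation. Again a sketch, not library API: the `TtsModelType` union matches the documented `modelType` values, while `ttsRequiredFiles` and `missingTtsFiles` are hypothetical helpers.

```typescript
// The `modelType` values documented in the TTS table above.
type TtsModelType = 'vits' | 'matcha' | 'kokoro' | 'kitten' | 'zipvoice';

// Files each TTS model type expects, per the table.
// 'espeak-ng-data' is a directory shipped with the model.
const ttsRequiredFiles: Record<TtsModelType, string[]> = {
  vits: ['model.onnx', 'tokens.txt'],
  matcha: ['acoustic_model.onnx', 'vocoder.onnx', 'tokens.txt'],
  kokoro: ['model.onnx', 'voices.bin', 'tokens.txt', 'espeak-ng-data'],
  kitten: ['model.onnx', 'voices.bin', 'tokens.txt', 'espeak-ng-data'],
  zipvoice: ['encoder.onnx', 'decoder.onnx', 'vocoder.onnx', 'tokens.txt'],
};

// Report which required files are absent from a directory listing.
function missingTtsFiles(modelType: TtsModelType, dirEntries: string[]): string[] {
  return ttsRequiredFiles[modelType].filter((f) => !dirEntries.includes(f));
}
```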
Features
- ✅ Offline Speech-to-Text - No internet connection required for speech recognition
- ✅ Multiple Model Types - Supports Zipformer/Transducer, Paraformer, NeMo CTC, Whisper, WeNet CTC, SenseVoice, and FunASR Nano models
- ✅ Model Quantization - Automatic detection and preference for quantized (int8) models
- ✅ Flexible Model Loading - Asset models, file system models, or auto-detection
- ✅ Android Support - Fully supported on Android
- 🟡 iOS Support - Beta/experimental on iOS (requires sherpa-onnx XCFramework; see Platform Support Status)
- ✅ TypeScript Support - Full TypeScript definitions included
- 🚧 Additional Features Coming Soon - Speaker Diarization, Speech Enhancement, Source Separation, and VAD support are planned for future releases
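The quantization bullet above boils down to a simple selection rule: when both a full-precision and an int8 file exist, the int8 variant wins. A sketch of that documented behavior (the `pickModelFile` helper is illustrative, not the library's actual code):

```typescript
// Given the files present in a model directory, prefer the quantized (int8)
// variant of a model file when available, otherwise fall back to the
// full-precision file. Mirrors the documented "prefer int8" behavior.
function pickModelFile(dirEntries: string[], baseName: string = 'model'): string | undefined {
  const int8 = `${baseName}.int8.onnx`;
  const fp32 = `${baseName}.onnx`;
  if (dirEntries.includes(int8)) return int8;
  if (dirEntries.includes(fp32)) return fp32;
  return undefined; // neither variant found
}
```

Quantized models are smaller and faster on mobile CPUs at a small accuracy cost, which is why they are preferred by default.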
Installation
```sh
npm install react-native-sherpa-onnx
```

If your project uses Yarn (v3+) or Plug'n'Play, configure Yarn to use the node-modules linker to avoid postinstall issues:
```yaml
# .yarnrc.yml
nodeLinker: node-modules
```

Alternatively, set the environment variable during install:

```sh
YARN_NODE_LINKER=node-modules yarn install
```

Android
No additional setup required. The library automatically handles native dependencies via Gradle.
iOS
The sherpa-onnx XCFramework is not included in the repository or npm package due to its size (~80MB), but no manual action is required! The framework is automatically downloaded during pod install.
Quick Setup
```sh
cd example
bundle install
bundle exec pod install --project-directory=ios
```

That's it! The Podfile automatically:
- Copies required header files from the git submodule
- Downloads the latest XCFramework from GitHub Releases
- Verifies everything is in place before building
For Advanced Users: Building the Framework Locally
If you want to build the XCFramework yourself instead of using the prebuilt release:
```sh
# Clone sherpa-onnx repository
git clone https://github.com/k2-fsa/sherpa-onnx.git
cd sherpa-onnx
git checkout v1.12.23

# Build the iOS XCFramework (requires macOS, Xcode, CMake, and ONNX Runtime)
./build-ios.sh

# Copy to your project
cp -r build-ios/sherpa_onnx.xcframework /path/to/react-native-sherpa-onnx/ios/Frameworks/
```

Then run pod install as usual.
Note: The iOS implementation uses the same C++ wrapper as Android, ensuring consistent behavior across platforms.
Documentation
- Speech-to-Text (STT)
- Text-to-Speech (TTS)
- Voice Activity Detection (VAD)
- Speaker Diarization
- Speech Enhancement
- Source Separation
- General STT Model Setup
- General TTS Model Setup
Example Model READMEs
- kokoro (US) README
- kokoro (ZH) README
- funasr-nano README
- kitten-nano README
- matcha README
- nemo-ctc README
- paraformer README
- sense-voice README
- vits README
- wenet-ctc README
- whisper-tiny README
- zipformer README
Requirements
- React Native >= 0.70
- Android API 24+ (Android 7.0+)
- iOS 13.0+ (requires sherpa-onnx XCFramework - see the iOS section under Installation)
Example Apps
We provide example applications to help you get started with react-native-sherpa-onnx:
Example App (Audio to Text)
The example app included in this repository demonstrates basic audio-to-text transcription capabilities. It includes:
- Multiple model type support (Zipformer, Paraformer, NeMo CTC, Whisper, WeNet CTC, SenseVoice, FunASR Nano)
- Model selection and configuration
- Audio file transcription
- Test audio files for different languages
Getting started:
```sh
cd example
yarn install
yarn android # or yarn ios
```

Video to Text Comparison App
A comprehensive comparison app that demonstrates video-to-text transcription using react-native-sherpa-onnx alongside other speech-to-text solutions:
Repository: mobile-videototext-comparison
Features:
- Video to audio conversion (using native APIs)
- Audio to text transcription
- Video to text (video --> WAV --> text)
- Comparison between different STT providers
- Performance benchmarking
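The video --> WAV --> text flow listed above is just a composition of two stages. A minimal sketch, where both stage functions are hypothetical placeholders supplied by the caller (in the comparison app, extraction uses native APIs and transcription uses this library):

```typescript
// Hypothetical stage signatures: the caller supplies an audio extractor
// (e.g. a native video-to-WAV converter) and a transcriber (an STT engine).
type ExtractAudio = (videoPath: string) => Promise<string>; // resolves to a WAV path
type Transcribe = (wavPath: string) => Promise<string>;     // resolves to text

// Compose the two stages into a single video-to-text pipeline.
function makeVideoToText(extract: ExtractAudio, transcribe: Transcribe) {
  return async (videoPath: string): Promise<string> => {
    const wavPath = await extract(videoPath); // video --> WAV
    return transcribe(wavPath);               // WAV --> text
  };
}
```

Keeping the stages injectable is what makes provider comparison and benchmarking straightforward: swap the `transcribe` function, keep the rest of the pipeline.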
This app showcases how to integrate react-native-sherpa-onnx into a real-world application that processes video files and converts them to text.
Contributing
License
MIT
Made with create-react-native-library
