# react-native-fluidaudio
React Native wrapper for FluidAudio - a Swift library for ASR, VAD, Speaker Diarization, and TTS on Apple platforms.
## Features
- **ASR (Automatic Speech Recognition)** - High-quality speech-to-text using Parakeet TDT models
- **Streaming ASR** - Real-time transcription from microphone or system audio
- **VAD (Voice Activity Detection)** - Detect speech segments in audio
- **Speaker Diarization** - Identify and track different speakers
- **TTS (Text-to-Speech)** - Natural voice synthesis using Kokoro TTS
## Requirements
- iOS 17.0+
- React Native 0.71+ or Expo SDK 50+
## Installation

### React Native CLI
```sh
npm install react-native-fluidaudio
```

Add FluidAudio to your `ios/Podfile`:

```ruby
pod 'FluidAudio', :git => 'https://github.com/FluidInference/FluidAudio.git', :tag => 'v0.7.8'
```

Then install pods:

```sh
cd ios && pod install
```

### Expo
For Expo projects, use a development build:
```sh
npx expo install react-native-fluidaudio
npx expo prebuild
npx expo run:ios
```

Note: Expo Go is not supported; native modules require a development build.
## Usage

### Basic Transcription
```ts
import { ASRManager, onModelLoadProgress } from 'react-native-fluidaudio';

// Monitor model loading progress
const subscription = onModelLoadProgress((event) => {
  console.log(`Model loading: ${event.status} (${event.progress}%)`);
});

// Initialize ASR (downloads models on first run)
const asr = new ASRManager();
await asr.initialize();

// Transcribe an audio file
const result = await asr.transcribeFile('/path/to/audio.wav');
console.log(result.text);
console.log(`Confidence: ${result.confidence}`);
console.log(`Processing speed: ${result.rtfx}x realtime`);

// Clean up
subscription.remove();
```

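Transcription errors surface through the `onTranscriptionError` event (see the Events table below); a minimal sketch, assuming it follows the same subscribe/remove pattern as `onModelLoadProgress` and leaving the payload shape to `src/types.ts`:

```ts
import { onTranscriptionError } from 'react-native-fluidaudio';

// Payload shape is an assumption; see src/types.ts for the real definition.
const errorSub = onTranscriptionError((event) => {
  console.warn('Transcription error:', event);
});

// Remove alongside other subscriptions when done.
errorSub.remove();
```
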
### Streaming Transcription

```ts
import { StreamingASRManager } from 'react-native-fluidaudio';

const streaming = new StreamingASRManager();

// Start streaming with update callback
await streaming.start({ source: 'microphone' }, (update) => {
  console.log('Confirmed:', update.confirmed);
  console.log('Volatile:', update.volatile);
});

// Feed audio data (16-bit PCM, 16kHz, base64 encoded)
await streaming.feedAudio(base64AudioChunk);

// Stop and get final result
const result = await streaming.stop();
console.log('Final transcription:', result.text);
```

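`feedAudio` expects base64-encoded 16-bit PCM at 16 kHz. How you capture audio is app-specific; below is a minimal conversion sketch, assuming you already have Float32 samples in [-1, 1] (the `floatChunk` variable and the npm `buffer` polyfill are assumptions, not part of this package):

```ts
import { Buffer } from 'buffer'; // npm 'buffer' polyfill (assumption)

// Convert Float32 samples in [-1, 1] (16 kHz mono) to base64-encoded 16-bit PCM.
function toBase64Pcm16(samples: Float32Array): string {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff; // scale to int16 range
  }
  // Typed arrays are little-endian on iOS hardware, matching PCM16 expectations.
  return Buffer.from(pcm.buffer).toString('base64');
}

await streaming.feedAudio(toBase64Pcm16(floatChunk));
```
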
### Voice Activity Detection

```ts
import { VADManager } from 'react-native-fluidaudio';

const vad = new VADManager();
await vad.initialize({ threshold: 0.85 });

// Process audio file
const result = await vad.processFile('/path/to/audio.wav');

// Get speech segments
const segments = vad.getSpeechSegments(result);
segments.forEach((seg) => {
  console.log(`Speech from ${seg.start}s to ${seg.end}s`);
});
```

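A common follow-up is to drop very short detections before feeding segments downstream. This sketch relies only on the `{ start, end }` shape shown above; the 300 ms threshold is an arbitrary example value:

```ts
// Keep only segments lasting at least 300 ms.
const MIN_SEGMENT_SECONDS = 0.3;
const speech = segments.filter((seg) => seg.end - seg.start >= MIN_SEGMENT_SECONDS);
console.log(`Kept ${speech.length} of ${segments.length} segments`);
```
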
### Speaker Diarization

```ts
import { DiarizationManager } from 'react-native-fluidaudio';

const diarizer = new DiarizationManager();
await diarizer.initialize({
  clusteringThreshold: 0.7,
  numClusters: -1, // Auto-detect number of speakers
});

// Diarize audio file
const result = await diarizer.diarizeFile('/path/to/meeting.wav');

// Get speaker information
const speakers = diarizer.getUniqueSpeakers(result);
const speakingTime = diarizer.getSpeakingTime(result);

result.segments.forEach((segment) => {
  console.log(`${segment.speakerId}: ${segment.startTime}s - ${segment.endTime}s`);
});

// Pre-register known speakers for identification
await diarizer.setKnownSpeakers([
  { id: 'alice', name: 'Alice', embedding: aliceEmbedding },
  { id: 'bob', name: 'Bob', embedding: bobEmbedding },
]);
```

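The shape of `getSpeakingTime`'s result is not shown here; as a hedged sketch continuing from the snippet above, assuming it is a plain record mapping speaker IDs to seconds (check `src/types.ts` for the actual type):

```ts
// Assumed shape: Record<string, number> of speakerId -> seconds spoken.
for (const [speakerId, seconds] of Object.entries(speakingTime)) {
  console.log(`${speakerId} spoke for ${seconds.toFixed(1)}s`);
}
```
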
### Text-to-Speech

```ts
import { TTSManager } from 'react-native-fluidaudio';

const tts = new TTSManager();
await tts.initialize({ variant: 'fiveSecond' });

// Synthesize to audio data
const result = await tts.synthesize('Hello, world!');
console.log(`Audio duration: ${result.duration}s`);
// result.audioData is base64-encoded 16-bit PCM

// Synthesize directly to file
await tts.synthesizeToFile('Hello, world!', '/path/to/output.wav');
```

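Playing the result back is outside this package's scope; a minimal sketch using `expo-av` (an assumption; any React Native audio player works):

```ts
import { Audio } from 'expo-av';

// Play the WAV written by synthesizeToFile (iOS expects a file:// URI).
const { sound } = await Audio.Sound.createAsync({ uri: 'file:///path/to/output.wav' });
await sound.playAsync();
```
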
### System Information

```ts
import { getSystemInfo } from 'react-native-fluidaudio';

const info = await getSystemInfo();
console.log(info.summary);
// e.g., "Apple A17 Pro, iOS 17.0"
```

### Cleanup

```ts
import { cleanup } from 'react-native-fluidaudio';

// Clean up all resources when done
await cleanup();
```

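In a component you would typically tie this to unmount; a hypothetical hook sketch (`useFluidAudioCleanup` is not part of this package):

```ts
import { useEffect } from 'react';
import { cleanup } from 'react-native-fluidaudio';

// Release native resources when the consuming screen unmounts.
function useFluidAudioCleanup() {
  useEffect(() => {
    return () => {
      cleanup().catch(console.warn);
    };
  }, []);
}
```
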
## API Reference

### Managers
| Manager | Description |
|---------|-------------|
| `ASRManager` | Speech-to-text transcription |
| `StreamingASRManager` | Real-time streaming transcription |
| `VADManager` | Voice activity detection |
| `DiarizationManager` | Speaker identification |
| `TTSManager` | Text-to-speech synthesis |
### Events
| Event | Description |
|-------|-------------|
| `onStreamingUpdate` | Streaming transcription updates |
| `onModelLoadProgress` | Model download/compilation progress |
| `onTranscriptionError` | Transcription errors |
### Types
See `src/types.ts` for complete TypeScript definitions.
## Notes

### Model Loading
The first initialization downloads the ML models (~500 MB total) and compiles them for Apple's Neural Engine, which can take 20-30 seconds. Subsequent loads reuse the cached compilation and take about a second.
### TTS License
The TTS module uses ESpeakNG, which is GPL-licensed. Check license compatibility for your project.
## License
MIT
