@wovin/tranz
v0.1.26
Audio transcription library with provider support and auto-splitting for long audio files.
Features
- Multiple Transcription Providers: Mistral Voxtral, Whisper, GreenPT
- Realtime Transcription: Stream audio from microphone or other sources for live transcription
- Automatic Audio Splitting: Handles long audio files by intelligently splitting at silence points
- Smart Input Support: Files, URLs (with HTTP range probing), or buffers
- Speaker Diarization: Identify different speakers in audio
- Flexible Timestamps: Word-level or segment-level timing
- Result Merging: Automatically merge split segment results with accurate timing
Installation
```shell
# npm
npm install @wovin/tranz

# pnpm
pnpm add @wovin/tranz

# yarn
yarn add @wovin/tranz

# deno
deno add @wovin/tranz

# jsr (for any runtime)
npx jsr add @wovin/tranz
```
Quick Start
```js
import { createMistralTranscriber } from '@wovin/tranz'

// Create a transcriber instance
const transcriber = createMistralTranscriber({
  apiKey: process.env.MISTRAL_API_KEY,
  model: 'voxtral-mini-latest'
})

// Transcribe from a file (auto-splits if too long)
const result = await transcriber.transcribe({
  audioPath: './interview.mp3',
  diarize: true,
  timestamps: 'segment'
})

console.log(result.text)
console.log(result.words) // word-level timestamps
console.log(result.speakers) // speaker segments
```
Usage Examples
Transcribe from URL
```js
// Smart handling: probes duration via HTTP, downloads only if splitting is needed
const result = await transcriber.transcribe({
  audioUrl: 'https://example.com/audio.mp3'
})

// If you already know the duration, skip detection
const resultWithKnownDuration = await transcriber.transcribe({
  audioUrl: 'https://example.com/audio.mp3',
  duration: 120 // seconds
})
```
Transcribe from Buffer
```js
import fs from 'node:fs'

const audioBuffer = fs.readFileSync('./audio.mp3')
const result = await transcriber.transcribe({
  audioBuffer,
  mimeType: 'audio/mpeg'
})
```
Control Auto-Splitting
```js
// Disable auto-split (use for short audio)
const shortResult = await transcriber.transcribe({
  audioPath: './short-clip.mp3',
  autoSplit: false
})

// Specify a custom output directory for split segments
const longResult = await transcriber.transcribe({
  audioPath: './long-audio.mp3',
  splitOutputDir: './segments'
})
```
Language Specification
```js
// Note: setting language disables word-level timestamps for Mistral
const result = await transcriber.transcribe({
  audioPath: './french-audio.mp3',
  language: 'fr',
  timestamps: 'segment'
})
```
Custom Logging
```js
const result = await transcriber.transcribe({
  audioPath: './audio.mp3',
  logger: {
    info: (msg) => console.log(`[INFO] ${msg}`),
    warn: (msg) => console.warn(`[WARN] ${msg}`),
    debug: (msg) => console.debug(`[DEBUG] ${msg}`)
  },
  verbose: true // promotes debug logs to info level
})
```
Advanced: Using Providers Directly
```js
import { MistralProvider, WhisperProvider } from '@wovin/tranz/providers'

// Mistral provider
const mistral = new MistralProvider()
const mistralResult = await mistral.transcribe({
  audioPath: './audio.mp3',
  apiKey: process.env.MISTRAL_API_KEY,
  model: 'voxtral-mini-latest',
  diarize: true,
  timestampGranularity: 'segment'
})

// Whisper provider (local, no API key required)
const whisper = new WhisperProvider()
const whisperResult = await whisper.transcribe({
  audioPath: './audio.mp3',
  model: 'base'
})
```
Realtime Transcription
Stream audio for real-time transcription using Mistral's WebSocket API:
```js
import {
  createRealtimeTranscriber,
  captureAudioFromMicrophone,
} from '@wovin/tranz/realtime'

// Create a realtime transcriber
const transcriber = createRealtimeTranscriber({
  apiKey: process.env.MISTRAL_API_KEY,
})

// Capture audio from the microphone (requires SoX)
const { stream, stop } = captureAudioFromMicrophone(16000)

try {
  for await (const event of transcriber.transcribe(stream)) {
    if (event.type === 'transcription.text.delta') {
      process.stdout.write(event.text)
    } else if (event.type === 'transcription.done') {
      console.log('\nComplete:', event.text)
      break
    }
  }
} finally {
  stop()
}
```
Custom Audio Source
You can provide any `AsyncIterable<Uint8Array>` as an audio source:
```js
// readSomeAudio() stands in for any source of raw audio bytes
async function* myAudioSource() {
  // Read from a file, socket, etc.
  const buffer = await readSomeAudio()
  yield new Uint8Array(buffer)
}

for await (const event of transcriber.transcribe(myAudioSource())) {
  // Handle events
}
```
Realtime Event Types
- `session.created` - WebSocket connection established
- `session.updated` - Audio format confirmed
- `transcription.text.delta` - Transcription text chunks (use for live display)
- `transcription.language` - Detected audio language
- `transcription.done` - Complete transcript available
- `error` - Error occurred
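The event types above can be routed with a small dispatcher. A minimal sketch (the `text` payload field appears in the examples above; the `language` and `message` payload fields are assumptions, and `renderEvent` is a hypothetical helper, not part of the library):

```javascript
// Map a realtime event to a display string, or null when the event
// carries nothing to display (connection lifecycle events).
function renderEvent(event) {
  switch (event.type) {
    case 'session.created':
    case 'session.updated':
      return null // lifecycle only; nothing to show
    case 'transcription.text.delta':
      return event.text // live partial text
    case 'transcription.language':
      return `[language: ${event.language}]`
    case 'transcription.done':
      return `\nComplete: ${event.text}`
    case 'error':
      return `[error] ${event.message ?? 'unknown error'}`
    default:
      return null // ignore unrecognized event types
  }
}
```

Dispatching on `event.type` up front keeps the event loop body flat, which is handy once you handle more than the two event types shown in the microphone example.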
Limitations
The WebSocket realtime API has some limitations compared to batch transcription:
- No timestamp information (no word or segment timing)
- No speaker diarization
- Designed for streaming/live use cases, not long audio files
For timestamped transcriptions or speaker identification, use the batch API instead.
Advanced: Audio Utilities
```js
import {
  autoSplitAudio,
  getAudioDuration,
  mergeTranscriptionResults
} from '@wovin/tranz/audio'

// Get audio duration
const duration = await getAudioDuration('./audio.mp3')

// Split long audio at optimal silence points
const segments = await autoSplitAudio('./long-audio.mp3', './output-dir', {
  maxDurationSec: 300, // 5 minutes
  minSilenceDuration: 0.5,
  silenceThreshold: -40
})

// Manually transcribe each segment (using the transcriber from Quick Start),
// then merge the results
const results = await Promise.all(
  segments.map(seg => transcriber.transcribe({ audioPath: seg.outputPath, autoSplit: false }))
)
const merged = mergeTranscriptionResults(results, segments)
```
API Reference
createMistralTranscriber(config)
Creates a Mistral transcriber instance with auto-splitting support.
Config:
- `apiKey: string` - Mistral API key (required)
- `model?: string` - Model name (default: `'voxtral-mini-latest'`)

Returns: `MistralTranscriber` with a `transcribe(options)` method
TranscribeOptions
Options for the transcribe() method:
- `audioPath?: string` - Path to audio file
- `audioBuffer?: Buffer` - Audio data as a buffer
- `mimeType?: string` - MIME type for buffer (auto-detected if omitted)
- `audioUrl?: string` - URL to audio file (supports HTTP range probing)
- `duration?: number` - Known duration in seconds (skips detection)
- `language?: string` - Language code (e.g. `'en'`, `'fr'`); disables word timestamps
- `model?: string` - Override the default model
- `diarize?: boolean` - Enable speaker diarization (default: true)
- `timestamps?: 'word' | 'segment'` - Timestamp granularity (default: `'segment'` when diarize is true; disabled if language is set)
- `autoSplit?: boolean` - Auto-split long audio (default: true)
- `splitOutputDir?: string` - Directory for split segments (default: system temp)
- `logger?: TranscribeLogger` - Custom logger
- `verbose?: boolean` - Enable debug logging
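Since `audioPath`, `audioBuffer`, and `audioUrl` are all optional, a call presumably needs exactly one of them. A pre-flight check along those lines (a sketch; `assertSingleSource` is a hypothetical helper, not part of the library):

```javascript
// Throw unless the options object names exactly one audio source.
function assertSingleSource(options) {
  const sources = ['audioPath', 'audioBuffer', 'audioUrl']
    .filter((key) => options[key] != null)
  if (sources.length !== 1) {
    throw new Error(
      `expected exactly one of audioPath/audioBuffer/audioUrl, got: ${sources.join(', ') || 'none'}`
    )
  }
  return options
}
```

Running this before `transcribe()` turns a vague provider-side failure into an immediate, descriptive error.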
TranscriptionResult
Result from transcription:
- `text: string` - Full transcription text
- `duration?: number` - Audio duration in seconds
- `language?: string` - Detected or specified language
- `words?: WordData[]` - Word-level timestamps and confidence
- `speakers?: SpeakerSegment[]` - Speaker diarization data
- `error?: string` - Error message if transcription failed
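Because everything except `text` is optional, consuming code should guard each field. A defensive one-line summarizer using only the fields listed above (a sketch; `summarizeResult` is a hypothetical helper, not part of the library):

```javascript
// Summarize a TranscriptionResult, tolerating missing optional fields.
function summarizeResult(result) {
  if (result.error) return `failed: ${result.error}`
  const parts = [`${result.text.length} chars`]
  if (result.language) parts.push(`lang=${result.language}`)
  if (result.duration != null) parts.push(`${result.duration}s`)
  if (result.words) parts.push(`${result.words.length} words timed`)
  if (result.speakers) parts.push(`${result.speakers.length} speaker segments`)
  return parts.join(', ')
}
```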
MergedTranscriptionResult
Extended result for multi-segment transcriptions:
- All fields from `TranscriptionResult`
- `totalSegments: number` - Number of segments merged
- `segments?: TranscriptionResult[]` - Individual segment results
Providers
Mistral (Voxtral)
- Models: `voxtral-mini-latest`, `voxtral-large-latest`
- Max recommended duration: 300s (5 minutes)
- Auto-split supported: Yes
- Speaker diarization: Yes
- Word timestamps: Yes (unless `language` is specified)
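Given the 300 s recommended maximum, you can estimate the minimum number of segments auto-splitting will produce for a given duration (a sketch; actual counts can be higher because split points land on detected silences, and `minSegments` is a hypothetical helper, not part of the library):

```javascript
// Lower bound on segment count for an audio duration in seconds,
// assuming a per-segment cap of maxDurationSec (default 300 s).
function minSegments(durationSec, maxDurationSec = 300) {
  if (durationSec <= 0) return 0
  return Math.ceil(durationSec / maxDurationSec)
}
```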
Whisper (Local)
- Requires local Whisper installation
- Models: `tiny`, `base`, `small`, `medium`, `large`
- No API key required
GreenPT
- API-based transcription
- Requires `GREENPT_API_KEY`
License
AGPL-3.0-or-later
