# @wovin/tranz

> v0.1.2

Audio transcription library with provider support and auto-splitting for long audio files.
## Features
- **Multiple Transcription Providers**: Mistral Voxtral, Whisper, GreenPT
- **Automatic Audio Splitting**: Handles long audio files by intelligently splitting at silence points
- **Smart Input Support**: Files, URLs (with HTTP range probing), or buffers
- **Speaker Diarization**: Identify different speakers in audio
- **Flexible Timestamps**: Word-level or segment-level timing
- **Result Merging**: Automatically merge split segment results with accurate timing
## Installation

```sh
# npm
npm install @wovin/tranz

# pnpm
pnpm add @wovin/tranz

# yarn
yarn add @wovin/tranz

# deno
deno add @wovin/tranz

# jsr (for any runtime)
npx jsr add @wovin/tranz
```

## Quick Start
```js
import { createMistralTranscriber } from '@wovin/tranz'

// Create a transcriber instance
const transcriber = createMistralTranscriber({
  apiKey: process.env.MISTRAL_API_KEY,
  model: 'voxtral-mini-latest'
})

// Transcribe from file (auto-splits if too long)
const result = await transcriber.transcribe({
  audioPath: './interview.mp3',
  diarize: true,
  timestamps: 'word'
})

console.log(result.text)
console.log(result.words)    // word-level timestamps
console.log(result.speakers) // speaker segments
```

## Usage Examples
### Transcribe from URL
```js
// Smart handling: probes duration via HTTP, downloads only if splitting needed
const result = await transcriber.transcribe({
  audioUrl: 'https://example.com/audio.mp3'
})

// If you know the duration, skip detection
const knownDurationResult = await transcriber.transcribe({
  audioUrl: 'https://example.com/audio.mp3',
  duration: 120 // seconds
})
```

### Transcribe from Buffer
```js
import fs from 'node:fs'

const audioBuffer = fs.readFileSync('./audio.mp3')
const result = await transcriber.transcribe({
  audioBuffer,
  mimeType: 'audio/mpeg'
})
```

### Control Auto-Splitting
```js
// Disable auto-split (use for short audio)
const result = await transcriber.transcribe({
  audioPath: './short-clip.mp3',
  autoSplit: false
})

// Specify a custom split output directory
const longResult = await transcriber.transcribe({
  audioPath: './long-audio.mp3',
  splitOutputDir: './segments'
})
```

### Language Specification
```js
// Note: setting language disables word-level timestamps for Mistral
const result = await transcriber.transcribe({
  audioPath: './french-audio.mp3',
  language: 'fr',
  timestamps: 'segment'
})
```

### Custom Logging
```js
const result = await transcriber.transcribe({
  audioPath: './audio.mp3',
  logger: {
    info: (msg) => console.log(`[INFO] ${msg}`),
    warn: (msg) => console.warn(`[WARN] ${msg}`),
    debug: (msg) => console.debug(`[DEBUG] ${msg}`)
  },
  verbose: true // promotes debug logs to info level
})
```

## Advanced: Using Providers Directly
```js
import { MistralProvider, WhisperProvider } from '@wovin/tranz/providers'

// Mistral provider
const mistral = new MistralProvider()
const mistralResult = await mistral.transcribe({
  audioPath: './audio.mp3',
  apiKey: process.env.MISTRAL_API_KEY,
  model: 'voxtral-mini-latest',
  diarize: true,
  timestampGranularity: 'word'
})

// Whisper provider (local)
const whisper = new WhisperProvider()
const whisperResult = await whisper.transcribe({
  audioPath: './audio.mp3',
  model: 'base'
})
```

## Advanced: Audio Utilities
```js
import {
  autoSplitAudio,
  getAudioDuration,
  mergeTranscriptionResults
} from '@wovin/tranz/audio'

// Get audio duration
const duration = await getAudioDuration('./audio.mp3')

// Split long audio at optimal silence points
const segments = await autoSplitAudio('./long-audio.mp3', './output-dir', {
  maxDurationSec: 300, // 5 minutes
  minSilenceDuration: 0.5,
  silenceThreshold: -40
})

// Manually transcribe and merge segments
const results = await Promise.all(
  segments.map(seg => transcriber.transcribe({ audioPath: seg.outputPath }))
)
const merged = mergeTranscriptionResults(results, segments)
```

## API Reference
### createMistralTranscriber(config)
Creates a Mistral transcriber instance with auto-splitting support.
**Config:**

- `apiKey: string` - Mistral API key (required)
- `model?: string` - Model name (default: `'voxtral-mini-latest'`)

**Returns:** `MistralTranscriber` with a `transcribe(options)` method
### TranscribeOptions
Options for the `transcribe()` method:

- `audioPath?: string` - Path to audio file
- `audioBuffer?: Buffer` - Audio data as buffer
- `mimeType?: string` - MIME type for buffer (auto-detected if omitted)
- `audioUrl?: string` - URL to audio file (supports HTTP range probing)
- `duration?: number` - Known duration in seconds (skips detection)
- `language?: string` - Language code (e.g., `'en'`, `'fr'`) - disables word timestamps
- `model?: string` - Override default model
- `diarize?: boolean` - Enable speaker diarization (default: `true`)
- `timestamps?: 'word' | 'segment'` - Timestamp granularity (default: `'word'`)
- `autoSplit?: boolean` - Auto-split long audio (default: `true`)
- `splitOutputDir?: string` - Directory for split segments (default: system temp)
- `logger?: TranscribeLogger` - Custom logger
- `verbose?: boolean` - Enable debug logging
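As a quick illustration, here are two option objects assembled purely from the fields above. These are plain data sketches; the `transcriber.transcribe()` call itself is omitted since it requires an API key:

```js
// File input, making the documented defaults explicit
const fileOptions = {
  audioPath: './audio.mp3',
  diarize: true,       // default: true
  timestamps: 'word'   // default: 'word'
}

// URL input with a known duration, so HTTP duration probing is skipped
const urlOptions = {
  audioUrl: 'https://example.com/audio.mp3',
  duration: 120,
  timestamps: 'segment'
}
```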
### TranscriptionResult
Result from transcription:

- `text: string` - Full transcription text
- `duration?: number` - Audio duration in seconds
- `language?: string` - Detected or specified language
- `words?: WordData[]` - Word-level timestamps and confidence
- `speakers?: SpeakerSegment[]` - Speaker diarization data
- `error?: string` - Error message if transcription failed
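To make the shape concrete, here is a small sketch that formats word-level timestamps from a result object. The sample data and the `word`/`start`/`end` field names on `WordData` are assumptions for illustration, not verified library output:

```js
// Hand-written sample following the fields above (not real transcription output)
const result = {
  text: 'hello world',
  duration: 1.2,
  language: 'en',
  words: [
    { word: 'hello', start: 0.0, end: 0.5 },
    { word: 'world', start: 0.6, end: 1.2 }
  ]
}

// Render each word as "[0.00s] hello"; fall back to an empty list if word
// timestamps were disabled (e.g. when `language` was set on Mistral)
const lines = (result.words ?? []).map(
  (w) => `[${w.start.toFixed(2)}s] ${w.word}`
)
console.log(lines.join('\n'))
```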
### MergedTranscriptionResult
Extended result for multi-segment transcriptions:

- All fields from `TranscriptionResult`
- `totalSegments: number` - Number of segments merged
- `segments?: TranscriptionResult[]` - Individual segment results
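Conceptually, merging shifts each segment's timings by that segment's offset in the original file and concatenates the text. The sketch below illustrates the idea only; the `startSec`, `word`, `start`, and `end` field names are assumptions, and this is not the library's actual implementation (use `mergeTranscriptionResults` for real merging):

```js
// Illustrative merge: offset each segment's word timings by the segment's
// start time in the original audio, then concatenate text and words.
function mergeSketch(results, segments) {
  const words = results.flatMap((r, i) =>
    (r.words ?? []).map((w) => ({
      ...w,
      start: w.start + segments[i].startSec,
      end: w.end + segments[i].startSec
    }))
  )
  return {
    text: results.map((r) => r.text).join(' '),
    totalSegments: results.length,
    segments: results,
    words
  }
}

const merged = mergeSketch(
  [
    { text: 'first part', words: [{ word: 'first', start: 0, end: 0.4 }] },
    { text: 'second part', words: [{ word: 'second', start: 0, end: 0.5 }] }
  ],
  [{ startSec: 0 }, { startSec: 300 }] // second segment starts at 5 minutes
)
// merged.words[1] now starts at 300s in original-file time
```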
## Providers
### Mistral (Voxtral)
- Models: `voxtral-mini-latest`, `voxtral-large-latest`
- Max recommended duration: 300s (5 minutes)
- Auto-split supported: Yes
- Speaker diarization: Yes
- Word timestamps: Yes (unless language specified)
### Whisper (Local)
- Requires local Whisper installation
- Models: `tiny`, `base`, `small`, `medium`, `large`
- No API key required
### GreenPT
- API-based transcription
- Requires `GREENPT_API_KEY`
## License
AGPL-3.0-or-later
