media-transcriber
v1.0.2
Batch transcribe audio/video files using pluggable AI backends (Whisper, OpenAI API, and more)
media-transcriber
Transcribe audio and video files to text and subtitles using pluggable AI backends, including local Whisper and the OpenAI Whisper API.
Quick Start
# Run directly with npx (no install)
npx media-transcriber doctor
# Or install globally
npm install -g media-transcriber
# Check your machine
media-transcriber doctor
# Transcribe a single file
media-transcriber transcribe ./meeting.mp4
# Transcribe a folder
media-transcriber transcribe ./data/input ./data/output
Requirements
- Node.js >= 18
- FFmpeg and ffprobe in PATH
  - Windows: winget install ffmpeg or scoop install ffmpeg
  - macOS: brew install ffmpeg
  - Linux: sudo apt install ffmpeg
Backend requirements:
- whisper-local: A local Whisper installation available on your system. This can be managed with uv, pip, or another Python environment setup.
- whisper-api: An OpenAI API key (OPENAI_API_KEY env var or --api-key)
Usage
Transcribe
# Single file, output written next to the input file
media-transcriber transcribe ./recordings/interview.mp4
# Single file, explicit output folder
media-transcriber transcribe ./recordings/interview.mp4 ./transcripts
# Folder input
media-transcriber transcribe ./recordings ./transcripts
# Model and device
media-transcriber transcribe ./data/input ./data/output -m medium -d cpu
# OpenAI API backend
media-transcriber transcribe ./data/input ./data/output -b whisper-api --api-key <key>
# Split long files before transcription
media-transcriber transcribe ./data/input ./data/output --split-threshold 900
# Enable audio enhancement
media-transcriber transcribe ./data/input ./data/output --enhance-audio
# Keep temporary files in ./data/output/temp
media-transcriber transcribe ./data/input ./data/output --keep-temp
# Output only subtitles
media-transcriber transcribe ./data/input ./data/output -f srt
# JSON output for agents
media-transcriber transcribe ./data/input ./data/output --json
Doctor
Check system dependencies and backend availability:
media-transcriber doctor
Execution Options
All parameters are passed at execution time (stateless CLI).
| Field | Type | Default | Description |
| ---- | ---- | ---- | ---- |
| input | string | required | Input file or folder |
| output | string | optional for files, required for folders | Output folder |
| backend | string | whisper-local | Transcription backend |
| whisperModel | string | large-v2 | Model name |
| device | cuda or cpu | cuda | Processing device |
| maxDurationSeconds | number | 1200 | Split files longer than this threshold |
| enableAudioEnhancement | boolean | false | Enable enhancement filters |
| keepIntermediateFiles | boolean | false | Keep temp files with --keep-temp |
| tempFolder | string | <outputFolder>/temp | Temp working folder |
| outputFormats | txt, srt, or both | txt,srt | Output transcript formats |
| openaiApiKey | string | env/flag | API key for OpenAI backend |
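For callers that wrap the CLI, the defaults in the table can be mirrored in a small typed object. The TypeScript sketch below is illustrative only: the TranscribeOptions interface and the withDefaults helper are hypothetical names and not part of the package's API. Only the field names and default values are taken from the table, and deriving tempFolder by string concatenation is an assumption about how <outputFolder>/temp is formed.

```typescript
// Hypothetical types mirroring the Execution Options table.
type Device = "cuda" | "cpu";
type OutputFormat = "txt" | "srt";

interface TranscribeOptions {
  input: string;
  output?: string;
  backend: string;
  whisperModel: string;
  device: Device;
  maxDurationSeconds: number;
  enableAudioEnhancement: boolean;
  keepIntermediateFiles: boolean;
  tempFolder?: string;
  outputFormats: OutputFormat[];
  openaiApiKey?: string; // the table also allows an env var; not read here
}

// Fill in the documented defaults for any field the caller left out.
function withDefaults(
  partial: Partial<TranscribeOptions> & { input: string }
): TranscribeOptions {
  const output = partial.output;
  return {
    input: partial.input,
    output,
    backend: partial.backend ?? "whisper-local",
    whisperModel: partial.whisperModel ?? "large-v2",
    device: partial.device ?? "cuda",
    maxDurationSeconds: partial.maxDurationSeconds ?? 1200,
    enableAudioEnhancement: partial.enableAudioEnhancement ?? false,
    keepIntermediateFiles: partial.keepIntermediateFiles ?? false,
    // Assumption: the default temp folder is "<outputFolder>/temp".
    tempFolder:
      partial.tempFolder ?? (output !== undefined ? `${output}/temp` : undefined),
    outputFormats: partial.outputFormats ?? ["txt", "srt"],
    openaiApiKey: partial.openaiApiKey,
  };
}
```

Calling withDefaults({ input: "./data/input", output: "./data/output", device: "cpu" }) keeps the explicit device and fills every other field from the table.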
Commands
transcribe <input> [output]
- Accepts either a single file or a directory.
- For a single input file, [output] is optional. If omitted, output files are written next to the source file.
- For a directory input, [output] is required.
Options:
- -m, --model <name>: Whisper model name
- -d, --device <type>: Processing device, typically cuda or cpu
- -b, --backend <name>: Transcription backend
- --split-threshold <seconds>: Split files longer than this duration before transcription
- --enhance-audio: Apply audio enhancement before transcription
- --keep-temp: Keep intermediate files in the temp folder
- --api-key <key>: API key for API-based backends
- -f, --format <formats>: Comma-separated output formats, such as txt, srt, or txt,srt
- --json: Emit machine-readable output to stdout and NDJSON progress events to stderr
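When the CLI is driven from a script, the options above can be assembled into an argument array rather than a shell string, which avoids quoting problems with paths. The following TypeScript sketch is one possible way to do that. The TranscribeFlags interface and buildTranscribeArgs function are hypothetical helpers, not something the package exports; the flag names themselves are the ones listed above.

```typescript
// Hypothetical shape for the documented transcribe flags.
interface TranscribeFlags {
  model?: string;
  device?: "cuda" | "cpu";
  backend?: string;
  splitThresholdSeconds?: number;
  enhanceAudio?: boolean;
  keepTemp?: boolean;
  apiKey?: string;
  formats?: string[];
  json?: boolean;
}

// Build the argv for `media-transcriber transcribe <input> [output]`.
function buildTranscribeArgs(
  input: string,
  output?: string,
  flags: TranscribeFlags = {}
): string[] {
  const args: string[] = ["transcribe", input];
  if (output !== undefined) args.push(output);
  if (flags.model) args.push("--model", flags.model);
  if (flags.device) args.push("--device", flags.device);
  if (flags.backend) args.push("--backend", flags.backend);
  if (flags.splitThresholdSeconds !== undefined) {
    args.push("--split-threshold", String(flags.splitThresholdSeconds));
  }
  if (flags.enhanceAudio) args.push("--enhance-audio");
  if (flags.keepTemp) args.push("--keep-temp");
  if (flags.apiKey) args.push("--api-key", flags.apiKey);
  if (flags.formats && flags.formats.length > 0) {
    args.push("--format", flags.formats.join(","));
  }
  if (flags.json) args.push("--json");
  return args;
}
```

The resulting array can be handed to Node's child_process.spawn with "media-transcriber" as the command.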
doctor
- Checks ffmpeg and ffprobe
- Shows system information
- Checks registered backends and reports availability
- Exits with a non-zero code when required dependencies are missing
AI Agent Integration
Structured JSON
media-transcriber transcribe ./data/input ./data/output --json 2>/dev/null
Progress Events (stderr)
In --json mode, NDJSON progress events are emitted to stderr.
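An agent that captures stderr can decode these events one line at a time. The sketch below shows a generic NDJSON parser in TypeScript. It makes no assumption about the fields inside each event, since the event schema is not documented here, and the parseNdjson name is a hypothetical helper rather than part of the package.

```typescript
// Parse a chunk of NDJSON text into a list of decoded values.
// Blank lines are ignored; lines that are not valid JSON are skipped,
// because stderr may also carry plain-text warnings.
function parseNdjson(text: string): unknown[] {
  const events: unknown[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      events.push(JSON.parse(trimmed));
    } catch {
      // Not JSON: leave it out of the event list.
    }
  }
  return events;
}
```

When reading from a stream, a caller would buffer incoming data and only pass complete lines to the parser, since a chunk boundary can fall in the middle of an event.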
Example:
media-transcriber transcribe ./data/input ./data/output --json 1>result.json
Exit Codes
| Code | Meaning |
| ---- | ------- |
| 0 | Success (all files transcribed) |
| 1 | General error |
| 2 | Missing dependency |
| 3 | Configuration/argument error |
| 4 | No input files found |
| 10 | Partial success |
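A wrapper script can turn the codes in this table into readable messages. The TypeScript sketch below is a minimal example: describeExitCode and hasTranscripts are hypothetical helper names, and treating code 10 as "some transcripts exist" is an interpretation of "Partial success", not documented behavior.

```typescript
// Meanings copied from the Exit Codes table.
const EXIT_CODE_MEANINGS: Record<number, string> = {
  0: "Success (all files transcribed)",
  1: "General error",
  2: "Missing dependency",
  3: "Configuration/argument error",
  4: "No input files found",
  10: "Partial success",
};

// Return the documented meaning, or a fallback for undocumented codes.
function describeExitCode(code: number): string {
  return EXIT_CODE_MEANINGS[code] ?? `Unknown exit code ${code}`;
}

// Assumption: codes 0 and 10 both leave at least some transcripts behind.
function hasTranscripts(code: number): boolean {
  return code === 0 || code === 10;
}
```

For example, a caller might collect results when hasTranscripts returns true and suggest running media-transcriber doctor when the code is 2.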
Supported Formats
Input: .m4a, .mp3, .mp4, .mkv, .wav, .flac, .ogg, .webm
Output: .txt, .srt
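Before handing a folder to the CLI, a caller may want to pre-filter files by the input extensions listed above. The TypeScript sketch below shows one way to do that. The isSupportedInput helper is hypothetical, and matching extensions case-insensitively is an assumption; the CLI's own matching rules are not stated here.

```typescript
// Input extensions copied from the Supported Formats list.
const SUPPORTED_INPUT_EXTENSIONS = new Set<string>([
  ".m4a", ".mp3", ".mp4", ".mkv", ".wav", ".flac", ".ogg", ".webm",
]);

// Check whether a path ends in one of the supported input extensions.
function isSupportedInput(filePath: string): boolean {
  // Take the final path segment so dots in folder names are ignored.
  const base = filePath.split(/[\\/]/).pop() ?? "";
  const dot = base.lastIndexOf(".");
  if (dot <= 0) return false; // no extension, or a dotfile such as ".mp4"
  // Assumption: extension matching is case-insensitive.
  return SUPPORTED_INPUT_EXTENSIONS.has(base.slice(dot).toLowerCase());
}
```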
Development
npm install
npm run dev
npm run typecheck
npm test
npm run build
Development workflow:
# Terminal 1: rebuild dist/ on changes
npm run dev
# Terminal 2: run the built CLI
node dist/index.js --help
node dist/index.js doctor
node dist/index.js transcribe ./test/input ./test/output
Available scripts:
npm run dev # tsup --watch
npm run typecheck
npm test
npm run build
License
MIT
