transcribe-cli

v2.0.2

Published

2 months ago

Local audio/video transcription with speaker diarization and live audio support. No API keys. Powered by faster-whisper.

robit

transcription whisper speech-to-text audio video diarization live-audio streaming local offline faster-whisper cli

transcribe-cli

Local audio/video transcription with speaker diarization. No API keys. No cloud. One command.

Quickstart

npm (Node.js API + CLI):

npm install transcribe-cli

Shell (standalone CLI):

curl -sSL https://raw.githubusercontent.com/robit-man/transcribe-cli/main/install.sh | bash

Then:

transcribe audio.mp3
transcribe meeting.wav --model medium --diarize --format json
transcribe batch ./recordings --recursive --format srt

Features

100% Local — Runs on your machine via faster-whisper (CTranslate2). No API keys, no cloud, no data leaves your system.
Speaker Diarization — Identify who said what with --diarize (via pyannote.audio)
Word-Level Timestamps — Precise per-word timing with --word-timestamps
Live Audio Streaming — Real-time transcription via Node.js streams
4 Output Formats — txt, srt (with speaker labels), vtt (with W3C voice tags), json (full metadata)
Audio + Video — MP3, WAV, FLAC, AAC, M4A, OGG, WMA, MP4, MKV, AVI, MOV, WebM, FLV
Batch Processing — Process entire directories with configurable concurrency
5 Model Sizes — tiny, base, small, medium, large-v3 (auto-downloads on first use)
Auto Audio Extraction — Videos are automatically handled via FFmpeg
Dual Interface — Use as CLI tool or Node.js API with full TypeScript types
Cross-Platform — Linux and macOS

Requirements

Python 3.9+
FFmpeg 4.0+
~1 GB disk (for base model; large-v3 needs ~3 GB)

The install script handles all dependencies automatically.

Installation

One-Line Install (Recommended)

curl -sSL https://raw.githubusercontent.com/robit-man/transcribe-cli/main/install.sh | bash

This will:

Install system dependencies (Python, FFmpeg, git) if missing
Clone the repository to ~/.local/share/transcribe-cli
Create a Python virtual environment with all packages
Pre-download the default Whisper model (base)
Create transcribe and transcribe-cli commands in ~/.local/bin
Add ~/.local/bin to your PATH if needed

Environment variables (optional):

TRANSCRIBE_INSTALL_DIR — Custom install location (default: ~/.local/share/transcribe-cli)
TRANSCRIBE_MODEL — Model to pre-download (default: base)

Manual Install

git clone https://github.com/robit-man/transcribe-cli.git
cd transcribe-cli
python3 -m venv .venv
source .venv/bin/activate
pip install -e .

With Speaker Diarization

pip install -e ".[diarization]"

Usage

Transcribe a Single File

transcribe audio.mp3
transcribe video.mkv --format srt
transcribe recording.wav --output-dir ./transcripts
transcribe lecture.mp3 --model medium --language en

Speaker Diarization

transcribe meeting.wav --diarize --format srt
transcribe interview.mp3 --diarize --format json
transcribe podcast.mp3 --diarize --word-timestamps --format vtt

Batch Processing

transcribe batch ./recordings
transcribe batch ./videos --format srt --concurrency 3
transcribe batch ./media --recursive --dry-run
transcribe batch ./meetings --model medium --diarize --format json
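
As a rough illustration of what a recursive dry run has to do, the sketch below walks a directory and lists files whose extensions match the formats named under Features. It is not the tool's actual implementation: `find_media` and `MEDIA_EXTENSIONS` are hypothetical names introduced here, and the real matching rules may differ.

```python
from pathlib import Path

# Extensions taken from the "Audio + Video" feature list; the real tool's
# matching rules may differ (assumption).
MEDIA_EXTENSIONS = {
    ".mp3", ".wav", ".flac", ".aac", ".m4a", ".ogg", ".wma",
    ".mp4", ".mkv", ".avi", ".mov", ".webm", ".flv",
}


def find_media(directory, recursive=False):
    """Return media files under `directory`, sorted for stable output."""
    root = Path(directory)
    # rglob descends into subdirectories; glob stays at the top level.
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(
        p for p in candidates
        if p.is_file() and p.suffix.lower() in MEDIA_EXTENSIONS
    )
```

Calling `find_media("./media", recursive=True)` and printing each path gives a preview comparable in spirit to `--recursive --dry-run`, without transcribing anything.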

Audio Extraction

transcribe extract video.mkv
transcribe extract video.mp4 --output audio.mp3
transcribe extract video.avi --format wav
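
Extraction is described as going through FFmpeg. The sketch below builds a comparable FFmpeg argument list by hand. The flags `-y`, `-i`, `-vn`, `-ac`, and `-ar` are standard FFmpeg options, but the mono 16 kHz settings, the default output name, and the helper `build_extract_command` are assumptions made for illustration, not the tool's documented behavior.

```python
from pathlib import Path


def build_extract_command(video, output=None, fmt="mp3"):
    """Build an FFmpeg argument list that drops video and keeps audio."""
    src = Path(video)
    # Default output name mirrors the input with a new extension (assumption).
    dst = Path(output) if output else src.with_suffix("." + fmt)
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it exists
        "-i", str(src),  # input file
        "-vn",           # discard the video stream
        "-ac", "1",      # downmix to mono (assumed setting)
        "-ar", "16000",  # resample to 16 kHz, the rate Whisper models expect
        str(dst),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; building it as a list rather than a shell string avoids quoting problems with file names that contain spaces.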

Configuration

transcribe config --show        # Show current settings
transcribe config --init        # Create transcribe.toml in current directory
transcribe config --locations   # Show config file search paths

Dependency Check

transcribe setup --check

Node.js API

Install

npm install transcribe-cli

On install, the package automatically:

Creates a Python virtual environment
Installs faster-whisper and all Python dependencies
Downloads the default Whisper model (base)

Set TRANSCRIBE_VERBOSE=1 to see setup progress. Set TRANSCRIBE_MODEL=medium to pre-download a different model.

File Transcription

const { transcribe, transcribeBatch, shutdownBridge } = require('transcribe-cli');

// CommonJS modules have no top-level await, so the calls run inside an async function.
async function main() {
  try {
    // Transcribe a single file
    const result = await transcribe('meeting.mp3', {
      model: 'base',           // tiny, base, small, medium, large-v3
      diarize: true,           // speaker identification
      wordTimestamps: true,    // per-word timing
      format: 'json',          // txt, srt, vtt, json
      language: 'auto',        // or 'en', 'es', etc.
    });

    console.log(result.text);
    console.log(result.speakers);    // ['SPEAKER_00', 'SPEAKER_01']
    console.log(result.segments);    // [{id, start, end, text, speaker, words}]

    // Batch transcribe a directory
    const batch = await transcribeBatch('./recordings', {
      recursive: true,
      concurrency: 3,
      format: 'srt',
    });
    console.log(`${batch.successful}/${batch.totalFiles} files transcribed`);
  } finally {
    // Clean up when done, even if a call above threw
    await shutdownBridge();
  }
}

main().catch(console.error);

Live Audio Streaming

const { TranscribeLive } = require('transcribe-cli');

const live = new TranscribeLive({
  model: 'base',
  sampleRate: 16000,    // Hz
  channels: 1,          // mono
  sampleWidth: 2,       // 16-bit
  chunkDuration: 5,     // seconds per chunk
  wordTimestamps: true,
});

live.on('ready', () => {
  console.log('Model loaded, streaming...');
});

live.on('transcript', (event) => {
  console.log(`[${event.isFinal ? 'FINAL' : 'partial'}] ${event.text}`);
  // event.segments has full timing + speaker info
});

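// Feed raw PCM audio buffers (pcmBuffer is a placeholder for your own Buffer)
live.write(pcmBuffer);

// Or pipe from any readable stream (microphone, file, etc.);
// audioSource is a placeholder for your own stream
audioSource.pipe(live.stream);

// Finish and flush remaining audio. CommonJS modules have no top-level await,
// so the call is wrapped in an async function.
(async () => {
  await live.finish();
})();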
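
The constructor options above determine how much raw PCM one chunk represents. The following sketch works through that arithmetic and shows one way to slice a buffer into chunk-sized pieces. It assumes `chunkDuration` is measured directly against the configured sample rate, channel count, and sample width; `pcm_chunk_bytes` and `split_pcm` are hypothetical helpers, not part of the package.

```python
def pcm_chunk_bytes(sample_rate=16000, channels=1, sample_width=2, chunk_duration=5):
    """Bytes of raw PCM in one chunk: rate * channels * bytes-per-sample * seconds."""
    return int(sample_rate * channels * sample_width * chunk_duration)


def split_pcm(buffer, chunk_bytes):
    """Yield consecutive chunk-sized slices; the last one may be shorter."""
    for offset in range(0, len(buffer), chunk_bytes):
        yield buffer[offset:offset + chunk_bytes]


# 16 kHz * 1 channel * 2 bytes * 5 s = 160000 bytes per chunk
print(pcm_chunk_bytes())
```

With the defaults shown in the example, each 5-second chunk is 160,000 bytes, which is a useful figure when sizing the buffers you pass to `live.write`.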

TypeScript

Full type definitions included:

import { transcribe, TranscribeLive, TranscriptionResult, LiveTranscriptEvent } from 'transcribe-cli';

const result: TranscriptionResult = await transcribe('audio.mp3', { diarize: true });

CLI Reference

`transcribe <file> [OPTIONS]`

| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| --output-dir | -o | Output directory | Current dir |
| --format | -f | Output format: txt, srt, vtt, json | txt |
| --language | -l | Language code or auto | auto |
| --model | -m | Model: tiny, base, small, medium, large-v3 | base |
| --diarize | | Enable speaker diarization | Off |
| --word-timestamps | | Enable word-level timestamps | Off |
| --verbose | | Verbose output | Off |

`transcribe batch <directory> [OPTIONS]`

Accepts all options from `transcribe <file>`, plus:

| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| --concurrency | -c | Max concurrent jobs (1-20) | 5 |
| --recursive | -r | Scan subdirectories | Off |
| --dry-run | | Preview files without processing | Off |

`transcribe extract <file> [OPTIONS]`

| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| --output | -o | Output file path | Auto-generated |
| --format | -f | Audio format: mp3, wav | mp3 |

Output Formats

TXT — Plain text

Hello, welcome to the meeting. Today we'll discuss the quarterly results.

SRT — SubRip subtitles (with speaker labels when diarized)

1
00:00:00,000 --> 00:00:03,500
[SPEAKER_00] Hello, welcome to the meeting.

2
00:00:03,500 --> 00:00:07,200
[SPEAKER_01] Thanks for having me.
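
For readers who post-process or regenerate subtitles, here is a minimal sketch of producing cues in the layout shown above. SRT timestamps use `HH:MM:SS,mmm` with a comma before the milliseconds. The helper names `srt_timestamp` and `srt_cue` are hypothetical, and the bracketed speaker prefix simply mirrors the sample.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def srt_cue(index, start, end, text, speaker=None):
    """Build one SRT cue; the speaker label is added only when provided."""
    label = f"[{speaker}] " if speaker else ""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{label}{text}\n"


print(srt_cue(1, 0.0, 3.5, "Hello, welcome to the meeting.", "SPEAKER_00"))
```

Working in integer milliseconds before splitting into hours, minutes, and seconds avoids floating-point drift in the formatted value.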

VTT — WebVTT (with W3C voice tags when diarized)

WEBVTT

00:00:00.000 --> 00:00:03.500
<v SPEAKER_00>Hello, welcome to the meeting.</v>

00:00:03.500 --> 00:00:07.200
<v SPEAKER_01>Thanks for having me.</v>
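
If you need the speaker labels back out of a diarized VTT file, the voice spans can be pulled apart with a regular expression. This is a small sketch that handles the simple `<v NAME>text</v>` form shown above; it is not a full WebVTT parser, and `parse_voice_spans` is a hypothetical helper.

```python
import re

# Matches a WebVTT voice span such as <v SPEAKER_00>text</v>.
VOICE_TAG = re.compile(r"<v\s+([^>]+)>(.*?)</v>", re.DOTALL)


def parse_voice_spans(cue_text):
    """Return (speaker, text) pairs found in one cue's payload."""
    return [(name.strip(), body.strip()) for name, body in VOICE_TAG.findall(cue_text)]


print(parse_voice_spans("<v SPEAKER_00>Hello, welcome to the meeting.</v>"))
```

A cue without voice tags yields an empty list, so callers can fall back to treating the whole payload as unlabeled text.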

JSON — Full metadata

{
  "text": "Hello, welcome to the meeting...",
  "language": "en",
  "duration": 120.5,
  "speakers": ["SPEAKER_00", "SPEAKER_01"],
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 3.5,
      "text": "Hello, welcome to the meeting.",
      "speaker": "SPEAKER_00",
      "words": [
        {"word": "Hello,", "start": 0.1, "end": 0.5},
        {"word": "welcome", "start": 0.6, "end": 1.0}
      ]
    }
  ]
}
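
The JSON layout lends itself to simple downstream analysis. As one example, the sketch below totals speaking time per speaker from the `segments` array. The field names come from the sample above; the input data is made up for illustration, and the `UNKNOWN` fallback for segments without a `speaker` key is an assumption.

```python
import json
from collections import defaultdict


def speaking_time(result):
    """Sum segment durations (seconds) per speaker label."""
    totals = defaultdict(float)
    for seg in result.get("segments", []):
        # Segments produced without diarization may lack "speaker" (assumption).
        speaker = seg.get("speaker") or "UNKNOWN"
        totals[speaker] += seg["end"] - seg["start"]
    return dict(totals)


# Illustrative data in the documented shape, not real tool output.
sample = json.loads("""
{
  "segments": [
    {"id": 0, "start": 0.0, "end": 3.5, "text": "Hello, welcome to the meeting.", "speaker": "SPEAKER_00"},
    {"id": 1, "start": 3.5, "end": 7.0, "text": "Thanks for having me.", "speaker": "SPEAKER_01"},
    {"id": 2, "start": 7.0, "end": 9.0, "text": "Let's begin.", "speaker": "SPEAKER_00"}
  ]
}
""")
print(speaking_time(sample))  # {'SPEAKER_00': 5.5, 'SPEAKER_01': 3.5}
```

For a real file, replace `sample` with `json.load(open("meeting.json"))`, where the file name is whatever the tool wrote to your output directory.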

Configuration File

Create with transcribe config --init:

[output]
format = "txt"

[processing]
concurrency = 5
language = "auto"
recursive = false

[model]
size = "base"
device = "auto"
compute_type = "auto"

[features]
diarize = false
word_timestamps = false

Config files are searched in order:

./transcribe.toml
./.transcriberc
~/.config/transcribe/config.toml
~/.transcriberc
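
The search order above can be sketched as a first-match lookup. This assumes the first existing file wins and that files are not merged, which the README does not state explicitly; `config_candidates` and `find_config` are hypothetical helpers.

```python
from pathlib import Path


def config_candidates(cwd=None, home=None):
    """List candidate config paths in the documented search order."""
    cwd = Path(cwd) if cwd else Path.cwd()
    home = Path(home) if home else Path.home()
    return [
        cwd / "transcribe.toml",
        cwd / ".transcriberc",
        home / ".config" / "transcribe" / "config.toml",
        home / ".transcriberc",
    ]


def find_config(cwd=None, home=None):
    """Return the first existing config file, or None if none exist."""
    for path in config_candidates(cwd, home):
        if path.is_file():
            return path
    return None
```

Passing `cwd` and `home` explicitly keeps the lookup easy to test; with no arguments it uses the current working directory and the user's home directory.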

Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| TRANSCRIBE_MODEL_SIZE | Whisper model size | base |
| TRANSCRIBE_DEVICE | Compute device (auto/cpu/cuda) | auto |
| TRANSCRIBE_COMPUTE_TYPE | Compute type (auto/int8/float16/float32) | auto |
| TRANSCRIBE_CONCURRENCY | Max concurrent batch jobs | 5 |
| TRANSCRIBE_LANGUAGE | Default language | auto |
| TRANSCRIBE_DIARIZE | Enable diarization by default | false |
| TRANSCRIBE_WORD_TIMESTAMPS | Enable word timestamps by default | false |
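
To show how these variables might map onto typed settings, here is a minimal sketch that reads them with the documented defaults. The variable names and defaults come from the table above; the parsing rules (which strings count as true, integer conversion for concurrency) and the names `DEFAULTS`, `TRUTHY`, and `load_settings` are assumptions, not the tool's actual loader.

```python
import os

# Names and defaults come from the table above; parsing rules are assumptions.
DEFAULTS = {
    "TRANSCRIBE_MODEL_SIZE": "base",
    "TRANSCRIBE_DEVICE": "auto",
    "TRANSCRIBE_COMPUTE_TYPE": "auto",
    "TRANSCRIBE_CONCURRENCY": "5",
    "TRANSCRIBE_LANGUAGE": "auto",
    "TRANSCRIBE_DIARIZE": "false",
    "TRANSCRIBE_WORD_TIMESTAMPS": "false",
}
TRUTHY = {"1", "true", "yes", "on"}  # assumed accepted spellings


def load_settings(environ=None):
    """Read settings from a mapping (os.environ by default), applying defaults."""
    env = os.environ if environ is None else environ
    raw = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    return {
        "model_size": raw["TRANSCRIBE_MODEL_SIZE"],
        "device": raw["TRANSCRIBE_DEVICE"],
        "compute_type": raw["TRANSCRIBE_COMPUTE_TYPE"],
        "concurrency": int(raw["TRANSCRIBE_CONCURRENCY"]),
        "language": raw["TRANSCRIBE_LANGUAGE"],
        "diarize": raw["TRANSCRIBE_DIARIZE"].strip().lower() in TRUTHY,
        "word_timestamps": raw["TRANSCRIBE_WORD_TIMESTAMPS"].strip().lower() in TRUTHY,
    }
```

Accepting the environment as a parameter makes the function testable without mutating `os.environ`.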

Model Sizes

| Model | Size | English | Multilingual | Speed |
|-------|------|---------|-------------|-------|
| tiny | ~75 MB | Good | Fair | Fastest |
| base | ~150 MB | Better | Good | Fast |
| small | ~500 MB | Great | Great | Moderate |
| medium | ~1.5 GB | Excellent | Excellent | Slower |
| large-v3 | ~3 GB | Best | Best | Slowest |

Models are auto-downloaded on first use and cached locally.

Development

git clone https://github.com/robit-man/transcribe-cli.git
cd transcribe-cli
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

# Run tests
pytest

# Run tests without coverage
pytest tests/unit/ -v --no-cov

Uninstall

rm -rf ~/.local/share/transcribe-cli
rm -f ~/.local/bin/transcribe ~/.local/bin/transcribe-cli

License

MIT

Acknowledgments

faster-whisper — CTranslate2 Whisper implementation
pyannote.audio — Speaker diarization
FFmpeg — Audio/video processing
Typer — CLI framework
Rich — Terminal formatting
