speak2text
v0.2.3
Published
speak2text CLI tool. Transcribe and translate audio and video files using OpenAI Whisper.
Maintainers
Readme
speak2text
speak2text CLI tool. Drop audio or video files into input/, run s2t transcribe, get transcripts in output/. Powered by OpenAI Whisper.
Installation
pnpm add -g speak2textRequirements
- Node.js 22+
- OpenAI API key (or Grok / Gemini)
- ffmpeg - only needed for unsupported formats (mkv, avi, flac, ogg...)
Install ffmpeg (if needed)
macOS
brew install ffmpegLinux (Debian/Ubuntu)
sudo apt install ffmpegWindows
winget install ffmpegConfiguration
Create ~/.speak2text/.env and add your API key:
OPENAI_API_KEY=your-key-hereOther providers:
GROK_API_KEY=your-key-here
GEMINI_API_KEY=your-key-hereSet default provider (OpenAI is default):
s2t config set provider openaiHow It Works
- Put any audio or video file in
input/ - Run
s2t transcribe - Transcripts appear in
output/
Files sent directly to OpenAI: mp3, mp4, m4a, wav, webm
Other formats (mkv, avi, flac, ogg...) are converted to mp3 via ffmpeg first. If ffmpeg cannot process the file, an error is shown.
Usage
Transcribe
s2t transcribeProcesses all files in input/, writes transcripts to output/.
$ s2t transcribe
Transcribing 1 file(s)...
chile_promo.mp3 - transcribing...
✓ output/chile_promo.txt [19dee66d]
Done.Output:
September 11, 1973, a military coup overthrows the government in Chile, ending the longest
democratic tradition in Latin America. It was a bloody, bloody coup.Transcribe a single file:
s2t transcribe path/to/audio.mp3Options
s2t transcribe --format srt
s2t transcribe --format json
s2t transcribe --language fi
s2t transcribe --translate fi
s2t transcribe --translate en
s2t transcribe --provider gemini
s2t transcribe --out my-transcript.txt
s2t transcribe --out transcripts/Translation
s2t transcribe --translate fi
s2t transcribe --translate en
s2t transcribe --translate sv--translate en uses Whisper's native translation endpoint - one call, fast and cheap.
--translate <other> transcribes first, then translates via GPT-4o-mini.
Translation requires OpenAI provider.
Example - English audio translated to Finnish:
$ s2t transcribe --translate fi
Transcribing 1 file(s)...
chile_promo.mp3 - transcribing...
✓ output/chile_promo.txt [acc2fb12]
Done.Output:
11. syyskuuta 1973 sotilasvallankaappaus kaataa hallituksen Chilessä, mikä päättää
Latinalaisen Amerikan pisimmän demokraattisen perinteen. Se oli verinen, verinen vallankaappaus.Manage transcripts
s2t list
s2t show <id>
s2t export <id> --format srt
s2t delete <id>
s2t delete --allDatabase
s2t db stats
s2t db list
s2t delete --alls2t db stats shows transcript count, database file size, provider breakdown, and oldest/newest dates.
s2t db list shows each source file that has been transcribed, with transcript count and last transcribed date.
s2t delete --all removes all stored transcripts.
List supported languages
s2t languagesOutput Formats
| Format | Description |
|--------|-------------|
| txt | Plain text (default) |
| srt | SRT subtitles with timestamps |
| json | Full JSON with timestamps, confidence, and metadata |
Providers
| Provider | Flag | Notes |
|----------|------|-------|
| OpenAI Whisper | --provider openai | Default. $0.006/min |
| Grok | --provider grok | OpenAI-compatible API |
| Gemini | --provider gemini | OpenAI-compatible API |
Storage
Transcripts are stored locally in SQLite:
- macOS/Linux:
~/.speak2text/transcripts.db - macOS/Linux config:
~/.speak2text/.env
Supported Languages
99 languages supported. Use the code with --language or --translate.
| Code | Language | Code | Language | Code | Language | |------|----------|------|----------|------|----------| | af | Afrikaans | hi | Hindi | pt | Portuguese | | am | Amharic | hr | Croatian | ro | Romanian | | ar | Arabic | ht | Haitian Creole | ru | Russian | | as | Assamese | hu | Hungarian | sa | Sanskrit | | az | Azerbaijani | haw | Hawaiian | sd | Sindhi | | ba | Bashkir | hy | Armenian | si | Sinhala | | be | Belarusian | id | Indonesian | sk | Slovak | | bg | Bulgarian | is | Icelandic | sl | Slovenian | | bn | Bengali | it | Italian | sn | Shona | | bo | Tibetan | ja | Japanese | so | Somali | | br | Breton | jw | Javanese | sq | Albanian | | bs | Bosnian | ka | Georgian | sr | Serbian | | ca | Catalan | kk | Kazakh | su | Sundanese | | cs | Czech | km | Khmer | sv | Swedish | | cy | Welsh | kn | Kannada | sw | Swahili | | da | Danish | ko | Korean | ta | Tamil | | de | German | la | Latin | te | Telugu | | el | Greek | lb | Luxembourgish | tg | Tajik | | en | English | ln | Lingala | th | Thai | | es | Spanish | lo | Lao | tk | Turkmen | | et | Estonian | lt | Lithuanian | tl | Tagalog | | eu | Basque | lv | Latvian | tr | Turkish | | fa | Persian | mg | Malagasy | tt | Tatar | | fi | Finnish | mi | Maori | uk | Ukrainian | | fo | Faroese | mk | Macedonian | ur | Urdu | | fr | French | ml | Malayalam | uz | Uzbek | | gl | Galician | mn | Mongolian | vi | Vietnamese | | gu | Gujarati | mr | Marathi | yi | Yiddish | | ha | Hausa | ms | Malay | yo | Yoruba | | he | Hebrew | mt | Maltese | yue | Cantonese | | nn | Nynorsk | my | Myanmar | zh | Chinese | | no | Norwegian | ne | Nepali | | | | oc | Occitan | nl | Dutch | | | | pa | Punjabi | ps | Pashto | | |
Tech Stack
| Tool | Purpose | |------|---------| | TypeScript | Language | | Commander | CLI framework | | better-sqlite3 | Local transcript history | | OpenAI Whisper API | Transcription | | ffmpeg | Converts unsupported formats before transcription (optional) | | Vitest | Testing | | Biome | Lint & format | | pnpm | Package manager |
License
See MIT
