@ilyavorobiev/transcribe
v0.2.0
Offline transcription of iPhone voice memos on macOS — 3 engines (mlx/cpp/gigaam) optimized for Russian
Transcribe — offline voice-memo transcription for macOS
A macOS command-line tool that transcribes audio files — iPhone .m4a
voice memos by default — entirely on your machine. No cloud, no API
keys, no network calls during transcription. Three transcription
engines are unified behind a single CLI: mlx-whisper (default;
Apple Silicon native), whisper.cpp (offline-strict, no Python
runtime), and Sber GigaAM-v3 (Russian-only opt-in). Optimized for
Russian out of the box; multilingual support via Whisper for ~99
languages.
Built for a specific personal workflow (long Russian voice memos recorded on iPhone, often with English tech vocabulary mixed in), then published as OSS in case it's useful to anyone else. Bug reports and PRs are welcome — see CONTRIBUTING.md.
bun add -g @ilyavorobiev/transcribe
transcribe setup # one-time, ~6 GB, ~8 min (mlx + antony66)
transcribe memo.m4a # produces memo.txt next to memo.m4a
The minimal default install gets you Russian transcription with the
mlx engine. Other engines (whisper.cpp for offline-strict use, gigaam
as an opt-in second opinion) are one flag away; see Disk footprint
below. If you run transcribe <file> --engine gigaam without gigaam
installed, the CLI prompts you to install it in place.
Why three engines
Different recordings transcribe best with different models. transcribe
ships all three so you can switch with one flag.
| Recording type | Engine + model | One-liner |
| ------------------------------------------- | ------------------------------------- | ---------------------------------------------------- |
| Russian (with or without tech acronyms) | mlx + antony66-russian (default) | transcribe memo.m4a |
| Russian, heavy ru+en code-switching | mlx + bond005-turbo | transcribe memo.m4a --model bond005-turbo |
| English / German / French / Japanese / … | mlx + large-v3 (default for --language≠ru) | transcribe memo.m4a --language en |
| Russian, second-opinion for LLM-vote | gigaam + gigaam-v3 | transcribe memo.m4a --engine gigaam |
| Offline-strict / version-pinned / no Python | cpp + large-v3 | transcribe memo.m4a --engine cpp |
Defaults are language-aware for the mlx engine: --language ru →
antony66/whisper-large-v3-russian,
otherwise stock multilingual large-v3. Engine and language are
independent — there is no automatic engine selection based on language
(see specs/gigaam/spec.md field findings for
the reasoning).
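
The defaulting rule above can be sketched in a few lines. This is an illustration only, not the CLI's actual code: the function name `resolve_model` is hypothetical, and the per-engine defaults are read off the table above.

```python
from typing import Optional


def resolve_model(engine: str = "mlx", language: str = "ru",
                  model: Optional[str] = None) -> str:
    """Pick a model alias. The engine is never inferred from the language."""
    if model is not None:
        # An explicit --model always wins.
        return model
    if engine == "mlx":
        # Only the mlx default is language-aware.
        return "antony66-russian" if language == "ru" else "large-v3"
    if engine == "cpp":
        return "large-v3"
    if engine == "gigaam":
        # Russian-only model; the language does not change the choice.
        return "gigaam-v3"
    raise ValueError(f"unknown engine: {engine}")
```

Note that `resolve_model(engine="gigaam", language="en")` still returns `gigaam-v3`: the language never switches the engine.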
Supported platforms
- macOS Apple Silicon (M-series) — fully supported. All three engines.
- macOS Intel — cpp engine works. mlx + gigaam require Apple Silicon for usable performance.
- Linux / Windows — not supported in v0.x. The os field in package.json blocks install with a clear error.
Disk footprint
Default install is the smallest useful set: mlx engine + antony66
Russian fine-tune. ~6 GB, ~8 min. Add opt-in components with
--with:
transcribe setup --with cpp # add whisper.cpp (~+4 GB)
transcribe setup --with gigaam # add GigaAM (~+4 GB)
transcribe setup --with bond005 # add bond005 ru+en fine-tune (~+4 GB)
transcribe setup --with cpp --with gigaam # combine
transcribe setup --full # everything (~20 GB)
Per-engine single-shot installs (for narrow setups):
transcribe setup:mlx # only the mlx engine + Russian fine-tune
transcribe setup:cpp # only whisper.cpp + ggml-large-v3 + VAD
When you run a command whose engine isn't installed, the CLI prompts
you to install it in place (TTY only; scripts get a fail-fast error
that includes the install command). Disable the prompt with
--no-auto-install or TRANSCRIBE_AUTO_INSTALL=0.
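
A rough sketch of that decision, for readers who want to predict what a script will see. The function name `should_prompt_install` is hypothetical, and the precedence (flag first, then environment variable, then TTY check) is an assumption; the real CLI may order these differently.

```python
import os
import sys
from typing import Mapping, Optional

# Values documented as "never prompt" for TRANSCRIBE_AUTO_INSTALL.
FALSY = {"0", "false", "no"}


def should_prompt_install(no_auto_install: bool = False,
                          env: Optional[Mapping[str, str]] = None,
                          is_tty: Optional[bool] = None) -> bool:
    """Return True if the CLI would offer to install a missing engine."""
    env = os.environ if env is None else env
    if is_tty is None:
        is_tty = sys.stdin.isatty()
    if no_auto_install:
        # --no-auto-install: always fail fast (assumed to take precedence).
        return False
    value = env.get("TRANSCRIBE_AUTO_INSTALL", "").strip().lower()
    if value in FALSY:
        return False
    # Unset or truthy: prompt only when attached to a terminal.
    return is_tty
```

In this sketch a truthy value never forces a prompt in a non-TTY context, which matches the "prompt iff TTY" wording in the environment variable table.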
To wipe and re-install something (corrupted download, transformers version drift, etc.):
transcribe reinstall # the minimal default set
transcribe reinstall antony66-russian # one model
transcribe reinstall gigaam # one engine
transcribe reinstall --all # everything currently installed
Models and the whisper.cpp build live under ~/Library/Caches/transcribe/
by default (override with TRANSCRIBE_CACHE_DIR), so a bun update -g
doesn't wipe gigabytes of downloads. Conversion intermediates (the -hf
source dirs and whisper.cpp CMake objects) are auto-cleaned after a
strict sanity check; opt out with TRANSCRIBE_KEEP_HF=1 /
TRANSCRIBE_KEEP_BUILD=1 if you're debugging.
The legacy --no-cpp, --no-mlx, --no-gigaam, and --no-bond005 flags are still honored (with a deprecation warning) and will be removed in 1.0.0. Use --with and --full instead.
Usage
transcribe <file.m4a> [options]
Options:
--engine <name> mlx | cpp | gigaam (default: mlx)
--model <name> engine-specific alias or HuggingFace repo id
--format <fmt> txt | srt | vtt | json | all (default: txt)
(gigaam v1: txt | json only)
--output <file> output file path (default: <input-stem>.<ext>)
--language <code> ISO language code (default: ru)
--prompt <text> initial prompt (mlx/cpp — vocabulary biasing)
--threads <n> decoder threads (cpp only)
--keep-wav retain the intermediate 16kHz WAV (cpp only)
--auto-install prompt to install missing engine if not ready (default on TTY)
--no-auto-install fail-fast on missing engine (default in scripts / non-TTY)
-h, --help show this help
transcribe setup [--with cpp|--with gigaam|--with bond005|--full]
default: minimal (mlx + antony66)
transcribe setup --clean remove intermediary files only
transcribe setup --force re-download even if present
transcribe setup:mlx | setup:cpp single-engine installs
transcribe reinstall [<name>|--all] wipe + reinstall
transcribe --version
Vocabulary biasing example (helpful for tech / domain-specific acronyms):
transcribe memo.m4a --prompt "Обсуждаем MCP, API, latency, RAG, GPT-4."
Environment variables
| Variable | Effect |
| ------------------------------ | ----------------------------------------------------------------------- |
| TRANSCRIBE_CACHE_DIR | Root for models + the whisper.cpp build (default ~/Library/Caches/transcribe/). Hard override: disables the local-dev fallback to <repo>/{models,vendor}. |
| XDG_CACHE_HOME | If set, default cache becomes $XDG_CACHE_HOME/transcribe. |
| WHISPER_BIN | Explicit override for the whisper-cli binary path (cpp engine). |
| WHISPER_MODEL_DIR | Explicit override for the ggml-<name>.bin lookup (cpp engine). |
| TRANSCRIBE_AUTO_INSTALL | 0/false/no → never prompt to install (fail-fast). 1/true/yes → prompt iff TTY. Default = prompt iff TTY. |
| TRANSCRIBE_KEEP_HF | If truthy, keep the -hf source directory after MLX conversion (removing it normally frees ~3 GB; keep it when debugging a bad conversion). |
| TRANSCRIBE_KEEP_BUILD | If truthy, keep whisper.cpp build/CMakeFiles/ after a successful build (removing it normally frees ~200 MB; keep it when iterating on whisper.cpp). |
| TRANSCRIBE_SKIP_POSTINSTALL | If truthy (1/true/yes), the postinstall banner is silent. |
| CI | If truthy, postinstall is skipped (avoids polluting CI logs). |
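
The cache-root lookup implied by the first two rows can be sketched as follows. The function name `cache_root` is hypothetical, and the local-dev fallback to <repo>/{models,vendor} is deliberately left out, so treat this as a simplified reading of the table, not the CLI's implementation.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def cache_root(env: Optional[Mapping[str, str]] = None,
               home: Optional[str] = None) -> Path:
    """Resolve where models and the whisper.cpp build would be stored."""
    env = os.environ if env is None else env
    home_dir = Path.home() if home is None else Path(home)
    explicit = env.get("TRANSCRIBE_CACHE_DIR")
    if explicit:
        # Hard override: used as-is.
        return Path(explicit)
    xdg = env.get("XDG_CACHE_HOME")
    if xdg:
        # XDG only changes the default location.
        return Path(xdg) / "transcribe"
    return home_dir / "Library" / "Caches" / "transcribe"
```

Under this reading, setting both variables resolves to TRANSCRIBE_CACHE_DIR, since XDG_CACHE_HOME only moves the default.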
Troubleshooting
Model file not found or model 'antony66-russian' resolves to … but that directory doesn't exist
— You haven't run setup yet. Run transcribe setup (or the narrower
transcribe setup:mlx). If you previously installed and the dir got
clobbered, transcribe reinstall antony66-russian re-downloads it.
HF download stalls in CLOSE_WAIT during setup — known issue with
HuggingFace's parallel Python downloader on big multi-file Russian
repos. Setup uses plain curl to avoid it. If you see this from
something else (e.g. mlx_whisper auto-downloading a model you didn't
pre-fetch), re-run setup which downloads via the reliable path.
TypeError: ModelDimensions.__init__() got an unexpected keyword argument '_name_or_path'
— You're trying to point --model at an HF-format Whisper repo
(antony66, bond005) without converting it first. Use the alias
(--model antony66-russian) which resolves to the locally-converted
MLX directory. Setup runs the conversion automatically.
Too long wav file when using the gigaam engine — the model's
built-in transcribe() rejects files longer than ~30 s. We work around
this by chunking the file (30 s windows, 2 s overlap). If you hit this,
you're probably calling GigaAM directly outside this CLI — use
transcribe ... --engine gigaam which handles chunking.
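
If you do need to call GigaAM yourself, the windowing can be approximated like this. It is a minimal sketch assuming the 30 s window and 2 s overlap quoted above; `chunk_windows` is a hypothetical name, and how the real script stitches text across the overlapping regions is not shown here.

```python
from typing import List, Tuple


def chunk_windows(duration_s: float, window_s: float = 30.0,
                  overlap_s: float = 2.0) -> List[Tuple[float, float]]:
    """Return (start, end) offsets in seconds covering the whole recording."""
    if window_s <= overlap_s:
        raise ValueError("window must be longer than the overlap")
    step = window_s - overlap_s  # each window starts 28 s after the previous one
    windows: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        end = min(start + window_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break  # the last window reached the end of the audio
        start += step
    return windows
```

For a 70 s memo this yields three windows: 0 to 30, 28 to 58, and 56 to 70. Each window stays within the 30 s limit, and consecutive windows share 2 s of audio.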
transformers install error during gigaam setup — we pin
transformers>=4.40,<4.50 because 4.50 changed the meta-device init
path in a way that breaks GigaAM's trust_remote_code modeling. The pin
lives in scripts/gigaam_transcribe.py PEP 723 inline metadata.
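
For reference, a PEP 723 header carrying that pin would look roughly like the block below. Only the transformers constraint comes from this README; the real script presumably lists other dependencies that are not reproduced here, and the helper `within_pin` is a hypothetical illustration of the range, not part of the script.

```python
# /// script
# dependencies = [
#     "transformers>=4.40,<4.50",
# ]
# ///


def within_pin(version: str) -> bool:
    """Check whether a transformers version string falls inside the pin."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (4, 40) <= (major, minor) < (4, 50)
```

If a manually installed transformers falls outside this range (for example 4.50.0), reinstalling gigaam should restore a compatible version.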
Transcription in the wrong language — pass --language <code> (e.g.
en, de, ja); the default is ru. When using --engine gigaam,
non-Russian language produces gibberish (the model is Russian-only); the
CLI warns about this.
How it works
- mlx engine spawns mlx_whisper (from a uv tool install venv) with anti-hallucination tuning (--temperature 0, --condition-on-previous-text False, --no-speech-threshold 0.5).
- cpp engine spawns whisper-cli built from whisper.cpp source with Metal, plus the Silero VAD model and tuned anti-hallucination flags (-mc 0 -bs 5 -et 2.6 -fa --suppress-nst).
- gigaam engine spawns scripts/gigaam_transcribe.py via uv run (PEP 723 inline deps). The script chunks audio into 30 s windows with 2 s overlap and stitches the output (avoids the model's hardcoded 30 s limit without needing pyannote / HF_TOKEN).
For Russian inputs the default mlx + antony66 produced the most
readable output in our benchmarks. gigaam is opt-in because on real
recordings it didn't measurably beat antony66 (similar acronym
preservation; single-paragraph output is less readable). See
specs/gigaam/spec.md for the full comparison.
Credits
This project is a thin wrapper around brilliant upstream work:
- whisper.cpp — Georgi Gerganov
- Whisper — OpenAI
- mlx-whisper — Apple ML Explore
- GigaAM-v3 — Sber AI
- antony66/whisper-large-v3-russian — antony66 (HF)
- bond005/whisper-podlodka-turbo — bond005 (HF)
- Silero VAD — Silero Team
- Bun, uv
License
MIT — see LICENSE.
