assistvideo
v0.1.7
CLI that downloads video/audio from a URL (YouTube today, more coming) and transcribes to markdown using local whisper.cpp. Drop the URL, get an MP3, MP4, or transcript — in your current folder.
The problem
Grabbing a YouTube video as an MP3 means remembering the right yt-dlp incantation. Pulling an MP4 means a different one. Turning the audio into a readable transcript means wiring up ffmpeg, a whisper model, and a markdown writer yourself. Every time.
assistvideo collapses that into three subcommands. Paste the URL, pick what you want, and the artefact lands in your current folder — with sensible metadata embedded in MP3s, a YAML-fronted markdown file for transcripts, and zero cloud API keys required.
What this unlocks
For note-taking
assistvideo transcribe <url> gives you an Obsidian-ready markdown file with frontmatter (title, uploader, source URL, upload date, duration, whisper model). Drop it straight into a vault — the YAML is already queryable, the # heading is the video title, and the source link is clickable.
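The frontmatter fields listed above take very little code to produce. The sketch below shows one way such a writer might look. It is not assistvideo's transcribe.ts. The TranscriptMeta shape and the renderTranscript, formatDuration and formatUploadDate names are hypothetical, and the sketch assumes the duration arrives in seconds and the upload date in yt-dlp's YYYYMMDD form.

```typescript
// Hypothetical metadata shape; keys mirror the frontmatter fields in this README.
interface TranscriptMeta {
  title: string;
  uploader: string;
  source: string;
  durationSeconds: number;
  uploadDate: string; // assumed yt-dlp style "YYYYMMDD"
  model: string;
}

// Seconds -> "HH:MM:SS", e.g. 754 -> "00:12:34".
function formatDuration(totalSeconds: number): string {
  const s = Math.max(0, Math.floor(totalSeconds));
  const parts = [Math.floor(s / 3600), Math.floor((s % 3600) / 60), s % 60];
  return parts.map((n) => String(n).padStart(2, "0")).join(":");
}

// "20250814" -> "2025-08-14"; any other shape is passed through unchanged.
function formatUploadDate(raw: string): string {
  return /^\d{8}$/.test(raw)
    ? `${raw.slice(0, 4)}-${raw.slice(4, 6)}-${raw.slice(6)}`
    : raw;
}

// Assemble YAML frontmatter, a "# <title>" heading, and the transcript body.
function renderTranscript(meta: TranscriptMeta, text: string, now: Date = new Date()): string {
  // A JSON string literal is also a valid YAML double-quoted scalar,
  // so quotes and backslashes in titles are escaped safely.
  const quote = (v: string): string => JSON.stringify(v);
  const frontmatter = [
    "---",
    `title: ${quote(meta.title)}`,
    `uploader: ${quote(meta.uploader)}`,
    `source: ${meta.source}`,
    "type: youtube",
    `duration: ${formatDuration(meta.durationSeconds)}`,
    `uploadDate: ${formatUploadDate(meta.uploadDate)}`,
    // toISOString() includes milliseconds; drop them to match the example format.
    `transcribedAt: ${now.toISOString().replace(/\.\d{3}Z$/, "Z")}`,
    `model: ${meta.model}`,
    "---",
  ].join("\n");
  return `${frontmatter}\n\n# ${meta.title}\n\n${text.trim()}\n`;
}
```

Quoting title and uploader while leaving the other values bare mirrors the example frontmatter further down. Titles are the fields most likely to contain colons or quotes that would otherwise break the YAML.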
For audio archives
assistvideo audio <url> produces a tagged MP3 — title, artist, album, thumbnail as album art, release date, and a .info.json sidecar with the full yt-dlp metadata blob. Drag it into Music / Plex / Jellyfin and the metadata is already there.
For offline viewing
assistvideo video <url> produces a best-quality MP4 (yt-dlp picks the best video + best audio streams and muxes them). No quality ladder flags to remember.
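For reference, the audio and video downloads can each be expressed as a single yt-dlp argument list. Each flag below is a real yt-dlp option. The exact set is an assumption inferred from the output layout and metadata described in this README, and so are the ytdlpArgs helper and the "%(title)s [%(id)s].%(ext)s" template. assistvideo's actual invocation may differ.

```typescript
type DownloadKind = "audio" | "video";

// Build a yt-dlp argument list (an argv array, so no shell is involved).
// Assumed flag set: every flag exists in yt-dlp, but assistvideo may pass different ones.
function ytdlpArgs(kind: DownloadKind, url: string, outDir: string): string[] {
  // -P sets the target folder; -o is the file-name template inside it.
  const common = ["-P", outDir, "-o", "%(title)s [%(id)s].%(ext)s"];
  if (kind === "audio") {
    return [
      ...common,
      "-x", "--audio-format", "mp3", // extract audio, convert to MP3 with ffmpeg
      "--embed-metadata",            // title / artist / date tags
      "--embed-thumbnail",           // thumbnail as album art
      "--write-thumbnail",           // also keep the thumbnail file on disk
      "--convert-thumbnails", "jpg", // ...converted to .jpg
      "--write-info-json",           // .info.json sidecar with the full metadata
      url,
    ];
  }
  return [
    ...common,
    "-f", "bestvideo+bestaudio",     // best video stream + best audio stream
    "--merge-output-format", "mp4",  // muxed into a single MP4 by ffmpeg
    url,
  ];
}

console.log(ytdlpArgs("video", "https://youtu.be/dQw4w9WgXcQ", "assistvideo/video").join(" "));
```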
For privacy
Transcription runs locally via whisper.cpp bindings — no API keys, no upload, no third-party service seeing your URL. The whisper model is downloaded once and cached.
Quick start
cd wherever-you-want-the-output
npx -y assistvideo transcribe "https://youtu.be/dQw4w9WgXcQ"
That's it. You now have:
- assistvideo/audio/<title> [id].mp3: the audio, MP3 with embedded metadata and thumbnail
- assistvideo/transcript/<title> [id].md: the transcript, YAML-fronted markdown
# Just the audio (MP3 + metadata)
npx -y assistvideo audio "https://youtu.be/dQw4w9WgXcQ"
# Just the video (best-quality MP4)
npx -y assistvideo video "https://youtu.be/dQw4w9WgXcQ"
# Full pipeline: MP3 + markdown transcript
npx -y assistvideo transcribe "https://youtu.be/dQw4w9WgXcQ"
Install
The fastest way is npx — no install needed. Or install a persistent global binary:
# Node / npm
npm install -g assistvideo
# Bun
bun install -g assistvideo
Once installed globally, drop the npx -y prefix from every command.
System dependencies
assistvideo shells out to yt-dlp, ffmpeg, and (for transcribe) cmake + make. Install them in one shot:
# Auto-installs via Homebrew on macOS; prints guidance on Linux/Windows
assistvideo install
On macOS this runs brew install yt-dlp ffmpeg cmake for you (Homebrew must already be installed; see https://brew.sh). On Linux/Windows it prints the matching apt / dnf / pacman / winget / choco commands so you can install manually.
Quote your URLs. YouTube URLs contain &, ?, and =, which zsh/bash will otherwise interpret. Always wrap the URL in double quotes:
# ✅ Good
assistvideo audio "https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=60s"
# ❌ zsh: parse error near `&'
assistvideo audio https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=60s
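The quoting rule only applies at your shell prompt. A program that launches a tool with an argument array, instead of a shell command string, passes the URL through untouched. The sketch below demonstrates this with Node's spawnSync, using the running Node binary as a stand-in for yt-dlp. runTool is a hypothetical helper. Bun.spawn also takes an argv array, but whether assistvideo's ytdlp.ts wrapper relies on that is an assumption.

```typescript
import { spawnSync } from "node:child_process";

// Run a binary with an explicit argv array. With shell: false (the default),
// no shell parses the arguments, so &, ? and = need no quoting.
function runTool(bin: string, args: string[]): string {
  const result = spawnSync(bin, args, { encoding: "utf8", shell: false });
  if (result.error) throw result.error;
  if (result.status !== 0) {
    throw new Error(`${bin} exited with ${result.status}: ${result.stderr}`);
  }
  return result.stdout;
}

// Stand-in for yt-dlp: a Node one-liner that writes its first argument back.
const url = "https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=60s";
const echoed = runTool(process.execPath, ["-e", "process.stdout.write(process.argv[1])", url]);
console.log(echoed === url); // prints true: the URL arrived intact
```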
Features
- Three commands, one pattern: audio | video | transcribe <url>, no flag soup
- Auto-detected source: the content type is inferred from the URL (YouTube only today; see FEATURE-ROADMAP.md)
- CWD-relative output: artefacts always land in ./assistvideo/, so you can run it anywhere
- Rich MP3 metadata: title, artist, album, thumbnail, release date embedded via yt-dlp
- Local transcription: whisper.cpp runs on CPU, no API keys, no uploads
- YAML-fronted markdown: transcripts are ready for Obsidian / any markdown tool
- Single-file bundle: ships as one dist/cli.js, fast install, no build step on the user's side
Everything important (the yt-dlp call, the whisper invocation, the output folder) is deterministic. Once the whisper model is cached, transcription works offline; only the download itself needs a network connection.
Requirements
- Node 18+ or Bun 1.0+
- yt-dlp on PATH: brew install yt-dlp
- ffmpeg on PATH: brew install ffmpeg
- cmake + make on PATH (only needed the first time you run transcribe, when nodejs-whisper builds whisper.cpp)
Run assistvideo install to have the CLI install these for you on macOS.
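To check by hand which of these tools are visible, scan each PATH entry for an executable file with the tool's name. findOnPath below is a hypothetical helper, not the check assistvideo install performs. It is POSIX-only and ignores Windows PATHEXT handling.

```typescript
import { accessSync, constants, statSync } from "node:fs";
import { delimiter, join } from "node:path";

// Return the first executable file named `bin` found on PATH, or null.
// POSIX-style only: Windows PATHEXT (.exe, .cmd, ...) is not handled.
function findOnPath(bin: string, pathVar: string = process.env.PATH ?? ""): string | null {
  for (const dir of pathVar.split(delimiter)) {
    if (!dir) continue; // skip empty entries, e.g. from a trailing ":"
    const candidate = join(dir, bin);
    try {
      accessSync(candidate, constants.X_OK); // throws if missing or not executable
      if (statSync(candidate).isFile()) return candidate;
    } catch {
      // not in this directory; keep looking
    }
  }
  return null;
}

// The tools listed under Requirements (cmake and make only matter for transcribe).
const missing = ["yt-dlp", "ffmpeg", "cmake", "make"].filter((bin) => findOnPath(bin) === null);
console.log(missing.length > 0 ? `Missing: ${missing.join(", ")}` : "All tools found");
```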
CLI reference
assistvideo audio <url>        Download audio as MP3 with embedded metadata + thumbnail
assistvideo video <url>        Download best-quality MP4 (bestvideo+bestaudio, merged)
assistvideo transcribe <url>   Download audio + produce a markdown transcript
assistvideo install            Install system deps (yt-dlp, ffmpeg, cmake): auto on macOS, guidance elsewhere
assistvideo --help             Full help
assistvideo --version          Version
Output layout
<cwd>/
└── assistvideo/                    <- generated, add to .gitignore
    ├── video/
    │   └── <title> [id].mp4
    ├── audio/
    │   ├── <title> [id].mp3        <- embedded ID3 + album art
    │   ├── <title> [id].info.json  <- full yt-dlp metadata
    │   └── <title> [id].jpg        <- thumbnail
    └── transcript/
        └── <title> [id].md         <- YAML frontmatter + transcript
Transcript markdown format
---
title: "<video title>"
uploader: "<channel>"
source: https://youtu.be/...
type: youtube
duration: 00:12:34
uploadDate: 2025-08-14
transcribedAt: 2026-04-21T12:00:00Z
model: base.en
---
# <video title>
<full transcript, whisper base.en, timestamps stripped>
Architecture
src/
├── features/
│   └── youtube/
│       ├── youtube.ts   <- orchestrator: dispatches per subcommand
│       └── metadata.ts  <- yt-dlp --print-json --no-download
├── shared/
│   ├── ytdlp.ts         <- Bun.spawn wrapper around yt-dlp binary
│   ├── transcribe.ts    <- nodejs-whisper wrapper, MD writer
│   ├── logger.ts        <- chalk-themed logger
│   └── paths.ts         <- CWD-relative output/* helpers
├── core/
│   ├── types.ts         <- ContentType, OutputType unions
│   └── detect-type.ts   <- URL -> ContentType
└── index.ts             <- commander CLI entry
A single bun build produces dist/cli.js: one minified ESM file with nodejs-whisper marked external (it ships native whisper.cpp artefacts that must be installed on the user's machine, not bundled).
FAQ
Where do the files go?
Into ./assistvideo/ in your current working directory. Run it from the folder you want the artefacts in.
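A CWD-relative path helper in the spirit of paths.ts fits in a few lines. outputPath and the OutputKind type are hypothetical, and so is the file-name sanitising. They simply reproduce the ./assistvideo/<kind>/<title> [id].<ext> layout from the Output layout section.

```typescript
import { join } from "node:path";

type OutputKind = "audio" | "video" | "transcript";

// Resolve an artefact path under <cwd>/assistvideo/<kind>/.
// Characters that are unsafe in file names become "_" (a simplification;
// yt-dlp applies its own, different sanitising to the files it writes).
function outputPath(
  kind: OutputKind,
  title: string,
  id: string,
  ext: string,
  cwd: string = process.cwd(),
): string {
  const safeTitle = title.replace(/[\/\\:*?"<>|]/g, "_").trim();
  return join(cwd, "assistvideo", kind, `${safeTitle} [${id}].${ext}`);
}

console.log(outputPath("transcript", "My talk", "dQw4w9WgXcQ"));
```

Because cwd defaults to process.cwd() at call time, the same command writes into whichever folder you run it from. A caller would still need mkdirSync(dirname(path), { recursive: true }) from node:fs before writing the file.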
Does it upload my video anywhere?
No. yt-dlp downloads directly from YouTube; whisper.cpp runs locally on CPU. Nothing is sent to any third-party API.
What whisper model does it use?
base.en by default (~150MB). Downloaded once on first transcribe run, cached for subsequent calls. A --model flag is on the roadmap.
The first transcribe run is slow — why?
Two first-run costs: nodejs-whisper compiles whisper.cpp via make, and the base.en model is downloaded. Both are cached; subsequent runs skip them.
Does it handle playlists or channels?
Not yet — pass one video URL at a time. A batch <file> command is on the roadmap.
Can I use my own yt-dlp config?
yt-dlp respects its own ~/.config/yt-dlp/config when invoked, so anything you've set globally (auth cookies, rate limits, proxy) applies automatically. assistvideo just calls the binary.
Does it work on non-YouTube URLs?
Not yet. The URL detector only recognises youtube.com / youtu.be at the moment. Since the underlying yt-dlp supports 1000+ sites, extending to Vimeo, generic video, etc. is just a detector change. See FEATURE-ROADMAP.md.
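Here is a sketch of a host-based detector of the kind described here. The ContentType members, the subdomain rule and the "unknown" fallback are illustrative assumptions. The README names a ContentType union in types.ts but does not show its members, so the real detect-type.ts may differ.

```typescript
// Hypothetical union; only YouTube is supported today per this README.
type ContentType = "youtube" | "unknown";

// Classify a URL by host name. Malformed input yields "unknown" instead of throwing.
function detectType(raw: string): ContentType {
  let host: string;
  try {
    host = new URL(raw).hostname; // WHATWG URL parsing lower-cases http(s) hosts
  } catch {
    return "unknown";
  }
  const isYouTube = host === "youtu.be" || host === "youtube.com" || host.endsWith(".youtube.com");
  return isYouTube ? "youtube" : "unknown";
}

console.log(detectType("https://youtu.be/dQw4w9WgXcQ"));
```

Under this shape, supporting Vimeo would mean one more union member and one more host check, with yt-dlp handling the download itself.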
Should I commit assistvideo/ to git?
No. Add assistvideo/ to your .gitignore. The artefacts are generated — re-run the CLI if you want them back.
Roadmap
See FEATURE-ROADMAP.md for planned content types (website, vimeo, audio, podcast, …), commands (metadata, subtitles, summary, batch, …) and options (--model, --language, --format, …).
License
MIT.
