assistvideo

v0.1.7

Published

10 days ago

CLI that downloads video/audio from a URL (YouTube today, more coming) and transcribes to markdown using local whisper.cpp. Drop the URL, get an MP3, MP4, or transcript — in your current folder.

0High
0Medium
0Low

codedeveloper

cli youtube video audio mp3 mp4 transcription transcribe whisper whisper.cpp yt-dlp markdown bun ffmpeg

assistvideo

CLI that downloads video/audio from a URL (YouTube today, more coming) and transcribes to markdown using local whisper.cpp. Drop the URL, get an MP3, MP4, or transcript — in your current folder.

The problem

Grabbing a YouTube video as an MP3 means remembering the right yt-dlp incantation. Pulling an MP4 means a different one. Turning the audio into a readable transcript means wiring up ffmpeg, a whisper model, and a markdown writer yourself. Every time.

assistvideo collapses that into three subcommands. Paste the URL, pick what you want, and the artefact lands in your current folder — with sensible metadata embedded in MP3s, a YAML-fronted markdown file for transcripts, and zero cloud API keys required.

What this unlocks

For note-taking

assistvideo transcribe <url> gives you an Obsidian-ready markdown file with frontmatter (title, uploader, source URL, upload date, duration, whisper model). Drop it straight into a vault — the YAML is already queryable, the # heading is the video title, and the source link is clickable.

For audio archives

assistvideo audio <url> produces a tagged MP3 — title, artist, album, thumbnail as album art, release date, and a .info.json sidecar with the full yt-dlp metadata blob. Drag it into Music / Plex / Jellyfin and the metadata is already there.

For offline viewing

assistvideo video <url> produces a best-quality MP4 (yt-dlp picks the best video + best audio streams and muxes them). No quality ladder flags to remember.

For privacy

Transcription runs locally via whisper.cpp bindings — no API keys, no upload, no third-party service seeing your URL. The whisper model is downloaded once and cached.

Quick start

cd wherever-you-want-the-output
npx -y assistvideo transcribe "https://youtu.be/dQw4w9WgXcQ"

That's it. You now have:

assistvideo/audio/<title> [id].mp3 — the audio, MP3 with embedded metadata and thumbnail
assistvideo/transcript/<title> [id].md — the transcript, YAML-fronted markdown

# Just the audio (MP3 + metadata)
npx -y assistvideo audio "https://youtu.be/dQw4w9WgXcQ"

# Just the video (best-quality MP4)
npx -y assistvideo video "https://youtu.be/dQw4w9WgXcQ"

# Full pipeline: MP3 + markdown transcript
npx -y assistvideo transcribe "https://youtu.be/dQw4w9WgXcQ"

Install

The fastest way is npx — no install needed. Or install a persistent global binary:

# Node / npm
npm install -g assistvideo

# Bun
bun install -g assistvideo

Once installed globally, drop the npx -y prefix from every command.

System dependencies

assistvideo shells out to yt-dlp, ffmpeg, and (for transcribe) cmake + make. Install them in one shot:

# Auto-installs via Homebrew on macOS; prints guidance on Linux/Windows
assistvideo install

On macOS this runs brew install yt-dlp ffmpeg cmake for you (Homebrew must already be installed — see https://brew.sh). On Linux/Windows it prints the matching apt / dnf / pacman / winget / choco commands so you can install manually.

Quote your URLs. YouTube URLs contain &, ?, and =, which zsh/bash will otherwise interpret. Always wrap the URL in double quotes:
# ✅ Good
assistvideo audio "https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=60s"

# ❌ zsh: parse error near `&'
assistvideo audio https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=60s

Features

Three commands, one pattern: audio | video | transcribe <url> — no flag soup
Auto-detected source: the content type is inferred from the URL (YouTube only today; see FEATURE-ROADMAP.md)
CWD-relative output: artefacts always land in ./assistvideo/ — run it anywhere
Rich MP3 metadata: title, artist, album, thumbnail, release date embedded via yt-dlp
Local transcription: whisper.cpp runs on CPU, no API keys, no uploads
YAML-fronted markdown: transcripts are ready for Obsidian / any markdown tool
Single-file bundle: ships as one dist/cli.js — fast install, no build step on the user's side

Everything important (the yt-dlp call, the whisper invocation, the output folder) is deterministic and offline-capable once the model is cached.

Requirements

Node 18+ or Bun 1.0+
yt-dlp on PATH — brew install yt-dlp
ffmpeg on PATH — brew install ffmpeg
cmake + make on PATH (only needed the first time you run transcribe — nodejs-whisper builds whisper.cpp on first use)

Run assistvideo install to have the CLI install these for you on macOS.

CLI reference

assistvideo audio <url>         Download audio as MP3 with embedded metadata + thumbnail
assistvideo video <url>         Download best-quality MP4 (bestvideo+bestaudio, merged)
assistvideo transcribe <url>    Download audio + produce a markdown transcript
assistvideo install             Install system deps (yt-dlp, ffmpeg, cmake) — auto on macOS, guidance elsewhere
assistvideo --help              Full help
assistvideo --version           Version

Output layout

<cwd>/
└── assistvideo/                              <- generated, add to .gitignore
    ├── video/
    │   └── <title> [id].mp4
    ├── audio/
    │   ├── <title> [id].mp3                  <- embedded ID3 + album art
    │   ├── <title> [id].info.json            <- full yt-dlp metadata
    │   └── <title> [id].jpg                  <- thumbnail
    └── transcript/
        └── <title> [id].md                   <- YAML frontmatter + transcript

Transcript markdown format

---
title: "<video title>"
uploader: "<channel>"
source: https://youtu.be/...
type: youtube
duration: 00:12:34
uploadDate: 2025-08-14
transcribedAt: 2026-04-21T12:00:00Z
model: base.en
---

# <video title>

<full transcript, whisper base.en, timestamps stripped>

Architecture

src/
├── features/
│   └── youtube/
│       ├── youtube.ts        <- orchestrator: dispatches per subcommand
│       └── metadata.ts       <- yt-dlp --print-json --no-download
├── shared/
│   ├── ytdlp.ts              <- Bun.spawn wrapper around yt-dlp binary
│   ├── transcribe.ts         <- nodejs-whisper wrapper, MD writer
│   ├── logger.ts             <- chalk-themed logger
│   └── paths.ts              <- CWD-relative output/* helpers
├── core/
│   ├── types.ts              <- ContentType, OutputType unions
│   └── detect-type.ts        <- URL -> ContentType
└── index.ts                  <- commander CLI entry

A single bun build produces dist/cli.js — one minified ESM file with nodejs-whisper marked external (it ships native whisper.cpp artefacts that must be installed at the user's machine, not bundled).

FAQ

Where do the files go? Into ./assistvideo/ in your current working directory. Run it from the folder you want the artefacts in.

Does it upload my video anywhere? No. yt-dlp downloads directly from YouTube; whisper.cpp runs locally on CPU. Nothing is sent to any third-party API.

What whisper model does it use? base.en by default (~150MB). Downloaded once on first transcribe run, cached for subsequent calls. A --model flag is on the roadmap.

The first transcribe run is slow — why? Two first-run costs: nodejs-whisper compiles whisper.cpp via make, and the base.en model is downloaded. Both are cached; subsequent runs skip them.

Does it handle playlists or channels? Not yet — pass one video URL at a time. A batch <file> command is on the roadmap.

Can I use my own yt-dlp config? yt-dlp respects its own ~/.config/yt-dlp/config when invoked, so anything you've set globally (auth cookies, rate limits, proxy) applies automatically. assistvideo just calls the binary.

Does it work on non-YouTube URLs? Not yet — the URL detector only recognises youtube.com / youtu.be at the moment. Since the underlying yt-dlp supports 1000+ sites, extending to vimeo, generic video, etc. is just a detector change. See FEATURE-ROADMAP.md.

Should I commit assistvideo/ to git? No. Add assistvideo/ to your .gitignore. The artefacts are generated — re-run the CLI if you want them back.

Roadmap

See FEATURE-ROADMAP.md for planned content types (website, vimeo, audio, podcast, …), commands (metadata, subtitles, summary, batch, …) and options (--model, --language, --format, …).

License

MIT.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

assistvideo

The problem

What this unlocks

For note-taking

For audio archives

For offline viewing

For privacy

Quick start

Install

System dependencies

Features

Requirements

CLI reference

Output layout

Transcript markdown format

Architecture

FAQ

Roadmap

License