framecap

v0.2.1

Published

15 hours ago

YouTube videos → structured markdown with visual frame captures

aviralv

youtube transcript frames markdown obsidian

Takes a YouTube URL and outputs a clean markdown document with a structured transcript (chapter headers, speaker labels, paragraph breaks) and frame captures at key moments — embedded as images in the markdown.

Why

YouTube videos contain valuable knowledge, but it's trapped in a format you can't search, reference, or link to. Transcripts alone miss the visual context. framecap gives you both: a readable document with visual bookmarks.

Install

# Prerequisites
brew install yt-dlp ffmpeg

# Install framecap
npm install -g framecap

Usage

# Single video
framecap https://youtube.com/watch?v=abc123

# Custom output directory
framecap https://youtube.com/watch?v=abc -o ./notes/

# Capture frames at specific timestamps
framecap https://youtube.com/watch?v=abc --capture-at 1:30,5:00,12:45

# Hint speaker names for interviews
framecap https://youtube.com/watch?v=abc --speakers "Lex Fridman,Andrej Karpathy"

# Skip LLM structuring (free mode — raw transcript + frames only)
framecap https://youtube.com/watch?v=abc --no-structure

# Obsidian-compatible output (wikilink image syntax)
framecap https://youtube.com/watch?v=abc --format obsidian

# Preview plan and cost (fetches metadata + transcript, skips video download and LLM)
framecap https://youtube.com/watch?v=abc --dry-run
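
The `--capture-at` value shown above is a comma-separated list of `M:SS` timestamps. As a rough illustration of how such a value could be turned into second offsets, here is a minimal sketch. `parseCaptureAt` is a hypothetical helper and not framecap's actual code, and accepting an `H:MM:SS` form is an assumption.

```typescript
// Hypothetical helper, not part of framecap's published API.
// Converts a spec such as "1:30,5:00,12:45" into offsets in seconds.
function parseCaptureAt(spec: string): number[] {
  return spec.split(",").map((raw) => {
    const parts = raw.trim().split(":");
    const allDigits = !parts.some((p) => !/^\d+$/.test(p));
    if (parts.length > 3 || !allDigits) {
      throw new Error('Invalid timestamp: "' + raw + '"');
    }
    // Fold left so that "1:02:03" becomes ((1 * 60) + 2) * 60 + 3.
    return parts.reduce((total, part) => total * 60 + Number(part), 0);
  });
}

const offsets = parseCaptureAt("1:30,5:00,12:45");
console.log(offsets); // [ 90, 300, 765 ]
```

Rejecting malformed entries up front keeps a typo from silently producing a frame at the wrong position.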

Output

./how-karpathy-builds-software.md
./frames/how-karpathy-builds-software/
├── frame-0001-00m00s.jpg
├── frame-0002-01m45s.jpg
├── frame-0003-05m30s.jpg
└── ...

The markdown file includes:

- YAML frontmatter: title, channel, URL, duration, upload date, auto-generated tags
- Structured transcript: organized by chapters (from the video description), with speaker labels and natural paragraph breaks
- Embedded frames: images at key moments, with timestamps and captions
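
The frame file names in the listing above appear to combine a four-digit sequence number with a minutes-and-seconds timestamp. The sketch below reproduces that pattern. `frameFileName` is a hypothetical name, and letting the minutes field grow past 59 for long videos is an assumption, since the listing only shows short offsets.

```typescript
// Left-pads a non-negative integer with zeros to the given width.
function zeroPad(value: number, width: number): string {
  let text = String(value);
  while (text.length < width) text = "0" + text;
  return text;
}

// Hypothetical helper that mirrors names like "frame-0002-01m45s.jpg".
// index: 1-based position of the frame; seconds: offset into the video.
function frameFileName(index: number, seconds: number): string {
  const whole = Math.floor(seconds);
  const minutes = Math.floor(whole / 60); // assumption: no separate hours field
  const secs = whole % 60;
  return "frame-" + zeroPad(index, 4) + "-" + zeroPad(minutes, 2) + "m" + zeroPad(secs, 2) + "s.jpg";
}

console.log(frameFileName(2, 105)); // frame-0002-01m45s.jpg
```

Zero-padded names sort correctly in a file browser, which keeps the frames in playback order.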

Options

| Flag | Default | Description |
|---|---|---|
| -o, --output | ./ | Output directory |
| --interval | auto | Force fixed-interval frame capture (seconds) |
| --max-frames | 50 | Maximum frames to extract |
| --dedup-threshold | 0.85 | Frame similarity filter (0.0-1.0) |
| --no-dedup | off | Keep all frames |
| --format | markdown | markdown or obsidian (wikilinks) |
| --capture-at | none | Capture at specific timestamps (e.g. 1:30,5:00) |
| --speakers | auto | Comma-separated speaker names |
| --no-structure | off | Skip LLM pass (free mode) |
| --no-frames | off | Transcript only |
| --language | en | Transcript language |
| --keep-video | off | Retain downloaded video file |
| --cookies-from-browser | none | Use cookies from browser (chrome, firefox, edge) |
| --model | claude-sonnet-latest | LLM model for structuring |
| --dry-run | off | Preview plan and cost (skips video download and LLM) |
| --quiet | off | Suppress all output except the final path (for piping) |
| -v, --verbose | off | Detailed logging |
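
To make `--dedup-threshold` and `--max-frames` more concrete, here is one way a similarity filter of this kind could work. This is a sketch under stated assumptions, not framecap's actual algorithm: it assumes each frame already carries a hex-encoded perceptual hash (the `hash` field is hypothetical), that a frame is dropped when its similarity to the last kept frame is at or above the threshold, and that the frame cap simply stops collection once it is reached.

```typescript
interface Frame {
  seconds: number;
  hash: string; // hypothetical: hex-encoded perceptual hash, same length for every frame
}

// Number of set bits for each 4-bit value 0..15.
const NIBBLE_BITS = [0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4];

// Fraction of matching bits between two equal-length hex hashes (1.0 = identical).
function similarity(a: string, b: string): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("hashes must be non-empty and the same length");
  }
  let differing = 0;
  for (let i = 0; i < a.length; i++) {
    const xor = parseInt(a.charAt(i), 16) ^ parseInt(b.charAt(i), 16);
    differing += NIBBLE_BITS[xor];
  }
  return 1 - differing / (a.length * 4);
}

// Keeps a frame only if it differs enough from the previously kept frame.
function dedupFrames(frames: Frame[], threshold: number, maxFrames: number): Frame[] {
  const kept: Frame[] = [];
  for (const frame of frames) {
    if (kept.length >= maxFrames) break; // assumed meaning of the frame cap
    const last = kept[kept.length - 1];
    if (last === undefined || similarity(last.hash, frame.hash) < threshold) {
      kept.push(frame);
    }
  }
  return kept;
}
```

Comparing against the last kept frame, instead of every earlier frame, keeps the pass linear and still removes runs of near-identical slides or talking-head shots.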

Requirements

- Node.js 18+
- yt-dlp: video/transcript download
- ffmpeg: frame extraction
- Anthropic API key (optional, for transcript structuring; set ANTHROPIC_API_KEY)

How It Works

1. Fetch metadata: yt-dlp gets title, channel, duration, chapters, description
2. Extract transcript: yt-dlp pulls auto/manual captions, parses VTT
3. Capture frames: ffmpeg extracts frames at intervals or chapter boundaries
4. Deduplicate frames: removes visually similar frames (configurable threshold)
5. Structure transcript (optional): LLM adds chapter headers, speaker labels, paragraph breaks. All words stay verbatim; only headers, labels, and whitespace are added.
6. Assemble markdown: combines metadata, structured transcript, and frame references into the output file
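
The verbatim guarantee in the structuring step can be checked mechanically: strip everything the LLM is allowed to add and compare the remaining words with the raw transcript. The sketch below illustrates the idea. It assumes chapter headers are lines starting with `#` and speaker labels look like `**Name:**` at the start of a line. Both conventions and both function names are hypothetical, since the exact output format is not specified here.

```typescript
// Splits text into words. For structured text, first removes the additions
// the LLM is allowed to make (assumed formats: "#" headers, "**Name:**" labels).
function spokenWords(text: string, structured: boolean): string[] {
  let body = text;
  if (structured) {
    body = body
      .split("\n")
      .filter((line) => !/^\s*#/.test(line)) // drop chapter header lines
      .map((line) => line.replace(/^\s*\*\*[^*]+:\*\*\s*/, "")) // drop a leading speaker label
      .join("\n");
  }
  return body.split(/\s+/).filter((word) => word.length > 0);
}

// True when the structured transcript contains exactly the raw words, in order.
function isVerbatim(raw: string, structuredText: string): boolean {
  const a = spokenWords(raw, false);
  const b = spokenWords(structuredText, true);
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}
```

A check like this could run after the LLM pass and fall back to the raw transcript whenever the model paraphrases or drops words.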

Cost

The LLM structuring pass is the only part that costs money (requires Anthropic API key):

| Video Length | Approximate Cost (Sonnet) |
|---|---|
| 15 minutes | ~$0.05–0.08 |
| 1 hour | ~$0.20–0.35 |
| 2 hours | ~$0.40–0.70 |

Use --no-structure for completely free operation (raw transcript + frames).
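
The figures in the table are roughly proportional to video length, so a quick estimate for other durations can be scaled from the 1-hour row. The sketch below does that arithmetic. Linear scaling is an assumption, `estimateCostUsd` is a hypothetical helper, and real cost depends on transcript length and model pricing, so `--dry-run` remains the better source for a specific video.

```typescript
// Assumption: cost scales linearly with duration at the 1-hour rate from the table.
const LOW_USD_PER_HOUR = 0.2;
const HIGH_USD_PER_HOUR = 0.35;

// Returns an approximate low/high cost range in US dollars.
function estimateCostUsd(durationSeconds: number): { low: number; high: number } {
  if (!isFinite(durationSeconds) || durationSeconds < 0) {
    throw new Error("durationSeconds must be a non-negative number");
  }
  const hours = durationSeconds / 3600;
  return { low: hours * LOW_USD_PER_HOUR, high: hours * HIGH_USD_PER_HOUR };
}

const ninetyMinutes = estimateCostUsd(90 * 60);
console.log(ninetyMinutes.low.toFixed(2), ninetyMinutes.high.toFixed(2));
```

For the 2-hour row this scaling reproduces the table's range, and for 15 minutes it lands close to it.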

License

MIT
