video-understand
v0.1.0
CLI tool that enables AI agents to understand and analyze videos.
video-understand analyze video.mp4 "What happens in this video?"
Works with local files, YouTube URLs, and HTTP video URLs. Outputs clean markdown. Designed to be invoked by AI agents (Claude Code, Cursor, Copilot, OpenClaw, etc.) via Bash.
Supports multiple AI providers: Gemini (Google) and Kimi (Moonshot AI).
Install
npm install -g video-understand
Requires Node.js 18+.
Authentication
Gemini (default)
Get a Gemini API key from Google AI Studio.
# Option A: Environment variable
export GEMINI_API_KEY="your-key"
# Option B: CLI login
video-understand login --provider gemini --key "your-key"
Kimi (Moonshot AI)
Get a Moonshot API key from platform.moonshot.ai.
# Option A: Environment variable
export MOONSHOT_API_KEY="your-key"
# Option B: CLI login
video-understand login --provider kimi --key "your-key"
Usage
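Before running the commands in this section from a script, it can help to confirm that the key for the chosen provider is exported. The sketch below covers only Option A (environment variables) from the Authentication section; a key stored with video-understand login would not be detected. The function name has_key is a hypothetical helper, not part of the CLI.

```bash
# Hypothetical helper: succeeds only if the environment variable that the
# Authentication section documents for the given provider is set and non-empty.
has_key() {
  case "$1" in
    gemini) [ -n "${GEMINI_API_KEY:-}" ] ;;
    kimi)   [ -n "${MOONSHOT_API_KEY:-}" ] ;;
    *)      return 1 ;;  # unknown provider name
  esac
}
```

A wrapper could run has_key kimi and print a reminder to export MOONSHOT_API_KEY when it fails, before passing --provider kimi.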
Analyze a video
# Local file (Gemini or Kimi)
video-understand analyze video.mp4 "Describe what happens"
video-understand analyze video.mp4 "Describe what happens" --provider kimi
# YouTube URL (Gemini: no download needed; Kimi: downloads via yt-dlp then uploads)
video-understand analyze "https://www.youtube.com/watch?v=VIDEO_ID" "Summarize this"
video-understand analyze "https://www.youtube.com/watch?v=VIDEO_ID" "Summarize this" --provider kimi
# With timestamps
video-understand analyze video.mp4 "Key moments?" --timestamps
# JSON output
video-understand analyze video.mp4 "Describe" --json
# Save to file
video-understand analyze video.mp4 "Describe" -o analysis.md
# Use a specific model
video-understand analyze video.mp4 "Describe" --model gemini-3-pro-preview
video-understand analyze video.mp4 "Describe" --provider kimi --model kimi-k2.5
Upload + Ask (multi-turn)
# Upload first
video-understand upload video.mp4
video-understand upload video.mp4 --provider kimi
# Ask follow-up questions without re-uploading
video-understand ask "video.mp4" "What color is the car?"
video-understand ask "video.mp4" "How many people appear?" --provider kimi
File management
video-understand list # List uploaded files (default provider)
video-understand list --provider kimi # List Kimi files
video-understand delete "video.mp4" # Delete by name
video-understand delete "f8csbxsqrz9111fuxjki" --provider kimi # Delete by file ID
Configuration
video-understand config # Show current config
video-understand login --provider gemini --key "key"
video-understand login --provider kimi --key "key"
Agent Skill
This package includes an Agent Skill that teaches AI coding agents when and how to use the CLI.
Install the skill
From GitHub (works with 25+ agents):
npx skills add sifr42/video-understand
Manual (Claude Code):
cp -r skills/video-understand ~/.claude/skills/video-understand
The skill provides:
- Command reference and examples
- Installation and auth instructions
- Guidance on when to use each command
Supported Formats
MP4, MPEG, MOV, AVI, FLV, MPG, WebM, WMV, 3GPP, MKV
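A script that batches local files may want to skip obvious non-videos before calling the CLI. The following is a minimal sketch that assumes each format above maps to its usual file extension (for example, 3GPP to .3gp or .3gpp). That mapping and the helper name is_supported_video are assumptions, and the CLI itself may detect formats differently. The check applies to local paths only, not to YouTube or HTTP URLs.

```bash
# Hypothetical pre-check: succeeds if the file name ends in an extension that
# is assumed to correspond to one of the supported formats listed above.
is_supported_video() {
  # A name without any dot has no extension to inspect.
  case "$1" in
    *.*) ;;
    *)   return 1 ;;
  esac
  # Lowercase the extension so "CLIP.MP4" and "clip.mp4" are treated alike.
  ext="$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    mp4|mpeg|mov|avi|flv|mpg|webm|wmv|3gp|3gpp|mkv) return 0 ;;
    *) return 1 ;;
  esac
}
```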
Providers & Models
| Provider | Model | Default | Use case |
|----------|-------|---------|----------|
| gemini | gemini-3-flash-preview | ✓ | Fast, cost-effective |
| gemini | gemini-3-pro-preview | | Detailed, nuanced analysis |
| kimi | kimi-k2.5 | ✓ | Comparable to the Gemini models overall, but requires yt-dlp for YouTube videos. Install it with winget install yt-dlp (Windows), brew install yt-dlp (macOS), sudo apt install yt-dlp (Linux), or uv tool install yt-dlp (cross-platform). |
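The Default column is per provider: each provider has its own default model. As a small illustration, the hypothetical helper below (not part of the CLI) encodes the defaults from the table so that a wrapper script can pass --model explicitly.

```bash
# Hypothetical lookup of the per-provider default model, copied from the table.
default_model() {
  case "$1" in
    gemini) echo "gemini-3-flash-preview" ;;
    kimi)   echo "kimi-k2.5" ;;
    *)      echo "unknown provider: $1" >&2; return 1 ;;
  esac
}

# Example use with the documented flags:
#   video-understand analyze video.mp4 "Describe" --provider kimi --model "$(default_model kimi)"
```

Because Kimi needs yt-dlp for YouTube videos, a wrapper could also run command -v yt-dlp >/dev/null before using --provider kimi with a YouTube URL.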
How It Works
Gemini:
- Local files → Content-hashed → Reused if cached, otherwise uploaded → Polled until ready → Analyzed
- YouTube / HTTP URLs → Passed directly (native support, no upload)
Kimi:
- Local files → Content-hashed → Reused if cached, otherwise uploaded via Moonshot Files API → Analyzed
- YouTube / HTTP URLs → Downloaded to a temp file via yt-dlp (YouTube) or fetch (HTTP) → Uploaded → Analyzed → Temp file deleted
- Files persist indefinitely (no auto-expiry), but there are limits on how many files you can upload at once and on the total size of all uploaded files. See Kimi's File API documentation for more information.
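The routing described in the two lists above can be summarized as a small decision table. This is only a sketch of the documented behavior, not the CLI's actual code. The helper names source_kind and ingest_plan are hypothetical, and the URL patterns used to recognize YouTube links are an assumption (the source does not say how the CLI detects them).

```bash
# Classify a video source as "youtube", "http", or "local".
# Assumption: YouTube links are recognized by these host prefixes.
source_kind() {
  case "$1" in
    https://www.youtube.com/*|https://youtube.com/*|https://youtu.be/*) echo "youtube" ;;
    http://*|https://*) echo "http" ;;
    *) echo "local" ;;
  esac
}

# Print the documented steps for a provider and source combination.
ingest_plan() {
  case "$1:$(source_kind "$2")" in
    gemini:local)               echo "hash -> reuse or upload -> poll until ready -> analyze" ;;
    gemini:youtube|gemini:http) echo "pass URL directly -> analyze" ;;
    kimi:local)                 echo "hash -> reuse or upload -> analyze" ;;
    kimi:youtube)               echo "download with yt-dlp -> upload -> analyze -> delete temp file" ;;
    kimi:http)                  echo "download with fetch -> upload -> analyze -> delete temp file" ;;
    *)                          echo "unknown provider: $1" >&2; return 1 ;;
  esac
}
```

The table makes the main difference visible: Gemini takes URLs directly, while Kimi always goes through a download and an upload.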
The CLI handles upload, deduplication, prompt construction, and output formatting. Spinners are TTY-aware — they fall back to simple log lines when invoked by agents.
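To illustrate what TTY-aware output means in practice, here is a minimal shell sketch of the same idea. It is not the CLI's implementation: the function name progress is hypothetical, and writing status messages to stderr is an assumption.

```bash
# Print a status update in a way that suits the destination.
progress() {
  if [ -t 2 ]; then
    # Interactive terminal: rewrite the current line, spinner-style.
    printf '\r%s' "$1" >&2
  else
    # Piped or captured output (for example by an agent): one plain line per update.
    printf '%s\n' "$1" >&2
  fi
}
```

When an agent captures the output, the file descriptor is a pipe rather than a terminal, so each update arrives as an ordinary log line with no carriage returns.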
The upload cache lives at ~/.video-understand/uploads.json, so the same file content is not re-uploaded across sessions or directories.
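Keying the cache on file content rather than on the path is what lets a renamed or moved copy be recognized. The sketch below shows the idea with SHA-256. The hash algorithm and the helper name content_key are assumptions (the source only says files are content-hashed), and the layout of uploads.json is not shown because it is not documented here.

```bash
# Hypothetical cache key: a digest of the file's bytes only, so the file name
# and directory do not influence the result.
content_key() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum < "$1" | cut -d ' ' -f 1
  else
    # Fallback for systems that ship shasum instead (for example macOS).
    shasum -a 256 < "$1" | cut -d ' ' -f 1
  fi
}
```

Two copies of the same video in different directories produce the same key, so a cache lookup by key would find the earlier upload.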
License
MIT
