video-cli
v0.4.0
Local-first video REPL for AI agents - search, ask, navigate, extract
Why It Exists
Video is hard to work with programmatically. You can watch it, but you cannot grep it, cite it, diff it, or hand it to an agent and expect repeatable answers.
video-cli exists to make a video usable as a working surface: it extracts transcript spans, OCR text, frame descriptions, embeddings, timestamps, frames, and clips so an agent can search and answer with grounded evidence.
Quick Start
```shell
video-cli init
video-cli setup video.mp4
video-cli ask <id> "what is the main argument?"
```

That is the normal path: initialize credentials once, ingest a video once, then ask questions against the local artifacts.
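Because every command prints JSON, an agent can drive the CLI through a subprocess and parse the result. A minimal sketch, assuming `video-cli` is on `PATH` and emits the output shape shown in the Example section (`answer`, `citations`); the `ask_video` and `citations_of` helpers are ours, not part of the CLI:

```python
import json
import shutil
import subprocess

def ask_video(video_id: str, question: str) -> dict:
    """Run `video-cli ask` and parse the JSON object it prints to stdout."""
    result = subprocess.run(
        ["video-cli", "ask", video_id, question],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)

def citations_of(response: dict) -> list:
    # Grounded answers carry a "citations" array alongside "answer".
    return response.get("citations", [])

if __name__ == "__main__":
    # Only attempt a live call when the CLI is actually installed.
    if shutil.which("video-cli"):
        print(ask_video("lec-abc123", "what is the main argument?"))
```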
Install
Published package:

```shell
npm install -g video-cli
video-cli init
```

Repo checkout:

```shell
git clone https://github.com/Dexin-Huang/video-cli
cd video-cli
cp .env.example .env
node video-cli.js init
```

The published package is the normal user path. The repo-local `node video-cli.js ...` form is mainly for development and contributor workflows.
Onboarding
- Add your API key with `video-cli init`.
- Run `video-cli setup <video-file>` to build the artifacts.
- Ask a grounded question with `video-cli ask <video-id> "<question>"`.
- Use `video-cli search`, `context`, `chapters`, `frame`, or `clip` when you need to inspect a specific moment.
Example
```shell
video-cli setup lecture.mp4
# { "id": "lec-abc123", "ready": true, ... }

video-cli ask lec-abc123 "what is the main argument?"
# { "answer": "...", "citations": [...], "suggestedFollowUps": [...] }

video-cli context lec-abc123 --at 198 --window 15
# { "utterances": [...], "ocrItems": [...], "frames": [...], "sceneChanges": [...] }

video-cli clip lec-abc123 --at 198 --pre 5 --post 10
# { "output": "data/videos/lec-abc123/clips/clip-198_000.mp4" }
```

Command Surface
| Area | Commands | Purpose |
|------|----------|---------|
| Setup | init, install --skills, cleanup | Configure API key, install agent skill, remove artifacts |
| Intent | setup, ask | Ingest a video or answer a question with evidence |
| Navigate | search, context, chapters, next, grep | Find and inspect specific moments |
| Extract | frame, clip | Pull out a still image or short clip |
| Pipeline | ingest, transcribe, ocr, analyze, embed, describe | Run individual stages when you need control |
| Inspect | list, status, inspect, timeline, watchpoints, bundle, brief, config | Check readiness and inspect artifacts |
| Automation | eval:generate, eval:run | Measure retrieval quality |
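When chaining these commands from a script, it helps to build the argv list in one place. A sketch, assuming only the flag spellings shown in this README (`--at`, `--window`, `--pre`, `--post`); `build_cmd` is our helper, not a CLI feature:

```python
def build_cmd(command: str, video_id: str, **flags) -> list:
    """Build an argv list for a video-cli invocation.

    Keyword arguments become flags using the spellings shown in this
    README, e.g. at=198 -> --at 198, window=15 -> --window 15.
    """
    argv = ["video-cli", command, video_id]
    for name, value in flags.items():
        argv += [f"--{name}", str(value)]
    return argv

# e.g. the clip invocation from the Example section:
# build_cmd("clip", "lec-abc123", at=198, pre=5, post=10)
```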
How It Works
```
video.mp4
  -> ffmpeg scene detection
  -> transcription
  -> OCR + frame descriptions
  -> embeddings
  -> searchable JSON artifacts on disk

query
  -> semantic + lexical + description search
  -> grounded answer with citations and follow-ups
```

Configuration
video-cli starts from a built-in preset, then resolves runtime config in this order:
- Built-in preset defaults
- `video-cli.config.json` in the repo root
- Environment variable overrides
- Command flags, for commands that expose them
Use `video-cli config` to inspect the final merged config that the CLI will actually use.
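The precedence above is an ordinary last-writer-wins merge over nested sections. A sketch of the idea, not the CLI's actual implementation, assuming each layer is a nested dict:

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base; override wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

def resolve_config(preset: dict, file_cfg: dict, env_cfg: dict, flags: dict) -> dict:
    # Later layers win: preset < config file < environment < command flags.
    merged = preset
    for layer in (file_cfg, env_cfg, flags):
        merged = deep_merge(merged, layer)
    return merged
```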
A typical `video-cli.config.json` looks like this:

```json
{
  "preset": "balanced",
  "ocr": {
    "provider": "gemini",
    "model": "gemini-3.1-flash-lite-preview",
    "watchpointLimit": 8
  },
  "transcribe": {
    "provider": "gemini-transcribe",
    "model": "gemini-3.1-flash-lite-preview",
    "chunkSeconds": 480,
    "trimSilence": false,
    "minSilenceSec": 1.5,
    "padSec": 0.25,
    "silenceNoiseDb": -35,
    "diarize": true,
    "utterances": true,
    "smartFormat": true,
    "punctuate": true,
    "detectLanguage": false,
    "language": null
  },
  "embed": {
    "provider": "gemini",
    "model": "gemini-embedding-2-preview",
    "dimensions": 768,
    "sources": {
      "transcript": true,
      "ocr": true,
      "frames": false
    }
  }
}
```

Common environment overrides:
- `VIDEO_CLI_PRESET`: choose the base preset before file/env overrides are merged
- `VIDEO_CLI_OCR_PROVIDER`, `VIDEO_CLI_OCR_MODEL`: change OCR provider/model
- `VIDEO_CLI_TRANSCRIBE_PROVIDER`, `VIDEO_CLI_TRANSCRIBE_MODEL`: change transcription provider/model
- `VIDEO_CLI_TRANSCRIBE_CHUNK_SECONDS`: change transcription chunk size
- `VIDEO_CLI_TRANSCRIBE_TRIM_SILENCE`, `VIDEO_CLI_TRANSCRIBE_MIN_SILENCE_SEC`, `VIDEO_CLI_TRANSCRIBE_PAD_SEC`: control silence trimming
- `VIDEO_CLI_EMBED_PROVIDER`, `VIDEO_CLI_EMBED_MODEL`, `VIDEO_CLI_EMBED_DIMENSIONS`: change embedding provider/model/dimensions
- `VIDEO_CLI_EMBED_TRANSCRIPT`, `VIDEO_CLI_EMBED_OCR`, `VIDEO_CLI_EMBED_FRAMES`: turn embedding sources on or off
- `VIDEO_CLI_DATA_ROOT`: move the artifact store away from the default `data/videos`
- `VIDEO_CLI_ID`: provide a default `<video-id>` for commands that normally take one as the first positional argument
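The override names follow a predictable pattern: `VIDEO_CLI_<SECTION>_<KEY>` maps onto a key inside the matching config section. A sketch of that mapping under our own assumptions; the table below and the numeric coercion are illustrative, not pulled from the CLI source:

```python
import os

# Illustrative mapping from env var to (section, key) in the config.
ENV_MAP = {
    "VIDEO_CLI_OCR_PROVIDER": ("ocr", "provider"),
    "VIDEO_CLI_OCR_MODEL": ("ocr", "model"),
    "VIDEO_CLI_TRANSCRIBE_CHUNK_SECONDS": ("transcribe", "chunkSeconds"),
    "VIDEO_CLI_EMBED_DIMENSIONS": ("embed", "dimensions"),
}

def env_overrides(environ=None) -> dict:
    """Collect VIDEO_CLI_* overrides into a nested config fragment."""
    environ = os.environ if environ is None else environ
    cfg = {}
    for var, (section, key) in ENV_MAP.items():
        if var in environ:
            value = environ[var]
            # Numeric-looking values become ints, e.g. chunk seconds.
            cfg.setdefault(section, {})[key] = int(value) if value.isdigit() else value
    return cfg
```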
Provider-specific model aliases are also accepted where relevant, including `GEMINI_OCR_MODEL` and `GEMINI_TRANSCRIBE_MODEL`.
Requirements
- Node 22 or newer
- `ffmpeg` and `ffprobe`
- `GEMINI_API_KEY` in `.env` for the default path
Troubleshooting
- `ffmpeg not found` or `ffprobe not found`: install ffmpeg and make sure both binaries are on `PATH`. The CLI shells out to them directly for probing, frame extraction, clip extraction, and silence detection.
- `No embeddings found` or `No transcript/ocr found`: run `video-cli setup <file>` for the normal path, or run the missing pipeline stage directly and re-check with `video-cli status <video-id>`.
- `Unknown video id`: run `video-cli list` to see available artifacts, or set `VIDEO_CLI_ID` if you want a default active video.
- PowerShell blocks npm: on some Windows setups, PowerShell execution policy blocks `npm.ps1`. Use `npm.cmd ...` instead.
- Restricted sandboxes fail on child processes: the CLI depends on subprocesses for `ffmpeg` and `ffprobe`. Some sandboxes block nested process execution; that is an environment limit, not a video-cli bug.
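The first and last items above both come down to whether the required binaries can be found and spawned. A quick preflight check, a sketch of our own rather than a built-in command:

```python
import shutil

def missing_binaries(names=("ffmpeg", "ffprobe")) -> list:
    """Return the required binaries that are not on PATH."""
    return [name for name in names if shutil.which(name) is None]

if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("install these and add them to PATH:", ", ".join(missing))
```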
For AI Agents
See SKILL.md for the agent-facing command reference and output shapes.
Development
```shell
npm test       # test suite
npm run eval   # retrieval quality eval
```

Zero npm dependencies. Pure Node.js plus ffmpeg.
