videoclaw
v3.0.0-alpha.1
Agent-friendly multi-provider AI video CLI (Veo, Seedance, Runway, Omni Flash) with on-disk JSON artifacts, approval gates, storyboard review UI, and portfolio ops. v3 — narrow, deterministic, agent-target CLI.
Turn a one-line idea into a finished AI video — step by step, in the open, with a human approval before anything expensive runs.
A command-line tool (vclaw) that takes a plain-English idea like "a 15-second ad for my coffee brand" and walks it all the way to a reviewed, published video — using AI video services like Veo, Seedance, and Runway. Every step is saved to a file you can read, so nothing is hidden and you can stop, inspect, or replay any stage.
What is it? · What you can do · How it works · The main parts · Quickstart · Deep docs
📚 Full documentation site → videoclaw-docs.vercel.app
The complete, idiot-proof and agent-ready docs: an interactive guide, a page for every feature (30+), a recipe book of what you can make, the skills catalog, and all the deep reference docs — with diagrams throughout.
New here? The easiest path is to drive it by talking to Claude Code (or any agent host); there's a dedicated how-it-works page for agents too. Once you're in Claude Code in this repo, just type `/concierge` (or `/clawbot`), or say "make me a video", and Clawbot, VideoClaw's mascot and concierge, walks you from idea to finished film, with a preview and your approval before anything costs money.
🗺️ Explore the docs
Everything lives at videoclaw-docs.vercel.app (there's a search box on every page).
Guide — Start here · Use it with Claude Code · Install & setup · Your first video · What videos can you make? · How it works · Storyboard vs Director · Characters · Providers · Review & publish · Assemble & polish · Troubleshooting · Cheat sheet
Skills & tools — Skills catalog · Video skills · Workflow skills · Scripts & tooling · For agents
Reference — Reference home · Architecture · CLI reference · Production workflow · Provider platform · Story bible · Assemble · Internal plans & specs
- Create — projects & lifecycle · one-shot create · briefs · brand DNA · storyboards · storyboard grid · story bible
- Direct & refine — director mode · prompt quality · multi-shot · filmmaking & cinematography
- Keep it consistent — characters · reference sheets · scene candidates · Seedance Asset Library
- Generate — providers & routing · execution runtime · overnight batch queue
- Finish — assembly · review UI · preview portal · publishing
- Manage — templates & cloning · portfolio & ops · doctor · Obsidian · telemetry & cost
- Plan & agents — studio planner · agent surface · Veo CLI
🎯 What is videoclaw? (in plain English)
videoclaw is a tool you run in your terminal that makes AI videos for you, one careful step at a time.
You give it an idea. It writes a short brief, plans the shots (a storyboard), generates the video clips with an AI provider, stitches them together with narration and music, lets a human approve the result, and marks it published. That's it.
Think of it like an assembly line for video. Your idea goes in one end; a finished MP4 comes out the other. At each station along the line, the machine writes down exactly what it did in a file you can open and read. If something looks wrong, you can rewind to any station and try again — you never have to start over from scratch.
Two things make it different from the usual "type a prompt, get a video" tools:
- It never hides what it's doing. Every stage leaves a plain, readable file behind. And if a video provider fails, it tells you loudly — it does not quietly swap in a different provider and pretend it worked.
- It has a "stop and check" button. In director mode it refuses to spend money on the real render until a human has looked at the storyboard and said "yes, go."
New here? You don't need to understand any of the technical sections lower down. Read the next four short sections and you'll know what this whole thing does.
Who is it for?
- Creators & marketers who want repeatable, reviewable AI videos — product ads, presenter explainers, UGC campaigns, music videos.
- Small teams juggling many video projects who need a single dashboard of what's stuck, what needs a look, and what's ready to ship.
- AI agents — everything is machine-readable, so an automated agent can drive the whole pipeline on its own.
✨ What you can do with it
- Go from idea → finished video without leaving the terminal.
- Use several AI video engines (Veo, Seedance, Runway) through one consistent set of commands — no need to learn each provider's quirks.
- Keep characters consistent — the same person, outfit, set, and props look the same in every scene (via character profiles, reference sheets, and the story bible).
- Approve in your browser before any expensive render runs — open the Review UI, look at the stills, click approve or regenerate.
- Add the finishing touches — narration, background music, subtitles, thumbnails, and vertical / square / looping variants for different platforms.
- Run many projects at once and see them all on a dashboard: what's blocked, what's stale, what needs review, what's ready to publish.
- Trust the result: a project is only "ready" when its review report literally says `pass`. No guessing.
- Rehearse for free: almost every command has a `--dry-run` that plans the whole thing without spending a credit.
🪜 How it works, step by step
Here is the whole assembly line in plain words. Each step is a command, and each step saves its work to a file.
| Step | Command | What actually happens |
|---|---|---|
| 1 | init | Create a new project folder to hold everything. |
| 2 | brief | Turn your one-line idea into a short written brief. |
| 3 | storyboard | Break the brief into scenes — and auto-build a story bible so characters, settings, and props stay consistent. |
| 4 | (director mode) preflight + approval | Check for problems, then wait for your "go" before spending anything. |
| 5 | plan | Pick the AI provider and prepare the exact request to send. |
| 6 | produce | Actually generate the clips. (Add --dry-run to rehearse for free.) |
| 7 | assemble | Stitch clips + narration + music into one MP4, then quality-check it. |
| 8 | review | A human (or the Review UI) approves the result. |
| 9 | publish | Mark it done and hand it off. |
You don't always run these by hand — vclaw video create can do the whole front of the line in one shot — but this is the path everything follows underneath.
🧩 The main parts, explained simply
videoclaw is made of a few moving pieces. Here's each one in everyday terms:
| Part | In plain terms |
|---|---|
| The project folder (projects/<slug>/) | A single folder that holds everything about one video — the brief, storyboard, clips, approvals, history. This folder is the source of truth; the commands are just a tidy way to work on it. |
| Stages & checkpoints | The assembly-line steps above. Each one writes a file and records that it finished, so you can replay or rewind any stage safely. |
| Two modes: storyboard vs director | Storyboard mode is fast and runs straight through. Director mode adds the "stop and approve" gate — it won't render for real until you say so. |
| The Review UI | A small web page (vclaw video review-ui) where a human looks at the storyboard stills and clicks approve or regenerate. No code needed. |
| Characters & the Story Bible | Tools that keep the same people, places, and props looking identical across every scene — the #1 thing that breaks in multi-scene AI video. The story bible is an auto-written "continuity reference" for the whole project. |
| Assembly | The step that turns raw clips into a polished, narrated MP4 — adds a title card, TTS narration, slide animation, music, then runs an automatic media quality check on the result. |
| Providers | The AI engines that actually make the video (Veo, Seedance, Runway). videoclaw routes to the right one and never silently switches if it fails. |
| Portfolio & ops | Dashboards for when you have many projects — status, metrics, next-actions, and a doctor that tells you the single safest next step. |
| Obsidian workspace | The same project data, rendered as browsable notes you can read in Obsidian instead of the terminal. |
| Skills | Pre-built, ready-to-run workflows for common jobs (make a presenter video, clone an ad, build a character) that an AI agent can invoke. |
That's the whole picture. Everything below is reference detail for power users, operators, and AI agents — you can stop here and still use the tool.
🚀 Quickstart
These commands are for a source checkout. If you installed the published package, replace `node dist/cli/vclaw.js` with `vclaw`.
First, get it running and check your setup:
npm install # Node 20+
npm run build # compile the CLI
npm test # run the full test suite
node dist/cli/vclaw.js video providers # show which AI providers are ready
Then run one full lifecycle against a throwaway project (no credits spent; note the --dry-run):
node dist/cli/vclaw.js video init demo
node dist/cli/vclaw.js video brief --project demo --title "Demo" --intent "A 15s product tease"
node dist/cli/vclaw.js video storyboard --project demo --scene "open on product" --scene "close on logo"
node dist/cli/vclaw.js video plan --project demo
node dist/cli/vclaw.js video produce --project demo --dry-run
node dist/cli/vclaw.js video status --project demo
Add --auto-chain to produce/execute to render the whole storyboard sequentially, each scene seeded from the previous scene's output video (continuity chain, unattended).
Or just run the packaged pre-flight that does build + tests + smokes + guardrails for you:
npm run check:release-readiness-lite
🎬 Production workflow (the three common jobs)
Most people enter through one of three paths:
- Make a campaign video: `create` the director project, open `review-ui`, lock real storyboard stills, attach artifact-backed 4K stills, then publish only after the saved review report has `verdict: "pass"` and `metrics.publishReady: true`.
- Review and fix a project: use `status`, `next-actions`, `doctor-project`, and `review-ui` to find the single safest next step.
- Manage a portfolio: use `metrics`, `report`, `export-csv`, and `sync-obsidian` to see blocked, stale, review-needed, and publish-ready work.
The advanced command surface stays available, but production trust comes from
one rule: a handoff is ready only when review-report.json has
verdict: "pass" and metrics.publishReady: true.
The simple video review --verdict pass command intentionally writes that
approval for already-reviewed projects; for director storyboard-image handoffs,
use review-ui or review-autopilot so publishReady is derived from locked
scene candidates, artifact-backed 4K stills, and final assembly approvals.
Full operator guide: docs/PRODUCTION_WORKFLOW.md.
Handoff checklist: docs/OPERATOR_HANDOFF.md.
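The handoff rule above is easy to check mechanically. The following is a minimal sketch, not videoclaw's own code: the `ReviewReport` interface and both function names are hypothetical, and only the two fields the README names (`verdict` and `metrics.publishReady`) are modelled.

```typescript
// Hypothetical minimal shape: only the two fields named in the README are modelled.
interface ReviewReport {
  verdict?: string;
  metrics?: { publishReady?: boolean };
}

// True only when both halves of the handoff rule hold.
function isHandoffReady(report: ReviewReport | null | undefined): boolean {
  return report?.verdict === "pass" && report?.metrics?.publishReady === true;
}

// Takes the raw text of review-report.json; a malformed file counts as "not ready".
function isHandoffReadyFromJson(raw: string): boolean {
  try {
    return isHandoffReady(JSON.parse(raw) as ReviewReport);
  } catch {
    return false;
  }
}
```

An agent could read `projects/<slug>/artifacts/review-report.json` and pass its contents to `isHandoffReadyFromJson` before attempting a publish; treating unreadable input as not ready keeps the check on the safe side.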
🤖 For AI agents
You are landing in a TypeScript/Node 20 CLI repo. If you do nothing else, read these in order:
1. `CLAUDE.md`: non-obvious conventions, single-test command, review-state invariant, agent-first orientation.
2. `AGENTS.md`: autonomy directive, coding style, commit/PR format, security expectations.
3. `docs/ARCHITECTURE.md`: layer map and the canonical project flow.
4. `skills/catalog.json`: machine-readable skill surface (don't scrape markdown).
5. `docs/MASTER_PLAN_ALIGNMENT.md`: what ships today + honest remaining gaps.
Canonical entry skills (start broad, specialize later): `video-framework` · `brand-presenter`
Contracts: `schemas/video/*.json` is the source of truth for artifact shapes.
Tests: `src/tests/*.test.ts` run via `node --test dist/tests/*.test.js`.
Don't: edit `dist/` · drop `.js` extensions from relative imports (NodeNext ESM requires them) · add silent fallback across materially different provider routes · commit `.omx/` / `.vclaw/` / secrets.
Agent integration
videoclaw is built as a target for agent hosts, not as an orchestrator.
- One-call discovery: `vclaw schema --json` returns the full command contract.
- MCP server: `vclaw mcp serve` exposes read-only state queries to MCP-aware hosts.
- Sample skills: see `mcp/skills-pack/` for Claude Code skill templates.
See docs/AGENT_INTEGRATION_RESEARCH.md for the design rationale.
🏗️ Architecture
The system is layered: a thin command-line front, a domain core that does the real work, and an adapter tier that talks to the AI providers. Each stage writes to the artifact/checkpoint/event ledger on disk.
flowchart TB
U[👤 Operator / AI agent]
U -->|vclaw video ...| CLI["src/cli/vclaw.ts<br/>single entrypoint"]
CLI --> CORE["src/video/*<br/>domain modules"]
CORE --> ART[Artifacts · JSON]
CORE --> CKPT[Checkpoints · stage state]
CORE --> EVT["Events · events.jsonl"]
CORE --> RT[Execution runtime]
RT --> PP[provider-platform/<br/>route descriptors]
RT --> PM[pipeline-manifests/<br/>storyboard · director]
RT --> ADP[Adapter layer]
ADP -->|built-in binary| NAT["Native transports<br/>Seedance · Veo"]
ADP -->|command shim| CMD["_SUBMIT_CMD<br/>_POLL_CMD · _CANCEL_CMD"]
ADP -->|custom override| EXT["VCLAW_*_ADAPTER<br/>external binary"]
CORE -. validates against .-> SCH["schemas/video/*.json"]
CORE --> OBS[Obsidian export / sync]
CORE --> REP[Reports · CSV · Snapshots · Diffs]
- CLI layer: argparse + dispatch only; no business logic.
- Domain layer (`src/video/*`): small, single-purpose modules. Each file owns one concept (artifacts, checkpoints, readiness, execution-plan, execution-runtime, doctor, metrics, next-actions, project-index, obsidian-export, etc.).
- Provider platform: route descriptors for `veo-useapi`, `seedance-direct`, `runway-useapi`, `dreamina-useapi`.
- Adapter layer: three resolution strategies (custom binary → built-in adapter with command shim → native in-process transport). Explicit fall-through, never silent.
- Schemas: JSON Schema contracts under `schemas/video/` are the source of truth for every artifact shape.
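The adapter layer's three resolution strategies can be sketched as one pure function. This is an illustration under stated assumptions, not the code in `src/video/*`: the function and type names are hypothetical, the `VCLAW_<ROUTE>_...` naming pattern is extrapolated from the variables this README documents, the shim is assumed to need both `_SUBMIT_CMD` and `_POLL_CMD`, and the presence of a local Veo CLI is passed in as a boolean rather than probed.

```typescript
type Route = "veo-useapi" | "seedance-direct" | "runway-useapi" | "dreamina-useapi";
type Env = Record<string, string | undefined>;
type Resolution =
  | { kind: "custom"; binary: string }
  | { kind: "shim" }
  | { kind: "native" }
  | { kind: "fail"; reason: string };

// Routes the README lists as having a built-in adapter.
const BUILT_IN: Route[] = ["seedance-direct", "veo-useapi", "runway-useapi"];

// Assumed naming pattern: "seedance-direct" -> "VCLAW_SEEDANCE_DIRECT".
function envPrefix(route: Route): string {
  return "VCLAW_" + route.toUpperCase().replace(/-/g, "_");
}

function hasNativeCreds(route: Route, env: Env, hasLocalVeoCli: boolean): boolean {
  switch (route) {
    case "seedance-direct": return Boolean(env["SUTUI_API_KEY"]);
    case "veo-useapi": return hasLocalVeoCli;
    case "runway-useapi": return Boolean(env["USEAPI_API_TOKEN"]);
    default: return false;
  }
}

// Resolution order: custom binary, then command shim, then native transport.
// Every miss ends in an explicit failure; no other route is ever substituted.
function resolveAdapter(route: Route, env: Env, hasLocalVeoCli = false): Resolution {
  const prefix = envPrefix(route);
  const custom = env[prefix + "_ADAPTER"];
  if (custom) return { kind: "custom", binary: custom };
  if (BUILT_IN.indexOf(route) === -1) {
    return { kind: "fail", reason: "no built-in adapter for " + route };
  }
  if (env[prefix + "_SUBMIT_CMD"] && env[prefix + "_POLL_CMD"]) return { kind: "shim" };
  if (hasNativeCreds(route, env, hasLocalVeoCli)) return { kind: "native" };
  return { kind: "fail", reason: "no credentials for " + route };
}
```

Returning a tagged `fail` value instead of trying another route mirrors the "explicit fall-through, never silent" rule: the caller has to handle the failure case by name.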
🔁 Project lifecycle
flowchart LR
init([init]) --> brief([brief])
brief --> sb([storyboard])
sb --> ready{{readiness}}
ready -->|storyboard mode| runPlan([plan / produce])
ready -->|director mode| pre{{director-preflight<br/>content · refs · pronouns}}
pre -->|hazards| fix[auto-fix or<br/>storyboard-review]
fix --> pre
pre -->|pass| gate{{approval gate<br/>VIDEOCLAW_APPROVE_STORYBOARD=1}}
gate -.awaiting-approval.-> ops[(ops queue:<br/>needs-review)]
gate -->|approved| runPlan
runPlan --> execStatus([execute-status<br/>poll adapter])
execStatus --> ingest[[ingest outputs]]
ingest --> assets([assets])
assets --> review([review])
review --> publish([publish])
execStatus -. operator .-> cancel([execute-cancel])
On-disk shape of a project
projects/<slug>/
├── project.json # manifest: slug, mode, state, metadata, execution profile
├── artifacts/
│ ├── brief.json # canonical brief
│ ├── storyboard.json # scenes + character bindings
│ ├── story-bible.json # deterministic continuity reference (cast · settings · props · timeline)
│ ├── asset-manifest.json # per-scene assets
│ ├── clone-plan.json # optional: template → clone decisions
│ ├── execution-plan.json # selected route + payload
│ ├── execution-report.json # submit · poll · output ingest
│ ├── readiness.json
│ ├── character-consistency.json
│ ├── review-report.json
│ ├── publish-report.json
│ ├── analyze-output.json
│ └── history/ # append-only artifact snapshots
├── checkpoints/ # brief · storyboard · assets · review · publish (+ state)
├── events/events.jsonl # append-only timeline
├── characters/characters.json # optional GB-anchored profiles
├── storyboard.md # director-mode approval review (human-readable)
└── state/ # derived state cache
🧭 Production modes
Every command accepts --mode storyboard|director. The mode drives the pipeline manifest and the gate semantics.
| Dimension | storyboard mode | director mode |
|---|---|---|
| Default stage set | init → brief → storyboard → assets → review → publish | Same, plus preflight + approval gate before execution |
| Approval gate | none | VIDEOCLAW_APPROVE_STORYBOARD=1 required before provider submission |
| Storyboard review file | optional | projects/<slug>/storyboard.md auto-generated with character binding table and cost estimate |
| Preflight checks | readiness + character-consistency | + content-hazard detection · GB-id validation · remote-ref probe · pronoun drift · repeated-scene warnings |
| Preflight bypasses | n/a | DIRECTOR_AUTO_FIX_CONTENT=1 (rewrite hazards) · SKIP_DIRECTOR_PREFLIGHT=1 |
| Ops visibility | active | awaiting-approval → surfaced as needs-review across index · metrics · dashboards |
The normalized review-state ladder (missing → current → stale) flows through status, index, report,
export-csv, Obsidian export, dashboards, next-actions, snapshot diffs, and the doctor layer. A stale director review
blocks execute/execute-status even if approval is set — review freshness is a first-class runtime invariant.
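The interaction between the approval gate and review freshness can be sketched as a small decision function. This is an illustrative model, not the runtime's implementation: the names `canSubmit`, `GateInput`, and `GateResult` are hypothetical, and how a `missing` review state is treated is not stated in this README, so the sketch blocks only on `stale`.

```typescript
type Mode = "storyboard" | "director";
type ReviewState = "missing" | "current" | "stale";

interface GateInput {
  mode: Mode;
  reviewState: ReviewState;
  approveEnv: string | undefined; // value of VIDEOCLAW_APPROVE_STORYBOARD
}

interface GateResult {
  allowed: boolean;
  reason: string;
}

function canSubmit(input: GateInput): GateResult {
  // Storyboard mode has no approval gate.
  if (input.mode === "storyboard") {
    return { allowed: true, reason: "storyboard mode: no gate" };
  }
  // Freshness is checked first, so a stale review blocks even when approval is set.
  if (input.reviewState === "stale") {
    return { allowed: false, reason: "director review is stale" };
  }
  if (input.approveEnv !== "1") {
    return { allowed: false, reason: "awaiting-approval" };
  }
  return { allowed: true, reason: "approved" };
}
```

Checking staleness before approval is the design point: approval given against an older storyboard should not carry over to a changed one.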
🔌 Provider routing
flowchart TD
Start[["route ∈ { veo-useapi · seedance-direct ·<br/>runway-useapi · dreamina-useapi }"]]
Start --> Q1{"VCLAW_*_ADAPTER set?"}
Q1 -->|yes| Custom[["Custom adapter binary<br/>stdin → JSON, stdout → JSON"]]
Q1 -->|no| Q2{"Built-in adapter supports route?<br/>(seedance-direct · veo-useapi · runway-useapi)"}
Q2 -->|no| Fail([["❌ hard fail<br/>no silent fallback"]])
Q2 -->|yes| Q3{"_SUBMIT_CMD / _POLL_CMD set?"}
Q3 -->|yes| Shim[["Command shim<br/>through built-in adapter"]]
Q3 -->|no| Q4{"Native creds available?<br/>SUTUI_API_KEY (seedance) · local vclaw-cli (veo) · USEAPI_API_TOKEN (runway)"}
Q4 -->|yes| Native[["✅ Native in-process transport"]]
Q4 -->|no| Fail
Environment variables
| Variable | Used by | Purpose |
|---|---|---|
| VCLAW_VEO_USEAPI_ADAPTER | veo-useapi | custom adapter binary override |
| VCLAW_SEEDANCE_DIRECT_ADAPTER | seedance-direct | custom adapter binary override |
| VCLAW_RUNWAY_USEAPI_ADAPTER | runway-useapi | custom adapter binary override |
| VCLAW_SEEDANCE_DIRECT_SUBMIT_CMD · _POLL_CMD · _CANCEL_CMD | seedance-direct | command shim through built-in adapter |
| SUTUI_API_KEY | seedance-direct (native) | XSkill API credentials for in-process transport |
| VIDEOCLAW_APPROVE_STORYBOARD | director mode | =1 to approve storyboard and allow provider submission |
| DIRECTOR_AUTO_FIX_CONTENT | director mode | =1 to rewrite provider-risk phrases in storyboard |
| SKIP_DIRECTOR_PREFLIGHT | director mode | =1 to bypass preflight (use sparingly) |
| DIRECTOR_STRICT_PROMPT_QUALITY | director mode | =1 to promote prompt-quality warnings to blockers |
| DIRECTOR_STRICT_DIALOGUE_FIT | director mode | =1 to promote dialogue-duration warnings to blockers |
| GEMINI_API_KEYS · GOOGLE_API_KEYS · GOOGLE_API_KEY | analyze-template / analyze | Gemini key pool (round-robin with per-key cooldown) |
| VCLAW_GEMINI_API_ENDPOINT | Gemini pool | override HTTP endpoint (local or alt Gemini-compatible) |
| GO_BANANAS_API_KEY | preflight | validate stored character GB-id anchors |
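The table describes the Gemini key pool as round-robin with a per-key cooldown. One way such a pool can work is sketched below; the `KeyPool` class and its method names are hypothetical, the cooldown length is a caller-supplied value, and how the real pool parses the environment variables is not covered here.

```typescript
class KeyPool {
  private keys: string[];
  private cooldownUntil: number[]; // per-key timestamp (ms) before which the key is skipped
  private next = 0;                // index where the next search starts

  constructor(keys: string[]) {
    this.keys = keys.slice();
    this.cooldownUntil = keys.map(() => 0);
  }

  // Returns the next key that is not cooling down at `now`, or undefined if all are.
  acquire(now: number): string | undefined {
    for (let i = 0; i < this.keys.length; i++) {
      const idx = (this.next + i) % this.keys.length;
      if ((this.cooldownUntil[idx] ?? 0) <= now) {
        this.next = (idx + 1) % this.keys.length;
        return this.keys[idx];
      }
    }
    return undefined;
  }

  // Puts a key on cooldown, for example after a rate-limit response.
  cool(key: string, now: number, cooldownMs: number): void {
    const idx = this.keys.indexOf(key);
    if (idx >= 0) this.cooldownUntil[idx] = now + cooldownMs;
  }
}
```

Passing `now` in explicitly (instead of calling `Date.now()` inside) keeps the pool deterministic and easy to test.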
Google Flow inline @-markers (veo-useapi)
useapi.net's Google Flow v1 API (blog 260609) accepts inline @-mention markers in prompt text that anchor a body-slot reference to a position in the prompt:
| Marker | Index range | Endpoint |
|---|---|---|
| @character_N | 1–7 | POST /videos and POST /images |
| @referenceImage_N | 1–7 | POST /videos |
| @referenceAudio_N | 1–5 | POST /videos |
| @reference_N | 1–10 | POST /images |
Markers are case-insensitive and opt-in (a slot without a marker is fine; a marker without a matching body slot makes the API 400). The grammar is reserved through videoclaw's prompt pipeline (@Name tag resolution preserves the tokens verbatim, like @imageN) and is veo-useapi-route-only — on any other route the tokens are stripped from the scene prompt with a warning. V2V deliberately has no marker (referenceVideo_1 stays flag-only via --ref-video). Helper module: src/video/flow-markers.ts; full details in docs/CLI_REFERENCE.md.
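The marker table translates directly into a pre-submit check. The sketch below is not the helper in `src/video/flow-markers.ts`; `validateMarkers`, `RULES`, and the error strings are hypothetical, and body slots are assumed to be named like the markers without the `@` (for example `character_1`).

```typescript
type Endpoint = "videos" | "images";

interface MarkerRule {
  canonical: string;
  max: number;
  endpoints: Endpoint[];
}

// Keys are lowercased marker kinds, because markers are case-insensitive.
const RULES: Record<string, MarkerRule> = {
  character: { canonical: "character", max: 7, endpoints: ["videos", "images"] },
  referenceimage: { canonical: "referenceImage", max: 7, endpoints: ["videos"] },
  referenceaudio: { canonical: "referenceAudio", max: 5, endpoints: ["videos"] },
  reference: { canonical: "reference", max: 10, endpoints: ["images"] },
};

// Returns one error string per invalid marker; an empty array means the prompt is fine.
function validateMarkers(prompt: string, endpoint: Endpoint, bodySlots: string[]): string[] {
  const errors: string[] = [];
  const slots = bodySlots.map((s) => s.toLowerCase());
  const re = /@(character|referenceImage|referenceAudio|reference)_(\d+)/gi;
  let m: RegExpExecArray | null;
  while ((m = re.exec(prompt)) !== null) {
    const rule = RULES[m[1].toLowerCase()];
    const n = Number(m[2]);
    const slot = rule.canonical + "_" + n;
    if (n < 1 || n > rule.max) {
      errors.push("@" + slot + ": index outside 1-" + rule.max);
      continue;
    }
    if (rule.endpoints.indexOf(endpoint) === -1) {
      errors.push("@" + slot + ": not accepted on POST /" + endpoint);
      continue;
    }
    // A marker without a matching body slot would make the API return 400.
    if (slots.indexOf(slot.toLowerCase()) === -1) {
      errors.push("@" + slot + ": no matching body slot");
    }
  }
  return errors;
}
```

Slots without a marker are deliberately not reported, since markers are opt-in.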
Auto-injection: on veo-useapi, when a scene's prompt tags a character that has a registered Flow ref (flow-characters.json), buildExecutionPayload rewrites the @Name tag into its canonical @character_N marker automatically. Slot order = scene cast order first, then tag-only characters (capped at 7, overflow warned); hand-authored markers pass through untouched. Cast-name matching stays exact-case (the legacy lookup, verbatim) — which is exactly why a tagless prompt produces a byte-identical payload; only @Name tag matching is case-insensitive (@clawbot resolves to a registered Clawbot). Note that a ref-registered character's tag no longer also attaches its loose portrait image — the saved Flow character bundles its identity images. Characters without a Flow ref keep the normal descriptor substitution (including portrait collection).
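The slot-ordering rule in the auto-injection paragraph can be illustrated with a short function. This is one reading of the paragraph, not the logic inside `buildExecutionPayload`: the function name, the `@Name` tag grammar (letters and digits only), and the warning text are assumptions, and the sketch leaves overflow tags untouched in the prompt.

```typescript
const MAX_CHARACTER_SLOTS = 7;

interface InjectionResult {
  prompt: string;
  slots: string[];    // slots[i] is the character bound to @character_(i+1)
  warnings: string[];
}

function injectCharacterMarkers(prompt: string, cast: string[], registered: string[]): InjectionResult {
  // Assumed tag grammar. The lookahead keeps hand-authored markers such as
  // @character_3 from being read as a tag named "character".
  const tagRe = /@([A-Za-z][A-Za-z0-9]*)(?![A-Za-z0-9_])/g;

  // Tag matching is case-insensitive: lowercase tag -> registered name.
  const byLower = new Map<string, string>();
  for (const name of registered) byLower.set(name.toLowerCase(), name);

  // Registered characters tagged in the prompt, in order of first appearance.
  const tagged: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = tagRe.exec(prompt)) !== null) {
    const name = byLower.get(m[1].toLowerCase());
    if (name !== undefined && tagged.indexOf(name) === -1) tagged.push(name);
  }

  // Cast order first (exact-case match), then tag-only characters.
  const ordered = cast
    .filter((c) => tagged.indexOf(c) !== -1)
    .concat(tagged.filter((t) => cast.indexOf(t) === -1));
  const slots = ordered.slice(0, MAX_CHARACTER_SLOTS);
  const warnings = ordered
    .slice(MAX_CHARACTER_SLOTS)
    .map((n) => "no free character slot for " + n);

  const rewritten = prompt.replace(tagRe, (whole: string, tag: string) => {
    const name = byLower.get(tag.toLowerCase());
    const idx = name === undefined ? -1 : slots.indexOf(name);
    return idx === -1 ? whole : "@character_" + (idx + 1);
  });
  return { prompt: rewritten, slots, warnings };
}
```

With cast `["Mia", "Clawbot"]` and both characters registered, the prompt `@clawbot greets @Mia` becomes `@character_2 greets @character_1`, while a prompt with no tags is returned unchanged.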
🧰 Command surface
Full reference: docs/CLI_REFERENCE.md. Condensed groups:
Lifecycle
init · create · auto · iterate · run-pipeline · brief · storyboard · assets · review · publish · approve
Readiness · planning · runtime
readiness · plan (alias execution-plan) · produce (alias execute) · execute-status · execute-cancel · director-preflight · storyboard-review · review-ui · review-autopilot
review-ui is the local human-in-the-loop station for the Seedance storyboard
workflow documented in
docs/REFERENCE_VIDEO_SEEDANCE_MOTION_DESIGN_WORKFLOW.md.
Use its director defaults to save a production ledger that follows the
still-frame-first, start/end-frame, bridge-pose, variant-pass, and post-retiming
recipe.
review-autopilot lets the agent do that handoff without manual clicks once
storyboard still candidates exist. It locks the best available stills, promotes
artifact-backed upscaled handoff assets, fills the reference and assembly gates,
and writes the same review artifacts as the browser station.
Templates · cloning · storyboard templates
analyze · analyze-template · template-create · template-save · template-list · template-show · template-validate · clone-plan · clone-init · clone-ad · clone-execute · storyboard-from-clone · storyboard-template-list · storyboard-template-show
Character subsystem
character-add · character-list · character-show · character-consistency · character-auto-create · environment-auto-create · character-import-library · find-library · library find · library clean · list-library · seedance-register-assets · flow-register-characters · flow-register-voices · voice-clone · show-bible
Portfolio · ops · reporting
list · index · metrics · workload · next-actions · dependencies · status · doctor-project · doctor-portfolio · report · report-snapshot · report-history · report-diff · trends · export-csv · artifact-history · verify-env · cost-estimate
Metadata
set-meta · set-execution-profile · import-legacy
Post-production
remix-narrated · verify-final · make-vertical · make-square · make-loop · thumbnail · archive-project · motion-overlay · subtitle burn-in
Obsidian
export-obsidian · sync-obsidian · scaffold-obsidian-vault
Multi-shot prompt
multi-shot — scaffold / validate / Gemini-author timecoded multi-shot cinematic prompts, with preset discovery, storyboard scene hydration, provider-shaped defaults, and parsed shot artifacts (docs)
Director & brand layer
director-blueprint — validate + persist the project visual bible (artifacts/project-blueprint.json); filmmaking-prompts auto-appends a prose DIRECTOR addendum when present (docs) · brand-definition — validate + persist the locked brand system (artifacts/brand-definition.json: palette/voice/typography/theme map, strict #RRGGBB hex validation); filmmaking-prompts auto-appends a prose BRAND line when present (docs)
Reference libraries
playbook-list · playbook-show · prompt-lib-list · prompt-lib-show · providers
📖 The story bible & assembly (newest features)
Two recent additions are worth calling out because they fix the two things that most often go wrong in multi-scene AI video — consistency and timing.
Story bible (`artifacts/story-bible.json`): a single, machine-readable "continuity reference" for a project, covering the cast, settings, props, and the full scene timeline, with per-scene continuity notes. It is auto-written every time the storyboard is created, derived deterministically from the brief + storyboard + character profiles, so downstream generation stays consistent across scenes and regenerations. It spends no credits and calls no providers. Full guide: `docs/STORY_BIBLE.md`.
Assemble: media QC + narration fit. When `vclaw video assemble` stitches the final MP4 it now (a) ffprobe-checks every clip and the master for codec / audio / duration problems and folds the findings into the report, and (b) fits the narration to the video, speeding the voice slightly when it's a hair too long, otherwise keeping speech natural and looping the visual bed. Full guide: `docs/ASSEMBLE.md`.
Soundtrack A/B (`vclaw video soundtrack`): generate a music bed candidate from every configured backend (Suno via `KIE_API_KEY`, Lyria via Vertex, Lyria 3 via a Gemini key, FlowMusic for full vocal songs via Lyria 3 Pro on the shared `USEAPI_API_TOKEN`, with `--lyrics`/`--instrumental`), compare them side-by-side in the preview portal, then `--select <backend>` to lock the winning track into the project (written to `soundtrack.json` + the manifest `soundtrack` field). Dry-run plans the candidates with no keys. Full guide: `docs/ASSEMBLE.md`.
Narration / TTS (`vclaw video narrate`): synthesize a narration clip from a script via a TTS backend (`gemini-tts`, the Gemini API `gemini-2.5-flash-preview-tts` model; API-key product, resolves a key from `GEMINI_API_KEYS`/`GOOGLE_API_KEYS`/`GOOGLE_API_KEY`) to `artifacts/audio/narration.wav` + a typed `narration.json`. `--video-duration-ms` embeds a narration-fit timing plan; `--dry-run` estimates duration with no keys. Full guide: `docs/ASSEMBLE.md`.
Cartoon-show voice clone (`vclaw video voice-clone`): build the "blank video with audio" voice reference (the production-learned voice-cloning trick) so a target voice, including your own, locks into a Seedance/Veo generation. A raw MP3/WAV reference drifts to a generic accent; a black-frame video carrying the same audio locks the voice. Renders that clip from `--audio` (local ffmpeg, no spend) and persists it in `voice-clones.json`; `--character` binds it so the character's scene `@Name` tags auto-route the clip into Seedance `reference_videos`. Plan-only by default; `--execute` renders + persists. `--slice-seconds` splits the recording per line for drift control. Full guide: `docs/CLI_REFERENCE.md`.
Cartoon-show bible (`vclaw video show-bible`): the repeatable cartoon-SHOW asset-library index (`show-bible.json`). It ties the project's characters + locations + voice clones into one reusable world and tracks the episode list, so a solo creator can make many consistent episodes. Derives the bible from existing artifacts (auto-binding each character's voice clone), or `--from-json` to persist an authored one; `--add-episode "id|title|logline"` merges episodes by id; `--show` prints without writing. Deterministic, no spend, and distinct from `story-bible` (continuity) and `director-blueprint` (visual direction). Full guide: `docs/CLI_REFERENCE.md`.
Diegetic stills (`vclaw video gen-image`): generate an in-world prop, on-screen screen (UI/dashboard), or overlay graphic (e.g. a "SYSTEM COMPROMISED" alert) into `assets/props/`. Three backends via `--backend`: gobananas (default, `GO_BANANAS_API_KEY`, no OpenAI key), openai (gpt-image), and flow, which is Google Flow via useapi.net (`USEAPI_API_TOKEN` + `USEAPI_ACCOUNT_EMAIL`): `imagen-4`/`nano-banana`/`nano-banana-pro` auto-selected by reference count, repeated `--ref` (`reference_1..10`, local paths upload first) + `--character` (`character_1..7`, names resolve via `flow-characters.json`) slots, `--count`/`--seed`, and inline `@reference_N`/`@character_N` prompt markers validated before any upload or spend. Per-kind render directives (screens/overlays keep text, props suppress it); `--dry-run` prints the composed request with no spend. Composite it onto footage with the assemble overlay builders. Full guide: `docs/CLI_REFERENCE.md`.
Motion-graphics overlays (`vclaw video overlay`): composite a graphic onto a clip (time-gated, faded, positioned; pairs with `gen-image` to drop a generated screen/alert onto footage, font-free + real-render validated), or burn a pulsing `--alert` / boxed `--lower-third` caption (FFmpeg `drawtext`, needs a libfreetype build). `--dry-run` prints the planned ffmpeg command. Full guide: `docs/CLI_REFERENCE.md`.
Motion-overlay reels (`vclaw video motion-overlay`): turn an existing talking-head video into a reel with motion-graphics overlays synced to the speech via Google Flow's Omni Flash V2V (kinetic typography / icons / metaphors painted on the footage, original voice preserved). Plan/dry by default: ingest → Gemini STT (or `--transcript`) → sentence-boundary slice into ≤10s takes → per-take overlay-prompt composition → work folder + manifest + `--preview` review surface, no spend. `--execute --confirm-spend` renders each take (V2V → audio-restore → clip-stitch). Four layouts (split/overlay/motion-only/avatar-host); the `avatar-host` layout fills the frame with an identity-locked character host (go-bananas `generate_with_character` → Veo I2V) and needs `--gb-character <Name:ID>`. Full guide: `docs/MOTION_OVERLAY.md`.
Music videos (`vclaw video music-video`): the vocal-synced, beat-exact assembler. From a hand-authored config (song + clip registry + B-roll pools + an explicit vocal map or a transcript it auto-classifies into rap / hook / instrumental / outro by word density), it pins each performer to their own vocal time-aligned across B-roll cutaways (lips stay locked to the muxed song), cuts every segment frame-exact (`-frames:v`, never `-t`, so there is zero cumulative drift), then concats + applies one grade pass + muxes the song. Fully local: ffmpeg only, no provider, no spend. Plan/dry by default; `--execute` renders and asserts the built master matches the plan within one frame. Full guide: `docs/CLI_REFERENCE.md`.
Music-video titles (`vclaw video title-card`): burn the titles you see in music videos (a faded lower-third + a centred end card that holds to EOF) onto a finished cut. Text is rasterized via Pillow + RAQM, so it works on any ffmpeg build (no libfreetype) and any script, including Devanagari/Arabic (vowel marks shape + stack correctly). Each card is a looped PNG input so delayed alpha fades animate. Fully local, no spend; `--dry-run` plans, omit to render. Full guide: `docs/CLI_REFERENCE.md`.
HD finish / upscale (`vclaw video finish`): upscale a rendered cut to a clean HD master via Topaz (hosted Proteus/Gaia/Starlight through the apiz/xskill aggregator, or a local Topaz CLI). The anti-plastic "detail-not-sharp" recipe (denoise + halo off, film grain kept + clamped to the real 0.1 cap, detail recovery high) avoids waxy skin. Hosted backends are paid, so the command refuses without `--confirm-spend` (`--dry-run` plans free); `topaz-local` is free. Hosted needs `APIZ_API_KEY`/`XSKILL_API_KEY`. Full guide: `docs/CLI_REFERENCE.md`.
Audio-driven lip-sync (`vclaw video lipsync`): a still/keyframe + a vocal track → a lip-synced talking-head clip via OmniHuman v1.5 (apiz/xskill). Uploads image+audio → submits → awaits → downloads → normalizes to CFR fps + even dims (OmniHuman's 25fps/odd dims otherwise break frame-accurate seeking). Drives an external vocal (a rapper's verse, a singer's hook), the lane's way to put a performer's real vocal on their face. Audio cap enforced up front (1080p≤30s, 720p≤60s). Paid, so the command refuses without `--confirm-spend` (`--dry-run` plans free); needs `APIZ_API_KEY`/`XSKILL_API_KEY`. Full guide: `docs/CLI_REFERENCE.md`.
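The "frame-exact, zero cumulative drift" claim for music videos comes down to simple arithmetic. The sketch below shows one common way to get it and is not taken from the `music-video` source: each cut boundary is rounded to a frame index on the shared timeline, and the per-segment frame count (the number that would go to ffmpeg's `-frames:v`) is the difference between neighbouring boundaries. The function names and the 24 fps example are illustrative.

```typescript
// Frame count per segment, derived from cumulative cut boundaries in seconds.
// Rounding boundaries (not durations) means rounding errors cannot accumulate.
function frameCounts(boundariesSec: number[], fps: number): number[] {
  const counts: number[] = [];
  for (let i = 1; i < boundariesSec.length; i++) {
    const start = Math.round(boundariesSec[i - 1] * fps);
    const end = Math.round(boundariesSec[i] * fps);
    counts.push(end - start);
  }
  return counts;
}

// For comparison: rounding each duration on its own, which can drift.
function naiveFrameCounts(boundariesSec: number[], fps: number): number[] {
  const counts: number[] = [];
  for (let i = 1; i < boundariesSec.length; i++) {
    counts.push(Math.round((boundariesSec[i] - boundariesSec[i - 1]) * fps));
  }
  return counts;
}

const cuts = [0, 1.02, 2.04, 3.06]; // three 1.02 s segments
const exact = frameCounts(cuts, 24);      // [24, 25, 24], 73 frames in total
const naive = naiveFrameCounts(cuts, 24); // [24, 24, 24], 72 frames in total
```

At 24 fps the last boundary sits at frame 73 (3.06 × 24 = 73.44), so the boundary-based counts land on it exactly while the per-duration counts are already one frame short after three cuts.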
🧠 Skills ecosystem
The repo bundles a curated skills library — agent-invokable workflows split into video (production) and workflow (orchestration) categories. Skills are not equal: a small hierarchy keeps the surface sane.
| Role | Examples | When you reach for it |
|---|---|---|
| Canonical entry | video-framework, brand-presenter | Generic / unspecified video request — the entry skill routes into a specialist. |
| Specialist | video-storyboard, video-clone-ad, movie-director, video-post, ... | The mode is clearly known up front. |
| Compatibility alias | davendra-presenter, nex-presenter, bunty | Personal/brand presets that delegate into brand-presenter. |
| Workflow | doctor, pipeline, worker, studio-mode, ... | Orchestration, debugging, ops — independent of any one production mode. |
Rule of thumb: start at a canonical entry, specialize only when the mode is clearly known.
Quick skill map
| Skill | Role | One-liner |
|---|---|---|
| video-framework | canonical | Routes across copy/create/narrated/presentation/long-form/film/UGC. |
| brand-presenter | canonical (generic) | Slide deck → narrated presenter video over a branded host profile. |
| video-storyboard | native clean-room | Brief or clone plan → scene-by-scene storyboard artifact. |
| video-analyze-template | native clean-room | Reference video → reusable template packet (Gemini auto-mode). |
| video-clone-ad | native clean-room | Saved template → new product/brand via clone-execute. |
| video-thumbnail-lab | native clean-room | Final render → thumbnail + platform variants. |
| movie-director | imported | Multi-scene Director-mode (12 genres, two-phase approval, structured entry modes). |
| video-replicator | imported (deep) | 7-mode legacy pipeline (COPY/CREATE/NARRATED/PRESENTATION/LONG-FORM/FILM/UGC). |
| video-post | imported | Post-render verify, variants, thumbnails, archive. |
| higgsfield-generate | external bridge | Higgsfield CLI bridge for Marketing Studio, product photoshoots, Soul IDs, and virality scoring. |
| character-creator | imported | Go Bananas characters with multi-view reference sheets. |
| character-library | imported | Audit / patch / delete entries in the shared GB library. |
| seedance-prompts | imported | Seedance prompt reference library (incl. music-video patterns). |
| youtube-audio | imported | YouTube → MP3/MP4 via yt-dlp + FFmpeg. |
| ugc | imported | Belief-driven UGC campaign generator (E5 method). |
Compatibility aliases (all delegate into brand-presenter):
davendra-presenter · nex-presenter · bunty
| Group | Skills |
|---|---|
| Multi-agent orchestration | worker · pipeline · studio-mode |
| Diagnostics & exploration | doctor · build-fix · deepsearch · deep-interview |
| Review & governance | review |
| Operational utilities | ai-slop-cleaner · configure-notifications · skill · note · help · web-clone |
Generic orchestration skills (autopilot, ralph/ralph-init/ralplan, team, cancel,
trace, hud, git-master, code-review, security-review, omx-setup) were culled —
they duplicated the operator's global plugin set with zero repo-specific content.
📖 Full per-skill reference with descriptions, key features, and when-to-reach-for guidance:
docs/SKILLS.md · machine-readable index: skills/catalog.json
🗂️ Obsidian operator workspace
The repo writes a vault of machine-generated notes that mirrors canonical project state — dashboards, queues, metrics, health, timelines, dependencies, and per-project notes — all regenerated from one command.
Obsidian is a view, not the source of truth. The repo state on disk is canonical; the vault is a regenerable rendering of it.
What you get
- A control plane that isn't a terminal: browse the active queue, blockers, owners, dependencies, and review-state ladder from a normal Obsidian sidebar.
- 12 dashboard notes: `Dashboard`, `Active`, `Needs Review`, `Blocked`, `Complete`, `Metrics`, `Health`, `Next Actions`, `Dependencies`, `Timeline`, `Changes`, `Owner Workload`.
- One project note per project: rich frontmatter (lifecycle state, owner, priority, due risk, blockers, character bindings, review-state, execution profile, genre, runtime) plus body sections for stage status, recent events, artifact links, and cost estimates.
- Honest health visibility: backed by the same `doctor-portfolio` and `metrics` machinery that drives reporting, including missing-approval and stale-review counts.
- Zero lock-in: plain markdown files anywhere you point `--output-dir`. Delete and rebuild any time.
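One way to picture "a view, not the source of truth" is a note renderer that is a pure function of project state: the same state always yields the same markdown, so the vault can be deleted and rebuilt at will. The sketch below is illustrative only. `ProjectState`, `renderProjectNote`, and the frontmatter key names are invented for this example; the real schema is documented in docs/OBSIDIAN.md.

```typescript
// Hypothetical sketch: a project note rendered purely from on-disk state.
// Field and key names are assumptions, not videoclaw's actual frontmatter schema.
interface ProjectState {
  slug: string;
  lifecycleState: string;
  owner: string;
  priority: string;
  blockers: string[];
}

function renderProjectNote(p: ProjectState): string {
  const frontmatter = [
    "---",
    `lifecycle_state: ${p.lifecycleState}`,
    `owner: ${p.owner}`,
    `priority: ${p.priority}`,
    `blockers: [${p.blockers.join(", ")}]`,
    "---",
  ];
  // No timestamps or random ids: output depends only on the input state.
  return [...frontmatter, "", `# ${p.slug}`, ""].join("\n");
}
```

Because nothing in the output depends on when or where it was rendered, regenerating the vault never loses information that is not already on disk.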
Three commands
vclaw video scaffold-obsidian-vault --output-dir ./ops/obsidian # one-time scaffold
vclaw video export-obsidian --project my-project --output-dir ./ops/obsidian/Projects # single-project export
vclaw video sync-obsidian --root . --output-dir ./ops/obsidian # full regenerate (the common case)
📖 Full operator guide (vault layout, every dashboard note explained, frontmatter schema, daily loop, common workflows): docs/OBSIDIAN.md
📦 Artifacts & schemas
Every stage writes a canonical JSON artifact under `projects/<slug>/artifacts/`. Schemas under `schemas/video/` are the machine-readable source of truth. Key artifacts:
- brief → `brief.json`
- storyboard → `storyboard.json` (+ optional `storyboard.md` review)
- story bible → `story-bible.json` (deterministic continuity reference: cast, settings, props, scene timeline, and continuity notes derived from brief + storyboard + character profiles; auto-written at storyboard time so downstream generation stays consistent across scenes and regenerations)
- asset manifest → `asset-manifest.json`
- readiness → `readiness.json`
- clone plan → `clone-plan.json`
- execution plan → `execution-plan.json`
- execution report → `execution-report.json`
- review report → `review-report.json`
- publish report → `publish-report.json`
- analyze output → `analyze-output.json`
- character consistency → `character-consistency.json`
Artifacts are append-only via artifacts/history/ and every write emits a machine-readable event to events/events.jsonl.
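Because the event log is JSON Lines, it can be summarized with a few lines of code. The following is a minimal sketch, assuming each event object carries a string `type` field; that field name and the sample event names in the usage note are assumptions, so check the schemas under `schemas/video/` before relying on them.

```typescript
// Hypothetical sketch: summarize the contents of events/events.jsonl by event type.
// The `type` field is an assumed name, not a documented part of the event schema.
interface ArtifactEvent {
  type: string;
  [key: string]: unknown;
}

// One JSON object per line; blank lines are skipped.
function parseJsonl(text: string): ArtifactEvent[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as ArtifactEvent);
}

function countByType(events: ArtifactEvent[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const e of events) {
    counts.set(e.type, (counts.get(e.type) ?? 0) + 1);
  }
  return counts;
}
```

To use it, read the file as UTF-8 text and pass it through `countByType(parseJsonl(text))`; the resulting map gives a quick picture of what happened in a project without opening any individual artifact.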
✅ Testing & smoke matrix
Unit + CLI contract tests run via node:test:
npm test # build + full suite
npm run test:node # rerun compiled tests only
node --test dist/tests/cli-full-flow.test.js # single test file
Reproducible end-to-end smokes, each of which builds first:
| Command | Covers | Run after |
|---|---|---|
| npm run smoke:runtime | init → brief → storyboard → assets → plan → produce dry-run → status → report → Obsidian | runtime / artifact changes |
| npm run smoke:native-veo | Native Veo (veo-useapi) path | changes to built-in Veo path |
| npm run smoke:character-hydration | Create-time cast hydration + approval-gate cost | character-profile or cost changes |
| npm run smoke:execution-cancel | Submit → cancel → failed-assets transition | adapter cancel or project cancel changes |
| npm run smoke:portfolio | init → brief → storyboard → plan → index → report → export-csv | index/report/CSV visibility changes |
| npm run smoke:story-bible-image | create → storyboard continuity bible + content-fix propagation, image-only path | story-bible artifact or storyboard-time continuity changes |
| npm run e2e:image-storyboard | Go Bananas still manifest → local image assets → scene candidates → selections → reference sheet → readiness/preflight/plan plus Review UI API checks for request queues, candidate recording, media proxy, artifact-backed upscales, and final review decision, with no video generation. Uses a temporary root unless --root is passed to the script. | review UI or image-storyboard workflow changes |
| npm run e2e:image-storyboard:examples | Image-storyboard E2E plus non-video example smokes; writes a human-readable prompt, command, API, and artifact ledger in the run root. Uses a temporary root by default. | before testing live provider credentials |
Guardrails — fast local sanity checks:
| Command | Watches |
|---|---|
| npm run check:movie-director-wrappers | Bundled Director helper scripts target the clean-room CLI |
| npm run check:cleanroom-docs | Clean-room-facing docs/skills don't reference stale legacy paths |
| npm run check:skill-frontdoor | Repo-local skill front door stays consistent |
| npm run check:release-readiness-lite | One-shot: generated-artifact ignore guard + build + tests + smokes + isolated image-storyboard E2E + guardrails |
Use check:release-readiness-lite as the pre-flight before any non-trivial change lands.
It also checks that local verification output folders, Playwright state, and
Review UI screenshots stay ignored so release diffs remain source-only unless a
fixture update is intentional.
📍 What's shipped
The master plan has 50+ implemented slices across lifecycle state, ops visibility, execution planning, adapter-backed runtime, character hydration, director approval gate, review-freshness enforcement, create-time parity, cost visibility, environment verification, post-production utilities, and packaged release-readiness. Themes:
- Lifecycle & contracts: canonical artifacts, stage checkpoints, pipeline manifests, stage guards, legacy import bridge.
- Portfolio ops: index · metrics · next-actions · workload · dependencies · readiness · doctor · scorecards.
- Reporting: report · snapshots · history · diffs · trends · CSV export · Obsidian export/sync/dashboards.
- Runtime: adapter submission, dry-run, polling, output ingest, native Seedance transport, native Veo transport, execution cancel.
- Character subsystem: project profiles, GB-id anchors, consistency enforcement, library hygiene, auto-create, import-library, cast provenance.
- Director lane: storyboard-first approval, preflight hazards, review freshness ladder, review-as-runtime-invariant, cost visibility.
- Continuity & assembly: deterministic story bible at storyboard time, assemble media-QC (ffprobe clips + master), and narration-fit timing planner.
- Prompt-quality preflight: six Seedance-handbook anti-pattern checks (adjective soup, multiple actions, multiple camera moves, style-word overload, literary emotion, overlong prompts) via `director-preflight`; warnings by default, `DIRECTOR_STRICT_PROMPT_QUALITY=1` to block.
- Dialogue preflight: duration-aware dialogue fit checks via `director-preflight`; warnings by default, `DIRECTOR_STRICT_DIALOGUE_FIT=1` to block.
- Reference sheets: role-tagged sheets (identity, outfit-material, environment, motion-camera, palette-mood) with closed role vocabularies, per-scene bindings, Go Bananas refs, and identity-per-character-bound-scene enforcement in director readiness/preflight.
- Scene candidates: per-scene append-only candidate registry + mutable selection ledger, `produce --scene <n>` partial reruns, chain-from-prev with hard-fail on missing upstream, selection-coverage stage guards on review/publish, per-scene Obsidian notes, and a migration helper for legacy single-generation projects.
- Generation telemetry: route/task/config/cost/timing/output events recorded after execute and poll, with completed Seedance USD samples feeding `cost-estimate`.
- Clone workflow enrichment: analyze/template/clone artifacts carry style layers, beat compression, technical notes, dialogue notes, and workflow checklists.
- Front door: `video create/auto/iterate/run-pipeline/approve` with genre-aware defaults.
- Release-readiness: packaged one-build smoke bundle + doc guardrails.
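The "warnings by default, strict flag to block" behaviour of the preflight checks can be sketched as follows. This is not the `director-preflight` implementation: the function name, the two checks shown, the camera-move vocabulary, and the word threshold are all assumptions made for illustration. A caller would presumably derive `strict` from the `DIRECTOR_STRICT_PROMPT_QUALITY=1` environment variable named above.

```typescript
// Hypothetical sketch of "warn by default, block in strict mode".
// Checks, vocabulary, and thresholds are illustrative assumptions, not videoclaw's rules.
interface PreflightResult {
  warnings: string[];
  blocked: boolean;
}

const CAMERA_MOVES = ["pan", "tilt", "dolly", "zoom", "orbit", "crane"]; // assumed vocabulary
const MAX_PROMPT_WORDS = 60; // assumed threshold

function preflightPrompt(prompt: string, strict: boolean): PreflightResult {
  const warnings: string[] = [];
  // Split on anything that is not a letter so punctuation never hides a keyword.
  const words = prompt.toLowerCase().split(/[^a-z]+/).filter((w) => w.length > 0);

  if (words.length > MAX_PROMPT_WORDS) {
    warnings.push(`overlong prompt: ${words.length} words (max ${MAX_PROMPT_WORDS})`);
  }

  const moves = CAMERA_MOVES.filter((move) => words.includes(move));
  if (moves.length > 1) {
    warnings.push(`multiple camera moves: ${moves.join(", ")}`);
  }

  // Strict mode turns any warning into a hard stop; otherwise warnings are advisory.
  return { warnings, blocked: strict && warnings.length > 0 };
}
```

Keeping detection separate from the block decision means the same warnings appear in both modes; strict mode only changes whether they stop the run.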
Honest remaining gaps (tracked in docs/MASTER_PLAN_ALIGNMENT.md):
- Deeper `video create` parity with the legacy Director/movie surface (richer decomposition/tuning).
- Historical project migration from shallow stage-guess → structured reconciliation.
- Provider-contract hardening (shared error taxonomy, richer recovery guidance).
Current status: npm test green · check:release-readiness-lite passing.
📚 Documentation map
Read it on the site: everything below is rendered (with search + diagrams) at videoclaw-docs.vercel.app, where you can jump to the Guide, a page for every feature, the skills catalog, or the full reference. The links below open the same docs on the live site; each also exists as raw markdown under `docs/`.
| # | Doc | What it gives you |
|---|---|---|
| 1 | Production workflow | Operator-first workflow: make, review/fix, and manage video projects |
| 2 | Architecture | Layer map + canonical flow |
| 3 | CLI reference | Full command reference |
| 4 | Skills | Comprehensive per-skill reference with features and when-to-reach-for guidance |
| 5 | Story bible | Story bible reference — the deterministic continuity artifact, when it's generated, its shape, and how downstream stages use it |
| 6 | Assemble | vclaw video assemble operator guide — pipeline stages, media QC, narration fit, API keys, dry-run vs real render, validation status |
| 7 | Obsidian | Obsidian operator workspace deep guide — vault layout, dashboard notes, frontmatter schema, daily loop |
| 8 | Reference sheets | Reference sheets operator guide — 5 sheet types, role vocabularies, CLI commands, readiness/preflight semantics, GB integration |
| 9 | Scene candidates | Scene candidates operator guide — append-only candidates + mutable selection, 9 CLI commands, partial reruns, chain-from-prev, migration |
| 10 | Prompt quality | Prompt-quality preflight operator guide — Seedance-handbook anti-pattern checks, thresholds, strict-mode blocking |
| 11 | Generation telemetry | Generation event ledger + cost-estimate telemetry behavior |
| 12 | Operations | Day-to-day maintenance loop |
| 13 | Templates | Template store + clone bridge |
| 14 | Migration | Legacy → clean-room moves |
| 15 | Deprecation | Alias + deprecation status |
| 16 | Release readiness | Release checklist |
| 17 | Master plan alignment | What's shipped + remaining gaps |
Skill deep-dives (also indexed in docs/SKILLS.md):
- `skills/video-framework/SKILL.md` · `skills/brand-presenter/SKILL.md`: canonical entries
- `skills/video-storyboard/SKILL.md` · `skills/video-clone-ad/SKILL.md` · `skills/video-analyze-template/SKILL.md`: clean-room native specialists
- Full catalog in `skills/README.md` · machine-readable in `skills/catalog.json`
🧬 Where it came from
videoclaw (the current videoclaw-v3 repo) is the merged successor of two predecessor codebases: the older videoclaw package (which had a heavy orchestration layer on top of a video pipeline) and the clean-room vclaw-video-core rebuild (which kept only the pipeline, with strict on-disk artifacts and approval gates). This repo takes the clean-room core as its foundation, drops the old orchestration layer (Claude Code, Codex, and the OMC plugin cover those workflows natively now), and ports forward the pieces worth keeping: the vclaw-cli Bun package for Google Flow + UseAPI, the Runway transport, a curated Python pipeline, and the Omni Flash backend additions. See MERGE_PLAN.md for the full rationale and per-phase commits.
🧭 Principles
- Clean-room implementation only. No code is inherited from the legacy repo; every module is written against an explicit contract.
- Video-first command surface. One `vclaw video ...` namespace that mirrors the production flow.
- Explicit stage artifacts. Every stage writes machine-readable JSON; the state on disk is the source of truth.
- No silent fallback across materially different provider paths. Fail hard and say what went wrong.
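The "no silent fallback" principle can be illustrated with a provider lookup that either returns exactly what was asked for or throws. This is a sketch under stated assumptions, not the repo's resolver: `resolveProvider`, `ProviderUnavailableError`, and the provider identifier strings are hypothetical.

```typescript
// Hypothetical sketch: resolve the requested provider or fail loudly.
// Identifier strings and names are illustrative, not videoclaw's actual API.
type Provider = "veo" | "seedance" | "runway" | "omni-flash";

class ProviderUnavailableError extends Error {
  constructor(requested: Provider, reason: string) {
    super(`provider "${requested}" unavailable: ${reason}`);
    this.name = "ProviderUnavailableError";
  }
}

function resolveProvider(requested: Provider, configured: ReadonlySet<Provider>): Provider {
  if (!configured.has(requested)) {
    // Deliberately no "try the next provider" branch: a different provider
    // means different cost and output, so the caller must choose explicitly.
    throw new ProviderUnavailableError(requested, "no credentials configured");
  }
  return requested;
}
```

The error message names the provider that was requested, so an operator or agent can fix the configuration instead of discovering later that a different backend produced the clip.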
🤝 Contributing
Read AGENTS.md for the autonomy directive, coding style, and commit protocol, then
CLAUDE.md for non-obvious conventions (NodeNext ESM .js extensions, review-state ladder,
dist/ is generated, etc.). The expected pre-flight before a non-trivial change lands:
npm run check:release-readiness-lite
🪪 License
Source-available under a custom proprietary license — see LICENSE.
- ✅ Free for personal, educational, research, evaluation, and non-commercial internal use
- 💼 Commercial / production use requires a paid license — contact the repository owner via github.com/davendra/videoclaw
