videoclaw

v3.0.0-alpha.1

Published

12 days ago

Agent-friendly multi-provider AI video CLI (Veo, Seedance, Runway, Omni Flash) with on-disk JSON artifacts, approval gates, storyboard review UI, and portfolio ops. v3 — narrow, deterministic, agent-target CLI.

videoclaw

Turn a one-line idea into a finished AI video — step by step, in the open, with a human approval before anything expensive runs.

A command-line tool (vclaw) that takes a plain-English idea like "a 15-second ad for my coffee brand" and walks it all the way to a reviewed, published video — using AI video services like Veo, Seedance, and Runway. Every step is saved to a file you can read, so nothing is hidden and you can stop, inspect, or replay any stage.

What is it? · What you can do · How it works · The main parts · Quickstart · Deep docs

📚 Full documentation site → videoclaw-docs.vercel.app

The complete, idiot-proof and agent-ready docs: an interactive guide, a page for every feature (30+), a recipe book of what you can make, the skills catalog, and all the deep reference docs — with diagrams throughout.

New here? The easiest path is to drive it by talking to Claude Code (or any agent host); there's a dedicated how-it-works page for agents too. Once you're in Claude Code in this repo, just type /concierge (or /clawbot) — or say "make me a video" — and Clawbot, VideoClaw's mascot and concierge, walks you from idea to finished film, with a preview and your approval before anything costs money.

🗺️ Explore the docs

Everything lives at videoclaw-docs.vercel.app (there's a search box on every page).

Guide — Start here · Use it with Claude Code · Install & setup · Your first video · What videos can you make? · How it works · Storyboard vs Director · Characters · Providers · Review & publish · Assemble & polish · Troubleshooting · Cheat sheet

Skills & tools — Skills catalog · Video skills · Workflow skills · Scripts & tooling · For agents

Reference — Reference home · Architecture · CLI reference · Production workflow · Provider platform · Story bible · Assemble · Internal plans & specs

All features

Create — projects & lifecycle · one-shot create · briefs · brand DNA · storyboards · storyboard grid · story bible
Direct & refine — director mode · prompt quality · multi-shot · filmmaking & cinematography
Keep it consistent — characters · reference sheets · scene candidates · Seedance Asset Library
Generate — providers & routing · execution runtime · overnight batch queue
Finish — assembly · review UI · preview portal · publishing
Manage — templates & cloning · portfolio & ops · doctor · Obsidian · telemetry & cost
Plan & agents — studio planner · agent surface · Veo CLI

🎯 What is videoclaw? (in plain English)

videoclaw is a tool you run in your terminal that makes AI videos for you, one careful step at a time.

You give it an idea. It writes a short brief, plans the shots (a storyboard), generates the video clips with an AI provider, stitches them together with narration and music, lets a human approve the result, and marks it published. That's it.

Think of it like an assembly line for video. Your idea goes in one end; a finished MP4 comes out the other. At each station along the line, the machine writes down exactly what it did in a file you can open and read. If something looks wrong, you can rewind to any station and try again — you never have to start over from scratch.

Two things make it different from the usual "type a prompt, get a video" tools:

It never hides what it's doing. Every stage leaves a plain, readable file behind. And if a video provider fails, it tells you loudly — it does not quietly swap in a different provider and pretend it worked.
It has a "stop and check" button. In director mode it refuses to spend money on the real render until a human has looked at the storyboard and said "yes, go."

New here? You don't need to understand any of the technical sections lower down. Read the next four short sections and you'll know what this whole thing does.

Who is it for?

Creators & marketers who want repeatable, reviewable AI videos — product ads, presenter explainers, UGC campaigns, music videos.
Small teams juggling many video projects who need a single dashboard of what's stuck, what needs a look, and what's ready to ship.
AI agents — everything is machine-readable, so an automated agent can drive the whole pipeline on its own.

✨ What you can do with it

Go from idea → finished video without leaving the terminal.
Use several AI video engines (Veo, Seedance, Runway) through one consistent set of commands — no need to learn each provider's quirks.
Keep characters consistent — the same person, outfit, set, and props look the same in every scene (via character profiles, reference sheets, and the story bible).
Approve in your browser before any expensive render runs — open the Review UI, look at the stills, click approve or regenerate.
Add the finishing touches — narration, background music, subtitles, thumbnails, and vertical / square / looping variants for different platforms.
Run many projects at once and see them all on a dashboard: what's blocked, what's stale, what needs review, what's ready to publish.
Trust the result — a project is only "ready" when its review report literally says pass. No guessing.
Rehearse for free — almost every command has a --dry-run that plans the whole thing without spending a credit.

🪜 How it works, step by step

Here is the whole assembly line in plain words. Each step is a command, and each step saves its work to a file.

| Step | Command | What actually happens |
|---|---|---|
| 1 | init | Create a new project folder to hold everything. |
| 2 | brief | Turn your one-line idea into a short written brief. |
| 3 | storyboard | Break the brief into scenes, and auto-build a story bible so characters, settings, and props stay consistent. |
| 4 | (director mode) preflight + approval | Check for problems, then wait for your "go" before spending anything. |
| 5 | plan | Pick the AI provider and prepare the exact request to send. |
| 6 | produce | Actually generate the clips. (Add --dry-run to rehearse for free.) |
| 7 | assemble | Stitch clips + narration + music into one MP4, then quality-check it. |
| 8 | review | A human (or the Review UI) approves the result. |
| 9 | publish | Mark it done and hand it off. |

You don't always run these by hand — vclaw video create can do the whole front of the line in one shot — but this is the path everything follows underneath.

🧩 The main parts, explained simply

videoclaw is made of a few moving pieces. Here's each one in everyday terms:

| Part | In plain terms |
|---|---|
| The project folder (projects/<slug>/) | A single folder that holds everything about one video: the brief, storyboard, clips, approvals, history. This folder is the source of truth; the commands are just a tidy way to work on it. |
| Stages & checkpoints | The assembly-line steps above. Each one writes a file and records that it finished, so you can replay or rewind any stage safely. |
| Two modes: storyboard vs director | Storyboard mode is fast and runs straight through. Director mode adds the "stop and approve" gate: it won't render for real until you say so. |
| The Review UI | A small web page (vclaw video review-ui) where a human looks at the storyboard stills and clicks approve or regenerate. No code needed. |
| Characters & the Story Bible | Tools that keep the same people, places, and props looking identical across every scene (the #1 thing that breaks in multi-scene AI video). The story bible is an auto-written "continuity reference" for the whole project. |
| Assembly | The step that turns raw clips into a polished, narrated MP4: it adds a title card, TTS narration, slide animation, music, then runs an automatic media quality check on the result. |
| Providers | The AI engines that actually make the video (Veo, Seedance, Runway). videoclaw routes to the right one and never silently switches if it fails. |
| Portfolio & ops | Dashboards for when you have many projects: status, metrics, next-actions, and a doctor that tells you the single safest next step. |
| Obsidian workspace | The same project data, rendered as browsable notes you can read in Obsidian instead of the terminal. |
| Skills | Pre-built, ready-to-run workflows for common jobs (make a presenter video, clone an ad, build a character) that an AI agent can invoke. |

That's the whole picture. Everything below is reference detail for power users, operators, and AI agents — you can stop here and still use the tool.

🚀 Quickstart

These commands are for a source checkout. If you installed the published package, replace node dist/cli/vclaw.js with vclaw.

First, get it running and check your setup:

npm install                                   # Node 20+
npm run build                                 # compile the CLI
npm test                                      # run the full test suite
node dist/cli/vclaw.js video providers        # show which AI providers are ready

Then run one full lifecycle against a throwaway project (no credits spent — note the --dry-run):

node dist/cli/vclaw.js video init demo
node dist/cli/vclaw.js video brief    --project demo --title "Demo" --intent "A 15s product tease"
node dist/cli/vclaw.js video storyboard --project demo --scene "open on product" --scene "close on logo"
node dist/cli/vclaw.js video plan     --project demo
node dist/cli/vclaw.js video produce  --project demo --dry-run
node dist/cli/vclaw.js video status   --project demo

Add --auto-chain to produce/execute to render the whole storyboard sequentially, each scene seeded from the previous scene's output video (continuity chain, unattended).

Or just run the packaged pre-flight that does build + tests + smokes + guardrails for you:

npm run check:release-readiness-lite

🎬 Production workflow (the three common jobs)

Most people enter through one of three paths:

Make a campaign video — create the director project, open review-ui, lock real storyboard stills, attach artifact-backed 4K stills, then publish only after the saved review report has verdict: "pass" and metrics.publishReady: true.
Review and fix a project — use status, next-actions, doctor-project, and review-ui to find the single safest next step.
Manage a portfolio — use metrics, report, export-csv, and sync-obsidian to see blocked, stale, review-needed, and publish-ready work.

The advanced command surface stays available, but production trust comes from one rule: a handoff is ready only when review-report.json has verdict: "pass" and metrics.publishReady: true. The simple video review --verdict pass command intentionally writes that approval for already-reviewed projects; for director storyboard-image handoffs, use review-ui or review-autopilot so publishReady is derived from locked scene candidates, artifact-backed 4K stills, and final assembly approvals.
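The readiness rule above is small enough to express as a predicate. The following is a minimal sketch, not videoclaw's own code: the `ReviewReport` interface lists only the two fields named in this section (the real contract lives under schemas/video/ and may carry more), and the function names are hypothetical.

```typescript
// Minimal shape of review-report.json: only the two fields named above.
// Assumption: the real schema has more fields; they are ignored here.
interface ReviewReport {
  verdict?: string;
  metrics?: { publishReady?: boolean };
}

// True only when the saved report explicitly says pass AND publishReady.
// Anything missing or malformed counts as "not ready".
function isHandoffReady(report: ReviewReport | null | undefined): boolean {
  return report?.verdict === "pass" && report?.metrics?.publishReady === true;
}

// Same check on the raw file contents; invalid JSON is treated as not ready.
function isHandoffReadyFromJson(raw: string): boolean {
  try {
    return isHandoffReady(JSON.parse(raw) as ReviewReport);
  } catch {
    return false;
  }
}
```

Both conditions are compared strictly, so a report with a passing verdict but no metrics block is still treated as not ready.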

Full operator guide: docs/PRODUCTION_WORKFLOW.md. Handoff checklist: docs/OPERATOR_HANDOFF.md.

🤖 For AI agents

You are landing in a TypeScript/Node 20 CLI repo. If you do nothing else, read these in order:
CLAUDE.md — non-obvious conventions, single-test command, review-state invariant, agent-first orientation.
AGENTS.md — autonomy directive, coding style, commit/PR format, security expectations.
docs/ARCHITECTURE.md — layer map and the canonical project flow.
skills/catalog.json — machine-readable skill surface (don't scrape markdown).
docs/MASTER_PLAN_ALIGNMENT.md — what ships today + honest remaining gaps.
Canonical entry skills (start broad, specialize later): video-framework · brand-presenter
Contracts: schemas/video/*.json is the source of truth for artifact shapes.
Tests: src/tests/*.test.ts run via node --test dist/tests/*.test.js.
Don't: edit dist/ · drop .js extensions from relative imports (NodeNext ESM requires them) · add silent fallback across materially different provider routes · commit .omx/ / .vclaw/ / secrets.

Agent integration

videoclaw is built as a target for agent hosts, not as an orchestrator.

One-call discovery: vclaw schema --json returns the full command contract.
MCP server: vclaw mcp serve exposes read-only state queries to MCP-aware hosts.
Sample skills: see mcp/skills-pack/ for Claude Code skill templates.

See docs/AGENT_INTEGRATION_RESEARCH.md for the design rationale.

🏗️ Architecture

The system is layered: a thin command-line front, a domain core that does the real work, and an adapter tier that talks to the AI providers. Each stage writes to the artifact/checkpoint/event ledger on disk.

flowchart TB
    U[👤 Operator / AI agent]
    U -->|vclaw video ...| CLI["src/cli/vclaw.ts<br/>single entrypoint"]
    CLI --> CORE["src/video/*<br/>domain modules"]

    CORE --> ART[Artifacts · JSON]
    CORE --> CKPT[Checkpoints · stage state]
    CORE --> EVT["Events · events.jsonl"]
    CORE --> RT[Execution runtime]

    RT --> PP[provider-platform/<br/>route descriptors]
    RT --> PM[pipeline-manifests/<br/>storyboard · director]
    RT --> ADP[Adapter layer]

    ADP -->|built-in binary| NAT["Native transports<br/>Seedance · Veo"]
    ADP -->|command shim| CMD["_SUBMIT_CMD<br/>_POLL_CMD · _CANCEL_CMD"]
    ADP -->|custom override| EXT["VCLAW_*_ADAPTER<br/>external binary"]

    CORE -. validates against .-> SCH["schemas/video/*.json"]
    CORE --> OBS[Obsidian export / sync]
    CORE --> REP[Reports · CSV · Snapshots · Diffs]

CLI layer — argparse + dispatch only; no business logic.
Domain layer (src/video/*) — small, single-purpose modules. Each file owns one concept (artifacts, checkpoints, readiness, execution-plan, execution-runtime, doctor, metrics, next-actions, project-index, obsidian-export, etc.).
Provider platform — route descriptors for veo-useapi, seedance-direct, runway-useapi, dreamina-useapi.
Adapter layer — three resolution strategies (custom binary → built-in adapter with command shim → native in-process transport). Explicit fall-through, never silent.
Schemas — JSON Schema contracts under schemas/video/ are the source of truth for every artifact shape.

🔁 Project lifecycle

flowchart LR
    init([init]) --> brief([brief])
    brief --> sb([storyboard])
    sb --> ready{{readiness}}
    ready -->|storyboard mode| runPlan([plan / produce])
    ready -->|director mode| pre{{director-preflight<br/>content · refs · pronouns}}
    pre -->|hazards| fix[auto-fix or<br/>storyboard-review]
    fix --> pre
    pre -->|pass| gate{{approval gate<br/>VIDEOCLAW_APPROVE_STORYBOARD=1}}
    gate -.awaiting-approval.-> ops[(ops queue:<br/>needs-review)]
    gate -->|approved| runPlan
    runPlan --> execStatus([execute-status<br/>poll adapter])
    execStatus --> ingest[[ingest outputs]]
    ingest --> assets([assets])
    assets --> review([review])
    review --> publish([publish])
    execStatus -. operator .-> cancel([execute-cancel])

On-disk shape of a project

projects/<slug>/
├── project.json                 # manifest: slug, mode, state, metadata, execution profile
├── artifacts/
│   ├── brief.json               # canonical brief
│   ├── storyboard.json          # scenes + character bindings
│   ├── story-bible.json         # deterministic continuity reference (cast · settings · props · timeline)
│   ├── asset-manifest.json      # per-scene assets
│   ├── clone-plan.json          # optional: template → clone decisions
│   ├── execution-plan.json      # selected route + payload
│   ├── execution-report.json    # submit · poll · output ingest
│   ├── readiness.json
│   ├── character-consistency.json
│   ├── review-report.json
│   ├── publish-report.json
│   ├── analyze-output.json
│   └── history/                 # append-only artifact snapshots
├── checkpoints/                 # brief · storyboard · assets · review · publish (+ state)
├── events/events.jsonl          # append-only timeline
├── characters/characters.json   # optional GB-anchored (Go Bananas) profiles
├── storyboard.md                # director-mode approval review (human-readable)
└── state/                       # derived state cache
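Because events/events.jsonl is an append-only timeline with one JSON value per line, it can be read with a few lines of code. This is a minimal sketch under stated assumptions: `parseJsonl` is a hypothetical helper, and the fields of each event are not specified in this document, so the events stay untyped.

```typescript
// Parse the text of a JSON Lines file such as events/events.jsonl.
// Blank lines are skipped; a malformed line throws with its 1-based number.
// Assumption: event fields are unknown here, so each event is left as `unknown`.
function parseJsonl(text: string): unknown[] {
  const events: unknown[] = [];
  const lines = text.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const trimmed = lines[i].trim();
    if (trimmed === "") continue;
    try {
      events.push(JSON.parse(trimmed));
    } catch {
      throw new Error("Invalid JSON on line " + (i + 1));
    }
  }
  return events;
}
```

Reading the file contents (for example with Node's `fs.readFileSync`) and passing the text to `parseJsonl` yields the timeline in the order it was written.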

🧭 Production modes

Every command accepts --mode storyboard|director. The mode drives the pipeline manifest and the gate semantics.

| Dimension | storyboard mode | director mode |
|---|---|---|
| Default stage set | init → brief → storyboard → assets → review → publish | Same, plus preflight + approval gate before execution |
| Approval gate | none | VIDEOCLAW_APPROVE_STORYBOARD=1 required before provider submission |
| Storyboard review file | optional | projects/<slug>/storyboard.md auto-generated with character binding table and cost estimate |
| Preflight checks | readiness + character-consistency | + content-hazard detection · GB-id validation · remote-ref probe · pronoun drift · repeated-scene warnings |
| Preflight bypasses | n/a | DIRECTOR_AUTO_FIX_CONTENT=1 (rewrite hazards) · SKIP_DIRECTOR_PREFLIGHT=1 |
| Ops visibility | active | awaiting-approval → surfaced as needs-review across index · metrics · dashboards |

The normalized review-state ladder (missing → current → stale) flows through status, index, report, export-csv, Obsidian export, dashboards, next-actions, snapshot diffs, and the doctor layer. A stale director review blocks execute/execute-status even if approval is set — review freshness is a first-class runtime invariant.
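The interaction between the approval gate and review freshness can be sketched as a single decision function. This is an illustration under assumptions, not the runtime's actual logic: `canSubmit` and `GateResult` are hypothetical names, and because this document does not say how a missing review is treated, the sketch blocks only on the documented stale case.

```typescript
type Mode = "storyboard" | "director";
type ReviewState = "missing" | "current" | "stale";
type Env = { [name: string]: string | undefined };

interface GateResult {
  allowed: boolean;
  reason: string;
}

// Decide whether provider submission may proceed.
// Assumption: only the rules stated in this section are modelled.
function canSubmit(mode: Mode, review: ReviewState, env: Env): GateResult {
  if (mode === "storyboard") {
    return { allowed: true, reason: "storyboard mode has no approval gate" };
  }
  // Staleness is checked before approval so approval cannot override it.
  if (review === "stale") {
    return { allowed: false, reason: "director review is stale" };
  }
  if (env["VIDEOCLAW_APPROVE_STORYBOARD"] !== "1") {
    return { allowed: false, reason: "awaiting approval" };
  }
  return { allowed: true, reason: "approved" };
}
```

Checking staleness first is the design point: setting the approval variable is never enough on its own once the review has gone stale.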

🔌 Provider routing

flowchart TD
    Start[["route ∈ { veo-useapi · seedance-direct ·<br/>runway-useapi · dreamina-useapi }"]]
    Start --> Q1{"VCLAW_*_ADAPTER set?"}
    Q1 -->|yes| Custom[["Custom adapter binary<br/>stdin → JSON, stdout → JSON"]]
    Q1 -->|no| Q2{"Built-in adapter supports route?<br/>(seedance-direct · veo-useapi · runway-useapi)"}
    Q2 -->|no| Fail([["❌ hard fail<br/>no silent fallback"]])
    Q2 -->|yes| Q3{"_SUBMIT_CMD / _POLL_CMD set?"}
    Q3 -->|yes| Shim[["Command shim<br/>through built-in adapter"]]
    Q3 -->|no| Q4{"Native creds available?<br/>SUTUI_API_KEY (seedance) · local vclaw-cli (veo) · USEAPI_API_TOKEN (runway)"}
    Q4 -->|yes| Native[["✅ Native in-process transport"]]
    Q4 -->|no| Fail
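The decision tree above can be read as a short resolution function. The sketch below is a hedged illustration, not the adapter layer's real code. `resolveTransport` and its helpers are hypothetical names. Environment variable names are derived from the route by the `VCLAW_<ROUTE>_...` pattern, which this document confirms only for the three adapter overrides and the seedance-direct shim variables; every other derived name is an assumption. The availability of the local Veo CLI is passed in as a flag.

```typescript
type Route = "veo-useapi" | "seedance-direct" | "runway-useapi" | "dreamina-useapi";
type Transport = "custom-adapter" | "command-shim" | "native";
type Env = { [name: string]: string | undefined };

// Routes the built-in adapter supports, per the flowchart.
const BUILT_IN_ROUTES: Route[] = ["seedance-direct", "veo-useapi", "runway-useapi"];

// "veo-useapi" -> "VCLAW_VEO_USEAPI". Assumption: all routes follow this pattern.
function envPrefix(route: Route): string {
  return "VCLAW_" + route.toUpperCase().replace(/-/g, "_");
}

function hasNativeCreds(route: Route, env: Env, veoCliAvailable: boolean): boolean {
  if (route === "seedance-direct") return Boolean(env["SUTUI_API_KEY"]);
  if (route === "veo-useapi") return veoCliAvailable;
  if (route === "runway-useapi") return Boolean(env["USEAPI_API_TOKEN"]);
  return false;
}

// Walk the tree top to bottom; every dead end throws instead of falling back.
function resolveTransport(route: Route, env: Env, veoCliAvailable: boolean = false): Transport {
  const prefix = envPrefix(route);
  if (env[prefix + "_ADAPTER"]) return "custom-adapter";
  if (BUILT_IN_ROUTES.indexOf(route) === -1) {
    throw new Error("No built-in adapter for " + route + "; refusing to fall back");
  }
  if (env[prefix + "_SUBMIT_CMD"] || env[prefix + "_POLL_CMD"]) return "command-shim";
  if (hasNativeCreds(route, env, veoCliAvailable)) return "native";
  throw new Error("No credentials for " + route + "; refusing to fall back");
}
```

Throwing at both dead ends mirrors the "hard fail, no silent fallback" node: the caller always learns which route could not be served.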

Environment variables

| Variable | Used by | Purpose |
|---|---|---|
| VCLAW_VEO_USEAPI_ADAPTER | veo-useapi | custom adapter binary override |
| VCLAW_SEEDANCE_DIRECT_ADAPTER | seedance-direct | custom adapter binary override |
| VCLAW_RUNWAY_USEAPI_ADAPTER | runway-useapi | custom adapter binary override |
| VCLAW_SEEDANCE_DIRECT_SUBMIT_CMD · _POLL_CMD · _CANCEL_CMD | seedance-direct | command shim through built-in adapter |
| SUTUI_API_KEY | seedance-direct (native) | XSkill API credentials for in-process transport |
| VIDEOCLAW_APPROVE_STORYBOARD | director mode | =1 to approve storyboard and allow provider submission |
| DIRECTOR_AUTO_FIX_CONTENT | director mode | =1 to rewrite provider-risk phrases in storyboard |
| SKIP_DIRECTOR_PREFLIGHT | director mode | =1 to bypass preflight (use sparingly) |
| DIRECTOR_STRICT_PROMPT_QUALITY | director mode | =1 to promote prompt-quality warnings to blockers |
| DIRECTOR_STRICT_DIALOGUE_FIT | director mode | =1 to promote dialogue-duration warnings to blockers |
| GEMINI_API_KEYS · GOOGLE_API_KEYS · GOOGLE_API_KEY | analyze-template / analyze | Gemini key pool (round-robin with per-key cooldown) |
| VCLAW_GEMINI_API_ENDPOINT | Gemini pool | override HTTP endpoint (local or alt Gemini-compatible) |
| GO_BANANAS_API_KEY | preflight | validate stored character GB-id anchors |
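The "round-robin with per-key cooldown" behaviour noted for the Gemini key pool can be illustrated generically. This is a sketch of the general technique, not videoclaw's pool: the `KeyPool` class is hypothetical, the cooldown trigger and duration are assumptions, and time is passed in explicitly (in milliseconds) so the behaviour is deterministic.

```typescript
// Hypothetical round-robin key pool with per-key cooldown.
class KeyPool {
  private keys: string[];
  private cooldownUntil: number[];
  private next: number = 0;

  constructor(keys: string[]) {
    this.keys = keys.slice();
    this.cooldownUntil = keys.map(() => 0);
  }

  // Return the next key that is not cooling down, or undefined if all are.
  acquire(nowMs: number): string | undefined {
    for (let i = 0; i < this.keys.length; i++) {
      const index = (this.next + i) % this.keys.length;
      if (this.cooldownUntil[index] <= nowMs) {
        this.next = (index + 1) % this.keys.length;
        return this.keys[index];
      }
    }
    return undefined;
  }

  // Put a key on cooldown, for example after a rate-limit response (assumed trigger).
  cool(key: string, nowMs: number, cooldownMs: number): void {
    const index = this.keys.indexOf(key);
    if (index !== -1) this.cooldownUntil[index] = nowMs + cooldownMs;
  }
}
```

A caller would take a key with `acquire(Date.now())`, and call `cool` on that key when a request is throttled so the rotation skips it until the cooldown expires.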

Google Flow inline @-markers (veo-useapi)

useapi.net's Google Flow v1 API (blog 260609) accepts inline @-mention markers in prompt text that anchor a body-slot reference to a position in the prompt:

| Marker | Index range | Endpoint |
|---|---|---|
| @character_N | 1–7 | POST /videos and POST /images |
| @referenceImage_N | 1–7 | POST /videos |
| @referenceAudio_N | 1–5 | POST /videos |
| @reference_N | 1–10 | POST /images |

Markers are case-insensitive and opt-in (a slot without a marker is fine; a marker without a matching body slot makes the API 400). The grammar is reserved through videoclaw's prompt pipeline (@Name tag resolution preserves the tokens verbatim, like @imageN) and is veo-useapi-route-only — on any other route the tokens are stripped from the scene prompt with a warning. V2V deliberately has no marker (referenceVideo_1 stays flag-only via --ref-video). Helper module: src/video/flow-markers.ts; full details in docs/CLI_REFERENCE.md.
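The marker grammar in the table can be checked locally before a request is sent. The sketch below is an illustration only and is not the contents of src/video/flow-markers.ts: `validateMarkers` and `MARKER_RULES` are hypothetical names, and the check covers index ranges and endpoints but not whether a matching body slot exists.

```typescript
type Endpoint = "videos" | "images";

interface MarkerRule {
  max: number;           // highest allowed index (lowest is always 1)
  endpoints: Endpoint[]; // endpoints that accept this marker
}

// Rules copied from the marker table; keys are lower-cased marker kinds.
const MARKER_RULES: { [kind: string]: MarkerRule } = {
  character: { max: 7, endpoints: ["videos", "images"] },
  referenceimage: { max: 7, endpoints: ["videos"] },
  referenceaudio: { max: 5, endpoints: ["videos"] },
  reference: { max: 10, endpoints: ["images"] },
};

// Return a list of problems; an empty list means every marker is in range
// and accepted by the given endpoint. Matching is case-insensitive.
function validateMarkers(prompt: string, endpoint: Endpoint): string[] {
  const problems: string[] = [];
  const pattern = /@(character|referenceImage|referenceAudio|reference)_(\d+)/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(prompt)) !== null) {
    const rule = MARKER_RULES[match[1].toLowerCase()];
    const index = Number(match[2]);
    if (index < 1 || index > rule.max) {
      problems.push(match[0] + ": index must be 1-" + rule.max);
    }
    if (rule.endpoints.indexOf(endpoint) === -1) {
      problems.push(match[0] + ": not accepted by POST /" + endpoint);
    }
  }
  return problems;
}
```

For example, `validateMarkers("@character_1 enters, lit like @referenceImage_2", "videos")` returns an empty list, while `@reference_3` on the videos endpoint is reported.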

Auto-injection: on veo-useapi, when a scene's prompt tags a character that has a registered Flow ref (flow-characters.json), buildExecutionPayload rewrites the @Name tag into its canonical @character_N marker automatically. Slot order = scene cast order first, then tag-only characters (capped at 7, overflow warned); hand-authored markers pass through untouched. Cast-name matching stays exact-case (the legacy lookup, verbatim) — which is exactly why a tagless prompt produces a byte-identical payload; only @Name tag matching is case-insensitive (@clawbot resolves to a registered Clawbot). Note that a ref-registered character's tag no longer also attaches its loose portrait image — the saved Flow character bundles its identity images. Characters without a Flow ref keep the normal descriptor substitution (including portrait collection).

🧰 Command surface

Full reference: docs/CLI_REFERENCE.md. Condensed groups:

Lifecycle

init · create · auto · iterate · run-pipeline · brief · storyboard · assets · review · publish · approve

Readiness · planning · runtime

readiness · plan (alias execution-plan) · produce (alias execute) · execute-status · execute-cancel · director-preflight · storyboard-review · review-ui · review-autopilot

review-ui is the local human-in-the-loop station for the Seedance storyboard workflow documented in docs/REFERENCE_VIDEO_SEEDANCE_MOTION_DESIGN_WORKFLOW.md. Use its director defaults to save a production ledger that follows the still-frame-first, start/end-frame, bridge-pose, variant-pass, and post-retiming recipe.

review-autopilot lets the agent do that handoff without manual clicks once storyboard still candidates exist. It locks the best available stills, promotes artifact-backed upscaled handoff assets, fills the reference and assembly gates, and writes the same review artifacts as the browser station.

Templates · cloning · storyboard templates

analyze · analyze-template · template-create · template-save · template-list · template-show · template-validate · clone-plan · clone-init · clone-ad · clone-execute · storyboard-from-clone · storyboard-template-list · storyboard-template-show

Character subsystem

character-add · character-list · character-show · character-consistency · character-auto-create · environment-auto-create · character-import-library · find-library · library find · library clean · list-library · seedance-register-assets · flow-register-characters · flow-register-voices · voice-clone · show-bible

Portfolio · ops · reporting

list · index · metrics · workload · next-actions · dependencies · status · doctor-project · doctor-portfolio · report · report-snapshot · report-history · report-diff · trends · export-csv · artifact-history · verify-env · cost-estimate

Metadata

set-meta · set-execution-profile · import-legacy

Post-production

remix-narrated · verify-final · make-vertical · make-square · make-loop · thumbnail · archive-project · motion-overlay · subtitle burn-in

Obsidian

export-obsidian · sync-obsidian · scaffold-obsidian-vault

Multi-shot prompt

multi-shot — scaffold / validate / Gemini-author timecoded multi-shot cinematic prompts, with preset discovery, storyboard scene hydration, provider-shaped defaults, and parsed shot artifacts (docs)

Director & brand layer

director-blueprint: validate + persist the project visual bible (artifacts/project-blueprint.json); filmmaking-prompts auto-appends a prose DIRECTOR addendum when present (docs)
brand-definition: validate + persist the locked brand system (artifacts/brand-definition.json: palette/voice/typography/theme map, strict #RRGGBB hex validation); filmmaking-prompts auto-appends a prose BRAND line when present (docs)
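Strict #RRGGBB validation, as mentioned for brand-definition, amounts to one anchored pattern. This is a minimal sketch with hypothetical function names. Accepting lowercase hex digits is an assumption, and the flat name-to-colour palette shape is illustrative rather than the real brand-definition.json layout.

```typescript
// Strict #RRGGBB check: a leading "#" followed by exactly six hex digits.
// Shorthand (#abc), alpha (#RRGGBBAA), and named colours are rejected.
// Assumption: lowercase digits are accepted.
function isStrictHex(value: string): boolean {
  return /^#[0-9a-fA-F]{6}$/.test(value);
}

// Return the names of palette entries that fail the check.
// Assumption: the palette is a flat name -> colour map (illustrative shape).
function invalidPaletteEntries(palette: { [name: string]: string }): string[] {
  const bad: string[] = [];
  for (const name in palette) {
    if (Object.prototype.hasOwnProperty.call(palette, name) && !isStrictHex(palette[name])) {
      bad.push(name);
    }
  }
  return bad;
}
```

Reporting the failing entry names, rather than a single boolean, lets a validator point at the exact field to fix.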

Reference libraries

playbook-list · playbook-show · prompt-lib-list · prompt-lib-show · providers

📖 The story bible & assembly (newest features)

The first two additions below are worth calling out because they fix the two things that most often go wrong in multi-scene AI video: consistency and timing. The entries after them cover the other recent audio, overlay, and finishing commands.

Story bible (artifacts/story-bible.json) — a single, machine-readable "continuity reference" for a project: the cast, settings, props, and the full scene timeline, with per-scene continuity notes. It is auto-written every time the storyboard is created, derived deterministically from the brief + storyboard + character profiles, so downstream generation stays consistent across scenes and regenerations. It spends no credits and calls no providers. Full guide: docs/STORY_BIBLE.md.
Assemble: media QC + narration fit — when vclaw video assemble stitches the final MP4 it now (a) ffprobe-checks every clip and the master for codec / audio / duration problems and folds the findings into the report, and (b) fits the narration to the video — speeding the voice slightly when it's a hair too long, otherwise keeping speech natural and looping the visual bed. Full guide: docs/ASSEMBLE.md.
Soundtrack A/B (vclaw video soundtrack) — generate a music bed candidate from every configured backend (Suno via KIE_API_KEY, Lyria via Vertex, Lyria 3 via a Gemini key, FlowMusic — full vocal songs via Lyria 3 Pro on the shared USEAPI_API_TOKEN, with --lyrics/--instrumental), compare them side-by-side in the preview portal, then --select <backend> to lock the winning track into the project (written to soundtrack.json + the manifest soundtrack field). Dry-run plans the candidates with no keys. Full guide: docs/ASSEMBLE.md.
Narration / TTS (vclaw video narrate) — synthesize a narration clip from a script via a TTS backend (gemini-tts, the Gemini API gemini-2.5-flash-preview-tts model — API-key product, resolves a key from GEMINI_API_KEYS / GOOGLE_API_KEYS / GOOGLE_API_KEY) to artifacts/audio/narration.wav + a typed narration.json. --video-duration-ms embeds a narration-fit timing plan; --dry-run estimates duration with no keys. Full guide: docs/ASSEMBLE.md.
Cartoon-show voice clone (vclaw video voice-clone) — build the "blank video with audio" voice reference (the production-learned voice-cloning trick) so a target voice — including your own — locks into a Seedance/Veo generation. A raw MP3/WAV reference drifts to a generic accent; a black-frame video carrying the same audio locks the voice. Renders that clip from --audio (local ffmpeg, no spend) and persists it in voice-clones.json; --character binds it so the character's scene @Name tags auto-route the clip into Seedance reference_videos. Plan-only by default; --execute renders + persists. --slice-seconds splits the recording per line for drift control. Full guide: docs/CLI_REFERENCE.md.
Cartoon-show bible (vclaw video show-bible) — the repeatable cartoon-SHOW asset-library index (show-bible.json): tie the project's characters + locations + voice clones into one reusable world and track the episode list, so a solo creator can make many consistent episodes. Derives the bible from existing artifacts (auto-binding each character's voice clone), or --from-json to persist an authored one; --add-episode "id|title|logline" merges episodes by id; --show prints without writing. Deterministic, no spend — distinct from story-bible (continuity) and director-blueprint (visual direction). Full guide: docs/CLI_REFERENCE.md.
Diegetic stills (vclaw video gen-image) — generate an in-world prop, on-screen screen (UI/dashboard), or overlay graphic (e.g. a "SYSTEM COMPROMISED" alert) into assets/props/. Three backends via --backend: gobananas (default, GO_BANANAS_API_KEY, no OpenAI key), openai (gpt-image), and flow — Google Flow via useapi.net (USEAPI_API_TOKEN + USEAPI_ACCOUNT_EMAIL): imagen-4 / nano-banana / nano-banana-pro auto-selected by reference count, repeated --ref (reference_1..10, local paths upload first) + --character (character_1..7, names resolve via flow-characters.json) slots, --count/--seed, and inline @reference_N/@character_N prompt markers validated before any upload or spend. Per-kind render directives (screens/overlays keep text, props suppress it); --dry-run prints the composed request with no spend. Composite it onto footage with the assemble overlay builders. Full guide: docs/CLI_REFERENCE.md.
Motion-graphics overlays (vclaw video overlay) — composite a graphic onto a clip (time-gated, faded, positioned — pairs with gen-image to drop a generated screen/alert onto footage, font-free + real-render validated), or burn a pulsing --alert / boxed --lower-third caption (FFmpeg drawtext, needs a libfreetype build). --dry-run prints the planned ffmpeg command. Full guide: docs/CLI_REFERENCE.md.
Motion-overlay reels (vclaw video motion-overlay) — turn an existing talking-head video into a reel with motion-graphics overlays synced to the speech via Google Flow's Omni Flash V2V (kinetic typography / icons / metaphors painted on the footage, original voice preserved). Plan/dry by default — ingest → Gemini STT (or --transcript) → sentence-boundary slice into ≤10s takes → per-take overlay-prompt composition → work folder + manifest + --preview review surface, no spend. --execute --confirm-spend renders each take (V2V → audio-restore → clip-stitch). Four layouts (split / overlay / motion-only / avatar-host); the avatar-host layout fills the frame with an identity-locked character host (go-bananas generate_with_character → Veo I2V) and needs --gb-character <Name:ID>. Full guide: docs/MOTION_OVERLAY.md.
Music videos (vclaw video music-video) — the vocal-synced, beat-exact assembler. From a hand-authored config (song + clip registry + B-roll pools + an explicit vocal map or a transcript it auto-classifies into rap / hook / instrumental / outro by word density), it pins each performer to their own vocal time-aligned across B-roll cutaways (lips stay locked to the muxed song), cuts every segment frame-exact (-frames:v, never -t, so there is zero cumulative drift), then concats + applies one grade pass + muxes the song. Fully local — ffmpeg only, no provider, no spend. Plan/dry by default; --execute renders and asserts the built master matches the plan within one frame. Full guide: docs/CLI_REFERENCE.md.
Music-video titles (vclaw video title-card) — burn the titles you see in music videos (a faded lower-third + a centred end card that holds to EOF) onto a finished cut. Text is rasterized via Pillow + RAQM, so it works on any ffmpeg build (no libfreetype) and any script — including Devanagari/Arabic (vowel marks shape + stack correctly). Each card is a looped PNG input so delayed alpha fades animate. Fully local, no spend; --dry-run plans, omit to render. Full guide: docs/CLI_REFERENCE.md.
HD finish / upscale (vclaw video finish) — upscale a rendered cut to a clean HD master via Topaz (hosted Proteus/Gaia/Starlight through the apiz/xskill aggregator, or a local Topaz CLI). The anti-plastic "detail-not-sharp" recipe (denoise + halo off, film grain kept + clamped to the real 0.1 cap, detail recovery high) avoids waxy skin. Hosted backends are paid → refuses without --confirm-spend (--dry-run plans free); topaz-local is free. Hosted needs APIZ_API_KEY/XSKILL_API_KEY. Full guide: docs/CLI_REFERENCE.md.
Audio-driven lip-sync (vclaw video lipsync) — a still/keyframe + a vocal track → a lip-synced talking-head clip via OmniHuman v1.5 (apiz/xskill). Uploads image+audio → submits → awaits → downloads → normalizes to CFR fps + even dims (OmniHuman's 25fps/odd dims otherwise break frame-accurate seeking). Drives an external vocal (a rapper's verse, a singer's hook) — the lane's way to put a performer's real vocal on their face. Audio cap enforced up front (1080p≤30s, 720p≤60s). Paid → refuses without --confirm-spend (--dry-run plans free); needs APIZ_API_KEY/XSKILL_API_KEY. Full guide: docs/CLI_REFERENCE.md.
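The music-video entry above says segments are cut by frame count (-frames:v) rather than by duration so that no drift accumulates. The arithmetic behind that idea can be sketched as follows. This illustrates the general technique under assumptions and is not the assembler's actual code: `segmentFrameCounts` is a hypothetical helper, and a constant frame rate is assumed.

```typescript
// Convert cut boundaries (in seconds) into per-segment frame counts.
// Each boundary is snapped to the frame grid once, and a segment's length is
// the difference between neighbouring snapped boundaries, so rounding errors
// cannot accumulate from one segment to the next.
// Assumption: constant frame rate; boundaries are sorted ascending.
function segmentFrameCounts(boundariesSec: number[], fps: number): number[] {
  const frames = boundariesSec.map((t) => Math.round(t * fps));
  const counts: number[] = [];
  for (let i = 1; i < frames.length; i++) {
    counts.push(frames[i] - frames[i - 1]);
  }
  return counts;
}
```

With boundaries at 0, 1.01, 2.02, 3.03, and 4.04 seconds at 30 fps, the counts sum to 121 frames, which is exactly the snapped position of the final boundary. Rounding each 1.01-second duration independently would give 30 frames four times, or 120 in total, which is one frame of drift after only four cuts.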

🧠 Skills ecosystem

The repo bundles a curated skills library — agent-invokable workflows split into video (production) and workflow (orchestration) categories. Skills are not equal: a small hierarchy keeps the surface sane.

| Role | Examples | When you reach for it |
|---|---|---|
| Canonical entry | video-framework, brand-presenter | Generic / unspecified video request; the entry skill routes into a specialist. |
| Specialist | video-storyboard, video-clone-ad, movie-director, video-post, ... | The mode is clearly known up front. |
| Compatibility alias | davendra-presenter, nex-presenter, bunty | Personal/brand presets that delegate into brand-presenter. |
| Workflow | doctor, pipeline, worker, studio-mode, ... | Orchestration, debugging, ops; independent of any one production mode. |

Rule of thumb: start at a canonical entry, specialize only when the mode is clearly known.

Quick skill map

| Skill | Role | One-liner |
|---|---|---|
| video-framework | canonical | Routes across copy/create/narrated/presentation/long-form/film/UGC. |
| brand-presenter | canonical (generic) | Slide deck → narrated presenter video over a branded host profile. |
| video-storyboard | native clean-room | Brief or clone plan → scene-by-scene storyboard artifact. |
| video-analyze-template | native clean-room | Reference video → reusable template packet (Gemini auto-mode). |
| video-clone-ad | native clean-room | Saved template → new product/brand via clone-execute. |
| video-thumbnail-lab | native clean-room | Final render → thumbnail + platform variants. |
| movie-director | imported | Multi-scene Director-mode (12 genres, two-phase approval, structured entry modes). |
| video-replicator | imported (deep) | 7-mode legacy pipeline (COPY/CREATE/NARRATED/PRESENTATION/LONG-FORM/FILM/UGC). |
| video-post | imported | Post-render verify, variants, thumbnails, archive. |
| higgsfield-generate | external bridge | Higgsfield CLI bridge for Marketing Studio, product photoshoots, Soul IDs, and virality scoring. |
| character-creator | imported | Go Bananas characters with multi-view reference sheets. |
| character-library | imported | Audit / patch / delete entries in the shared GB library. |
| seedance-prompts | imported | Seedance prompt reference library (incl. music-video patterns). |
| youtube-audio | imported | YouTube → MP3/MP4 via yt-dlp + FFmpeg. |
| ugc | imported | Belief-driven UGC campaign generator (E5 method). |

Compatibility aliases (all delegate into brand-presenter): davendra-presenter · nex-presenter · bunty
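
The rule of thumb and the alias delegation can be pictured as a small lookup. The sketch below is only an illustration in Python: the tables are copied by hand from the skill map above (they are not read from skills/catalog.json), the specialist set is a partial sample, and resolve_skill is a hypothetical helper, not something the repo ships.

```python
# Hand-copied from the skill map above; hypothetical, not loaded from skills/catalog.json.
ALIASES = {
    "davendra-presenter": "brand-presenter",
    "nex-presenter": "brand-presenter",
    "bunty": "brand-presenter",
}
CANONICAL = {"video-framework", "brand-presenter"}
SPECIALISTS = {"video-storyboard", "video-clone-ad", "movie-director", "video-post"}


def resolve_skill(requested=None):
    """Pick a skill name following the rule of thumb.

    - No explicit mode: start at the canonical entry (video-framework).
    - Compatibility alias: delegate into brand-presenter.
    - Known canonical or specialist name: use it directly.
    - Anything else: fail loudly rather than guess.
    """
    if requested is None:
        return "video-framework"
    if requested in ALIASES:
        return ALIASES[requested]
    if requested in CANONICAL or requested in SPECIALISTS:
        return requested
    raise ValueError(f"unknown skill: {requested!r}")


print(resolve_skill())          # video-framework
print(resolve_skill("bunty"))   # brand-presenter
```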

| Group | Skills |
|---|---|
| Multi-agent orchestration | worker · pipeline · studio-mode |
| Diagnostics & exploration | doctor · build-fix · deepsearch · deep-interview |
| Review & governance | review |
| Operational utilities | ai-slop-cleaner · configure-notifications · skill · note · help · web-clone |

Generic orchestration skills (autopilot, ralph/ralph-init/ralplan, team, cancel, trace, hud, git-master, code-review, security-review, omx-setup) were culled — they duplicated the operator's global plugin set with zero repo-specific content.

📖 Full per-skill reference with descriptions, key features, and when-to-reach-for guidance: docs/SKILLS.md · machine-readable index: skills/catalog.json

🗂️ Obsidian operator workspace

The repo writes a vault of machine-generated notes that mirrors canonical project state — dashboards, queues, metrics, health, timelines, dependencies, and per-project notes — all regenerated from one command.

Obsidian is a view, not the source of truth. The repo state on disk is canonical; the vault is a regenerable rendering of it.

What you get

A control plane that isn't a terminal — browse the active queue, blockers, owners, dependencies, and review-state ladder from a normal Obsidian sidebar.
12 dashboard notes — Dashboard, Active, Needs Review, Blocked, Complete, Metrics, Health, Next Actions, Dependencies, Timeline, Changes, Owner Workload.
One project note per project — rich frontmatter (lifecycle state, owner, priority, due risk, blockers, character bindings, review-state, execution profile, genre, runtime) plus body sections for stage status, recent events, artifact links, and cost estimates.
Honest health visibility — backed by the same doctor-portfolio and metrics machinery that drives reporting, including missing-approval and stale-review counts.
Zero lock-in — plain markdown files anywhere you point --output-dir. Delete and rebuild any time.
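
Because the vault is a regenerable rendering of on-disk state, a project note can be thought of as a pure function of that state. The Python sketch below shows the shape of such a renderer under stated assumptions: the field names (slug, lifecycle_state, owner, priority, blockers) are hypothetical stand-ins, and the real frontmatter schema is the one described in docs/OBSIDIAN.md.

```python
import json


def render_project_note(state):
    """Render a plain-markdown note (YAML frontmatter + body) from project state.

    The same input always yields the same text, which is what makes the note
    safe to delete and rebuild.
    """
    front_keys = ["slug", "lifecycle_state", "owner", "priority"]  # hypothetical fields
    lines = ["---"]
    for key in front_keys:
        # json.dumps emits a quoted scalar (or null) that is also valid YAML.
        lines.append(f"{key}: {json.dumps(state.get(key))}")
    lines.append("---")
    lines.append("")
    lines.append(f"# {state['slug']}")
    lines.append("")
    lines.append("## Blockers")
    blockers = state.get("blockers") or ["none"]
    lines.extend(f"- {b}" for b in blockers)
    return "\n".join(lines) + "\n"


note = render_project_note(
    {"slug": "coffee-ad", "lifecycle_state": "needs-review", "owner": "dana",
     "blockers": ["storyboard approval missing"]}
)
print(note)
```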

Three commands

vclaw video scaffold-obsidian-vault --output-dir ./ops/obsidian       # one-time scaffold
vclaw video export-obsidian --project my-project --output-dir ./ops/obsidian/Projects   # single-project export
vclaw video sync-obsidian --root . --output-dir ./ops/obsidian        # full regenerate (the common case)

📖 Full operator guide — vault layout, every dashboard note explained, frontmatter schema, daily loop, common workflows: docs/OBSIDIAN.md

📦 Artifacts & schemas

Every stage writes a canonical JSON artifact under projects/<slug>/artifacts/. Schemas under schemas/video/ are the machine-readable source of truth. Key artifacts:

brief → brief.json
storyboard → storyboard.json (+ optional storyboard.md review)
story bible → story-bible.json (deterministic continuity reference — cast, settings, props, scene timeline, and continuity notes derived from brief + storyboard + character profiles; auto-written at storyboard time so downstream generation stays consistent across scenes and regenerations)
asset manifest → asset-manifest.json
readiness → readiness.json
clone plan → clone-plan.json
execution plan → execution-plan.json
execution report → execution-report.json
review report → review-report.json
publish report → publish-report.json
analyze output → analyze-output.json
character consistency → character-consistency.json

Artifacts are append-only via artifacts/history/ and every write emits a machine-readable event to events/events.jsonl.
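
One way to picture the append-only behaviour is the Python sketch below. It is an assumption-laden illustration, not the CLI's code: the history file naming (name.NNNN.json), the event fields, and the write_artifact helper are all hypothetical. Only the directory names (artifacts/, artifacts/history/, events/events.jsonl) come from the text above.

```python
import json
import shutil
import time
from pathlib import Path


def write_artifact(project_dir, name, data):
    """Write artifacts/<name>.json, archive the previous version, and log an event."""
    project = Path(project_dir)
    artifacts = project / "artifacts"
    history = artifacts / "history"
    events = project / "events"
    history.mkdir(parents=True, exist_ok=True)
    events.mkdir(parents=True, exist_ok=True)

    target = artifacts / f"{name}.json"
    if target.exists():
        # Keep the old version instead of overwriting it (hypothetical naming scheme).
        version = len(list(history.glob(f"{name}.*.json"))) + 1
        shutil.copy2(target, history / f"{name}.{version:04d}.json")

    target.write_text(json.dumps(data, indent=2, sort_keys=True) + "\n", encoding="utf-8")

    # One JSON object per line, appended and never rewritten (hypothetical fields).
    event = {"type": "artifact.written", "artifact": target.name, "ts": time.time()}
    with (events / "events.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
    return target
```

Calling write_artifact twice for the same name leaves the latest content in artifacts/, the earlier content in artifacts/history/, and two lines in events/events.jsonl.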

✅ Testing & smoke matrix

Unit + CLI contract tests run via node:test:

npm test                         # build + full suite
npm run test:node                # rerun compiled tests only
node --test dist/tests/cli-full-flow.test.js      # single test file

Reproducible end-to-end smokes — each builds first:

| Command | Covers | Run after |
|---|---|---|
| npm run smoke:runtime | init → brief → storyboard → assets → plan → produce dry-run → status → report → Obsidian | runtime / artifact changes |
| npm run smoke:native-veo | Native Veo (veo-useapi) path | changes to built-in Veo path |
| npm run smoke:character-hydration | Create-time cast hydration + approval-gate cost | character-profile or cost changes |
| npm run smoke:execution-cancel | Submit → cancel → failed-assets transition | adapter cancel or project cancel changes |
| npm run smoke:portfolio | init → brief → storyboard → plan → index → report → export-csv | index/report/CSV visibility changes |
| npm run smoke:story-bible-image | create → storyboard continuity bible + content-fix propagation, image-only path | story-bible artifact or storyboard-time continuity changes |
| npm run e2e:image-storyboard | Go Bananas still manifest → local image assets → scene candidates → selections → reference sheet → readiness/preflight/plan plus Review UI API checks for request queues, candidate recording, media proxy, artifact-backed upscales, and final review decision, with no video generation. Uses a temporary root unless --root is passed to the script. | review UI or image-storyboard workflow changes |
| npm run e2e:image-storyboard:examples | Image-storyboard E2E plus non-video example smokes; writes a human-readable prompt, command, API, and artifact ledger in the run root. Uses a temporary root by default. | before testing live provider credentials |

Guardrails — fast local sanity checks:

| Command | Watches |
|---|---|
| npm run check:movie-director-wrappers | Bundled Director helper scripts target the clean-room CLI |
| npm run check:cleanroom-docs | Clean-room-facing docs/skills don't reference stale legacy paths |
| npm run check:skill-frontdoor | Repo-local skill front door stays consistent |
| npm run check:release-readiness-lite | One-shot: generated-artifact ignore guard + build + tests + smokes + isolated image-storyboard E2E + guardrails |

Use check:release-readiness-lite as the pre-flight before any non-trivial change lands. It also checks that local verification output folders, Playwright state, and Review UI screenshots stay ignored so release diffs remain source-only unless a fixture update is intentional.

📍 What's shipped

The master plan has 50+ implemented slices across lifecycle state, ops visibility, execution planning, adapter-backed runtime, character hydration, director approval gate, review-freshness enforcement, create-time parity, cost visibility, environment verification, post-production utilities, and packaged release-readiness. Themes:

Lifecycle & contracts — canonical artifacts, stage checkpoints, pipeline manifests, stage guards, legacy import bridge.
Portfolio ops — index · metrics · next-actions · workload · dependencies · readiness · doctor · scorecards.
Reporting — report · snapshots · history · diffs · trends · CSV export · Obsidian export/sync/dashboards.
Runtime — adapter submission, dry-run, polling, output ingest, native Seedance transport, native Veo transport, execution cancel.
Character subsystem — project profiles, GB-id anchors, consistency enforcement, library hygiene, auto-create, import-library, cast provenance.
Director lane — storyboard-first approval, preflight hazards, review freshness ladder, review-as-runtime-invariant, cost visibility.
Continuity & assembly — deterministic story bible at storyboard time, assemble media-QC (ffprobe clips + master), and narration-fit timing planner.
Prompt-quality preflight — six Seedance-handbook anti-pattern checks (adjective soup, multiple actions, multiple camera moves, style-word overload, literary emotion, overlong prompts) via director-preflight; warnings by default, DIRECTOR_STRICT_PROMPT_QUALITY=1 to block.
Dialogue preflight — duration-aware dialogue fit checks via director-preflight; warnings by default, DIRECTOR_STRICT_DIALOGUE_FIT=1 to block.
Reference sheets — role-tagged sheets (identity, outfit-material, environment, motion-camera, palette-mood) with closed role vocabularies, per-scene bindings, Go Bananas refs, and identity-per-character-bound-scene enforcement in director readiness/preflight.
Scene candidates — per-scene append-only candidate registry + mutable selection ledger, produce --scene <n> partial reruns, chain-from-prev with hard-fail on missing upstream, selection-coverage stage guards on review/publish, per-scene Obsidian notes, and a migration helper for legacy single-generation projects.
Generation telemetry — route/task/config/cost/timing/output events recorded after execute and poll, with completed Seedance USD samples feeding cost-estimate.
Clone workflow enrichment — analyze/template/clone artifacts carry style layers, beat compression, technical notes, dialogue notes, and workflow checklists.
Front door — video create / auto / iterate / run-pipeline / approve with genre-aware defaults.
Release-readiness — packaged one-build smoke bundle + doc guardrails.
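
The warn-by-default, block-in-strict-mode behaviour of the prompt-quality preflight can be sketched as follows. This Python example is illustrative only: it covers two of the six anti-patterns, and the word limit, the camera-move vocabulary, and the function names are assumptions rather than the thresholds director-preflight actually uses. The environment variable name is the one given above.

```python
import os
import re

# Hypothetical threshold and vocabulary; the real checks live behind director-preflight.
MAX_WORDS = 60
CAMERA_MOVES = ("pan", "tilt", "zoom", "dolly", "orbit", "crane")


def check_prompt(prompt):
    """Return warning strings for two anti-patterns: overlong prompts and multiple camera moves."""
    warnings = []
    words = re.findall(r"[A-Za-z']+", prompt.lower())
    if len(words) > MAX_WORDS:
        warnings.append(f"overlong prompt: {len(words)} words > {MAX_WORDS}")
    moves = sorted({w for w in words if w in CAMERA_MOVES})
    if len(moves) > 1:
        warnings.append("multiple camera moves: " + ", ".join(moves))
    return warnings


def preflight(prompt, env=None):
    """Warn by default; block only when DIRECTOR_STRICT_PROMPT_QUALITY=1."""
    env = os.environ if env is None else env
    warnings = check_prompt(prompt)
    strict = env.get("DIRECTOR_STRICT_PROMPT_QUALITY") == "1"
    return {"ok": not (strict and warnings), "warnings": warnings}


prompt = "Slow pan across the bar, then zoom into the cup"
print(preflight(prompt, env={}))
print(preflight(prompt, env={"DIRECTOR_STRICT_PROMPT_QUALITY": "1"}))
```

In the default case the result carries the warning but stays ok; with the strict variable set, the same warning flips ok to False.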

Honest remaining gaps (tracked in docs/MASTER_PLAN_ALIGNMENT.md):

Deeper video create parity with the legacy Director/movie surface (richer decomposition/tuning).
Historical project migration from shallow stage-guess → structured reconciliation.
Provider-contract hardening (shared error taxonomy, richer recovery guidance).

Current status: npm test green · check:release-readiness-lite passing.

📚 Documentation map

Read it on the site: everything below is rendered (with search + diagrams) at videoclaw-docs.vercel.app — jump to the Guide, a page for every feature, the skills catalog, or the full reference. The links below open the same docs on the live site; each also exists as raw markdown under docs/.

| # | Doc | What it gives you |
|---|---|---|
| 1 | Production workflow | Operator-first workflow: make, review/fix, and manage video projects |
| 2 | Architecture | Layer map + canonical flow |
| 3 | CLI reference | Full command reference |
| 4 | Skills | Comprehensive per-skill reference with features and when-to-reach-for guidance |
| 5 | Story bible | Story bible reference: the deterministic continuity artifact, when it's generated, its shape, and how downstream stages use it |
| 6 | Assemble | vclaw video assemble operator guide: pipeline stages, media QC, narration fit, API keys, dry-run vs real render, validation status |
| 7 | Obsidian | Obsidian operator workspace deep guide: vault layout, dashboard notes, frontmatter schema, daily loop |
| 8 | Reference sheets | Reference sheets operator guide: 5 sheet types, role vocabularies, CLI commands, readiness/preflight semantics, GB integration |
| 9 | Scene candidates | Scene candidates operator guide: append-only candidates + mutable selection, 9 CLI commands, partial reruns, chain-from-prev, migration |
| 10 | Prompt quality | Prompt-quality preflight operator guide: Seedance-handbook anti-pattern checks, thresholds, strict-mode blocking |
| 11 | Generation telemetry | Generation event ledger + cost-estimate telemetry behavior |
| 12 | Operations | Day-to-day maintenance loop |
| 13 | Templates | Template store + clone bridge |
| 14 | Migration | Legacy → clean-room moves |
| 15 | Deprecation | Alias + deprecation status |
| 16 | Release readiness | Release checklist |
| 17 | Master plan alignment | What's shipped + remaining gaps |

Skill deep-dives (also indexed in docs/SKILLS.md):

skills/video-framework/SKILL.md · skills/brand-presenter/SKILL.md — canonical entries
skills/video-storyboard/SKILL.md · skills/video-clone-ad/SKILL.md · skills/video-analyze-template/SKILL.md — clean-room native specialists
Full catalog in skills/README.md · machine-readable in skills/catalog.json

🧬 Where it came from

videoclaw (the current videoclaw-v3 repo) is the merged successor of two predecessor codebases: the older videoclaw package (which had a heavy orchestration layer on top of a video pipeline) and the clean-room vclaw-video-core rebuild (which kept only the pipeline, with strict on-disk artifacts and approval gates). This repo takes the clean-room core as its foundation, drops the old orchestration layer (Claude Code, Codex, and the OMC plugin cover those workflows natively now), and ports forward the pieces worth keeping: the vclaw-cli Bun package for Google Flow + UseAPI, the Runway transport, a curated Python pipeline, and the Omni Flash backend additions. See MERGE_PLAN.md for the full rationale and per-phase commits.

🧭 Principles

Clean-room implementation only. No code is inherited from the legacy repo; every module is written against an explicit contract.
Video-first command surface. One vclaw video ... namespace that mirrors the production flow.
Explicit stage artifacts. Every stage writes machine-readable JSON; the state on disk is the source of truth.
No silent fallback across materially different provider paths. Fail hard and say what went wrong.
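
The "no silent fallback" principle can be made concrete with a short Python sketch. The provider names, the configured mapping, and select_provider are hypothetical and stand in for whatever routing the CLI really performs; the point is only that an unavailable provider produces an explicit error instead of a quiet switch to a different one.

```python
class ProviderUnavailable(RuntimeError):
    """Raised when the requested provider cannot be used."""


def select_provider(requested, configured):
    """Return the requested provider or fail; never substitute a different one.

    `configured` maps a provider name to True when its credentials are present.
    The names are illustrative labels, not vclaw's routing keys.
    """
    if requested not in configured:
        raise ProviderUnavailable(
            f"unknown provider {requested!r}; known: {sorted(configured)}")
    if not configured[requested]:
        raise ProviderUnavailable(
            f"provider {requested!r} is not configured; "
            "refusing to fall back to another provider")
    return requested


configured = {"veo": True, "runway": False}
print(select_provider("veo", configured))  # veo
try:
    select_provider("runway", configured)
except ProviderUnavailable as exc:
    print(f"failed hard: {exc}")
```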

🤝 Contributing

Read AGENTS.md for the autonomy directive, coding style, and commit protocol, then CLAUDE.md for non-obvious conventions (NodeNext ESM .js extensions, review-state ladder, dist/ is generated, etc.). The expected pre-flight before a non-trivial change lands:

npm run check:release-readiness-lite

🪪 License

Source-available under a custom proprietary license — see LICENSE.

✅ Free for personal, educational, research, evaluation, and non-commercial internal use
💼 Commercial / production use requires a paid license — contact the repository owner via github.com/davendra/videoclaw
