@argo-video/cli
v0.13.0
Turn Playwright demo scripts into polished product demo videos with AI voiceover.
Write a demo script with Playwright. Add a scenes manifest. Run one command. Get an MP4 with overlays and narration.
Showcase
This demo was recorded by Argo, using Argo. Yes, really.
How it works
TTS Record Align Export
─── ────── ───── ──────
Kokoro Playwright Place clips ffmpeg
generates captures at scene merges
voice browser + timestamps video +
clips scene marks audio
│ │
▼ ▼
.scenes.json → narration-aligned.wav → final.mp4

Quick start
# Install
npm i -D @argo-video/cli
# Initialize project
npx argo init
# Edit your demo script (or convert an existing Playwright test)
vim demos/example.demo.ts
npx argo init --from tests/checkout.spec.ts # auto-convert
# Run the full pipeline
npx argo pipeline example
# Or run steps individually
npx argo record example
npx argo tts generate demos/example.scenes.json
npx argo export example

Writing a demo
A demo is two files: a script and a scenes manifest.
Demo script (demos/my-feature.demo.ts)
import { test } from '@argo-video/cli';
import { showOverlay, withOverlay } from '@argo-video/cli';
test('my-feature', async ({ page, narration }) => {
await page.goto('/');
narration.mark('intro');
await showOverlay(page, 'intro', narration.durationFor('intro'));
narration.mark('action');
await withOverlay(page, 'action', async () => {
await page.click('#get-started');
await page.waitForTimeout(narration.durationFor('action'));
});
narration.mark('done');
await showOverlay(page, 'done', narration.durationFor('done'));
});

Scenes manifest (demos/my-feature.scenes.json)
[
{
"scene": "intro",
"text": "Welcome to our product — let me show you around.",
"overlay": { "type": "lower-third", "text": "Welcome to our product", "placement": "bottom-center", "motion": "fade-in", "autoBackground": true }
},
{
"scene": "action",
"text": "Just click get started and you're off.",
"overlay": { "type": "headline-card", "title": "Watch this", "placement": "top-right", "motion": "slide-in" }
},
{
"scene": "done",
"text": "And that's all there is to it.",
"voice": "af_heart",
"overlay": { "type": "callout", "text": "That's it!", "placement": "top-left", "motion": "fade-in" }
}
]

Each scene in the manifest maps to a narration.mark() call in the script. The text field is the spoken narration; the optional overlay object defines what appears on screen. Argo records the timestamp of each mark, generates TTS clips, and aligns them to produce the final narrated video.
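This one-to-one mapping between manifest scenes and mark() calls is what argo validate checks. As an illustrative sketch (not Argo's actual implementation), the core of such a check could look like this:

```typescript
// Illustrative sketch of a scene-consistency check in the spirit of
// `argo validate` (hypothetical code, not Argo's implementation): every
// scene named in the manifest should have a matching narration.mark('<scene>')
// call somewhere in the demo script source.
interface SceneEntry {
  scene: string;
}

function findUnmarkedScenes(scriptSource: string, manifest: SceneEntry[]): string[] {
  // Collect every scene name passed to narration.mark(...)
  const marked = new Set(
    [...scriptSource.matchAll(/narration\.mark\(\s*['"`]([^'"`]+)['"`]\s*\)/g)].map((m) => m[1]),
  );
  // Report manifest scenes with no corresponding mark() in the script
  return manifest.map((s) => s.scene).filter((name) => !marked.has(name));
}
```

A scene that appears in the manifest but is never marked would produce silent narration with no timestamp to align to, which is why catching it before recording is useful.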
Configuration
argo.config.mjs
import { defineConfig } from '@argo-video/cli';
export default defineConfig({
baseURL: 'http://localhost:3000',
demosDir: 'demos',
outputDir: 'videos',
tts: { defaultVoice: 'af_heart', defaultSpeed: 1.0 },
video: {
width: 1920, height: 1080, fps: 30,
browser: 'webkit', // webkit > firefox > chromium on macOS
// deviceScaleFactor: 2, // enable after webkit 2x fix
},
export: { preset: 'slow', crf: 16 },
overlays: {
autoBackground: true,
// defaultPlacement: 'top-right',
},
});

Tip: Use browser: 'webkit' for sharper video on macOS; Chromium has a known video-capture quality issue. Set deviceScaleFactor: 2 for retina-quality recordings (captured at 2x, downscaled with lanczos during export).
playwright.config.ts
Argo scaffolds this for you via argo init. The key settings:
import { defineConfig } from '@playwright/test';
import config from './argo.config.mjs';
const scale = Math.max(1, Math.round(config.video?.deviceScaleFactor ?? 1));
const width = config.video?.width ?? 1920;
const height = config.video?.height ?? 1080;
export default defineConfig({
preserveOutput: 'always',
projects: [{
name: 'demos',
testDir: 'demos',
testMatch: '**/*.demo.ts',
use: {
browserName: config.video?.browser ?? 'chromium',
baseURL: process.env.BASE_URL || config.baseURL || 'http://localhost:3000',
viewport: { width, height },
deviceScaleFactor: scale,
video: { mode: 'on', size: { width: width * scale, height: height * scale } },
},
}],
});

CLI
argo init Scaffold demo files + config
argo init --from <test> Convert Playwright test to Argo demo
argo record <demo> Record browser session
argo tts generate <manifest> Generate TTS clips from manifest
argo export <demo> Merge video + audio to MP4
argo pipeline <demo> Run all steps end-to-end
argo validate <demo> Check scene name consistency (no TTS/recording)
argo preview <demo> Browser-based editor for voiceover, overlays, timing
argo doctor Check environment (ffmpeg, Playwright, config)
argo --config <path> <command> Use a custom config file
Options:
--browser <engine> chromium | webkit | firefox (overrides config)
--base-url <url> Override baseURL from config
--headed Run browser in visible mode
--port <number>       Preview server port (default: auto)

API
Argo exports Playwright fixtures and helpers for use in demo scripts:
import { test, expect, demoType } from '@argo-video/cli';
import { showOverlay, hideOverlay, withOverlay } from '@argo-video/cli';
import { showConfetti } from '@argo-video/cli';
import { spotlight, focusRing, dimAround, zoomTo, resetCamera } from '@argo-video/cli';
import { showCaption, hideCaption, withCaption } from '@argo-video/cli';
import { defineConfig, demosProject, engines } from '@argo-video/cli';

| Export | Description |
|--------|-------------|
| test | Playwright test with narration fixture injected |
| expect | Re-exported from Playwright |
| demoType(page, selectorOrLocator, text, delay?) | Type character-by-character — accepts CSS selector or Playwright Locator |
| showOverlay(page, scene, durationMs) | Show overlay from manifest for a fixed duration |
| showOverlay(page, scene, cue, durationMs) | Show overlay with inline cue (backward compat) |
| withOverlay(page, scene, action) | Show overlay from manifest during an async action |
| withOverlay(page, scene, cue, action) | Show overlay with inline cue during action (backward compat) |
| hideOverlay(page, zone?) | Remove overlay from a zone |
| showConfetti(page, opts?) | Non-blocking confetti animation (spread: 'burst' \| 'rain', emoji: '🎃' or emoji: ['🎄', '⭐'] for emoji mode, wait: true to block) |
| spotlight(page, selector, opts?) | Dark overlay with hole around target element |
| focusRing(page, selector, opts?) | Pulsing glow border on target |
| dimAround(page, selector, opts?) | Fade sibling elements to highlight target |
| zoomTo(page, selector, opts?) | Scale viewport centered on target |
| resetCamera(page) | Clear all active camera effects |
| showCaption(page, scene, text, durationMs) | Show a simple text caption |
| withCaption(page, scene, text, action) | Show caption during an async action |
| hideCaption(page) | Remove caption |
| narration.mark(scene) | Record a scene timestamp |
| narration.durationFor(scene, opts?) | Compute hold duration from TTS clip length |
| defineConfig(userConfig) | Create config with defaults |
| demosProject(options) | Create Playwright project entry |
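Of the helpers above, demoType is the one with pacing logic: it emits characters one at a time with a delay between keystrokes. Here is a standalone sketch of that timing pattern; the callback stands in for the real Playwright page/Locator, which is my simplification:

```typescript
// Standalone sketch of demoType-style character-by-character pacing
// (illustrative only; the real helper drives a Playwright page or Locator,
// not a callback).
async function typeChars(
  text: string,
  onChar: (ch: string) => Promise<void>,
  delayMs = 50,
): Promise<string> {
  let typed = '';
  for (const ch of text) {
    await onChar(ch); // emit one character
    typed += ch;
    // human-ish pause between keystrokes
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return typed;
}
```

The per-character delay is what makes typing legible on video; instant fills read as a cut rather than an action.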
Requirements
- Node.js >= 18
- Playwright >= 1.40 (peer dependency)
- ffmpeg — system install required for export
# Install ffmpeg
brew install ffmpeg # macOS
apt install ffmpeg # Linux
choco install ffmpeg   # Windows

How the pipeline works
TTS — Generates WAV clips from the scenes manifest. Kokoro is the default (local, free), but you can swap in OpenAI, ElevenLabs, Gemini, Sarvam, or mlx-audio via engines.* factories. Clips are cached by content hash in .argo/<demo>/clips/.

import { defineConfig, engines } from '@argo-video/cli';
export default defineConfig({
  tts: { engine: engines.openai({ model: 'tts-1-hd' }) },
});

| Engine | Type | Install | API Key |
|--------|------|---------|---------|
| engines.kokoro() | local | built-in | none |
| engines.mlxAudio() | local | pip install mlx-audio | none |
| engines.openai() | cloud | npm i openai | OPENAI_API_KEY |
| engines.elevenlabs() | cloud | npm i @elevenlabs/elevenlabs-js | ELEVENLABS_API_KEY |
| engines.gemini() | cloud | npm i @google/genai | GEMINI_API_KEY |
| engines.sarvam() | cloud | npm i sarvamai | SARVAM_API_KEY |
| engines.transformers() | local | built-in | none |

Transformers.js — Use any HuggingFace text-to-speech model locally: Supertonic, or any future ONNX TTS model:

tts: {
  engine: engines.transformers({
    model: 'onnx-community/Supertonic-TTS-ONNX',
    speakerEmbeddings: 'https://huggingface.co/.../voices/F1.bin',
    numInferenceSteps: 10,
  }),
}

Voice cloning — Clone your own voice locally with mlx-audio. Record a 15-second clip, and every demo sounds like you — privately; no data leaves your machine:

# Record a reference clip (macOS)
bash $(npm root)/@argo-video/cli/scripts/record-voice-ref.sh assets/ref-voice.wav

# Preview the cloned voice against your manifest
bash $(npm root)/@argo-video/cli/scripts/voice-clone-preview.sh \
  --ref-audio assets/ref-voice.wav \
  --ref-text "Transcript of what I said." \
  --voiceover demos/showcase.scenes.json --play

tts: {
  engine: engines.mlxAudio({
    model: 'mlx-community/Qwen3-TTS-12Hz-0.6B-Base-bf16',
    refAudio: './assets/ref-voice.wav',
    refText: 'Transcript of what I said in the clip.',
  }),
}

Record — Playwright runs the demo script in a real browser. The narration fixture records timestamps for each mark() call. Video is captured at native resolution.

Align — Each TTS clip is placed at its scene's recorded timestamp. Overlapping clips are pushed forward with a 100ms gap. All clips are mixed into a single narration-aligned.wav.

Export — ffmpeg combines the screen recording (WebM) with the aligned narration (WAV) into an H.264 MP4 with chapter markers. Subtitle files (.srt + .vtt) and a scene report are generated alongside the video.
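The align step's placement rule can be sketched in a few lines. This is an illustrative reimplementation of the stated behavior (start each clip at its mark timestamp, push overlapping clips forward with a 100 ms gap), not Argo's actual code:

```typescript
interface Clip {
  scene: string;
  startMs: number;
  durationMs: number;
}

const GAP_MS = 100; // minimum silence enforced between consecutive clips

// Place each scene's TTS clip at its recorded mark timestamp; if a clip would
// overlap the previous one, push it forward so at least GAP_MS remains.
function alignClips(
  marks: Record<string, number>, // scene -> recorded timestamp (ms)
  durations: Record<string, number>, // scene -> TTS clip length (ms)
): Clip[] {
  const scenes = Object.keys(marks).sort((a, b) => marks[a] - marks[b]);
  const clips: Clip[] = [];
  let cursor = 0; // earliest allowed start for the next clip
  for (const scene of scenes) {
    const startMs = Math.max(marks[scene], cursor);
    clips.push({ scene, startMs, durationMs: durations[scene] });
    cursor = startMs + durations[scene] + GAP_MS;
  }
  return clips;
}
```

Pushing clips forward (rather than overlapping or truncating them) is why a scene whose narration runs long simply delays the next scene's audio instead of talking over it.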
Example
A self-contained example is in example/ — it records a demo of Argo's own showcase page:
cd example && npm install && npx playwright install webkit
npm run serve # in one terminal
npm run demo    # in another

LLM Skill
Argo ships as a Claude Code skill so LLMs can create demo videos autonomously. Install it as a plugin:
# In Claude Code
/plugin marketplace add shreyaskarnik/argo

The skill teaches Claude how to write demo scripts, scenes manifests, and overlay cues, and how to run the pipeline — no manual guidance needed.
License
MIT
