framevox
v0.4.2
Video production CLI — HyperFrames compositions + Gemini, Piper, ElevenLabs TTS
FrameVOX
Video production CLI — a thin wrapper around HyperFrames (HTML → MP4) plus TTS (Gemini, Piper, ElevenLabs).
Framevox does not replace HyperFrames. It bundles the pieces agents and humans trip over: scaffold from templates, voice generation with MD5 sanity checks, lint-before-render, API key storage, and production docs.
Without framevox vs with FrameVOX
Manual (HyperFrames + curl + ffmpeg):
# scaffold — copy files by hand or hyperframes init, then wire assets yourself
mkdir my-promo && cd my-promo
# …create index.html, voice.json, logos, timing…
# voice — curl Gemini, decode base64 PCM, ffmpeg to MP3, hope the API didn't fail silently
GEMINI_API_KEY=… curl -s -X POST "https://generativelanguage.googleapis.com/…" -d '…' -o /tmp/res.json
jq -r '…' /tmp/res.json | base64 -d > /tmp/voice.pcm
ffmpeg -y -f s16le -ar 24000 -ac 1 -i /tmp/voice.pcm -codec:a libmp3lame voice.mp3
# render — separate lint step, ensure hyperframes is installed
npx hyperframes lint
npx hyperframes render --output output.mp4
With FrameVOX:
npx framevox init my-promo --template minimal-mobile
npx framevox add-key gemini YOUR_GEMINI_KEY
npx framevox voice
npx framevox render
Same output class: one project folder, one voice.mp3, one MP4. Framevox collapses ~10 manual steps (and several failure modes) into four commands. HyperFrames is installed automatically as a dependency.
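The manual path above decodes base64 PCM and passes it to ffmpeg as s16le, 24000 Hz, mono. One way to catch a silently failed API call before encoding is to compute the clip duration from the byte count. The sketch below is illustrative only: `pcmDurationSeconds` is a hypothetical helper, not a framevox function, and it assumes the same PCM format as the ffmpeg command shown.

```javascript
// Hypothetical helper: estimate the duration of raw PCM delivered as base64.
// Assumes the format from the manual ffmpeg command above:
// signed 16-bit samples (2 bytes each), 24000 Hz, 1 channel.
function pcmDurationSeconds(base64Pcm, sampleRate = 24000, channels = 1) {
  const bytes = Buffer.from(base64Pcm, "base64").length;
  const bytesPerSample = 2; // s16le
  return bytes / (sampleRate * channels * bytesPerSample);
}

// One second of silence: 24000 samples * 2 bytes = 48000 zero bytes.
const oneSecond = Buffer.alloc(48000).toString("base64");
console.log(pcmDurationSeconds(oneSecond)); // 1
console.log(pcmDurationSeconds(""));        // 0, a hint that no audio came back
```

A duration of 0, or one far shorter than the script, suggests the response carried no usable audio.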
CLI vs skill (two layers)
| Layer | What it is | Framevox | HyperFrames |
| --------- | ------------------------------- | ------------------------------------ | --------------------------------------------- |
| CLI | npm executable (npx …) | init, voice, render, templates | lint, inspect, render, preview |
| Skill | Agent instructions (SKILL.md) | workflow, templates, TTS keys | HTML composition rules, GSAP, data-* timing |
Framevox is skill + CLI. It bundles the HyperFrames CLI as a dependency. It does not replace the HyperFrames skill — agents still need that to edit index.html correctly.
After installing FrameVOX, run once:
npx framevox setup
Framevox detects which agent apps you have (Claude Code, Cursor, Codex, Antigravity, OpenCode) and installs skills only there; no APX required. Skills land in each app's global skills dir (e.g. ~/.cursor/skills/framevox).
npm install -g framevox prints a first-run hint. After upgrades, framevox update reinstalls from npm and syncs the framevox skill to detected apps. Check state anytime:
framevox status
framevox update --check
Quick start
npx framevox init my-promo --template minimal-mobile
npx framevox add-key gemini YOUR_GEMINI_KEY
npx framevox voice
npx framevox render
Templates
Template layers: project .framevox/templates/ → user ~/.framevox/templates/ → builtin.
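The layer order can be pictured as a first-match lookup. This is a minimal sketch under stated assumptions: `resolveTemplate` and the `layers` object are hypothetical, and each layer is modelled as a plain list of names, whereas the CLI presumably reads the directories themselves.

```javascript
// Hypothetical sketch of layered lookup: project > user > builtin.
// The first layer that contains the template name wins.
function resolveTemplate(name, layers) {
  const order = ["project", "user", "builtin"];
  for (const layer of order) {
    if ((layers[layer] ?? []).includes(name)) return layer;
  }
  return null; // not found in any layer
}

const layers = {
  project: ["promo-mobile"],                 // .framevox/templates/
  user: ["promo-mobile", "studio-desktop"],  // ~/.framevox/templates/
  builtin: ["minimal-mobile", "promo-mobile", "studio-desktop"],
};
console.log(resolveTemplate("promo-mobile", layers));   // project
console.log(resolveTemplate("studio-desktop", layers)); // user
console.log(resolveTemplate("minimal-mobile", layers)); // builtin
```

Under this model, a project-level copy shadows a user or builtin template of the same name.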
Builtin templates are grouped by family — each family shares one style.css and ships mobile + desktop variants:
templates/promo/
├── style.css ← shared by mobile + desktop
├── family.json
├── mobile/ ← init as promo-mobile
└── desktop/ ← init as promo-desktop
framevox templates # list (project > user > builtin)
framevox templates --json # for agents
framevox templates add promo # copy entire family into .framevox/templates/
framevox templates install promo # install family globally
framevox init my-reel --template promo-mobile
Each builtin variant includes preview.mp4. Show it to users before they choose.
| Family | Variants | Duration | Scenes | Demo language | CTA style · Example |
| ----------- | -------------------------- | -------- | ------ | -------------- | ----------------------------------------- |
| minimal | minimal-mobile/desktop | 17s | 3 | Español (AR) | Free forever · Ledgerly (invoicing) |
| promo | promo-mobile/desktop | 18s | 5 | English | Product launch · Crewdesk (scheduling)|
| studio | studio-mobile/desktop | 18s | 5 | Español (AR) | Breaking news · Bitpulse (crypto) |
| immersive | immersive-mobile/desktop | 18s | 5 | Français | Atelier (creative studio) · hero photo bg |
Brand-specific layouts (e.g. Appsi) belong in the project's .framevox/templates/ — copy a family and customize there.
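The family and variant naming (promo-mobile, promo-desktop) maps onto the directory tree shown earlier. A small sketch of that mapping follows; `parseTemplateId` is a hypothetical helper and the returned paths simply mirror the builtin layout described in this README, not a documented API.

```javascript
// Hypothetical helper: split "promo-mobile" into family and variant and
// derive the builtin paths from the tree shown earlier.
function parseTemplateId(id) {
  const match = /^(.+)-(mobile|desktop)$/.exec(id);
  if (!match) return null; // not a family-variant id
  const [, family, variant] = match;
  return {
    family,
    variant,
    dir: `templates/${family}/${variant}/`,
    sharedCss: `templates/${family}/style.css`, // shared by both variants
  };
}

console.log(parseTemplateId("promo-mobile"));
console.log(parseTemplateId("promo")); // null
```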
TTS Providers
| Command | Effect |
| ----------------------------------- | ----------------------------- |
| framevox add-key gemini KEY | Gemini 2.5 Flash TTS (remote) |
| framevox add-key elevenlabs KEY | ElevenLabs (remote) |
| framevox add-key piper-voice NAME | Piper (local, offline) |
| framevox keys | Show all configured providers |
Keys live in ~/.framevox/.env (never commit this file). Set one with framevox add-key gemini YOUR_KEY.
Gemini also falls back to ~/.claude/skills/video-docs-builder/.env.
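For scripts that need to check which keys are present, a .env file can be read as simple KEY=VALUE lines. The sketch below is an assumption-laden illustration: `parseEnv` is hypothetical, and the only variable name confirmed by this README is GEMINI_API_KEY (from the manual curl example); the names framevox actually writes may differ.

```javascript
// Hypothetical parser for a KEY=VALUE file such as ~/.framevox/.env.
// Variable names are assumptions; check the real file before relying on them.
function parseEnv(text) {
  const env = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const eq = line.indexOf("=");
    if (eq === -1) continue; // ignore malformed lines
    const key = line.slice(0, eq).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const value = line.slice(eq + 1).trim().replace(/^(['"])(.*)\1$/, "$2");
    env[key] = value;
  }
  return env;
}

const sample = '# local secrets\nGEMINI_API_KEY="abc123"\n\nnot a pair\n';
console.log(Object.keys(parseEnv(sample))); // [ 'GEMINI_API_KEY' ]
```

Printing only the key names, as above, avoids leaking secret values into logs.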
Requirements
- Node.js ≥ 22
- ffmpeg (for PCM→MP3 conversion)
- Chrome/Chromium (HyperFrames uses headless Chrome for rendering)
- piper binary (only if using Piper provider)
Project structure after init
my-promo/
├── index.html ← composition (edit BRAND/COPY comments)
├── voice.json ← voice script (prompt + text or scenes)
├── DESIGN.md ← fill brand colors + product info first
├── assets/ ← logo.png, images
├── voice.mp3 ← generated by framevox voice
├── output.mp4 ← generated by framevox render
├── splash.png ← splash frame (auto after render; or framevox extract splash)
├── RECIPE.md ← generated by framevox recipe
└── .framevox/
    └── config.json ← provider, format, last render metadata
All commands
framevox init [name] [--template T] # scaffold project
framevox voice [--provider P] [--voice V] [--scene id] # --scene = regen one multi-scene segment
framevox render [--out file] [--quality Q] [--no-lint] [--no-splash] [--splash-out file] # lint + render; splash.png by default
framevox extract splash [--out file] [--from mp4] # splash PNG from output.mp4 or composition snapshot
framevox lint # lint only
framevox preview # open studio in browser
framevox recipe [title] # generate RECIPE.md
framevox add-key <name> <value> # store API key
framevox keys # list key status
framevox templates # list templates
framevox templates add <name> # copy to .framevox/templates/
framevox templates install <name> # install to ~/.framevox/templates/
framevox status # install state + detected agents
framevox setup # first-time setup (detected agent apps)
framevox setup --skip-hf-skills # sync framevox skill only
framevox update # npm update + skill sync
framevox update --check # check for newer npm version
Voice script format (voice.json)
One file for accent, tone, and spoken copy:
{
"prompt": "Leé en español rioplatense, tono cálido y conversacional:",
"text": "Bueno che, mirá esto... [eloquent]¡Es una locura![/eloquent] ... [sad]Y después todo cambió.[/sad]"
}
prompt: style guide only (accent, locale, pace). Framevox merges built-in delivery rules for Gemini automatically.
text: single audio (promos ≤ ~20s).
scenes: multi-scene array, for longer copy or when you need framevox voice --scene N.
{
"prompt": "Leé en español rioplatense, tono ágil:",
"gap": 0.3,
"scenes": [
{ "id": "hook", "text": "Primera línea..." },
{ "id": "cta", "text": "Cierre. miapp punto com." }
]
}
Gemini TTS: do not set seed or temperature: 0. Either can yield finishReason: OTHER with no audio. Prefer a single text with ... pauses for promos ≤ ~50s (one API call).
Never commit secrets in voice.json.
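The rules above can be checked before spending an API call. This is a hedged sketch, not framevox's own validation: `checkVoiceScript` is hypothetical, it assumes text and scenes are mutually exclusive and that scene ids should be unique, and it assumes seed or temperature would appear as top-level fields if someone added them.

```javascript
// Hypothetical validator for the voice.json shapes described above.
// Returns a list of problems; an empty list means the script looks usable.
function checkVoiceScript(script) {
  const problems = [];
  const hasText = typeof script.text === "string" && script.text.trim() !== "";
  const hasScenes = Array.isArray(script.scenes) && script.scenes.length > 0;
  if (hasText === hasScenes) {
    // Assumption: exactly one of the two forms is expected.
    problems.push('provide exactly one of "text" or "scenes"');
  }
  if (hasScenes) {
    const ids = script.scenes.map((s) => s.id);
    if (new Set(ids).size !== ids.length) problems.push("scene ids must be unique");
    if (script.scenes.some((s) => typeof s.text !== "string" || s.text === "")) {
      problems.push("every scene needs text");
    }
  }
  // Per the Gemini note above: seed and temperature: 0 can produce no audio.
  if ("seed" in script) problems.push('remove "seed"');
  if (script.temperature === 0) problems.push('remove "temperature": 0');
  return problems;
}

console.log(checkVoiceScript({ prompt: "Tono cálido:", text: "Hola..." })); // []
console.log(checkVoiceScript({ text: "Hola", seed: 7 }));
```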
Emotion tags (Gemini)
Paired open/close tags change delivery inside the span only:
| Tag | Effect |
|-----|--------|
| [eloquent]...[/eloquent] | Strong emphasis, theatrical |
| [sad]...[/sad] | Subdued, melancholic |
| [whisper]...[/whisper] | Quiet, intimate |
| [excited]...[/excited] | High energy |
Outside tags → neutral tone. Tag labels are never spoken aloud.
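To illustrate the span semantics (tone applies inside the tag pair only, labels are not spoken), the text can be split into tone spans. Gemini interprets the tags itself, so this sketch is only a reading aid, for example for counting spoken characters. `splitByTone` and `spokenText` are hypothetical names, and the tag list is limited to the four tags in the table.

```javascript
// Hypothetical parser: split text into { tone, text } spans using the four
// paired tags from the table above. Text outside tags is "neutral".
const TAGS = ["eloquent", "sad", "whisper", "excited"];

function splitByTone(text) {
  const pattern = new RegExp(`\\[(${TAGS.join("|")})\\]([\\s\\S]*?)\\[/\\1\\]`, "g");
  const spans = [];
  let cursor = 0; // index just past the last consumed character
  for (const match of text.matchAll(pattern)) {
    if (match.index > cursor) {
      spans.push({ tone: "neutral", text: text.slice(cursor, match.index) });
    }
    spans.push({ tone: match[1], text: match[2] });
    cursor = match.index + match[0].length;
  }
  if (cursor < text.length) spans.push({ tone: "neutral", text: text.slice(cursor) });
  return spans;
}

// The spoken text is the spans joined, with tag labels dropped.
const spokenText = (text) => splitByTone(text).map((s) => s.text).join("");

console.log(spokenText("Mirá esto... [eloquent]¡Es una locura![/eloquent]"));
// Mirá esto... ¡Es una locura!
```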
Ellipsis ...
Long pause (~1s+) between phrases. Useful for air and scene breathing in a single audio file. Framevox can later detect silences to split segments with the same voice.
Voice production methods
| Method | When | How |
|--------|------|-----|
| Single audio | Script ≤ ~20s, fluid narration | voice.json with "text" → framevox voice |
| Single + tags | One voice, mood shifts | Tags + ... in text (Gemini) |
| Multi-scene concat | Longer copy, per-scene regen | voice.json with "scenes" + optional gap |
| Multi-scene timed | Narration must hit visual beats | Explicit start per scene in scenes[]; overlaps auto-bumped |
| Regen one scene | Fix one line without redoing others | framevox voice --scene 2 or --scene hook |
| ElevenLabs | English, premium, long form | --provider elevenlabs |
| Piper | Offline, no API | --provider piper |
After framevox voice, read .framevox/voice-timeline.json — measured times, pauses, collisions.
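The concat and timed methods differ only in where each scene's start comes from. The sketch below shows one plausible placement rule; it is an assumption about behaviour, not framevox's algorithm. `buildTimeline` is hypothetical, durations are taken as already measured, and the gap is assumed to apply after every scene in both modes.

```javascript
// Hypothetical timeline builder. Input: scenes with a measured duration (s)
// and an optional explicit start. Output: start/end per scene.
// Concat mode: each scene starts after the previous one plus `gap`.
// Timed mode: an explicit start is honoured unless it overlaps the previous
// scene, in which case it is bumped to the earliest free time.
function buildTimeline(scenes, gap = 0) {
  const timeline = [];
  let cursor = 0; // earliest time the next scene may start
  for (const scene of scenes) {
    const wanted = scene.start ?? cursor;
    const start = Math.max(wanted, cursor);
    const end = start + scene.duration;
    timeline.push({ id: scene.id, start, end, bumped: start !== wanted });
    cursor = end + gap;
  }
  return timeline;
}

// Concat with a 0.5 s gap: hook 0-2, cta 2.5-4.
console.log(buildTimeline([{ id: "hook", duration: 2 }, { id: "cta", duration: 1.5 }], 0.5));
// Timed with an overlap: "b" asks for 2 but "a" ends at 3, so it is bumped to 3.
console.log(buildTimeline([{ id: "a", start: 0, duration: 3 }, { id: "b", start: 2, duration: 1 }]));
```

The measured values framevox actually produced are in .framevox/voice-timeline.json; treat that file as the source of truth.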
Two cut strategies (not opposites)
| | Multi-scene (voice.json → scenes) | Single audio (voice.json → text + ...) |
|---|---|---|
| Cut when | Before TTS — you split text | After TTS — silencedetect finds pauses |
| API calls | N | 1 |
| Same voice | ~similar per call | identical |
| Regen part 2 | --scene 2 | re-run whole script (split: planned) |
Use multi-scene for long copy or per-line fixes. Use single + ... for fluid promos ≤20s.
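For the single-audio strategy, pauses can be located with ffmpeg's silencedetect filter, for example `ffmpeg -i voice.mp3 -af silencedetect=noise=-35dB:d=0.6 -f null -` (the threshold and duration values here are illustrative). The sketch below parses that filter's log lines into intervals. Since the README lists splitting as planned, this is not framevox's implementation; `parseSilences` and `cutPoints` are hypothetical.

```javascript
// Hypothetical parser for ffmpeg silencedetect log lines, which look like:
//   [silencedetect @ 0x...] silence_start: 3.5
//   [silencedetect @ 0x...] silence_end: 4.5 | silence_duration: 1
function parseSilences(log) {
  const silences = [];
  let start = null; // start of the silence currently open, if any
  for (const line of log.split(/\r?\n/)) {
    const s = /silence_start:\s*(-?[\d.]+)/.exec(line);
    if (s) {
      start = Number(s[1]);
      continue;
    }
    const e = /silence_end:\s*(-?[\d.]+)/.exec(line);
    if (e && start !== null) {
      silences.push({ start, end: Number(e[1]) });
      start = null;
    }
  }
  return silences;
}

// Candidate cut points: the midpoint of each detected silence.
const cutPoints = (silences) => silences.map(({ start, end }) => (start + end) / 2);

const log = [
  "[silencedetect @ 0x1] silence_start: 3.5",
  "[silencedetect @ 0x1] silence_end: 4.5 | silence_duration: 1",
].join("\n");
console.log(cutPoints(parseSilences(log))); // [ 4 ]
```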
Rules source: src/providers/gemini/rules.js (auto-merged on every Gemini call).
