# AI Video Generator CLI

TypeScript CLI for generating narrated short-form videos with Cloudflare Workers AI, Pexels, and ffmpeg. It is packaged for public npm publishing, works with `npx`, and supports global install through `npm install -g`.
## What it uses
| Purpose | Provider | Model / API |
| --- | --- | --- |
| Script + visual planning | Cloudflare Workers AI | @cf/moonshotai/kimi-k2.6 |
| Reference-image understanding | Cloudflare Workers AI | @cf/moonshotai/kimi-k2.6 (image_url message parts) |
| Render review + iteration | Cloudflare Workers AI | @cf/google/gemma-4-26b-a4b-it |
| Image generation fallback | Cloudflare Workers AI | @cf/black-forest-labs/flux-1-schnell |
| Speech-to-text | Cloudflare Workers AI | @cf/openai/whisper |
| Text-to-speech | Cloudflare Workers AI | @cf/myshell-ai/melotts |
| Stock videos / photos | Pexels | Videos API + Photos API |
| Rendering | Local binary | ffmpeg / ffprobe |
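Every Cloudflare row in this table is reached through the same Workers AI REST endpoint (`POST /accounts/{account_id}/ai/run/{model}`). A minimal TypeScript sketch of such a call; the `runModel` helper and the example payload are illustrative, not the CLI's actual internals:

```ts
// Minimal Workers AI REST call. The endpoint and the
// { success, result, errors } envelope follow Cloudflare's public API;
// everything else here is an illustrative sketch.
const BASE = "https://api.cloudflare.com/client/v4/accounts";

async function runModel(model: string, payload: unknown): Promise<unknown> {
  const res = await fetch(
    `${BASE}/${process.env.CLOUDFLARE_ACCOUNT_ID}/ai/run/${model}`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.CLOUDFLARE_API_TOKEN}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(payload),
    },
  );
  const data = (await res.json()) as { success: boolean; result: unknown; errors?: unknown };
  if (!data.success) throw new Error(JSON.stringify(data.errors));
  return data.result;
}

// e.g. script planning with Kimi:
// await runModel("@cf/moonshotai/kimi-k2.6", {
//   messages: [{ role: "user", content: "Plan a 30s video about volcanoes." }],
// });
```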
## Requirements
- Node.js 20.18+
- `ffmpeg` and `ffprobe` on your `PATH`
- Cloudflare Account ID and Workers AI API token
- Optional: Pexels API key if you want stock footage or stock images
## Install

### NPX

```bash
npx ai-video-generator init --profile hybrid
```

### Global install

```bash
npm install -g ai-video-generator
ai-video-generator init --profile hybrid
```

### Local development

```bash
npm install
npm run check
```

## Onboarding profiles
Use `init --profile <profile>` to create `.env` and `.env.example`.
The profile defines the available source pool. Kimi chooses the actual source for each cut/scene from that pool.
| Profile | Needs Cloudflare | Needs Pexels | Source pool Kimi can choose from |
| --- | --- | --- | --- |
| hybrid | Yes | Optional | stock_video, stock_image, ai_image |
| stock-video | Yes | Yes | stock_video only |
| stock-image | Yes | Yes | stock_image only |
| ai-image | Yes | No | ai_image only |
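In code terms, the profile is just a whitelist the planner chooses from. A sketch of that mapping, mirroring the table above (type and constant names are illustrative, not the CLI's internals):

```ts
// Profile -> source pool, mirroring the table above.
type VisualSource = "stock_video" | "stock_image" | "ai_image";

const SOURCE_POOLS: Record<string, VisualSource[]> = {
  hybrid: ["stock_video", "stock_image", "ai_image"],
  "stock-video": ["stock_video"],
  "stock-image": ["stock_image"],
  "ai-image": ["ai_image"],
};
```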
After `init`, edit `.env` and then run:

```bash
ai-video-generator doctor
```

Then verify the reviewer path against a generated sample video:

```bash
ai-video-generator review-sample
```

## Usage
### Basic generation

```bash
ai-video-generator generate "5 surprising facts about volcanoes"
```

### Use reference images for Kimi K2.6 vision guidance

```bash
ai-video-generator generate "cinematic travel teaser for Iceland" \
  --reference-image ./refs/ice-1.jpg ./refs/ice-2.jpg
```
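Locally supplied images reach Kimi as `image_url` message parts. A sketch of the encoding step, assuming base64 data URLs (the helper name and exact part shape are assumptions, not the CLI's actual code):

```ts
import { readFile } from "node:fs/promises";
import { extname } from "node:path";

// Read a local reference image and wrap it as an OpenAI-style
// image_url content part. The data-URL approach is an assumption
// about how local files are fed to Kimi K2.6.
async function toImagePart(path: string) {
  const mime = extname(path).toLowerCase() === ".png" ? "image/png" : "image/jpeg";
  const b64 = (await readFile(path)).toString("base64");
  return {
    type: "image_url",
    image_url: { url: `data:${mime};base64,${b64}` },
  };
}
```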
### Use an AI-image-only source pool

```bash
npx ai-video-generator init --profile ai-image
ai-video-generator generate "space startup launch trailer"
```

### Landscape output

```bash
ai-video-generator generate "history of electric cars" \
  --orientation landscape \
  --output ./out/history-of-evs.mp4
```

## Commands
### generate

```bash
ai-video-generator generate <prompt...> [options]
```

Options:

- `--output <path>`: output `.mp4` path or output directory
- `--orientation <portrait|landscape>`
- `--reference-image <paths...>`: local images passed to Kimi K2.6 as `image_url` content
- `--no-captions`
- `--keep-temp`
### init

```bash
ai-video-generator init --profile hybrid
```

Creates or refreshes:

- `.env.example`
- `.env` (unless it already exists, or you pass `--force`)
### doctor

```bash
ai-video-generator doctor
```

Checks:

- `ffmpeg` / `ffprobe`
- `.env` presence
- Cloudflare credentials
- Pexels requirements for the selected source profile
### review-sample

```bash
ai-video-generator review-sample
```

Creates a short text-only sample video, extracts review frames with ffmpeg, and checks that Gemma can read the rendered content correctly.
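The frame-extraction step is ordinary ffmpeg sampling. A sketch of the idea (sampling rate, paths, and helper name are illustrative):

```ts
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Sample one frame per second from the rendered video so the reviewer
// can inspect it as still images.
async function extractReviewFrames(video: string, outDir: string): Promise<void> {
  await run("ffmpeg", [
    "-i", video,
    "-vf", "fps=1",               // one frame per second
    `${outDir}/frame-%03d.png`,   // frame-001.png, frame-002.png, ...
  ]);
}
```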
## Caption styles
`init` now writes a TikTok-style caption preset by default for portrait videos. That preset uses uppercase display text, automatic multi-line balancing to avoid horizontal overflow, and word-by-word highlight overlays timed from Whisper timestamps.

Set `CAPTION_STYLE=classic` if you want the older single-layer subtitle look instead. You can tune the active-word color and styling through the `CAPTION_*` keys in `.env`.
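The word-by-word timing boils down to grouping Whisper's word timestamps into short cues under the `CAPTION_MAX_*` limits. A sketch of that grouping; the `{ word, start, end }` shape is an assumption about the transcription output:

```ts
interface Word { word: string; start: number; end: number }
interface Cue { text: string; start: number; end: number }

// Group words into caption cues, respecting the word and character
// caps that CAPTION_MAX_WORDS / CAPTION_MAX_CHARS configure.
function groupWords(words: Word[], maxWords = 6, maxChars = 28): Cue[] {
  const cues: Cue[] = [];
  let buf: Word[] = [];
  const flush = () => {
    if (buf.length === 0) return;
    cues.push({
      text: buf.map((w) => w.word).join(" ").toUpperCase(), // uppercase display text
      start: buf[0].start,
      end: buf[buf.length - 1].end,
    });
    buf = [];
  };
  for (const w of words) {
    const nextLen = [...buf, w].map((x) => x.word).join(" ").length;
    if (buf.length >= maxWords || nextLen > maxChars) flush();
    buf.push(w);
  }
  flush();
  return cues;
}
```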
## Review loop
By default, every render goes through a Gemma review pass. If Gemma finds problems, the feedback is fed back into Kimi and the generator reruns the story + visual plan up to the configured retry limit.
Workers AI currently accepts Gemma reviewer input reliably only as sampled video frames (`image_url` parts). The published model docs describe direct video/file support, but the live runtime rejected file parts during integration testing, so this CLI uses ffmpeg frame extraction for the review step.
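Concretely, the review request reuses the chat endpoint with the extracted frames attached as data-URL `image_url` parts. A sketch only, building on the `runModel` helper sketched under "What it uses"; the prompt text and response shape are assumptions:

```ts
import { readFile } from "node:fs/promises";

// Send extracted frames to the Gemma reviewer as image_url parts.
async function reviewFrames(framePaths: string[]): Promise<string> {
  const content: unknown[] = [
    {
      type: "text",
      text: "Review these frames from a generated video and list any visual or caption problems.",
    },
  ];
  for (const p of framePaths) {
    const b64 = (await readFile(p)).toString("base64");
    content.push({
      type: "image_url",
      image_url: { url: `data:image/png;base64,${b64}` },
    });
  }
  const result = (await runModel("@cf/google/gemma-4-26b-a4b-it", {
    messages: [{ role: "user", content }],
  })) as { response?: string };
  return result.response ?? "";
}
```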
## .env reference
Core keys:

```
CLOUDFLARE_ACCOUNT_ID=...
CLOUDFLARE_API_TOKEN=...
CLOUDFLARE_KIMI_MODEL=@cf/moonshotai/kimi-k2.6
CLOUDFLARE_GEMMA_REVIEW_MODEL=@cf/google/gemma-4-26b-a4b-it
CLOUDFLARE_FLUX_MODEL=@cf/black-forest-labs/flux-1-schnell
CLOUDFLARE_WHISPER_MODEL=@cf/openai/whisper
CLOUDFLARE_MELOTTS_MODEL=@cf/myshell-ai/melotts
VIDEO_REVIEW_ENABLED=true
VIDEO_REVIEW_MAX_ITERATIONS=1
VISUAL_SOURCE_PROFILE=hybrid
PEXELS_API_KEY=...
```

Render keys:

```
VIDEO_ORIENTATION=portrait
VIDEO_FPS=30
CAPTIONS_ENABLED=true
CAPTION_STYLE=tiktok
CAPTION_FONT_SIZE=72
CAPTION_FONT_COLOR=white
CAPTION_HIGHLIGHT_COLOR=green
CAPTION_FONT_FACE=NanumSquareRound
CAPTION_STROKE_WIDTH=4.5
CAPTION_STROKE_COLOR=black
CAPTION_BOLD=true
CAPTION_SHADOW_DEPTH=0
CAPTION_POSITION=bottom_center
CAPTION_MAX_WORDS=6
CAPTION_MAX_CHARS=28
CAPTION_MAX_DURATION_SECONDS=4.2
```

## Output
Each run writes:
- an `.mp4` video
- a `.json` manifest next to the video with the generated script, cues, visual plan, and review history
If you pass `--keep-temp`, the ffmpeg working directory is kept for inspection.
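The manifest fields listed above roughly correspond to a shape like the following (key names are inferred from this README, not guaranteed):

```ts
// Rough sketch of the per-run .json manifest; actual key names may differ.
interface RunManifest {
  prompt: string;
  script: string;
  cues: { text: string; start: number; end: number }[];
  visualPlan: {
    scene: number;
    source: "stock_video" | "stock_image" | "ai_image";
    query: string;
  }[];
  reviewHistory: { iteration: number; feedback: string; passed: boolean }[];
}
```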
## Publish notes
The package is configured for public npm publishing:
- package name: `ai-video-generator`
- CLI binary: `ai-video-generator`
- `publishConfig.access` set to `public`
- `prepack` runs the TypeScript build before publishing
- license: MIT
First publish:

```bash
npm publish --access public
```

## License
MIT.
## Development

```bash
npm install
npm run build
npm test
npm run check
```