video-maker-mcp
v0.1.12
Local MCP storyboard video renderer for Codex-driven video generation.
video-maker-mcp
Local MCP storyboard video renderer for Codex-driven short video generation.
video-maker-mcp does not run its own LLM. Codex writes the script and storyboard, and this MCP server then generates a mixed AI/stock asset set: Doubao Seedream for the cover and conceptual images, plus Pexels/Pixabay for stock photos and stock video B-roll. It then validates the assets, generates Doubao/Volcengine narration, and renders a 9:16 MP4 with ffmpeg. The requested duration is treated as a target. The final video length follows the generated narration audio, so speech is neither cut off nor padded with silence.
Install With Codex
Paste this into Codex:
Please install video-maker-mcp for me. Run:
npx -y video-maker-mcp@latest install --host codex
If ffmpeg is reported as missing, install it as prompted, then rerun:
npx -y video-maker-mcp@latest doctor
If the image configuration is reported as missing, run:
npx -y video-maker-mcp@latest configure-image
If the stock media configuration is reported as missing, run:
npx -y video-maker-mcp@latest configure-stock
If the TTS configuration is reported as missing, run:
npx -y video-maker-mcp@latest configure-tts
Finally, confirm that Codex can see the video-maker MCP server.
Manual commands:
npx -y video-maker-mcp@latest install --host codex
npx -y video-maker-mcp@latest configure-image
npx -y video-maker-mcp@latest configure-stock
npx -y video-maker-mcp@latest configure-tts
npx -y video-maker-mcp@latest doctor
The installer configures Codex with:
codex mcp add video-maker -- npx -y video-maker-mcp@latest serve
It does not silently install system packages and does not bundle ffmpeg-static.
Requirements
- Node.js 20+
- Codex CLI
- ffmpeg on PATH
- Doubao Seedream image generation credentials
- Pexels or Pixabay stock media credentials
- Doubao/Volcengine TTS credentials
ffmpeg install hints:
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt update && sudo apt install -y ffmpeg
# Windows
winget install Gyan.FFmpeg
Asset Environment
Video asset generation is AI/stock mixed by default. Configure Seedream and at least one stock provider before creating videos:
npx -y video-maker-mcp@latest configure-image
The command prompts for the API key locally, does not require pasting the key into an LLM chat, and writes the configuration to ~/.video-maker/.env. By default it does not generate a test image, to avoid spending image-generation quota. To run a smoke test:
npx -y video-maker-mcp@latest configure-image --test
configure-image writes:
VIDEO_MAKER_IMAGE_API_KEY=your-ark-api-key
VIDEO_MAKER_IMAGE_MODEL=doubao-seedream-5-0-260128
VIDEO_MAKER_IMAGE_ENDPOINT=https://ark.cn-beijing.volces.com/api/v3/images/generations
VIDEO_MAKER_IMAGE_SIZE=2K
VIDEO_MAKER_IMAGE_API_KEY can also be supplied as ARK_API_KEY or VOLCENGINE_API_KEY, but the VIDEO_MAKER_* key is preferred for this tool.
Stock media uses Pexels and/or Pixabay:
npx -y video-maker-mcp@latest configure-stock
# At least one is required for the default AI/stock mixed flow.
VIDEO_MAKER_PEXELS_API_KEY=your-pexels-api-key
VIDEO_MAKER_PIXABAY_API_KEY=your-pixabay-api-key
# Optional. Defaults to pexels, then falls back to pixabay if both are set.
VIDEO_MAKER_STOCK_PROVIDER=pexels
PEXELS_API_KEY and PIXABAY_API_KEY are also accepted, but VIDEO_MAKER_* names are preferred for this tool. Pexels API requests use the Authorization header; Pixabay requests use the key query parameter. Downloaded media is stored locally under the project assets/ directory before rendering.
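The two authentication styles are easiest to compare side by side. The TypeScript below is a minimal sketch and is not the package's own code. buildStockSearchRequest and providerOrder are hypothetical helpers, the perPage default is arbitrary, and the orientation: "portrait" filter on Pexels is an assumption about how vertical media might be requested. The sketch only builds the request. Sending it and downloading the result into assets/ are left out.

```typescript
// Hypothetical helpers; only the endpoints and auth styles follow the public Pexels/Pixabay APIs.
type StockProvider = "pexels" | "pixabay";
type StockKind = "photo" | "video";

interface StockRequest {
  url: string;
  headers: Record<string, string>;
}

function buildStockSearchRequest(
  provider: StockProvider,
  kind: StockKind,
  query: string,
  apiKey: string,
  perPage = 10,
): StockRequest {
  if (provider === "pexels") {
    // Pexels: the API key goes in the Authorization header.
    const base =
      kind === "video" ? "https://api.pexels.com/videos/search" : "https://api.pexels.com/v1/search";
    const params = new URLSearchParams({
      query,
      per_page: String(perPage),
      orientation: "portrait", // assumption: prefer vertical media for 9:16 output
    });
    return { url: `${base}?${params.toString()}`, headers: { Authorization: apiKey } };
  }
  // Pixabay: the API key goes in the `key` query parameter; no auth header is sent.
  const base = kind === "video" ? "https://pixabay.com/api/videos/" : "https://pixabay.com/api/";
  const params = new URLSearchParams({ key: apiKey, q: query, per_page: String(perPage) });
  return { url: `${base}?${params.toString()}`, headers: {} };
}

// Providers to try in order: the preferred one first, skipping any without a key.
function providerOrder(
  keys: Partial<Record<StockProvider, string>>,
  preferred: StockProvider = "pexels",
): StockProvider[] {
  const order: StockProvider[] =
    preferred === "pexels" ? ["pexels", "pixabay"] : ["pixabay", "pexels"];
  return order.filter((p) => Boolean(keys[p]));
}
```

Under these assumptions, providerOrder({ pexels: key1, pixabay: key2 }) returns ["pexels", "pixabay"]. This mirrors the documented default of Pexels first, with Pixabay as the fallback.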
Quick image smoke test:
npm run build
node dist/cli/index.js image-test --prompt "一张 9:16 竖屏科技短视频封面,真实摄影质感,无文字,无水印。"
open ~/Downloads/video-maker/seedream-image-test.png
image-test --out accepts either a directory or a .png file:
node dist/cli/index.js image-test --out ~/Downloads/video-maker
node dist/cli/index.js image-test --out ~/Downloads/my-seedream-test.png
TTS Environment
Narrated video generation requires Doubao/Volcengine TTS credentials before generate_audio can work.
Recommended setup:
npx -y video-maker-mcp@latest configure-tts
The command prompts for the API key locally, does not require pasting the key into an LLM chat, writes the configuration to ~/.video-maker/.env, and runs a short MP3 smoke test.
It writes:
VIDEO_MAKER_TTS_API_KEY=your-api-key
VIDEO_MAKER_TTS_VOICE_ID=zh_female_vv_uranus_bigtts
VIDEO_MAKER_TTS_RESOURCE_ID=seed-tts-2.0
VIDEO_MAKER_TTS_ENDPOINT=https://openspeech.bytedance.com/api/v3/tts/unidirectional
For local development, you can also put these values in a project-root .env file. The CLI loads the project .env first, then fills missing values from ~/.video-maker/.env. To load another env file, set VIDEO_MAKER_ENV_FILE=/path/to/file.
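The load order is a merge in which the first file wins and later files only fill gaps. Below is a minimal TypeScript sketch that assumes simple KEY=VALUE lines. parseEnv and mergeEnvLayers are hypothetical names, and the parser ignores dotenv features such as multiline values and variable expansion. The README does not say how real process environment variables or VIDEO_MAKER_ENV_FILE rank against these files, so the sketch leaves layer order to the caller.

```typescript
// Minimal KEY=VALUE parser: skips blank lines and # comments, strips matching quotes.
function parseEnv(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "=", ignore the line
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    const quote = value.charAt(0);
    if (value.length >= 2 && (quote === '"' || quote === "'") && value.endsWith(quote)) {
      value = value.slice(1, -1);
    }
    out[key] = value;
  }
  return out;
}

// Earlier layers win; later layers only fill keys that are still missing.
function mergeEnvLayers(...layers: Array<Record<string, string>>): Record<string, string> {
  const merged: Record<string, string> = {};
  for (const layer of layers) {
    for (const key of Object.keys(layer)) {
      if (!Object.prototype.hasOwnProperty.call(merged, key)) merged[key] = layer[key];
    }
  }
  return merged;
}
```

Calling mergeEnvLayers(parseEnv(projectEnvText), parseEnv(homeEnvText)) reproduces the documented order, with the project-root .env taking precedence over ~/.video-maker/.env.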
The default TTS integration uses the current Doubao/Volcengine V3 HTTP unidirectional API, authenticated with an X-Api-Key. This matches the key issued on the new console's API Key page.
The legacy V1 AppID + Access Token credentials are still accepted as a fallback when VIDEO_MAKER_TTS_API_KEY is not set, but new accounts should use VIDEO_MAKER_TTS_API_KEY.
Optional:
VIDEO_MAKER_WORKSPACE=$HOME/.video-maker
VIDEO_MAKER_EXPORT_DIR=$HOME/Downloads/video-maker
Quick TTS smoke test:
npm run build
node dist/cli/index.js tts-test --text "你好,这是一段豆包语音测试。"
open ~/Downloads/video-maker/doubao-tts-test.mp3
tts-test --out accepts either a directory or an .mp3 file:
node dist/cli/index.js tts-test --out ~/Downloads/video-maker
node dist/cli/index.js tts-test --out ~/Downloads/my-voice-test.mp3
Codex Workflow
Once installed, ask Codex:
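Use video-maker to turn the script below into a vertical Chinese narrated video of about 60 seconds. First check the environment, then create a project and generate a storyboard plan with shots. The assets should mix AI images, stock photos, and stock video: use AI for the cover and conceptual visuals, stock video for real-world settings and B-roll, and stock photos for backgrounds, people, and business scenes. Then generate the assets, verify them, generate the narration, render the MP4, and export the final file to ~/Downloads/video-maker.
The MCP workflow is: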
- check_environment
- create_video_project
- get_video_plan_schema
- save_video_plan
- list_required_assets
- generate_assets
  - Optional fallback: assert_image_generation_contract + host image generation
  - Optional fallback bridge: save_generated_image
- verify_assets
  - Optional: create_asset_contact_sheet
- generate_audio
- render_video
  - Optional: export_video
Important asset contract:
- Default: call generate_assets. The MCP server generates a mixed AI/stock asset set and writes files directly to the required project asset paths. generate_assets uses Doubao Seedream for AI image assets, Pexels/Pixabay for stock photo assets, and Pexels/Pixabay stock video for .mp4 assets. The MCP default is concurrency: 2; use concurrency: 1 if the image provider rate-limits.
- Do not treat stock as an optional fallback. save_video_plan rejects plans that do not include AI images, stock photos, and stock videos in the required mix.
- Every actual visual unit must use an explicit mediaSource; do not use auto.
- Minimum mix: at least 25% AI image visual units, at least 25% stock_photo image visual units, and at least 15% stock_video visual units.
- For stock video shots, set mediaType: "video" and use an .mp4 assetPath, for example assets/scene_001_shot_02.mp4.
- For AI images or stock photos, use image paths such as .png, .jpg, or .webp.
- Provide concise English stockQuery values for stock shots, for example "city traffic night", "business meeting close up", or "factory production line".
- Host image generation is only a fallback for failed AI image assets listed in generate_assets.fallbackAllowedAssetPaths, and only when the user explicitly agrees.
- Host fallback must never be used for stock_photo or stock_video paths. Those must come from Pexels/Pixabay through generate_assets.
- Do not call ad-hoc external image URLs/APIs from the host to bypass generate_assets.
- Host fallback requires generate_image_to_file(prompt, absolutePath): call a real image model and save the generated bitmap directly to the requested local file path.
- Fallback bridge: if the host image tool exposes a real generated image as a temporary local file or a base64/data URL, call save_generated_image(projectId, assetPath, sourcePath|imageBase64) to let MCP persist it to the required asset path.
- If the host can only show generated images in chat and cannot expose a local file path or bytes/base64, stop. Do not continue with fallback graphics.
- list_required_assets returns the asset paths the tool will use, including assets/cover.png, scene image paths such as assets/scene_001.png, and stock video paths such as assets/scene_001_shot_02.mp4.
- The cover must be a dedicated, content-rich 9:16 cover image generated for the video. It should not be a first-frame extraction.
- render_video uses the dedicated cover as a short opening segment when assets/cover.png is available. It does not compose title text, color blocks, or poster typography over the cover; the cover artwork should be finished by the AI image model itself.
- Cover prompt guidance: make the AI-generated cover cinematic, premium, stable, and design-led. Prefer a strong central subject, layered depth, controlled color, high-end lighting, mobile-thumbnail readability, and elegant negative space. Avoid readable text, fake letters, logos, UI screenshots, infographic templates, collage grids, and generic stock-photo composition.
- Do not substitute SVGs, chart screenshots, CSS drawings, HTML/canvas/sharp scripted graphics, preview strips, first-frame extractions, or manually created placeholder graphics.
- verify_assets rejects missing files, unreadable files, wrong aspect ratios, tiny files, suspicious placeholder-sized images, and PNG files that have a same-name SVG source such as scene_001.svg.
- create_asset_contact_sheet creates output/asset_contact_sheet.png from the exact files that render_video will use. Video assets are represented by their first frame. Check this sheet if the preview does not match the intended generated assets.
- If you intentionally use very small stylized images, override the size threshold with VIDEO_MAKER_MIN_ASSET_BYTES.
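The mix and path rules above are mechanical enough to pre-check before calling save_video_plan. This TypeScript sketch is an approximation and is not the server's validator. checkAssetMix and VisualUnit are hypothetical. Each shot is assumed to count as one visual unit, and the cover is assumed to be excluded from the percentages. Shares are compared with integer arithmetic so that exactly 25% or 15% is not rejected by floating-point rounding.

```typescript
type MediaSource = "ai" | "stock_photo" | "stock_video";

// Hypothetical per-shot view of a plan entry.
interface VisualUnit {
  assetPath: string;
  mediaSource: MediaSource | "auto";
  mediaType: "image" | "video";
}

// Minimum shares from the asset contract, as whole percentages.
const MIN_PERCENT: Record<MediaSource, number> = { ai: 25, stock_photo: 25, stock_video: 15 };
const IMAGE_PATH = /\.(png|jpe?g|webp)$/i;

function checkAssetMix(units: VisualUnit[]): string[] {
  const errors: string[] = [];
  const counts: Record<MediaSource, number> = { ai: 0, stock_photo: 0, stock_video: 0 };
  for (const u of units) {
    if (u.mediaSource === "auto") {
      errors.push(`${u.assetPath}: mediaSource must be explicit, not "auto"`);
      continue;
    }
    counts[u.mediaSource] += 1;
    if (u.mediaSource === "stock_video") {
      // Stock video must be a video unit with an .mp4 path.
      if (u.mediaType !== "video" || !/\.mp4$/i.test(u.assetPath)) {
        errors.push(`${u.assetPath}: stock_video needs mediaType "video" and an .mp4 path`);
      }
    } else if (u.mediaType !== "image" || !IMAGE_PATH.test(u.assetPath)) {
      errors.push(`${u.assetPath}: ${u.mediaSource} needs mediaType "image" and a .png/.jpg/.webp path`);
    }
  }
  const total = units.length;
  (Object.keys(MIN_PERCENT) as MediaSource[]).forEach((source) => {
    // counts/total < pct/100, rearranged to stay in integers.
    if (total === 0 || counts[source] * 100 < MIN_PERCENT[source] * total) {
      errors.push(`${source}: ${counts[source]} of ${total} visual units, below ${MIN_PERCENT[source]}%`);
    }
  });
  return errors;
}
```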
save_generated_image accepts exactly one source:
{
"projectId": "vid_...",
"assetPath": "assets/scene_001.png",
"sourcePath": "/tmp/host-generated-image.png"
}
or:
{
"projectId": "vid_...",
"assetPath": "assets/scene_001.png",
"imageBase64": "data:image/png;base64,..."
}
It only writes to asset paths returned by list_required_assets, rejects SVG input, normalizes the bitmap to the target format, and immediately runs asset validation.
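The exactly-one-source rule and the data-URL form can be sketched as follows. resolveImageSource and SaveGeneratedImageArgs are hypothetical names. The sketch only checks the arguments and decodes base64 into bytes. It does not perform the bitmap normalization or the asset validation that save_generated_image runs afterwards. It relies on the global atob, which is available in browsers and in Node 16+.

```typescript
// Hypothetical argument shape mirroring the JSON examples above.
interface SaveGeneratedImageArgs {
  projectId: string;
  assetPath: string;
  sourcePath?: string;
  imageBase64?: string;
}

type ImageSource =
  | { kind: "file"; path: string }
  | { kind: "bytes"; mimeType: string | undefined; data: Uint8Array };

function resolveImageSource(
  args: SaveGeneratedImageArgs,
  requiredPaths: ReadonlySet<string>, // e.g. the paths returned by list_required_assets
): ImageSource {
  if (!requiredPaths.has(args.assetPath)) {
    throw new Error(`not a required asset path: ${args.assetPath}`);
  }
  const { sourcePath, imageBase64 } = args;
  const hasPath = sourcePath !== undefined && sourcePath !== "";
  const hasBase64 = imageBase64 !== undefined && imageBase64 !== "";
  if (hasPath === hasBase64) {
    throw new Error("provide exactly one of sourcePath or imageBase64");
  }
  if (sourcePath) {
    if (/\.svg$/i.test(sourcePath)) throw new Error("SVG input is rejected");
    return { kind: "file", path: sourcePath };
  }
  // Accept a bare base64 string or a data URL such as data:image/png;base64,...
  const compact = (imageBase64 ?? "").replace(/\s+/g, "");
  const match = /^data:([^;,]+);base64,(.*)$/.exec(compact);
  const mimeType = match ? match[1] : undefined;
  if (mimeType === "image/svg+xml") throw new Error("SVG input is rejected");
  const binary = atob(match ? match[2] : compact);
  const data = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) data[i] = binary.charCodeAt(i);
  return { kind: "bytes", mimeType, data };
}
```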
Visual Timeline
For richer videos, prefer scenes[].shots[]. A scene is a narration block; a shot is a visual cut inside that block.
Recommended density:
- 30 seconds: 10-18 shots
- 60 seconds: 18-28 shots
- 90 seconds: 28-40 shots
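For plan-building code, the fields used in the shot examples can be captured as TypeScript types, along with a rough density check. This sketch is inferred from the examples and is not the schema returned by get_video_plan_schema. Several choices are assumptions: which fields are optional, the reduced Scene shape (only durationSec and shots are named in this README), and the rule that a scene without shots[] counts as one visual unit. The density list only covers three durations, so recommendedShotRange simply picks the nearest one.

```typescript
type Motion = "push_in" | "pull_back" | "pan_left" | "pan_right" | "tilt_up" | "tilt_down" | "still";
type Transition = "fade" | "wipeleft" | "wiperight" | "slideleft" | "slideright" | "smoothleft" | "none";

// Fields taken from the shot examples; optionality is an assumption.
interface Shot {
  id: string;
  durationWeight: number;
  visualPrompt: string;
  assetPath: string;
  mediaSource: "ai" | "stock_photo" | "stock_video";
  mediaType: "image" | "video";
  stockQuery?: string; // expected for stock shots
  motion?: Motion;
  transition?: Transition;
  overlay?: { badge?: string; headline?: string; metric?: string; position?: string };
}

// A scene is a narration block; other scene fields are omitted here.
interface Scene {
  durationSec: number; // relative weight, rescaled to the audio length
  shots?: Shot[];
}

// Shot-count ranges from the recommended density list.
const DENSITY = [
  { targetSec: 30, min: 10, max: 18 },
  { targetSec: 60, min: 18, max: 28 },
  { targetSec: 90, min: 28, max: 40 },
];

// Assumption: for other durations, use the nearest listed target.
function recommendedShotRange(targetSec: number): { min: number; max: number } {
  let best = DENSITY[0];
  for (const row of DENSITY) {
    if (Math.abs(row.targetSec - targetSec) < Math.abs(best.targetSec - targetSec)) best = row;
  }
  return { min: best.min, max: best.max };
}

// Assumption: a scene without shots[] contributes one visual unit.
function countShots(scenes: Scene[]): number {
  return scenes.reduce((n, scene) => n + (scene.shots ? scene.shots.length : 1), 0);
}
```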
Shot example:
{
"id": "shot_01",
"durationWeight": 1,
"visualPrompt": "vertical editorial photo, tense courtroom and AI search interface atmosphere, no readable text",
"assetPath": "assets/scene_001_shot_01.png",
"mediaSource": "ai",
"mediaType": "image",
"motion": "pan_left",
"transition": "smoothleft",
"overlay": {
"badge": "CASE",
"headline": "责任边界",
"metric": "30s",
"position": "top"
}
}
Mixed stock video example:
{
"id": "shot_02",
"durationWeight": 1,
"visualPrompt": "real handheld vertical business district B-roll, commuters and glass buildings, documentary tone",
"assetPath": "assets/scene_001_shot_02.mp4",
"mediaSource": "stock_video",
"mediaType": "video",
"stockQuery": "business district commuters vertical video",
"motion": "still",
"transition": "smoothleft",
"overlay": {
"badge": "B-ROLL",
"headline": "真实现场",
"position": "top"
}
}
Recommended source mix:
- Cover: AI image.
- Conceptual or impossible visuals: AI image.
- Real-world motion, atmosphere, city, factory, product use, crowd, nature, technology B-roll: stock video.
- Contextual stills, business scenes, object/background shots: stock photo.
- 30 seconds: target 10-18 shots, usually 3-6 AI image shots, 3-7 stock video shots, 2-5 stock photo shots.
- 60 seconds: target 18-28 shots, usually 5-10 AI image shots, 6-12 stock video shots, 4-8 stock photo shots.
If Seedream returns 429 or another temporary provider error, generate_assets retries AI image requests with exponential backoff and still attempts all stock photo/video downloads. It returns failed and fallbackAllowedAssetPaths instead of abandoning the whole mixed asset job.
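The retry behavior can be sketched as follows. The sketch assumes that HTTP 429 and 5xx responses count as temporary errors. The 1 s base delay, 16 s cap, and four retries are chosen only for illustration, because the real values are not documented here. withBackoff and runMixedJobs are hypothetical names. Jobs run sequentially rather than with concurrency: 2, and jitter is omitted so that the example stays deterministic.

```typescript
type Sleep = (ms: number) => Promise<void>;
const realSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

interface BackoffOptions {
  retries: number; // extra attempts after the first one
  baseMs: number;
  maxMs: number;
}
// Illustrative values, not the package's settings.
const DEFAULT_BACKOFF: BackoffOptions = { retries: 4, baseMs: 1000, maxMs: 16000 };

// Reads an HTTP-like numeric status from an error object, if present.
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

// Assumption: 429 and 5xx are the "temporary provider errors".
function isTemporary(err: unknown): boolean {
  const status = statusOf(err);
  return status === 429 || (status !== undefined && status >= 500);
}

async function withBackoff<T>(
  task: () => Promise<T>,
  opts: BackoffOptions = DEFAULT_BACKOFF,
  sleep: Sleep = realSleep,
): Promise<T> {
  let attempt = 0;
  while (true) {
    try {
      return await task();
    } catch (err) {
      if (attempt >= opts.retries || !isTemporary(err)) throw err;
      // Delay doubles each attempt: base, 2*base, 4*base, ... capped at maxMs.
      await sleep(Math.min(opts.maxMs, opts.baseMs * Math.pow(2, attempt)));
      attempt += 1;
    }
  }
}

interface AssetJob {
  assetPath: string;
  kind: "ai" | "stock_photo" | "stock_video";
  run: () => Promise<void>;
}

// Every job is attempted; failures are collected instead of aborting the batch.
async function runMixedJobs(
  jobs: AssetJob[],
  sleep: Sleep = realSleep,
): Promise<{ failed: string[]; fallbackAllowedAssetPaths: string[] }> {
  const failed: string[] = [];
  const fallbackAllowedAssetPaths: string[] = [];
  for (const job of jobs) {
    try {
      // Only AI image requests are retried with backoff.
      await (job.kind === "ai" ? withBackoff(job.run, DEFAULT_BACKOFF, sleep) : job.run());
    } catch {
      failed.push(job.assetPath);
      // Host fallback may only cover AI image assets, never stock paths.
      if (job.kind === "ai") fallbackAllowedAssetPaths.push(job.assetPath);
    }
  }
  return { failed, fallbackAllowedAssetPaths };
}
```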
Supported motion values:
push_in, pull_back, pan_left, pan_right, tilt_up, tilt_down, still
Supported transition values:
fade, wipeleft, wiperight, slideleft, slideright, smoothleft, none
The renderer applies Ken Burns-style pan/zoom over the full shot duration, then uses ffmpeg xfade between segments. Overlay text is burned in through ASS subtitles, not generated into the image.
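Because xfade overlaps neighboring segments, each transition offset has to be computed from the running length of the stream merged so far. Below is a TypeScript sketch that builds an ffmpeg -filter_complex chain. It assumes that each segment is a separate input with matching resolution, frame rate, and timebase, since xfade expects matching inputs. It also assumes that none is handled elsewhere as a hard cut. buildXfadeChain and the 0.4 s default are hypothetical. The remaining transition names from the list above are all valid xfade transitions.

```typescript
// Transitions from the list above that map directly onto ffmpeg xfade names.
type XfadeTransition = "fade" | "wipeleft" | "wiperight" | "slideleft" | "slideright" | "smoothleft";

interface Segment {
  durationSec: number; // rendered length of this input, including transition overlap
  transitionIn?: XfadeTransition; // transition from the previous segment (ignored for the first)
}

function buildXfadeChain(
  segments: Segment[],
  transitionSec = 0.4,
): { filter: string; outLabel: string; totalSec: number } {
  if (segments.length === 0) throw new Error("no segments");
  for (const s of segments) {
    // A middle segment overlaps on both sides, so it must span at least two transitions.
    if (s.durationSec < 2 * transitionSec) throw new Error("segment shorter than two transitions");
  }
  const parts: string[] = [];
  let prevLabel = "[0:v]";
  let mergedSec = segments[0].durationSec; // length of the stream built so far
  for (let i = 1; i < segments.length; i++) {
    // The next transition starts transitionSec before the merged stream ends.
    const offset = mergedSec - transitionSec;
    const outLabel = `[v${i}]`;
    const transition = segments[i].transitionIn ?? "fade";
    parts.push(
      `${prevLabel}[${i}:v]xfade=transition=${transition}:duration=${transitionSec}:offset=${offset.toFixed(3)}${outLabel}`,
    );
    prevLabel = outLabel;
    mergedSec += segments[i].durationSec - transitionSec;
  }
  return { filter: parts.join(";"), outLabel: prevLabel, totalSec: mergedSec };
}
```

With three 3 s segments and a 0.5 s transition, the merged stream is 3 + 3 + 3 - 2 * 0.5 = 8 s long. For that reason, segment lengths must include the overlap if the total is to match the narration. The filter string would then be passed to ffmpeg with -filter_complex, mapping the returned label with -map.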
Timing Model
The video is audio-driven:
- durationSec in video_plan.json is used as a relative weight for each scene.
- If a scene has shots[], each shot's durationWeight divides that scene's allocated time across faster visual cuts.
- After generate_audio, render_video reads the real narration duration with ffprobe.
- The dedicated cover opening, shot durations, and subtitle timing are scaled to fill the audio duration.
- The final MP4 duration is set to the audio duration, so narration is not cut off and the video does not continue after speech ends.
- Duration requests are soft targets unless the user explicitly asks for an exact duration. generate_audio returns a durationPolicy; if durationPolicy.isAcceptable is true, do not rewrite or regenerate narration only to get closer to the target duration.
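A minimal sketch of the proportional allocation follows. It assumes that the cover opening length is chosen first and carved out of the narration length. How the renderer actually sizes the cover is not specified above. allocateTimeline and SceneTiming are hypothetical names. Scene durationSec values and shot durationWeight values are treated purely as ratios, so only the measured audio duration sets absolute times.

```typescript
interface SceneTiming {
  durationSec: number; // relative weight from video_plan.json
  shotWeights?: number[]; // durationWeight of each shot, if the scene has shots[]
}

interface Timeline {
  coverSec: number;
  shotSec: number[][]; // seconds per shot, grouped by scene
}

// Assumption: the cover opening length is fixed up front and taken out of the audio length.
function allocateTimeline(scenes: SceneTiming[], audioSec: number, coverSec = 0): Timeline {
  const bodySec = audioSec - coverSec;
  const totalWeight = scenes.reduce((sum, s) => sum + s.durationSec, 0);
  if (bodySec <= 0 || totalWeight <= 0) throw new Error("nothing to allocate");
  const shotSec = scenes.map((scene) => {
    // Each scene gets its weighted share of the narration time.
    const sceneSec = (bodySec * scene.durationSec) / totalWeight;
    const weights = scene.shotWeights && scene.shotWeights.length > 0 ? scene.shotWeights : [1];
    const weightSum = weights.reduce((sum, w) => sum + w, 0);
    // Shots split the scene's time by durationWeight.
    return weights.map((w) => (sceneSec * w) / weightSum);
  });
  return { coverSec, shotSec };
}
```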
Subtitle Alignment
Subtitles are generated from provider timestamps, not from rough scene durations:
- generate_audio asks Doubao/Volcengine TTS for subtitle timestamps.
- Timestamped words are saved to audio/alignment.json.
- render_video refuses to render if the project has audio but no alignment file.
- output/subtitles.srt is built from the returned word timings, grouped into readable caption cues.
- output/subtitles.ass is also generated and burned into the MP4. It uses dynamic font sizing, two-line Chinese wrapping, and shot overlay text. Cover title composition is intentionally not added by the renderer.
- If the selected TTS model or voice does not return timestamps, generate_audio fails instead of falling back to estimated subtitles.
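Grouping word timestamps into caption cues can be sketched as below. The WordTiming shape (text, startMs, endMs) is an assumption, because the audio/alignment.json format is not documented here. The 16-character limit, the 600 ms gap threshold, and the sentence-end rule are illustrative choices, not the renderer's actual settings. Words are joined without spaces, which suits Chinese narration; English text would need spaces between words.

```typescript
// Assumed word-timing shape; the real audio/alignment.json schema may differ.
interface WordTiming {
  text: string;
  startMs: number;
  endMs: number;
}

interface Cue {
  startMs: number;
  endMs: number;
  text: string;
}

const SENTENCE_END = /[。！？!?]$/;

// Greedy grouping: start a new cue on length overflow, a long pause, or a sentence end.
function groupWords(words: WordTiming[], maxChars = 16, maxGapMs = 600): Cue[] {
  const cues: Cue[] = [];
  let current: Cue | null = null;
  for (const w of words) {
    if (
      current !== null &&
      (current.text.length + w.text.length > maxChars ||
        w.startMs - current.endMs > maxGapMs ||
        SENTENCE_END.test(current.text))
    ) {
      cues.push(current);
      current = null;
    }
    if (current === null) {
      current = { startMs: w.startMs, endMs: w.endMs, text: w.text };
    } else {
      // Chinese words are joined without spaces.
      current.text += w.text;
      current.endMs = w.endMs;
    }
  }
  if (current !== null) cues.push(current);
  return cues;
}

function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// SRT timestamps use HH:MM:SS,mmm with a comma before the milliseconds.
function srtTime(ms: number): string {
  const total = Math.max(0, Math.round(ms));
  const h = Math.floor(total / 3600000);
  const m = Math.floor((total % 3600000) / 60000);
  const s = Math.floor((total % 60000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(total % 1000, 3)}`;
}

// Numbered cues separated by blank lines, as in a standard .srt file.
function toSrt(cues: Cue[]): string {
  return cues
    .map((c, i) => `${i + 1}\n${srtTime(c.startMs)} --> ${srtTime(c.endMs)}\n${c.text}\n`)
    .join("\n");
}
```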
Output
Projects are stored under:
~/.video-maker/projects/<projectId>
Final files:
- output/final.mp4
- output/subtitles.srt
- output/subtitles.ass
- assets/cover.png
- manifest.json
- video_plan.json
For user-facing delivery, ask Codex to pass outputDir to render_video or call export_video after rendering. If no export directory is provided, export_video copies files to:
~/Downloads/video-maker/<projectId>
When available, export_video also copies the dedicated cover to:
cover.png
Local Development
npm install
npm run typecheck
npm run build
node dist/cli/index.js doctor