vertex-image-video-mcp
v1.2.0
MCP server for image and video generation via the Google Vertex AI API
An MCP server for generating images and videos using the Google Vertex AI API (Vertex AI Express Mode).
Tools
generate_image
Generates an image from a text prompt using Gemini image models on Vertex AI.
| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | string | yes | Text description of the image to generate |
| model | enum | yes | See models table below |
| aspect_ratio | enum | no | auto (default), 1:1, 9:16, 16:9, 3:4, 4:3, 3:2, 2:3, 5:4, 4:5, 21:9 — supported by all models |
| resolution | enum | no | 512, 1k (default), 2k, 4k — nano_banana_2 and nano_banana_pro only; nano_banana_pro does not support 512; nano_banana does not support resolution changes |
| negative_prompt | string | no | What to exclude from the image |
| reference_images | array | no | Up to 5 objects of { data: string, mimeType: string } — base64 images used as visual context/reference for generation |
aspect_ratio and resolution are enforced as native API parameters (not text hints), so output dimensions are consistent and predictable.
Image models:
| Key | Model ID |
|---|---|
| nano_banana | gemini-2.5-flash-image |
| nano_banana_2 | gemini-3.1-flash-image-preview |
| nano_banana_pro | gemini-3-pro-image-preview |
The tool always returns both the image as base64 (rendered in display clients such as Claude Desktop) and a text message with the path to the saved file (usable by coding agents such as Claude Code, Kilo Code, Roo Code, and Cline).
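As a rough illustration of the model and resolution rules in the parameter table, the check below maps each model key to the resolutions it accepts. This is a sketch, not the server's actual code: `SUPPORTED_RESOLUTIONS` and `checkImageResolution` are hypothetical names, and treating any explicit `resolution` on `nano_banana` as unsupported is an assumption.

```typescript
// Hypothetical helper; names are illustrative, not taken from the server's source.
type ImageModel = "nano_banana" | "nano_banana_2" | "nano_banana_pro";
type ImageResolution = "512" | "1k" | "2k" | "4k";

// Resolutions each model accepts, per the parameter table.
// Assumption: nano_banana has no resolution control, so its list is empty.
const SUPPORTED_RESOLUTIONS: Record<ImageModel, ImageResolution[]> = {
  nano_banana: [],
  nano_banana_2: ["512", "1k", "2k", "4k"],
  nano_banana_pro: ["1k", "2k", "4k"],
};

// Returns an error message, or null when the combination is allowed.
function checkImageResolution(
  model: ImageModel,
  resolution?: ImageResolution
): string | null {
  if (resolution === undefined) return null; // the model's default applies
  if (SUPPORTED_RESOLUTIONS[model].includes(resolution)) return null;
  return `${model} does not support resolution "${resolution}"`;
}

console.log(checkImageResolution("nano_banana_2", "512")); // null
console.log(checkImageResolution("nano_banana_pro", "512")); // nano_banana_pro does not support resolution "512"
```

A client could run a check like this before calling generate_image to catch unsupported combinations early.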
generate_video
Generates a video from a text prompt using Veo models on Vertex AI. Video generation is asynchronous: the tool polls until the job completes (up to 5 minutes).
The agent will ask the user for duration_seconds, resolution, and aspect_ratio before generating if they have not been specified.
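The polling behavior mentioned above can be pictured with a generic poll-until-done loop. This is only a sketch under stated assumptions: `pollUntilDone` and `checkStatus` are hypothetical names, the 10-second interval is a guess (the README documents only the 5-minute ceiling), and the server's real loop may differ.

```typescript
// Generic polling sketch. `checkStatus` is a hypothetical callback standing in
// for whatever request reports the state of the long-running operation.
type PollResult<T> = { done: true; value: T } | { done: false };

async function pollUntilDone<T>(
  checkStatus: () => Promise<PollResult<T>>,
  timeoutMs: number = 5 * 60 * 1000, // 5 minutes, matching the documented limit
  intervalMs: number = 10_000 // assumed interval; not documented in the README
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const result = await checkStatus();
    if (result.done) return result.value;
    // Wait before asking again so the API is not hammered.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Operation did not complete within ${timeoutMs} ms`);
}
```

Throwing on timeout is one possible design; the README does not say what the tool does when the 5 minutes elapse.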
| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | string | no* | Text description of the video to generate. Optional if start_frame_image is provided |
| model | enum | yes | See models table below |
| duration_seconds | integer | yes | 4, 6, or 8 seconds. Must be 8 when using start/end frames |
| resolution | enum | yes | 720p (all durations), 1080p or 4k (8s only) |
| aspect_ratio | enum | yes | 16:9 or 9:16 |
| start_frame_image | object | no | { data: string, mimeType: string } — base64 image to use as the first frame (image-to-video). Must be paired with end_frame_image |
| end_frame_image | object | no | { data: string, mimeType: string } — base64 image to use as the last frame. Must be paired with start_frame_image |
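The constraints in the table above interact (duration, resolution, and frame images), so a small validator can make them concrete. The sketch below is hypothetical: `VideoArgs` and `validateVideoArgs` are illustrative names, the `model` field is omitted for brevity, and the server's own validation may be stricter or worded differently.

```typescript
// Hypothetical validator; field names follow the parameter table.
interface FrameImage {
  data: string;
  mimeType: string;
}

interface VideoArgs {
  prompt?: string;
  duration_seconds: 4 | 6 | 8;
  resolution: "720p" | "1080p" | "4k";
  aspect_ratio: "16:9" | "9:16";
  start_frame_image?: FrameImage;
  end_frame_image?: FrameImage;
}

// Returns a list of rule violations; an empty list means the arguments pass.
function validateVideoArgs(args: VideoArgs): string[] {
  const errors: string[] = [];
  const hasStart = args.start_frame_image !== undefined;
  const hasEnd = args.end_frame_image !== undefined;

  if (hasStart !== hasEnd) {
    errors.push("start_frame_image and end_frame_image must be provided together");
  }
  if ((hasStart || hasEnd) && args.duration_seconds !== 8) {
    errors.push("duration_seconds must be 8 when using start/end frames");
  }
  if (args.resolution !== "720p" && args.duration_seconds !== 8) {
    errors.push(`${args.resolution} requires duration_seconds: 8`);
  }
  if (!args.prompt && !hasStart) {
    errors.push("prompt is required unless start_frame_image is provided");
  }
  return errors;
}
```

Returning all violations at once, rather than failing on the first, lets an agent correct every problem in a single retry.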
Start/end frame notes:
- Both start_frame_image and end_frame_image must be provided together; you cannot use one without the other
- Requires duration_seconds: 8
- The data field accepts raw base64 or a full data URL (data:image/jpeg;base64,...); the prefix is stripped automatically
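The prefix stripping described in the last note could look roughly like the following. This is a minimal sketch: `stripDataUrlPrefix` is a hypothetical name and the regular expression is an assumption about what counts as a data URL prefix, not the server's actual implementation.

```typescript
// Hypothetical helper: accepts raw base64 or a data URL and returns raw base64.
function stripDataUrlPrefix(data: string): string {
  // Matches a leading "data:<mime type>;base64," prefix.
  const match = /^data:[^;,]+;base64,/.exec(data);
  return match ? data.slice(match[0].length) : data;
}

console.log(stripDataUrlPrefix("data:image/jpeg;base64,AAAA")); // AAAA
console.log(stripDataUrlPrefix("AAAA")); // AAAA
```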
Video models:
| Key | Model ID |
|---|---|
| veo_3_1 | veo-3.1-generate-001 |
| veo_3_1_fast | veo-3.1-fast-generate-001 |
| veo_3_1_lite | veo-3.1-lite-generate-001 |
The tool always returns both the video as a base64 blob resource (for display clients) and a text message with the path to the saved file (usable by coding agents such as Claude Code, Kilo Code, Roo Code, and Cline).
Requirements
- A Vertex AI Express Mode API key (create one in the Google Cloud Console). This key is tied to your GCP project and uses the Vertex AI endpoint (aiplatform.googleapis.com), not the Gemini Developer API.
- Your GCP project ID (required for video generation)
- Node.js 18+
Configuration
Environment variables
| Variable | Required | Description |
|---|---|---|
| GOOGLE_CLOUD_API_KEY | Yes | Vertex AI Express Mode API key |
| GOOGLE_CLOUD_PROJECT | Yes | GCP project ID (e.g. my-gcp-project) |
| GOOGLE_CLOUD_LOCATION | No | Region, defaults to us-central1 |
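To show how the required variables and the region default in the table fit together, here is one way a Node.js process might read them. It is a sketch only: `VertexConfig` and `loadConfig` are hypothetical names, and the error messages are illustrative rather than what the server prints.

```typescript
// Hypothetical config loader; variable names come from the table above.
interface VertexConfig {
  apiKey: string;
  project: string;
  location: string;
}

function loadConfig(env: Record<string, string | undefined>): VertexConfig {
  const apiKey = env["GOOGLE_CLOUD_API_KEY"];
  const project = env["GOOGLE_CLOUD_PROJECT"];
  if (!apiKey) throw new Error("GOOGLE_CLOUD_API_KEY is required");
  if (!project) throw new Error("GOOGLE_CLOUD_PROJECT is required");
  // Fall back to the documented default region when none is set.
  const location = env["GOOGLE_CLOUD_LOCATION"] || "us-central1";
  return { apiKey, project, location };
}
```

In a real process this would be called as `loadConfig(process.env)`; taking the environment as a parameter keeps the function easy to test.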
Claude Desktop
Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
  "mcpServers": {
    "vertex-image-video": {
      "command": "npx",
      "args": ["-y", "vertex-image-video-mcp"],
      "env": {
        "GOOGLE_CLOUD_API_KEY": "your-api-key-here",
        "GOOGLE_CLOUD_PROJECT": "your-gcp-project-id",
        "GOOGLE_CLOUD_LOCATION": "us-central1"
      }
    }
  }
}
Claude Code
Edit ~/.claude/settings.json:
{
  "mcpServers": {
    "vertex-image-video": {
      "command": "npx",
      "args": ["-y", "vertex-image-video-mcp"],
      "env": {
        "GOOGLE_CLOUD_API_KEY": "your-api-key-here",
        "GOOGLE_CLOUD_PROJECT": "your-gcp-project-id",
        "GOOGLE_CLOUD_LOCATION": "us-central1"
      }
    }
  }
}
Local development
npm install
npm run dev # run with tsx (no build needed)
npm run build # compile to dist/
npm start # run compiled output