gstudioaimcp

v1.1.0

Published

a month ago

MCP server for Google AI Studio media generation — Nano Banana images, Veo video, Gemini TTS — saved to disk, returned as file paths.

matheusccastro

mcp model-context-protocol gemini google-genai veo nano-banana ai-studio tts image-generation video-generation bun

gstudioaimcp

An MCP server for Google AI Studio media generation — Nano Banana images, Veo video, Gemini TTS — saved to disk, returned as file paths.

Features

Four tools: list_models, generate_image, generate_video, generate_speech.
Curated model catalog (Gemini 2.5 Flash Image, Gemini 3 Pro Image, Veo 3.0 / 3.1, Gemini 2.5 TTS) — discoverable at runtime via list_models.
Image edit and image-to-video accept either a file path or an inline { base64, mimeType }.
Outputs are written to disk with stable filenames; tool results contain paths plus the SDK's usageMetadata (token counts) when present.
Single TToolError taxonomy across every failure mode — see src/constants.ts (ERROR_CODE).
Bun-first, TypeScript, ESM. Runtime deps: @google/genai, @modelcontextprotocol/sdk, zod.

Requirements

Bun ≥ 1.3. Node.js is not supported in v1 (uses Bun.write / Bun.file and native tsconfig.json path resolution).
A Google AI Studio API key — https://aistudio.google.com/apikey.
Paid tier for Veo. Free-tier keys get BAD_REQUEST from generate_video. generate_image and generate_speech work on the free tier within its rate limits.

Install

From npm (recommended)

bun install -g gstudioaimcp
# or: npm install -g gstudioaimcp

Installs the gstudioaimcp CLI on your PATH. You still need Bun ≥ 1.3 installed locally, since the binary's shebang is #!/usr/bin/env bun.

From source

git clone https://github.com/matheusccastroo/gstudioaimcp.git
cd gstudioaimcp
bun install
cp .env.template .env
# edit .env and set ENV_VAR_GEMINI_API_KEY

Configuration

All configuration is via environment variables. The annotated source of truth is .env.template — copy it to .env and edit. Bun auto-loads .env when invoked from the project directory.

| Variable | Default | Notes |
| --- | --- | --- |
| ENV_VAR_GEMINI_API_KEY | (required) | Plain GEMINI_API_KEY also works (SDK fallback). |
| ENV_VAR_OUTPUT_DIR | (none; must be set OR passed per tool call) | Subdirs images/, videos/, audio/ are auto-created. |
| ENV_VAR_RETRY_ATTEMPTS | 5 | Total attempts including the first. Set 1 to disable retries (see Gotchas; needed to see full API error bodies). |
| ENV_VAR_REQUEST_TIMEOUT_MS | 60000 | Per-request timeout. |
| ENV_VAR_VIDEO_POLL_INTERVAL_MS | 10000 | Veo poll cadence. Polling is a free control-plane call. |
| ENV_VAR_VIDEO_POLL_TIMEOUT_MS | 900000 | 15 min cap before POLL_TIMEOUT. |
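
To make the defaults in the table concrete, here is a minimal sketch of how such variables could be read into a typed config object. It is not the project's code: `readConfig`, `readPositiveInt`, `TServerConfig` and `TEnv` are hypothetical names, and the precedence of ENV_VAR_GEMINI_API_KEY over GEMINI_API_KEY is an assumption (the README only says the plain name works through the SDK's own fallback).

```typescript
// Hypothetical shape for the parsed configuration; field names are illustrative.
type TServerConfig = {
  apiKey: string;
  outputDir: string | undefined;
  retryAttempts: number;
  requestTimeoutMs: number;
  videoPollIntervalMs: number;
  videoPollTimeoutMs: number;
};

type TEnv = Record<string, string | undefined>;

// Parse a positive integer, falling back to the documented default when unset.
const readPositiveInt = (env: TEnv, name: string, fallback: number): number => {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 1) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
};

const readConfig = (env: TEnv): TServerConfig => {
  // ASSUMPTION: the prefixed variable wins; the plain name is the fallback.
  const apiKey = env["ENV_VAR_GEMINI_API_KEY"] ?? env["GEMINI_API_KEY"];
  if (!apiKey) throw new Error("MISSING_API_KEY");
  return {
    apiKey,
    outputDir: env["ENV_VAR_OUTPUT_DIR"], // may stay undefined; tools can pass outputDir
    retryAttempts: readPositiveInt(env, "ENV_VAR_RETRY_ATTEMPTS", 5),
    requestTimeoutMs: readPositiveInt(env, "ENV_VAR_REQUEST_TIMEOUT_MS", 60000),
    videoPollIntervalMs: readPositiveInt(env, "ENV_VAR_VIDEO_POLL_INTERVAL_MS", 10000),
    videoPollTimeoutMs: readPositiveInt(env, "ENV_VAR_VIDEO_POLL_TIMEOUT_MS", 900000),
  };
};
```

Passing the environment in as a plain object (rather than reading a global) keeps the sketch runtime-agnostic and easy to test.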

Wire into an MCP client

Test first with MCP Inspector

bun run mcp-inspector

Opens a UI you can use to list and invoke each tool manually. Faster feedback loop than going through Claude.

Claude Code

After bun install -g gstudioaimcp:

claude mcp add gstudioaimcp -s user \
  -e ENV_VAR_GEMINI_API_KEY=YOUR_KEY \
  -e ENV_VAR_OUTPUT_DIR=$HOME/genai-out \
  -- gstudioaimcp

Restart Claude Code, then verify with /mcp. Scopes: -s local (current project only, default), -s project (committed to .mcp.json for teammates), -s user (global, shown above).

If you're running from a source clone instead, point at the local entrypoint:

claude mcp add gstudioaimcp -s user \
  -e ENV_VAR_GEMINI_API_KEY=YOUR_KEY \
  -e ENV_VAR_OUTPUT_DIR=$HOME/genai-out \
  -- bun run /absolute/path/to/gstudioaimcp/src/index.ts

Other MCP clients

Any stdio-compatible client can spawn the gstudioaimcp binary (after npm install -g gstudioaimcp) or run bun run src/index.ts from a clone.

Heads-up on client choice for image-edit / image-to-video. The input image is read by reference (absolute filesystem path), so the client must run on the same machine that holds the file. Claude Code (CLI), Cursor, Continue, MCP Inspector — all fine. Claude Desktop is NOT a good fit for these flows: files dropped into Desktop land in Claude's server-side sandbox (/home/claude/...) which the locally-spawned MCP server cannot read. See Gotchas below.

Usage

Once connected, prompt naturally — the LLM will pick the right tool from the schema descriptions.

| Ask | Tool invoked |
| --- | --- |
| "Generate an image of a calico cat on a beach, 16:9." | generate_image { prompt, aspectRatio: "16:9" } |
| "Make a 5-second video of a sunflower field at sunset." | generate_video { prompt, personGeneration: "allow_all" } |
| "Speak this in voice Kore: 'Hello from Gemini'." | generate_speech { prompt, voiceName: "Kore" } |

A concrete request/response for generate_image:

Request:

{
  "name": "generate_image",
  "arguments": {
    "prompt": "A calico kitten napping in sunshine, soft focus",
    "aspectRatio": "16:9",
    "outputDir": "/Users/me/genai-out"
  }
}

Response (tool result):

{
  "paths": [
    "/Users/me/genai-out/images/generate_image_gemini-2.5-flash-image_2026-05-15T17-42-11-204Z_0.png"
  ],
  "model": "gemini-2.5-flash-image",
  "usage": {
    "promptTokenCount": 12,
    "candidatesTokenCount": 1290,
    "totalTokenCount": 1302
  }
}
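
The path in the response above suggests a naming scheme of tool, model, timestamp, and output index. The following sketch reproduces that pattern as inferred from the example only; `buildOutputName` is a hypothetical helper and the real implementation may differ.

```typescript
// Build a filesystem-safe name of the form <tool>_<model>_<timestamp>_<index>.<ext>.
// The timestamp is an ISO 8601 string with ":" and "." replaced by "-".
const buildOutputName = (
  tool: string,
  model: string,
  createdAt: Date,
  index: number,
  extension: string,
): string => {
  const stamp = createdAt.toISOString().replace(/[:.]/g, "-");
  return `${tool}_${model}_${stamp}_${index}.${extension}`;
};

const example = buildOutputName(
  "generate_image",
  "gemini-2.5-flash-image",
  new Date("2026-05-15T17:42:11.204Z"),
  0,
  "png",
);
// example === "generate_image_gemini-2.5-flash-image_2026-05-15T17-42-11-204Z_0.png"
```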

Errors come back via MCP isError: true with the TToolError JSON in content[0].text.
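
On the client side, pulling the error out of a tool result might look like the sketch below. `extractToolError` and `TToolResult` are hypothetical names; the TToolError fields follow the shape shown in the Errors section, and the result type is a simplified assumption rather than the MCP SDK's own type.

```typescript
type TToolError = {
  code: string;
  message: string;
  retryable: boolean;
  httpStatus?: number;
  details?: Record<string, unknown>;
};

// Simplified view of an MCP tool result; only the fields used here.
type TToolResult = {
  isError?: boolean;
  content: Array<{ type: string; text?: string }>;
};

// Return the parsed TToolError when the result is flagged as an error, else null.
const extractToolError = (result: TToolResult): TToolError | null => {
  if (result.isError !== true) return null;
  const first = result.content[0];
  if (first === undefined || typeof first.text !== "string") return null;
  try {
    const parsed: unknown = JSON.parse(first.text);
    if (typeof parsed === "object" && parsed !== null && "code" in parsed && "message" in parsed) {
      return parsed as TToolError;
    }
  } catch {
    // content[0].text was not JSON; treat it as an unrecognised error payload.
  }
  return null;
};
```

A caller could then branch on `error.code` (for example MISSING_OUTPUT_DIR) and on `error.retryable` before deciding whether to try again.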

Tools

Call list_models for the current per-model constraints (aspect ratios, image sizes, voices, person-generation modes, etc.) — that's the authoritative runtime view.

`list_models`

Inputs: capability: "image" | "video" | "audio".
Output: { capability, models: TModelInfo[] }.

`generate_image`

Inputs: prompt, model?, inputImages?, aspectRatio?, imageSize?, outputDir?.
Output: { paths: string[], model, usage? }.
Notes: imageSize is gemini-3-pro-image-preview only.

`generate_video`

Inputs: prompt, model?, inputImage?, aspectRatio?, personGeneration?, negativePrompt?, numberOfVideos?, outputDir?.
Output: { paths: string[], model }.
Notes: Long-running. The tool polls Veo internally until done and downloads each MP4.
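
The internal polling can be pictured roughly as follows. This is a minimal sketch, not the project's code: `pollUntilDone`, `TPollDeps` and the injected `check` / `sleep` / `now` functions are hypothetical, and the default arguments simply mirror the documented defaults of ENV_VAR_VIDEO_POLL_INTERVAL_MS and ENV_VAR_VIDEO_POLL_TIMEOUT_MS.

```typescript
// Dependencies are injected so the loop can be exercised without real timers or network.
type TPollDeps<T> = {
  check: () => Promise<{ done: boolean; value?: T }>; // one status lookup
  sleep: (ms: number) => Promise<void>;
  now: () => number; // milliseconds, e.g. Date.now
};

async function pollUntilDone<T>(
  deps: TPollDeps<T>,
  intervalMs = 10000,
  timeoutMs = 900000,
): Promise<T> {
  const deadline = deps.now() + timeoutMs;
  for (;;) {
    const status = await deps.check();
    if (status.done) return status.value as T;
    // Give up instead of sleeping past the deadline.
    if (deps.now() + intervalMs > deadline) throw new Error("POLL_TIMEOUT");
    await deps.sleep(intervalMs);
  }
}
```

With the defaults, an operation that never completes is checked at most 91 times (once immediately, then once per 10 s interval) before POLL_TIMEOUT is raised.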

`generate_speech`

Inputs: prompt, model?, voiceName? XOR speakers? (multi-speaker, exactly two), outputDir?.
Output: { paths: string[], model, usage? }.
Notes: Writes a .wav file: the PCM16 little-endian samples are wrapped in a WAV header that uses the sample rate the API actually returned.
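
For readers unfamiliar with the format, wrapping raw PCM16 little-endian samples in a WAV container amounts to prepending a 44-byte RIFF header. The sketch below shows the standard layout; `wrapPcm16AsWav` is a hypothetical name, the mono default is an assumption, and the project's own helper may be structured differently.

```typescript
// Prepend a canonical 44-byte RIFF/WAVE header to raw PCM16 little-endian samples.
const wrapPcm16AsWav = (pcm: Uint8Array, sampleRate: number, channels = 1): Uint8Array => {
  const bytesPerSample = 2; // 16-bit samples
  const blockAlign = channels * bytesPerSample;
  const byteRate = sampleRate * blockAlign;
  const out = new Uint8Array(44 + pcm.length);
  const view = new DataView(out.buffer);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk body
  view.setUint16(20, 1, true); // audio format 1 = linear PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, pcm.length, true); // size of the sample data
  out.set(pcm, 44);
  return out;
};
```

Taking the sample rate as a parameter, rather than hard-coding it, matches the note above: the header should describe whatever rate the API actually returned.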

Gotchas

Veo personGeneration is mode-dependent. Text-to-video requires allow_all. Image-to-video (when you pass inputImage) requires allow_adult. In EU / UK / CH / MENA, only allow_adult is accepted in any mode. dont_allow is a Veo 2.0 value and is rejected by Veo 3.x. The schema's .describe() and list_models notes spell this out, but expect to run into this rejection anyway.
Veo is paid-tier only. Free-tier keys fail with BAD_REQUEST. Polling is free; the kickoff is billed per second of video produced. See https://ai.google.dev/pricing.
SDK retry layer hides 4xx error bodies. When ENV_VAR_RETRY_ATTEMPTS > 1 (the default), the SDK's p-retry wrapper throws a generic AbortError for non-retryable 4xx responses and never reads the response body. We map this to BAD_REQUEST with a hint, but the actual API explanation is lost. To see the full message, set ENV_VAR_RETRY_ATTEMPTS=1 and re-run — the SDK then constructs a proper ApiError with the body. Use this for debugging; switch back to 5 for normal operation.
imageSize is Pro-only. Passing it with gemini-2.5-flash-image returns INVALID_CONFIG from preflight.
TTS voice XOR. voiceName and speakers are mutually exclusive — preflight rejects passing both. speakers must contain exactly two entries.
outputDir resolution. Tool input wins; falls back to ENV_VAR_OUTPUT_DIR. If both are absent the call fails with MISSING_OUTPUT_DIR — no silent default.
No retries on download. Veo URIs are short-lived signed URLs; retrying after backoff can land on an expired URL.
No jitter, no Retry-After. Inherited from the SDK's p-retry defaults. Don't run several parallel clients hard against the rate limit.
Fail-fast on multi-output. If numberOfVideos > 1 (or multi-image generation) and any output is blocked or empty, the whole call fails and partials are dropped.
Input images need local filesystem access — Claude Desktop won't work for them. inputImages / inputImage accept a path on the user's local machine. Claude Desktop's file-drop puts the file in Claude's server-side sandbox (/home/claude/...) which the locally-spawned MCP server cannot read; you'll get INVALID_INPUT_FILE. Use a client that runs alongside your files (Claude Code, Cursor, Continue, etc.). If you must use Claude Desktop, save the dropped file to your local disk manually first (drag the chat preview into Finder) and pass that absolute path.
Base64 inputs are supported but expensive — prefer { path }. inputImages / inputImage accept either { path } (free) or { base64, mimeType }. The base64 variant inflates the LLM's output-token usage on every call and can rack up cost quickly; only use it when no local path is available.
Call the tools serially, not in parallel. The client should issue one tool call at a time. Parallel calls compound against rate limits (no jitter, no Retry-After honored; see above), and concurrent Veo polls eat the same quota for no benefit. Most MCP clients (Claude Code, Claude Desktop) already do this by default.
Bun-only for v1. No Node fallback yet.
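
Several of the gotchas above are preflight rules that can be expressed as small pure checks. The sketch below is illustrative only: all function and type names are hypothetical, the regional personGeneration restriction is not modelled, and using INVALID_CONFIG for the voice check is an assumption (the README states that code only for imageSize).

```typescript
type TPreflightIssue = { code: "INVALID_CONFIG" | "MISSING_OUTPUT_DIR"; message: string };

// Tool input wins, the environment value is the fallback, and there is no silent default.
const resolveOutputDir = (
  inputDir: string | undefined,
  envDir: string | undefined,
): string | TPreflightIssue =>
  inputDir ?? envDir ?? { code: "MISSING_OUTPUT_DIR", message: "Set ENV_VAR_OUTPUT_DIR or pass outputDir." };

// Text-to-video expects allow_all; image-to-video expects allow_adult.
const expectedPersonGeneration = (hasInputImage: boolean): "allow_all" | "allow_adult" =>
  hasInputImage ? "allow_adult" : "allow_all";

// imageSize is only accepted by the Pro image model.
const checkImageSize = (model: string, imageSize: string | undefined): TPreflightIssue | null =>
  imageSize !== undefined && model !== "gemini-3-pro-image-preview"
    ? { code: "INVALID_CONFIG", message: `imageSize is not supported by ${model}.` }
    : null;

// voiceName and speakers are mutually exclusive; speakers needs exactly two entries.
// ASSUMPTION: the error code for these cases is INVALID_CONFIG.
const checkSpeechVoices = (
  voiceName: string | undefined,
  speakers: readonly unknown[] | undefined,
): TPreflightIssue | null => {
  if (voiceName !== undefined && speakers !== undefined) {
    return { code: "INVALID_CONFIG", message: "Pass either voiceName or speakers, not both." };
  }
  if (speakers !== undefined && speakers.length !== 2) {
    return { code: "INVALID_CONFIG", message: "speakers must contain exactly two entries." };
  }
  return null;
};
```

Running checks like these before any API call means a misconfigured request fails immediately instead of consuming a retry budget.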

Errors

Every tool failure returns a JSON TToolError in the MCP content payload, with isError: true:

type TToolError = {
  code: TErrorCode; // see ERROR_CODE in src/constants.ts
  message: string;
  retryable: boolean;
  httpStatus?: number;
  details?: Record<string, unknown>;
};

The most common codes you'll see: MISSING_API_KEY, MISSING_OUTPUT_DIR, INVALID_CONFIG, BAD_REQUEST, AUTH_FAILED, QUOTA_EXCEEDED, CONTENT_BLOCKED_SAFETY, CONTENT_BLOCKED_PROHIBITED, EMPTY_RESPONSE, VIDEO_OP_FAILED, POLL_TIMEOUT. The full set and the retryable subset live in src/constants.ts (ERROR_CODE, RETRYABLE_CODES).
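
As an illustration of how an `as const` code table and a retryable subset can fit together, consider the sketch below. It lists only three of the codes named above, and which codes count as retryable here is an assumption made for the example, not copied from src/constants.ts. `makeToolError` is a hypothetical helper.

```typescript
// Subset of the documented codes; the real ERROR_CODE object has more entries.
const ERROR_CODE = {
  BAD_REQUEST: "BAD_REQUEST",
  QUOTA_EXCEEDED: "QUOTA_EXCEEDED",
  POLL_TIMEOUT: "POLL_TIMEOUT",
} as const;

// Union of the literal values: "BAD_REQUEST" | "QUOTA_EXCEEDED" | "POLL_TIMEOUT".
type TErrorCode = (typeof ERROR_CODE)[keyof typeof ERROR_CODE];

// ASSUMPTION: membership is illustrative; consult RETRYABLE_CODES in the source.
const RETRYABLE_CODES: readonly TErrorCode[] = [ERROR_CODE.QUOTA_EXCEEDED];

type TToolError = {
  code: TErrorCode;
  message: string;
  retryable: boolean;
  httpStatus?: number;
  details?: Record<string, unknown>;
};

// Derive `retryable` from the code so the two can never disagree.
const makeToolError = (code: TErrorCode, message: string, httpStatus?: number): TToolError => ({
  code,
  message,
  retryable: RETRYABLE_CODES.includes(code),
  ...(httpStatus === undefined ? {} : { httpStatus }),
});
```

Deriving the type from the object keeps the list of codes in one place: adding an entry to `ERROR_CODE` widens `TErrorCode` automatically.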

Development

bun install
bun test              # 31 tests across wav / errors / models
bun run typecheck     # tsc --noEmit
bun run lint          # eslint
bun run start         # spawn the MCP server on stdio
bun run mcp-inspector # GUI tester
bun run build         # bundle to ./dist (Bun target, deps inlined)

Lefthook runs prettier → eslint → tsc on every commit, and commitlint on the message. Don't bypass hooks; if a hook fails, fix the cause and create a new commit (don't amend).

Contributing

Before opening a PR, read AGENTS.md, ai/CODE.md, ai/BUN.md, and ai/COMMIT.md. Short summary:

Code style (enforced by ESLint / Prettier):

No interface — use type, T-prefixed (TThing).
No console.* — use logStderr from src/log.ts. The MCP stdio transport owns stdout.
No relative imports except ./types / ./types.* — use the @src/* alias.
Constants live in src/constants.ts as as const objects.
Tool input types are derived via z.infer<typeof FooInputSchema>; don't duplicate them in types.ts.
Prefer a functional style; avoid mutation unless avoiding it would be expensive.

Commits:

Conventional Commits (feat:, fix:, chore:, test:, docs:, refactor:).
No co-authors. Break work into small logical commits.
Run hooks; don't pass --no-verify.

Adding a model:

Add the id to MODEL in src/constants.ts, then to the matching *_MODELS array (IMAGE_MODELS / VIDEO_MODELS / AUDIO_MODELS).
Add the entry to ALL_MODELS in src/tools/listModels.ts with default, description, and any per-model notes.
If it brings new constraints (e.g. a different aspect-ratio set), add a constant and reference it.

Adding a tool:

Create src/tools/<name>.ts exporting <name>InputShape, <name>InputSchema, and the handler.
Register it in src/index.ts via server.registerTool(...). Include annotations — set destructiveHint: false for tools that create files rather than overwrite, readOnlyHint: true for pure lookups.
Add an error code to ERROR_CODE if your tool can fail in a new way; map it in src/errors.ts.
Tests: cover at minimum schema parsing, the error mapping, and any byte-level helpers (tests/*.test.ts).

License

MIT.
