gstudioaimcp
v1.1.0
MCP server for Google AI Studio media generation — Nano Banana images, Veo video, Gemini TTS — saved to disk, returned as file paths.
Features
- Four tools: list_models, generate_image, generate_video, generate_speech.
- Curated model catalog (Gemini 2.5 Flash Image, Gemini 3 Pro Image, Veo 3.0 / 3.1, Gemini 2.5 TTS), discoverable at runtime via list_models.
- Image edit and image-to-video accept either a file path or an inline { base64, mimeType }.
- Outputs are written to disk with stable filenames; tool results contain paths plus the SDK's usageMetadata (token counts) when present.
- Single TToolError taxonomy across every failure mode; see src/constants.ts (ERROR_CODE).
- Bun-first, TypeScript, ESM. Runtime deps: @google/genai, @modelcontextprotocol/sdk, zod.
Requirements
- Bun ≥ 1.3. Node.js is not supported in v1 (uses Bun.write / Bun.file and native tsconfig.json path resolution).
- A Google AI Studio API key: https://aistudio.google.com/apikey.
- Paid tier for Veo. Free-tier keys get BAD_REQUEST from generate_video. generate_image and generate_speech work on the free tier within its rate limits.
Install
From npm (recommended)
bun install -g gstudioaimcp
# or: npm install -g gstudioaimcp
Installs the gstudioaimcp CLI on your PATH. You still need Bun ≥ 1.3 installed locally, since the binary's shebang is #!/usr/bin/env bun.
From source
git clone https://github.com/matheusccastroo/gstudioaimcp.git
cd gstudioaimcp
bun install
cp .env.template .env
# edit .env and set ENV_VAR_GEMINI_API_KEY
Configuration
All configuration is via environment variables. The annotated source of truth is .env.template — copy it to .env and edit. Bun auto-loads .env when invoked from the project directory.
| Variable | Default | Notes |
| -------------------------------- | ---------------------------------------------- | ------------------------------------------------------------------------------------------------------------------- |
| ENV_VAR_GEMINI_API_KEY | (required) | Plain GEMINI_API_KEY also works (SDK fallback). |
| ENV_VAR_OUTPUT_DIR | (none — must be set OR passed per tool call) | Subdirs images/, videos/, audio/ are auto-created. |
| ENV_VAR_RETRY_ATTEMPTS | 5 | Total attempts including the first. Set 1 to disable retries (see Gotchas — needed to see full API error bodies). |
| ENV_VAR_REQUEST_TIMEOUT_MS | 60000 | Per-request timeout. |
| ENV_VAR_VIDEO_POLL_INTERVAL_MS | 10000 | Veo poll cadence. Polling is a free control-plane call. |
| ENV_VAR_VIDEO_POLL_TIMEOUT_MS | 900000 | 15 min cap before POLL_TIMEOUT. |
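The defaults in the table can be read as a small config loader. The following is a minimal sketch under that reading, not the project's actual code: loadConfig, TConfig, intOr, and the field names are hypothetical, and the environment is passed in as a plain object so the example stays self-contained. The sketch checks GEMINI_API_KEY explicitly, whereas the table says the real fallback happens inside the SDK.

```typescript
// Hypothetical config loader mirroring the table above; not the project's real code.
type TEnv = Record<string, string | undefined>;

type TConfig = {
  apiKey: string;
  outputDir: string | undefined; // may instead be passed per tool call
  retryAttempts: number; // total attempts, including the first
  requestTimeoutMs: number;
  videoPollIntervalMs: number;
  videoPollTimeoutMs: number;
};

// Parse a positive integer, falling back to the documented default.
const intOr = (raw: string | undefined, fallback: number): number => {
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
};

const loadConfig = (env: TEnv): TConfig => {
  // Plain GEMINI_API_KEY is accepted as a fallback, as the table notes.
  const apiKey = env["ENV_VAR_GEMINI_API_KEY"] ?? env["GEMINI_API_KEY"];
  if (!apiKey) throw new Error("MISSING_API_KEY");
  return {
    apiKey,
    outputDir: env["ENV_VAR_OUTPUT_DIR"],
    retryAttempts: intOr(env["ENV_VAR_RETRY_ATTEMPTS"], 5),
    requestTimeoutMs: intOr(env["ENV_VAR_REQUEST_TIMEOUT_MS"], 60_000),
    videoPollIntervalMs: intOr(env["ENV_VAR_VIDEO_POLL_INTERVAL_MS"], 10_000),
    videoPollTimeoutMs: intOr(env["ENV_VAR_VIDEO_POLL_TIMEOUT_MS"], 900_000),
  };
};

// Usage: retries disabled for debugging, everything else left at its default.
const config = loadConfig({ GEMINI_API_KEY: "test-key", ENV_VAR_RETRY_ATTEMPTS: "1" });
console.log(config.retryAttempts, config.videoPollTimeoutMs);
```

Passing the environment as an argument, rather than reading a global, keeps the loader easy to exercise in tests.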
Wire into an MCP client
Test first with MCP Inspector
bun run mcp-inspector
Opens a UI you can use to list and invoke each tool manually. Faster feedback loop than going through Claude.
Claude Code
After bun install -g gstudioaimcp:
claude mcp add gstudioaimcp -s user \
-e ENV_VAR_GEMINI_API_KEY=YOUR_KEY \
-e ENV_VAR_OUTPUT_DIR=$HOME/genai-out \
-- gstudioaimcp
Restart Claude Code, then verify with /mcp. Scopes: -s local (current project only, default), -s project (committed to .mcp.json for teammates), -s user (global, shown above).
If you're running from a source clone instead, point at the local entrypoint:
claude mcp add gstudioaimcp -s user \
-e ENV_VAR_GEMINI_API_KEY=YOUR_KEY \
-e ENV_VAR_OUTPUT_DIR=$HOME/genai-out \
-- bun run /absolute/path/to/gstudioaimcp/src/index.ts
Other MCP clients
Any stdio-compatible client can spawn the gstudioaimcp binary (after npm install -g gstudioaimcp) or run bun run src/index.ts from a clone.
Heads-up on client choice for image-edit / image-to-video. The input image is read by reference (absolute filesystem path), so the client must run on the same machine that holds the file. Claude Code (CLI), Cursor, Continue, and MCP Inspector are all fine. Claude Desktop is NOT a good fit for these flows: files dropped into Desktop land in Claude's server-side sandbox (/home/claude/...), which the locally-spawned MCP server cannot read. See Gotchas below.
Usage
Once connected, prompt naturally — the LLM will pick the right tool from the schema descriptions.
| Ask | Tool invoked |
| --------------------------------------------------------- | ---------------------------------------------------------- |
| "Generate an image of a calico cat on a beach, 16:9." | generate_image { prompt, aspectRatio: "16:9" } |
| "Make a 5-second video of a sunflower field at sunset." | generate_video { prompt, personGeneration: "allow_all" } |
| "Speak this in voice Kore: 'Hello from Gemini'." | generate_speech { prompt, voiceName: "Kore" } |
A concrete request/response for generate_image:
Request:
{
  "name": "generate_image",
  "arguments": {
    "prompt": "A calico kitten napping in sunshine, soft focus",
    "aspectRatio": "16:9",
    "outputDir": "/Users/me/genai-out"
  }
}
Response (tool result):
{
  "paths": [
    "/Users/me/genai-out/images/generate_image_gemini-2.5-flash-image_2026-05-15T17-42-11-204Z_0.png"
  ],
  "model": "gemini-2.5-flash-image",
  "usage": {
    "promptTokenCount": 12,
    "candidatesTokenCount": 1290,
    "totalTokenCount": 1302
  }
}
Errors come back via MCP isError: true with the TToolError JSON in content[0].text.
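The path in the response above suggests a filename pattern of tool name, model id, a filesystem-safe ISO timestamp, and an output index. The sketch below reproduces that pattern. It is inferred from that single example rather than taken from the source, and buildOutputName is a hypothetical helper.

```typescript
// Hypothetical helper: pattern inferred from the example path, not from the project's code.
const buildOutputName = (
  tool: string,
  model: string,
  when: Date,
  index: number,
  ext: string,
): string => {
  // toISOString() yields e.g. 2026-05-15T17:42:11.204Z; ":" and "." are swapped
  // for "-" so the timestamp is safe to use in a filename.
  const stamp = when.toISOString().replace(/[:.]/g, "-");
  return `${tool}_${model}_${stamp}_${index}.${ext}`;
};

const fileName = buildOutputName(
  "generate_image",
  "gemini-2.5-flash-image",
  new Date("2026-05-15T17:42:11.204Z"),
  0,
  "png",
);
console.log(fileName);
// generate_image_gemini-2.5-flash-image_2026-05-15T17-42-11-204Z_0.png
```

Including the index keeps names distinct when a single call produces several outputs with the same timestamp.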
Tools
Call list_models for the current per-model constraints (aspect ratios, image sizes, voices, person-generation modes, etc.) — that's the authoritative runtime view.
list_models
- Inputs: capability: "image" | "video" | "audio".
- Output: { capability, models: TModelInfo[] }.
generate_image
- Inputs: prompt, model?, inputImages?, aspectRatio?, imageSize?, outputDir?.
- Output: { paths: string[], model, usage? }.
- Notes: imageSize is gemini-3-pro-image-preview only.
generate_video
- Inputs: prompt, model?, inputImage?, aspectRatio?, personGeneration?, negativePrompt?, numberOfVideos?, outputDir?.
- Output: { paths: string[], model }.
- Notes: Long-running. The tool polls Veo internally until done and downloads each MP4.
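The internal polling can be pictured as a loop bounded by ENV_VAR_VIDEO_POLL_INTERVAL_MS and ENV_VAR_VIDEO_POLL_TIMEOUT_MS. This is a generic sketch under those assumptions, not the server's code: pollUntilDone, TOperation, sleep, and the injected check function are hypothetical stand-ins for the real Veo operation lookup.

```typescript
// Generic poll-until-done loop; names are illustrative, not the project's API.
type TOperation<T> = { done: boolean; result?: T };

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

const pollUntilDone = async <T>(
  check: () => Promise<TOperation<T>>, // stand-in for the operation status call
  intervalMs: number,
  timeoutMs: number,
): Promise<T> => {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const op = await check();
    if (op.done) {
      if (op.result === undefined) throw new Error("EMPTY_RESPONSE");
      return op.result;
    }
    // Give up once another wait would pass the deadline.
    if (Date.now() + intervalMs > deadline) throw new Error("POLL_TIMEOUT");
    await sleep(intervalMs);
  }
};

// Usage: a fake operation that completes on the third poll.
let polls = 0;
const fakeCheck = async (): Promise<TOperation<string>> => {
  polls += 1;
  return polls < 3 ? { done: false } : { done: true, result: "video.mp4" };
};

pollUntilDone(fakeCheck, 10, 1_000).then((uri) => console.log(uri, polls));
```

With the documented defaults (10000 ms interval, 900000 ms cap), a loop like this would make at most about 90 status checks before giving up.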
generate_speech
- Inputs: prompt, model?, voiceName? XOR speakers? (multi-speaker, exactly two), outputDir?.
- Output: { paths: string[], model, usage? }.
- Notes: Writes a .wav (PCM16 LE wrapped with the sample rate the API actually returned).
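Wrapping PCM16 LE in a WAV container amounts to prepending a 44-byte RIFF header. The sketch below shows the standard header layout; wrapPcm16AsWav is a hypothetical name, and how the project obtains the sample rate from the API response is not shown here.

```typescript
// Wrap raw PCM16 little-endian samples in a 44-byte RIFF/WAVE header.
const wrapPcm16AsWav = (
  pcm: Uint8Array,
  sampleRate: number,
  channels = 1,
): Uint8Array => {
  const bytesPerSample = 2; // 16-bit samples
  const blockAlign = channels * bytesPerSample;
  const out = new Uint8Array(44 + pcm.length);
  const view = new DataView(out.buffer);
  const ascii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i += 1) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  ascii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // RIFF chunk size
  ascii(8, "WAVE");
  ascii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = linear PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, 16, true); // bits per sample
  ascii(36, "data");
  view.setUint32(40, pcm.length, true); // data chunk size
  out.set(pcm, 44);
  return out;
};

// Usage: 0.5 s of silence at 24 kHz mono (the rate is only an example value).
const wav = wrapPcm16AsWav(new Uint8Array(24_000), 24_000);
console.log(wav.length);
```

Taking the sample rate as a parameter, instead of hard-coding it, matches the note above that the header uses whatever rate the API actually returned.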
Gotchas
- Veo personGeneration is mode-dependent. Text-to-video requires allow_all. Image-to-video (when you pass inputImage) requires allow_adult. In EU / UK / CH / MENA, only allow_adult is accepted in any mode. dont_allow is a Veo 2.0 value and is rejected by Veo 3.x. The schema's .describe() and list_models notes spell this out, but you are still likely to hit it.
- Veo is paid-tier only. Free-tier keys fail with BAD_REQUEST. Polling is free; the kickoff is billed per second of video produced. See https://ai.google.dev/pricing.
- SDK retry layer hides 4xx error bodies. When ENV_VAR_RETRY_ATTEMPTS > 1 (the default), the SDK's p-retry wrapper throws a generic AbortError for non-retryable 4xx responses and never reads the response body. We map this to BAD_REQUEST with a hint, but the actual API explanation is lost. To see the full message, set ENV_VAR_RETRY_ATTEMPTS=1 and re-run; the SDK then constructs a proper ApiError with the body. Use this for debugging; switch back to 5 for normal operation.
- imageSize is Pro-only. Passing it with gemini-2.5-flash-image returns INVALID_CONFIG from preflight.
- TTS voice XOR. voiceName and speakers are mutually exclusive; preflight rejects passing both. speakers must contain exactly two entries.
- outputDir resolution. Tool input wins; falls back to ENV_VAR_OUTPUT_DIR. If both are absent the call fails with MISSING_OUTPUT_DIR. There is no silent default.
- No retries on download. Veo URIs are short-lived signed URLs; retrying after backoff can land on an expired URL.
- No jitter, no Retry-After. Inherited from the SDK's p-retry defaults. Don't run several parallel clients hard against the rate limit.
- Fail-fast on multi-output. If numberOfVideos > 1 (or multi-image generation) and any output is blocked or empty, the whole call fails and partials are dropped.
- Input images need local filesystem access, so Claude Desktop won't work for them. inputImages / inputImage accept a path on the user's local machine. Claude Desktop's file-drop puts the file in Claude's server-side sandbox (/home/claude/...), which the locally-spawned MCP server cannot read; you'll get INVALID_INPUT_FILE. Use a client that runs alongside your files (Claude Code, Cursor, Continue, etc.). If you must use Claude Desktop, save the dropped file to your local disk manually first (drag the chat preview into Finder) and pass that absolute path.
- Base64 inputs are supported but expensive; prefer { path }. inputImages / inputImage accept either { path } (free) or { base64, mimeType }. The base64 variant inflates the LLM's output-token usage on every call and can rack up cost quickly; only use it when no local path is available.
- Call the tools serially, not in parallel. The client should issue one tool call at a time. Parallel calls compound against rate limits (no jitter, no Retry-After honored; see above), and concurrent Veo polls eat the same quota for no benefit. Most MCP clients (Claude Code, Claude Desktop) already do this by default.
- Bun-only for v1. No Node fallback yet.
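Several of the rules above are preflight checks that can be stated as small pure functions. The following sketch restates three of them (outputDir resolution, the TTS voice XOR, and the mode-dependent personGeneration value). All names are hypothetical. Using INVALID_CONFIG for the voice check is an assumption, since the text only names that code for imageSize, and the regional override for EU / UK / CH / MENA is left out.

```typescript
// Hypothetical preflight helpers restating the rules above; not the project's code.
type TPreflightError = {
  code: "INVALID_CONFIG" | "MISSING_OUTPUT_DIR";
  message: string;
};
type TCheck<T> = { ok: true; value: T } | { ok: false; error: TPreflightError };

// Tool input wins, then ENV_VAR_OUTPUT_DIR; there is no silent default.
const resolveOutputDir = (
  inputDir: string | undefined,
  envDir: string | undefined,
): TCheck<string> => {
  const dir = inputDir ?? envDir;
  return dir
    ? { ok: true, value: dir }
    : {
        ok: false,
        error: { code: "MISSING_OUTPUT_DIR", message: "Set outputDir or ENV_VAR_OUTPUT_DIR." },
      };
};

type TSpeechVoice = { voiceName?: string; speakers?: readonly unknown[] };

// voiceName and speakers are mutually exclusive; speakers needs exactly two entries.
const checkSpeechVoice = (input: TSpeechVoice): TCheck<TSpeechVoice> => {
  if (input.voiceName !== undefined && input.speakers !== undefined) {
    return {
      ok: false,
      error: { code: "INVALID_CONFIG", message: "Pass voiceName or speakers, not both." },
    };
  }
  if (input.speakers !== undefined && input.speakers.length !== 2) {
    return {
      ok: false,
      error: { code: "INVALID_CONFIG", message: "speakers must contain exactly two entries." },
    };
  }
  return { ok: true, value: input };
};

// Text-to-video wants allow_all; image-to-video wants allow_adult.
// The regional rule (allow_adult only) is not modeled here.
const expectedPersonGeneration = (hasInputImage: boolean): "allow_all" | "allow_adult" =>
  hasInputImage ? "allow_adult" : "allow_all";

console.log(resolveOutputDir(undefined, "/tmp/genai-out"));
console.log(checkSpeechVoice({ voiceName: "Kore", speakers: ["a", "b"] }).ok);
console.log(expectedPersonGeneration(true));
```

Returning a result object instead of throwing keeps each check composable, so a handler could run all of them before making any billed API call.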
Errors
Every tool failure returns a JSON TToolError in the MCP content payload, with isError: true:
{
  code: TErrorCode; // see ERROR_CODE in src/constants.ts
  message: string;
  retryable: boolean;
  httpStatus?: number;
  details?: Record<string, unknown>;
}
The most common codes you'll see: MISSING_API_KEY, MISSING_OUTPUT_DIR, INVALID_CONFIG, BAD_REQUEST, AUTH_FAILED, QUOTA_EXCEEDED, CONTENT_BLOCKED_SAFETY, CONTENT_BLOCKED_PROHIBITED, EMPTY_RESPONSE, VIDEO_OP_FAILED, POLL_TIMEOUT. The full set and the retryable subset live in src/constants.ts (ERROR_CODE, RETRYABLE_CODES).
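On the client side, a failed call can be turned back into a typed error by parsing content[0].text. This is a minimal sketch assuming only the shape documented above: TToolResult models just the fields the sketch reads, TErrorCode is simplified to string, readToolError and isToolError are hypothetical helpers, and the sample payload values are made up.

```typescript
// Shape copied from the TToolError definition above; TErrorCode is simplified to string.
type TToolError = {
  code: string;
  message: string;
  retryable: boolean;
  httpStatus?: number;
  details?: Record<string, unknown>;
};

// Minimal view of an MCP tool result: only the fields this sketch reads.
type TToolResult = {
  isError?: boolean;
  content: { type: string; text?: string }[];
};

const isToolError = (value: unknown): value is TToolError => {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["code"] === "string" &&
    typeof v["message"] === "string" &&
    typeof v["retryable"] === "boolean"
  );
};

// Returns the parsed error, or undefined when the result is a success
// or the payload is not a well-formed TToolError.
const readToolError = (result: TToolResult): TToolError | undefined => {
  if (!result.isError) return undefined;
  const text = result.content[0]?.text;
  if (text === undefined) return undefined;
  try {
    const parsed: unknown = JSON.parse(text);
    return isToolError(parsed) ? parsed : undefined;
  } catch {
    return undefined;
  }
};

// Usage with an illustrative payload (values are invented for the example).
const failed: TToolResult = {
  isError: true,
  content: [
    {
      type: "text",
      text: JSON.stringify({ code: "QUOTA_EXCEEDED", message: "Rate limit hit", retryable: true }),
    },
  ],
};
const toolError = readToolError(failed);
console.log(toolError?.code, toolError?.retryable);
```

A caller could branch on retryable to decide whether to re-issue the call, rather than matching on individual codes.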
Development
bun install
bun test # 31 tests across wav / errors / models
bun run typecheck # tsc --noEmit
bun run lint # eslint
bun run start # spawn the MCP server on stdio
bun run mcp-inspector # GUI tester
bun run build # bundle to ./dist (Bun target, deps inlined)
Lefthook runs prettier → eslint → tsc on every commit, and commitlint on the message. Don't bypass hooks; if a hook fails, fix the cause and create a new commit (don't amend).
Contributing
Before opening a PR, read AGENTS.md, ai/CODE.md, ai/BUN.md, and ai/COMMIT.md. Short summary:
Code style (enforced by ESLint / Prettier):
- No interface. Use type, T-prefixed (TThing).
- No console.*. Use logStderr from src/log.ts. The MCP stdio transport owns stdout.
- No relative imports except ./types / ./types.*. Use the @src/* alias.
- Constants live in src/constants.ts as objects declared with as const.
- Tool input types are derived via z.infer<typeof FooInputSchema>; don't duplicate them in types.ts.
- Functional style. Avoid mutation unless avoiding it would be expensive.
Commits:
- Conventional Commits (feat:, fix:, chore:, test:, docs:, refactor:).
- No co-authors. Break work into small logical commits.
- Run hooks; don't pass --no-verify.
Adding a model:
- Add the id to MODEL in src/constants.ts, then to the matching *_MODELS array (IMAGE_MODELS / VIDEO_MODELS / AUDIO_MODELS).
- Add the entry to ALL_MODELS in src/tools/listModels.ts with default, description, and any per-model notes.
- If it brings new constraints (e.g. a different aspect-ratio set), add a constant and reference it.
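The steps above might look roughly like the following. This is a sketch only: the real shapes of MODEL, IMAGE_MODELS, and ALL_MODELS are not shown in this README, so the key names, the TModelEntry type, the id field, and the choice of default are all assumptions, and example-new-image-model is a placeholder id.

```typescript
// Hypothetical excerpt in the style of src/constants.ts; the real file may differ.
const MODEL = {
  GEMINI_25_FLASH_IMAGE: "gemini-2.5-flash-image",
  GEMINI_3_PRO_IMAGE: "gemini-3-pro-image-preview",
  EXAMPLE_NEW_IMAGE: "example-new-image-model", // placeholder id for the model being added
} as const;

type TModelId = (typeof MODEL)[keyof typeof MODEL];

// Step 1: the id goes into MODEL and into the matching capability array.
const IMAGE_MODELS = [
  MODEL.GEMINI_25_FLASH_IMAGE,
  MODEL.GEMINI_3_PRO_IMAGE,
  MODEL.EXAMPLE_NEW_IMAGE,
] as const;

// Assumed entry shape: only default, description, and notes are named in the text.
type TModelEntry = {
  id: TModelId;
  default: boolean;
  description: string;
  notes?: readonly string[];
};

// Step 2: the catalog entry that list_models would report.
const ALL_MODELS: readonly TModelEntry[] = [
  { id: MODEL.GEMINI_25_FLASH_IMAGE, default: true, description: "Gemini 2.5 Flash Image" },
  {
    id: MODEL.GEMINI_3_PRO_IMAGE,
    default: false,
    description: "Gemini 3 Pro Image",
    notes: ["Supports imageSize."],
  },
  { id: MODEL.EXAMPLE_NEW_IMAGE, default: false, description: "Placeholder for the new model" },
];

// Sanity check: every id in the capability array has a catalog entry.
const missing = IMAGE_MODELS.filter((id) => !ALL_MODELS.some((m) => m.id === id));
console.log(missing.length);
```

Deriving TModelId from the as const object means a typo in a catalog entry fails at compile time instead of at runtime.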
Adding a tool:
- Create src/tools/<name>.ts exporting <name>InputShape, <name>InputSchema, and the handler.
- Register it in src/index.ts via server.registerTool(...). Include annotations: set destructiveHint: false for tools that create files rather than overwrite, readOnlyHint: true for pure lookups.
- Add an error code to ERROR_CODE if your tool can fail in a new way; map it in src/errors.ts.
- Tests: cover at minimum schema parsing, the error mapping, and any byte-level helpers (tests/*.test.ts).
License
MIT.
