@lpenguin/nanobanana-mcp

v0.1.4

Published

3 months ago

MCP server for creating and editing images using Google's Gemini 2.5 Flash Image model (nanobanana)

0High
0Medium
0Low

lpenguin

mcp mcp-server image image-editing canvas nanobanana

nanobanana-mcp

A Model Context Protocol (MCP) server for creating and editing images using Google's Gemini 2.5 Flash Image model (nicknamed "nanobanana"). Built with TypeScript, this server provides powerful AI-driven image generation and manipulation capabilities through a simple MCP interface.

Features

Text-to-Image Generation: Create high-quality images from detailed text descriptions
Image Editing: Modify existing images using natural language prompts
Multi-Image Composition: Combine multiple images into creative compositions
Flexible Aspect Ratios: Support for various aspect ratios (1:1, 16:9, 9:16, and more)
Powered by Google Gemini: Uses the state-of-the-art Gemini 2.5 Flash Image model

Prerequisites

Node.js >= 18.0.0
A Google Gemini API key (get one from Google AI Studio)

Installation

npm install -g @lpenguin/nanobanana-mcp

Or install locally in your project:

npm install @lpenguin/nanobanana-mcp

Configuration

Set your Google Gemini API key as an environment variable:

export GEMINI_API_KEY="your-api-key-here"

Usage

As an MCP Server

Add to your MCP client configuration:

{
  "mcpServers": {
    "nanobanana": {
      "command": "npx",
      "args": ["@lpenguin/nanobanana-mcp"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}

Or if installed globally:

{
  "mcpServers": {
    "nanobanana": {
      "command": "nanobanana-mcp",
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}

Available Tools

generate_image

Generate a new image from a text prompt using Google's Gemini 2.5 Flash Image model.

Parameters:

prompt (string, required): Detailed text description of the image to generate. Be specific about style, composition, lighting, colors, and mood.
outputPath (string, required): Path to save the generated image (PNG format)
aspectRatio (string, optional): Aspect ratio for the image - "1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9" (default: "1:1")

Example:

{
  "prompt": "A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop. The scene is illuminated by soft, golden hour light streaming through a window.",
  "outputPath": "./images/ceramicist.png",
  "aspectRatio": "4:3"
}

edit_image

Edit an existing image using text prompts. Add, remove, or modify elements while preserving the original style and composition.

Parameters:

inputPath (string, required): Path to the input image file
prompt (string, required): Detailed description of what to change, add, or remove. Be specific about preserving unchanged elements.
outputPath (string, required): Path to save the edited image (PNG format)
aspectRatio (string, optional): Aspect ratio for the output image (default: matches input)

Example:

{
  "inputPath": "./images/cat.png",
  "prompt": "Add a small, knitted wizard hat on the cat's head. Make it look like it's sitting comfortably and matches the soft lighting of the photo.",
  "outputPath": "./images/cat_with_hat.png"
}

composite_images

Combine multiple images into a single composition using text prompts. Perfect for product mockups, style transfer, and creative collages.

Parameters:

imagePaths (array of strings, required): Array of paths to input images (up to 3 images recommended)
prompt (string, required): Detailed description of how to combine the images. Reference images by their order (first, second, third).
outputPath (string, required): Path to save the composite image (PNG format)
aspectRatio (string, optional): Aspect ratio for the output image (default: "1:1")

Example:

{
  "imagePaths": ["./images/dress.png", "./images/model.png"],
  "prompt": "Create a professional e-commerce fashion photo. Take the blue floral dress from the first image and let the woman from the second image wear it. Generate a realistic, full-body shot.",
  "outputPath": "./images/fashion_shot.png",
  "aspectRatio": "2:3"
}

Prompting Tips

For best results when generating or editing images:

Be Descriptive: Use detailed, narrative descriptions rather than keyword lists
Specify Style: Mention artistic style, photography terms, lighting, and mood
Use Photography Terms: For realistic images, mention camera angles, lens types, and lighting
Preserve Details: When editing, explicitly state what should remain unchanged
Reference Order: When compositing, refer to images as "first", "second", "third"

Example Prompts

Photorealistic:

A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, 
sun-etched wrinkles and a warm, knowing smile. Captured with an 85mm portrait lens.

Stylized:

A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat. Bold, clean 
outlines, simple cel-shading, vibrant color palette. White background.

Logo/Text:

Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'. 
Clean, bold, sans-serif font. Simple coffee bean icon. Black and white.

Development

Setup

# Clone the repository
git clone https://github.com/lpenguin/nanobanana-mcp.git
cd nanobanana-mcp

# Install dependencies
npm install

# Build
npm run build

# Run integration tests (requires GEMINI_API_KEY)
npm test

# Run linter
npm run lint

# Fix linting issues
npm run lint:fix

# Watch mode for development
npm run watch

Testing

The project includes integration tests that verify the MCP server works correctly:

# Set your API key
export GEMINI_API_KEY="your-api-key-here"

# Run the integration test
npm test

The integration test:

Verifies the server can start successfully
Tests MCP protocol communication
Lists all available tools
Does not execute actual image operations (no API calls)

A GitHub Actions workflow also runs these tests on every push and pull request.

Project Structure

nanobanana-mcp/
├── src/
│   └── index.ts              # Main server implementation
├── dist/                     # Compiled output (generated)
├── .github/
│   └── workflows/
│       ├── publish.yml       # CI/CD workflow
│       └── integration-test.yml  # Integration test workflow
├── package.json              # Package configuration
├── tsconfig.json             # TypeScript configuration
├── .eslintrc.json            # ESLint configuration
├── .gitignore                # Git ignore rules
├── test-integration.js       # Integration test script
└── README.md                 # This file

About Gemini 2.5 Flash Image (Nanobanana)

This MCP server uses Google's Gemini 2.5 Flash Image model, nicknamed "nanobanana" by the developer community. Key features:

Text-to-Image: Generate images from descriptions
Image Editing: Add, remove, or modify elements conversationally
Multi-Image Composition: Combine multiple images
Iterative Refinement: Make progressive adjustments
High-Fidelity Text: Accurate text rendering in images
SynthID Watermark: All generated images include a watermark

Aspect Ratios

Supported aspect ratios and their resolutions:

| Aspect Ratio | Resolution | Use Case | |--------------|------------|----------| | 1:1 | 1024x1024 | Square images, social media | | 2:3 | 832x1248 | Portrait photography | | 3:2 | 1248x832 | Landscape photography | | 3:4 | 864x1184 | Portrait mode | | 4:3 | 1184x864 | Standard display | | 4:5 | 896x1152 | Instagram portrait | | 5:4 | 1152x896 | Medium format | | 9:16 | 768x1344 | Vertical video, stories | | 16:9 | 1344x768 | Widescreen, presentations | | 21:9 | 1536x672 | Ultra-wide, cinematic |

Publishing

The package is automatically published to npm when a new version tag is pushed:

# Update version in package.json, then:
git tag v0.1.0
git push origin v0.1.0

The GitHub Actions workflows will:

Run linting
Build the project
Run integration tests
Publish to npm (if pushing a version tag)

API Costs

Image generation with Gemini 2.5 Flash Image is token-based:

$30 per 1 million tokens for image output
Each image output is tokenized at 1290 tokens (flat rate, up to 1024x1024px)
Approximately $0.039 per image

Limitations

Best performance with English, Spanish (MX), Japanese, Chinese, and Hindi
Image generation does not support audio or video inputs
Model works best with up to 3 input images for composition
Uploading images of children not supported in EEA, CH, and UK
All generated images include a SynthID watermark

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

nanobanana-mcp

Features

Prerequisites

Installation

Configuration

Usage

As an MCP Server

Available Tools

generate_image

edit_image

composite_images

Prompting Tips

Example Prompts

Development

Setup

Testing

Project Structure

About Gemini 2.5 Flash Image (Nanobanana)

Aspect Ratios

Publishing

API Costs

Limitations

Resources

License

Contributing