auu-uivision-mcp

v1.0.4

Published

6 months ago

MCP server for analyzing software UI screenshots using Google Gemini API with support for multiple models including gemini-2.0-flash


superauu


UI Vision Analyzer MCP Server

A Model Context Protocol (MCP) server that analyzes software user interface screenshots using Google's Gemini AI vision capabilities. This server provides detailed descriptions of UI elements, layout structure, functionality, and accessibility information.

Features

UI Analysis: Comprehensive analysis of software interfaces including buttons, forms, navigation, and layout
Multiple Input Sources: Support for local files, base64 data, and image URLs
Flexible Prompts: Use default UI analysis prompts or provide custom analysis instructions
Multiple Models: Support for various Gemini models (2.0 Flash, 1.5 Pro, 1.5 Flash, etc.)
Format Support: PNG, JPEG, GIF, WebP, BMP image formats
Size Validation: Configurable image size limits with validation
Error Handling: Comprehensive error reporting and validation

Installation

Global Installation

npm install -g auu-uivision-mcp

Local Installation

npm install auu-uivision-mcp

Direct Usage with npx

# Set API key and run
export GEMINI_API_KEY=your_gemini_api_key_here
npx auu-uivision-mcp

# Or on Windows:
set GEMINI_API_KEY=your_gemini_api_key_here
npx auu-uivision-mcp

Development Installation

git clone https://github.com/superauu/auu-uivision-mcp.git
cd auu-uivision-mcp
npm install
npm run build

Configuration

Required Environment Variables

You must set the GEMINI_API_KEY environment variable before running the server.

Linux/macOS:

export GEMINI_API_KEY=your_gemini_api_key_here
npx auu-uivision-mcp

Windows (Command Prompt):

set GEMINI_API_KEY=your_gemini_api_key_here
npx auu-uivision-mcp

Windows (PowerShell):

$env:GEMINI_API_KEY="your_gemini_api_key_here"
npx auu-uivision-mcp

Get your API key from: https://makersuite.google.com/app/apikey
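As a rough illustration of the startup check implied above, the following sketch fails fast when the key is missing. The function name requireApiKey is hypothetical and not taken from the package; only the variable name and the error text come from this README.

```typescript
// Hypothetical helper: the package's own startup check may look different.
type Env = Record<string, string | undefined>;

function requireApiKey(env: Env): string {
  // Treat an unset, empty, or whitespace-only value as missing.
  const key = env.GEMINI_API_KEY?.trim();
  if (!key) {
    throw new Error("GEMINI_API_KEY environment variable is required");
  }
  return key;
}
```

In a Node.js process this would typically be called as requireApiKey(process.env) before the server starts accepting requests.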

Optional Environment Variables

# Configure the default Gemini model to use
# Available models: gemini-2.5-pro, gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash, gemini-1.0-pro
export GEMINI_MODEL=gemini-2.0-flash

# Maximum image size in bytes (default: 10MB; this example raises it to 20MB)
export MAX_IMAGE_SIZE=20971520

# Supported image formats (default: png,jpg,jpeg,gif,webp,bmp)
export SUPPORTED_FORMATS=png,jpg,jpeg,gif,webp,bmp
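One way these three variables could be read with their documented defaults is sketched below. The ServerConfig shape and the loadConfig function are assumptions made for illustration; the package's actual config.ts may be organized differently.

```typescript
// Hypothetical config shape; not taken from the package source.
interface ServerConfig {
  model: string;
  maxImageSize: number; // bytes
  supportedFormats: string[];
}

const DEFAULTS: ServerConfig = {
  model: "gemini-2.0-flash",
  maxImageSize: 10 * 1024 * 1024, // 10MB, the documented default
  supportedFormats: ["png", "jpg", "jpeg", "gif", "webp", "bmp"],
};

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): ServerConfig {
  const size = Number(env.MAX_IMAGE_SIZE);
  // Normalize the comma-separated list: trim, lowercase, drop empty entries.
  const formats = (env.SUPPORTED_FORMATS ?? "")
    .split(",")
    .map((f) => f.trim().toLowerCase())
    .filter((f) => f.length > 0);
  return {
    model: env.GEMINI_MODEL?.trim() || DEFAULTS.model,
    // Fall back to the default when the value is unset, non-numeric, or not positive.
    maxImageSize: Number.isFinite(size) && size > 0 ? size : DEFAULTS.maxImageSize,
    supportedFormats: formats.length > 0 ? formats : DEFAULTS.supportedFormats,
  };
}
```

Passing the environment in as an argument (for example loadConfig(process.env)) keeps the function easy to test without touching real environment variables.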

Usage

MCP Client Configuration

Add this server to your MCP client configuration:

Claude Desktop

{
  "mcpServers": {
    "uivision-analyzer": {
      "command": "auu-uivision-mcp",
      "env": {
        "GEMINI_API_KEY": "your_api_key_here"
      }
    }
  }
}

Cline (VS Code Extension)

{
  "mcpServers": {
    "uivision-analyzer": {
      "command": "npx",
      "args": ["auu-uivision-mcp"],
      "env": {
        "GEMINI_API_KEY": "your_api_key_here"
      }
    }
  }
}

Available Tools

analyze_ui_screenshot

Analyzes a software UI screenshot and returns a detailed, structured description of it.

Parameters:

image_path (string, optional): Local file path to the screenshot
image_base64 (string, optional): Base64 encoded image data
image_url (string, optional): URL of the image to analyze
prompt (string, optional): Custom analysis prompt
model (string, optional): Gemini model to use
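The parameter list above can be written down as a TypeScript type. The sketch below also includes a hypothetical pickImageSource helper that assumes exactly one of the three image parameters should be supplied per call; this README does not state how the server behaves when none or several are given, so treat that rule as an assumption.

```typescript
// Argument names mirror the parameter list above.
interface AnalyzeArgs {
  image_path?: string;
  image_base64?: string;
  image_url?: string;
  prompt?: string;
  model?: string;
}

type ImageSource =
  | { kind: "path"; value: string }
  | { kind: "base64"; value: string }
  | { kind: "url"; value: string };

// Hypothetical helper. Assumption: exactly one image source per request.
function pickImageSource(args: AnalyzeArgs): ImageSource {
  const candidates: ImageSource[] = [];
  if (args.image_path) candidates.push({ kind: "path", value: args.image_path });
  if (args.image_base64) candidates.push({ kind: "base64", value: args.image_base64 });
  if (args.image_url) candidates.push({ kind: "url", value: args.image_url });
  if (candidates.length !== 1) {
    throw new Error(`Expected exactly one image source, got ${candidates.length}`);
  }
  return candidates[0];
}
```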

Example Usage:

// Analyze local file
{
  "tool": "analyze_ui_screenshot",
  "arguments": {
    "image_path": "/path/to/screenshot.png",
    "model": "gemini-2.0-flash"
  }
}

// Analyze image from URL
{
  "tool": "analyze_ui_screenshot",
  "arguments": {
    "image_url": "https://example.com/screenshot.jpg",
    "prompt": "Focus on accessibility issues and color contrast"
  }
}

// Analyze base64 image
{
  "tool": "analyze_ui_screenshot",
  "arguments": {
    "image_base64": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA...",
    "model": "gemini-1.5-pro"
  }
}
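The base64 example above passes a data URL rather than a bare base64 string. A minimal sketch of how such input might be split into a MIME type and payload follows. The parseBase64Image name and the choice to accept both data URLs and raw base64 are assumptions, not confirmed behavior of this server.

```typescript
interface ParsedImage {
  mimeType: string;
  base64: string;     // payload without the "data:...;base64," prefix
  byteLength: number; // decoded size in bytes, computed without decoding
}

// Hypothetical helper: accepts a data URL or a raw base64 string.
function parseBase64Image(input: string, fallbackMime = "image/png"): ParsedImage {
  const match = /^data:([^;,]+);base64,([\s\S]*)$/.exec(input.trim());
  const mimeType = match ? match[1] : fallbackMime;
  // Strip whitespace and line breaks that some encoders insert.
  const base64 = (match ? match[2] : input).replace(/\s+/g, "");
  // Every 4 base64 characters encode 3 bytes; each "=" pad removes one byte.
  const padding = base64.endsWith("==") ? 2 : base64.endsWith("=") ? 1 : 0;
  const byteLength = Math.floor((base64.length * 3) / 4) - padding;
  return { mimeType, base64, byteLength };
}
```

Computing the decoded length arithmetically makes it possible to compare against a size limit before allocating a buffer for the image.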

Response Format:

{
  "description": "Overall description of the interface and its purpose",
  "elements": [
    {
      "type": "button",
      "description": "Primary call-to-action button with 'Get Started' text",
      "position": { "x": 250, "y": 400, "width": 120, "height": 40 },
      "text": "Get Started",
      "interactive": true
    }
  ],
  "layout": {
    "structure": "Centered card layout",
    "organization": "Vertical flow with clear visual hierarchy",
    "responsiveness": "Appears to be responsive with adaptive containers",
    "visualHierarchy": "Clear with prominent headline and supporting elements"
  },
  "functionality": [
    "User registration/signup workflow",
    "Social login integration",
    "Form validation and error handling"
  ],
  "accessibility": {
    "colorContrast": "Good contrast ratios for text readability",
    "textReadability": "Clear fonts with appropriate sizing",
    "navigationClarity": "Logical tab order and keyboard navigation",
    "altTextStatus": "Images appear to have descriptive alt text"
  }
}
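For clients that consume this response, the shape shown above could be typed roughly as follows. Field names are taken from the example response; which fields are optional is a guess, and interactiveLabels is a hypothetical helper included only to show the types in use.

```typescript
interface UIElement {
  type: string;
  description: string;
  // Assumption: position and text may be absent for some elements.
  position?: { x: number; y: number; width: number; height: number };
  text?: string;
  interactive: boolean;
}

interface UIAnalysis {
  description: string;
  elements: UIElement[];
  layout: {
    structure: string;
    organization: string;
    responsiveness: string;
    visualHierarchy: string;
  };
  functionality: string[];
  accessibility: {
    colorContrast: string;
    textReadability: string;
    navigationClarity: string;
    altTextStatus: string;
  };
}

// Hypothetical helper: list a label for every interactive element,
// preferring visible text and falling back to the description.
function interactiveLabels(analysis: UIAnalysis): string[] {
  return analysis.elements
    .filter((el) => el.interactive)
    .map((el) => el.text ?? el.description);
}
```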

Development

Project Structure

src/
├── index.ts              # Main MCP server entry point
├── gemini-client.ts      # Gemini API integration
├── image-processor.ts    # Image handling utilities
├── config.ts             # Environment configuration
└── types.ts              # TypeScript type definitions

Scripts

# Development with auto-reload
npm run dev:watch

# Development without auto-reload
npm run dev

# Build for production
npm run build

# Start production server
npm start

Environment Setup

Copy .env.example to .env
Add your Gemini API key
Install dependencies: npm install
Build the project: npm run build

Supported Gemini Models

gemini-2.5-pro - Latest high-quality model with advanced reasoning capabilities
gemini-2.0-flash (default) - Fast, efficient for most UI analysis tasks
gemini-1.5-pro - Higher quality analysis, slower processing
gemini-1.5-flash - Balanced speed and quality
gemini-1.0-pro - Legacy model support

Model Selection via Environment Variables

You can set the default model using the GEMINI_MODEL environment variable:

# Use the latest high-quality model
export GEMINI_MODEL=gemini-2.5-pro

# Or use the fast default model
export GEMINI_MODEL=gemini-2.0-flash

You can also specify a different model per request using the model parameter:

// Override the default model for this request
{
  "tool": "analyze_ui_screenshot",
  "arguments": {
    "image_path": "/path/to/screenshot.png",
    "model": "gemini-2.5-pro"
  }
}
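The precedence described in this section (per-request model, then GEMINI_MODEL, then the built-in default) can be sketched as a small function. The resolveModel name is hypothetical, and rejecting models outside the documented list is an assumption; the real server may pass unknown names straight through to the Gemini API.

```typescript
// Models listed under "Supported Gemini Models" in this README.
const KNOWN_MODELS: readonly string[] = [
  "gemini-2.5-pro",
  "gemini-2.0-flash",
  "gemini-1.5-pro",
  "gemini-1.5-flash",
  "gemini-1.0-pro",
];
const DEFAULT_MODEL = "gemini-2.0-flash";

// Hypothetical helper: request value wins, then the environment, then the default.
function resolveModel(requestModel?: string, envModel?: string): string {
  const candidate = requestModel?.trim() || envModel?.trim() || DEFAULT_MODEL;
  // Assumption: unknown model names are rejected up front.
  if (!KNOWN_MODELS.includes(candidate)) {
    throw new Error(`Unsupported model: ${candidate}`);
  }
  return candidate;
}
```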

Image Requirements

Formats: PNG, JPEG, GIF, WebP, BMP
Maximum Size: 10MB (configurable)
Recommended Resolution: 1920x1080 or higher for best results
Content: Clear screenshots without excessive compression artifacts
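The format and size requirements above could be checked before any API call is made. The sketch below validates by file extension and byte count. Both the validateImage name and the extension-based check are assumptions; the server may inspect file contents instead. The error strings reuse the messages listed under Troubleshooting.

```typescript
const DEFAULT_FORMATS = ["png", "jpg", "jpeg", "gif", "webp", "bmp"];
const DEFAULT_MAX_BYTES = 10 * 1024 * 1024; // 10MB

// Hypothetical helper: returns an error message, or null when the image passes.
function validateImage(
  fileName: string,
  sizeBytes: number,
  maxBytes: number = DEFAULT_MAX_BYTES,
  formats: string[] = DEFAULT_FORMATS,
): string | null {
  const dot = fileName.lastIndexOf(".");
  // No dot means no extension, which is treated as unsupported.
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  if (!formats.includes(ext)) {
    return "Unsupported image format";
  }
  if (sizeBytes > maxBytes) {
    return "Image size exceeds maximum allowed size";
  }
  return null;
}
```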

Error Handling

The server provides detailed error messages for common issues:

Missing API Key: The GEMINI_API_KEY environment variable is not set
Invalid Image: Unsupported format or corrupted file
Size Limits: Image exceeds maximum allowed size
Network Errors: Failed to download images from URLs
API Errors: Gemini API quota limits or service issues

API Rate Limits

Gemini API has usage quotas and rate limits
Consider implementing caching for repeated analysis
Monitor your API usage in the Google Cloud Console
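The caching suggestion above could be implemented on the client side roughly as follows. This is a sketch, not something the package is known to provide: the AnalysisCache class, its ten-minute default lifetime, and the key layout are all assumptions. Entries are keyed by a SHA-256 hash of the model, prompt, and image data, so the same screenshot analyzed with the same settings is only sent to the API once per lifetime window.

```typescript
import { createHash } from "node:crypto";

// Hypothetical in-memory cache with a fixed time-to-live per entry.
class AnalysisCache<T> {
  private readonly entries = new Map<string, { value: T; expiresAt: number }>();
  private readonly ttlMs: number;

  constructor(ttlMs: number = 10 * 60 * 1000) {
    this.ttlMs = ttlMs;
  }

  // Build a key from everything that influences the result.
  static key(imageBase64: string, prompt: string, model: string): string {
    return createHash("sha256")
      .update(model)
      .update("\0") // separator so adjacent fields cannot run together
      .update(prompt)
      .update("\0")
      .update(imageBase64)
      .digest("hex");
  }

  // Return a fresh cached value, or run compute() and store its result.
  async getOrCompute(
    key: string,
    compute: () => Promise<T>,
    now: number = Date.now(),
  ): Promise<T> {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > now) {
      return hit.value;
    }
    const value = await compute();
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
    return value;
  }
}
```

A caller would wrap its analyze_ui_screenshot request in compute, so repeated analyses of an unchanged screenshot do not count against the quota.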

Troubleshooting

Common Issues

"GEMINI_API_KEY environment variable is required"
- Set GEMINI_API_KEY in your .env file or as an environment variable
- Get an API key from https://makersuite.google.com/app/apikey

"Failed to connect to Gemini API"
- Verify that your API key is valid and active
- Check network connectivity
- Ensure the API is enabled in your Google Cloud project

"Image size exceeds maximum allowed size"
- Reduce the image size or increase the MAX_IMAGE_SIZE limit
- Compress images before analysis

"Unsupported image format"
- Use a supported format: PNG, JPEG, GIF, WebP, BMP
- Convert images to a supported format before analysis

Debug Mode

Enable debug logging by setting:

DEBUG=uivision:* auu-uivision-mcp

License

MIT License - see LICENSE file for details.

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

Support

Create issues on GitHub for bug reports
Check the documentation for common solutions
Review the error messages for specific guidance

Changelog

v1.0.0

Initial release
UI screenshot analysis with Gemini API
Support for multiple image input sources
Configurable models and parameters
Comprehensive error handling