auu-uivision-mcp
v1.0.4
MCP server for analyzing software UI screenshots using Google Gemini API with support for multiple models including gemini-2.0-flash
UI Vision Analyzer MCP Server
A Model Context Protocol (MCP) server that analyzes software user interface screenshots using Google's Gemini AI vision capabilities. This server provides detailed descriptions of UI elements, layout structure, functionality, and accessibility information.
Features
- UI Analysis: Comprehensive analysis of software interfaces including buttons, forms, navigation, and layout
- Multiple Input Sources: Support for local files, base64 data, and image URLs
- Flexible Prompts: Use default UI analysis prompts or provide custom analysis instructions
- Multiple Models: Support for various Gemini models (2.0 Flash, 1.5 Pro, 1.5 Flash, etc.)
- Format Support: PNG, JPEG, GIF, WebP, BMP image formats
- Size Validation: Configurable image size limits with validation
- Error Handling: Comprehensive error reporting and validation
Installation
Global Installation
npm install -g auu-uivision-mcp
Local Installation
npm install auu-uivision-mcp
Direct Usage with npx
# Set API key and run
export GEMINI_API_KEY=your_gemini_api_key_here
npx auu-uivision-mcp
# Or on Windows:
set GEMINI_API_KEY=your_gemini_api_key_here
npx auu-uivision-mcp
Development Installation
git clone https://github.com/superauu/auu-uivision-mcp.git
cd auu-uivision-mcp
npm install
npm run build
Configuration
Required Environment Variables
You must set the GEMINI_API_KEY environment variable before running the server.
Linux/macOS:
export GEMINI_API_KEY=your_gemini_api_key_here
npx auu-uivision-mcp
Windows (Command Prompt):
set GEMINI_API_KEY=your_gemini_api_key_here
npx auu-uivision-mcp
Windows (PowerShell):
$env:GEMINI_API_KEY="your_gemini_api_key_here"
npx auu-uivision-mcp
Get your API key from: https://makersuite.google.com/app/apikey
Optional Environment Variables
# Configure the default Gemini model to use
# Available models: gemini-2.5-pro, gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash, gemini-1.0-pro
export GEMINI_MODEL=gemini-2.0-flash
# Maximum image size in bytes (default: 10MB; the value below raises the limit to 20MB)
export MAX_IMAGE_SIZE=20971520
# Supported image formats (default: png,jpg,jpeg,gif,webp,bmp)
export SUPPORTED_FORMATS=png,jpg,jpeg,gif,webp,bmp
Usage
MCP Client Configuration
Add this server to your MCP client configuration:
Claude Desktop
{
"mcpServers": {
"uivision-analyzer": {
"command": "auu-uivision-mcp",
"env": {
"GEMINI_API_KEY": "your_api_key_here"
}
}
}
}
Cline (VS Code Extension)
{
"mcpServers": {
"uivision-analyzer": {
"command": "npx",
"args": ["auu-uivision-mcp"],
"env": {
"GEMINI_API_KEY": "your_api_key_here"
}
}
}
}
Available Tools
analyze_ui_screenshot
Analyzes a software UI screenshot and returns a detailed description.
Parameters:
- image_path (string, optional): Local file path to the screenshot
- image_base64 (string, optional): Base64 encoded image data
- image_url (string, optional): URL of the image to analyze
- prompt (string, optional): Custom analysis prompt
- model (string, optional): Gemini model to use
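The parameters above can be modelled as a small TypeScript type. The sketch below is illustrative only: the names AnalyzeUiScreenshotArgs, ImageSource, and pickImageSource are hypothetical, and the rule that exactly one image source must be supplied is an assumption, since this README marks all three as optional without saying how conflicts are resolved.

```typescript
// Hypothetical argument shape mirroring the documented parameters.
interface AnalyzeUiScreenshotArgs {
  image_path?: string;
  image_base64?: string;
  image_url?: string;
  prompt?: string;
  model?: string;
}

// Tagged union describing which input source was chosen.
type ImageSource =
  | { kind: "path"; value: string }
  | { kind: "base64"; value: string }
  | { kind: "url"; value: string };

// Assumption: exactly one image source is expected per call.
function pickImageSource(args: AnalyzeUiScreenshotArgs): ImageSource {
  const candidates: ImageSource[] = [];
  if (args.image_path) candidates.push({ kind: "path", value: args.image_path });
  if (args.image_base64) candidates.push({ kind: "base64", value: args.image_base64 });
  if (args.image_url) candidates.push({ kind: "url", value: args.image_url });

  const [only] = candidates;
  if (candidates.length !== 1 || only === undefined) {
    throw new Error(
      `Expected exactly one of image_path, image_base64, image_url; got ${candidates.length}`
    );
  }
  return only;
}
```

A client could run a check like this before sending a request, so that an empty or ambiguous argument object fails locally instead of at the server.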
Example Usage:
// Analyze local file
{
"tool": "analyze_ui_screenshot",
"arguments": {
"image_path": "/path/to/screenshot.png",
"model": "gemini-2.0-flash"
}
}
// Analyze image from URL
{
"tool": "analyze_ui_screenshot",
"arguments": {
"image_url": "https://example.com/screenshot.jpg",
"prompt": "Focus on accessibility issues and color contrast"
}
}
// Analyze base64 image
{
"tool": "analyze_ui_screenshot",
"arguments": {
"image_base64": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA...",
"model": "gemini-1.5-pro"
}
}
Response Format:
{
"description": "Overall description of the interface and its purpose",
"elements": [
{
"type": "button",
"description": "Primary call-to-action button with 'Get Started' text",
"position": { "x": 250, "y": 400, "width": 120, "height": 40 },
"text": "Get Started",
"interactive": true
}
],
"layout": {
"structure": "Centered card layout",
"organization": "Vertical flow with clear visual hierarchy",
"responsiveness": "Appears to be responsive with adaptive containers",
"visualHierarchy": "Clear with prominent headline and supporting elements"
},
"functionality": [
"User registration/signup workflow",
"Social login integration",
"Form validation and error handling"
],
"accessibility": {
"colorContrast": "Good contrast ratios for text readability",
"textReadability": "Clear fonts with appropriate sizing",
"navigationClarity": "Logical tab order and keyboard navigation",
"altTextStatus": "Images appear to have descriptive alt text"
}
}
Development
Project Structure
src/
├── index.ts # Main MCP server entry point
├── gemini-client.ts # Gemini API integration
├── image-processor.ts # Image handling utilities
├── config.ts # Environment configuration
└── types.ts # TypeScript type definitions
Scripts
# Development with auto-reload
npm run dev:watch
# Development without auto-reload
npm run dev
# Build for production
npm run build
# Start production server
npm start
Environment Setup
- Copy .env.example to .env
- Add your Gemini API key
- Install dependencies: npm install
- Build the project: npm run build
Supported Gemini Models
- gemini-2.5-pro - Latest high-quality model with advanced reasoning capabilities
- gemini-2.0-flash (default) - Fast, efficient for most UI analysis tasks
- gemini-1.5-pro - Higher quality analysis, slower processing
- gemini-1.5-flash - Balanced speed and quality
- gemini-1.0-pro - Legacy model support
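The model can come from three places: the per-request model parameter, the GEMINI_MODEL environment variable, or the built-in default. A minimal sketch of one plausible precedence order is shown below. The helper names (resolveModel, isSupportedModel) are hypothetical, and rejecting names outside the list is an assumption; the actual server may pass unknown model names through to the Gemini API.

```typescript
// Model names taken from the list in this README.
const SUPPORTED_MODELS = [
  "gemini-2.5-pro",
  "gemini-2.0-flash",
  "gemini-1.5-pro",
  "gemini-1.5-flash",
  "gemini-1.0-pro",
] as const;

type GeminiModel = (typeof SUPPORTED_MODELS)[number];

const DEFAULT_MODEL: GeminiModel = "gemini-2.0-flash";

// Type guard so callers get a narrowed GeminiModel back.
function isSupportedModel(name: string): name is GeminiModel {
  return (SUPPORTED_MODELS as readonly string[]).includes(name);
}

// Assumed precedence: per-request value, then GEMINI_MODEL, then the default.
// Empty strings are treated as "not set".
function resolveModel(
  requested: string | undefined,
  envModel: string | undefined
): GeminiModel {
  const candidate = requested || envModel || DEFAULT_MODEL;
  if (!isSupportedModel(candidate)) {
    throw new Error(`Unsupported Gemini model: ${candidate}`);
  }
  return candidate;
}
```

Under these assumptions, resolveModel("gemini-2.5-pro", "gemini-1.5-flash") picks the per-request value, while resolveModel(undefined, undefined) falls back to gemini-2.0-flash.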
Model Selection via Environment Variables
You can set the default model using the GEMINI_MODEL environment variable:
# Use the latest high-quality model
GEMINI_MODEL=gemini-2.5-pro
# Or use the fast default model
GEMINI_MODEL=gemini-2.0-flash
You can also specify a different model per request using the model parameter:
// Override the default model for this request
{
"tool": "analyze_ui_screenshot",
"arguments": {
"image_path": "/path/to/screenshot.png",
"model": "gemini-2.5-pro"
}
}
Image Requirements
- Formats: PNG, JPEG, GIF, WebP, BMP
- Maximum Size: 10MB (configurable)
- Recommended Resolution: 1920x1080 or higher for best results
- Content: Clear screenshots without excessive compression artifacts
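The format and size requirements above, together with the MAX_IMAGE_SIZE and SUPPORTED_FORMATS variables, could be enforced along the following lines. This is a sketch under stated assumptions, not the package's actual code: loadImageLimits and validateImage are hypothetical names, the format check is based on the file extension only, and the error strings merely resemble the messages listed under Troubleshooting.

```typescript
interface ImageLimits {
  maxBytes: number;
  formats: string[];
}

// Defaults documented in this README.
const DEFAULT_MAX_BYTES = 10 * 1024 * 1024; // 10MB
const DEFAULT_FORMATS = ["png", "jpg", "jpeg", "gif", "webp", "bmp"];

// Reads limits from an environment-like record (e.g. process.env in Node.js).
function loadImageLimits(env: Record<string, string | undefined>): ImageLimits {
  const parsed = Number(env["MAX_IMAGE_SIZE"]);
  // Fall back to the default when the value is missing or not a positive integer.
  const maxBytes = Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_MAX_BYTES;

  const formats = (env["SUPPORTED_FORMATS"] ?? "")
    .split(",")
    .map((f) => f.trim().toLowerCase())
    .filter((f) => f.length > 0);

  return { maxBytes, formats: formats.length > 0 ? formats : DEFAULT_FORMATS };
}

// Returns an error message, or null when the image passes both checks.
// Assumption: the format is inferred from the file extension alone.
function validateImage(
  fileName: string,
  sizeBytes: number,
  limits: ImageLimits
): string | null {
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  if (!limits.formats.includes(ext)) {
    return `Unsupported image format: ${ext || "(none)"}`;
  }
  if (sizeBytes > limits.maxBytes) {
    return `Image size ${sizeBytes} exceeds maximum allowed size ${limits.maxBytes}`;
  }
  return null;
}
```

Taking the environment as a parameter keeps the functions easy to test. A real implementation might also inspect the file's magic bytes or MIME type instead of trusting the extension.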
Error Handling
The server provides detailed error messages for common issues:
- Missing API Key: Configure GEMINI_API_KEY environment variable
- Invalid Image: Unsupported format or corrupted file
- Size Limits: Image exceeds maximum allowed size
- Network Errors: Failed to download images from URLs
- API Errors: Gemini API quota limits or service issues
API Rate Limits
- Gemini API has usage quotas and rate limits
- Consider implementing caching for repeated analysis
- Monitor your API usage in the Google Cloud Console
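One way to act on the caching suggestion is an in-memory wrapper around whatever function performs the analysis. The sketch below is an assumption about how a client might do this, not a feature of this package: withCache and the Analyzer signature are hypothetical, and imageKey stands for any stable identifier of the image, such as a content hash or URL.

```typescript
// Hypothetical signature for a function that performs one analysis call.
type Analyzer<T> = (imageKey: string, prompt: string, model: string) => Promise<T>;

// Wraps an analyzer so identical requests within ttlMs reuse the same result.
// The clock is injectable to make expiry testable.
function withCache<T>(
  analyze: Analyzer<T>,
  ttlMs: number,
  now: () => number = Date.now
): Analyzer<T> {
  const cache = new Map<string, { expires: number; value: Promise<T> }>();

  return (imageKey, prompt, model) => {
    const key = JSON.stringify([imageKey, prompt, model]);
    const hit = cache.get(key);
    if (hit && hit.expires > now()) {
      return hit.value; // reuse the cached (possibly still pending) result
    }

    const value = analyze(imageKey, prompt, model);
    cache.set(key, { expires: now() + ttlMs, value });

    // Do not keep failed calls in the cache; the caller still sees the rejection.
    value.catch(() => {
      if (cache.get(key)?.value === value) cache.delete(key);
    });
    return value;
  };
}
```

Caching the promise rather than the resolved value means concurrent identical requests share one API call. The map is unbounded, so a long-running process would also need a size limit or periodic eviction.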
Troubleshooting
Common Issues
"GEMINI_API_KEY environment variable is required"
- Set the GEMINI_API_KEY in your .env file or environment variables
- Get an API key from https://makersuite.google.com/app/apikey
"Failed to connect to Gemini API"
- Verify your API key is valid and active
- Check network connectivity
- Ensure API is enabled in your Google Cloud project
"Image size exceeds maximum allowed size"
- Reduce image size or increase MAX_IMAGE_SIZE limit
- Compress images before analysis
"Unsupported image format"
- Use supported formats: PNG, JPEG, GIF, WebP, BMP
- Convert images to supported format before analysis
Debug Mode
Enable debug logging by setting:
DEBUG=uivision:* auu-uivision-mcp
License
MIT License - see LICENSE file for details.
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
Support
- Create issues on GitHub for bug reports
- Check the documentation for common solutions
- Review the error messages for specific guidance
Changelog
v1.0.0
- Initial release
- UI screenshot analysis with Gemini API
- Support for multiple image input sources
- Configurable models and parameters
- Comprehensive error handling
