@haisto/vision-mcp
v0.0.2
Multi-model vision understanding MCP server. Supports OpenAI-compatible and Anthropic vision models.
vision-mcp
Multi-model vision understanding MCP (Model Context Protocol) server. Supports OpenAI-compatible and Anthropic vision models for image analysis, text extraction, UI comparison, diagram understanding, and more.
Features
- Multi-model support: OpenAI-compatible APIs and Anthropic Claude
- Image analysis: General image understanding, OCR/text extraction
- UI analysis: Convert UI screenshots to code/prompts/specs, diff check
- Error diagnosis: Analyze error screenshots and stack traces
- Technical diagram analysis: Architecture diagrams, flowcharts, UML, ER diagrams
- Data visualization analysis: Charts, graphs, dashboards
- Video analysis: Analyze video content
- Streaming + auto-fallback: Uses streaming API, falls back to non-streaming on failure
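The streaming + auto-fallback behavior in the last bullet can be pictured roughly as follows. This is a minimal sketch, not the package's actual implementation: `StreamFn`, `CompleteFn`, `completeWithFallback`, and `brokenStream` are hypothetical names, and the sketch assumes that partial streamed output is discarded when the stream fails.

```typescript
// Hypothetical shapes: the real provider clients are not shown in this README.
type StreamFn = (prompt: string) => AsyncIterable<string>;
type CompleteFn = (prompt: string) => Promise<string>;

// Try the streaming call first. If it throws at any point, drop the partial
// output and retry once with the non-streaming call.
async function completeWithFallback(
  prompt: string,
  stream: StreamFn,
  complete: CompleteFn,
): Promise<{ text: string; usedFallback: boolean }> {
  try {
    let text = "";
    for await (const chunk of stream(prompt)) {
      text += chunk;
    }
    return { text, usedFallback: false };
  } catch {
    const text = await complete(prompt);
    return { text, usedFallback: true };
  }
}

// Stand-in provider for demonstration only: emits one chunk, then fails.
async function* brokenStream(_prompt: string): AsyncGenerator<string> {
  yield "partial ";
  throw new Error("stream interrupted");
}

completeWithFallback(
  "Describe this image",
  brokenStream,
  async (prompt) => `full answer for: ${prompt}`,
).then((result) => console.log(result));
```

Returning `usedFallback` alongside the text is a design choice of this sketch; it makes it easy to log how often the streaming path fails.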
Tools
| Tool | Description |
|------|-------------|
| ui_to_artifact | Convert UI screenshots into code, prompts, or design specs |
| text_extraction | OCR and text extraction from screenshots |
| error_diagnosis | Analyze error messages and stack traces |
| diagram_analysis | Understand architecture diagrams, flowcharts, UML |
| data_viz_analysis | Extract insights from charts and dashboards |
| ui_diff_check | Compare two UI screenshots for visual differences |
| general_image_analysis | General-purpose image understanding |
| video_analysis | Analyze video content |
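MCP clients invoke these tools with a JSON-RPC 2.0 `tools/call` request. An MCP client such as Claude Desktop normally builds this message for you, so the sketch below is only meant to show its shape. The argument names `image_source` and `prompt` are assumptions, since this README does not document the tools' input schemas; query the server's `tools/list` for the real ones. `buildToolCall` is a hypothetical helper.

```typescript
// The eight tool names listed in the table above.
type VisionTool =
  | "ui_to_artifact"
  | "text_extraction"
  | "error_diagnosis"
  | "diagram_analysis"
  | "data_viz_analysis"
  | "ui_diff_check"
  | "general_image_analysis"
  | "video_analysis";

// Shape of an MCP `tools/call` request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: VisionTool; arguments: Record<string, unknown> };
}

// Hypothetical helper that wraps a tool name and its arguments in a request.
function buildToolCall(
  id: number,
  name: VisionTool,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const request = buildToolCall(1, "text_extraction", {
  image_source: "/path/to/screenshot.png", // assumed argument name
  prompt: "Extract all visible text", // assumed argument name
});
console.log(JSON.stringify(request, null, 2));
```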
Quick Start
1. Install
npm install -g @haisto/vision-mcp
Or run directly without installing:
npx -y @haisto/vision-mcp
2. Configure
Set environment variables for your chosen provider:
OpenAI-compatible (default):
MODEL_PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_MODEL=gpt-4o
Anthropic:
MODEL_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-sonnet-4-20250514
Common settings:
MAX_TOKENS=32768
TEMPERATURE=0.8
TOP_P=0.6
FILE_LOG_ENABLED=true # optional, enable file logging
FILE_LOG_PATH=/path/to/log.log # optional, default ~/.vision-mcp/
3. Add to MCP Client
Claude Desktop (claude_desktop_config.json):
{
  "mcpServers": {
    "vision-mcp": {
      "command": "node",
      "args": ["build/index.js"],
      "env": {
        "OPENAI_API_KEY": "sk-...",
        "OPENAI_MODEL": "gpt-4o"
      }
    }
  }
}
Development
git clone https://github.com/haisto/vision-mcp.git
cd vision-mcp
npm install
npm run build
npm run debug # debug with --inspect
Environment Variables
| Variable | Default | Required | Description |
|----------|---------|----------|-------------|
| MODEL_PROVIDER | openai | No | openai or anthropic |
| OPENAI_API_KEY | — | No | API key for OpenAI-compatible provider |
| OPENAI_BASE_URL | https://api.openai.com/v1 | No | Base URL for OpenAI-compatible API |
| OPENAI_MODEL | gpt-4o | No | Model name for OpenAI-compatible |
| ANTHROPIC_API_KEY | — | No | API key for Anthropic |
| ANTHROPIC_BASE_URL | https://api.anthropic.com/v1 | No | Base URL for Anthropic API |
| ANTHROPIC_MODEL | claude-sonnet-4-20250514 | No | Model name for Anthropic |
| MAX_TOKENS | 32768 | No | Maximum output tokens |
| TEMPERATURE | 0.8 | No | Sampling temperature |
| TOP_P | 0.6 | No | Nucleus sampling parameter |
| FILE_LOG_ENABLED | false | No | Enable file logging |
| FILE_LOG_PATH | ~/.vision-mcp/ | No | Custom log file path |
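The defaults in the table above could be applied along these lines. This is a sketch under stated assumptions rather than the server's actual loader: `VisionConfig`, `numberOr`, and `loadConfig` are hypothetical names, and treating any `MODEL_PROVIDER` value other than `anthropic` as `openai` is a guess at how unknown values are handled.

```typescript
type Env = Record<string, string | undefined>;

interface VisionConfig {
  provider: "openai" | "anthropic";
  apiKey: string | undefined;
  baseUrl: string;
  model: string;
  maxTokens: number;
  temperature: number;
  topP: number;
  fileLogEnabled: boolean;
  fileLogPath: string | undefined;
}

// Parse a numeric variable, falling back when it is unset, blank, or not a number.
function numberOr(value: string | undefined, fallback: number): number {
  if (value === undefined || value.trim() === "") return fallback;
  const parsed = Number(value);
  return Number.isFinite(parsed) ? parsed : fallback;
}

// Resolve the settings for the selected provider using the documented defaults.
function loadConfig(env: Env): VisionConfig {
  const provider = env.MODEL_PROVIDER === "anthropic" ? "anthropic" : "openai";
  const isAnthropic = provider === "anthropic";
  return {
    provider,
    apiKey: isAnthropic ? env.ANTHROPIC_API_KEY : env.OPENAI_API_KEY,
    baseUrl: isAnthropic
      ? (env.ANTHROPIC_BASE_URL ?? "https://api.anthropic.com/v1")
      : (env.OPENAI_BASE_URL ?? "https://api.openai.com/v1"),
    model: isAnthropic
      ? (env.ANTHROPIC_MODEL ?? "claude-sonnet-4-20250514")
      : (env.OPENAI_MODEL ?? "gpt-4o"),
    maxTokens: numberOr(env.MAX_TOKENS, 32768),
    temperature: numberOr(env.TEMPERATURE, 0.8),
    topP: numberOr(env.TOP_P, 0.6),
    fileLogEnabled: env.FILE_LOG_ENABLED === "true",
    // Left undefined when unset; the server then uses its own default location.
    fileLogPath: env.FILE_LOG_PATH,
  };
}

const config = loadConfig({ MODEL_PROVIDER: "anthropic", ANTHROPIC_API_KEY: "sk-ant-..." });
console.log(config.model, config.maxTokens);
```

In a Node.js process you would typically pass `process.env` as the argument; taking the environment as a parameter keeps the function easy to test.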
License
MIT
