@jkumonpm/vision-mcp
v1.0.2
AI Vision MCP server — analyze images with Gemini AI (free tier)
AI Vision MCP server. Analyze images, extract text (OCR), and detect objects using Gemini AI.
Free tier model: gemini-3-flash-preview.
Install
```bash
npm install -g @jkumonpm/vision-mcp
```

Usage
Claude Desktop / Cursor / OpenCode
Add to your MCP config:
```json
{
  "mcpServers": {
    "vision": {
      "command": "npx",
      "args": ["-y", "@jkumonpm/vision-mcp"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}
```

Note: You need a Google Gemini API key. Get one for free at Google AI Studio.
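Once configured, the MCP client invokes each of the tools below by sending a JSON-RPC 2.0 `tools/call` request to the server over stdio, after the `initialize` handshake it performs automatically. The TypeScript sketch below only illustrates the shape of that message; `ToolCall` and `buildToolCall` are hypothetical names, and the argument objects simply mirror the Input examples in this README. In practice Claude Desktop, Cursor, or OpenCode build these messages for you.

```typescript
// Hypothetical types mirroring the Input examples in this README.
type ToolCall =
  | { name: "analyze_image"; arguments: { image_paths: string[]; prompt?: string } }
  | { name: "extract_text"; arguments: { image_path: string } }
  | { name: "detect_objects"; arguments: { image_path: string } };

// A JSON-RPC 2.0 request for the MCP "tools/call" method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: ToolCall;
}

// Hypothetical helper: wraps a tool call in a JSON-RPC request envelope.
function buildToolCall(id: number, call: ToolCall): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: call };
}

const ocrRequest = buildToolCall(1, {
  name: "extract_text",
  arguments: { image_path: "path/to/image.jpg" },
});

// The stdio transport sends one JSON message per line, with no embedded newlines.
console.log(JSON.stringify(ocrRequest));
```

The tagged union lets the compiler reject, for example, an `extract_text` call that passes `image_paths` instead of `image_path`.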
Tools
analyze_image: Analyze Image
Analyze one or more images and return a detailed AI-generated description (objects, environment, colors, mood, and more).
Input:
```json
{
  "image_paths": ["path/to/image.jpg"],
  "prompt": "Optional custom prompt (uses detailed analysis by default)"
}
```

Output:
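```text
This image is a screenshot of a typical terminal user interface (TUI)...
1. Object recognition: software interface components, title, method selection area...
2. Environment description: solid black background, developer environment...
...
```

extract_text: Extract Text (OCR)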
Extract all text from an image using OCR.
Input:
```json
{
  "image_path": "path/to/image.jpg"
}
```

Output:
```text
APIReq
Method: GET POST PUT DELETE PATCH
URL: > https://
Headers: | Content-Type: application/json
...
```

detect_objects: Detect Objects
Detect and list all recognizable objects in an image.
Input:
```json
{
  "image_path": "path/to/image.jpg"
}
```

Output:
```text
- Computer screen: displaying the APIReq interface
- Cursor: white rectangular block cursor
- Text: APIReq, Method, URL, Headers, Body
...
```

Supported Image Formats
- JPEG / JPG
- PNG
- GIF
- WebP
- BMP
- SVG
- TIFF
Max file size: 15MB per image. Max images: 5 per request.
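These limits can be checked before any image is sent to Gemini. The sketch below is a hypothetical pre-flight validator, not the package's actual code: it assumes the format is inferred from the file extension, that "15MB" means 15 * 1024 * 1024 bytes, and that the caller already knows each file's size (for example from Node's `fs.stat`). `validateImages` and `ImageFile` are illustrative names.

```typescript
// Limits taken from this README; treating "15MB" as MiB is an assumption.
const MAX_IMAGES = 5;
const MAX_BYTES = 15 * 1024 * 1024;

// Extension-to-MIME lookup for the formats listed above (assumed detection strategy).
const MIME_BY_EXT = new Map<string, string>([
  ["jpg", "image/jpeg"],
  ["jpeg", "image/jpeg"],
  ["png", "image/png"],
  ["gif", "image/gif"],
  ["webp", "image/webp"],
  ["bmp", "image/bmp"],
  ["svg", "image/svg+xml"],
  ["tif", "image/tiff"],
  ["tiff", "image/tiff"],
]);

interface ImageFile {
  path: string;
  sizeBytes: number;
}

// Hypothetical validator: returns one MIME type per file, or throws on the first problem.
function validateImages(files: ImageFile[]): string[] {
  if (files.length === 0) throw new Error("At least one image is required");
  if (files.length > MAX_IMAGES) {
    throw new Error(`Too many images: ${files.length} (max ${MAX_IMAGES})`);
  }
  return files.map(({ path, sizeBytes }) => {
    const dot = path.lastIndexOf(".");
    const ext = dot >= 0 ? path.slice(dot + 1).toLowerCase() : "";
    const mime = MIME_BY_EXT.get(ext);
    if (mime === undefined) throw new Error(`Unsupported image format: ${path}`);
    if (sizeBytes > MAX_BYTES) throw new Error(`File exceeds 15MB: ${path}`);
    return mime;
  });
}

// Usage: an uppercase extension is accepted and mapped to its MIME type.
console.log(validateImages([{ path: "screenshot.PNG", sizeBytes: 1200000 }]));
```

Failing fast on the client side mirrors the "File size check" design point below: an oversized or unsupported file is rejected locally instead of producing an API error.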
Design
| Feature | Why |
|---------|-----|
| Free tier model | Uses gemini-3-flash-preview on the Gemini free tier, so users pay nothing within the free-tier limits |
| Environment variable | API key via GEMINI_API_KEY or GOOGLE_API_KEY |
| Inline data | Images sent as base64, no upload needed |
| File size check | Rejects files >15MB to avoid API errors |
| Structured prompts | Detailed analysis, OCR, and object detection prompts |
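Two of the design points above, the API-key lookup and base64 inline data, are sketched below. This is an illustration under stated assumptions, not the package's source: `resolveApiKey`, `toBase64`, and `buildRequestBody` are hypothetical names, `GEMINI_API_KEY` is assumed to take precedence over `GOOGLE_API_KEY`, and the body follows the Gemini REST `generateContent` format with `inline_data` parts. The environment is passed in as a plain object so the sketch does not depend on Node typings.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical: prefer GEMINI_API_KEY, fall back to GOOGLE_API_KEY (precedence assumed).
function resolveApiKey(env: Env): string {
  const key = env["GEMINI_API_KEY"] || env["GOOGLE_API_KEY"];
  if (!key) throw new Error("Set GEMINI_API_KEY or GOOGLE_API_KEY");
  return key;
}

const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard base64 with "=" padding; in Node, Buffer.from(bytes).toString("base64") is equivalent.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    const n = (bytes[i] << 16) | (b1 << 8) | b2; // pack up to 3 bytes into 24 bits
    out += B64[(n >> 18) & 63] + B64[(n >> 12) & 63];
    out += i + 1 < bytes.length ? B64[(n >> 6) & 63] : "=";
    out += i + 2 < bytes.length ? B64[n & 63] : "=";
  }
  return out;
}

interface InlineImage {
  mimeType: string;
  bytes: Uint8Array;
}

// Builds a generateContent body: the prompt as a text part, then one inline_data part per image.
function buildRequestBody(prompt: string, images: InlineImage[]) {
  return {
    contents: [
      {
        parts: [
          { text: prompt },
          ...images.map((img) => ({
            inline_data: { mime_type: img.mimeType, data: toBase64(img.bytes) },
          })),
        ],
      },
    ],
  };
}

// Usage with a fake environment and a 3-byte stand-in for real image data.
const apiKey = resolveApiKey({ GOOGLE_API_KEY: "test-key" });
const body = buildRequestBody("Describe this image.", [
  { mimeType: "image/png", bytes: new Uint8Array([77, 97, 110]) },
]);
console.log(apiKey, JSON.stringify(body));
```

Because the image travels inside the request body, no separate upload step is needed; the API key itself goes in the `x-goog-api-key` request header rather than in the body.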
License
MIT
