@jkumonpm/vision-mcp

v1.0.2, published 3 months ago

AI Vision MCP server — analyze images with Gemini AI (free tier)

Vulnerabilities: 0 high, 0 medium, 0 low

Maintainer: jkumonpm

Keywords: mcp, model-context-protocol, vision, image-analysis, gemini, google-ai, ocr

@jkumonpm/vision-mcp

AI Vision MCP server. Analyze images, extract text (OCR), and detect objects using Gemini AI.

Model: gemini-3-flash-preview (available on the free tier).

Install

npm install -g @jkumonpm/vision-mcp

Usage

Claude Desktop / Cursor / OpenCode

Add to your MCP config:

{
  "mcpServers": {
    "vision": {
      "command": "npx",
      "args": ["-y", "@jkumonpm/vision-mcp"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}

Note: You need a Google Gemini API key. Get one for free at Google AI Studio.
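The Design section states that the key can come from either `GEMINI_API_KEY` or `GOOGLE_API_KEY`. Below is a minimal sketch of that lookup. It assumes `GEMINI_API_KEY` takes precedence when both are set, which the README does not specify, and `resolveApiKey` is a hypothetical helper name, not part of this package.

```typescript
// Hypothetical helper: picks the Gemini API key from an environment map.
// Assumption: GEMINI_API_KEY wins over GOOGLE_API_KEY when both are set.
function resolveApiKey(
  env: Record<string, string | undefined>,
): string | undefined {
  for (const name of ["GEMINI_API_KEY", "GOOGLE_API_KEY"]) {
    const value = env[name]?.trim();
    // Empty or whitespace-only values are treated as "not set".
    if (value) return value;
  }
  return undefined;
}

// Usage in a Node server: const apiKey = resolveApiKey(process.env);
```

Returning `undefined` instead of throwing leaves the caller free to decide how to report a missing key, for example as an MCP tool error.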

Tools

`analyze_image` — Analyze Image

Analyze one or more images and return a detailed AI-generated description (objects, environment, colors, mood, etc.).

Input:

{
  "image_paths": ["path/to/image.jpg"],
  "prompt": "Optional custom prompt (uses detailed analysis by default)"
}

Output:

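This image is a screenshot of a typical terminal user interface (TUI)...
1. Object recognition: software interface components, title, method selection area...
2. Environment description: solid black background, developer environment...
...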
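The `analyze_image` input above takes a list of paths plus an optional prompt. This is a sketch of how a server might validate that payload before calling the model. The `AnalyzeImageArgs` type and `parseAnalyzeImageArgs` function are hypothetical, and treating a blank prompt like an omitted one is an assumption, not documented behavior.

```typescript
// Hypothetical shape of the analyze_image arguments, mirroring the input example.
interface AnalyzeImageArgs {
  image_paths: string[];
  prompt?: string;
}

// Narrows an unknown tool-call payload to AnalyzeImageArgs, or throws with a reason.
function parseAnalyzeImageArgs(input: unknown): AnalyzeImageArgs {
  if (typeof input !== "object" || input === null) {
    throw new Error("arguments must be an object");
  }
  const { image_paths, prompt } = input as Record<string, unknown>;
  if (
    !Array.isArray(image_paths) ||
    image_paths.length === 0 ||
    !image_paths.every((p) => typeof p === "string" && p.length > 0)
  ) {
    throw new Error("image_paths must be a non-empty array of non-empty strings");
  }
  // No prompt: the server falls back to its default detailed analysis.
  if (prompt === undefined) return { image_paths };
  if (typeof prompt !== "string") {
    throw new Error("prompt must be a string when provided");
  }
  // Assumption: a blank prompt is treated the same as an omitted one.
  return prompt.trim() ? { image_paths, prompt } : { image_paths };
}
```

The per-request image count and file size limits are left to a separate check, since they depend on the files themselves rather than on the shape of the arguments.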

`extract_text` — Extract Text (OCR)

Extract all text from an image using OCR.

Input:

{
  "image_path": "path/to/image.jpg"
}

Output:

APIReq
Method: GET POST PUT DELETE PATCH
URL: > https://
Headers: | Content-Type: application/json
...
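MCP clients such as Claude Desktop or Cursor call these tools on your behalf, but it can help to see what goes over the wire. Per the MCP specification, a client invokes a tool with a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool `name` and its `arguments`. The sketch below builds such a request for `extract_text`; `buildExtractTextCall` is a hypothetical helper and the path is a placeholder.

```typescript
// JSON-RPC 2.0 request shape used by MCP's "tools/call" method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: builds the request an MCP client sends to run extract_text.
function buildExtractTextCall(id: number, imagePath: string): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "extract_text", arguments: { image_path: imagePath } },
  };
}

// On the stdio transport, each message is written as a single line of JSON.
console.log(JSON.stringify(buildExtractTextCall(1, "path/to/image.jpg")));
```

`JSON.stringify` without an indent argument emits no newlines, which matters because the stdio transport delimits messages by newline.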

`detect_objects` — Detect Objects

Detect and list all recognizable objects in an image.

Input:

{
  "image_path": "path/to/image.jpg"
}

Output:

- Computer screen: displays the APIReq interface
- Cursor: white rectangular block cursor
- Text: APIReq, Method, URL, Headers, Body
...
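The `detect_objects` output is free text written by the model, one bullet per object. If you want structured data on the client side, a line-based parser like the sketch below can help. It assumes, without any guarantee from the package, that each object appears as a `- label: description` bullet; it accepts both the ASCII colon and the full-width `：` the model may use when answering in Chinese. `parseDetectedObjects` is a hypothetical name.

```typescript
// One detected object, split from a "- label: description" bullet line.
interface DetectedObject {
  label: string;
  description: string;
}

// Hypothetical parser for detect_objects output. Lines that are not
// "- label: description" bullets (such as "...") are skipped.
function parseDetectedObjects(text: string): DetectedObject[] {
  const objects: DetectedObject[] = [];
  for (const rawLine of text.split(/\r?\n/)) {
    // Label runs up to the first ASCII or full-width colon.
    const match = /^\s*[-*]\s*([^:：]+)[:：]\s*(.*)$/.exec(rawLine);
    if (!match) continue;
    objects.push({
      label: (match[1] ?? "").trim(),
      description: (match[2] ?? "").trim(),
    });
  }
  return objects;
}
```

Because the model's formatting can drift, treat the parsed list as best-effort and keep the raw text available.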

Supported Image Formats

JPEG / JPG
PNG
GIF
WebP
BMP
SVG
TIFF
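Sending an image inline requires labeling it with a MIME type. This sketch maps the extensions listed above to their standard MIME types. How the package itself detects formats is not documented, so the extension-based approach, the extra `tif` spelling, and the `mimeTypeFor` name are assumptions.

```typescript
// Extension-to-MIME map for the supported formats listed above.
// A Map avoids accidental hits on Object.prototype keys like "constructor".
const MIME_BY_EXTENSION = new Map<string, string>([
  ["jpg", "image/jpeg"],
  ["jpeg", "image/jpeg"],
  ["png", "image/png"],
  ["gif", "image/gif"],
  ["webp", "image/webp"],
  ["bmp", "image/bmp"],
  ["svg", "image/svg+xml"],
  ["tif", "image/tiff"], // assumption: accept the short TIFF extension too
  ["tiff", "image/tiff"],
]);

// Hypothetical helper: returns the MIME type for a path, or undefined if unsupported.
function mimeTypeFor(path: string): string | undefined {
  const dot = path.lastIndexOf(".");
  if (dot === -1) return undefined;
  return MIME_BY_EXTENSION.get(path.slice(dot + 1).toLowerCase());
}
```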

Max file size: 15MB per image. Max images: 5 per request.
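The two limits above can be enforced before any API call is made. This is a sketch of such a pre-flight check. It assumes "15MB" means 15 MiB (15 × 1024 × 1024 bytes), which the README does not pin down, and `checkLimits` and `ImageFile` are hypothetical names.

```typescript
// Limits from the README. Assumption: "15MB" means 15 MiB.
const MAX_FILE_BYTES = 15 * 1024 * 1024;
const MAX_IMAGES_PER_REQUEST = 5;

interface ImageFile {
  path: string;
  sizeBytes: number;
}

// Hypothetical pre-flight check: returns human-readable problems (empty array = OK).
function checkLimits(files: ImageFile[]): string[] {
  const problems: string[] = [];
  if (files.length > MAX_IMAGES_PER_REQUEST) {
    problems.push(`too many images: ${files.length} > ${MAX_IMAGES_PER_REQUEST}`);
  }
  for (const f of files) {
    if (f.sizeBytes > MAX_FILE_BYTES) {
      problems.push(`${f.path} is ${f.sizeBytes} bytes, over the ${MAX_FILE_BYTES}-byte limit`);
    }
  }
  return problems;
}
```

In a Node server, `sizeBytes` could come from `fs.statSync(path).size`. Checking the size on disk avoids reading an oversized file into memory just to reject it.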

Design

| Feature | Why |
|---------|-----|
| Free tier model | Uses gemini-3-flash-preview; no cost for users |
| Environment variable | API key via GEMINI_API_KEY or GOOGLE_API_KEY |
| Inline data | Images sent as base64, no upload needed |
| File size check | Rejects files >15MB to avoid API errors |
| Structured prompts | Detailed analysis, OCR, and object detection prompts |
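The "Inline data" design choice means each image's bytes are base64-encoded and embedded in the request body rather than uploaded separately. This sketch builds a request body in the shape used by the public Gemini REST `generateContent` examples (`contents` → `parts`, with `inline_data` holding `mime_type` and `data`). Whether this package calls the REST API directly or goes through an SDK is not stated, and `toInlineDataPart` and `buildRequestBody` are hypothetical names.

```typescript
import { Buffer } from "node:buffer";

// Part shapes as they appear in Gemini REST generateContent examples.
type TextPart = { text: string };
type InlineDataPart = { inline_data: { mime_type: string; data: string } };

// Hypothetical helper: wraps raw image bytes as a base64 inline_data part,
// so the image travels inside the request body instead of via a separate upload.
function toInlineDataPart(bytes: Uint8Array, mimeType: string): InlineDataPart {
  return {
    inline_data: { mime_type: mimeType, data: Buffer.from(bytes).toString("base64") },
  };
}

// Hypothetical helper: builds a request body with the images first, then the prompt.
function buildRequestBody(prompt: string, images: InlineDataPart[]) {
  const parts: Array<TextPart | InlineDataPart> = [...images, { text: prompt }];
  return { contents: [{ parts }] };
}
```

In Node, `fs.readFileSync(path)` returns a `Buffer`, which is a `Uint8Array` and can be passed straight to `toInlineDataPart`. Base64 inflates the payload by about a third, which is one reason a per-file size cap is useful.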

License

MIT
