mcp-plugin-vision

v0.1.0 · Published 10 days ago

MCP server that bridges text-only models with vision capabilities — image recognition and web page reading via any OpenAI-compatible vision model.


loyalyang


MCP server for Claude Code / Claude Desktop that bridges text-only models with vision capabilities:

Image recognition: read an image from a local path, the clipboard, a Claude upload, a URL, or base64 data, then send it to an OpenAI-compatible vision model
Web page reading: fetch web links, extract readable text, then ask the model to summarize it or answer questions about it

Useful with models such as DeepSeek V4 that don't natively support image input.

No model included. You bring your own OpenAI-compatible API endpoint and key.

Quick Start (npx, recommended)

Add the following to the mcpServers section of your Claude Code .claude.json or your Claude Desktop config:

{
  "mcpServers": {
    "vision-web-bridge": {
      "command": "npx",
      "args": ["-y", "mcp-plugin-vision"],
      "env": {
        "MODEL_BASE_URL": "https://api.openai.com/v1",
        "MODEL_API_KEY": "your-api-key",
        "MODEL_NAME": "gpt-4o",
        "ALLOW_LOCAL_IMAGE_PATHS": "true",
        "ALLOW_CLIPBOARD_IMAGES": "true"
      }
    }
  }
}

Restart Claude Code, then check /mcp — vision-web-bridge should show ✔ connected.

Provider Examples

Xiaomi MiMo:

"env": {
  "MODEL_BASE_URL": "https://api.xiaomimimo.com/v1",
  "MODEL_API_KEY": "sk-...",
  "MODEL_NAME": "mimo-v2-omni"
}

SiliconFlow:

"env": {
  "MODEL_BASE_URL": "https://api.siliconflow.cn/v1",
  "MODEL_API_KEY": "sk-...",
  "MODEL_NAME": "Qwen/Qwen3-VL-8B-Instruct"
}

OpenAI:

"env": {
  "MODEL_BASE_URL": "https://api.openai.com/v1",
  "MODEL_API_KEY": "sk-...",
  "MODEL_NAME": "gpt-4o"
}

Gemini (via OpenAI-compatible layer):

"env": {
  "MODEL_BASE_URL": "https://generativelanguage.googleapis.com/v1beta/openai",
  "MODEL_API_KEY": "AIza...",
  "MODEL_NAME": "gemini-2.0-flash"
}

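Any OpenAI-compatible endpoint that serves a vision-capable model works. The base URL is usually the provider's /v1 root, though some providers (such as Gemini above) use a different path.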
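As background, OpenAI-compatible vision models generally take an image as an `image_url` content part in a Chat Completions request. The following is a minimal sketch, assuming the standard `/chat/completions` route under `MODEL_BASE_URL`, of how such a request could be assembled. `buildVisionRequest` is a hypothetical name, not this package's API, and the sketch only builds the request without sending it.

```javascript
// Hypothetical helper: builds (but does not send) an OpenAI-compatible
// Chat Completions request that pairs a text prompt with one image.
function buildVisionRequest({ baseUrl, apiKey, model }, prompt, imageUrl) {
  // Trim trailing slashes so ".../v1/" and ".../v1" both work.
  const url = `${baseUrl.replace(/\/+$/, "")}/chat/completions`;
  const body = {
    model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          // imageUrl may be a public https URL or a data: URL carrying base64 bytes.
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
  };
  return {
    url,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify(body),
    },
  };
}

const req = buildVisionRequest(
  { baseUrl: "https://api.openai.com/v1/", apiKey: "sk-...", model: "gpt-4o" },
  "Describe this image.",
  "https://example.com/cat.png",
);
console.log(req.url); // https://api.openai.com/v1/chat/completions
```

To send it, pass `req.url` and `req.init` to the global `fetch()` in Node.js; an OpenAI-compatible reply carries the text in `choices[0].message.content`.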

Alternative: Local Install

If you prefer to run from a local checkout:

git clone https://github.com/dangpolly927-eng/mcp-plugin-vision.git
cd mcp-plugin-vision
npm install

Then point the config at the local server script:

{
  "mcpServers": {
    "vision-web-bridge": {
      "command": "node",
      "args": ["D:\\path\\to\\mcp-plugin-vision\\src\\server.mjs"],
      "env": {
        "MODEL_BASE_URL": "https://api.example.com/v1",
        "MODEL_API_KEY": "your-api-key",
        "MODEL_NAME": "your-vision-model"
      }
    }
  }
}

Instead of inline env vars, you can keep the variables in a .env file and load it with Node's --env-file-if-exists flag. See .env.example.

Requirements

Node.js >= 20

Capabilities

| Capability | macOS | Windows | Linux |
| --- | --- | --- | --- |
| MCP server | Supported | Supported | Supported |
| Local image path | Supported | Supported | Supported |
| Clipboard image | Supported | PowerShell / WinForms | wl-paste / xclip |
| Claude upload image | Supported | Best effort | Best effort |
| Web page reading | Supported | Supported | Supported |
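The clipboard row depends on external tools. A hedged sketch of how a server might pick the command per platform follows. The function `clipboardImageCommand`, the Wayland/X11 choice via `WAYLAND_DISPLAY`, and the exact PowerShell script are assumptions for illustration; macOS is deliberately left out because the table does not say which mechanism is used there.

```javascript
// Hypothetical sketch: choose a clipboard-reading command per platform.
// Linux: wl-paste on Wayland, xclip on X11. Windows: PowerShell + Windows Forms.
// macOS is not covered by this sketch.
function clipboardImageCommand(platform, env = {}) {
  if (platform === "linux") {
    return env.WAYLAND_DISPLAY
      ? { cmd: "wl-paste", args: ["--type", "image/png"] }
      : { cmd: "xclip", args: ["-selection", "clipboard", "-t", "image/png", "-o"] };
  }
  if (platform === "win32") {
    // Loads Windows Forms, grabs the clipboard image, and prints it as base64 PNG.
    const script = [
      "Add-Type -AssemblyName System.Windows.Forms",
      "Add-Type -AssemblyName System.Drawing",
      "$img = [System.Windows.Forms.Clipboard]::GetImage()",
      "if ($img -eq $null) { exit 1 }",
      "$ms = New-Object System.IO.MemoryStream",
      "$img.Save($ms, [System.Drawing.Imaging.ImageFormat]::Png)",
      "[Convert]::ToBase64String($ms.ToArray())",
    ].join("; ");
    return { cmd: "powershell", args: ["-NoProfile", "-STA", "-Command", script] };
  }
  return null; // Platform not handled by this sketch.
}
```

The returned `cmd` and `args` could be passed to `execFile` from `node:child_process` (with `encoding: "buffer"`); on Linux the raw PNG bytes arrive on stdout, while the PowerShell variant prints base64 text.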

Security Defaults

All security-sensitive features are opt-in (disabled by default):

| Feature | Default |
| --- | --- |
| ALLOW_LOCAL_IMAGE_PATHS | false |
| ALLOW_CLIPBOARD_IMAGES | false |
| ALLOW_PRIVATE_NETWORK_URLS | false |
| USE_JINA_READER | false |

To enable a feature, set its variable to "true" in the env block.
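How strictly these values are parsed is not documented. A minimal sketch, assuming only the string "true" (case-insensitive, surrounding whitespace ignored) enables a flag, so that typos and values like "1" fail closed. `readOptInFlags` is a hypothetical helper.

```javascript
// Hypothetical sketch of opt-in flag parsing: a feature is enabled only when
// its variable is "true" (ignoring case and surrounding whitespace).
// Anything else, including unset, "1", or "yes", leaves it disabled.
const OPT_IN_FLAGS = [
  "ALLOW_LOCAL_IMAGE_PATHS",
  "ALLOW_CLIPBOARD_IMAGES",
  "ALLOW_PRIVATE_NETWORK_URLS",
  "USE_JINA_READER",
];

function readOptInFlags(env) {
  const flags = {};
  for (const name of OPT_IN_FLAGS) {
    flags[name] = String(env[name] ?? "").trim().toLowerCase() === "true";
  }
  return flags;
}

const flags = readOptInFlags({ ALLOW_LOCAL_IMAGE_PATHS: "true", USE_JINA_READER: "1" });
console.log(flags.ALLOW_LOCAL_IMAGE_PATHS, flags.USE_JINA_READER); // true false
```

Failing closed keeps the documented defaults safe even when a value is misspelled.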

Tools

read_image_with_model: reads an image from a local path, the clipboard, a URL, base64 data, or the latest Claude upload
read_links_with_model: fetches web page content and summarizes it

Usage

Read the latest image uploaded to the Claude client:

Use read_image_with_model with use_latest_upload=true.

Read the current clipboard image:

Use read_image_with_model with use_clipboard=true and use_latest_upload=false.

Read a local image path after enabling ALLOW_LOCAL_IMAGE_PATHS=true:

Use read_image_with_model with image_path="/absolute/path/to/image.png".

Read web links:

Use read_links_with_model to summarize https://example.com/article
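In practice you phrase these requests in natural language and the MCP client issues the tool call. For debugging, the clipboard example corresponds roughly to the JSON-RPC 2.0 `tools/call` message below. The argument names come from the usage lines above, but the tool's full input schema is not documented here, so treat anything beyond them as unknown; `toolCallMessage` is a hypothetical helper.

```javascript
// Sketch: the JSON-RPC 2.0 "tools/call" message an MCP client sends when the
// model invokes a tool. Argument names mirror the usage examples.
function toolCallMessage(id, name, args) {
  return JSON.stringify({
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  });
}

// "Read the current clipboard image"
const msg = toolCallMessage(1, "read_image_with_model", {
  use_clipboard: true,
  use_latest_upload: false,
});
console.log(msg);
```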

Tool Details

`read_image_with_model`

Supported image sources:

latest Claude upload;
public image URL;
base64 image;
data URL;
local image path, opt-in only;
clipboard image, opt-in only.

The tool returns the model's response and a non-sensitive source label such as "latest uploaded image", "clipboard image", or "local image path".
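For the `image_url` content part, local files, clipboard bytes, and base64 input typically end up as a data URL. A minimal sketch under stated assumptions: `toDataUrl` and `sniffImageMime` are hypothetical names, the accepted formats (PNG, JPEG, GIF, WebP) are a guess at what vision models commonly accept, and the 10485760-byte limit is the documented `MAX_IMAGE_BYTES` default. Base64 input would first be decoded with `Buffer.from(str, "base64")`.

```javascript
// Hypothetical sketch: turn raw image bytes into a data URL, enforcing a size
// limit (MAX_IMAGE_BYTES defaults to 10485760, i.e. 10 MiB) and sniffing the
// MIME type from the file's magic bytes.
const MAX_IMAGE_BYTES = 10 * 1024 * 1024;

function sniffImageMime(buf) {
  const starts = (...bytes) => bytes.every((b, i) => buf[i] === b);
  if (starts(0x89, 0x50, 0x4e, 0x47)) return "image/png";
  if (starts(0xff, 0xd8, 0xff)) return "image/jpeg";
  if (starts(0x47, 0x49, 0x46, 0x38)) return "image/gif"; // "GIF8"
  // WebP: "RIFF" at offset 0 and "WEBP" at offset 8.
  if (starts(0x52, 0x49, 0x46, 0x46) && buf.toString("ascii", 8, 12) === "WEBP") {
    return "image/webp";
  }
  return null;
}

function toDataUrl(buf, maxBytes = MAX_IMAGE_BYTES) {
  if (buf.length > maxBytes) {
    throw new Error(`image is ${buf.length} bytes; limit is ${maxBytes}`);
  }
  const mime = sniffImageMime(buf);
  if (!mime) throw new Error("unrecognized image format");
  return `data:${mime};base64,${buf.toString("base64")}`;
}
```

Sniffing bytes rather than trusting a file extension also works for clipboard and base64 sources, which have no file name.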

`read_links_with_model`

The tool extracts URLs from the user input, fetches readable page content locally, and asks the configured model to summarize or answer questions.
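The URL-extraction step might look like the following sketch. `extractUrls` is hypothetical, and the regex plus trailing-punctuation trimming is one simple heuristic, not necessarily what the package does (for example, it would clip a URL that legitimately ends in `)`).

```javascript
// Hypothetical sketch: pull http(s) URLs out of free-form user input,
// trimming trailing punctuation and dropping duplicates.
function extractUrls(text) {
  const matches = text.match(/https?:\/\/[^\s<>"'`]+/g) ?? [];
  const urls = [];
  for (const raw of matches) {
    const trimmed = raw.replace(/[.,;:!?)\]]+$/, "");
    try {
      const href = new URL(trimmed).href; // validates and normalizes
      if (!urls.includes(href)) urls.push(href);
    } catch {
      // Not a valid URL after trimming; skip it.
    }
  }
  return urls;
}

console.log(extractUrls("Summarize https://example.com/article, and compare with https://example.com/article."));
// [ 'https://example.com/article' ]
```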

Private-network URLs are blocked by default. An optional Jina Reader fallback can be enabled with USE_JINA_READER=true; note that this sends the URL to the third-party Jina Reader service.
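A minimal sketch of a private-network check, assuming the block covers literal IPv4 hosts in the RFC 1918, loopback, and link-local ranges plus `localhost`. `isPrivateNetworkUrl` is hypothetical, and this literal-only check is weaker than a real guard, which should also resolve hostnames, re-check the resolved addresses, and cover IPv6 before fetching.

```javascript
// Hypothetical sketch: flag URLs whose host is "localhost" or a loopback,
// private (RFC 1918), or link-local IPv4 literal. Checks the literal host
// only; DNS names pointing at private addresses are NOT caught here.
function isPrivateNetworkUrl(rawUrl) {
  // The WHATWG URL parser normalizes forms like "0x7f.1" to "127.0.0.1".
  const host = new URL(rawUrl).hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return true;
  const m = host.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (!m) return host === "[::1]"; // IPv6 loopback; other IPv6 not handled
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 10 ||                          // 10.0.0.0/8
    a === 127 ||                         // 127.0.0.0/8 loopback
    a === 0 ||                           // 0.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) ||          // 192.168.0.0/16
    (a === 169 && b === 254)             // 169.254.0.0/16 link-local
  );
}
```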

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| MODEL_BASE_URL | https://api.example.com/v1 | OpenAI-compatible /v1 endpoint |
| OPENAI_BASE_URL | unset | Fallback base URL if MODEL_BASE_URL is not set |
| MODEL_API_KEY | unset | API key for the model provider |
| MODEL_NAME | replace-with-your-vision-model | Chat or vision model name |
| CLAUDE_UPLOAD_DIRS | client-specific defaults | Override upload directories |
| CLAUDE_UPLOAD_DIRS_DELIMITER | platform default | Directory list delimiter |
| ALLOW_LOCAL_IMAGE_PATHS | false | Allow explicit local image paths |
| ALLOW_CLIPBOARD_IMAGES | false | Allow reading image data from clipboard |
| ALLOW_PRIVATE_NETWORK_URLS | false | Allow private-network web and image URLs |
| USE_JINA_READER | false | Allow Jina Reader fallback |
| MAX_IMAGE_BYTES | 10485760 | Maximum image size in bytes |
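A sketch of how these variables could be resolved into one config object, assuming empty strings count as unset and an invalid `MAX_IMAGE_BYTES` falls back to the default. `loadConfig` is hypothetical, and using `;` on Windows and `:` elsewhere for the upload-directory delimiter is an assumption about what "platform default" means (it matches Node's `path.delimiter`).

```javascript
// Hypothetical sketch of resolving the documented environment variables.
// Defaults follow the table; parsing details are assumptions.
function loadConfig(env, platform = process.platform) {
  const delimiter = env.CLAUDE_UPLOAD_DIRS_DELIMITER || (platform === "win32" ? ";" : ":");
  const maxBytes = Number.parseInt(env.MAX_IMAGE_BYTES ?? "", 10);
  return {
    // MODEL_BASE_URL wins; OPENAI_BASE_URL is only a fallback.
    baseUrl: env.MODEL_BASE_URL || env.OPENAI_BASE_URL || "https://api.example.com/v1",
    apiKey: env.MODEL_API_KEY, // no default: must be supplied by the user
    model: env.MODEL_NAME || "replace-with-your-vision-model",
    // null means "use the client-specific default directories".
    uploadDirs: env.CLAUDE_UPLOAD_DIRS
      ? env.CLAUDE_UPLOAD_DIRS.split(delimiter).map((d) => d.trim()).filter(Boolean)
      : null,
    maxImageBytes: Number.isInteger(maxBytes) && maxBytes > 0 ? maxBytes : 10485760,
  };
}
```

Using `;` on Windows avoids splitting drive letters such as `C:\`.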

Development

npm test
npm run check:secrets

Before publishing, run:

npm pack --dry-run

Check the file list carefully: it must not include .env files, logs, images, local screenshots, or personal paths.
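Part of that review can be automated. A hedged sketch: `findForbiddenFiles` is hypothetical, the patterns are illustrative (this version lets `.env.example` through, on the assumption that it is safe to ship), and it cannot detect personal paths embedded inside file contents. The input could be the `path` entries of the `files` array printed by `npm pack --dry-run --json`.

```javascript
// Hypothetical pre-publish check: flag package paths that look like env
// files, logs, or images. Patterns are illustrative; extend as needed.
const FORBIDDEN = [
  // ".env" or ".env.<anything>", except ".env.example".
  { re: /(^|\/)\.env(\.(?!example$)[^/]+)?$/, why: "env file" },
  { re: /\.log$/i, why: "log file" },
  { re: /\.(png|jpe?g|gif|webp|bmp)$/i, why: "image" },
];

function findForbiddenFiles(paths) {
  const hits = [];
  for (const p of paths) {
    const normalized = p.replace(/\\/g, "/"); // tolerate Windows separators
    for (const { re, why } of FORBIDDEN) {
      if (re.test(normalized)) {
        hits.push(`${p} (${why})`);
        break;
      }
    }
  }
  return hits;
}

console.log(findForbiddenFiles(["src/server.mjs", ".env", "debug.log", "shots/screen.png", ".env.example"]));
```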

Windows Notes

Use full absolute paths in claude_desktop_config.json.
Save JSON config as UTF-8 without BOM.
Restart Claude from the system tray after editing config.
Clipboard image reading uses PowerShell / Windows Forms.
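If the client fails to read the config, a leading byte order mark is one thing to rule out. A minimal sketch that detects the UTF-8 BOM (bytes EF BB BF); `parseJsonConfig` is a hypothetical helper, and it strips the BOM only so the check can still report the parsed content, since `JSON.parse` rejects a leading U+FEFF.

```javascript
// Sketch: detect a UTF-8 byte order mark (EF BB BF) at the start of a config
// file and skip it before JSON.parse. Buffer#toString does not strip it.
function parseJsonConfig(buf) {
  const hasBom = buf.length >= 3 && buf[0] === 0xef && buf[1] === 0xbb && buf[2] === 0xbf;
  const text = buf.toString("utf8", hasBom ? 3 : 0);
  return { hasBom, config: JSON.parse(text) };
}
```

For example, call `parseJsonConfig(readFileSync(configPath))` with `readFileSync` from `node:fs`; if `hasBom` is true, re-save the file as UTF-8 without BOM.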

License

MIT
