mcp-plugin-vision
v0.1.0
MCP server that bridges text-only models with vision capabilities — image recognition and web page reading via any OpenAI-compatible vision model.
MCP server for Claude Code / Claude Desktop that bridges text-only models with vision capabilities:
- Image recognition — read images from local path, clipboard, Claude upload, URL, or base64, then send to an OpenAI-compatible vision model
- Web page reading — fetch web links, extract readable text, then ask the model to summarize or answer questions
Useful when your main model, such as DeepSeek V4, does not natively accept image input.
No model included. You bring your own OpenAI-compatible API endpoint and key.
Quick Start (npx, recommended)
Add the following to your Claude Code .claude.json, or to the mcpServers section of your Claude Desktop config:
{
  "mcpServers": {
    "vision-web-bridge": {
      "command": "npx",
      "args": ["-y", "mcp-plugin-vision"],
      "env": {
        "MODEL_BASE_URL": "https://api.openai.com/v1",
        "MODEL_API_KEY": "your-api-key",
        "MODEL_NAME": "gpt-4o",
        "ALLOW_LOCAL_IMAGE_PATHS": "true",
        "ALLOW_CLIPBOARD_IMAGES": "true"
      }
    }
  }
}
Restart Claude Code, then check /mcp: vision-web-bridge should show ✔ connected.
Provider Examples
Xiaomi MiMo:
"env": {
  "MODEL_BASE_URL": "https://api.xiaomimimo.com/v1",
  "MODEL_API_KEY": "sk-...",
  "MODEL_NAME": "mimo-v2-omni"
}
SiliconFlow:
"env": {
  "MODEL_BASE_URL": "https://api.siliconflow.cn/v1",
  "MODEL_API_KEY": "sk-...",
  "MODEL_NAME": "Qwen/Qwen3-VL-8B-Instruct"
}
OpenAI:
"env": {
  "MODEL_BASE_URL": "https://api.openai.com/v1",
  "MODEL_API_KEY": "sk-...",
  "MODEL_NAME": "gpt-4o"
}
Gemini (via OpenAI-compatible layer):
"env": {
  "MODEL_BASE_URL": "https://generativelanguage.googleapis.com/v1beta/openai",
  "MODEL_API_KEY": "AIza...",
  "MODEL_NAME": "gemini-2.0-flash"
}
Any OpenAI-compatible /v1 endpoint works.
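For orientation, here is a minimal sketch of the kind of request an OpenAI-compatible vision endpoint accepts. It is not taken from this package's source: buildVisionRequest and its parameter names are hypothetical, and only the /chat/completions path and the image_url message part follow the OpenAI chat completions format.

```javascript
// Sketch only: builds a chat completions request that carries one image.
// buildVisionRequest is a hypothetical helper, not part of mcp-plugin-vision.
function buildVisionRequest({ baseUrl, apiKey, model, prompt, imageUrl }) {
  // Trim trailing slashes so "https://host/v1/" and "https://host/v1" behave the same.
  const url = `${baseUrl.replace(/\/+$/, "")}/chat/completions`;
  return {
    url,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model,
        messages: [
          {
            role: "user",
            content: [
              { type: "text", text: prompt },
              // imageUrl may be an https URL or a data URL (data:image/png;base64,...).
              { type: "image_url", image_url: { url: imageUrl } },
            ],
          },
        ],
      }),
    },
  };
}

const request = buildVisionRequest({
  baseUrl: "https://api.openai.com/v1/",
  apiKey: "your-api-key",
  model: "gpt-4o",
  prompt: "Describe this image.",
  imageUrl: "https://example.com/cat.png",
});

// To send it (Node.js >= 20 ships a global fetch):
// const response = await fetch(request.url, request.init);
```

If a provider rejects this shape, it is probably not compatible enough to be used as MODEL_BASE_URL.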
Alternative: Local Install
If you prefer to run from a local checkout:
git clone https://github.com/dangpolly927-eng/mcp-plugin-vision.git
cd mcp-plugin-vision
npm install
Then configure with local path:
{
  "mcpServers": {
    "vision-web-bridge": {
      "command": "node",
      "args": ["D:\\path\\to\\mcp-plugin-vision\\src\\server.mjs"],
      "env": {
        "MODEL_BASE_URL": "https://api.example.com/v1",
        "MODEL_API_KEY": "your-api-key",
        "MODEL_NAME": "your-vision-model"
      }
    }
  }
}
You can also use a .env file with --env-file-if-exists instead of inline env vars. See .env.example.
Requirements
- Node.js >= 20
Capabilities
| Capability | macOS | Windows | Linux |
| --- | --- | --- | --- |
| MCP server | Supported | Supported | Supported |
| Local image path | Supported | Supported | Supported |
| Clipboard image | Supported | PowerShell / WinForms | wl-paste / xclip |
| Claude upload image | Supported | Best effort | Best effort |
| Web page reading | Supported | Supported | Supported |
Security Defaults
All dangerous features are opt-in (disabled by default):
| Feature | Default |
| --- | --- |
| ALLOW_LOCAL_IMAGE_PATHS | false |
| ALLOW_CLIPBOARD_IMAGES | false |
| ALLOW_PRIVATE_NETWORK_URLS | false |
| USE_JINA_READER | false |
Set a variable to "true" in the env block to enable the corresponding feature.
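An opt-in flag of this kind can be read as in the sketch below. The helper name isEnabled is hypothetical, and treating only the string "true" (ignoring case and surrounding whitespace) as enabled is an assumption; the package's actual parsing may be stricter or looser.

```javascript
// Hypothetical helper: a feature counts as enabled only when its variable is "true".
// Anything else ("1", "yes", unset) leaves the feature off, which keeps the default safe.
function isEnabled(env, name) {
  return String(env[name] ?? "").trim().toLowerCase() === "true";
}

const exampleEnv = {
  ALLOW_LOCAL_IMAGE_PATHS: "true",
  ALLOW_CLIPBOARD_IMAGES: "1", // not "true", so this stays disabled
};

const flags = {
  localPaths: isEnabled(exampleEnv, "ALLOW_LOCAL_IMAGE_PATHS"),
  clipboard: isEnabled(exampleEnv, "ALLOW_CLIPBOARD_IMAGES"),
  jinaReader: isEnabled(exampleEnv, "USE_JINA_READER"),
};
```

Under these assumptions only localPaths ends up enabled; pass process.env instead of exampleEnv to check a real environment.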
Tools
- read_image_with_model: Read image from local path, clipboard, URL, base64, or latest Claude upload
- read_links_with_model: Fetch and summarize web page content
Usage
Read the latest image uploaded to the Claude client:
Use read_image_with_model with use_latest_upload=true.
Read the current clipboard image:
Use read_image_with_model with use_clipboard=true and use_latest_upload=false.
Read a local image path after enabling ALLOW_LOCAL_IMAGE_PATHS=true:
Use read_image_with_model with image_path="/absolute/path/to/image.png".
Read web links:
Use read_links_with_model to summarize https://example.com/article
Tool Details
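Prompts like the ones in the Usage section are turned by the MCP client into tools/call requests. The sketch below shows roughly what those messages look like; buildToolCall is a hypothetical helper, the envelope follows the MCP JSON-RPC 2.0 format, and the tool and argument names are the ones this README mentions. The argument name for read_links_with_model is not documented here, so it is left out.

```javascript
// Sketch of the JSON-RPC 2.0 message an MCP client sends to invoke a tool.
// buildToolCall is a hypothetical helper written for illustration.
function buildToolCall(id, name, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Clipboard image: explicitly turn off the latest-upload source.
const clipboardCall = buildToolCall(1, "read_image_with_model", {
  use_clipboard: true,
  use_latest_upload: false,
});

// Local file: requires ALLOW_LOCAL_IMAGE_PATHS=true on the server side.
const localPathCall = buildToolCall(2, "read_image_with_model", {
  image_path: "/absolute/path/to/image.png",
});
```

In normal use you never write these messages yourself; the model chooses the tool and arguments from your prompt.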
read_image_with_model
Supported image sources:
- latest Claude upload;
- public image URL;
- base64 image;
- data URL;
- local image path, opt-in only;
- clipboard image, opt-in only.
The tool returns the model response and a non-sensitive source label such as latest uploaded image, clipboard image, or local image path.
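As an illustration of how a base64 source could be prepared for the model, the sketch below converts raw base64 into a data URL and enforces a size limit first. The helper toDataUrl and its defaults are hypothetical; the 10485760-byte limit is the documented MAX_IMAGE_BYTES default, and whether the package normalizes input the same way is an assumption.

```javascript
// Hypothetical helper: turns raw base64 into a data URL after a size check.
const DEFAULT_MAX_IMAGE_BYTES = 10485760; // documented default for MAX_IMAGE_BYTES

function toDataUrl(base64, mimeType = "image/png", maxBytes = DEFAULT_MAX_IMAGE_BYTES) {
  // Decode first so the limit applies to real image bytes, not to the base64 text.
  const bytes = Buffer.from(base64, "base64");
  if (bytes.length === 0) {
    throw new Error("Image data is empty or not valid base64");
  }
  if (bytes.length > maxBytes) {
    throw new Error(`Image is ${bytes.length} bytes; limit is ${maxBytes}`);
  }
  // Re-encode so stray whitespace in the input does not leak into the URL.
  return `data:${mimeType};base64,${bytes.toString("base64")}`;
}

const dataUrl = toDataUrl("aGVsbG8=");
```

A data URL built this way can be passed wherever an OpenAI-compatible API expects an image URL.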
read_links_with_model
The tool extracts URLs from the user input, fetches readable page content locally, and asks the configured model to summarize or answer questions.
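The URL extraction step can be approximated as follows. This is a sketch under assumptions, not the package's implementation: extractUrls is a hypothetical name, and the regular expression and the trailing-punctuation rule are illustrative choices.

```javascript
// Hypothetical helper: pulls unique http(s) URLs out of free-form user input.
function extractUrls(text) {
  const matches = text.match(/https?:\/\/[^\s<>"')\]]+/g) ?? [];
  const seen = new Set();
  const urls = [];
  for (const raw of matches) {
    // Drop punctuation that usually belongs to the sentence, not the URL.
    const candidate = raw.replace(/[.,;:!?]+$/, "");
    try {
      const href = new URL(candidate).href;
      if (!seen.has(href)) {
        seen.add(href);
        urls.push(href);
      }
    } catch {
      // Not a parseable URL; skip it.
    }
  }
  return urls;
}

const found = extractUrls(
  "Summarize https://example.com/article, then compare it with (https://example.org/a?b=1)."
);
```

Here found holds the two URLs in the order they appear, with the comma and closing parenthesis stripped.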
Private-network URLs are blocked by default. Optional Jina Reader fallback can be enabled with USE_JINA_READER=true, which sends the URL to Jina Reader.
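To make the private-network rule concrete, here is a minimal sketch of such a check. The function names are hypothetical and the list of ranges is an assumption about what "private network" covers; it only inspects literal hosts, so a production guard would also need to resolve DNS names and re-check redirects.

```javascript
// Hypothetical check for literal private, loopback, and link-local hosts.
// It does not resolve DNS or handle IPv4-mapped IPv6 addresses.
function isPrivateHost(hostname) {
  // URL.hostname keeps the brackets around IPv6 literals; strip them.
  const host = hostname.toLowerCase().replace(/^\[|\]$/g, "");
  if (host === "localhost" || host.endsWith(".localhost")) return true;

  const v4 = host.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (v4) {
    const [a, b] = v4.slice(1).map(Number);
    return (
      a === 0 || a === 10 || a === 127 ||
      (a === 172 && b >= 16 && b <= 31) ||
      (a === 192 && b === 168) ||
      (a === 169 && b === 254)
    );
  }

  if (host.includes(":")) {
    // IPv6 loopback, unique-local (fc00::/7), and link-local (fe80::/10).
    return host === "::1" || /^f[cd]/.test(host) || /^fe[89ab]/.test(host);
  }
  return false;
}

// allowPrivate stands in for ALLOW_PRIVATE_NETWORK_URLS=true.
function isUrlAllowed(rawUrl, allowPrivate = false) {
  const url = new URL(rawUrl);
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  return allowPrivate || !isPrivateHost(url.hostname);
}

const publicOk = isUrlAllowed("https://example.com/article");
const lanBlocked = isUrlAllowed("http://192.168.1.10/admin");
```

With the default settings publicOk is true and lanBlocked is false; passing true as the second argument lets the LAN address through.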
Environment Variables
| Variable | Default | Description |
| --- | --- | --- |
| MODEL_BASE_URL | https://api.example.com/v1 | OpenAI-compatible /v1 endpoint |
| OPENAI_BASE_URL | unset | Fallback base URL if MODEL_BASE_URL is not set |
| MODEL_API_KEY | unset | API key for the model provider |
| MODEL_NAME | replace-with-your-vision-model | Chat or vision model name |
| CLAUDE_UPLOAD_DIRS | client-specific defaults | Override upload directories |
| CLAUDE_UPLOAD_DIRS_DELIMITER | platform default | Directory list delimiter |
| ALLOW_LOCAL_IMAGE_PATHS | false | Allow explicit local image paths |
| ALLOW_CLIPBOARD_IMAGES | false | Allow reading image data from clipboard |
| ALLOW_PRIVATE_NETWORK_URLS | false | Allow private-network web and image URLs |
| USE_JINA_READER | false | Allow Jina Reader fallback |
| MAX_IMAGE_BYTES | 10485760 | Maximum image size in bytes |
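The defaults and the fallback order in the table can be summarized in code. This is a sketch only: loadConfig and configProblems are hypothetical names, and the package's real loader may validate values differently.

```javascript
// Hypothetical loader mirroring the table above.
function loadConfig(env) {
  const parsedMax = Number.parseInt(env.MAX_IMAGE_BYTES ?? "", 10);
  return {
    // MODEL_BASE_URL wins; OPENAI_BASE_URL is only a fallback.
    baseUrl: env.MODEL_BASE_URL || env.OPENAI_BASE_URL || "https://api.example.com/v1",
    apiKey: env.MODEL_API_KEY ?? null,
    model: env.MODEL_NAME || "replace-with-your-vision-model",
    // Fall back to the documented default when the value is missing or invalid.
    maxImageBytes: Number.isFinite(parsedMax) && parsedMax > 0 ? parsedMax : 10485760,
  };
}

// Lists placeholder values that would make requests fail.
function configProblems(config) {
  const problems = [];
  if (!config.apiKey) problems.push("MODEL_API_KEY is not set");
  if (config.baseUrl === "https://api.example.com/v1") {
    problems.push("MODEL_BASE_URL still points at the placeholder endpoint");
  }
  if (config.model === "replace-with-your-vision-model") {
    problems.push("MODEL_NAME still holds the placeholder value");
  }
  return problems;
}

const fallbackConfig = loadConfig({ OPENAI_BASE_URL: "https://api.openai.com/v1" });
const emptyProblems = configProblems(loadConfig({}));
```

Calling loadConfig(process.env) at startup and printing configProblems is one way to catch a config that still contains the placeholder defaults.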
Development
npm test
npm run check:secrets
Before publishing, run:
npm pack --dry-run
Check the file list carefully. .env, logs, images, local screenshots, and personal paths must not be included.
Windows Notes
- Use full absolute paths in claude_desktop_config.json.
- Save JSON config as UTF-8 without BOM.
- Restart Claude from the system tray after editing config.
- Clipboard image reading uses PowerShell / Windows Forms.
License
MIT
