0eye-vision-mcp
v1.0.3
Give any text-only LLM the power of vision instantly
0eye-vision-MCP 👁️
Give any text-only LLM the power of vision — instantly.
Most powerful LLMs are blind. They can reason, write, and code — but show them an image and they're lost. 0eye-vision-MCP is an MCP server that bridges that gap. Drop an image path, get back a rich natural language description, and feed it into any text model you're building with.
How it works
- You pass an image file path + a prompt to the tool
- The server reads the image and encodes it to base64
- It sends the encoding to OpenRouter's vision API (powered by Gemini, GPT-4V, etc.)
- OpenRouter returns a detailed description
- Your text-only LLM now "sees" the image through that description
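The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual source: the helper names (`toDataUrl`, `buildVisionRequest`, `describeImage`) and the extension-to-MIME table are hypothetical, and the request body assumes OpenRouter's OpenAI-compatible chat completions format with an `image_url` content part. It relies on the global `fetch` available in Node.js 18+.

```typescript
import { readFile } from "node:fs/promises";
import { extname } from "node:path";

// Assumed mapping from file extension to MIME type; extend as needed.
const MIME_TYPES: Record<string, string> = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".webp": "image/webp",
  ".gif": "image/gif",
};

// Encode raw image bytes as a base64 data URL.
function toDataUrl(bytes: Uint8Array, filePath: string): string {
  const mime = MIME_TYPES[extname(filePath).toLowerCase()] ?? "application/octet-stream";
  return `data:${mime};base64,${Buffer.from(bytes).toString("base64")}`;
}

// Build an OpenAI-style chat completion body with one text part and one image part.
function buildVisionRequest(model: string, prompt: string, dataUrl: string) {
  return {
    model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: dataUrl } },
        ],
      },
    ],
  };
}

// Hypothetical end-to-end helper: read, encode, send, return the description.
async function describeImage(imageFile: string, prompt: string): Promise<string> {
  const apiKey = process.env.OPENROUTER_API_KEY;
  if (!apiKey) throw new Error("OPENROUTER_API_KEY is not set");
  const model = process.env.OPENROUTER_MODEL ?? "google/gemini-2.0-flash-lite:free";

  const dataUrl = toDataUrl(await readFile(imageFile), imageFile);
  const response = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(buildVisionRequest(model, prompt, dataUrl)),
  });
  if (!response.ok) throw new Error(`OpenRouter request failed: ${response.status}`);

  const data = (await response.json()) as { choices: { message: { content: string } }[] };
  return data.choices[0]?.message.content ?? "";
}
```

Keeping the encoding and request-building steps as pure functions makes them easy to test without a network call; only `describeImage` touches the file system and the API.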
Your App → MCP Tool → base64 encoder → OpenRouter Vision API → description → Your LLM
Use Cases
🤖 Augment text-only models: Running GPT-3.5, LLaMA 3, Mistral, Mixtral, Phi-2, DeepSeek, or BLOOM? None of these understand images natively. Use 0eye-vision-MCP to give them eyes.
🧪 Rapid prototyping: Testing a custom LLM pipeline and need vision capability without retraining? Drop this MCP server in and get vision in minutes.
🖼️ Automated image analysis pipelines: Point it at screenshots, product photos, diagrams, or documents and get descriptions you can feed downstream.
🔍 Accessibility tooling: Build tools that describe images for visually impaired users, powered by any LLM of your choice.
📊 Document understanding: Feed in screenshots of charts, tables, or dashboards and let your LLM reason about the visual data.
Supported text-only models (examples)
- GPT-3.5 Turbo / GPT-3 (Davinci)
- LLaMA 2 / LLaMA 3 (base)
- Mistral 7B / Mixtral 8x7B
- BLOOM, Phi-2, Gemma (text-only)
- DeepSeek LLM
Prerequisites
- Node.js 18+
- An OpenRouter API key with access to a vision model
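A startup check for these prerequisites might look like the sketch below. `checkPrerequisites` is a hypothetical helper, not part of the package; it only verifies the Node.js major version and that the key variable is present. Whether the key actually has access to a vision model can only be confirmed by making a request.

```typescript
// Return a list of human-readable problems; an empty list means all checks passed.
function checkPrerequisites(
  nodeVersion: string,
  env: Record<string, string | undefined>,
): string[] {
  const problems: string[] = [];

  // process.version looks like "v18.17.0"; take the major component.
  const major = Number.parseInt(nodeVersion.replace(/^v/, "").split(".")[0] ?? "", 10);
  if (Number.isNaN(major) || major < 18) {
    problems.push(`Node.js 18+ is required, found ${nodeVersion}`);
  }

  if (!env.OPENROUTER_API_KEY) {
    problems.push("OPENROUTER_API_KEY is not set");
  }
  return problems;
}

// Report any problems for the current process.
for (const problem of checkPrerequisites(process.version, process.env)) {
  console.error(problem);
}
```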
Setup
git clone https://github.com/swarn007-byte/0eye-vision-MCP.git
cd 0eye-vision-MCP
npm install
Create your .env file:
cp .env.example .env
Edit .env:
OPENROUTER_API_KEY=sk-or-v1-...
OPENROUTER_MODEL=google/gemini-2.0-flash-lite:free
Build:
npm run build
MCP Client Config
Add this to your MCP host config (Claude Desktop, OpenCode, etc.):
{
  "mcpServers": {
    "0eye-vision": {
      "command": "node",
      "args": ["/absolute/path/to/0eye-vision-MCP/dist/index.js"],
      "env": {
        "OPENROUTER_API_KEY": "sk-or-v1-...",
        "OPENROUTER_MODEL": "google/gemini-2.0-flash-lite:free"
      }
    }
  }
}
Tool
NoEyeVision
Analyze any image using a vision model and get a natural language description.
| Argument | Type | Required | Description |
|---|---|---|---|
| prompt | string | ✅ | What you want to know about the image |
| image_file | string | ✅ | Absolute path to the image file |
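The argument contract in the table could be enforced with a small validator like the one below. This is an illustrative sketch only: `parseNoEyeVisionArgs` and the `NoEyeVisionArgs` interface are hypothetical names, and the server's real validation may differ.

```typescript
import { isAbsolute } from "node:path";

// Argument shape taken from the table above.
interface NoEyeVisionArgs {
  prompt: string;
  image_file: string;
}

// Validate untrusted tool input and return it with a precise type.
function parseNoEyeVisionArgs(input: unknown): NoEyeVisionArgs {
  if (typeof input !== "object" || input === null) {
    throw new TypeError("arguments must be an object");
  }
  const { prompt, image_file } = input as Record<string, unknown>;

  if (typeof prompt !== "string" || prompt.trim() === "") {
    throw new TypeError("prompt must be a non-empty string");
  }
  if (typeof image_file !== "string" || !isAbsolute(image_file)) {
    throw new TypeError("image_file must be an absolute path");
  }
  return { prompt, image_file };
}
```

Rejecting relative paths early avoids ambiguity about which working directory the server process resolves them against.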
Example:
{
  "prompt": "describe what is happening in this image in detail",
  "image_file": "/Users/you/screenshots/dashboard.png"
}
Testing with MCP Inspector
npx @modelcontextprotocol/inspector node dist/index.js
Open the browser, connect, and run the NoEyeVision tool directly.
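For context, an MCP client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request to the server. The sketch below builds such a message for NoEyeVision; `buildToolCall` and `ToolCallRequest` are hypothetical names used for illustration, and the exact payload the Inspector sends may include additional fields.

```typescript
// Shape of a JSON-RPC 2.0 request that calls an MCP tool.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Build a tools/call request for the NoEyeVision tool.
function buildToolCall(id: number, prompt: string, imageFile: string): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "NoEyeVision",
      arguments: { prompt, image_file: imageFile },
    },
  };
}

// Print the message for the example arguments shown earlier.
const request = buildToolCall(
  1,
  "describe what is happening in this image in detail",
  "/Users/you/screenshots/dashboard.png",
);
console.log(JSON.stringify(request, null, 2));
```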
Project Structure
src/
├── index.ts            # MCP server entry point, tool registration
├── openRouter.ts       # OpenRouter API client
└── base64converter.ts  # Image file → base64 encoder
License
MIT
