0eye-vision-mcp v1.0.3 (published 10 days ago; maintainer: swarnshekhar)

0eye-vision-MCP 👁️

Give any text-only LLM the power of vision — instantly.

Many capable LLMs are text-only. They can reason, write, and code, but they cannot interpret an image. 0eye-vision-MCP is an MCP server that bridges that gap: pass it an image path and a prompt, get back a detailed natural-language description, and feed that text into whatever text model you are building with.

How it works

1. You pass an image file path and a prompt to the tool.
2. The server reads the image and encodes it as base64.
3. It sends the encoded image and the prompt to OpenRouter, which routes the request to the vision-capable model named in OPENROUTER_MODEL (e.g. Gemini or GPT-4V).
4. OpenRouter returns the model's detailed description.
5. Your text-only LLM now "sees" the image through that description.

Your App → MCP Tool → base64 encoder → OpenRouter Vision API → description → Your LLM
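Steps 2 through 4 can be sketched as follows. This is a minimal sketch, not the project's `openRouter.ts`: it assumes OpenRouter's OpenAI-compatible chat completions endpoint, which accepts an image as a base64 `data:` URL inside an `image_url` content part, and Node 18+ for the global `fetch`. The names `buildVisionRequest` and `describeImage` are hypothetical.

```typescript
// Hypothetical sketch of the OpenRouter call (not the project's actual openRouter.ts).
// Assumes Node 18+ (global fetch) and the OpenAI-compatible chat completions endpoint.
const OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions";

type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface VisionRequest {
  model: string;
  messages: { role: "user"; content: ContentPart[] }[];
}

// Pair the user's prompt with the image, embedded as a base64 data URL.
function buildVisionRequest(
  prompt: string,
  imageBase64: string,
  mimeType: string,
  model: string,
): VisionRequest {
  return {
    model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: `data:${mimeType};base64,${imageBase64}` } },
        ],
      },
    ],
  };
}

// POST the request and return the model's text description.
async function describeImage(apiKey: string, body: VisionRequest): Promise<string> {
  const res = await fetch(OPENROUTER_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    throw new Error(`OpenRouter request failed: ${res.status} ${await res.text()}`);
  }
  const data = (await res.json()) as { choices?: { message?: { content?: string } }[] };
  const text = data.choices?.[0]?.message?.content;
  if (typeof text !== "string") {
    throw new Error("OpenRouter response contained no message content");
  }
  return text;
}
```

Keeping the request builder separate from the network call makes the payload easy to inspect or unit-test without spending API credits; the string returned by `describeImage` is what would be handed back to the MCP host as the tool's text output.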

Use Cases

🤖 Augment text-only models: Running GPT-3.5, LLaMA 3, Mistral, Mixtral, Phi-2, DeepSeek, or BLOOM? None of these understand images natively. Use 0eye-vision-MCP to give them eyes.

🧪 Rapid prototyping: Testing a custom LLM pipeline and need vision capability without retraining? Drop this MCP server in and get vision in minutes.

🖼️ Automated image analysis pipelines: Point it at screenshots, product photos, diagrams, or documents and get text descriptions you can feed downstream.

🔍 Accessibility tooling: Build tools that describe images for visually impaired users, powered by any LLM of your choice.

📊 Document understanding: Feed in screenshots of charts, tables, or dashboards and let your LLM reason about the visual data.

Supported text-only models (examples)

GPT-3.5 Turbo / GPT-3 (Davinci)
LLaMA 2 / LLaMA 3 (base)
Mistral 7B / Mixtral 8x7B
BLOOM, Phi-2, Gemma (text-only)
DeepSeek LLM

Prerequisites

Node.js 18+
An OpenRouter API key with access to a vision model

Setup

```bash
git clone https://github.com/swarn007-byte/0eye-vision-MCP.git
cd 0eye-vision-MCP
npm install
```

Create your .env file:

```bash
cp .env.example .env
```

Edit .env:

```
OPENROUTER_API_KEY=sk-or-v1-...
OPENROUTER_MODEL=google/gemini-2.0-flash-lite:free
```
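A minimal sketch of how a server might read these two variables, assuming they reach the process through `process.env` (either from `.env` or from the MCP host's `env` block). The `loadConfig` name and the fallback to `google/gemini-2.0-flash-lite:free` when `OPENROUTER_MODEL` is unset are illustrative assumptions, not documented behavior of this project.

```typescript
// Hypothetical config loader; not taken from the project source.
interface VisionConfig {
  apiKey: string;
  model: string;
}

// Assumed default, mirroring the example value in .env above.
const DEFAULT_MODEL = "google/gemini-2.0-flash-lite:free";

function loadConfig(env: Record<string, string | undefined> = process.env): VisionConfig {
  const apiKey = env.OPENROUTER_API_KEY?.trim();
  if (!apiKey) {
    // Fail fast at startup: every OpenRouter request needs a key.
    throw new Error("OPENROUTER_API_KEY is not set");
  }
  // An unset or blank model falls back to the assumed default.
  const model = env.OPENROUTER_MODEL?.trim() || DEFAULT_MODEL;
  return { apiKey, model };
}
```

Taking the environment as a parameter (defaulting to `process.env`) keeps the function testable without mutating global state.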

Build:

```bash
npm run build
```

MCP Client Config

Add this to your MCP host config (Claude Desktop, OpenCode, etc.):

```json
{
  "mcpServers": {
    "0eye-vision": {
      "command": "node",
      "args": ["/absolute/path/to/0eye-vision-MCP/dist/index.js"],
      "env": {
        "OPENROUTER_API_KEY": "sk-or-v1-...",
        "OPENROUTER_MODEL": "google/gemini-2.0-flash-lite:free"
      }
    }
  }
}
```

Tool

`NoEyeVision`

Analyze any image using a vision model and get a natural language description.

| Argument | Type | Required | Description |
|---|---|---|---|
| prompt | string | ✅ | What you want to know about the image |
| image_file | string | ✅ | Absolute path to the image file |
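The argument table translates directly into an input check. This is a sketch, not the project's code: it assumes relative paths should be rejected (the server process's working directory is usually unrelated to the caller's), and `validateArgs` and its error messages are hypothetical.

```typescript
import { isAbsolute } from "node:path";

// Hypothetical validation for the NoEyeVision arguments in the table above.
interface NoEyeVisionArgs {
  prompt: string;
  image_file: string;
}

function validateArgs(input: unknown): NoEyeVisionArgs {
  if (typeof input !== "object" || input === null) {
    throw new Error("arguments must be an object");
  }
  const { prompt, image_file } = input as Record<string, unknown>;
  // Both arguments are required strings.
  if (typeof prompt !== "string" || prompt.trim() === "") {
    throw new Error("prompt is required and must be a non-empty string");
  }
  if (typeof image_file !== "string" || !isAbsolute(image_file)) {
    throw new Error("image_file is required and must be an absolute path");
  }
  return { prompt, image_file };
}
```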

Example:

```json
{
  "prompt": "describe what is happening in this image in detail",
  "image_file": "/Users/you/screenshots/dashboard.png"
}
```
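On the wire, an MCP host delivers these arguments inside a JSON-RPC 2.0 `tools/call` request, and a tool replies with a `content` array of text items. The sketch below builds both shapes by hand purely for illustration; the `id` value and the helper names `makeToolCall` and `makeTextResult` are hypothetical, and in practice hosts and servers produce these messages through an MCP SDK.

```typescript
// Hypothetical helpers showing the MCP message shapes for a NoEyeVision call.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

interface TextToolResult {
  content: { type: "text"; text: string }[];
  isError: boolean;
}

// What the host sends when the model decides to call the tool.
function makeToolCall(id: number, args: { prompt: string; image_file: string }): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "NoEyeVision", arguments: args },
  };
}

// What the server returns: the vision model's description as plain text.
function makeTextResult(description: string): TextToolResult {
  return { content: [{ type: "text", text: description }], isError: false };
}

const request = makeToolCall(1, {
  prompt: "describe what is happening in this image in detail",
  image_file: "/Users/you/screenshots/dashboard.png",
});
console.log(JSON.stringify(request));
```

Because the result is ordinary text, the host can pass it to any model, including one with no image support.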

Testing with MCP Inspector

```bash
npx @modelcontextprotocol/inspector node dist/index.js
```

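Run npm run build first so that dist/index.js exists. Then open the URL the Inspector prints in your browser, connect to the server, and run the NoEyeVision tool directly.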

Project Structure

```
src/
├── index.ts          # MCP server entry point, tool registration
├── openRouter.ts     # OpenRouter API client
└── base64converter.ts # Image file → base64 encoder
```
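For reference, an image-to-base64 step like the one `base64converter.ts` performs might look like the sketch below. It is an assumption-laden illustration, not that file's contents: the MIME type is guessed from the file extension using a hypothetical table of common formats, and the result is returned as a `data:` URL, the inline-image form OpenAI-compatible APIs accept.

```typescript
import { readFile } from "node:fs/promises";
import { extname } from "node:path";

// Hypothetical extension-to-MIME table; extend as needed.
const MIME_BY_EXT: Record<string, string> = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".gif": "image/gif",
  ".webp": "image/webp",
};

// Map a file path to an image MIME type, case-insensitively.
function mimeTypeFor(filePath: string): string {
  const mime = MIME_BY_EXT[extname(filePath).toLowerCase()];
  if (!mime) {
    throw new Error(`Unsupported image type: ${filePath}`);
  }
  return mime;
}

// Encode raw bytes as a base64 data URL.
function toDataUrl(bytes: Buffer, mimeType: string): string {
  return `data:${mimeType};base64,${bytes.toString("base64")}`;
}

// Read an image from disk and return it as a data URL.
async function encodeImageFile(filePath: string): Promise<string> {
  const mime = mimeTypeFor(filePath); // reject unsupported types before reading the file
  const bytes = await readFile(filePath);
  return toDataUrl(bytes, mime);
}
```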

License

MIT
