@jettoblack/image_mcp

v1.0.1

Published

5 months ago

MCP server for image summarization using OpenAI-compatible chat completion endpoints

jettoblack

mcp server image summarization openai chat completion vision

Image Summarization MCP Server

A Model Context Protocol (MCP) server that accepts image files and sends them to an OpenAI-compatible chat completion endpoint for analysis, description, and comparison tasks.

Use Case

Many LLMs used for agentic coding are text-only and cannot accept image inputs. This tool lets you delegate image description and analysis to a secondary model, so your primary model does not need to be multi-modal. It supports both cloud and local LLMs via any server that implements the OpenAI chat completion endpoint (including llama.cpp / llama-swap, Ollama, open-webui, OpenRouter, etc.).

For local models, gemma3:4b-it-qat works quite well with a relatively small footprint and fast performance (even on CPU-only).

Features

Accepts images via unified image_url parameter with multiple input formats
Supports custom_prompt to perform specific tasks other than just general description
Sends images to OpenAI-compatible chat completion endpoints
Returns detailed image descriptions
Configurable endpoint URL, API key, and model
Command-line interface for configuration
Comprehensive error handling
TypeScript support

Quick install from NPM

Add this to your global mcp_settings.json or project mcp.json:

  "image_summarization": {
    "command": "npx",
    "args": [
      "-y",
      "@jettoblack/image_mcp",
      "--api-key",
      "key",
      "--base-url",
      "http://localhost:8080/v1",
      "--model",
      "gemma3:4b-it-qat",
      "--timeout",
      "120000",
      "--max-retries",
      "3"
    ],
    "timeout": 300
  }

Replace the base URL, API key, model, and other values as required.

Configuration

The MCP server can be configured using environment variables, command-line arguments, or defaults.

Environment Variables

OPENAI_API_KEY: Your API key for the OpenAI-compatible service
OPENAI_BASE_URL: The base URL of the OpenAI-compatible service (default: http://localhost:9292/v1)
OPENAI_MODEL: The model to use for image analysis
OPENAI_TIMEOUT: Request timeout in milliseconds (default: 60000). When running local models you may need to increase this.
OPENAI_MAX_RETRIES: Maximum number of retry attempts (default: 3)

Command Line Arguments

npx -y @jettoblack/image_mcp \
  --api-key your-api-key \
  --base-url https://api.openai.com/v1 \
  --model gpt-4-vision-preview \
  --timeout 60000 \
  --max-retries 5

Configuration Priority

Command-line arguments
Environment variables
Default values
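
The priority order above can be sketched as a small resolver. This is an illustration only, not the code in src/config.ts: the names `resolveConfig`, `firstDefined`, `ServerConfig`, `CliArgs`, and `Env` are hypothetical, and the defaults are the ones listed under Environment Variables.

```typescript
// Hypothetical sketch: CLI argument first, then environment variable, then default.
interface ServerConfig {
  apiKey: string | undefined;
  baseUrl: string;
  model: string | undefined;
  timeoutMs: number;
  maxRetries: number;
}

type CliArgs = Partial<
  Record<"api-key" | "base-url" | "model" | "timeout" | "max-retries", string>
>;
type Env = Record<string, string | undefined>;

// Return the first candidate that is defined and non-empty.
function firstDefined(...values: (string | undefined)[]): string | undefined {
  for (const value of values) {
    if (value !== undefined && value !== "") return value;
  }
  return undefined;
}

function resolveConfig(cli: CliArgs, env: Env): ServerConfig {
  return {
    apiKey: firstDefined(cli["api-key"], env.OPENAI_API_KEY),
    baseUrl:
      firstDefined(cli["base-url"], env.OPENAI_BASE_URL) ?? "http://localhost:9292/v1",
    model: firstDefined(cli["model"], env.OPENAI_MODEL),
    timeoutMs: Number(firstDefined(cli["timeout"], env.OPENAI_TIMEOUT) ?? "60000"),
    maxRetries: Number(firstDefined(cli["max-retries"], env.OPENAI_MAX_RETRIES) ?? "3"),
  };
}
```

With this sketch, a `--base-url` argument overrides `OPENAI_BASE_URL`, while an unset timeout falls back to 60000 ms.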

Dev Installation

Clone the repository:

git clone https://github.com/jettoblack/image_mcp.git
cd image_mcp

Install dependencies:

npm install

Build the project:

npm run build

Starting the Server

node build/index.js

The server will start and listen on stdio for MCP protocol communications.

MCP Tool Installation (local build)

Add this to your global mcp_settings.json or project mcp.json:

  "image_summarizer": {
    "command": "node",
    "args": [
      "/path/to/image_mcp/build/index.js",
      "--api-key",
      "key",
      "--base-url",
      "http://localhost:9292/v1",
      "--model",
      "gemma3:4b-it-qat",
      "--timeout",
      "120000",
      "--max-retries",
      "3"
    ],
    "timeout": 300
  }

Usage

MCP Tools

The server provides two tools for image analysis:

`summarize_image`

Analyzes and describes a single image in detail.

Parameters

image_url (string): URL to the image file to analyze. Supports:
- Absolute file paths
- file:// URLs
- HTTP/HTTPS URLs (will be downloaded and converted to base64)
- Data URLs with base64 encoded image files
custom_prompt (string, optional): Custom prompt to use instead of the default image description prompt
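
The four accepted `image_url` forms can be told apart by their prefix. The following is a minimal sketch of that idea, not the server's actual logic; `classifyImageInput` and `ImageInputKind` are hypothetical names, and anything without a recognized scheme is assumed to be a file path.

```typescript
// Hypothetical sketch: classify an image_url value by its scheme prefix.
type ImageInputKind = "data-url" | "http-url" | "file-url" | "file-path";

function classifyImageInput(imageUrl: string): ImageInputKind {
  if (/^data:/i.test(imageUrl)) return "data-url";
  if (/^https?:\/\//i.test(imageUrl)) return "http-url";
  if (/^file:\/\//i.test(imageUrl)) return "file-url";
  // Assumption: everything else is treated as a path on the local filesystem.
  return "file-path";
}
```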

Example Usage

Using file path:

{
  "name": "summarize_image",
  "arguments": {
    "image_url": "/path/to/your/image.jpg"
  }
}

Using file:// URL:

{
  "name": "summarize_image",
  "arguments": {
    "image_url": "file:///path/to/your/image.jpg"
  }
}

Using HTTP/HTTPS URL:

{
  "name": "summarize_image",
  "arguments": {
    "image_url": "https://example.com/image.jpg"
  }
}

Using data URL with base64:

{
  "name": "summarize_image",
  "arguments": {
    "image_url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUg..."
  }
}

With custom prompt:

{
  "name": "summarize_image",
  "arguments": {
    "image_url": "/path/to/your/image.jpg",
    "custom_prompt": "What objects are visible in this image?"
  }
}

`compare_images`

Compares 2 or more images and describes their similarities and differences.

Parameters

image_urls (array of strings): Array of image URLs to compare (minimum 2 images required). Each URL supports:
- Absolute file paths
- file:// URLs
- HTTP/HTTPS URLs (will be downloaded and converted to base64)
- Data URLs with base64 encoded image files
custom_prompt (string, optional): Custom prompt to use instead of the default image comparison prompt
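
A client may want to check the two-image minimum before calling the tool. The sketch below shows one way to do that; `validateImageUrls` is a hypothetical helper and the error messages are illustrative, not the server's own.

```typescript
// Hypothetical client-side check for the compare_images arguments.
function validateImageUrls(value: unknown): string[] {
  if (!Array.isArray(value)) {
    throw new Error("image_urls must be an array of strings");
  }
  if (value.length < 2) {
    throw new Error("compare_images requires at least 2 images");
  }
  for (const item of value) {
    if (typeof item !== "string" || item.length === 0) {
      throw new Error("each entry in image_urls must be a non-empty string");
    }
  }
  return value as string[];
}
```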

Example Usage

Comparing two images:

{
  "name": "compare_images",
  "arguments": {
    "image_urls": [
      "/path/to/image1.jpg",
      "/path/to/image2.jpg"
    ]
  }
}

Comparing multiple images with custom prompt:

{
  "name": "compare_images",
  "arguments": {
    "image_urls": [
      "https://example.com/image1.jpg",
      "https://example.com/image2.jpg"
    ],
    "custom_prompt": "Compare these UI screenshots and describe the differences in color themes."
  }
}

Testing

Running Tests

Run the test suite:

npm test

The test suite includes:

Unit tests for image processing functionality
Integration tests that require a mock server
Tests for both summarize_image and compare_images tools

Mock Server Testing

The project includes a mock OpenAI-compatible server for testing purposes.

Start the mock server in a separate terminal:

node tests/mock-server.js

The mock server starts on http://localhost:9293 and provides endpoints for:

GET /v1/models - Lists available models
POST /v1/chat/completions - Mock chat completions with image support
POST /v1/test/image-process - Test endpoint for image processing validation

Set environment variables for the mock server:

export OPENAI_BASE_URL=http://localhost:9293/v1
export OPENAI_API_KEY=test-key
export OPENAI_MODEL=test-model-vision

Run the integration tests:

npm test tests/integration.test.ts

Real OpenAI-Compatible Server Testing

To test with a real OpenAI-compatible endpoint:

Set up your environment variables:

export OPENAI_API_KEY=your-actual-api-key
export OPENAI_BASE_URL=https://api.openai.com/v1
export OPENAI_MODEL=gpt-4-vision-preview

Or for other OpenAI-compatible services:

export OPENAI_API_KEY=your-service-api-key
export OPENAI_BASE_URL=https://your-service-endpoint/v1
export OPENAI_MODEL=your-vision-model

Start the MCP server:

node build/index.js

Send test requests using an MCP client or test the tools directly.

Manual Testing

You can manually test the MCP server using tools like curl or MCP clients:

# Test with a local image file
curl -X POST http://localhost:8080/sse \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
      "name": "summarize_image",
      "arguments": {
        "image_url": "/path/to/your/test/image.jpg"
      }
    }
  }'
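
The body of the request above is a JSON-RPC `tools/call` message. Because the server is described earlier as listening on stdio, the same message can also be written to the server process's standard input. The sketch below only builds that message; it assumes newline-delimited JSON framing for the stdio transport, and `buildToolCallLine` is a hypothetical helper. A real session would also need to complete the MCP initialization handshake before calling a tool, which a full MCP client handles for you.

```typescript
// Hypothetical sketch: serialize a tools/call request as one line of JSON.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCallLine(
  id: number,
  name: string,
  args: Record<string, unknown>
): string {
  const request: ToolCallRequest = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
  // JSON.stringify without an indent argument emits no raw newlines,
  // so the whole message stays on a single line.
  return JSON.stringify(request) + "\n";
}
```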

API Reference

OpenAI-Compatible API Integration

The server sends requests to the OpenAI-compatible chat completion endpoint with the following structure:

{
  "model": "your-model",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Describe this image in detail, including all text."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,..."
          }
        }
      ]
    }
  ],
  "stream": false
}
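
As a rough illustration, the structure above can be assembled as follows. `buildChatRequest` and the type names are hypothetical, and placing several images in one user message (as `compare_images` would need) is an assumption rather than something this README confirms.

```typescript
// Hypothetical sketch: build the chat completion request body shown above.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface ChatRequest {
  model: string;
  messages: { role: "user"; content: ContentPart[] }[];
  stream: false;
}

function buildChatRequest(
  model: string,
  prompt: string,
  imageDataUrls: string[]
): ChatRequest {
  // One text part carrying the prompt, followed by one image part per data URL.
  const textPart: ContentPart = { type: "text", text: prompt };
  const imageParts = imageDataUrls.map(
    (url): ContentPart => ({ type: "image_url", image_url: { url } })
  );
  return {
    model,
    messages: [{ role: "user", content: [textPart, ...imageParts] }],
    stream: false,
  };
}
```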

Supported Image Formats

JPEG (.jpg, .jpeg)
PNG (.png)
GIF (.gif)
WebP (.webp)
SVG (.svg)
BMP (.bmp)
TIFF (.tiff)
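
When a file is converted to a data URL, its extension has to be mapped to a MIME type. The sketch below shows one possible mapping for the formats listed above; `mimeTypeFor` and `MIME_BY_EXTENSION` are hypothetical names, and the server may detect formats differently (for example by inspecting file contents).

```typescript
// Hypothetical sketch: map a file extension to its standard MIME type.
const MIME_BY_EXTENSION: Record<string, string> = {
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
  png: "image/png",
  gif: "image/gif",
  webp: "image/webp",
  svg: "image/svg+xml",
  bmp: "image/bmp",
  tiff: "image/tiff",
};

function mimeTypeFor(filename: string): string | undefined {
  // Capture the text after the last dot, ignoring dots in directory names.
  const match = /\.([^./\\]+)$/.exec(filename);
  const extension = match ? match[1] : undefined;
  if (!extension) return undefined;
  const key = extension.toLowerCase();
  // Own-property check avoids matching inherited keys such as "constructor".
  return Object.prototype.hasOwnProperty.call(MIME_BY_EXTENSION, key)
    ? MIME_BY_EXTENSION[key]
    : undefined;
}
```

A caller could treat an `undefined` result as an unsupported format.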

Error Handling

The server includes comprehensive error handling for:

Invalid image files
Unsupported image formats
Missing API keys
Network connectivity issues
API response errors

Development

Project Structure

src/
├── config.ts          # Configuration management
├── image-processor.ts # Image processing utilities
├── index.ts          # Main MCP server
└── openai-client.ts  # OpenAI-compatible API client

Building

npm run build

Testing

npm test

License

This project is licensed under the MIT License.

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests
Submit a pull request

Support

For issues and questions, please open an issue on the GitHub repository.

Tips

Tips and donations are always appreciated and help fund future development.

PayPal: paypal.me/jettoblack
Venmo: venmo.com/u/jettoblack
BTC: bc1qa76jrsvyglxq7t5fxnvfkekjtmp4z82wtm6ywf
ETH: 0x47fc11F09A427540d10a45491d464F02177EAc66
