@jettoblack/image_mcp
v1.0.1
MCP server for image summarization using OpenAI-compatible chat completion endpoints
Image Summarization MCP Server
A Model Context Protocol (MCP) server that accepts image files and sends them to an OpenAI-compatible chat completion endpoint for analysis, description, and comparison tasks.
Use Case
Many LLMs used for agentic coding are text-only and cannot accept image inputs. This tool lets you delegate image description and analysis to a secondary model, so your primary model does not need to be multimodal. It works with both cloud and local LLMs through any server that exposes the OpenAI chat completion endpoint (including llama.cpp / llama-swap, Ollama, open-webui, OpenRouter, etc.).
For local models, gemma3:4b-it-qat works quite well, with a relatively small footprint and fast performance even on CPU-only machines.
Features
- Accepts images via a unified image_url parameter with multiple input formats
- Supports custom_prompt to perform specific tasks other than just general description
- Sends images to OpenAI-compatible chat completion endpoints
- Returns detailed image descriptions
- Configurable endpoint URL, API key, and model
- Command-line interface for configuration
- Comprehensive error handling
- TypeScript support
Quick install from NPM
Add this to your global mcp_settings.json or project mcp.json:
"image_summarization": {
"command": "npx",
"args": [
"-y",
"@jettoblack/image_mcp",
"--api-key",
"key",
"--base-url",
"http://localhost:8080/v1",
"--model",
"gemma3:4b-it-qat",
"--timeout",
"120000",
"--max-retries",
"3"
],
"timeout": 300
}
Replace the base URL, API key, model, etc. as required.
Configuration
The MCP server can be configured using environment variables, command-line arguments, or defaults.
Environment Variables
- OPENAI_API_KEY: Your API key for the OpenAI-compatible service
- OPENAI_BASE_URL: The base URL of the OpenAI-compatible service (default: http://localhost:9292/v1)
- OPENAI_MODEL: The model to use for image analysis
- OPENAI_TIMEOUT: Request timeout in milliseconds (default: 60000). When running local models you may need to increase this.
- OPENAI_MAX_RETRIES: Maximum number of retry attempts (default: 3)
Command Line Arguments
npx -y @jettoblack/image_mcp \
--api-key your-api-key \
--base-url https://api.openai.com/v1 \
--model gpt-4-vision-preview \
--timeout 60000 \
--max-retries 5
Configuration Priority
Settings are resolved in the following order, highest priority first:
- Command-line arguments
- Environment variables
- Default values
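The priority order above can be sketched in TypeScript as follows. This is a minimal illustration, not the project's actual config.ts: the names `ServerConfig`, `parseFlags`, and `resolveConfig` are hypothetical, and only the flag names, environment variable names, and default values come from this README.

```typescript
// Hypothetical sketch: names and structure are illustrative only.
interface ServerConfig {
  apiKey?: string;
  baseUrl: string;
  model?: string;
  timeoutMs: number;
  maxRetries: number;
}

// Defaults as documented in the Environment Variables section.
const DEFAULTS = {
  baseUrl: "http://localhost:9292/v1",
  timeoutMs: 60000,
  maxRetries: 3,
};

// Read "--flag value" pairs from an argv-style array.
function parseFlags(argv: string[]): Record<string, string | undefined> {
  const flags: Record<string, string | undefined> = {};
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    const value = argv[i + 1];
    if (flag !== undefined && value !== undefined && flag.startsWith("--")) {
      flags[flag.slice(2)] = value;
      i++; // skip the value that was just consumed
    }
  }
  return flags;
}

function resolveConfig(
  argv: string[],
  env: Record<string, string | undefined>,
): ServerConfig {
  const flags = parseFlags(argv);
  // A command-line flag wins over the matching environment variable.
  const pick = (flag: string, envName: string): string | undefined =>
    flags[flag] ?? env[envName];
  return {
    apiKey: pick("api-key", "OPENAI_API_KEY"),
    baseUrl: pick("base-url", "OPENAI_BASE_URL") ?? DEFAULTS.baseUrl,
    model: pick("model", "OPENAI_MODEL"),
    // Fall back to the documented default when neither source is set.
    timeoutMs: Number(pick("timeout", "OPENAI_TIMEOUT") ?? DEFAULTS.timeoutMs),
    maxRetries: Number(pick("max-retries", "OPENAI_MAX_RETRIES") ?? DEFAULTS.maxRetries),
  };
}
```

In a Node.js process this would typically be called as `resolveConfig(process.argv.slice(2), process.env)`.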
Dev Installation
- Clone the repository:
git clone https://github.com/jettoblack/image_mcp.git
cd image_mcp
- Install dependencies:
npm install
- Build the project:
npm run build
- Start the server:
node build/index.js
The server will start and listen on stdio for MCP protocol communications.
MCP Tool Installation (local build)
Add this to your global mcp_settings.json or project mcp.json:
"image_summarizer": {
"command": "node",
"args": [
"/path/to/image_mcp/build/index.js",
"--api-key",
"key",
"--base-url",
"http://localhost:9292/v1",
"--model",
"gemma3:4b-it-qat",
"--timeout",
"120000",
"--max-retries",
"3"
],
"timeout": 300
}
Usage
MCP Tools
The server provides two tools for image analysis:
summarize_image
Analyzes and describes a single image in detail.
Parameters
- image_url (string): URL to the image file to analyze. Supports:
  - Absolute file paths
  - file:// URLs
  - HTTP/HTTPS URLs (will be downloaded and converted to base64)
  - Data URLs with base64-encoded image data
- custom_prompt (string, optional): Custom prompt to use instead of the default image description prompt
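To make the four accepted input forms concrete, here is a rough sketch of how a value passed as image_url might be classified before it is loaded. `classifyImageInput` and `ImageInputKind` are hypothetical names, and the handling of Windows drive paths is an assumption; the project's real detection logic may differ.

```typescript
// Hypothetical helper, not taken from the project's source.
type ImageInputKind = "data-url" | "http-url" | "file-url" | "file-path";

function classifyImageInput(input: string): ImageInputKind {
  if (input.startsWith("data:")) return "data-url";
  if (/^https?:\/\//i.test(input)) return "http-url";
  if (input.startsWith("file://")) return "file-url";
  // Assumption: treat POSIX absolute paths (/...) and Windows drive paths
  // (C:\... or C:/...) as absolute file paths.
  if (input.startsWith("/") || /^[A-Za-z]:[\\/]/.test(input)) return "file-path";
  throw new Error(`Unsupported image_url value: ${input}`);
}
```

Relative paths are rejected in this sketch because the parameter description lists only absolute file paths.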
Example Usage
Using file path:
{
"name": "summarize_image",
"arguments": {
"image_url": "/path/to/your/image.jpg"
}
}
Using file:// URL:
{
"name": "summarize_image",
"arguments": {
"image_url": "file:///path/to/your/image.jpg"
}
}
Using HTTP/HTTPS URL:
{
"name": "summarize_image",
"arguments": {
"image_url": "https://example.com/image.jpg"
}
}
Using data URL with base64:
{
"name": "summarize_image",
"arguments": {
"image_url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUg..."
}
}
With custom prompt:
{
"name": "summarize_image",
"arguments": {
"image_url": "/path/to/your/image.jpg",
"custom_prompt": "What objects are visible in this image?"
}
}
compare_images
Compares 2 or more images and describes their similarities and differences.
Parameters
- image_urls (array of strings): Array of image URLs to compare (minimum 2 images required). Each URL supports:
  - Absolute file paths
  - file:// URLs
  - HTTP/HTTPS URLs (will be downloaded and converted to base64)
  - Data URLs with base64-encoded image data
- custom_prompt (string, optional): Custom prompt to use instead of the default image comparison prompt
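The parameter rules above (an array of at least two strings, plus an optional prompt) could be checked along these lines. This is only a sketch: `CompareImagesArgs` and `validateCompareArgs` are hypothetical names, and the error messages are invented for illustration.

```typescript
// Hypothetical validation sketch for compare_images arguments.
interface CompareImagesArgs {
  image_urls: string[];
  custom_prompt?: string;
}

function validateCompareArgs(args: unknown): CompareImagesArgs {
  if (typeof args !== "object" || args === null) {
    throw new Error("arguments must be an object");
  }
  const { image_urls, custom_prompt } = args as Record<string, unknown>;
  if (!Array.isArray(image_urls) || !image_urls.every((u) => typeof u === "string")) {
    throw new Error("image_urls must be an array of strings");
  }
  // The tool description requires a minimum of 2 images.
  if (image_urls.length < 2) {
    throw new Error("image_urls must contain at least 2 images");
  }
  if (custom_prompt !== undefined && typeof custom_prompt !== "string") {
    throw new Error("custom_prompt must be a string when provided");
  }
  const result: CompareImagesArgs = { image_urls: image_urls as string[] };
  if (typeof custom_prompt === "string") result.custom_prompt = custom_prompt;
  return result;
}
```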
Example Usage
Comparing two images:
{
"name": "compare_images",
"arguments": {
"image_urls": [
"/path/to/image1.jpg",
"/path/to/image2.jpg"
]
}
}
Comparing multiple images with custom prompt:
{
"name": "compare_images",
"arguments": {
"image_urls": [
"https://example.com/image1.jpg",
"https://example.com/image2.jpg"
],
"custom_prompt": "Compare these UI screenshots and describe the differences in color themes."
}
}
Testing
Running Tests
Run the test suite:
npm test
The test suite includes:
- Unit tests for image processing functionality
- Integration tests that require a mock server
- Tests for both the summarize_image and compare_images tools
Mock Server Testing
The project includes a mock OpenAI-compatible server for testing purposes.
- Start the mock server in a separate terminal:
node tests/mock-server.js
The mock server will start on http://localhost:9293 and provides endpoints for:
  - GET /v1/models - Lists available models
  - POST /v1/chat/completions - Mock chat completions with image support
  - POST /v1/test/image-process - Test endpoint for image processing validation
- Set environment variables for the mock server:
export OPENAI_BASE_URL=http://localhost:9293/v1
export OPENAI_API_KEY=test-key
export OPENAI_MODEL=test-model-vision
- Run the integration tests:
npm test tests/integration.test.ts
Real OpenAI-Compatible Server Testing
To test with a real OpenAI-compatible endpoint:
- Set up your environment variables:
export OPENAI_API_KEY=your-actual-api-key
export OPENAI_BASE_URL=https://api.openai.com/v1
export OPENAI_MODEL=gpt-4-vision-preview
Or for other OpenAI-compatible services:
export OPENAI_API_KEY=your-service-api-key
export OPENAI_BASE_URL=https://your-service-endpoint/v1
export OPENAI_MODEL=your-vision-model
- Start the MCP server:
node build/index.js
- Send test requests using an MCP client or test the tools directly.
Manual Testing
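You can manually test the MCP server with an MCP client. The server itself communicates over stdio, so the curl example below only applies when the server is exposed over HTTP, for example through a stdio-to-SSE bridge listening on localhost:8080: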
# Test with a local image file
curl -X POST http://localhost:8080/sse \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"id": 1,
"method": "tools/call",
"params": {
"name": "summarize_image",
"arguments": {
"image_url": "/path/to/your/test/image.jpg"
}
}
}'
API Reference
OpenAI-Compatible API Integration
The server sends requests to the OpenAI-compatible chat completion endpoint with the following structure:
{
"model": "your-model",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Describe this image in detail, including all text."
},
{
"type": "image_url",
"image_url": {
"url": "data:image/png;base64,..."
}
}
]
}
],
"stream": false
}
Supported Image Formats
- JPEG (.jpg, .jpeg)
- PNG (.png)
- GIF (.gif)
- WebP (.webp)
- SVG (.svg)
- BMP (.bmp)
- TIFF (.tiff)
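As a rough illustration of how the request structure and the format list above fit together, the sketch below maps a file extension to a MIME type, wraps base64 data in a data URL, and assembles a chat completion request body. `toDataUrl`, `buildChatRequest`, and `ContentPart` are hypothetical names. Sending several images as multiple image_url parts in one user message is an assumption about how compare_images might work, not something this README confirms.

```typescript
// MIME types for the extensions in the Supported Image Formats list.
const MIME_BY_EXTENSION = new Map<string, string>([
  ["jpg", "image/jpeg"],
  ["jpeg", "image/jpeg"],
  ["png", "image/png"],
  ["gif", "image/gif"],
  ["webp", "image/webp"],
  ["svg", "image/svg+xml"],
  ["bmp", "image/bmp"],
  ["tiff", "image/tiff"],
]);

// Wrap already-encoded base64 image data in a data URL.
function toDataUrl(fileName: string, base64: string): string {
  const ext = (fileName.split(".").pop() ?? "").toLowerCase();
  const mime = MIME_BY_EXTENSION.get(ext);
  if (mime === undefined) throw new Error(`Unsupported image format: .${ext}`);
  return `data:${mime};base64,${base64}`;
}

type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

// Build a request body: one text part followed by one part per image.
function buildChatRequest(model: string, prompt: string, dataUrls: string[]) {
  const content: ContentPart[] = [
    { type: "text", text: prompt },
    ...dataUrls.map((url): ContentPart => ({ type: "image_url", image_url: { url } })),
  ];
  return { model, messages: [{ role: "user" as const, content }], stream: false };
}
```

The returned object can be serialized with `JSON.stringify` and sent as the body of a POST to the chat completions endpoint under the configured base URL.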
Error Handling
The server includes comprehensive error handling for:
- Invalid image files
- Unsupported image formats
- Missing API keys
- Network connectivity issues
- API response errors
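For transient failures such as network errors, the configured maximum retry count could be applied with a wrapper like the one below. This is a sketch under stated assumptions: `withRetries` is a hypothetical helper, the fixed delay between attempts is invented for illustration, and `maxRetries` is interpreted here as the number of retries after the first attempt. The README does not say how the project schedules or counts retries.

```typescript
// Hypothetical retry wrapper; not the project's actual implementation.
async function withRetries<T>(
  operation: () => Promise<T>,
  maxRetries: number,
  delayMs = 1000, // assumed fixed delay between attempts
): Promise<T> {
  let lastError: unknown;
  // One initial attempt plus up to maxRetries retries.
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxRetries) {
        // Wait before the next attempt.
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: surface the most recent error.
  throw lastError;
}
```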
Development
Project Structure
src/
├── config.ts # Configuration management
├── image-processor.ts # Image processing utilities
├── index.ts # Main MCP server
└── openai-client.ts # OpenAI-compatible API client
Building
npm run build
Testing
npm test
License
This project is licensed under the MIT License.
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
Support
For issues and questions, please open an issue on the GitHub repository.
Tips
Tips and donations are always appreciated and help fund future development.
- PayPal: paypal.me/jettoblack
- Venmo: venmo.com/u/jettoblack
- BTC: bc1qa76jrsvyglxq7t5fxnvfkekjtmp4z82wtm6ywf
- ETH: 0x47fc11F09A427540d10a45491d464F02177EAc66
