mcp-server-image-extractor

v1.0.8, published 6 months ago

MCP server for extracting and categorizing images from web pages with intelligent classification

abhiramk65

mcp mcp-server model-context-protocol image-extraction web-scraping image-classification puppeteer cheerio

Image Extractor MCP Server

An MCP (Model Context Protocol) server that extracts and categorizes images from web pages using intelligent heuristics.

Features

Smart Image Extraction: Extracts images from several sources, including:
- <img> tags
- CSS background images
- Meta tags (og:image, twitter:image)
- Favicons and touch icons
Intelligent Classification: Categorizes images into three types:
- Icons: Logos, favicons, small brand images
- Products: E-commerce product images
- Other: Banners, article images, decorative content
Dual Extraction Modes:
- Static Mode: Fast extraction using axios and cheerio
- JavaScript Mode: Full rendering with Puppeteer for dynamic sites
Rich Metadata: Returns comprehensive information for each image:
- Absolute URL
- Dimensions (width/height)
- Alt text and title
- Position on page (header/main/footer)
- Surrounding context
- Classification confidence score
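The extraction sources and the "Absolute URL" field above imply two small steps: pulling `url(...)` references out of CSS background declarations, and resolving every reference against the page URL. The sketch below shows one way to do this with a regular expression and the standard `URL` class. The names `extractCssUrls` and `toAbsolute` are hypothetical and not part of this package; the data-URL handling only parallels the tool's `includeDataUrls` option and is not taken from its source.

```typescript
// Hypothetical helpers for illustration; not this package's actual implementation.

// Resolve a possibly relative image reference against the page URL.
// Returns null when the reference cannot be parsed as a URL.
function toAbsolute(ref: string, pageUrl: string): string | null {
  try {
    return new URL(ref.trim(), pageUrl).href;
  } catch {
    return null;
  }
}

// Collect every url(...) reference in a CSS declaration block, such as an
// inline style="background-image: url('/hero.jpg')". Data URLs are skipped
// unless includeDataUrls is true (an assumed parallel to the tool option).
function extractCssUrls(css: string, pageUrl: string, includeDataUrls = false): string[] {
  // Group 1 captures an optional quote; group 2 the reference inside it.
  const pattern = /url\(\s*(['"]?)(.*?)\1\s*\)/gi;
  const results: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(css)) !== null) {
    const ref = match[2].trim();
    if (ref === "") continue;
    if (ref.startsWith("data:")) {
      if (includeDataUrls) results.push(ref);
      continue;
    }
    const absolute = toAbsolute(ref, pageUrl);
    if (absolute !== null) results.push(absolute);
  }
  return results;
}

const style = "background-image: url('/img/hero.jpg'), url(data:image/png;base64,AAAA)";
console.log(extractCssUrls(style, "https://example.com/shop/"));
// → [ 'https://example.com/img/hero.jpg' ]
```

Using the `URL` constructor rather than string concatenation handles root-relative (`/img/...`), path-relative (`a.png`), and protocol-relative references uniformly.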

Installation

As an MCP Server

npm install -g mcp-server-image-extractor

For Development

# Download and extract the source code
cd image-extractor
npm install
npm run build

MCP Configuration

Add the server to your MCP settings:

Using npx (recommended)

{
  "mcpServers": {
    "image-extractor": {
      "command": "npx",
      "args": ["-y", "mcp-server-image-extractor"],
      "timeout": 120
    }
  }
}

Note: The first run with npx may take longer because npx must download the package first. The configuration above sets a higher timeout (120 seconds) to accommodate this.

Using global installation (faster startup)

First install globally:

npm install -g mcp-server-image-extractor

Then configure:

{
  "mcpServers": {
    "image-extractor": {
      "command": "mcp-server-image-extractor"
    }
  }
}

Using local installation

For development or local testing (replace the path with the location of build/index.js in your own checkout):

{
  "mcpServers": {
    "image-extractor": {
      "command": "node",
      "args": ["C:/path/to/image-extractor/build/index.js"]
    }
  }
}

Alternative: Using npx with cache

To avoid timeout issues, you can pre-cache the package:

npx mcp-server-image-extractor --version

Then use the standard npx configuration.

Usage

Once connected, you can use the extract_images tool:

Tool Parameters

url (required): The URL to extract images from
useJavaScript (optional): Use Puppeteer for JavaScript-rendered sites (default: false)
includeDataUrls (optional): Include base64 data URLs (default: false)
minSize (optional): Minimum image size in pixels (default: 0)
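When calling the tool from your own code, it can help to mirror these parameters in a TypeScript type and fill in the documented defaults. The sketch below assumes the parameter names and defaults exactly as listed above. The names `ExtractImagesParams` and `withDefaults` are hypothetical, and the validation rules (an http or https URL, a non-negative `minSize`) are assumptions rather than documented server behavior.

```typescript
// Hypothetical client-side mirror of the extract_images parameters.
interface ExtractImagesParams {
  url: string;               // required
  useJavaScript?: boolean;   // default: false
  includeDataUrls?: boolean; // default: false
  minSize?: number;          // default: 0 (pixels)
}

type ResolvedParams = Required<ExtractImagesParams>;

// Apply the documented defaults and reject obviously invalid input.
// The validation rules here are assumptions, not the server's behavior.
function withDefaults(params: ExtractImagesParams): ResolvedParams {
  const parsed = new URL(params.url); // throws on malformed URLs
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  const minSize = params.minSize ?? 0;
  if (!Number.isFinite(minSize) || minSize < 0) {
    throw new Error("minSize must be a non-negative number");
  }
  return {
    url: params.url,
    useJavaScript: params.useJavaScript ?? false,
    includeDataUrls: params.includeDataUrls ?? false,
    minSize,
  };
}

console.log(withDefaults({ url: "https://example.com", minSize: 100 }));
```

Using `??` rather than `||` matters for `minSize`: an explicit `0` is kept instead of being treated as missing.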

Example Request

{
  "url": "https://example.com",
  "useJavaScript": false,
  "includeDataUrls": false,
  "minSize": 100
}
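An MCP client delivers this request as a JSON-RPC 2.0 message with method `tools/call`, whose `params` carry the tool name and its arguments. The sketch below builds such a message by hand purely to show the wire shape; in practice your MCP client does this for you, including the initialization handshake that must come first. The helper name `buildToolCall` is hypothetical.

```typescript
// Shape of a JSON-RPC 2.0 request as used by MCP's tools/call method.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that wraps tool arguments in a tools/call request.
function buildToolCall(id: number, name: string, args: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const request = buildToolCall(1, "extract_images", {
  url: "https://example.com",
  useJavaScript: false,
  includeDataUrls: false,
  minSize: 100,
});

// Over the stdio transport, each message is serialized as a single line of JSON.
console.log(JSON.stringify(request));
```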

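Example Response (abridged: each category array is truncated, so the entries shown do not add up to the summary counts)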

{
  "url": "https://example.com",
  "timestamp": "2024-01-07T12:00:00Z",
  "images": {
    "icons": [
      {
        "url": "https://example.com/logo.png",
        "alt": "Company Logo",
        "dimensions": { "width": 150, "height": 50 },
        "confidence": 0.95,
        "position": "header",
        "context": "Main navigation area"
      }
    ],
    "products": [
      {
        "url": "https://example.com/product1.jpg",
        "alt": "Product Image",
        "dimensions": { "width": 500, "height": 500 },
        "confidence": 0.88,
        "position": "main",
        "context": "Product gallery, near price $29.99"
      }
    ],
    "other": [
      {
        "url": "https://example.com/banner.jpg",
        "alt": "Hero Banner",
        "dimensions": { "width": 1200, "height": 400 },
        "confidence": 0.75,
        "position": "main",
        "context": "Hero section"
      }
    ]
  },
  "summary": {
    "total": 25,
    "icons": 5,
    "products": 10,
    "other": 10
  }
}
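For consumers of this output, the types below are inferred from the example response; they may not match the package's own types.ts. `summarize` and `confidentProducts` are hypothetical helpers. The first assumes `summary.total` is the sum of the three category counts, as it is in the example (5 + 10 + 10 = 25); the second shows one way to filter on the confidence score.

```typescript
// Types inferred from the example response; field names follow the JSON shown.
interface ImageInfo {
  url: string;
  alt: string;
  dimensions: { width: number; height: number };
  confidence: number; // 0..1 in the example
  position: "header" | "main" | "footer"; // values listed under Rich Metadata
  context: string;
}

interface ExtractionResult {
  url: string;
  timestamp: string;
  images: { icons: ImageInfo[]; products: ImageInfo[]; other: ImageInfo[] };
  summary: { total: number; icons: number; products: number; other: number };
}

// Hypothetical helper: derive the summary block from the category arrays.
function summarize(images: ExtractionResult["images"]): ExtractionResult["summary"] {
  const icons = images.icons.length;
  const products = images.products.length;
  const other = images.other.length;
  return { total: icons + products + other, icons, products, other };
}

// Hypothetical helper: keep only product image URLs at or above a confidence threshold.
function confidentProducts(result: ExtractionResult, threshold = 0.8): string[] {
  return result.images.products
    .filter((img) => img.confidence >= threshold)
    .map((img) => img.url);
}

const sample: ExtractionResult["images"] = {
  icons: [],
  products: [{
    url: "https://example.com/product1.jpg",
    alt: "Product Image",
    dimensions: { width: 500, height: 500 },
    confidence: 0.88,
    position: "main",
    context: "Product gallery, near price $29.99",
  }],
  other: [],
};
console.log(summarize(sample)); // → { total: 1, icons: 0, products: 1, other: 0 }
```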

Classification Heuristics

The server uses multiple factors to classify images:

Icon Detection

Small dimensions (< 200x200px)
Located in header/navigation
Filename contains: logo, icon, favicon, brand
Alt text with company/brand names
Meta favicon tags
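One way to combine these icon signals is a weighted score. The sketch below is an illustration only: the weights, the `ImageSignals` type, the `iconScore` name, and the optional `siteName` used for the brand-name check are all assumptions, not values taken from this package's classifier.ts.

```typescript
// Hypothetical input to the classifier; not the package's actual type.
interface ImageSignals {
  src: string;
  alt: string;
  width?: number;
  height?: number;
  position: "header" | "main" | "footer";
  fromFaviconTag: boolean; // found via a <link rel="icon"> style tag
}

const ICON_KEYWORDS = ["logo", "icon", "favicon", "brand"];

// Weighted score in [0, 1]; the weights are illustrative assumptions.
function iconScore(img: ImageSignals, siteName?: string): number {
  if (img.fromFaviconTag) return 1; // assumed: favicon tags always count as icons
  let score = 0;
  // Small dimensions (< 200x200px)
  if (img.width !== undefined && img.height !== undefined && img.width < 200 && img.height < 200) {
    score += 0.3;
  }
  // Located in the header/navigation area
  if (img.position === "header") score += 0.2;
  // Filename (query string stripped) contains an icon keyword
  const filename = img.src.split("?")[0].split("/").pop()?.toLowerCase() ?? "";
  if (ICON_KEYWORDS.some((kw) => filename.includes(kw))) score += 0.35;
  // Alt text mentions the site or brand name, when one is supplied
  if (siteName && img.alt.toLowerCase().includes(siteName.toLowerCase())) score += 0.15;
  return Math.min(score, 1);
}

console.log(iconScore({
  src: "https://example.com/logo.png",
  alt: "Company Logo",
  width: 150,
  height: 50,
  position: "header",
  fromFaviconTag: false,
}));
```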

Product Detection

Medium to large size (> 300x300px)
Square aspect ratio
Located near price/cart elements
Product-related keywords in alt text
E-commerce context patterns
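The product signals can be combined the same way. In this sketch the weights, the keyword list, and the 10% tolerance used to decide that an image is "square" are assumptions; `ProductSignals`, `isRoughlySquare`, and `productScore` are hypothetical names, and the two boolean context flags stand in for whatever context analysis actually produces.

```typescript
// Hypothetical signals for product detection; not the package's actual type.
interface ProductSignals {
  alt: string;
  width?: number;
  height?: number;
  nearPriceOrCart: boolean;  // e.g. a price or "add to cart" element nearby
  ecommerceContext: boolean; // e.g. parent classes such as "product-card"
}

// Illustrative keyword list; an assumption, not the package's list.
const PRODUCT_KEYWORDS = ["product", "buy", "shop", "item", "sku"];

// Treat width and height within 10% of each other as square (assumed tolerance).
function isRoughlySquare(width: number, height: number, tolerance = 0.1): boolean {
  return Math.abs(width - height) / Math.max(width, height) <= tolerance;
}

// Weighted score in [0, 1]; the weights are illustrative assumptions.
function productScore(img: ProductSignals): number {
  let score = 0;
  const { width, height } = img;
  if (width !== undefined && height !== undefined) {
    if (width > 300 && height > 300) score += 0.25; // medium to large
    if (isRoughlySquare(width, height)) score += 0.15; // square aspect ratio
  }
  if (img.nearPriceOrCart) score += 0.3;
  const alt = img.alt.toLowerCase();
  if (PRODUCT_KEYWORDS.some((kw) => alt.includes(kw))) score += 0.15;
  if (img.ecommerceContext) score += 0.15;
  return Math.min(score, 1);
}

console.log(productScore({ alt: "Product Image", width: 500, height: 500, nearPriceOrCart: true, ecommerceContext: true }));
```

A weighted sum keeps any single weak signal (for example, a square decorative image) from deciding the category on its own.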

Context Analysis

Examines surrounding HTML elements
Checks for e-commerce patterns
Analyzes parent container classes
Detects proximity to price elements
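A context check like this can be approximated by scanning the text and class names of an image's ancestor elements, nearest first. To stay dependency-free, the sketch below works on a plain array of ancestor records (a hypothetical `AncestorInfo` type) rather than a real cheerio or DOM tree. The price regular expression, the class-name pattern, and the default depth of 3 are assumptions.

```typescript
// Hypothetical, dependency-free stand-in for a parsed ancestor element.
interface AncestorInfo {
  className: string;
  text: string; // visible text of the element
}

// Matches prices such as "$29.99", "€ 15", "£1,299.00" (illustrative only).
const PRICE_PATTERN = /[$€£]\s?\d{1,3}(?:,\d{3})*(?:\.\d{2})?/;
// Class names that suggest an e-commerce container (assumed list).
const COMMERCE_CLASS_PATTERN = /\b(product|price|cart|buy|shop)\b/i;

interface ContextReport {
  nearPrice: boolean;
  priceDepth: number | null; // ancestor level where a price first appeared
  ecommerceClasses: string[];
}

// Walk outward from the image's parent (index 0), up to maxDepth ancestors.
function analyzeContext(ancestors: AncestorInfo[], maxDepth = 3): ContextReport {
  let priceDepth: number | null = null;
  const ecommerceClasses: string[] = [];
  const limit = Math.min(ancestors.length, maxDepth);
  for (let depth = 0; depth < limit; depth++) {
    const el = ancestors[depth];
    if (priceDepth === null && PRICE_PATTERN.test(el.text)) priceDepth = depth;
    for (const cls of el.className.split(/\s+/)) {
      if (cls !== "" && COMMERCE_CLASS_PATTERN.test(cls)) ecommerceClasses.push(cls);
    }
  }
  return { nearPrice: priceDepth !== null, priceDepth, ecommerceClasses };
}

console.log(analyzeContext([
  { className: "product-image", text: "" },
  { className: "product-card featured", text: "Blue Mug $29.99 Add to cart" },
]));
```

Limiting the walk to a few levels keeps a price elsewhere on the page (for example, in a footer promo) from marking every image as a product.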

Development

Project Structure

image-extractor/
├── src/
│   ├── index.ts        # MCP server entry point
│   ├── extractor.ts    # Core extraction logic
│   ├── classifier.ts   # Image classification
│   ├── utils.ts        # Helper functions
│   └── types.ts        # TypeScript types
├── build/              # Compiled JavaScript
├── package.json
└── tsconfig.json

Building

npm run build    # Compile TypeScript
npm run dev      # Watch mode

Testing

npm test         # Run tests (when implemented)

Use Cases

E-commerce Analysis: Extract product images from online stores
Brand Monitoring: Collect logos and brand images from websites
Content Aggregation: Gather images for content curation
Web Scraping: Extract visual content for analysis
SEO Auditing: Analyze image usage and optimization

Limitations

Image dimension detection requires downloading image headers
JavaScript mode is slower but more accurate for dynamic sites
Classification accuracy depends on page structure and naming conventions
Large pages with many images may take longer to process
Puppeteer requires additional system dependencies for headless Chrome
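The first limitation exists because width and height are stored in each image file's header. For PNG the header has a fixed layout: an 8-byte signature, then the IHDR chunk, whose width and height are big-endian 32-bit integers at byte offsets 16 and 20. The sketch below reads them from the first 24 bytes, which is why a client only needs a short prefix of the file (for example via an HTTP `Range` request, when the server honors it) instead of the whole image. It handles PNG only; the name `pngDimensions` is hypothetical, and this is not necessarily how the package measures images.

```typescript
// PNG files start with this fixed 8-byte signature.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Read width/height from the first 24 bytes of a PNG file.
// Returns null if the bytes are not a PNG header.
function pngDimensions(bytes: Uint8Array): { width: number; height: number } | null {
  if (bytes.length < 24) return null;
  for (let i = 0; i < PNG_SIGNATURE.length; i++) {
    if (bytes[i] !== PNG_SIGNATURE[i]) return null;
  }
  // The first chunk must be IHDR (ASCII "IHDR" at offsets 12-15).
  const chunkType = String.fromCharCode(bytes[12], bytes[13], bytes[14], bytes[15]);
  if (chunkType !== "IHDR") return null;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    width: view.getUint32(16),  // DataView reads big-endian by default
    height: view.getUint32(20),
  };
}

// Build a synthetic 24-byte header for a 150x50 image and read it back.
const header = new Uint8Array(24);
header.set(PNG_SIGNATURE, 0);
header.set([0, 0, 0, 13], 8);             // IHDR data length (always 13)
header.set([0x49, 0x48, 0x44, 0x52], 12); // "IHDR"
const writer = new DataView(header.buffer);
writer.setUint32(16, 150);
writer.setUint32(20, 50);
console.log(pngDimensions(header)); // → { width: 150, height: 50 }
```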

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT