mcp-server-image-extractor
v1.0.8
MCP server for extracting and categorizing images from web pages with intelligent classification
Image Extractor MCP Server
An MCP (Model Context Protocol) server that extracts and categorizes images from web pages using intelligent heuristics.
Features
Smart Image Extraction: Extracts images from various sources including:
- <img> tags
- CSS background images
- Meta tags (og:image, twitter:image)
- Favicons and touch icons
Intelligent Classification: Categorizes images into three types:
- Icons: Logos, favicons, small brand images
- Products: E-commerce product images
- Other: Banners, article images, decorative content
Dual Extraction Modes:
- Static Mode: Fast extraction using axios and cheerio
- JavaScript Mode: Full rendering with Puppeteer for dynamic sites
Rich Metadata: Returns comprehensive information for each image:
- Absolute URL
- Dimensions (width/height)
- Alt text and title
- Position on page (header/main/footer)
- Surrounding context
- Classification confidence score
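The metadata fields listed above can be pictured as a small TypeScript type. This is only a sketch: the field names follow the example response in this README, while `PagePosition`, `ImageInfo`, the optional `title` field, and the `toAbsoluteUrl` helper are hypothetical names introduced for illustration and may not match the package's own `types.ts`.

```typescript
// Hypothetical types for illustration; the package's real definitions may differ.
type PagePosition = "header" | "main" | "footer";

interface ImageInfo {
  url: string; // absolute URL of the image
  alt?: string;
  title?: string; // assumed field name, not shown in the example response
  dimensions?: { width: number; height: number };
  position?: PagePosition;
  context?: string; // short description of the surrounding content
  confidence: number; // classification confidence, assumed to lie in 0..1
}

// Resolve a possibly relative `src` against the page URL.
// Returns null when the inputs cannot be parsed as a URL.
function toAbsoluteUrl(src: string, pageUrl: string): string | null {
  try {
    return new URL(src, pageUrl).href;
  } catch {
    return null;
  }
}

const logo: ImageInfo = {
  url: toAbsoluteUrl("/logo.png", "https://example.com/shop/") ?? "",
  alt: "Company Logo",
  dimensions: { width: 150, height: 50 },
  position: "header",
  confidence: 0.95,
};

console.log(logo.url); // https://example.com/logo.png
```

Resolving against the page URL with the standard `URL` constructor handles root-relative, path-relative, and already-absolute sources in one call.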
Installation
As an MCP Server
npm install -g mcp-server-image-extractor
For Development
# Download and extract the source code
cd image-extractor
npm install
npm run build
MCP Configuration
Add the server to your MCP settings:
Using npx (recommended)
{
  "mcpServers": {
    "image-extractor": {
      "command": "npx",
      "args": ["-y", "mcp-server-image-extractor"],
      "timeout": 120
    }
  }
}
Note: The first run with npx may take longer as it downloads the package. Set a higher timeout (120 seconds) to accommodate this.
Using global installation (faster startup)
First install globally:
npm install -g mcp-server-image-extractor
Then configure:
{
  "mcpServers": {
    "image-extractor": {
      "command": "mcp-server-image-extractor"
    }
  }
}
Using local installation
For development or local testing:
{
  "mcpServers": {
    "image-extractor": {
      "command": "node",
      "args": ["C:/path/to/image-extractor/build/index.js"]
    }
  }
}
Alternative: Using npx with cache
To avoid timeout issues, you can pre-cache the package:
npx mcp-server-image-extractor --version
Then use the standard npx configuration.
Usage
Once connected, you can use the extract_images tool:
Tool Parameters
- url (required): The URL to extract images from
- useJavaScript (optional): Use Puppeteer for JavaScript-rendered sites (default: false)
- includeDataUrls (optional): Include base64 data URLs (default: false)
- minSize (optional): Minimum image size in pixels (default: 0)
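As a rough sketch, the parameters above map onto a TypeScript type with the documented defaults filled in. The names `ExtractImagesArgs`, `withDefaults`, and `buildToolCall` are hypothetical and not part of the package. The request envelope follows the general JSON-RPC `tools/call` shape that MCP clients send; most clients build it for you.

```typescript
// Hypothetical argument type mirroring the documented tool parameters.
interface ExtractImagesArgs {
  url: string;
  useJavaScript?: boolean;
  includeDataUrls?: boolean;
  minSize?: number;
}

// Fill in the defaults stated in this README for any omitted option.
function withDefaults(args: ExtractImagesArgs): Required<ExtractImagesArgs> {
  return {
    url: args.url,
    useJavaScript: args.useJavaScript ?? false,
    includeDataUrls: args.includeDataUrls ?? false,
    minSize: args.minSize ?? 0,
  };
}

// Wrap the arguments in a JSON-RPC 2.0 "tools/call" request for the
// extract_images tool. An MCP client library normally does this step.
function buildToolCall(id: number, args: ExtractImagesArgs) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: "extract_images", arguments: args },
  };
}

const request = buildToolCall(1, withDefaults({ url: "https://example.com", minSize: 100 }));
console.log(JSON.stringify(request, null, 2));
```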
Example Request
{
  "url": "https://example.com",
  "useJavaScript": false,
  "includeDataUrls": false,
  "minSize": 100
}
Example Response
{
  "url": "https://example.com",
  "timestamp": "2024-01-07T12:00:00Z",
  "images": {
    "icons": [
      {
        "url": "https://example.com/logo.png",
        "alt": "Company Logo",
        "dimensions": { "width": 150, "height": 50 },
        "confidence": 0.95,
        "position": "header",
        "context": "Main navigation area"
      }
    ],
    "products": [
      {
        "url": "https://example.com/product1.jpg",
        "alt": "Product Image",
        "dimensions": { "width": 500, "height": 500 },
        "confidence": 0.88,
        "position": "main",
        "context": "Product gallery, near price $29.99"
      }
    ],
    "other": [
      {
        "url": "https://example.com/banner.jpg",
        "alt": "Hero Banner",
        "dimensions": { "width": 1200, "height": 400 },
        "confidence": 0.75,
        "position": "main",
        "context": "Hero section"
      }
    ]
  },
  "summary": {
    "total": 25,
    "icons": 5,
    "products": 10,
    "other": 10
  }
}
Classification Heuristics
The server uses multiple factors to classify images:
Icon Detection
- Small dimensions (< 200x200px)
- Located in header/navigation
- Filename contains: logo, icon, favicon, brand
- Alt text with company/brand names
- Meta favicon tags
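To make the icon signals above concrete, here is a minimal scoring sketch. The weights, the `IconCandidate` type, and the `iconScore` function are assumptions chosen for illustration; the README does not state how the package actually combines these signals. The brand-name check on alt text is left out because it would need a list of known brands.

```typescript
// Hypothetical candidate record; field names are illustrative only.
interface IconCandidate {
  url: string;
  width?: number;
  height?: number;
  position?: "header" | "main" | "footer";
  fromFaviconTag?: boolean; // true when found via a favicon or touch-icon tag
}

// Keywords taken from the list above.
const ICON_KEYWORDS = /logo|icon|favicon|brand/i;

// Returns a score in 0..1. The point weights (3/2/3/2) are assumed, not documented.
function iconScore(img: IconCandidate): number {
  let points = 0;
  if (img.width !== undefined && img.height !== undefined && img.width < 200 && img.height < 200) {
    points += 3; // small dimensions
  }
  if (img.position === "header") points += 2; // header or navigation area
  // Only inspect the last path segment so the domain name does not trigger a match.
  const filename = img.url.split("?")[0].split("/").pop() ?? "";
  if (ICON_KEYWORDS.test(filename)) points += 3;
  if (img.fromFaviconTag) points += 2;
  return points / 10;
}

console.log(iconScore({ url: "https://example.com/assets/logo.png", width: 150, height: 50, position: "header" })); // 0.8
```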
Product Detection
- Medium to large size (> 300x300px)
- Square aspect ratio
- Located near price/cart elements
- Product-related keywords in alt text
- E-commerce context patterns
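The product signals can be sketched the same way. Again, this is an illustration under stated assumptions: the `ProductCandidate` type, the keyword list, the 10% tolerance for "square", and the point weights are all invented here and are not taken from the package.

```typescript
// Hypothetical candidate record; field names are illustrative only.
interface ProductCandidate {
  width?: number;
  height?: number;
  alt?: string;
  nearPriceOrCart?: boolean; // set by a separate context check
}

// Assumed keyword list; the README does not enumerate product keywords.
const PRODUCT_KEYWORDS = /product|buy|shop|item/i;

// Returns a score in 0..1. The point weights (3/2/3/2) are assumed, not documented.
function productScore(img: ProductCandidate): number {
  let points = 0;
  const { width, height } = img;
  if (width !== undefined && height !== undefined) {
    if (width > 300 && height > 300) points += 3; // medium to large
    const ratio = width / height;
    if (ratio >= 0.9 && ratio <= 1.1) points += 2; // roughly square (assumed tolerance)
  }
  if (img.nearPriceOrCart) points += 3;
  if (img.alt !== undefined && PRODUCT_KEYWORDS.test(img.alt)) points += 2;
  return points / 10;
}

console.log(productScore({ width: 500, height: 500, alt: "Product Image", nearPriceOrCart: true })); // 1
console.log(productScore({ width: 1200, height: 400, alt: "Hero Banner" })); // 0.3
```

A wide banner still picks up the size points here, which is one reason a real classifier would likely compare the icon and product scores before falling back to "other".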
Context Analysis
- Examines surrounding HTML elements
- Checks for e-commerce patterns
- Analyzes parent container classes
- Detects proximity to price elements
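One way to picture the context checks above is a function that takes the text near an image and the class names of its parent container, then reports which signals are present. The patterns and the names `ContextSignals` and `analyzeContext` are assumptions for illustration, not the package's implementation.

```typescript
// Hypothetical result shape for the context checks.
interface ContextSignals {
  hasPrice: boolean;
  hasCartAction: boolean;
  ecommerceContainer: boolean;
}

// Assumed patterns: a currency symbol followed by digits, common cart phrases,
// and class-name fragments often seen on shop pages.
const PRICE_PATTERN = /[$€£]\s?\d+(?:[.,]\d{2})?/;
const CART_PATTERN = /add to (cart|basket)|buy now/i;
const ECOMMERCE_CLASS_PATTERN = /product|price|cart|shop/i;

function analyzeContext(surroundingText: string, containerClasses: string[]): ContextSignals {
  return {
    hasPrice: PRICE_PATTERN.test(surroundingText),
    hasCartAction: CART_PATTERN.test(surroundingText),
    ecommerceContainer: containerClasses.some((c) => ECOMMERCE_CLASS_PATTERN.test(c)),
  };
}

console.log(analyzeContext("Product gallery, near price $29.99", ["product-card"]));
// { hasPrice: true, hasCartAction: false, ecommerceContainer: true }
```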
Development
Project Structure
image-extractor/
├── src/
│   ├── index.ts       # MCP server entry point
│   ├── extractor.ts   # Core extraction logic
│   ├── classifier.ts  # Image classification
│   ├── utils.ts       # Helper functions
│   └── types.ts       # TypeScript types
├── build/             # Compiled JavaScript
├── package.json
└── tsconfig.json
Building
npm run build # Compile TypeScript
npm run dev # Watch mode
Testing
npm test # Run tests (when implemented)
Use Cases
- E-commerce Analysis: Extract product images from online stores
- Brand Monitoring: Collect logos and brand images from websites
- Content Aggregation: Gather images for content curation
- Web Scraping: Extract visual content for analysis
- SEO Auditing: Analyze image usage and optimization
Limitations
- Image dimension detection requires downloading image headers
- JavaScript mode is slower but more accurate for dynamic sites
- Classification accuracy depends on page structure and naming conventions
- Large pages with many images may take longer to process
- Puppeteer requires additional system dependencies for headless Chrome
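The first limitation above refers to reading dimensions from an image's header bytes. As an illustration of why only the start of the file is needed, the sketch below parses width and height from the first 24 bytes of a PNG, where the format stores them in the IHDR chunk. The function name `readPngDimensions` is hypothetical, other formats (JPEG, GIF, WebP) need their own parsers, and how the package itself obtains dimensions is not documented here.

```typescript
// Parse width and height from the leading bytes of a PNG file.
// Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type ("IHDR"),
// then big-endian 32-bit width (offset 16) and height (offset 20).
function readPngDimensions(bytes: Uint8Array): { width: number; height: number } | null {
  const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  if (bytes.length < 24) return null;
  if (!PNG_SIGNATURE.every((b, i) => bytes[i] === b)) return null;
  const chunkType = String.fromCharCode(bytes[12], bytes[13], bytes[14], bytes[15]);
  if (chunkType !== "IHDR") return null;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // DataView reads big-endian by default, which matches the PNG format.
  return { width: view.getUint32(16), height: view.getUint32(20) };
}

// Hand-built header for a 150x50 PNG (signature, IHDR length 13, type, width, height).
const pngHeader = new Uint8Array([
  0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a,
  0, 0, 0, 13,
  0x49, 0x48, 0x44, 0x52,
  0, 0, 0, 150,
  0, 0, 0, 50,
]);

console.log(readPngDimensions(pngHeader)); // { width: 150, height: 50 }
```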
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
MIT
