web-meta-scraper-mcp

v1.3.0

Published

4 months ago

MCP server for web-meta-scraper — extract metadata from URLs and HTML via Claude Code


cmg8431



An MCP (Model Context Protocol) server that wraps web-meta-scraper. Use URL metadata extraction tools directly from MCP clients like Claude Code, Claude Desktop, and Cursor.

Setup & Run

Run directly with npx — no installation required.

npx web-meta-scraper-mcp

MCP Client Configuration

Claude Code

claude mcp add web-meta-scraper -- npx -y web-meta-scraper-mcp

Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "web-meta-scraper": {
      "command": "npx",
      "args": ["-y", "web-meta-scraper-mcp"]
    }
  }
}

Cursor

Add to Cursor Settings > MCP:

{
  "mcpServers": {
    "web-meta-scraper": {
      "command": "npx",
      "args": ["-y", "web-meta-scraper-mcp"]
    }
  }
}

Available Tools

`scrape_url`

Extract metadata from a URL.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| url | string | Yes | The URL to scrape metadata from |

Example request:

{
  "name": "scrape_url",
  "arguments": {
    "url": "https://github.com"
  }
}

Example response:

{
  "metadata": {
    "title": "GitHub: Let's build from here",
    "description": "GitHub is where over 100 million developers shape the future of software.",
    "image": "https://github.githubassets.com/assets/social-card.png",
    "url": "https://github.com",
    "type": "website",
    "siteName": "GitHub"
  },
  "sources": {
    "meta-tags": { "title": "GitHub: Let's build from here", "..." : "..." },
    "open-graph": { "title": "GitHub: Let's build from here", "..." : "..." },
    "twitter": { "..." : "..." },
    "json-ld": { "..." : "..." }
  }
}
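The example request above is the `params` portion of an MCP `tools/call` message. As a minimal sketch, assuming the stdio transport with newline-delimited JSON-RPC 2.0 messages, a client could wrap it like this. `buildToolCall` is a hypothetical helper, not something exported by this package.

```typescript
// Shape of a JSON-RPC 2.0 request for MCP's `tools/call` method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper (not part of web-meta-scraper-mcp): wraps a tool name
// and its arguments in the JSON-RPC envelope.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const request = buildToolCall(3, "scrape_url", { url: "https://github.com" });

// Over stdio, each message is a single line of JSON terminated by a newline.
const wireLine = JSON.stringify(request) + "\n";
console.log(wireLine);
```

The `id` only needs to be unique among in-flight requests; the server echoes it back so the client can match the response.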

`scrape_html`

Extract metadata from a raw HTML string.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| html | string | Yes | The raw HTML string to extract metadata from |
| url | string | No | Base URL for resolving relative paths |
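To show why the optional `url` matters, here is a small sketch of resolving relative references against a base URL with the standard WHATWG `URL` class. `resolveAgainstBase` is a hypothetical name, and the server's own resolution logic may differ.

```typescript
// Resolve a possibly relative reference against an optional base URL.
// Hypothetical helper for illustration; not the package's implementation.
function resolveAgainstBase(ref: string, baseUrl?: string): string {
  if (!baseUrl) return ref; // without a base, a relative path cannot be resolved
  try {
    return new URL(ref, baseUrl).href;
  } catch {
    return ref; // leave unparsable values untouched
  }
}

const base = "https://example.com/blog/post";
console.log(resolveAgainstBase("/images/card.png", base)); // https://example.com/images/card.png
console.log(resolveAgainstBase("favicon.ico", base)); // https://example.com/blog/favicon.ico
console.log(resolveAgainstBase("/images/card.png")); // /images/card.png
```

Without a base URL, fields such as `image` or `favicon` taken from raw HTML can only be returned as written in the page.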

`batch_scrape`

Scrape metadata from multiple URLs concurrently.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| urls | string[] | Yes | List of URLs to scrape |
| concurrency | number | No | Number of concurrent requests (default: 5, max: 20) |
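The `concurrency` limit can be pictured as a small worker pool. The sketch below is one common way to cap in-flight requests, not necessarily how the server does it. The default of 5 and the maximum of 20 come from the table above; the lower bound of 1 and both function names are assumptions.

```typescript
// Clamp the requested concurrency to the documented range (default 5, max 20).
// The minimum of 1 is an assumption made for this sketch.
function clampConcurrency(requested?: number): number {
  if (requested === undefined || !Number.isFinite(requested)) return 5;
  return Math.min(20, Math.max(1, Math.floor(requested)));
}

// Run `worker` over `items` with at most `limit` calls in flight.
// Results are returned in input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function run(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const runners: Promise<void>[] = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) runners.push(run());
  await Promise.all(runners);
  return results;
}

// Usage with a stub worker standing in for a real scrape call.
const urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"];
mapWithConcurrency(urls, clampConcurrency(2), async (url) => ({ url, ok: true })).then(
  (results) => console.log(results.length),
);
```

A pool like this keeps the number of open connections bounded regardless of how many URLs are passed in.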

`detect_feeds`

Detect RSS and Atom feed links from a web page.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| url | string | No | URL to detect feeds from |
| html | string | No | Raw HTML string to detect feeds from |
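Feed discovery conventionally relies on `<link rel="alternate">` tags with an RSS or Atom MIME type. The following is a simplified, regex-based sketch of that idea; the library most likely uses a real HTML parser, and `detectFeeds`, `readAttr`, and `FeedLink` are names invented for this example.

```typescript
interface FeedLink {
  type: "rss" | "atom";
  href: string;
  title?: string;
}

// Read a quoted attribute value from a single tag string.
function readAttr(tag: string, name: string): string | undefined {
  const match = new RegExp(`\\b${name}\\s*=\\s*(?:"([^"]*)"|'([^']*)')`, "i").exec(tag);
  return match ? match[1] ?? match[2] : undefined;
}

// Collect <link rel="alternate"> tags that advertise an RSS or Atom feed.
function detectFeeds(html: string): FeedLink[] {
  const feeds: FeedLink[] = [];
  for (const tag of html.match(/<link\b[^>]*>/gi) ?? []) {
    const rel = (readAttr(tag, "rel") ?? "").toLowerCase().split(/\s+/);
    const type = (readAttr(tag, "type") ?? "").toLowerCase();
    const href = readAttr(tag, "href");
    if (rel.indexOf("alternate") === -1 || !href) continue;
    if (type === "application/rss+xml") {
      feeds.push({ type: "rss", href, title: readAttr(tag, "title") });
    } else if (type === "application/atom+xml") {
      feeds.push({ type: "atom", href, title: readAttr(tag, "title") });
    }
  }
  return feeds;
}

const sampleHtml =
  '<link rel="alternate" type="application/rss+xml" title="Blog" href="/feed.xml">' +
  '<link rel="stylesheet" href="/site.css">';
console.log(detectFeeds(sampleHtml));
```

A regex is enough for a sketch, but it will miss unquoted attributes and other edge cases that a proper parser handles.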

`check_robots`

Check robots meta tag directives and indexing status.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| url | string | No | URL to check robots directives from |
| html | string | No | Raw HTML string to check robots directives from |
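As an illustration of what "directives and indexing status" can mean, this sketch turns the `content` value of a `<meta name="robots">` tag into flags. The result shape and the function name are assumptions; the tool's actual output format is not documented here.

```typescript
interface RobotsStatus {
  directives: string[];
  indexable: boolean;
  followable: boolean;
}

// Parse the comma-separated content of a robots meta tag.
// Illustrative only; the field names are not taken from the package.
function parseRobotsContent(content: string): RobotsStatus {
  const directives = content
    .split(",")
    .map((d) => d.trim().toLowerCase())
    .filter((d) => d.length > 0);
  const has = (name: string): boolean => directives.indexOf(name) !== -1;
  return {
    directives,
    // "none" is shorthand for "noindex, nofollow".
    indexable: !(has("noindex") || has("none")),
    followable: !(has("nofollow") || has("none")),
  };
}

console.log(parseRobotsContent("noindex, follow"));
// { directives: [ 'noindex', 'follow' ], indexable: false, followable: true }
```

A page without a robots meta tag is treated as indexable and followable, which matches passing an empty string here.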

`validate_metadata`

Validate metadata quality and generate an SEO score report (100-point scale).

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| url | string | No | URL to validate metadata from |
| html | string | No | Raw HTML string to validate metadata from |
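The scoring rules themselves are not listed in this README. Purely as an illustration of how a 100-point report could be assembled, the sketch below uses made-up rules and weights; none of them are taken from the package.

```typescript
type Metadata = Record<string, string | undefined>;

interface Rule {
  label: string;
  points: number;
  passes: (m: Metadata) => boolean;
}

// Hypothetical rules and weights that sum to 100. The real validator's checks
// and weights are not documented here and are very likely different.
const rules: Rule[] = [
  { label: "title present", points: 30, passes: (m) => !!m.title },
  { label: "description present", points: 30, passes: (m) => !!m.description },
  { label: "image present", points: 20, passes: (m) => !!m.image },
  { label: "canonical URL present", points: 20, passes: (m) => !!m.canonicalUrl },
];

// Sum the points of passing rules and list the labels of failing ones.
function scoreMetadata(m: Metadata): { score: number; failed: string[] } {
  let score = 0;
  const failed: string[] = [];
  for (const rule of rules) {
    if (rule.passes(m)) score += rule.points;
    else failed.push(rule.label);
  }
  return { score, failed };
}

console.log(scoreMetadata({ title: "GitHub", description: "Where developers build." }));
// { score: 60, failed: [ 'image present', 'canonical URL present' ] }
```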

`extract_content`

Extract main text content from a web page (removes navigation, ads, sidebars).

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| url | string | No | URL to extract content from |
| html | string | No | Raw HTML string to extract content from |

Extracted Metadata

The server automatically extracts and merges metadata from all built-in plugins using priority-based rules:

| Plugin | Fields |
|--------|--------|
| Meta Tags | title, description, keywords, author, favicon, canonicalUrl |
| Open Graph | og:title, og:description, og:image, og:url, og:type, og:site_name, og:locale |
| Twitter Cards | twitter:title, twitter:description, twitter:image, twitter:card, twitter:site, twitter:creator |
| JSON-LD | Structured data (Article, Product, Organization, etc.) |
| Favicons | All icon links with sizes and type |
| Feeds | RSS and Atom feed links with title |
| Robots | Robots meta directives and indexability flags |
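Priority-based merging can be sketched as "first non-empty value wins" over an ordered list of sources. The priority order below is an assumption chosen for illustration, the keys are assumed to be already normalized (as in the `sources` object of the `scrape_url` example response), and `mergeByPriority` is a hypothetical name.

```typescript
type Fields = Record<string, string | undefined>;

// Merge per-source fields; for each key, the first non-empty value found
// while walking `priority` from highest to lowest wins.
function mergeByPriority(sources: Record<string, Fields>, priority: string[]): Fields {
  const merged: Fields = {};
  for (const name of priority) {
    const fields = sources[name];
    if (!fields) continue; // source produced nothing for this page
    for (const key of Object.keys(fields)) {
      const value = fields[key];
      if (merged[key] === undefined && value !== undefined && value !== "") {
        merged[key] = value;
      }
    }
  }
  return merged;
}

const merged = mergeByPriority(
  {
    "meta-tags": { title: "Meta title", description: "Meta description" },
    "open-graph": { title: "OG title", image: "https://example.com/og.png" },
  },
  ["open-graph", "twitter", "meta-tags"], // assumed order, not the library's documented one
);
console.log(merged);
// { title: 'OG title', image: 'https://example.com/og.png', description: 'Meta description' }
```

Lower-priority sources only fill gaps, so a page with sparse Open Graph tags still gets a description from its plain meta tags.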

Local Development

# Install dependencies
pnpm install

# Build
pnpm --filter web-meta-scraper-mcp build

# List tools
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"1.0.0"}}}
{"jsonrpc":"2.0","id":2,"method":"tools/list"}' | node mcp/dist/index.js

# Test with MCP Inspector
npx @modelcontextprotocol/inspector node mcp/dist/index.js

License

MIT
