agentpay-web-scraper-mcp

v1.1.0

Published

8 days ago

Web Scraper MCP Server — Scrape any web page with CSS selectors, AgentPay Pro upsell

0High
0Medium
0Low

rumblingb

mcp web-scraper agentpay ai model-context-protocol

Web Scraper MCP Server

A lightweight, dependency-minimal web scraping server implementing the Model Context Protocol (MCP). Uses httpx for HTTP requests and regex for HTML parsing — no BeautifulSoup, no lxml, no heavy dependencies.

$19/month — Subscribe at stripe.com

Tools

1. `scrape_url(url, selector?)`

Fetch a single URL and extract structured content:

Title — content of the <title> tag
Meta description — content of <meta name="description">
Text content — all visible text with HTML tags stripped
Links — all <a href="..."> with absolute URLs and link text
Images — all <img src="..."> with absolute URLs and alt text

If a CSS selector is provided, returns matching elements with their text, attributes, nested links, and nested images instead.

Supported selector patterns:

| Pattern | Example | Matches | |---------|---------|---------| | Tag | h1, div, p | All elements with that tag | | Class | .content, .article | Elements with the given class | | ID | #main, #header | Element with the given ID | | Attribute | [href], [data-type] | Elements with the attribute | | Attr = value | [type="submit"] | Elements with exact attribute value | | Combined | div.content#main | All of the above together |

Examples:

# Get everything
scrape_url("https://example.com")

# Extract specific elements
scrape_url("https://example.com", "h1")
scrape_url("https://example.com", ".article-body")
scrape_url("https://example.com", "div.content a")

2. `scrape_urls(urls[])`

Batch scrape multiple URLs concurrently. Returns title, text snippet (first 500 chars), link count, and image count for each URL.

Example:

scrape_urls(["https://example.com", "https://example.org"])

3. `extract_emails(url)`

Extract all email addresses from a web page. Scans both visible text and mailto: links. Deduplicates results.

Example:

extract_emails("https://example.com/contact")

4. `extract_links(url, domain_filter?)`

Extract all hyperlinks from a page. Optionally filter by domain (substring match on hostname).

Examples:

# All links
extract_links("https://example.com")

# Only links pointing to a specific domain
extract_links("https://example.com", "docs.python.org")

Installation

# Clone the repository
git clone https://github.com/nousresearch/web-scraper-mcp.git
cd web-scraper-mcp

# Install dependencies
pip install -r requirements.txt

Usage

As a standalone MCP server

python server.py

This starts a stdio-based MCP server. Connect it to any MCP client (Claude Desktop, VS Code extension, etc.).

With Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "web-scraper": {
      "command": "python",
      "args": ["/path/to/web-scraper-mcp/server.py"]
    }
  }
}

With VS Code (Cline / Roo Code)

Add to your MCP settings:

{
  "mcpServers": {
    "web-scraper": {
      "command": "python",
      "args": ["/path/to/web-scraper-mcp/server.py"]
    }
  }
}

Deployment

This server is compatible with Smithery.ai. Deploy with one click or via the provided smithery.yaml.

Dependencies

Minimal and intentionally slim:

| Package | Version | Purpose | |---------|---------|---------| | mcp | >= 1.0.0 | Model Context Protocol SDK | | httpx | >= 0.27.0 | Async HTTP client with connection pooling |

Pricing

$19/month — Subscribe on Stripe

License

MIT

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme