crawl4ai-mcp-sse-stdio

v1.2.1

Published

4 months ago

MCP (Model Context Protocol) server for Crawl4AI - Universal web crawling and data extraction. Supports STDIO, SSE, and HTTP transports.

Downloads

🕷️ Crawl4AI MCP Server

MCP (Model Context Protocol) server for Crawl4AI - Universal web crawling and data extraction for AI agents.

Integrate powerful web scraping capabilities into Claude, ChatGPT, and any MCP-compatible AI assistant.

📑 Table of Contents

🚀 Quick Start

NPM Installation (Recommended)

# Install globally
npm install -g crawl4ai-mcp-sse-stdio

# Run in different modes
npx crawl4ai-mcp --stdio --endpoint https://your-crawl4ai-server.com
npx crawl4ai-mcp --sse --port 3001 --endpoint https://your-crawl4ai-server.com
npx crawl4ai-mcp --http --port 3000 --endpoint https://your-crawl4ai-server.com

# With optional bearer token
npx crawl4ai-mcp --stdio --endpoint https://your-crawl4ai-server.com --bearer-token your-token

With Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "crawl4ai": {
      "command": "npx",
      "args": [
        "crawl4ai-mcp-sse-stdio",
        "--stdio",
        "--endpoint", "https://your-crawl4ai-server.com",
        "--bearer-token", "your-optional-token"
      ]
    }
  }
}

🐳 Docker Usage

Docker Hub

Official Docker image available on Docker Hub:

# Pull and run the official Docker image
docker pull stgmt/crawl4ai-mcp:latest
docker run -p 3000:3000 stgmt/crawl4ai-mcp:latest

Docker Compose

Create a docker-compose.yml:

version: '3.8'
services:
  crawl4ai-mcp:
    image: stgmt/crawl4ai-mcp:latest
    ports:
      - "3000:3000"
    environment:
      - CRAWL4AI_ENDPOINT=https://your-crawl4ai-server.com
      - CRAWL4AI_BEARER_TOKEN=your-optional-token
    restart: unless-stopped

Run with:

docker-compose up -d

🛠️ Available Tools

1. `crawl` - Full Web Crawling

Extract complete content from any webpage.

{
  "name": "crawl",
  "arguments": {
    "url": "https://example.com",
    "wait_for": "css:.content",
    "timeout": 30000
  }
}

2. `md` - Markdown Extraction

Get clean markdown content from webpages.

{
  "name": "md", 
  "arguments": {
    "url": "https://docs.example.com",
    "clean": true
  }
}

3. `html` - Raw HTML

Retrieve raw HTML content.

{
  "name": "html",
  "arguments": {
    "url": "https://example.com"
  }
}

4. `screenshot` - Visual Capture

Take screenshots of webpages.

{
  "name": "screenshot",
  "arguments": {
    "url": "https://example.com",
    "full_page": true
  }
}

5. `pdf` - PDF Generation

Convert webpages to PDF.

{
  "name": "pdf",
  "arguments": {
    "url": "https://example.com",
    "format": "A4"
  }
}

6. `execute_js` - JavaScript Execution

Execute JavaScript on webpages.

{
  "name": "execute_js",
  "arguments": {
    "url": "https://example.com",
    "script": "document.title"
  }
}

⚙️ Configuration

Environment Variables

# REQUIRED: Crawl4AI endpoint URL
export CRAWL4AI_ENDPOINT="https://your-crawl4ai-server.com"

# OPTIONAL: Bearer authentication token
export CRAWL4AI_BEARER_TOKEN="your-api-token"

Command Line Options

crawl4ai-mcp --help

Options:
  --stdio              Run in STDIO mode for MCP clients
  --sse                Run in SSE mode for web interfaces
  --http               Run in HTTP mode
  --endpoint ENDPOINT  Crawl4AI API endpoint URL (REQUIRED)
  --bearer-token TOKEN Bearer authentication token (OPTIONAL)
  --port PORT          HTTP server port (default: 3000)
  --sse-port PORT      SSE server port (default: 9001)
  --version, -v        Show version

Basic Commands

# HTTP mode (recommended for testing)
crawl4ai-mcp --http --port 3000 --endpoint https://your-crawl4ai-server.com

# SSE mode (Server-Sent Events)
crawl4ai-mcp --sse --port 3001 --endpoint https://your-crawl4ai-server.com

# STDIO mode (for MCP clients)
crawl4ai-mcp --stdio --endpoint https://your-crawl4ai-server.com

# With optional bearer token
crawl4ai-mcp --http --port 3000 --endpoint https://your-crawl4ai-server.com --bearer-token your-token

🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

📄 License

MIT License - see LICENSE file for details.

🔗 Links

NPM Package: https://www.npmjs.com/package/crawl4ai-mcp-sse-stdio
GitHub Repository: https://github.com/stgmt/crawl4ai-mcp
Docker Hub: https://hub.docker.com/r/stgmt/crawl4ai-mcp

Made with ❤️ for the AI community

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme