@willbohn/spider-mcp

v1.1.0

Model Context Protocol server for Spider Cloud web scraping and crawling API - LinkedIn profiles, anti-bot bypass, and more

Spider Cloud MCP Server

A high-performance Model Context Protocol (MCP) server that provides comprehensive web scraping, crawling, and data extraction capabilities through the Spider Cloud API. This server enables AI assistants like Claude to interact with web content using Spider Cloud's advanced scraping infrastructure.

🌟 Features

Core Tools

  • spider_scrape - Advanced single-page scraping with JavaScript rendering and anti-bot bypass
  • spider_crawl - Intelligent website crawling with depth control and filtering
  • spider_search - Google-like web search with content fetching capabilities
  • spider_links - Comprehensive link extraction and analysis
  • spider_screenshot - High-quality webpage screenshots with customization
  • spider_transform - HTML to markdown/text conversion with readability processing

Advanced Capabilities

  • 🛡️ Anti-bot Detection Bypass - Stealth mode and advanced evasion techniques
  • 🌐 Premium Proxy Support - Geographic targeting with country-specific proxies
  • 🎭 JavaScript Rendering - Full browser emulation for dynamic content
  • 📊 Metadata Extraction - Comprehensive page metadata and analytics
  • 🔍 CSS Selectors - Precise content targeting and extraction
  • 💾 Cloud Storage - Optional data persistence in Spider Cloud
  • ⚡ High Performance - Optimized for speed with configurable timeouts
  • 🔒 Secure Authentication - Bearer token authentication with API key
  • 📈 Cost Tracking - Real-time API usage cost monitoring
  • 🐛 Debug Mode - Comprehensive logging for troubleshooting

📋 Prerequisites

  • Node.js and npm
  • A Spider Cloud API key (sign up at spider.cloud)

🚀 Quick Start

Option 1: Direct from GitHub (Recommended)

# Clone and install
git clone https://github.com/spider-rs/spider-mcp.git
cd spider-mcp
npm install
npm link

# Test the installation
SPIDER_API_KEY=your_key node test.js

Option 2: Direct Path Configuration

Skip the global npm link and point your MCP client configuration directly at the built files (see "For Direct Path" under Configuration below).

⚙️ Configuration

Claude Desktop Setup

Add to your Claude Desktop configuration file:

Windows: %APPDATA%\Claude\claude_desktop_config.json
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

For Global Installation:

{
  "mcpServers": {
    "spider": {
      "command": "spider-mcp",
      "env": {
        "SPIDER_API_KEY": "your_spider_api_key_here"
      }
    }
  }
}

For Direct Path:

{
  "mcpServers": {
    "spider": {
      "command": "node",
      "args": ["C:\\path\\to\\spider-mcp\\dist\\index.js"],
      "env": {
        "SPIDER_API_KEY": "your_spider_api_key_here"
      }
    }
  }
}

Environment Variables

| Variable | Required | Description | Default |
|----------|----------|-------------|---------|
| SPIDER_API_KEY | Yes | Your Spider Cloud API key | - |
| SPIDER_API_BASE_URL | No | API endpoint URL | https://api.spider.cloud |
| SPIDER_REQUEST_TIMEOUT | No | Request timeout in milliseconds | 60000 |
| DEBUG | No | Enable debug logging | false |

🛠️ Tool Documentation

spider_scrape

Scrape content from a single URL with advanced options.

Parameters:

  • url (required): Target URL to scrape
  • return_format: Output format (markdown, raw, text, html, screenshot, links)
  • js: Enable JavaScript rendering
  • wait_for: Wait time for page load (0-60000ms)
  • css_selector: CSS selector for specific content
  • proxy_enabled: Use premium proxy
  • proxy_country: Two-letter country code
  • stealth: Enable stealth mode
  • anti_bot: Advanced anti-bot bypass
  • headers: Custom HTTP headers
  • cookies: Cookie string
  • metadata: Include metadata
  • clean_html: Clean and sanitize HTML
  • media: Include media elements

Example:

{
  "url": "https://example.com",
  "return_format": "markdown",
  "js": true,
  "stealth": true,
  "css_selector": ".main-content"
}

spider_crawl

Crawl an entire website with intelligent navigation.

Parameters:

  • url (required): Starting URL
  • limit: Max pages to crawl (1-10000)
  • depth: Max crawl depth (0-10)
  • return_format: Output format
  • whitelist: URL patterns to include
  • blacklist: URL patterns to exclude
  • budget: Crawl budget configuration
  • subdomains: Include subdomains
  • sitemap: Use sitemap.xml
  • respect_robots: Respect robots.txt
  • Plus all proxy and rendering options from scrape

Example:

{
  "url": "https://docs.example.com",
  "limit": 50,
  "depth": 3,
  "whitelist": ["*/api/*"],
  "return_format": "markdown"
}

spider_search

Search the web with Google-like results.

Parameters:

  • query (required): Search query
  • search_limit: Max results (1-100)
  • fetch_page_content: Fetch full content
  • tbs: Time-based search (qdr:d, qdr:w, qdr:m, qdr:y)
  • gl: Country code (e.g., us, uk)
  • hl: Language code (e.g., en, es)
  • safe: SafeSearch level (off, medium, high)
  • Plus content fetching options

Example:

{
  "query": "artificial intelligence news",
  "search_limit": 10,
  "tbs": "qdr:w",
  "gl": "us",
  "fetch_page_content": true
}

spider_links

Extract and analyze links from a webpage.

Parameters:

  • url (required): Target URL
  • limit: Max links (1-5000)
  • depth: Extraction depth (0-5)
  • unique: Return only unique links
  • subdomains: Include subdomain links
  • external: Include external links
  • Plus standard options
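
Example (illustrative values, using the parameters listed above):

```json
{
  "url": "https://example.com",
  "limit": 100,
  "unique": true,
  "external": false
}
```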

spider_screenshot

Capture webpage screenshots.

Parameters:

  • url (required): Target URL
  • fullpage: Full page screenshot
  • viewport_width: Width in pixels (320-3840)
  • viewport_height: Height in pixels (240-2160)
  • format: Image format (png, jpeg, webp)
  • quality: JPEG/WebP quality (0-100)
  • omit_background: Transparent background (PNG only)
  • clip: Region to capture
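
Example (illustrative values, using the parameters listed above):

```json
{
  "url": "https://example.com",
  "fullpage": true,
  "viewport_width": 1280,
  "viewport_height": 800,
  "format": "png"
}
```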

spider_transform

Transform HTML to clean, readable formats.

Parameters:

  • data (required): HTML/text to transform
  • return_format (required): Target format (markdown, text, raw, clean_html)
  • readability: Apply readability processing
  • clean: Remove unnecessary elements
  • include_links: Include hyperlinks
  • include_images: Include images
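
Example (illustrative values, using the parameters listed above):

```json
{
  "data": "<html><body><h1>Title</h1><p>Hello world</p></body></html>",
  "return_format": "markdown",
  "readability": true
}
```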

🧪 Testing

Run the comprehensive test suite:

# Set your API key
export SPIDER_API_KEY=your_api_key_here

# Run tests
node test.js

# With debug output
DEBUG=true node test.js

📊 API Response Format

All tools return responses in a consistent format:

{
  "success": true,
  "results": [...],
  "count": 10,
  "costs": {
    "total_cost": 0.00012,
    "compute_cost": 0.00008,
    "bandwidth_cost": 0.00004
  },
  "metadata": {
    "duration": 1234,
    "status": 200
  }
}
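
For illustration, a small TypeScript sketch of how a client might read the cost fields defensively. The interface below is inferred from the JSON shape above; it is not an exported type from this package.

```typescript
// Inferred shape of a Spider Cloud tool response; field names mirror the
// JSON example above. Illustrative only, not the package's actual types.
interface SpiderResponse {
  success: boolean;
  results: unknown[];
  count: number;
  costs?: {
    total_cost: number;
    compute_cost: number;
    bandwidth_cost: number;
  };
  metadata?: { duration: number; status: number };
}

// Read the total cost, treating a missing costs block as zero.
function totalCost(res: SpiderResponse): number {
  return res.costs?.total_cost ?? 0;
}

const example: SpiderResponse = {
  success: true,
  results: [],
  count: 10,
  costs: { total_cost: 0.00012, compute_cost: 0.00008, bandwidth_cost: 0.00004 },
  metadata: { duration: 1234, status: 200 },
};

console.log(totalCost(example)); // 0.00012
```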

🔧 Development

Building from Source

npm install
npm run build

Running in Development Mode

npm run dev

Project Structure

spider-mcp/
├── src/
│   └── index.ts        # Main server implementation
├── dist/               # Compiled JavaScript
├── examples/           # Configuration examples
├── package.json        # Dependencies and scripts
├── tsconfig.json       # TypeScript configuration
└── README.md          # This file

🐛 Troubleshooting

Common Issues

"SPIDER_API_KEY environment variable is required"

  • Ensure your API key is set in the environment or configuration
  • Check the key is valid at spider.cloud

"Payment required" error

  • Your Spider Cloud account has run out of credits
  • Add credits at spider.cloud to continue

"Rate limit exceeded"

  • You've hit the API rate limit
  • Wait a few minutes or upgrade your plan

Search tool timeout

  • Search operations can take 15-30 seconds
  • This is normal behavior for comprehensive searches

Debug Mode

Enable detailed logging:

DEBUG=true SPIDER_API_KEY=your_key node dist/index.js

📝 Error Handling

The server provides detailed error messages:

  • 401: Invalid API key
  • 402: Payment required (add credits)
  • 429: Rate limit exceeded
  • 500+: Server errors (contact support)
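
The status codes above could be mapped to hints with a small helper like this (an illustrative TypeScript sketch, not the server's actual error-handling code):

```typescript
// Map an HTTP status code from the Spider Cloud API to a human-readable
// hint, mirroring the error list above. Illustrative only.
function describeError(status: number): string {
  if (status === 401) return "Invalid API key";
  if (status === 402) return "Payment required (add credits)";
  if (status === 429) return "Rate limit exceeded";
  if (status >= 500) return "Server error (contact support)";
  return `Unhandled status ${status}`;
}

console.log(describeError(429)); // Rate limit exceeded
```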

🔒 Security

  • API keys are never logged or stored
  • All requests use HTTPS
  • Bearer token authentication
  • Input validation on all parameters
  • Sanitized error messages

📈 Performance

  • Configurable timeouts (default: 60s)
  • Automatic retry logic for transient failures
  • Connection pooling for efficiency
  • Response caching at API level
  • Optimized for concurrent requests

🤝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Open a Pull Request

📄 License

MIT License - see LICENSE file for details

🔗 Resources

💬 Support


Built with ❤️ for the MCP ecosystem