@jharding_npm/mcp-server-steel-scraper

v1.0.2

MCP server that wraps steel-dev API for web scraping

MCP Server Steel Scraper

A simple Model Context Protocol (MCP) server that wraps the steel-dev API for visiting websites with browser automation.

Quick Start

  1. Install the package:

    npm install -g @jharding_npm/mcp-server-steel-scraper
  2. Add to your MCP client configuration:

    {
      "mcpServers": {
        "steel-scraper": {
          "command": "npx",
          "args": ["@jharding_npm/mcp-server-steel-scraper"],
          "env": {
            "STEEL_API_URL": "http://localhost:3000"
          }
        }
      }
    }
  3. Start using the visit_with_browser tool in your MCP client!

Features

  • Single Tool: visit_with_browser - Visit websites using steel-dev API
  • Flexible Return Types: HTML, markdown, readability, or cleaned HTML
  • Local/Remote Support: Works with local or remote steel-dev instances
  • Browser Automation: Screenshot capture, PDF generation, proxy support
  • Smart Length Management: Single maxLength parameter with intelligent defaults and automatic content/metadata split
  • Clean Output by Default: Minimal metadata output perfect for 7B models and summarization
  • Verbose Mode: Optional full metadata when detailed information is needed
  • TypeScript: Fully typed implementation

Installation

Option 1: NPM Package (Recommended)

Install the package globally to make the mcp-server-steel-scraper command available:

npm install -g @jharding_npm/mcp-server-steel-scraper

Or use it directly with npx without installing:

npx @jharding_npm/mcp-server-steel-scraper

Option 2: Local Development

  1. Clone this repository:

    git clone <repository-url>
    cd mcp-server-steel-scraper
  2. Install dependencies:

    npm install
  3. Build the project:

    npm run build

Configuration

The server uses environment variables for configuration:

  • STEEL_API_URL: The steel-dev API endpoint (default: http://localhost:3000)
  • STEEL_TIMEOUT: Request timeout in milliseconds (default: 30000)
  • STEEL_RETRIES: Number of retry attempts (default: 3)

Copy env.example to .env and modify as needed:

cp env.example .env
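
As a rough illustration of how these three variables and their defaults could be read, here is a minimal sketch. The SteelConfig shape and the loadConfig and parseIntOr helpers are hypothetical names chosen for this example, not the contents of src/config.ts.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface SteelConfig {
  apiUrl: string;
  timeoutMs: number;
  retries: number;
}

// Parse a non-negative integer, falling back to the default when the
// variable is unset, empty, or not a valid integer.
function parseIntOr(value: string | undefined, fallback: number): number {
  if (value === undefined || value.trim() === "") return fallback;
  const parsed = Number(value);
  return Number.isInteger(parsed) && parsed >= 0 ? parsed : fallback;
}

// Build the config from an environment-like object, using the documented defaults.
function loadConfig(env: Record<string, string | undefined>): SteelConfig {
  return {
    apiUrl: env["STEEL_API_URL"] || "http://localhost:3000",
    timeoutMs: parseIntOr(env["STEEL_TIMEOUT"], 30000),
    retries: parseIntOr(env["STEEL_RETRIES"], 3),
  };
}

// In Node.js the typical call would be loadConfig(process.env).
const exampleConfig = loadConfig({ STEEL_TIMEOUT: "60000" });
console.log(exampleConfig);
```

Passing the environment in as an argument keeps the function easy to test. Falling back to the default on an invalid value is a design choice of this sketch; the real server may instead fail fast.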

Usage

Running the Server

# Development mode
npm run dev

# Production mode
npm start

MCP Client Configuration

Add this server to your MCP client configuration. Here are examples for popular LLM clients:

For Claude Desktop / Cline / Other MCP Clients (NPM Package)

{
  "mcpServers": {
    "steel-scraper": {
      "command": "npx",
      "args": ["@jharding_npm/mcp-server-steel-scraper"],
      "env": {
        "STEEL_API_URL": "http://localhost:3000"
      }
    }
  }
}

For Continue.dev (NPM Package)

{
  "mcpServers": {
    "steel-scraper": {
      "command": "npx",
      "args": ["@jharding_npm/mcp-server-steel-scraper"],
      "env": {
        "STEEL_API_URL": "http://localhost:3000"
      }
    }
  }
}

For Cursor IDE (NPM Package)

{
  "mcpServers": {
    "steel-scraper": {
      "command": "npx",
      "args": ["@jharding_npm/mcp-server-steel-scraper"],
      "env": {
        "STEEL_API_URL": "http://localhost:3000"
      }
    }
  }
}

For Remote Steel-dev Instance (NPM Package)

{
  "mcpServers": {
    "steel-scraper": {
      "command": "npx",
      "args": ["@jharding_npm/mcp-server-steel-scraper"],
      "env": {
        "STEEL_API_URL": "https://your-steel-dev-instance.com"
      }
    }
  }
}

Alternative: Using Global Installation

If you've installed the package globally with npm install -g @jharding_npm/mcp-server-steel-scraper, you can use:

{
  "mcpServers": {
    "steel-scraper": {
      "command": "mcp-server-steel-scraper",
      "env": {
        "STEEL_API_URL": "http://localhost:3000"
      }
    }
  }
}

For Local Development (using absolute path)

{
  "mcpServers": {
    "steel-scraper": {
      "command": "node",
      "args": ["/path/to/mcp-server-steel-scraper/dist/index.js"],
      "env": {
        "STEEL_API_URL": "http://localhost:3000"
      }
    }
  }
}

Tool Usage

The server provides one tool: visit_with_browser

Parameters

  • url (required): The URL to visit
  • format (optional): Content formats to extract - ["html"] for raw HTML source (may be very large), ["markdown"] for clean formatted text converted from HTML (recommended for reading), ["readability"] for Mozilla Readability format, ["cleaned_html"] for cleaned HTML. You can request multiple formats (default: ["markdown"])
  • screenshot (optional): Take a screenshot of the page (returns base64 encoded image) (default: false)
  • pdf (optional): Generate a PDF of the page (returns base64 encoded PDF) (default: false)
  • proxyUrl (optional): Proxy URL to use for the request (e.g., "http://proxy:port")
  • delay (optional): Delay in seconds to wait after page load before scraping (default: 0)
  • logUrl (optional): URL to send logs to for debugging purposes
  • maxLength (optional): Maximum characters to return. Smart defaults: markdown=8000, readability=10000, html=15000, cleaned_html=12000. For markdown, automatically reserves space for metadata
  • verboseMode (optional): Return full metadata instead of clean content-focused output (default: false). Use when you need detailed visit information
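
The parameter list above can be written down as a TypeScript type. The sketch below is an illustration only: the VisitArguments interface and the validateVisitArguments and withDefaults helpers are hypothetical names, and the specific checks (http(s) URL, non-negative delay, positive maxLength) are assumptions about what sensible validation might look like, not the server's actual schema.

```typescript
type ContentFormat = "html" | "markdown" | "readability" | "cleaned_html";

// Mirrors the documented parameters; only url is required.
interface VisitArguments {
  url: string;
  format?: ContentFormat[];
  screenshot?: boolean;
  pdf?: boolean;
  proxyUrl?: string;
  delay?: number; // seconds
  logUrl?: string;
  maxLength?: number; // characters
  verboseMode?: boolean;
}

const KNOWN_FORMATS: readonly string[] = ["html", "markdown", "readability", "cleaned_html"];

// Returns a list of problems; an empty list means the arguments look valid.
function validateVisitArguments(args: VisitArguments): string[] {
  const problems: string[] = [];
  if (!/^https?:\/\/\S+$/i.test(args.url)) {
    problems.push("url must be an absolute http(s) URL");
  }
  for (const format of args.format ?? []) {
    if (KNOWN_FORMATS.indexOf(format) === -1) problems.push(`unknown format: ${format}`);
  }
  if (args.delay !== undefined && args.delay < 0) problems.push("delay must be >= 0");
  if (args.maxLength !== undefined && args.maxLength <= 0) problems.push("maxLength must be > 0");
  return problems;
}

// Fill in the documented defaults; maxLength is left unset because its
// default depends on the format.
function withDefaults(args: VisitArguments) {
  return {
    ...args,
    format: args.format ?? (["markdown"] as ContentFormat[]),
    screenshot: args.screenshot ?? false,
    pdf: args.pdf ?? false,
    delay: args.delay ?? 0,
    verboseMode: args.verboseMode ?? false,
  };
}

console.log(validateVisitArguments({ url: "example.com", delay: -1 }));
console.log(withDefaults({ url: "https://example.com" }));
```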

Example Usage

// Basic website visit
{
  "tool": "visit_with_browser",
  "arguments": {
    "url": "https://example.com"
  }
}

// Advanced visit with multiple formats
{
  "tool": "visit_with_browser",
  "arguments": {
    "url": "https://example.com",
    "format": ["markdown", "html"],
    "screenshot": true,
    "delay": 2
  }
}

// Simple visit with smart defaults (perfect for 7B models)
{
  "tool": "visit_with_browser",
  "arguments": {
    "url": "https://example.com",
    "format": ["markdown"]
  }
}

// Custom length limit (automatically handles content vs metadata split)
{
  "tool": "visit_with_browser",
  "arguments": {
    "url": "https://en.wikipedia.org/wiki/Long_Article",
    "format": ["markdown"],
    "maxLength": 5000
  }
}

// Verbose mode when you need detailed visit information
{
  "tool": "visit_with_browser",
  "arguments": {
    "url": "https://example.com",
    "format": ["markdown"],
    "maxLength": 8000,
    "verboseMode": true
  }
}

// With proxy and PDF generation
{
  "tool": "visit_with_browser",
  "arguments": {
    "url": "https://example.com",
    "format": ["readability"],
    "pdf": true,
    "proxyUrl": "http://proxy:8080"
  }
}

Smart Length Management

The server automatically handles content length optimization:

  • Unified Length Control: Single maxLength parameter handles both content and metadata
  • Automatic Content/Metadata Split: For markdown, reserves 10% for metadata, uses 90% for content
  • Smart Defaults: Reasonable defaults when no length is specified (markdown=8000, readability=10000, html=15000, cleaned_html=12000)
  • Better Truncation: Avoids double-truncation issues that could result in incomplete content
  • Conversion Detection: Automatically detects when HTML-to-markdown conversion may have failed
  • Warning System: Provides warnings when content appears truncated or incomplete

How It Works

// Simple usage - uses smart defaults
{
  "url": "https://example.com",
  "format": ["markdown"]
  // Automatically uses 8000 characters, reserves 800 for metadata, 7200 for content
}

// Custom length - automatically splits appropriately
{
  "url": "https://example.com", 
  "format": ["markdown"],
  "maxLength": 5000
  // Uses 5000 total, reserves 500 for metadata, 4500 for content
}

This approach ensures you get complete, properly formatted content while maintaining simple, intuitive parameter management.
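
The arithmetic behind the 10%/90% split can be sketched as follows. The per-format defaults are taken from the maxLength parameter description. The lengthBudget and truncate names are hypothetical, and treating non-markdown formats as reserving nothing for metadata is an assumption of this sketch.

```typescript
type ContentFormat = "html" | "markdown" | "readability" | "cleaned_html";

// Defaults as listed in the maxLength parameter description.
const DEFAULT_MAX_LENGTH: Record<ContentFormat, number> = {
  markdown: 8000,
  readability: 10000,
  html: 15000,
  cleaned_html: 12000,
};

interface LengthBudget {
  total: number;
  metadata: number;
  content: number;
}

// Split the total budget into metadata and content shares.
function lengthBudget(format: ContentFormat, maxLength?: number): LengthBudget {
  const total = maxLength ?? DEFAULT_MAX_LENGTH[format];
  // Markdown reserves 10% for metadata; for other formats this sketch
  // assumes the whole budget goes to content.
  const metadata = format === "markdown" ? Math.floor(total / 10) : 0;
  return { total, metadata, content: total - metadata };
}

// Cut text to at most `limit` characters.
function truncate(text: string, limit: number): string {
  return text.length <= limit ? text : text.slice(0, limit);
}

console.log(lengthBudget("markdown"));       // total 8000, metadata 800, content 7200
console.log(lengthBudget("markdown", 5000)); // total 5000, metadata 500, content 4500
```

Computing both shares from one total, then truncating each part once, is one way to avoid the double-truncation problem mentioned above.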

Handling Large Pages (Like Amazon)

For large, complex pages like Amazon.com, follow these best practices:

Recommended Approach for Complex Pages

{
  "tool": "visit_with_browser",
  "arguments": {
    "url": "https://www.amazon.com",
    "format": ["readability"],  // Most reliable for complex pages
    "maxLength": 5000,          // Reasonable limit for large pages
    "delay": 3                  // Wait for main content to load
  }
}

Format Comparison for Large Pages

  • HTML: Returns raw HTML source (can be 900,000+ characters for Amazon)
  • Readability: Mozilla Readability format (most reliable, good for complex pages)
  • Markdown: Converts HTML to clean, readable text (may fail on complex pages like Amazon)
  • Cleaned HTML: Cleaned HTML with better structure

Note: Markdown conversion may fail on complex, JavaScript-heavy pages like Amazon. Use ["readability"] for the most reliable results.

Troubleshooting

If you get HTML instead of Markdown:

  • The steel-dev API may not support markdown conversion for that page type
  • Try using format: ["readability"] instead for better text extraction
  • Complex pages with heavy JavaScript may not convert properly

If you get truncated content:

  • The page may be too large for the specified maxLength
  • Try increasing maxLength or using a longer delay
  • Consider using format: ["readability"] for more reliable truncation
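
A client can automate the first piece of advice by checking whether a markdown result still looks like raw HTML and, if so, retrying with readability. The heuristic below is an assumption made for illustration (a leading doctype or html tag, or more than 30% of characters inside tags). It is not the server's own conversion detection, and looksLikeHtml and retryFormat are hypothetical names.

```typescript
// Heuristic only: true when the text starts like an HTML document or
// when more than 30% of its characters sit inside tags.
function looksLikeHtml(text: string): boolean {
  if (/^\s*<(!doctype html|html)\b/i.test(text)) return true;
  const tags = text.match(/<\/?[a-z][^>]*>/gi) || [];
  let tagChars = 0;
  for (const tag of tags) tagChars += tag.length;
  return text.length > 0 && tagChars / text.length > 0.3;
}

// Suggest a fallback format, or null when no retry seems necessary.
function retryFormat(requested: string, body: string): string | null {
  return requested === "markdown" && looksLikeHtml(body) ? "readability" : null;
}

console.log(retryFormat("markdown", "<html><body><p>Hello</p></body></html>"));
console.log(retryFormat("markdown", "# Title\n\nPlain markdown text."));
```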

For Dynamic Content

Use the delay parameter to give dynamic content time to load before scraping:

{
  "tool": "visit_with_browser",
  "arguments": {
    "url": "https://www.amazon.com",
    "format": ["markdown"],
    "delay": 5,                 // Wait 5 seconds for content to load
    "maxLength": 10000          // Longer content for complex pages
  }
}

Clean Output by Default

The server is designed with 7B models in mind, providing clean, content-focused output by default:

  • Content Summarization: Well suited to smaller models that need to summarize web content
  • Content Analysis: Ideal for processing large amounts of text
  • Context Optimization: Maximizes the content-to-metadata ratio automatically

How It Works

Default Mode (clean output):

# Article Title
This is the actual content...

Verbose Mode (verboseMode: true):

SUCCESS: Successfully scraped https://example.com
Method: full-browser-automation (stealth browser, anti-detection)
Format: markdown
Status Code: 200
Processing Time: 1250ms
Content Length: 5000 characters
Content Type: text/html
Timestamp: 2024-01-15T10:30:00.000Z
Title: Article Title
Description: Article description
Language: en
Screenshot: Available (base64)
Links Found: 15

SCRAPED CONTENT:
# Article Title
This is the actual content...

Benefits of Clean Output

  • Maximum Content Space: Removes ~200-300 characters of metadata overhead
  • Cleaner Output: Direct content without verbose headers
  • Better for 7B Models: Focuses the model's attention on the actual content
  • Preserves Warnings: Still shows important warnings if conversion issues occur

Recommended Usage

For summarization tasks, use the default clean output:

{
  "tool": "visit_with_browser",
  "arguments": {
    "url": "https://article-to-summarize.com",
    "format": ["markdown"],
    "maxLength": 10000  // Automatically optimizes content vs metadata split
  }
}

Steel-dev API Requirements

This MCP server expects a steel-dev API instance running with the following endpoints:

  • POST /scrape - Main scraping endpoint
  • GET /health - Health check endpoint (optional)
  • GET /info - API information endpoint (optional)

Expected Request Format

{
  "url": "https://example.com",
  "format": ["html", "markdown"],
  "screenshot": true,
  "pdf": false,
  "proxyUrl": "http://proxy:8080",
  "delay": 2,
  "logUrl": "https://logs.example.com"
}

Expected Response Format

{
  "content": {
    "html": "<html>...</html>",
    "markdown": "# Title\nContent..."
  },
  "metadata": {
    "title": "Page Title",
    "description": "Page description",
    "statusCode": 200,
    "timestamp": "2024-01-15T10:30:00.000Z"
  },
  "links": [
    {"url": "https://example.com/link1", "text": "Link Text"}
  ],
  "screenshot": "base64...",
  "pdf": "base64..."
}
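
For readers wiring up their own steel-dev-compatible endpoint, the two JSON shapes above can be expressed as TypeScript types with a small runtime check. This is a sketch under assumptions: the ScrapeRequest and ScrapeResponse names are hypothetical, and which fields are optional is inferred from the examples rather than confirmed by the API.

```typescript
// Request body for POST /scrape, following the example above.
interface ScrapeRequest {
  url: string;
  format?: string[];
  screenshot?: boolean;
  pdf?: boolean;
  proxyUrl?: string;
  delay?: number;
  logUrl?: string;
}

// Response body, following the example above. Optionality is assumed.
interface ScrapeResponse {
  content: Record<string, string>;
  metadata: {
    title?: string;
    description?: string;
    statusCode: number;
    timestamp: string;
  };
  links?: { url: string; text: string }[];
  screenshot?: string;
  pdf?: string;
}

// Minimal structural check before trusting an API response.
function isScrapeResponse(value: unknown): value is ScrapeResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const content = v["content"];
  const meta = v["metadata"] as Record<string, unknown> | null | undefined;
  return (
    typeof content === "object" && content !== null &&
    typeof meta === "object" && meta !== null &&
    typeof meta["statusCode"] === "number"
  );
}

const exampleRequest: ScrapeRequest = { url: "https://example.com", format: ["markdown"], delay: 2 };
const exampleResponse: unknown = {
  content: { markdown: "# Title\nContent..." },
  metadata: { title: "Page Title", statusCode: 200, timestamp: "2024-01-15T10:30:00.000Z" },
};
console.log(JSON.stringify(exampleRequest));
console.log(isScrapeResponse(exampleResponse));
```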

Development

Project Structure

src/
├── index.ts          # Main MCP server implementation
├── steel-api.ts      # Steel-dev API wrapper
└── config.ts         # Configuration management

Scripts

  • npm run build - Build TypeScript to JavaScript
  • npm run start - Run the built server
  • npm run dev - Run in development mode with tsx

Adding New Features

  1. Modify the tool schema in src/index.ts
  2. Update the SteelAPI class in src/steel-api.ts if needed
  3. Rebuild and test

Error Handling

The server handles the following error cases:

  • Network errors are caught and returned as error responses
  • Tool parameters are validated
  • Steel-dev API errors are forwarded to the client
  • Long-running requests are subject to a timeout (see STEEL_TIMEOUT)
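
To make "returned as error responses" concrete, the sketch below converts a thrown error into an MCP-style tool result, which carries a content array and an isError flag. The toErrorResult and safely helpers and the "ERROR:" prefix are assumptions made for this example, not the server's actual code. Retries and timeouts are left out for brevity.

```typescript
// Simplified MCP tool result: a list of text items plus an error flag.
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

// Turn any thrown value into an error result instead of letting it propagate.
function toErrorResult(error: unknown): ToolResult {
  const message = error instanceof Error ? error.message : String(error);
  return { content: [{ type: "text", text: `ERROR: ${message}` }], isError: true };
}

// Run a handler and convert exceptions into error results.
function safely(run: () => ToolResult): ToolResult {
  try {
    return run();
  } catch (error) {
    return toErrorResult(error);
  }
}

const failed = safely(() => {
  throw new Error("connect ECONNREFUSED 127.0.0.1:3000");
});
console.log(failed);
```

Returning the failure as a result keeps the MCP client informed without crashing the server process.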

License

MIT