@imenam/simple-scraper

v1.0.8

MCP server for web scraping and JavaScript execution using Puppeteer

simple-scraper-mcp

A Model Context Protocol (MCP) server for web scraping and JavaScript execution using a headless browser (Puppeteer). Includes an optional GUI for cookie management.

Features

  • scrape_page — Navigate to a URL and return the full rendered HTML of the page
  • execute_js — Navigate to a URL and execute custom JavaScript in the page context
  • get_page_inputs — Extract all form inputs from a page as a structured JSON object
  • get_show_page — Parse a detail/show page and extract key-value blocks and tables as structured JSON
  • Interactive sessions — Keep a browser page alive across multiple tool calls to navigate, click, type, run JS, and screenshot the page in any desired state
  • screenshot — Capture a screenshot of any page or active session, with inline or file output
  • Cookie support — Load Netscape-format cookie files automatically before each request
  • Optional GUI — Cookie manager interface, available when integrated with the MCP proxy

Requirements

  • Node.js >= 18
  • Puppeteer will automatically download a compatible Chromium browser (~300 MB) on first install

Installation

npx @imenam/simple-scraper

Or install globally:

npm install -g @imenam/simple-scraper
simple-scraper

Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| PUPPETEER_HEADLESS | No | true | Run Chromium in headless mode. Set to false to display the browser window. |
| PUPPETEER_TIMEOUT | No | 30000 | Default timeout in milliseconds for page navigation and waits. |
| COOKIES_DIR | No | - | Absolute path to a folder containing Netscape-format .txt cookie files. All files are loaded and merged automatically before each request. |
| MCP_LOG_DIR | No | .mcp-gui/logs | Absolute path to the directory where log files are written. |
| PROXY_URL | No | - | Base URL of the MCP HTTP Gateway. Required to enable the GUI. |
| PROXY_APP_PATH | No | /simple-scraper-mcp | URL path under which the GUI is registered on the proxy. |
| PROXY_APP_NAME | No | Simple Scraper MCP | Display name shown in the proxy's app list. |
| SCRAPER_MAX_SESSIONS | No | 5 | Maximum number of concurrent interactive sessions. |
| SCRAPER_SESSION_TTL_MS | No | 600000 | Inactivity TTL for sessions in milliseconds (default: 10 minutes). Sessions unused beyond this duration are closed automatically. |
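
The defaults in the table can be read as a small configuration loader. The following is only a sketch of that idea, not the package's actual code: the function name `loadConfig`, the returned property names, and the rule that any value other than `false` keeps headless mode on are all assumptions made for illustration.

```javascript
// Hypothetical helper: maps the documented environment variables to a config
// object, falling back to the defaults listed in the table above.
function loadConfig(env = process.env) {
  // Parse an integer variable, using the fallback when it is unset or invalid.
  const toInt = (value, fallback) => {
    const parsed = Number.parseInt(value ?? '', 10);
    return Number.isNaN(parsed) ? fallback : parsed;
  };

  return {
    // Assumption: only the literal string "false" disables headless mode.
    headless: env.PUPPETEER_HEADLESS !== 'false',
    timeoutMs: toInt(env.PUPPETEER_TIMEOUT, 30000),
    cookiesDir: env.COOKIES_DIR || null,
    logDir: env.MCP_LOG_DIR || '.mcp-gui/logs',
    proxyUrl: env.PROXY_URL || null,
    proxyAppPath: env.PROXY_APP_PATH || '/simple-scraper-mcp',
    proxyAppName: env.PROXY_APP_NAME || 'Simple Scraper MCP',
    maxSessions: toInt(env.SCRAPER_MAX_SESSIONS, 5),
    sessionTtlMs: toInt(env.SCRAPER_SESSION_TTL_MS, 600000),
    // The GUI is described as available only when PROXY_URL is set.
    guiEnabled: Boolean(env.PROXY_URL),
  };
}

// Usage: override two values and keep every other default.
console.log(loadConfig({ PUPPETEER_HEADLESS: 'false', SCRAPER_MAX_SESSIONS: '10' }));
```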

Configuration

Copy .env.example to .env and configure the variables:

# Puppeteer options (optional)
PUPPETEER_HEADLESS=true
PUPPETEER_TIMEOUT=30000

# Optional: path to a folder containing Netscape-format cookie files (.txt)
# All files in this folder will be loaded automatically before each request.
# COOKIES_DIR=/path/to/cookies

# GUI (optional) — required to enable the cookie manager interface
# PROXY_URL=http://localhost:3000
# PROXY_APP_PATH=/simple-scraper-mcp
# PROXY_APP_NAME=Simple Scraper MCP

Usage with Claude Desktop

Add the following to your claude_desktop_config.json. Example with the Puppeteer, cookie, logging, and GUI options set:

{
  "mcpServers": {
    "simple-scraper": {
      "command": "npx",
      "args": ["@imenam/simple-scraper"],
      "env": {
        "PUPPETEER_HEADLESS": "true",
        "PUPPETEER_TIMEOUT": "30000",
        "COOKIES_DIR": "/path/to/your/cookies",
        "MCP_LOG_DIR": "/path/to/your/logs",
        "PROXY_URL": "http://localhost:4500",
        "PROXY_APP_PATH": "/simple-scraper",
        "PROXY_APP_NAME": "Simple Scraper"
      }
    }
  }
}

To load cookies automatically, add COOKIES_DIR pointing to a folder containing .txt files in Netscape cookie format:

{
  "mcpServers": {
    "simple-scraper": {
      "command": "npx",
      "args": ["@imenam/simple-scraper"],
      "env": {
        "PUPPETEER_HEADLESS": "true",
        "COOKIES_DIR": "/path/to/your/cookies"
      }
    }
  }
}

Usage with Cursor

In Cursor, MCP servers are configured in .cursor/mcp.json. You can pass environment variables directly in the config. Example with the Puppeteer, cookie, logging, and GUI options set:

{
  "mcpServers": {
    "simple-scraper": {
      "command": "npx",
      "args": ["-y", "@imenam/simple-scraper"],
      "env": {
        "PUPPETEER_HEADLESS": "true",
        "PUPPETEER_TIMEOUT": "30000",
        "COOKIES_DIR": "/path/to/your/cookies",
        "MCP_LOG_DIR": "/path/to/your/logs",
        "PROXY_URL": "http://localhost:4500",
        "PROXY_APP_PATH": "/simple-scraper",
        "PROXY_APP_NAME": "Simple Scraper"
      }
    }
  }
}

Note: The -y flag in args avoids the interactive confirmation prompt when using npx.

MCP Tools

scrape_page

Navigate to a URL and return the full rendered HTML.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| url | string | ✅ | URL of the page to scrape |
| wait_for | string | | CSS selector to wait for before capturing HTML |
| timeout | number | | Timeout in ms (default: 30000) |

execute_js

Navigate to a URL and execute custom JavaScript in the page context.

The script parameter is executed as the body of a JavaScript function in the browser page context, equivalent to:

new Function(script)();

To return data from the tool, the script must use an explicit return. A bare expression such as document.title will evaluate but the tool will receive undefined.

Example:

return {
  title: document.title,
  url: window.location.href,
  text: document.body.innerText.slice(0, 500)
};

For asynchronous work, return a promise, for example with an async IIFE:

return (async () => {
  const response = await fetch('/api/data');
  return await response.json();
})();

Returned objects and arrays are serialized as formatted JSON. Primitive values are returned as text.
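
The return convention can be tried outside the browser, since `new Function` behaves the same way in Node.js. The sketch below is illustrative only: `runScript` and `formatResult` are hypothetical names, and the two-space JSON indentation is an assumption about what "formatted JSON" means here.

```javascript
// Runs a script string as a function body, as described above.
function runScript(script) {
  return new Function(script)();
}

// Assumed serialization: objects and arrays as indented JSON, primitives as text.
function formatResult(value) {
  return value !== null && typeof value === 'object'
    ? JSON.stringify(value, null, 2)
    : String(value);
}

// A bare expression is evaluated, but nothing is returned to the caller.
console.log(formatResult(runScript('1 + 1')));

// An explicit return hands the value back.
console.log(formatResult(runScript('return 1 + 1')));

// Objects come back as formatted JSON.
console.log(formatResult(runScript('return { ok: true, items: [1, 2] };')));
```

The first call logs the text undefined, which is the symptom to look for when a script forgets its return.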

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| url | string | ✅ | URL of the page |
| script | string | ✅ | JavaScript function body to execute in the page context. Use return to send a result back to the tool. |
| wait_for | string | | CSS selector to wait for before executing |
| timeout | number | | Timeout in ms (default: 30000) |

get_page_inputs

Extract all form inputs from a page as a structured JSON object.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| url | string | ✅ | URL of the page |
| selector | string | | CSS selector to scope the search (e.g. #my-form) |
| wait_for | string | | CSS selector to wait for before extracting |
| show_hidden | boolean | | Include input[type=hidden] fields (default: false) |
| timeout | number | | Timeout in ms (default: 30000) |

get_show_page

Parse a detail page and extract key-value blocks and tables as structured JSON.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| url | string | ✅ | URL of the page |
| keys_map | object | | Map of HTML label → JS key for field name translation |
| box_selector | string | | CSS selector for section containers (default: .box.box-primary) |
| tables_max_items | number | | Max rows per table (default: 2) |
| wait_for | string | | CSS selector to wait for before extraction |
| timeout | number | | Timeout in ms (default: 30000) |
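
To make keys_map and tables_max_items more concrete, here is a rough sketch of how such options might be applied to already-extracted data. It does not reproduce the tool's real output format: `applyKeysMap` and `limitRows` are hypothetical helpers, the sample labels are invented, and keeping unmapped labels unchanged is an assumption.

```javascript
// Hypothetical: rename extracted labels using a label -> key map.
// Assumption: labels missing from the map keep their original text.
function applyKeysMap(fields, keysMap = {}) {
  const result = {};
  for (const [label, value] of Object.entries(fields)) {
    result[keysMap[label] ?? label] = value;
  }
  return result;
}

// Hypothetical: keep at most tablesMaxItems rows per table (documented default: 2).
function limitRows(rows, tablesMaxItems = 2) {
  return rows.slice(0, tablesMaxItems);
}

const extracted = { 'First name': 'Ada', Email: 'ada@example.com' };
console.log(applyKeysMap(extracted, { 'First name': 'firstName' }));
console.log(limitRows([{ id: 1 }, { id: 2 }, { id: 3 }]));
```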


Interactive Sessions

Interactive sessions let you keep a browser page alive across multiple tool calls, so you can bring the page into the exact state you need before extracting data or taking a screenshot.

Typical workflow

open_session → session_click / session_type / session_evaluate → screenshot / session_html → close_session
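
As a sketch of what this workflow looks like on the wire, the snippet below builds the JSON-RPC `tools/call` requests an MCP client would send for each step. The tool names and parameters come from this README; the `toolCall` helper, the example URL, the selectors, and the session ID placeholder are made up for illustration. In a real run the session ID is taken from the open_session response.

```javascript
let nextId = 1;

// Builds an MCP `tools/call` JSON-RPC request for a named tool.
function toolCall(name, args = {}) {
  return {
    jsonrpc: '2.0',
    id: nextId++,
    method: 'tools/call',
    params: { name, arguments: args },
  };
}

// Placeholder: the real value is returned by open_session.
const sessionId = 'SESSION_ID_FROM_OPEN_SESSION';

const workflow = [
  toolCall('open_session', { url: 'https://example.com/login', wait_for: 'form' }),
  toolCall('session_type', { session_id: sessionId, selector: '#username', text: 'demo', clear: true }),
  toolCall('session_click', { session_id: sessionId, selector: 'button[type=submit]' }),
  toolCall('session_wait_for', { session_id: sessionId, selector: '.dashboard' }),
  toolCall('screenshot', { session_id: sessionId, output: 'file', full_page: true }),
  toolCall('close_session', { session_id: sessionId }),
];

console.log(JSON.stringify(workflow, null, 2));
```

Most MCP clients build these requests for you; the point here is only the order of calls and the session_id threading through them.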

open_session

Open a persistent browser session. Returns a session_id to use with all session_* tools and screenshot.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| url | string | ✅ | URL to navigate to |
| wait_for | string | | CSS selector to wait for before the session is considered ready |
| timeout | number | | Timeout in ms (default: 30000) |

close_session

Close a session and free its resources. Always call this when you are done.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| session_id | string | ✅ | Session ID returned by open_session |

list_sessions

List all currently active sessions with their IDs and timestamps. No parameters.
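
The session limits set by SCRAPER_MAX_SESSIONS and SCRAPER_SESSION_TTL_MS can be pictured as a small in-memory registry. This is a conceptual sketch only, not the server's implementation: the `SessionRegistry` class, its method names, the ID format, and the choice to reject new sessions at the limit (rather than evict old ones) are assumptions.

```javascript
// Hypothetical registry: tracks sessions, enforces a maximum count, and
// drops sessions that have been idle longer than the TTL.
class SessionRegistry {
  constructor({ maxSessions = 5, ttlMs = 600000, now = Date.now } = {}) {
    this.maxSessions = maxSessions;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for testing
    this.sessions = new Map();
    this.counter = 0;
  }

  // Remove every session whose last use is older than the TTL.
  sweep() {
    const cutoff = this.now() - this.ttlMs;
    for (const [id, session] of this.sessions) {
      if (session.lastUsedAt < cutoff) this.sessions.delete(id);
    }
  }

  open(url) {
    this.sweep();
    if (this.sessions.size >= this.maxSessions) {
      throw new Error(`session limit reached (${this.maxSessions})`);
    }
    const id = `session-${++this.counter}`;
    const timestamp = this.now();
    this.sessions.set(id, { id, url, createdAt: timestamp, lastUsedAt: timestamp });
    return id;
  }

  // Mark a session as used, which resets its inactivity timer.
  touch(id) {
    this.sweep();
    const session = this.sessions.get(id);
    if (!session) throw new Error(`unknown or expired session: ${id}`);
    session.lastUsedAt = this.now();
    return session;
  }

  close(id) {
    return this.sessions.delete(id);
  }

  list() {
    this.sweep();
    return [...this.sessions.values()];
  }
}

const registry = new SessionRegistry();
const id = registry.open('https://example.com');
console.log(registry.list().map((s) => s.id), registry.close(id));
```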

session_goto

Navigate the session to a new URL without closing it.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| session_id | string | ✅ | Session ID |
| url | string | ✅ | URL to navigate to |
| wait_for | string | | CSS selector to wait for after navigation |
| timeout | number | | Timeout in ms (default: 30000) |

session_click

Click an element in the session page.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| session_id | string | ✅ | Session ID |
| selector | string | ✅ | CSS selector of the element to click |
| timeout | number | | Timeout in ms to wait for the element (default: 30000) |

session_type

Type text into an input element in the session page.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| session_id | string | ✅ | Session ID |
| selector | string | ✅ | CSS selector of the input element |
| text | string | ✅ | Text to type |
| clear | boolean | | Clear the field before typing (default: false) |
| timeout | number | | Timeout in ms to wait for the element (default: 30000) |

session_wait_for

Wait for a CSS selector to appear in the session page.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| session_id | string | ✅ | Session ID |
| selector | string | ✅ | CSS selector to wait for |
| timeout | number | | Timeout in ms (default: 30000) |

session_evaluate

Execute JavaScript in the context of the session page. Same conventions as execute_js.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| session_id | string | ✅ | Session ID |
| script | string | ✅ | JavaScript function body. Use return to get a result back. |
| wait_for | string | | CSS selector to wait for before executing |
| timeout | number | | Timeout in ms (default: 30000) |

session_html

Return the current full rendered HTML of the session page.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| session_id | string | ✅ | Session ID |

screenshot

Capture a screenshot of a page. Use session_id to capture an active session in its current state, or url for a one-shot capture.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| session_id | string | | Session ID. If provided, url is ignored and the current page state is captured. |
| url | string | | URL for a one-shot screenshot. Required if session_id is not provided. |
| wait_for | string | | CSS selector to wait for (one-shot mode only) |
| timeout | number | | Timeout in ms (default: 30000) |
| selector | string | | CSS selector of a specific element to capture |
| full_page | boolean | | Capture the full scrollable page height (default: false, ignored when selector is provided) |
| format | png \| jpeg | | Image format (default: png) |
| output | inline \| file \| both | ✅ | inline embeds the image in the response, file saves to disk and returns the path, both does both |
| path | string | | Absolute or relative path for the saved file. Defaults to ./screenshots/screenshot-<timestamp>.<format> |
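
The parameter interactions in this table (session_id versus url, selector versus full_page, the required output) are easy to get wrong, so a small pre-flight check can help when building arguments programmatically. The function below is a hypothetical client-side helper based only on the rules stated in the table; it is not part of the package.

```javascript
// Hypothetical pre-flight check for screenshot arguments, following the
// rules in the table above. Returns blocking errors and informational notes.
function checkScreenshotArgs(args) {
  const errors = [];
  const notes = [];

  if (!args.session_id && !args.url) {
    errors.push('either session_id or url is required');
  }
  if (!['inline', 'file', 'both'].includes(args.output)) {
    errors.push('output must be inline, file, or both');
  }
  if (args.format !== undefined && !['png', 'jpeg'].includes(args.format)) {
    errors.push('format must be png or jpeg');
  }

  if (args.session_id && args.url) {
    notes.push('url is ignored because session_id is set');
  }
  if (args.selector && args.full_page) {
    notes.push('full_page is ignored because selector is set');
  }

  return { ok: errors.length === 0, errors, notes };
}

console.log(checkScreenshotArgs({ url: 'https://example.com', output: 'inline' }));
console.log(checkScreenshotArgs({ full_page: true }));
```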


Cookie Files

Cookies are loaded from .txt files in Netscape format. Place them in the folder specified by COOKIES_DIR. All files in the folder are loaded and merged automatically before each request.
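
For reference, a Netscape cookie file holds one cookie per line as seven tab-separated fields: domain, include-subdomains flag, path, secure flag, expiry (Unix seconds), name, and value. Lines starting with # are comments, except the `#HttpOnly_` prefix that curl-style exports use to mark HTTP-only cookies. The parser below is a minimal sketch of that format, not the package's loader; `parseNetscapeCookies`, `mergeCookies`, and the rule that later files win on conflicts are assumptions.

```javascript
// Parses Netscape cookie file text into plain objects.
function parseNetscapeCookies(text) {
  const cookies = [];
  for (const rawLine of text.split(/\r?\n/)) {
    let line = rawLine;
    let httpOnly = false;
    if (line.startsWith('#HttpOnly_')) {
      // curl-style marker: the rest of the line is a regular cookie entry.
      line = line.slice('#HttpOnly_'.length);
      httpOnly = true;
    } else if (line.startsWith('#') || line.trim() === '') {
      continue; // comment or blank line
    }
    const fields = line.split('\t');
    if (fields.length !== 7) continue; // skip malformed lines
    const [domain, includeSubdomains, path, secure, expires, name, value] = fields;
    cookies.push({
      domain,
      includeSubdomains: includeSubdomains === 'TRUE',
      path,
      secure: secure === 'TRUE',
      expires: Number(expires),
      name,
      value,
      httpOnly,
    });
  }
  return cookies;
}

// Assumed merge rule: a later cookie with the same domain, path, and name
// replaces an earlier one.
function mergeCookies(...lists) {
  const byKey = new Map();
  for (const cookie of lists.flat()) {
    byKey.set(`${cookie.domain}|${cookie.path}|${cookie.name}`, cookie);
  }
  return [...byKey.values()];
}

const sample = [
  '# Netscape HTTP Cookie File',
  '.example.com\tTRUE\t/\tTRUE\t1893456000\tsession\tabc123',
  '#HttpOnly_example.com\tFALSE\t/app\tFALSE\t0\ttoken\txyz',
].join('\n');

console.log(parseNetscapeCookies(sample));
```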

GUI (Optional)

When PROXY_URL is set, a cookie manager web interface is registered with the MCP proxy. It allows you to list, upload, and delete cookie files through a browser UI.

License

ISC