site-cloner

v0.1.2

MCP server for website cloning

Downloads

157

Site Cloner MCP Server

This is an MCP (Model Context Protocol) server designed to help LLMs (like Claude) clone websites by providing tools to fetch, analyze, and download website assets.

Features

  • Fetch HTML content from any URL
  • Authentication Support: Set cookies and localStorage to access login-protected pages
  • Browser Integration: Check and integrate with chrome-devtools or playwright MCP tools for login flows
  • Login Detection: Automatically detect login forms and links on web pages
  • Extract assets (CSS, JavaScript, images, fonts, etc.) from HTML content
  • Download individual assets to a local directory
  • Parse CSS files to extract linked assets (fonts, images)
  • Create a sitemap of a website
  • Analyze page structure and layout

Requirements

  • Node.js 18 or higher
  • npm or npx

Usage

Option 1: Run with npx (Recommended)

The easiest way to run the MCP server is using npx without installing anything:

npx -y site-cloner

Option 2: Run Locally

  1. Clone or download this repository:

    git clone https://github.com/yourusername/site-cloner.git
    cd site-cloner
  2. Install dependencies:

    npm install
  3. Build the TypeScript code:

    npm run build
  4. Run the server:

    npm start
    # or
    node dist/index.js

For development with auto-reload:

npm run dev

Connecting to Cursor

To set up this MCP server in Cursor, you have two options:

1. Project-specific configuration

Create a .cursor/mcp.json file in your project root with the following content:

Using npx (recommended):

{
  "mcpServers": {
    "site-cloner": {
      "command": "npx",
      "args": ["-y", "site-cloner"]
    }
  }
}

Using local installation:

{
  "mcpServers": {
    "site-cloner": {
      "command": "node",
      "args": ["/path/to/site-cloner/dist/index.js"]
    }
  }
}

2. Global configuration

To make the MCP server available globally in Cursor, go to Cursor Settings → MCP → Add new Global MCP Server and add the following configuration:

Using npx:

{
  "mcpServers": {
    "site-cloner": {
      "command": "npx",
      "args": ["-y", "site-cloner"]
    }
  }
}

Available Tools

1. fetch_page

Fetches the HTML content of a webpage.

Args:
    url: The URL of the webpage to fetch

2. extract_assets

Extracts links to assets from HTML content.

Args:
    url: The URL of the webpage (used for resolving relative URLs)
    html_content: The HTML content to parse
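
As a rough illustration of what asset extraction involves, the sketch below pulls stylesheet, script, and image URLs out of an HTML string and resolves them against the page URL. The function name `extractAssets`, the `ExtractedAssets` shape, and the regex-based matching are assumptions made for this example; the package's actual parser and return format are not documented here.

```typescript
// Hypothetical helper, not the package's actual implementation.
interface ExtractedAssets {
  css: string[];
  js: string[];
  images: string[];
}

function extractAssets(pageUrl: string, html: string): ExtractedAssets {
  const assets: ExtractedAssets = { css: [], js: [], images: [] };

  // Collect the first capture group of every match, resolved against pageUrl.
  const collect = (pattern: RegExp, into: string[]): void => {
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(html)) !== null) {
      const raw = match[1];
      if (!raw) continue;
      try {
        const absolute = new URL(raw, pageUrl).href;
        if (into.indexOf(absolute) === -1) into.push(absolute);
      } catch {
        // Ignore values that cannot be parsed as URLs.
      }
    }
  };

  // Simplification: only matches <link> tags where rel appears before href.
  collect(/<link[^>]*rel=["']stylesheet["'][^>]*href=["']([^"']+)["']/gi, assets.css);
  collect(/<script[^>]*src=["']([^"']+)["']/gi, assets.js);
  collect(/<img[^>]*src=["']([^"']+)["']/gi, assets.images);
  return assets;
}
```

Regular expressions are used only to keep the sketch dependency-free; a real HTML parser would handle attribute order and edge cases more reliably.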

3. download_asset

Downloads an asset from a URL and saves it to the specified directory.

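Smart asset discovery: after a JS/CSS file is downloaded, its contents are scanned automatically to find other resources it references (for example importScripts('/assets/...'), fetch('/assets/...'), and so on).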

Args:
    url: The URL of the asset to download
    output_dir: The directory to save the asset to (default: downloaded_site)

Returns:
    success: Whether the download was successful
    saved_to: Path where the file was saved
    discovered_assets: (JS/CSS only) List of additional resources found in the file
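
The `discovered_assets` field could be produced by a scan along the following lines. This is a minimal sketch that assumes only string-literal arguments to `importScripts(...)` and `fetch(...)` are detected; the name `discoverAssetsInScript` is hypothetical, and the server may recognise more patterns than shown.

```typescript
// Hypothetical helper, not the package's actual implementation.
function discoverAssetsInScript(scriptUrl: string, source: string): string[] {
  const found: string[] = [];
  // Matches importScripts('...') and fetch('...') with a quoted string argument.
  const pattern = /\b(?:importScripts|fetch)\(\s*["']([^"']+)["']/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const raw = match[1];
    if (!raw) continue;
    try {
      // Resolve relative and root-relative paths against the script's own URL.
      const absolute = new URL(raw, scriptUrl).href;
      if (found.indexOf(absolute) === -1) found.push(absolute);
    } catch {
      // Skip arguments that are not valid URLs.
    }
  }
  return found;
}
```

Calls whose argument is a variable or a computed expression are not matched, so a scan like this can only find statically written paths.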

4. download_assets_batch

Batch downloads multiple assets from a list of URLs. More efficient than calling download_asset multiple times.

Args:
    urls: Array of asset URLs to download
    base_url: (Optional) Base URL for resolving relative URLs
    output_dir: Directory to save assets (default: downloaded_site)
    recursive: Whether to analyze downloaded JS/CSS for additional resources (default: true)

Returns:
    total: Total number of URLs
    successful: Number of successful downloads
    failed: Number of failed downloads
    results: Array of download results with success status
    discovered_assets: (if recursive=true) Additional resources found in JS/CSS files
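
To make the shape of the batch result concrete, here is a sketch of how the `total`, `successful`, and `failed` counts relate to the `results` array. The per-result fields `url`, `saved_to`, and `error`, and the injected `download` callback, are assumptions for this example rather than the server's documented format.

```typescript
// Hypothetical types; only total/successful/failed/results come from the README.
interface DownloadResult {
  url: string;
  success: boolean;
  saved_to?: string;
  error?: string;
}

interface BatchSummary {
  total: number;
  successful: number;
  failed: number;
  results: DownloadResult[];
}

function summarizeBatch(results: DownloadResult[]): BatchSummary {
  const successful = results.filter((r) => r.success).length;
  return { total: results.length, successful, failed: results.length - successful, results };
}

// Runs all downloads concurrently; one failure does not abort the others.
async function downloadBatch(
  urls: string[],
  download: (url: string) => Promise<string>,
): Promise<BatchSummary> {
  const results = await Promise.all(
    urls.map(async (url): Promise<DownloadResult> => {
      try {
        const saved_to = await download(url);
        return { url, success: true, saved_to };
      } catch (err) {
        return { url, success: false, error: err instanceof Error ? err.message : String(err) };
      }
    }),
  );
  return summarizeBatch(results);
}
```

Catching errors per URL is what lets a batch report partial success instead of failing as a whole.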

5. parse_css_for_assets

Parses CSS content to extract URLs of referenced assets like fonts and images.

Args:
    css_url: The URL of the CSS file (used for resolving relative URLs)
    css_content: The CSS content to parse (if not provided, it will be fetched from css_url)
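
The role of `css_url` in resolving relative references can be sketched as follows. The function name `parseCssForAssets` mirrors the tool name for readability only; the regex and the decision to skip `data:` URIs are assumptions of this example, not a description of the server's code.

```typescript
// Hypothetical helper, not the package's actual implementation.
function parseCssForAssets(cssUrl: string, cssContent: string): string[] {
  const found: string[] = [];
  // Matches url(...) with optional single or double quotes around the value.
  const pattern = /url\(\s*(["']?)([^"')]+)\1\s*\)/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(cssContent)) !== null) {
    const raw = (match[2] || "").trim();
    // Inline data URIs carry their own content and need no download.
    if (raw === "" || raw.indexOf("data:") === 0) continue;
    try {
      // Relative paths in CSS are relative to the stylesheet, not the page.
      const absolute = new URL(raw, cssUrl).href;
      if (found.indexOf(absolute) === -1) found.push(absolute);
    } catch {
      // Skip values that are not valid URLs.
    }
  }
  return found;
}
```

Resolving against the stylesheet URL matters because `../fonts/x.woff2` inside `/css/site.css` points to `/fonts/x.woff2`, whatever page included the stylesheet.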

6. create_site_map

Creates a sitemap of the website starting from the given URL.

Args:
    url: The starting URL to crawl
    max_depth: Maximum depth to crawl (default: 1)

7. analyze_page_structure

Analyzes the structure of an HTML page and extracts key components.

Args:
    html_content: The HTML content to analyze

8. check_browser_mcp_tools

Returns installation and configuration guides for browser MCP tools (chrome-devtools-mcp, playwright-mcp) needed for authentication flows.

Args:
    (no arguments required)

9. detect_login_page

Detects if a webpage contains login forms or login links.

Args:
    url: The URL of the webpage to analyze
    html_content: Optional HTML content (if not provided, will fetch from url)

10. set_auth_credentials

Sets authentication credentials (cookies, localStorage, sessionStorage) for a domain to access login-protected pages.

Args:
    domain: The domain to set credentials for (e.g., "example.com")
    cookies: Optional object with cookie name-value pairs
    local_storage: Optional object with localStorage key-value pairs
    session_storage: Optional object with sessionStorage key-value pairs

Enhanced fetch_page for Authentication:

The fetch_page tool now supports an optional use_auth parameter:

Args:
    url: The URL of the webpage to fetch
    use_auth: Set to true to use saved credentials for this domain
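
One plausible way saved cookies end up on an authenticated request is as a standard `Cookie` header. The sketch below shows only that step and assumes it is how `use_auth` is applied; the helper name `buildCookieHeader` is hypothetical, and how the server uses the localStorage and sessionStorage values is not covered here.

```typescript
// Hypothetical helper: joins cookie name-value pairs into a Cookie header value.
function buildCookieHeader(cookies: Record<string, string>): string {
  return Object.keys(cookies)
    .map((name) => `${name}=${cookies[name]}`)
    .join("; ");
}

// Example request headers for a fetch call that should carry the saved cookies.
const authHeaders = { Cookie: buildCookieHeader({ sessionId: "xxx", token: "yyy" }) };
```

The header value here is `sessionId=xxx; token=yyy`, which is the format HTTP servers expect for multiple cookies.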

Authentication Workflow

To clone login-protected websites:

  1. Check if browser tools are needed:

    Call check_browser_mcp_tools to get installation guides
  2. Detect login page:

    Call detect_login_page with the target URL
  3. Install browser MCP tool (choose one):

    • chrome-devtools-mcp: npx -y chrome-devtools-mcp@latest
    • playwright-mcp: npx -y @playwright/mcp@latest
  4. Configure in Cursor: Add to your .cursor/mcp.json:

    {
      "mcpServers": {
        "site-cloner": {
          "command": "npx",
          "args": ["-y", "site-cloner"]
        },
        "chrome-devtools": {
          "command": "npx",
          "args": ["-y", "chrome-devtools-mcp@latest"]
        }
      }
    }
  5. Login using browser tool:

    • Use chrome-devtools or playwright MCP to navigate to the website
    • Complete the login process manually
    • Extract cookies and localStorage using browser tool commands
  6. Set credentials in site-cloner:

    Call set_auth_credentials with:
    - domain: "example.com"
    - cookies: { sessionId: "xxx", token: "yyy" }
    - local_storage: { userData: "..." }
  7. Fetch protected content:

    Call fetch_page with use_auth=true
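
For readers driving the server from their own MCP client rather than from Cursor, steps 6 and 7 correspond to two `tools/call` requests. The sketch below builds those JSON-RPC messages; the argument names come from the tool descriptions above, while `buildToolCall`, the request ids, and the `https://example.com/dashboard` URL are placeholders chosen for this example.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper for building a tool call message.
function buildToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Step 6: store the credentials extracted from the browser session.
const setCredentials = buildToolCall(1, "set_auth_credentials", {
  domain: "example.com",
  cookies: { sessionId: "xxx", token: "yyy" },
  local_storage: { userData: "..." },
});

// Step 7: fetch a protected page using the saved credentials (placeholder URL).
const fetchProtected = buildToolCall(2, "fetch_page", {
  url: "https://example.com/dashboard",
  use_auth: true,
});
```

In normal use an MCP client such as Cursor sends these messages itself, so this is mainly useful for debugging or scripting.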

Note: Credentials are stored in memory only and expire after 24 hours. They are lost when the server restarts.
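
The in-memory, 24-hour behaviour can be pictured with a small store like the one below. This is a sketch under assumptions: the class name `CredentialStore`, its methods, and the choice to expire entries lazily on read are illustrative, and the server's real bookkeeping may differ.

```typescript
// Hypothetical in-memory store; entries vanish on restart and after 24 hours.
interface StoredCredentials {
  cookies: Record<string, string>;
  savedAt: number; // milliseconds since the epoch
}

const TTL_MS = 24 * 60 * 60 * 1000;

class CredentialStore {
  private entries = new Map<string, StoredCredentials>();

  set(domain: string, cookies: Record<string, string>, now: number = Date.now()): void {
    this.entries.set(domain, { cookies, savedAt: now });
  }

  // Returns undefined for unknown domains and for entries older than the TTL.
  get(domain: string, now: number = Date.now()): StoredCredentials | undefined {
    const entry = this.entries.get(domain);
    if (!entry) return undefined;
    if (now - entry.savedAt >= TTL_MS) {
      this.entries.delete(domain); // expire lazily on access
      return undefined;
    }
    return entry;
  }
}
```

Because nothing is written to disk, credentials have to be set again after every server restart or once the 24 hours have passed.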

Example Usage with Claude

  1. Ask Claude to clone a website: "Please clone the website at example.com"
  2. Claude will use the available tools to:
    • Fetch the HTML content
    • Extract assets
    • Download necessary files
    • Analyze the structure
    • Create a local copy of the site

Troubleshooting

Server not showing up in Cursor

  1. Restart Cursor
  2. Check your configuration file syntax
  3. Make sure Node.js is installed: node --version
  4. Look at Cursor's MCP logs for errors:
    • Output → Select Cursor MCP from Dropdown
  5. Try running the server manually to see any errors:
    npx -y site-cloner

Module Not Found Error

If you encounter a "Module not found" error when running locally:

  1. Make sure you've installed dependencies: npm install
  2. Make sure you've built the project: npm run build
  3. Check that the dist/ directory exists

Build Errors

If you get TypeScript build errors:

  1. Clean the build directory:
    rm -rf dist/
  2. Rebuild:
    npm run build

Publishing to npm

To publish this package to npm:

  1. Update version in package.json
  2. Build the project:
    npm run build
  3. Publish:
    npm publish

Notes

  • The server automatically organizes downloaded assets into subdirectories based on content type (html, css, js, images, fonts, videos, other)
  • When cloning a site, be mindful of copyright and terms of service restrictions
  • Some websites may block automated requests, in which case you might need to adjust the user agent string in the code
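
The subdirectory layout mentioned in the first note could be derived from the response's `Content-Type` header roughly as follows. The directory names are the ones listed above; the function `subdirectoryFor` and the exact MIME types it checks are assumptions for this sketch.

```typescript
// Directory names taken from the note above.
type AssetDir = "html" | "css" | "js" | "images" | "fonts" | "videos" | "other";

// Hypothetical mapping from a Content-Type header value to a subdirectory.
function subdirectoryFor(contentType: string): AssetDir {
  // Drop parameters such as "; charset=utf-8" and normalise case.
  const type = (contentType.split(";")[0] || "").trim().toLowerCase();
  if (type === "text/html") return "html";
  if (type === "text/css") return "css";
  if (type === "application/javascript" || type === "text/javascript") return "js";
  if (type.indexOf("image/") === 0) return "images";
  if (type.indexOf("font/") === 0) return "fonts";
  if (type.indexOf("video/") === 0) return "videos";
  return "other";
}
```

Anything unrecognised falls through to `other`, so an unexpected content type never prevents a file from being saved.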

Development

Project Structure

site-cloner/
├── src/
│   └── index.ts          # Main server code
├── dist/                 # Compiled JavaScript (generated)
├── package.json          # Node.js dependencies
├── tsconfig.json         # TypeScript configuration
└── README.md             # This file

Scripts

  • npm run build - Compile TypeScript to JavaScript
  • npm run watch - Watch mode for development
  • npm run start - Run the compiled server
  • npm run dev - Run with tsx for development (no build needed)