
kosyak-fetch-mcp

v1.1.8


MCP server for fetching web content as Markdown. Transparent handling of PDFs, Reddit/Medium/Hacker News/Discourse, Cloudflare anti-bot, and YouTube transcripts.


kosyak-fetch-mcp

Model Context Protocol server for fetching web content as clean Markdown from Claude Code, Claude Desktop, or any MCP client. 3 tools, transparent handling of PDFs + 5 scraper-hostile platforms.

Highlights

  • Zero-setup content extraction — Mozilla Readability + OpenGraph metadata header (title · author · site · date), strips nav/ads/sidebars by default
  • PDFs just work — application/pdf auto-detected, text extracted via pdf-parse
  • Cloudflare auto-bypass — transparent fallback to a Chrome TLS fingerprint (CycleTLS, bundled Go binary)
  • Reddit / Medium / Hacker News / Discourse rewriters — URLs of these scraper-hostile platforms are rewritten under the hood to their scraper-friendly counterparts (Atom feed, readmedium.com, Algolia API, .json endpoint)
  • YouTube transcripts — timestamped captions via yt-dlp, prefers human-written → auto-generated fallback
  • SSRF-hardened — reserved-IP blocklist (CVE-2025-8020 patched), DNS pinning via undici Agent, per-hop redirect validation
  • LLM-friendly errors — HTTP status hints (404 → "not found", 429 → "rate limited"), JSON-on-HTML tells the model to switch tool
  • Auto-retry on transient 5xx/429 with exponential backoff + Retry-After support
  • In-memory LRU cache — pagination on a large page doesn't re-fetch
  • Charset-aware decoding — Latin-1 / Shift-JIS / Windows-1251 pages no longer come back as replacement characters
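The retry policy above (exponential backoff with Retry-After taking precedence) can be sketched roughly as follows. This is an illustration, not the package's actual code; the function name retryDelayMs is invented here:

```typescript
// Illustrative sketch of the retry-delay policy: a Retry-After header,
// when present and numeric, wins over the exponential schedule.
function retryDelayMs(
  attempt: number,           // 0-based retry attempt
  retryAfterHeader?: string, // value of a Retry-After header, if any
  baseMs = 1000,
): number {
  if (retryAfterHeader !== undefined) {
    const seconds = Number(retryAfterHeader);
    if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  }
  // Exponential backoff: 1s, 2s, 4s, ...
  return baseMs * 2 ** attempt;
}
```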

Quick Start

Add to ~/.claude.json (or Claude Desktop's %APPDATA%\Claude\claude_desktop_config.json on Windows, ~/Library/Application Support/Claude/claude_desktop_config.json on macOS).

Windows

{
  "mcpServers": {
    "fetch": {
      "type": "stdio",
      "command": "npx.cmd",
      "args": ["-y", "kosyak-fetch-mcp"],
      "env": {
        "DEFAULT_LIMIT": "",
        "MAX_RESPONSE_BYTES": "",
        "FETCH_TIMEOUT_SECONDS": "",
        "FETCH_MAX_RETRIES": "",
        "FETCH_CACHE_DISABLED": "",
        "FETCH_CACHE_TTL_SECONDS": "",
        "FETCH_CACHE_MAX": "",
        "FETCH_CYCLETLS_DISABLED": "",
        "CYCLETLS_JA3": "",
        "PROXY_URL": ""
      }
    }
  }
}

Claude Desktop on Windows does not spawn MCP servers through a shell, so use npx.cmd (not npx) to avoid spawn npx ENOENT. -y skips the install prompt so first launch doesn't hang.

macOS / Linux

{
  "mcpServers": {
    "fetch": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "kosyak-fetch-mcp"],
      "env": {
        "DEFAULT_LIMIT": "",
        "MAX_RESPONSE_BYTES": "",
        "FETCH_TIMEOUT_SECONDS": "",
        "FETCH_MAX_RETRIES": "",
        "FETCH_CACHE_DISABLED": "",
        "FETCH_CACHE_TTL_SECONDS": "",
        "FETCH_CACHE_MAX": "",
        "FETCH_CYCLETLS_DISABLED": "",
        "CYCLETLS_JA3": "",
        "PROXY_URL": ""
      }
    }
  }
}

Leave unused keys as empty strings — they fall back to defaults documented below. Restart your MCP client after saving.
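If the defaults all suit you, the env block can likely be omitted entirely; most MCP clients, including Claude Desktop, treat env as optional. A minimal config under that assumption:

```json
{
  "mcpServers": {
    "fetch": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "kosyak-fetch-mcp"]
    }
  }
}
```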

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| DEFAULT_LIMIT | 5000 | Default max_length (0 = unlimited) |
| MAX_RESPONSE_BYTES | 10485760 | Body size cap (10 MB) |
| FETCH_TIMEOUT_SECONDS | 30 | Per-request HTTP timeout |
| FETCH_MAX_RETRIES | 2 | Auto-retries on transient 5xx/429 (0 = disable) |
| FETCH_CACHE_DISABLED | — | Set to 1 to disable response cache |
| FETCH_CACHE_TTL_SECONDS | 300 | Cache TTL |
| FETCH_CACHE_MAX | 50 | LRU cache size |
| FETCH_CYCLETLS_DISABLED | — | Set to 1 to disable Cloudflare fallback |
| CYCLETLS_JA3 | Chrome 144 (source) | Override the TLS fingerprint used by the Cloudflare-fallback backend. Default matches bogdanfinn/tls-client's Chrome_144 profile (modern extension set incl. ApplicationSettings 17613, CompressCertificate 27, GREASE ECH 65037, post-quantum X25519MLKEM768); the User-Agent advertised on the same request also says Chrome 144 so the UA ↔ TLS fingerprint pair stays internally consistent. JA3 string format: <version>,<ciphers>,<extensions>,<elliptic-curves>,<ec-point-formats>. |
| PROXY_URL | — | HTTP(S) proxy for all outbound requests |

All variables go in the env block of your MCP client config.

Tools (3)

fetch_page

Fetch any URL and return Markdown. Extracts the main article body by default (Mozilla Readability); set fullpage: true to include navigation, menus, and sidebars — use only for structural queries like "list all links on this docs index" or "extract this nav menu".

| Parameter | Type | Description |
|-----------|------|-------------|
| url | string (required) | URL to fetch |
| headers | object | Custom HTTP headers |
| max_length | number | Max chars to return (default: 5000) |
| start_index | number | Offset to continue from a truncated previous call |
| fullpage | boolean | Return whole page instead of article extraction |

Handles PDFs, Cloudflare-protected pages, Reddit threads, Medium articles, Hacker News items, and Discourse threads transparently. See Transparent platform handling.
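For reference, a sketch of what an MCP client sends over stdio to invoke this tool: JSON-RPC 2.0 per the MCP spec, with argument names taken from the table above. You never write this by hand; the client issues it when the model selects the tool.

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "fetch_page",
    "arguments": {
      "url": "https://example.com/blog/post",
      "max_length": 5000
    }
  }
}
```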

fetch_json

Fetch a JSON endpoint (REST APIs, OpenAPI specs, package registries, manifests) and return the parsed JSON as a compact string. Gives an LLM-actionable hint if the URL returns HTML instead of JSON.

Works directly against package registries:

  • https://registry.npmjs.org/PACKAGE
  • https://pypi.org/pypi/PACKAGE/json
  • https://crates.io/api/v1/crates/NAME

fetch_youtube_transcript

Fetch a YouTube video's transcript as timestamped lines. Prefers human-written captions, falls back to auto-generated. Requires yt-dlp on PATH.

| Parameter | Type | Description |
|-----------|------|-------------|
| url | string (required) | YouTube video URL |
| lang | string | BCP-47 caption language (default: en) |
| max_length, start_index | number | Pagination |

Install yt-dlp:

winget install --id yt-dlp.yt-dlp --source winget   # Windows
brew install yt-dlp                                  # macOS
pipx install yt-dlp                                  # Linux

Examples

CLI (after npm i -g kosyak-fetch-mcp):

# Article extraction (default)
mcp-fetch page https://example.com/blog/post

# Whole page including nav / menus
mcp-fetch page https://example.com --fullpage

# Package metadata via the registry API
mcp-fetch json https://registry.npmjs.org/undici

# YouTube transcript
mcp-fetch youtube https://www.youtube.com/watch?v=UF8uR6Z6KLc --lang en

# Paginate a large page
mcp-fetch page https://very-long-post.example/ --max-length 10000 --start-index 10000

Transparent platform handling

Some URLs are routed through alternative endpoints to bypass anti-scraper blocks or recover lost structure. The caller never sees this — the LLM passes the original URL, we transparently hit the working source.

| Platform | Rewrite | Why |
|----------|---------|-----|
| PDFs | — | Content-Type sniffed, pdf-parse extracts text + metadata |
| Reddit threads | old.reddit.com/…/.rss | Main site 403s scrapers; RSS exposes post + top-level comments as Atom |
| Medium + publications | readmedium.com proxy | Medium blocks non-browser TLS; readmedium SSRs the article |
| Hacker News /item?id=N | hn.algolia.com/api/v1/items/N | HN's `<table>`-layout HTML breaks Turndown; Algolia returns clean JSON + nested comments |
| Discourse threads | /t/slug/ID.json | Static HTML on Discourse is a mostly-empty Ember shell; JSON has the full post_stream |
| Cloudflare-protected | Retry via CycleTLS (Chrome JA3) | Node's default TLS fingerprint gets 403/503 from CF; Chrome fingerprint passes |

Discourse communities recognised out of the box: Rust (users/internals), Elixir, PyTorch, HuggingFace, OpenAI, Django, Erlang, freeCodeCamp, and a few more. Add more via PR.
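A rough sketch of this rewriter layer; the function name, host checks, and URL shapes below are illustrative assumptions, not the package's actual code:

```typescript
// Illustrative URL rewriter: map scraper-hostile URLs to friendly endpoints.
function rewriteUrl(raw: string): string {
  const url = new URL(raw);
  const host = url.hostname.replace(/^www\./, "");

  // Reddit thread -> old.reddit.com Atom feed
  if (host === "reddit.com" || host === "old.reddit.com") {
    url.hostname = "old.reddit.com";
    url.pathname = url.pathname.replace(/\/$/, "") + "/.rss";
    return url.toString();
  }

  // Medium -> readmedium.com server-side-rendered proxy
  // (illustrative; real code also handles custom publication domains)
  if (host === "medium.com" || host.endsWith(".medium.com")) {
    return `https://readmedium.com${url.pathname}`;
  }

  // Hacker News item -> Algolia JSON API
  if (host === "news.ycombinator.com" && url.pathname === "/item") {
    const id = url.searchParams.get("id");
    if (id) return `https://hn.algolia.com/api/v1/items/${id}`;
  }

  // Discourse thread /t/slug/ID -> .json endpoint with the full post_stream
  // (real code gates this on a list of known Discourse hosts)
  if (/^\/t\/[^/]+\/\d+$/.test(url.pathname)) {
    return `${url.origin}${url.pathname}.json`;
  }

  return raw; // everything else passes through untouched
}
```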

What Cloudflare bypass does NOT help with (needs a real browser): Turnstile captcha, DataDome / PerimeterX JS challenges, cookie-based sessions.

Security

SSRF protections active on every request:

  • URL validation — rejects non-HTTP(S) schemes, localhost, and direct IP URLs in reserved ranges
  • Reserved-IP blocklist — full IANA reserved ranges for IPv4 and IPv6, including 224.0.0.0/4 multicast (CVE-2025-8020 patched — was missing in the private-ip package we replaced)
  • DNS pinning — undici Agent.connect.lookup hook returns the pre-validated IP; the runtime can't re-resolve to something private
  • Per-hop redirect validation — each redirect target goes through the same URL + IP checks before the next fetch
  • User-supplied proxies rejected — proxy is server-only via PROXY_URL to prevent SSRF bypass via proxy=http://169.254.169.254/
  • Credentials scrubbed from error messages — https://user:pass@host is converted to Authorization: Basic before fetch; the password never appears in logs or error output
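The IPv4 side of that blocklist can be sketched as follows. This illustrates the ranges involved; the package's actual implementation covers the full IANA reserved set for both IPv4 and IPv6:

```typescript
// Illustrative reserved-IPv4 check: returns true for addresses a fetcher
// should never connect to (private, loopback, link-local, multicast, ...).
function isReservedIPv4(ip: string): boolean {
  const octets = ip.split(".").map(Number);
  if (
    octets.length !== 4 ||
    octets.some((o) => !Number.isInteger(o) || o < 0 || o > 255)
  ) {
    return true; // malformed -> treat as blocked
  }
  const [a, b] = octets;
  return (
    a === 0 ||                           // 0.0.0.0/8 "this network"
    a === 10 ||                          // 10.0.0.0/8 private
    a === 127 ||                         // 127.0.0.0/8 loopback
    (a === 169 && b === 254) ||          // 169.254.0.0/16 link-local (cloud metadata)
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12 private
    (a === 192 && b === 168) ||          // 192.168.0.0/16 private
    a >= 224                             // 224.0.0.0/4 multicast + 240.0.0.0/4 reserved
  );
}
```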

The Cloudflare-fallback path (CycleTLS Go subprocess) cannot pin DNS — a TTL=0 rebinding attack window exists there. Disable the fallback with FETCH_CYCLETLS_DISABLED=1 if you're running in a cloud environment with reachable internal metadata endpoints.

Troubleshooting

  • spawn npx ENOENT on Windows → use "command": "npx.cmd", not "npx".
  • Claude Desktop hangs on first use → -y missing from args.
  • yt-dlp not found after winget install → winget drops it in a WinGet\Packages\… subdir that isn't on PATH by default. Add it to PATH or copy yt-dlp.exe to a dir that already is.

Development

git clone https://github.com/kosyakdev/fetch-mcp.git
cd fetch-mcp
bun install
bun run dev     # watch mode
bun test        # 316 tests
bun run build   # produces dist/

License

MIT. Forked from zcaceres/fetch-mcp and rebuilt around a different tool surface, URL-rewriter layer, PDF support, and Cloudflare auto-bypass.