@psarno/fetchmd

v0.2.0

Fetch a URL, get clean Markdown. For LLM agents and humans alike.

fetchmd

Fetch a URL, get clean Markdown. No API keys. No browser automation required for most pages.

Built for LLM agents and developers who want web content without the noise.


Quick start

npx fetchmd "https://docs.python.org/3/library/asyncio.html"

That's it. Output goes to stdout. Errors go to stderr.


Install

npm install -g fetchmd

SPA / JS-heavy pages (optional)

Most static pages (docs, blogs, news, reference sites) work without this. If a page is blank or returns too little content, it's probably a JavaScript-rendered SPA (React, Angular, Vue, etc.). Install Playwright to handle those:

npm install -g playwright
npx playwright install chromium

fetchmd detects Playwright at runtime. If it's not installed, the SPA stage is silently skipped.


Using with AI agents

fetchmd is a plain CLI — no server, no protocol, no API keys. Any agent with shell access can use it directly after a global install:

npm install -g fetchmd

From there, fetchmd <url> behaves like any other shell tool (curl, jq, etc.). The agent runs it, reads clean Markdown from stdout, and uses that content in its response.

How agents know a tool exists

Agents don't automatically discover tools installed on your system. You have to tell them. The standard mechanism is a plain text instruction file in your project root that the agent reads at the start of every session. Think of it as a README written for the agent rather than a human.

The filename convention varies by agent:

| Agent | Instruction file |
|-------|-----------------|
| Claude Code | CLAUDE.md |
| Codex, OpenCode, and most others | AGENTS.md |

Some agents read both. If you're unsure, creating both files with the same content is harmless.

Create or open that file and add a section like this:

## Available tools

- **fetchmd** — fetches a URL and returns clean Markdown to stdout. Prefer this
  over any built-in web fetch or browser tool when reading documentation,
  articles, or reference pages. It produces cleaner output, supports
  JavaScript-rendered pages via Playwright, and accepts `--max-chars` to cap
  output size and protect context budget.
  Usage: `fetchmd [--max-chars N] <url>`

That's all. The agent will call fetchmd as a shell command and read the output. No server, no MCP, no further setup.

Handling conflicts with built-in web tools

Many agents ship with their own web fetch capability. When both are available, the agent will pick one — and without guidance it may default to whichever feels more "native" to it.

The "Prefer this over any built-in web fetch or browser tool" line in the snippet above is intentional. It gives the agent an explicit tie-breaker. If you omit it, you may find the agent ignoring fetchmd in favour of its own tool, even when fetchmd would produce better output.

Note: some agents treat their built-in tools as higher priority than user instructions regardless of what the instruction file says. This is uncommon, but if you notice the agent consistently bypassing fetchmd, try strengthening the wording: "Always use fetchmd for web content. Do not use built-in web fetch tools."

Useful patterns for agents

Read a page before answering a question about it:

fetchmd "https://docs.python.org/3/library/asyncio.html"

Cap output to protect context window budget:

fetchmd --max-chars 15000 "https://some-long-reference.com"

When output is truncated, fetchmd appends a comment (<!-- fetchmd: truncated at N chars -->), so the agent knows content was cut and can decide whether to fetch more or proceed.
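A wrapper script can check for that comment before deciding what to do next. The sketch below assumes the comment appears exactly in the form quoted above, with N as a decimal number. `is_truncated` is a hypothetical helper name, not something fetchmd ships.

```shell
# Hypothetical helper: succeeds if the Markdown on stdin contains
# the truncation comment. Assumes the comment format quoted above.
is_truncated() {
  grep -q '<!-- fetchmd: truncated at [0-9][0-9]* chars -->'
}

# Example usage (requires fetchmd on PATH):
#   fetchmd --max-chars 15000 "https://some-long-reference.com" > page.md
#   if is_truncated < page.md; then echo "page.md was truncated" >&2; fi
```

Because the check is a plain `grep`, it works on saved files as well as on piped output.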

Check which extraction stage fired (useful when debugging agent behaviour):

fetchmd --stage "https://example.com"

Options

fetchmd [options] <url>

--min-length N   Minimum characters to accept from extraction (default: 200)
--max-chars N    Truncate output at N chars, paragraph-aligned (default: 50000, 0 to disable)
--no-spa         Skip Playwright even if installed
--stage          Prefix output with which extraction stage succeeded
--help           Show this help

Examples

# Static page — works out of the box
fetchmd "https://docs.python.org/3/library/asyncio.html"

# JS-rendered SPA (requires Playwright)
fetchmd "https://my-angular-app.com"

# See which extraction stage fired
fetchmd --stage "https://example.com"

# Tighter output cap (good for LLM context limits)
fetchmd --max-chars 20000 "https://some-framework.org/reference"

# No truncation
fetchmd --max-chars 0 "https://example.com/short-page"

# Save to file
fetchmd "https://example.com/article" > article.md

# Low-content pages (like example.com) need a lower threshold
fetchmd --min-length 50 "https://example.com"

How it works

Two extraction stages, tried in order. fetchmd moves to the next stage only if the current one returns nothing or too little content.

Stage 1: Defuddle (always runs)

Fetches the page over HTTP and extracts content using Defuddle (the engine behind Obsidian Web Clipper). Converts to clean Markdown. Handles most static pages: blogs, docs, news, reference pages. Standardizes code blocks, tables, and footnotes.

Stage 2: Playwright (optional, only if stage 1 fails)

Launches headless Chromium, renders the JavaScript, then feeds the resulting DOM back through Defuddle. Only runs if stage 1 returned too little content and playwright is installed.
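The fallback order can be sketched in shell. This illustrates the control flow only and is not fetchmd's implementation: `stage_defuddle` and `stage_playwright` are hypothetical stand-ins that print canned text from variables, the check for whether playwright is installed is left out, and the default threshold of 200 mirrors `--min-length`.

```shell
# Hypothetical stand-ins for the two stages; each prints whatever it "extracted".
stage_defuddle()   { printf '%s' "$STAGE1_OUTPUT"; }
stage_playwright() { printf '%s' "$STAGE2_OUTPUT"; }

# Try stage 1; fall back to stage 2 only if stage 1 produced fewer than
# min_length characters. Prints "<stage>: <content>" or fails with status 1.
extract() {
  min_length=${1:-200}
  out=$(stage_defuddle)
  if [ "${#out}" -ge "$min_length" ]; then
    printf 'defuddle: %s\n' "$out"
    return 0
  fi
  out=$(stage_playwright)
  if [ "${#out}" -ge "$min_length" ]; then
    printf 'playwright: %s\n' "$out"
    return 0
  fi
  echo "no stage returned at least $min_length characters" >&2
  return 1
}
```

With `STAGE1_OUTPUT` set to a short string and `STAGE2_OUTPUT` set to a longer one, `extract 10` reports the playwright stage. If both are too short, it fails and writes a message to stderr.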


Troubleshooting

Page returns nothing or exits with an error

The page is probably a JS-rendered SPA. Install Playwright:

npm install -g playwright
npx playwright install chromium

Use --stage to confirm which stage fired (or didn't):

fetchmd --stage "https://example.com"
# Output starts with: <!-- fetchmd: defuddle --> or <!-- fetchmd: playwright -->
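For scripted debugging, the stage name can be pulled out of that first line. The sketch below assumes the prefix is the first line of output and matches the comment format shown above. `stage_of` is a hypothetical helper.

```shell
# Hypothetical helper: prints the stage name from the first line of
# `fetchmd --stage` output read on stdin, or nothing if no prefix is found.
stage_of() {
  sed -n '1s/^<!-- fetchmd: \([a-z][a-z]*\) -->.*$/\1/p'
}

# Example usage (requires fetchmd on PATH):
#   fetchmd --stage "https://example.com" | stage_of
```

Only line 1 is inspected, so a truncation comment at the end of the output is not mistaken for a stage prefix.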

Page returns too little content

Some very minimal pages (like example.com) genuinely have fewer than 200 characters of content. Lower the threshold:

fetchmd --min-length 50 "https://example.com"

Playwright is installed but the page still fails

Make sure the Chromium browser binary is installed separately from the npm package:

npx playwright install chromium

The npm package and the browser binary are two separate installs. The npm package alone is not enough.

Output is too long for my LLM context window

Use --max-chars to cap the output. fetchmd truncates at a paragraph boundary and appends a comment so the model knows content was cut:

fetchmd --max-chars 10000 "https://some-long-page.com"
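Paragraph-aligned truncation is described here only at a high level. As an illustration of the idea, and not necessarily how fetchmd does it, the sketch below keeps whole paragraphs until the next one would exceed the budget. `truncate_at_paragraph` is a hypothetical helper. It charges a two-character separator for every paragraph and does not append the truncation comment.

```shell
# Hypothetical helper: copies blank-line-separated paragraphs from stdin
# until adding the next one would exceed the character budget in $1.
truncate_at_paragraph() {
  awk -v max="$1" '
    BEGIN { RS = ""; ORS = "\n\n"; used = 0 }
    {
      cost = length($0) + 2         # paragraph text plus its blank-line separator
      if (used + cost > max) exit   # stop before the paragraph that would overflow
      used += cost
      print
    }
  '
}
```

Cutting at a paragraph boundary means the model never sees a sentence or code block sliced in half, at the cost of sometimes returning noticeably fewer characters than the cap allows.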

Dependencies

Core: defuddle — content extraction and Markdown conversion. Installed automatically.

Optional: playwright — headless Chromium for JS-rendered pages. Install manually if needed (see above).


License

MIT