@johnnywu/pi-webfetch

v1.1.0

Fetch web pages and URLs from pi with readable text, Markdown, HTML, or JSON output.

pi-webfetch

A pi package that adds a webfetch tool for fetching and cleaning URL content with Scrapling, Defuddle, and gh for GitHub URLs.

Given a user-provided URL, webfetch routes GitHub URLs through the GitHub CLI (gh). For any other URL it chooses a Scrapling fetcher strategy, runs Scrapling through its CLI shell, and returns cleaned Markdown, HTML, or text content to pi.

Install

pi install npm:@johnnywu/pi-webfetch

Or via local path in ~/.pi/agent/settings.json while developing:

{
  "packages": ["~/dev/jwu/pi-webfetch"]
}

Requirements

webfetch calls Scrapling through:

scrapling shell -L warning -c "..."

Make sure the scrapling executable is available in the environment where pi runs.

Defuddle conversion is bundled as an npm dependency and is used by default for non-GitHub Markdown output. It can be disabled in settings.

Configuration

Add webfetch settings to .pi/settings.json (project) or ~/.pi/agent/settings.json (global) to override defaults:

{
  "webfetch": {
    "useDefuddle": true,
    "qualityJudge": false,
    "qualityJudgeModel": "google/gemini-2.5-flash",
    "qualityJudgeThinkLevel": "off"
  }
}

Defuddle behavior:

| webfetch.useDefuddle | Markdown behavior |
|---|---|
| omitted | Scrapling fetches cleaned HTML, then Defuddle converts that HTML to Markdown |
| true | Same as omitted: use Scrapling HTML plus Defuddle Markdown conversion |
| false | Scrapling fetches and extracts Markdown directly |

Quality judge behavior:

| Setting | Default | Description |
|---|---:|---|
| webfetch.qualityJudge | false | When enabled, ask an LLM whether the fetched Markdown is usable before accepting a Scrapling strategy. If the judge returns unusable, webfetch records that strategy as failed and tries the next one. |
| webfetch.qualityJudgeModel | current pi model | Optional judge model in provider/model form, for example google/gemini-2.5-flash. |
| webfetch.qualityJudgeThinkLevel | off | Optional judge thinking level: off, minimal, low, medium, high, or xhigh. Unsupported levels are clamped for the selected model. |

Project settings override global settings. For compatibility, the dotted key form also works:

{
  "webfetch.useDefuddle": true,
  "webfetch.qualityJudge": true,
  "webfetch.qualityJudgeModel": "google/gemini-2.5-flash",
  "webfetch.qualityJudgeThinkLevel": "off"
}

The useDefuddle switch affects only non-GitHub Markdown output. An explicit mode: "html" or mode: "text" still uses direct extraction. GitHub URLs are handled by gh and do not use Defuddle.
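The precedence rules above (defaults, then global settings, then project settings, with both nested and dotted keys accepted) can be sketched roughly as follows. This is an illustration, not the package's actual loader: `readWebfetch` and `resolveSettings` are hypothetical names, and the choice that a nested key wins over a dotted key inside the same file is an assumption the README does not specify.

```typescript
interface WebfetchSettings {
  useDefuddle: boolean;
  qualityJudge: boolean;
  qualityJudgeModel?: string; // undefined stands for "current pi model"
  qualityJudgeThinkLevel: string;
}

type SettingsFile = Record<string, unknown>;

// Defaults taken from the tables in this README.
const DEFAULTS: WebfetchSettings = {
  useDefuddle: true,
  qualityJudge: false,
  qualityJudgeThinkLevel: "off",
};

// Collect webfetch settings from one parsed settings file.
// Assumption: if a file has both forms, the nested form wins.
function readWebfetch(file: SettingsFile): Partial<WebfetchSettings> {
  const out: Record<string, unknown> = {};
  const prefix = "webfetch.";
  for (const [key, value] of Object.entries(file)) {
    if (key.startsWith(prefix)) out[key.slice(prefix.length)] = value;
  }
  const nested = file["webfetch"];
  if (typeof nested === "object" && nested !== null) Object.assign(out, nested);
  return out as Partial<WebfetchSettings>;
}

// Project settings override global settings, which override defaults.
function resolveSettings(globalFile: SettingsFile, projectFile: SettingsFile): WebfetchSettings {
  return { ...DEFAULTS, ...readWebfetch(globalFile), ...readWebfetch(projectFile) };
}

// Example: global disables Defuddle, project re-enables it with a dotted key.
const resolved = resolveSettings(
  { webfetch: { useDefuddle: false, qualityJudge: true } },
  { "webfetch.useDefuddle": true },
);
console.log(resolved);
```

The sketch does not validate value types; a real loader would likely reject non-boolean or unknown values.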

Tool

webfetch

Fetch and clean an HTTP(S) URL with gh for GitHub URLs or Scrapling for other sites.

| Parameter | Type | Default | Description |
|---|---:|---:|---|
| url | string | required | HTTP(S) URL to inspect and fetch |
| mode | markdown \| html \| text | markdown | Output mode. Markdown may be converted by Scrapling or Defuddle depending on settings. |
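As a rough illustration of the parameter table, the sketch below normalizes tool arguments: it applies the markdown default and rejects anything that is not an http:// or https:// URL. The function name `normalizeArgs` and the error messages are hypothetical and not taken from the package.

```typescript
type Mode = "markdown" | "html" | "text";

interface WebfetchArgs {
  url: string;
  mode?: Mode;
}

// Hypothetical helper: validate the URL scheme and fill in the default mode.
function normalizeArgs(args: WebfetchArgs): { url: URL; mode: Mode } {
  let parsed: URL;
  try {
    parsed = new URL(args.url);
  } catch {
    throw new Error(`Invalid URL: ${args.url}`);
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported URL scheme: ${parsed.protocol}`);
  }
  return { url: parsed, mode: args.mode ?? "markdown" };
}

// Example: mode is omitted, so it falls back to markdown.
console.log(normalizeArgs({ url: "https://example.com/docs" }).mode);
```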

Fetch strategy

For non-GitHub URLs, webfetch uses an explicit built-in site-to-strategy mapping first.

Current mapping:

| Site | Strategy | Reason |
|---|---|---|
| shadertoy.com and subdomains | StealthyFetcher | Cloudflare protection; static/dynamic fetchers often return 403 or challenge HTML |
| x.com, twitter.com and subdomains | StealthyFetcher | SPA and anti-bot behavior; future login-state support can build on this |
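A lookup of this kind might match the hostname exactly or as a dot-separated suffix, so that subdomains are covered without also matching unrelated hosts such as notx.com. The sketch below is one way to do that; `SITE_STRATEGIES` and `mappedStrategy` are hypothetical names, and the real mapping may be stored differently.

```typescript
type StrategyName = "Fetcher" | "DynamicFetcher" | "StealthyFetcher";

// Sites from the mapping table in this README.
const SITE_STRATEGIES: Record<string, StrategyName> = {
  "shadertoy.com": "StealthyFetcher",
  "x.com": "StealthyFetcher",
  "twitter.com": "StealthyFetcher",
};

// Return the mapped strategy for a URL, or undefined when the site is unmapped.
function mappedStrategy(url: string): StrategyName | undefined {
  const host = new URL(url).hostname.toLowerCase();
  for (const [site, strategy] of Object.entries(SITE_STRATEGIES)) {
    // Match the site itself or any subdomain, but not a host that merely ends with the same letters.
    if (host === site || host.endsWith(`.${site}`)) return strategy;
  }
  return undefined;
}

console.log(mappedStrategy("https://www.shadertoy.com/view/example"));
console.log(mappedStrategy("https://example.com/"));
```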

For sites that are not in the mapping, webfetch uses sequential escalation from the Scrapling guide:

  1. Fetcher.get(url) — fastest static fetcher
  2. if it fails, returns HTTP >= 400, or extracts empty content, try DynamicFetcher.fetch(url, network_idle=True, wait=3000)
  3. if that also fails or extracts empty content, try StealthyFetcher.fetch(url, network_idle=True, wait=3000)

Each failed attempt is recorded in errors, so the result explains why webfetch escalated to the next strategy.
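The escalation loop can be sketched as below. This is a simplified model rather than the package's code: the `Runner` callback stands in for whatever actually invokes Scrapling, it is kept synchronous for brevity (a real runner that spawns a CLI would be asynchronous), and the HTTP >= 400 check is applied uniformly to every step, which is an assumption.

```typescript
type StrategyName = "Fetcher" | "DynamicFetcher" | "StealthyFetcher";

interface Attempt {
  status: number;
  content: string;
}

interface StrategyError {
  strategy: StrategyName;
  reason: string;
}

interface FetchResult {
  strategy: StrategyName | null; // null when every strategy failed
  content: string;
  errors: StrategyError[];
}

// Hypothetical stand-in for running one Scrapling fetcher.
type Runner = (strategy: StrategyName, url: string) => Attempt;

const ESCALATION: StrategyName[] = ["Fetcher", "DynamicFetcher", "StealthyFetcher"];

function fetchWithEscalation(url: string, run: Runner, order: StrategyName[] = ESCALATION): FetchResult {
  const errors: StrategyError[] = [];
  for (const strategy of order) {
    try {
      const attempt = run(strategy, url);
      if (attempt.status >= 400) {
        errors.push({ strategy, reason: `HTTP ${attempt.status}` });
        continue;
      }
      if (attempt.content.trim() === "") {
        errors.push({ strategy, reason: "empty content" });
        continue;
      }
      // First usable result wins; earlier failures stay in errors.
      return { strategy, content: attempt.content, errors };
    } catch (err) {
      errors.push({ strategy, reason: err instanceof Error ? err.message : String(err) });
    }
  }
  return { strategy: null, content: "", errors };
}
```

For a mapped site, the same loop could be called with a single-element order such as `["StealthyFetcher"]`; whether webfetch falls back to other strategies in that case is not stated here.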

When Defuddle is enabled for Markdown output, each Scrapling strategy is considered successful only after both steps succeed: Scrapling extracts cleaned HTML, then Defuddle returns non-empty Markdown. If Defuddle fails or returns empty Markdown for a strategy, that strategy is recorded as failed and webfetch continues to the next Scrapling strategy.

When webfetch.qualityJudge is enabled, the selected judge model receives a sample of the fetched Markdown and returns a JSON usability decision. Unusable content, such as boilerplate, captcha/challenge pages, error pages, or unrelated content, is treated as a failed strategy so webfetch can continue to the next Scrapling strategy. If the judge cannot run, webfetch fails open and uses the fetched content rather than making the tool unusable.
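The fail-open behavior might look roughly like the sketch below. The README does not document the judge's JSON schema, so the `{ "usable": boolean }` shape, the `Judge` callback, and the 4000-character sample size are all hypothetical. Treating a malformed response the same as a judge that cannot run is also an assumption.

```typescript
// Hypothetical judge: receives a Markdown sample and returns raw JSON text.
type Judge = (sample: string) => string;

function isUsable(markdown: string, judge: Judge, sampleChars = 4000): boolean {
  try {
    const raw = judge(markdown.slice(0, sampleChars));
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === "object" && parsed !== null && "usable" in parsed) {
      const usable = (parsed as { usable: unknown }).usable;
      if (typeof usable === "boolean") return usable;
    }
    // Malformed decision: fail open and keep the fetched content (assumption).
    return true;
  } catch {
    // Judge could not run or returned invalid JSON: fail open.
    return true;
  }
}

// Example with a stub judge that flags challenge pages.
const stubJudge: Judge = (sample) =>
  JSON.stringify({ usable: !sample.includes("Verify you are human") });
console.log(isUsable("# Real article\n\nBody text.", stubJudge));
```

A `false` result would then be recorded as a failed strategy so the next Scrapling strategy is tried.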

Content extraction uses:

Convertor._extract_content(page, extraction_type=mode, main_content_only=True)

When webfetch.useDefuddle is not false and Markdown output is requested, the mode sent to Scrapling is html; the returned cleaned HTML is then parsed with Defuddle using markdown: true.
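The rule for which mode Scrapling receives can be written as a small decision function. `planExtraction` is a hypothetical name used only for illustration; the logic follows the Defuddle behavior described in this README.

```typescript
type Mode = "markdown" | "html" | "text";

interface ExtractionPlan {
  scraplingMode: Mode; // extraction type passed to Scrapling
  defuddle: boolean; // whether Defuddle converts the HTML to Markdown afterwards
}

// useDefuddle may be omitted, which behaves the same as true.
function planExtraction(mode: Mode = "markdown", useDefuddle?: boolean): ExtractionPlan {
  const defuddle = mode === "markdown" && useDefuddle !== false;
  return { scraplingMode: defuddle ? "html" : mode, defuddle };
}

console.log(planExtraction("markdown"));
console.log(planExtraction("markdown", false));
```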

Output behavior

  • Only http:// and https:// URLs are accepted.
  • Failed Scrapling strategies are included in tool details.
  • Tool output is truncated with pi's standard limits: 2000 lines or 50 KiB, whichever is hit first.
  • If output is truncated, the full extracted content is saved to a temp file and the path is included in the result.
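The "2000 lines or 50 KiB, whichever is hit first" rule can be sketched as follows. This is not pi's actual truncation code: `truncateOutput` is a hypothetical helper, and cutting only at line boundaries is an assumption. Saving the full content to a temp file is left to the caller and is not shown.

```typescript
const MAX_LINES = 2000;
const MAX_BYTES = 50 * 1024; // 50 KiB

interface TruncatedOutput {
  text: string;
  truncated: boolean;
  reason?: "lines" | "bytes";
}

function truncateOutput(content: string, maxLines = MAX_LINES, maxBytes = MAX_BYTES): TruncatedOutput {
  const encoder = new TextEncoder();
  const kept: string[] = [];
  let bytes = 0;
  for (const line of content.split("\n")) {
    if (kept.length >= maxLines) {
      return { text: kept.join("\n"), truncated: true, reason: "lines" };
    }
    // UTF-8 size of this line, plus the newline that joins it to the previous one.
    const lineBytes = encoder.encode(line).length + (kept.length > 0 ? 1 : 0);
    if (bytes + lineBytes > maxBytes) {
      return { text: kept.join("\n"), truncated: true, reason: "bytes" };
    }
    kept.push(line);
    bytes += lineBytes;
  }
  return { text: content, truncated: false };
}

// Example: many short lines hit the line limit before the byte limit.
const sample = truncateOutput(Array.from({ length: 3000 }, () => "x").join("\n"));
console.log(sample.truncated, sample.reason);
```

When `truncated` is true, a caller following the behavior listed above would write the full `content` to a temp file and include that path in the result.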

Development

# Install dependencies
bun install

# Run tests
bun test

# Type check
bun run typecheck

# Format
bun run format

# Release (local, requires GH_TOKEN and NPM_TOKEN)
bun run release

This project uses semantic-release with conventional commits.

License

MIT