
pi-web-fetch

v1.1.0

A pi extension that gives your agent a proper web browser. Fetches pages via headless Chrome, extracts clean content via trafilatura, and optionally distills it with an LLM.

Why not use Claude Code's built-in WebFetch?

Claude Code's WebFetch uses Turndown to convert HTML to markdown — a simple regex-based converter that can't handle JavaScript-rendered pages, doesn't strip boilerplate well, and has no extensibility. pi-web-fetch improves on this in several ways:

| Feature | Claude Code WebFetch | pi-web-fetch |
|---|---|---|
| Rendering | Static HTTP fetch | Headless Chrome (handles SPAs, JS-rendered content) |
| Extraction | Turndown (regex HTML→md) | trafilatura (ML-based boilerplate removal) |
| Raw content | Never — prompt is mandatory | Optional — omit prompt to get full markdown |
| Batch fetching | One URL at a time | Up to 10 URLs concurrently with per-URL progress |
| Concurrency | Sequential | Browser pool with 6 parallel tabs |
| Extensibility | None | Hook system for site-specific handling |
| Smart redirects | Generic | Context-aware (e.g. GitHub URLs → gh CLI suggestions) |

Features

  • web_fetch tool — registered in pi's tool system, callable by the LLM
  • Headless Chrome via puppeteer — handles JavaScript-rendered pages, SPAs, pages behind cookie banners
  • Content extraction — strips boilerplate (nav, ads, footers) using trafilatura's ML-based extraction, outputs clean markdown
  • LLM processing — optionally distills page content to answer a specific question via a pi sub-agent
  • Batch fetching — fetch up to 10 pages concurrently in a single tool call with per-URL status in the UI
  • Browser pool — reuses a single Chrome instance with up to 6 parallel tabs, avoiding repeated browser startup overhead
  • Smart large-page handling — when content exceeds ~50KB and no prompt is given, automatically generates a structured summary
  • 15-minute cache — avoids redundant fetches; enables summarize-then-drill-down workflows
  • Cross-host redirect detection — redirects are reported to the LLM for explicit follow-up rather than followed silently
  • HTTP→HTTPS auto-upgrade
  • Extension hooks — customize fetch behavior for specific sites (redirect, replace HTML, transform markdown, override summarization)
  • Built-in site handlers — GitHub URLs redirect to gh CLI, Google Docs redirect to workspace tools

Prerequisites

  • pi coding agent
  • A Python tool runner for trafilatura (auto-detected in priority order):
    1. uv (uvx) — fastest, recommended
    2. uv run — fallback if uvx alias is missing
    3. pipx — widely available on Debian/Ubuntu
    4. pip-run — niche fallback
  • Node.js 18+
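You can check from a shell which runner would likely be picked up. This is a minimal sketch following the priority order above; the extension's actual detection logic is not published here, and the `find_runner` function name is mine:

```shell
# Report the first available trafilatura runner, mirroring the priority
# order listed above. Note that "uv run" is invoked via the uv binary.
find_runner() {
  for r in uvx uv pipx pip-run; do
    if command -v "$r" >/dev/null 2>&1; then
      echo "$r"
      return 0
    fi
  done
  return 1
}

if runner=$(find_runner); then
  echo "trafilatura runner available: $runner"
else
  echo "no runner found; install uv or pipx"
fi
```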

Installation

npm install pi-web-fetch

Or clone for development:

git clone https://github.com/georgebashi/pi-web-fetch
cd pi-web-fetch
npm install

Note: npm install will download puppeteer's bundled Chromium (~300MB). If you already have Chrome/Chromium installed, you can set PUPPETEER_EXECUTABLE_PATH to skip the download:

export PUPPETEER_EXECUTABLE_PATH=/path/to/chrome

Note: The first time web_fetch runs, uvx will download the trafilatura package (~10MB). Subsequent runs use the cached environment and are fast.

Add to pi

Add the package to your pi settings (~/.pi/agent/settings.json):

{
  "extensions": [
    "pi-web-fetch"
  ]
}

Or point to a local checkout:

{
  "extensions": [
    "/path/to/pi-web-fetch"
  ]
}

Or use the -e flag for quick testing:

pi -e pi-web-fetch

Usage

The extension registers a web_fetch tool. The LLM will use it automatically when it needs to fetch web content.

Single URL

With a prompt (recommended):

Fetch https://docs.example.com/api and tell me what authentication methods are supported.

Without a prompt (full content):

Fetch the content of https://example.com/changelog
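The exact argument shape the LLM sends is not documented beyond the parameter names in this README; as a rough illustration, a single-URL call with a prompt might carry arguments like (field names `url` and `prompt` are from this README, the JSON shape is an assumption):

```json
{
  "url": "https://docs.example.com/api",
  "prompt": "What authentication methods are supported?"
}
```

Omitting `prompt` returns the full extracted markdown instead of a focused answer.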

Batch fetching

Fetch multiple pages concurrently by asking for several URLs at once:

Read these three pages and compare their approaches to error handling:

  • https://docs.python.org/3/tutorial/errors.html
  • https://go.dev/blog/error-handling-and-go
  • https://doc.rust-lang.org/book/ch09-00-error-handling.html

The agent will use the pages parameter to fetch all URLs in parallel (up to 10 per call). The UI shows live per-URL progress:

● docs.python.org/3/tutorial/errors.html
◐ go.dev/blog/error-handling-and-go · fetching
◑ doc.rust-lang.org/book/ch09-00-error-... · extracting

Each URL independently transitions through: pending → fetching → extracting → summarizing → done/error. The browser pool manages concurrency automatically (6 tabs max).
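The per-URL lifecycle above can be sketched as a small state machine. The status names mirror this README; the actual implementation's types are not published, so this is illustrative only:

```typescript
// Per-URL fetch status, as described in the README. "done" and "error"
// are terminal; every other status advances linearly through the pipeline.
type FetchStatus =
  | "pending"
  | "fetching"
  | "extracting"
  | "summarizing"
  | "done"
  | "error";

const NEXT: Partial<Record<FetchStatus, FetchStatus>> = {
  pending: "fetching",
  fetching: "extracting",
  extracting: "summarizing",
  summarizing: "done",
};

// Advance one step; terminal states stay where they are.
function advance(status: FetchStatus): FetchStatus {
  return NEXT[status] ?? status;
}
```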

Extensions

pi-web-fetch has a hook system for site-specific fetch behavior. Extensions can intercept any stage of the pipeline: before fetch, after fetch (HTML), after extraction (markdown), or at summarization time.

Built-in extensions

  • GitHub redirect — matches github.com/**, redirects to gh CLI with context-aware suggestions (e.g. gh issue view 123)
  • Google Docs redirect — matches docs.google.com/**, redirects to Google Workspace MCP tools

Writing extensions

Extensions are TypeScript modules with a factory function default export:

import type { WebFetchExtension } from "pi-web-fetch/types";

export default function (): WebFetchExtension {
  return {
    name: "my-handler",
    matches: ["example.com/**"],
    async beforeFetch(ctx) {
      // Return a HookResult to short-circuit, or void to continue
      return ctx.redirect("Use a different tool for this site.");
    },
    async afterFetch(ctx) {
      // ctx.html — replace HTML or short-circuit
    },
    async afterExtract(ctx) {
      // ctx.markdown — replace markdown or short-circuit
    },
    async summarize(ctx) {
      // Override default LLM summarization
    },
  };
}
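The `matches` patterns above use globs like `example.com/**`, but this README does not document the exact matching semantics. As a rough sketch of one plausible interpretation, treating `**` as "match anything" against the URL with its scheme stripped:

```typescript
// Illustrative only: the real pattern matcher in pi-web-fetch may differ.
// Treats "**" as a wildcard for any character sequence and escapes the rest.
function matchesPattern(pattern: string, url: string): boolean {
  const target = url.replace(/^https?:\/\//, "");
  const escaped = pattern
    .split("**")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`).test(target);
}
```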

Extension sources (in priority order)

  1. Event bus — other pi extensions can register handlers via pi.events.emit("web-fetch:register", extension)
  2. Local — TypeScript/JS files in ~/.pi/extensions/web-fetch/ (or configured extensionsDir)
  3. Built-in — shipped with pi-web-fetch in the extensions/ directory

Configuration

Create ~/.pi/agent/web-fetch.json to override defaults:

{
  "model": "provider/model-id",
  "thinkingLevel": "medium",
  "extensionsDir": "~/.pi/extensions/web-fetch"
}

| Key | Default | Description |
|-----|---------|-------------|
| model | Current session model | Model for LLM content processing |
| thinkingLevel | Current session thinking level | Thinking level for the sub-agent |
| extensionsDir | ~/.pi/extensions/web-fetch/ | Directory for local extensions |

Without a config file, the extension uses whatever model and thinking level the current session is using.

Architecture

web_fetch(url, prompt?)
  │
  ├─ URL validation & normalization (http→https, scheme check)
  ├─ Cache check (15-min TTL)
  ├─ Extension: beforeFetch hook
  ├─ Browser pool → Puppeteer page (networkidle2, 30s timeout)
  ├─ Cross-host redirect detection
  ├─ Extension: afterFetch hook
  ├─ trafilatura extraction (HTML → clean markdown)
  ├─ Extension: afterExtract hook
  ├─ Cache store
  └─ Content processing:
      ├─ With prompt → pi sub-agent (focused extraction)
      ├─ Small content, no prompt → return raw markdown
      └─ Large content, no prompt → pi sub-agent (structured summary)

For batch mode, each URL runs through this pipeline independently with its own browser tab. The browser pool (6 tabs max, 60s idle timeout) provides backpressure.
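The content-processing branch at the bottom of the pipeline can be sketched as a single decision function. The ~50KB threshold comes from this README; the exact constant and function name are assumptions:

```typescript
// How fetched content gets processed, per the pipeline diagram above:
// a prompt always routes to the sub-agent; otherwise size decides.
type ProcessingMode = "prompted" | "raw" | "summary";

function chooseMode(markdownBytes: number, prompt?: string): ProcessingMode {
  const LARGE_PAGE_BYTES = 50 * 1024; // "~50KB" per the README (assumed exact value)
  if (prompt) return "prompted"; // focused extraction via pi sub-agent
  if (markdownBytes > LARGE_PAGE_BYTES) return "summary"; // structured summary
  return "raw"; // return full markdown as-is
}
```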

License

MIT