@bitcraft-apps/pi-web-tools

Shell-only web search and fetch tools for pi.dev. Zero API keys, zero accounts — just ddgr + pandoc/w3m running locally.

Tools

websearch

DuckDuckGo search via ddgr. Returns up to 25 results with title, URL, snippet.

webfetch

fetch + optional content-extraction pre-pass + HTML→markdown via pandoc (preferred) or w3m (fallback). Auto-handles Cloudflare challenges via UA hack. Blocks SSRF (localhost/RFC1918). See Content extraction.
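
For orientation, the tool interfaces look roughly like the sketch below. Only region, safesearch, and time (documented under Limits and behavior) plus the defaults and caps come from this README; every other name is hypothetical.

// Rough shape of the two tools, for orientation only. Names other than
// region/safesearch/time are hypothetical, not the package's real schema.
interface WebSearchParams {
  query: string;
  maxResults?: number;                         // default 8, hard cap 25
  region?: string;                             // e.g. "pl-pl", "us-en", "de-de"
  safesearch?: "off" | "moderate" | "strict";  // default "moderate"
  time?: "d" | "w" | "m" | "y";                // past day/week/month/year
}

interface WebSearchResult {
  title: string;                               // one of up to 25 results
  url: string;
  snippet: string;
}

interface WebFetchParams {
  url: string;
  maxChars?: number;                           // default 50k, hard cap 200k
}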

Install

# 1. System deps (one-time)
brew install ddgr pandoc        # macOS
# or: pip install ddgr; apt install pandoc w3m

# 2. Extension (from npm)
pi install npm:@bitcraft-apps/pi-web-tools

# Or pin a specific version:
# pi install npm:@bitcraft-apps/[email protected]

# Or for local dev / hacking on the source:
pi install -e /path/to/pi-web-tools

After install, restart pi; the websearch and webfetch tools will then be available.

Usage examples

In a pi session:

> Find me docs for Bun's native Sqlite API
[agent uses websearch → gets bun.sh URL → uses webfetch → reads docs]

You don't call them directly; pi's agent calls them when it needs them.

Limits and behavior

  • websearch: default 8 results, hard cap 25. DuckDuckGo rate-limits ~10 req/min/IP. If you hit it, wait or use webfetch directly. The parameter-to-flag mapping is sketched after this list.
    • region (optional): DuckDuckGo region code, e.g. pl-pl, us-en, de-de. Maps to ddgr's --reg. Default: ddgr's built-in (us-en).
    • safesearch (optional): off | moderate | strict. Default moderate. off passes --unsafe to ddgr. ddgr does not distinguish moderate vs strict — both use its default safe-search behavior (see ddgr.1 manpage; only --unsafe is exposed).
    • time (optional): d | w | m | y — restrict results to the past day/week/month/year. Maps to ddgr's --time. Default: no filter (all time). Use when the query is time-sensitive ("latest", "recent", "this week") — DuckDuckGo's default ranking otherwise surfaces years-old SEO content above recent results.
  • webfetch: default 50k chars output, hard cap 200k. 5 MB response cap. 30s timeout. Cannot fetch: images, video, audio, localhost, 127/8, 169.254/16; PDFs unless optional pdftotext is installed (see PDF support). Cannot render: JS-heavy SPAs (you'll get an empty markdown).
  • On 429 Too Many Requests or 503 Service Unavailable, honors a Retry-After header (delta-seconds or HTTP-date) for one retry, capped at 10s. No retry without Retry-After, no exponential backoff, no retry on other statuses. The retry and charset rules are sketched after this list.
  • Honors the charset= parameter on Content-Type for response decoding (e.g. windows-1250, iso-8859-2, shift_jis, gb2312). Unknown labels fall back to UTF-8.
  • For HTML responses without a Content-Type charset, sniffs <meta charset="..."> or <meta http-equiv="Content-Type" content="...; charset=..."> declared in the first 1024 bytes (HTML comments are stripped first).
  • All operations are read-only and synchronous. No persistent state, no cache.
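
A minimal sketch of the websearch parameter-to-flag mapping, assuming ddgr's documented --num, --reg, --unsafe, and --time options; the function and the --json/--noprompt plumbing are illustrative, not the package's actual source.

// Sketch: mapping websearch params onto a ddgr invocation.
function ddgrArgv(p: {
  query: string;
  maxResults?: number;                        // hypothetical name
  region?: string;
  safesearch?: "off" | "moderate" | "strict";
  time?: "d" | "w" | "m" | "y";
}): string[] {
  const n = Math.min(p.maxResults ?? 8, 25);  // default 8, hard cap 25
  const argv = ["ddgr", "--json", "--noprompt", "--num", String(n)];
  if (p.region) argv.push("--reg", p.region); // e.g. "pl-pl"
  if (p.safesearch === "off") argv.push("--unsafe");
  // moderate and strict both fall through to ddgr's default safe search
  if (p.time) argv.push("--time", p.time);    // d | w | m | y
  argv.push(p.query);
  return argv;
}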
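
And a sketch of the retry and charset rules, using the standard Fetch and TextDecoder APIs; the helper names are made up and the real implementation may differ.

// Sketch: the single Retry-After-gated retry. Returns a delay in ms
// (capped at 10s), or null for "do not retry".
function retryDelayMs(res: Response): number | null {
  if (res.status !== 429 && res.status !== 503) return null;
  const h = res.headers.get("retry-after");
  if (h === null) return null;                     // no header, no retry
  const secs = /^\d+$/.test(h.trim())
    ? Number(h)                                    // delta-seconds
    : (Date.parse(h) - Date.now()) / 1000;         // HTTP-date
  if (Number.isNaN(secs)) return null;
  return Math.min(Math.max(secs, 0), 10) * 1000;
}

// Sketch: charset resolution. Content-Type wins; HTML without one gets
// a <meta> sniff over the first 1024 bytes; unknown labels → UTF-8.
function decodeBody(bytes: Uint8Array, contentType: string | null): string {
  let label = contentType?.match(/charset=["']?([\w-]+)/i)?.[1];
  if (!label && contentType?.includes("text/html")) {
    const head = new TextDecoder("latin1")
      .decode(bytes.subarray(0, 1024))
      .replace(/<!--[\s\S]*?-->/g, "");            // strip HTML comments first
    label = head.match(/<meta[^>]+charset=["']?([\w-]+)/i)?.[1];
  }
  try {
    return new TextDecoder(label ?? "utf-8").decode(bytes);
  } catch {
    return new TextDecoder().decode(bytes);        // unknown label → UTF-8
  }
}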

Content extraction (optional)

For chrome-heavy pages (GitHub repos, MDN, news articles, Stack Overflow, blog posts) the bulk of the converted markdown is navigation, sidebars, footers, cookie banners, and inline icon SVGs — not the content the agent asked for. If a Reader-View-style extractor is on $PATH, webfetch runs it between the HTTP fetch and the markdown conversion. Result: typically 5–20× smaller output on those pages, with the actual article preserved.

Install one (recommended):

pipx install trafilatura     # works everywhere with Python; recommended primary install
# rdrview alternative — https://github.com/eafer/rdrview
#   Linux: package manager, or build from source.
#   macOS: build from source (no homebrew formula upstream).

Detection order: trafilatura first, then rdrview. Detected once per process and cached. The extractor emits cleaned HTML; the existing pandoc/w3m step then converts it to markdown so the output style is identical regardless of which extractor (or none) ran.
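
A sketch of that stage. Bun.which and Bun.spawn are real Bun APIs; the extractor flags shown are assumptions, so check trafilatura --help and rdrview(1) for the real invocations.

// Sketch: per-process extractor detection plus the pre-pass itself.
let cached: string | null | undefined;        // undefined = not probed yet

function findExtractor(): string | null {
  if (cached === undefined) {
    // Detection order: trafilatura first, then rdrview.
    cached = Bun.which("trafilatura") ?? Bun.which("rdrview");
  }
  return cached;
}

async function extractionPrePass(html: string, pageUrl: string): Promise<string> {
  const bin = findExtractor();
  if (bin === null) return html;              // no extractor: full HTML as before
  const argv = bin.includes("rdrview")
    ? [bin, "-H", "-u", pageUrl]              // assumed: emit HTML, resolve links
    : [bin, "--output-format", "html"];       // assumed flag; reads stdin
  const proc = Bun.spawn(argv, {
    stdin: new TextEncoder().encode(html),
    stdout: "pipe",
    stderr: "ignore",
  });
  const cleaned = await new Response(proc.stdout).text();
  return (await proc.exited) === 0 && cleaned.trim() !== "" ? cleaned : html;
}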

No extractor present? webfetch keeps working — you just get the full pre-extraction markdown as before. A one-shot warning is written to stderr on the first call so you know what you're missing; it is never added to tool output.

Caveats:

  • Relative links. rdrview resolves relative hrefs to absolute using the page URL. trafilatura (when used via stdin) does not; relative links stay relative in its output. Most agents handle this from context; mention it in your prompt if it matters.
  • Fallback when extraction looks wrong. If the extracted HTML is < 1% of the original and the original was > 10 KB (e.g. Readability picked the wrong container on a chrome-only page), webfetch discards the extracted result and converts the full HTML instead. You'll get a larger but complete result. This check is sketched after these caveats.
  • Pages where the wanted content is outside the article container (e.g. a code listing in a sidebar) may have it stripped by extraction. There's currently no per-call opt-out; if it bites you in practice, open an issue with the URL.
  • $PATH trust. The agent process inherits the user's $PATH; bare trafilatura/rdrview (same posture as pandoc/ddgr) means a poisoned earlier $PATH entry runs as the extractor. Newly relevant here because extractors parse attacker-controlled HTML.
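
The wrong-container fallback from the second caveat, as a sketch; whether lengths are bytes or characters is an assumed detail.

// Sketch: discard an extraction that shrank suspiciously far.
function keepExtraction(original: string, extracted: string): boolean {
  const tooSmall = extracted.length < original.length / 100; // < 1% of original
  const bigEnough = original.length > 10 * 1024;             // original > 10 KB
  return !(tooSmall && bigEnough); // both hold → convert the full HTML instead
}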

PDF support (optional)

If pdftotext (poppler) is on $PATH, webfetch will accept application/pdf responses and return the extracted plain text. Useful for academic papers, RFCs served as PDF, datasheets, vendor manuals, government docs — the things you'd otherwise have to download and paste excerpts from.

Install:

brew install poppler         # macOS
# apt install poppler-utils  # Debian/Ubuntu
# dnf install poppler-utils  # Fedora

Detected once per process and cached. webfetch invokes pdftotext -layout -enc UTF-8 - - on the response bytes; -layout preserves two-column papers and tables, which the default reading-order mode mangles. Output is plain text — no markdown wrapping, no fences (PDFs aren't structured for markdown rendering; pretending they are produces worse output than pdftotext -layout).
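
The invocation above, sketched with Bun.spawn (which accepts a Uint8Array as stdin); the wrapper function is illustrative.

// Sketch: pipe the PDF response bytes through pdftotext, stdin → stdout.
async function pdfToText(pdfBytes: Uint8Array): Promise<string> {
  const proc = Bun.spawn(
    ["pdftotext", "-layout", "-enc", "UTF-8", "-", "-"],
    { stdin: pdfBytes, stdout: "pipe", stderr: "ignore" },
  );
  const text = await new Response(proc.stdout).text();
  if ((await proc.exited) !== 0) throw new Error("pdftotext failed");
  return text; // plain text, no markdown wrapping
}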

No pdftotext present? PDFs are rejected with the existing "Cannot fetch application/pdf" error — byte-for-byte the same behavior as before. A one-shot warning is written to stderr on the first PDF fetch so you know what you're missing; it is never added to tool output.

Caveats:

  • Scanned / image-only PDFs return empty or near-empty text. OCR (e.g. tesseract) is a much heavier dependency and a separate decision; out of scope.
  • No DOCX, EPUB, RTF, ODT. Each is a separate optional binary with its own quirks. Open an issue if you need one.
  • No PDF form / annotation extraction.
  • 5 MB response cap still applies. A 50 MB PDF will be rejected before pdftotext ever runs.

What webfetch does not do

  • No JavaScript execution. Pages that render client-side return empty markdown. Workarounds: try the same content via old.reddit.com, *.json API endpoints, RSS/Atom feeds, or the site's documented REST API.
  • No per-host routing. webfetch does not switch behavior based on hostname (no if hostname === "github.com" branches). If you want "use gh for GitHub URLs, fall back to webfetch otherwise," that belongs in a personal pi skill in ~/.pi/agent/skills/, not in this package. See AGENTS.md “Bar for new tools” for the full rationale.
  • No headless browser. Out of scope per AGENTS.md. Shell-only is the project's design constraint.

Troubleshooting

  • ddgr not installed → brew install ddgr or pip install ddgr
  • Need pandoc or w3m installed → brew install pandoc
  • DuckDuckGo timed out (likely rate-limited) → wait 1–2 min
  • Site requires JS, cannot fetch in shell-only mode → site uses Cloudflare/JS-only; not solvable without headless browser, out of scope for this tool

Development

# one-time, if you don't have bun:
#   macOS:        brew install bun
#   Linux / WSL:  curl -fsSL https://bun.sh/install | bash
# (or see https://bun.sh for other options)
git clone https://github.com/bitcraft-apps/pi-web-tools
cd pi-web-tools
bun install
bun run typecheck           # type-check via tsgo (@typescript/native-preview); CI runs this before tests
bun run lint                # oxlint + type-aware oxlint-tsgolint; CI runs this before tests
bun run format              # apply oxfmt to src/, test/, index.ts, vitest.config.ts
bun run format:check        # CI runs this before lint; fails if anything is unformatted
bun run test                # unit tests, no network
bun run test:network        # integration tests (requires net)

We use bun as the dev package manager. The committed lockfile is bun.lock; package-lock.json is gitignored.

End-user installs (pi install npm:...) pull a published tarball from the npm registry. The tarball ships only index.ts, src/, README.md, LICENSE, and CHANGELOG.md (no tests, no bun.lock, no CI configs); see the files field in package.json. bun.lock is the dev lockfile only; transitive deps for end users are resolved by npm install against the registry at install time. Peer deps are wildcard-pinned, and no runtime deps drift in breaking ways.

Note on npm scope: the GitHub org is bitcraft-apps because bitcraft was taken on GitHub. The npm scope @bitcraft is also taken, so the npm package is published as @bitcraft-apps/pi-web-tools to mirror the GH org (#5).

Hot-reload during dev:

ln -s "$(pwd)" ~/.pi/agent/extensions/pi-web-tools
# in pi session: /reload

License

MIT