@rjshrjndrn/pi-fetch
v0.1.2
Web content extraction for pi — fetch any URL as clean Markdown using Defuddle
pi-fetch
Web content extraction for pi. Fetch any URL as clean Markdown — no headless browser required.
Powered by Defuddle by Steph Ango (creator of Obsidian Web Clipper).
How it works
Registers a web_fetch tool that the LLM can call with any URL. Under the hood:
- Native fetch retrieves the HTML (lightweight, no browser engine)
- Defuddle extracts the main content, stripping navigation, ads, sidebars, cookie banners, and clutter
- Returns clean Markdown with metadata (title, author, description, word count)
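The three steps above can be sketched roughly as follows. This is an illustration rather than the package's actual source: `webFetch`, `Extracted`, `Extractor`, and `Fetcher` are hypothetical names, and the extractor is passed in as a stand-in so that no particular Defuddle call signature is assumed.

```typescript
// Hypothetical shape of the extraction result; field names are illustrative.
interface Extracted {
  title: string;
  author: string;
  description: string;
  wordCount: number;
  markdown: string;
}

// Stand-in for the Defuddle call. The real extractor API is not reproduced here.
type Extractor = (html: string, url: string) => Promise<Extracted>;

// Minimal subset of the native fetch contract that this sketch relies on.
type Fetcher = (url: string) => Promise<{
  ok: boolean;
  status: number;
  text(): Promise<string>;
}>;

async function webFetch(
  url: string,
  extract: Extractor,
  fetcher: Fetcher = (u) => fetch(u),
): Promise<Extracted> {
  // Step 1: plain HTTP request, no browser engine involved.
  const response = await fetcher(url);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const html = await response.text();

  // Steps 2 and 3: the extractor strips clutter and returns
  // Markdown together with the page metadata.
  return extract(html, url);
}
```

Injecting the fetcher and extractor keeps the sketch testable without network access; a real extension would presumably wire these to native fetch and Defuddle directly.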
web_fetch https://www.npmjs.com/package/defuddle
# defuddle
**Author:** kepano · **Domain:** npmjs.com · **Words:** 342
> Get the main content of any page as Markdown.
---
Defuddle extracts the main content from web pages...
Output is automatically truncated to stay within pi's context limits.
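One way the header in the sample output could be assembled is sketched below. `formatPage` and `PageMeta` are hypothetical names, and stripping a leading `www.` from the hostname is an assumption inferred from the sample (the URL uses www.npmjs.com, the header shows npmjs.com).

```typescript
// Hypothetical metadata shape; the package's real types may differ.
interface PageMeta {
  title: string;
  author?: string;
  description?: string;
  wordCount: number;
}

function formatPage(url: string, meta: PageMeta, markdown: string): string {
  // Assumption: the displayed domain is the hostname without a leading "www.".
  const domain = new URL(url).hostname.replace(/^www\./, "");

  // Build the metadata line, skipping the author when it is unknown.
  const facts: string[] = [];
  if (meta.author) facts.push(`**Author:** ${meta.author}`);
  facts.push(`**Domain:** ${domain}`);
  facts.push(`**Words:** ${meta.wordCount}`);

  const lines = [`# ${meta.title}`, facts.join(" · ")];
  if (meta.description) lines.push(`> ${meta.description}`);
  lines.push("---", markdown);
  return lines.join("\n");
}
```

Optional fields are omitted rather than printed empty, so a page without an author or description still yields a compact header.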
Install
# As a pi package
pi install npm:@rjshrjndrn/pi-fetch
# Or test without installing
pi -e npm:@rjshrjndrn/pi-fetch
# Or test locally
pi -e ./extensions/index.ts
Usage
Once installed, just ask pi to fetch a URL:
fetch https://docs.example.com/getting-started
Or the LLM will use web_fetch automatically when it needs to read a webpage.
Limitations
- No JavaScript rendering: uses native fetch, not a browser. SPAs that require JS to render content will return empty or minimal results. For those, you'll still need a headless browser.
- Some sites block non-browser requests: sites with aggressive bot detection may reject the request.
- Output truncation: very large pages are truncated to 50KB / 2000 lines to protect the context window.
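The truncation limit above could be enforced along these lines. This is a minimal sketch, not the package's implementation: `truncateOutput` is a hypothetical name, "50KB" is assumed to mean 50 × 1024 bytes of UTF-8, and cutting on whole-line boundaries is a design choice made here to avoid splitting multi-byte characters.

```typescript
const MAX_LINES = 2000;
const MAX_BYTES = 50 * 1024; // assumption: "50KB" read as 50 KiB

function truncateOutput(
  text: string,
  maxLines: number = MAX_LINES,
  maxBytes: number = MAX_BYTES,
): { text: string; truncated: boolean } {
  const encoder = new TextEncoder();
  const kept: string[] = [];
  let bytes = 0;

  for (const line of text.split("\n")) {
    // The extra byte accounts for the newline joining this line to the previous one.
    const cost = encoder.encode(line).length + (kept.length > 0 ? 1 : 0);
    // Stop at whichever limit is reached first.
    if (kept.length >= maxLines || bytes + cost > maxBytes) {
      return { text: kept.join("\n"), truncated: true };
    }
    kept.push(line);
    bytes += cost;
  }
  return { text, truncated: false };
}
```

Because the cut happens between lines, a single line longer than the byte budget is dropped entirely; a production version would likely need to handle that case separately.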
Development
cd pi-fetch
npm install
npm test
Credits
- Defuddle by Steph Ango (kepano) — content extraction engine
- pi by Mario Zechner (badlogic) — agent framework
License
MIT
