@sisu-ai/tool-web-fetch
v8.0.1
Fetch a web page by URL and return text, HTML, or JSON for LLM consumption.
Install
npm i @sisu-ai/tool-web-fetch
Environment / Flags
- `WEB_FETCH_USER_AGENT` or `HTTP_USER_AGENT` (flag: `--web-fetch-user-agent`)
- `WEB_FETCH_MAX_BYTES` (flag: `--web-fetch-max-bytes`): default 500kB
- `WEB_FETCH_RESPECT_ROBOTS` (flag: `--web-fetch-respect-robots`): `1`/`true` (default) to honor robots.txt; set `0`/`false` to disable
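To make the settings above concrete, here is a minimal sketch of how they could be resolved into a config object. It is not the package's actual implementation: `WebFetchConfig`, `resolveConfig`, and `parseBool` are hypothetical names, the precedence of `WEB_FETCH_USER_AGENT` over `HTTP_USER_AGENT` is assumed from the order in which they are listed, and "500kB" is assumed to mean 500,000 bytes.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface WebFetchConfig {
  userAgent: string | undefined;
  maxBytes: number;
  respectRobots: boolean;
}

type Env = Record<string, string | undefined>;

// Assumption: the documented "500kB" default is read as 500 * 1000 bytes.
const DEFAULT_MAX_BYTES = 500000;

// Accepts the documented 1/true and 0/false values; anything else keeps the fallback.
function parseBool(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined) return fallback;
  const v = value.trim().toLowerCase();
  if (v === "1" || v === "true") return true;
  if (v === "0" || v === "false") return false;
  return fallback;
}

function resolveConfig(env: Env): WebFetchConfig {
  const parsed = Number(env["WEB_FETCH_MAX_BYTES"]);
  return {
    // Assumption: WEB_FETCH_USER_AGENT wins when both variables are set.
    userAgent: env["WEB_FETCH_USER_AGENT"] ?? env["HTTP_USER_AGENT"],
    maxBytes: Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_MAX_BYTES,
    // robots.txt is honored unless explicitly disabled.
    respectRobots: parseBool(env["WEB_FETCH_RESPECT_ROBOTS"], true),
  };
}
```

In a Node.js process this would typically be called as `resolveConfig(process.env)`. Taking the environment as a parameter keeps the sketch easy to test.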
Tool
- Name: `webFetch`
- Args: `{ url: string; format?: 'text'|'html'|'json'; maxBytes?: number }`
- Returns: `{ url, finalUrl?, status, contentType?, title?, text?, html?, json? }`
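The shapes above can be written out as TypeScript types. This is a sketch only. The argument types come from the signature above, but the README gives only the names of the result fields, so their value types here (for example `status: number` and `json: unknown`) are assumptions. `isWebFetchArgs` and `describeResult` are hypothetical helpers, not exports of the package.

```typescript
type WebFetchFormat = "text" | "html" | "json";

interface WebFetchArgs {
  url: string;
  format?: WebFetchFormat;
  maxBytes?: number;
}

// Field names follow the README; the value types are assumptions.
interface WebFetchResult {
  url: string;
  finalUrl?: string;
  status: number;
  contentType?: string;
  title?: string;
  text?: string;
  html?: string;
  json?: unknown;
}

// Hypothetical runtime check for arguments that arrive from an LLM tool call.
function isWebFetchArgs(value: unknown): value is WebFetchArgs {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  if (typeof v["url"] !== "string") return false;
  const format = v["format"];
  if (format !== undefined && format !== "text" && format !== "html" && format !== "json") {
    return false;
  }
  const maxBytes = v["maxBytes"];
  if (maxBytes !== undefined && (typeof maxBytes !== "number" || !(maxBytes > 0))) {
    return false;
  }
  return true;
}

// Hypothetical helper: one-line summary of which body field a result carries.
function describeResult(r: WebFetchResult): string {
  const ok = r.status >= 200 && r.status < 300;
  const kind = r.json !== undefined ? "json" : r.html !== undefined ? "html" : "text";
  return `${ok ? "ok" : "error"} ${r.status} ${kind} ${r.finalUrl ?? r.url}`;
}
```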
Behavior
- Respects robots.txt by default for the provided User-Agent.
- Follows redirects and reads up to `maxBytes` to avoid huge pages.
- If `format: 'text'` (default) and the page is HTML, strips tags (removes script/style) and decodes basic entities; includes `title`.
- If `format: 'html'`, returns raw HTML and `title`.
- If the server returns JSON or `format: 'json'`, parses into `json`.
- Non-OK responses return status and a short text body snippet for debugging.
Notes
- This is a minimal fetcher to empower summarization / extraction workflows. For deeper crawling, add queueing, URL normalization, and robots.txt handling in upstream middleware.
Community & Support
Discover what you can do through examples or documentation. Check it out at https://github.com/finger-gun/sisu. Example projects live under examples/ in the repo.
