# sift-web-tools

Pi agent web search, fetch, and save tools powered by the local sift CLI.
Adds LLM-callable tools (`web_search`, `web_fetch`, `web_save`, `web_artifacts`, `web_clean`) that give pi local-first web access via the sift CLI.
## Install

```shell
pi install npm:sift-web-tools
```

For local testing before publishing:

```shell
pi install /Users/akc/develop/sift-web-tools
# or for one run only:
pi -e /Users/akc/develop/sift-web-tools
```

Requires the sift CLI to be installed and available on `$PATH`; see Prerequisites.
## Tools

- `web_search(query, max_results?)` — runs `sift search <query> --json` (DuckDuckGo by default; SearXNG if configured) and renders the top results as a markdown list with titles, URLs, and snippets.
- `web_fetch(url)` — runs `sift fetch <url> --json` and returns the page's primary content as clean markdown, plus `title`/`final_url`/`status`/`kind` in the result details.
- `web_save(url, mode?, filename?, force?)` — runs `sift fetch <url> --out /tmp/sift-web-tools/...` and returns the saved local path instead of loading the content into context. Use it for large pages, PDFs, images, media, or files the agent should inspect later with `read`, `grep`, or `bash`. `mode` is `rendered` by default; `raw` saves the original response bytes.
- `web_artifacts(limit?)` — lists files saved under `/tmp/sift-web-tools/`, newest first, with paths, sizes, kinds, and modification times. Also available as `/web_artifacts [limit]`.
- `web_clean(older_than_minutes?, all?, dry_run?)` — deletes saved artifacts. By default it deletes files older than 1440 minutes (24 hours); set `all: true` to delete everything or `dry_run: true` to preview matches. Also available as `/web_clean [older_than_minutes|all] [dry-run]`.
To fetch multiple URLs, the agent issues parallel `web_fetch` or `web_save` tool calls in a single turn; sift instances run concurrently (one child process per URL). Artifact listing is read-only; cleanup runs sequentially.
The tools are local: queries and URLs are not forwarded to any third-party API. The agent talks to a child `sift` process on your machine, which in turn uses `curl` for the actual HTTP request.
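To illustrate the artifact lifecycle described above (save under `/tmp/sift-web-tools/`, list, age-based cleanup), here is a minimal Python sketch of a `web_clean`-style filter. The function name and exact logic are assumptions for illustration; only the "regular files directly inside the directory, older than a cutoff, with a dry-run preview" behavior comes from the tool descriptions above.

```python
import os
import time

SAVE_DIR = "/tmp/sift-web-tools"  # artifact directory used by web_save

def clean_artifacts(older_than_minutes: int = 1440, dry_run: bool = False) -> list[str]:
    """Delete regular files in SAVE_DIR older than the cutoff; no recursion."""
    cutoff = time.time() - older_than_minutes * 60
    matched = []
    for entry in os.scandir(SAVE_DIR):
        # Only regular files directly inside SAVE_DIR, mirroring web_clean.
        if entry.is_file(follow_symlinks=False) and entry.stat().st_mtime < cutoff:
            matched.append(entry.path)
            if not dry_run:
                os.remove(entry.path)
    return matched
```

A `dry_run` call returns the matching paths without deleting anything, mirroring the tool's `dry_run: true` preview.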
## Prerequisites

- `sift` CLI installed and available in the system's `$PATH`.
- `curl`, used by sift for transport.
- `pdftotext` (optional), required only if you want `web_fetch` to handle PDFs.
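A quick way to verify these prerequisites from Python (a convenience sketch; `check_prereqs` is not part of the extension):

```python
import shutil

def check_prereqs() -> dict[str, bool]:
    """Report which external dependencies are discoverable on $PATH."""
    return {tool: shutil.which(tool) is not None
            for tool in ("sift", "curl", "pdftotext")}
```

Running it returns a per-tool boolean, so a missing optional `pdftotext` is easy to distinguish from a missing `sift`.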
### Get pre-built binaries

- Get the latest release.
- Put the `sift` binary somewhere in your `$PATH` (e.g. `~/.local/bin/` or `/usr/local/bin/`).
### Install from source

```shell
git clone https://github.com/anoopkcn/sift
cd sift
zig build -Doptimize=ReleaseSafe
```

then copy `zig-out/bin/sift` to `~/.local/bin/` or `/usr/local/bin/`.
## Configuration

To override the binary location, set `SIFT_BIN` to a full path:

```shell
export SIFT_BIN="$HOME/.local/bin/sift"  # or wherever you put it
```

(Optional) To use SearXNG instead of DuckDuckGo for search, set sift's native env var (no extension change needed; sift reads it directly):

```shell
export SIFT_SEARXNG_URL="https://your-searxng.example/search"  # your SearXNG instance's search endpoint
```
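The lookup order implied here (use `SIFT_BIN` if set, otherwise search `$PATH`) can be sketched as follows. This is illustrative only; the extension itself is not implemented in Python.

```python
import os
import shutil

def resolve_sift_bin() -> str:
    """Honor a SIFT_BIN override before falling back to a $PATH lookup."""
    override = os.environ.get("SIFT_BIN")
    if override:
        return override
    found = shutil.which("sift")
    if found is None:
        raise FileNotFoundError("sift binary not found: install sift or set SIFT_BIN")
    return found
```

Because the override wins unconditionally, a stale `SIFT_BIN` pointing at a deleted file will surface as a spawn error rather than silently falling back to `$PATH`.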
## Limits

- `web_search` truncates the rendered list to roughly `max_results × 1600` chars (hard ceiling 30k) to keep the agent's context tidy.
- `web_fetch` returns whatever `sift fetch` produces; sift enforces its own size cap, so the extension does not re-truncate.
- `web_save` stores artifacts under `/tmp/sift-web-tools/` and returns only path/size/mode hints to keep context small.
- `web_save` filenames are sanitized, path components are stripped, and an 8-char URL hash is appended to reduce collisions.
- `web_artifacts` and `web_clean` operate only on regular files directly inside `/tmp/sift-web-tools/`; they do not recurse into subdirectories.
- `web_fetch` and `web_save` reject non-`http(s)` schemes (`file://`, `data:`, etc.) before spawning sift.
- A 30-second timeout is passed to sift via `--timeout`.
- Execution uses pi's `pi.exec()` with the agent abort signal and an outer timeout; cancellation/timeout terminates the child process promptly.
## Security

This extension is intended for agents whose URLs come from a trusted source (search results, user-pasted links). It is not safe to use with untrusted URL inputs.

- No private-IP filtering. Neither this extension nor the underlying `sift` CLI blocks private, loopback, or link-local addresses. URLs like `http://127.0.0.1/`, `http://localhost:6379/`, `http://10.0.0.1/`, and cloud metadata endpoints (e.g. `http://169.254.169.254/`) will be fetched.
- No DNS rebinding protection. Hostnames are resolved by `curl` at fetch time; a public hostname can resolve to a private address.
- Redirects are scheme-locked but not IP-revalidated. `sift` enforces http/https on redirects (max 10 hops) but does not re-check whether the destination IP is private.
- TLS verification is on by default. `sift` does not expose an `--insecure` flag.
- Response size is capped at 50 MB by `sift`. Larger responses fail with `transport error`.
- Schemes are restricted. Only `http://` and `https://` are accepted; `file://`, `data:`, `gopher://`, etc. are rejected before sift is spawned.
If you need strict SSRF defense (e.g. agent input is attacker-controlled), filter URLs upstream — resolve the hostname yourself and reject private/loopback/link-local IPs before invoking these tools.
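A minimal upstream guard along those lines might look like this sketch. `url_is_safe` is not part of the extension, and a production filter should also pin the resolved IP for the actual request, since re-resolving at fetch time reopens the rebinding window.

```python
import ipaddress
import socket
from urllib.parse import urlparse

def url_is_safe(url: str) -> bool:
    """Reject non-http(s) URLs and hosts that resolve to a private,
    loopback, link-local, or reserved IP address."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parts.hostname, None)
    except socket.gaierror:
        return False  # unresolvable: refuse rather than guess
    for info in infos:
        try:
            ip = ipaddress.ip_address(info[4][0])
        except ValueError:
            return False
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return False
    return True
```

Rejecting every resolved address (rather than just the first) matters because a hostname can return a mix of public and private records.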
## Failure modes

Errors are thrown from the tool execution so pi marks the tool result as failed, with sift's exit-code context included:

- `transport error: ...` — exit 3 from sift (curl failed, HTTP 4xx/5xx, response > 50 MB).
- `page requires JavaScript (SPA) — sift cannot render it` — exit 4. sift has no JS engine; report and move on rather than retrying.
- `output file exists: ...` — exit 5 from sift if an output path collision still occurs.
- `unsupported content type: ...` — exit 6 (e.g. PDF without `pdftotext` installed).
- `sift returned invalid JSON ...` — sift emitted non-JSON in `--json` mode; the message includes a sample of the actual output for debugging.
- `sift binary not found ...` — install sift or set `SIFT_BIN`.
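The exit-code convention above can be captured in a small lookup table, e.g. this sketch (`describe_exit` is illustrative, not part of the extension):

```python
# Exit codes documented for sift; prefixes mirror the error list above.
SIFT_ERRORS = {
    3: "transport error",
    4: "page requires JavaScript (SPA) — sift cannot render it",
    5: "output file exists",
    6: "unsupported content type",
}

def describe_exit(code: int, detail: str = "") -> str:
    """Map a sift exit code to a human-readable error message."""
    prefix = SIFT_ERRORS.get(code, f"sift failed with unexpected exit code {code}")
    return f"{prefix}: {detail}" if detail else prefix
```

Keeping an explicit fallback for unknown codes means a future sift release with new exit codes still produces a diagnosable error instead of a silent misclassification.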
