@leonardorick/pi-web-search
v0.2.2
Web search tool for pi — Exa MCP search with DuckDuckGo fallback via wreq-js.
Web search tool for pi — Exa hosted MCP search with DuckDuckGo HTML fallback via wreq-js browser TLS fingerprinting. Companion to pi-smart-fetch.
Why
Pi's built-in web search relies on Anthropic's first-party web_search_20250305 tool, which is only available when running Claude models. This package fills the gap for any other model with a keyless web_search tool.
Search tries Exa's hosted MCP endpoint first (web_search_exa over JSON-RPC). If Exa MCP fails, rate-limits, or returns no results, the tool falls back to DuckDuckGo HTML search.
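The primary-then-fallback flow described above can be sketched as follows. This is a minimal illustration, not the package's actual code: `SearchResult`, `Provider`, and `searchWithFallback` are hypothetical names, and the two providers are passed in as plain async functions so the control flow stays visible.

```typescript
// Hypothetical result shape; the package's real types may differ.
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
}

// A provider is any async function that maps a query to results.
type Provider = (query: string) => Promise<SearchResult[]>;

// Try the primary provider first. Fall back when it throws (network error,
// HTTP 429, malformed response) or when it returns an empty list.
async function searchWithFallback(
  query: string,
  primary: Provider,
  fallback: Provider,
): Promise<{ provider: "primary" | "fallback"; results: SearchResult[] }> {
  try {
    const results = await primary(query);
    if (results.length > 0) {
      return { provider: "primary", results };
    }
  } catch {
    // Deliberately swallowed: any primary failure triggers the fallback.
  }
  return { provider: "fallback", results: await fallback(query) };
}
```

In this sketch an empty result set is treated the same as an error, which matches the "fails, rate-limits, or returns no results" wording above. Errors from the fallback itself are left to propagate to the caller.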
Bare Node fetch() (even with a spoofed User-Agent header) gets served DDG's anti-bot anomaly page (cc=botnet) because the TLS/HTTP2 fingerprint leaks "this is Node, not Chrome." wreq-js wraps native Rust bindings that emulate real browser TLS/HTTP2 fingerprints — same primitive pi-smart-fetch uses for web_fetch.
Install
// ~/.pi/agent/settings.json
{
  "packages": ["npm:@leonardorick/pi-web-search"]
}
Tool
Registers web_search:
| param | type | required | description |
| ----------------- | ---------- | -------- | ------------------------------------------------------------------------------------------------------------------------ |
| query | string | yes | Search query (min length 2) |
| allowed_domains | string[] | no | Only return results from these domains |
| blocked_domains | string[] | no | Exclude results from these domains |
| numResults | integer | no | Result cap. Default 8, min 1, max 20. |
| freshness | string | no | Time filter. day / week / month / year, or custom range YYYY-MM-DDtoYYYY-MM-DD. |
| country | string | no | ISO 3166-1 alpha-2 country code for region-localised results (e.g. US, DE, BR). |
allowed_domains and blocked_domains are mutually exclusive. The tool translates the filters into site: / -site: query terms and also enforces them client-side. When filters are active it over-fetches, so that trimming after the filter can still meet the requested numResults cap.
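The filter handling described above (mutual exclusion, site: / -site: translation, client-side enforcement, over-fetching) might look roughly like the sketch below. All function names are hypothetical, and two details are assumptions rather than documented behavior: multiple allowed domains are joined with `OR`, and the over-fetch factor is 2x.

```typescript
interface DomainFilters {
  allowed_domains?: string[];
  blocked_domains?: string[];
}

// Reject the mutually exclusive combination up front.
function validateFilters(f: DomainFilters): void {
  if (f.allowed_domains?.length && f.blocked_domains?.length) {
    throw new Error("allowed_domains and blocked_domains are mutually exclusive");
  }
}

// Append site: / -site: terms to the query.
// Assumption: several allowed domains are grouped with OR.
function buildQuery(query: string, f: DomainFilters): string {
  const allow = (f.allowed_domains ?? []).map((d) => `site:${d}`);
  const block = (f.blocked_domains ?? []).map((d) => `-site:${d}`);
  const allowTerm = allow.length > 1 ? `(${allow.join(" OR ")})` : allow.join("");
  return [query, allowTerm, ...block].filter(Boolean).join(" ");
}

// True when hostname equals the domain or is a subdomain of it.
function hostMatches(hostname: string, domain: string): boolean {
  const h = hostname.toLowerCase();
  const d = domain.toLowerCase();
  return h === d || h.endsWith(`.${d}`);
}

// Client-side enforcement, applied to every result URL.
function passesFilters(url: string, f: DomainFilters): boolean {
  let hostname: string;
  try {
    hostname = new URL(url).hostname;
  } catch {
    return false; // unparseable URLs are dropped
  }
  if (f.allowed_domains?.length) {
    return f.allowed_domains.some((d) => hostMatches(hostname, d));
  }
  if (f.blocked_domains?.length) {
    return !f.blocked_domains.some((d) => hostMatches(hostname, d));
  }
  return true;
}

// Ask the provider for more than requested when filters are active.
// Assumption: the 2x factor is illustrative only.
function fetchCount(numResults: number, f: DomainFilters): number {
  const filtered = Boolean(f.allowed_domains?.length || f.blocked_domains?.length);
  return filtered ? numResults * 2 : numResults;
}

// Filter first, then trim to the requested cap.
function filterAndTrim<T extends { url: string }>(
  results: T[],
  f: DomainFilters,
  numResults: number,
): T[] {
  return results.filter((r) => passesFilters(r.url, f)).slice(0, numResults);
}
```

Enforcing the filter a second time on the client guards against a provider that ignores or loosely interprets the site: terms. Matching on the parsed hostname rather than on the raw URL string avoids false positives such as `notdocs.rs` matching `docs.rs`.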
Provider support matrix
| filter | Exa MCP | DuckDuckGo (fallback) |
| ----------------- | ----------------------------- | --------------------- |
| allowed_domains | yes (via site:) | yes (via site:) |
| blocked_domains | yes (via -site:) | yes (via -site:) |
| numResults | yes (native) | yes (post-filter) |
| freshness | best-effort (may be ignored)¹ | yes (df= param) |
| country | best-effort (may be ignored)¹ | yes (kl= param) |
¹ Exa MCP's documented schema only exposes query and numResults. Extra fields are sent best-effort: harmless if ignored, honored if Exa supports them in the future. The DuckDuckGo fallback enforces them reliably.
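As an illustration of how the freshness value could be parsed and handed to the DuckDuckGo fallback, here is a minimal sketch. `parseFreshness` and `toDdgDf` are hypothetical names. The single-letter values and the `from..to` range syntax for df= are assumptions about DuckDuckGo's URL parameters, not something this README documents.

```typescript
type Preset = "day" | "week" | "month" | "year";

type Freshness =
  | { kind: "preset"; value: Preset }
  | { kind: "range"; from: string; to: string };

// Matches the custom range form YYYY-MM-DDtoYYYY-MM-DD.
const RANGE = /^(\d{4}-\d{2}-\d{2})to(\d{4}-\d{2}-\d{2})$/;

// Returns null for anything that is neither a preset nor a well-ordered range.
function parseFreshness(input: string): Freshness | null {
  if (input === "day" || input === "week" || input === "month" || input === "year") {
    return { kind: "preset", value: input };
  }
  const m = RANGE.exec(input);
  if (!m) return null;
  const from = m[1];
  const to = m[2];
  if (!from || !to) return null;
  // ISO dates sort lexicographically, so a string comparison is enough here.
  if (from > to) return null;
  return { kind: "range", from, to };
}

// Assumed mapping onto DuckDuckGo's df= query parameter.
function toDdgDf(f: Freshness): string {
  if (f.kind === "range") return `${f.from}..${f.to}`;
  const letters: Record<Preset, string> = { day: "d", week: "w", month: "m", year: "y" };
  return letters[f.value];
}
```

The sketch only checks the shape and ordering of a range. It does not reject impossible calendar dates such as February 30.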
Output mirrors Claude Code's Web search results for query: "..." block, including the Links: [...] JSON line and the "REMINDER: include sources" suffix — the model is expected to append a Sources: markdown list to its reply.
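The output shape described above could be assembled along these lines. This is a sketch only: `formatResults` and `Link` are hypothetical names, and the blank-line layout and the exact reminder wording are placeholders rather than the strings the tool actually emits.

```typescript
interface Link {
  title: string;
  url: string;
}

// Build a text block with a header, a single-line JSON list of links,
// and a trailing reminder. The reminder text below is a placeholder.
function formatResults(query: string, links: Link[]): string {
  return [
    `Web search results for query: "${query}"`,
    "",
    `Links: ${JSON.stringify(links)}`,
    "",
    "REMINDER: include sources",
  ].join("\n");
}
```

Keeping the links on one JSON line means the model, or any downstream parser, can recover the structured list with a single `JSON.parse` call on the text after `Links: `.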
Design notes
No prompt argument and no summarization step. Claude Code's WebSearchTool calls Haiku with the user-provided prompt to compress results before returning. Pi's built-in tools (read, grep, etc.) follow a "raw output, model digests it" pattern — this extension matches that convention.
Anonymous Exa MCP access is rate-limited by Exa (currently documented as 2 QPS and 50 tool calls/day per IP). The tool treats Exa as a best-effort first provider and keeps DDG as fallback.
TODO: Add an optional prompt parameter paired with configurable model/provider settings so results can be summarized before returning, matching Claude Code's behavior. The summarizer should use the active pi provider rather than hardcoding a specific model.
Limitations
- Anonymous Exa MCP quota is shared by client IP; heavy use can hit 429 and fall back to DDG.
- DDG HTML scraping is best-effort. If DDG changes its result markup the snippet regex is the most likely break point; titles + URLs degrade more gracefully.
- freshness and country are only guaranteed on the DDG fallback. On Exa MCP they ride along as extra fields and are silently ignored unless Exa expands the schema.
- This package only returns search results and snippets. Use web_fetch from pi-smart-fetch for page content extraction.
License
MIT
