@vanillagreen/pi-web-tools
v1.1.1
First-party Pi web tools: provider-toggled web search, Exa deep research, content retrieval, and OpenAI native web_search integration.
pi-web-tools
First-party Pi package for web access tools.
For the Exa-specific API map and tool semantics, see EXA.md.
Implemented in this package:
- web_search with provider selection (auto, exa, perplexity, gemini, exa-mcp, duckduckgo, openai-native). Direct execution is wired for Exa, Perplexity Sonar, Gemini API (Google Search grounding), Gemini Web (browser-cookie auth), no-key Exa MCP, and no-key DuckDuckGo HTML. OpenAI-native is handled by a before_provider_request rewrite on supported OpenAI/Codex models and is last in auto mode because it has no normal Pi tool output.
- web_research using Exa Deep Search with researchMode: lite|standard|full plus low-level overrides (type, numResults, textMaxCharacters, domains, dates, additional queries). Accepts inline query or queryFile, plus contextFiles/contextGlob.
- web_research.outputPath findings report writing with Pi's file mutation queue. Clean reports default to a raw metadata sidecar (findings.raw.json) instead of embedding raw JSON in findings.md.
- web_fetch extraction chain: GitHub URLs (shallow clone cache + cached blob/tree/README), URL + local PDF text via pdftotext with vision OCR fallback for scanned PDFs (pdftoppm rasterization + ImageContent blocks for the host LLM), HTML/text/JSON with Wikipedia-style chrome stripping, Jina Reader auto-fallback on blocked/cookie-walled pages and 403/429/5xx, YouTube + local video understanding via Gemini Web/API (auto-applies the [HH:MM:SS] directive when the prompt asks for transcripts/lyrics/captions), and Exa contents fallback/override for URLs.
- web_answer, web_find_similar Exa-first tools.
- code_search defaults to the Exa Code /context endpoint and falls back to classic Exa search with code-focused domain hints; renderer surfaces token + source counts and parsed source URLs (Ctrl+O to expand).
- get_web_content retrieval for stored full content.
- /web-tools opens the extension-manager settings popup (with /web-tools:doctor and /web-tools:provider:<name> for diagnostics and session-scoped provider switching).
Install
Via npm:
pi install npm:@vanillagreen/pi-web-tools
Via vstack:
cargo install --git https://github.com/vanillagreencom/vstack.git vstack
vstack add vanillagreencom/vstack --pi-extension pi-web-tools --harness pi -y
Restart Pi after installation.
Commands
| Command | Action |
| --- | --- |
| /web-tools | Open the extension-manager settings popup (falls back to inline status when the manager is not installed). |
| /web-tools:doctor | Show Web Tools status and diagnostics. |
| /web-tools:provider:<name> | Set the active web-search provider for this session: auto, exa, perplexity, gemini, exa-mcp, duckduckgo, or openai-native. Persist via pi-web-tools.defaultProvider in extension-manager settings. |
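As a rough illustration of how the three command forms in the table could be told apart, here is a minimal sketch. The provider names come from the table; `parseWebToolsCommand` and the `ParsedCommand` shape are hypothetical and are not taken from the package source.

```typescript
// Provider names as listed in the command table.
const PROVIDERS = [
  "auto", "exa", "perplexity", "gemini", "exa-mcp", "duckduckgo", "openai-native",
] as const;
type Provider = (typeof PROVIDERS)[number];

// Hypothetical result shape, for illustration only.
type ParsedCommand =
  | { kind: "settings" }
  | { kind: "doctor" }
  | { kind: "provider"; provider: Provider }
  | { kind: "unknown"; input: string };

function parseWebToolsCommand(input: string): ParsedCommand {
  const text = input.trim();
  if (text === "/web-tools") return { kind: "settings" };
  if (text === "/web-tools:doctor") return { kind: "doctor" };

  const prefix = "/web-tools:provider:";
  if (text.startsWith(prefix)) {
    const name = text.slice(prefix.length);
    // Accept only names from the documented provider list.
    const provider = PROVIDERS.find((p) => p === name);
    if (provider !== undefined) return { kind: "provider", provider };
  }
  return { kind: "unknown", input: text };
}
```

For example, `parseWebToolsCommand("/web-tools:provider:exa-mcp")` yields a `provider` result for `exa-mcp`, while an unlisted name falls through to `unknown`.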
Fetch storage and truncation
web_fetch returns a compact preview and stores the extracted content in the current Pi session under a generated content id such as web-.... Both the tool text and the UI label preview output separately from the stored full text; when a preview is truncated, the renderer shows preview <shown>/<full> chars. Use get_web_content with that content id to retrieve the stored text; it does not fetch the URL again.
- GitHub blob, GitHub repo metadata/README, direct HTTP, and PDF extraction store the full extracted text before preview truncation. PDF extraction prefers local pdftotext when available, then falls back to a basic embedded-text parser; URL PDFs can fall back to Exa when local extraction fails.
- Local PDFs are supported with filePath/filePaths, file://..., or PDF-looking paths. They are extracted locally and are never sent to Exa fallback.
- Exa contents fallback/override stores the text returned by Exa; textMaxCharacters is the provider extraction cap for that path.
- web_fetch.textMaxCharacters caps the immediate preview shown to the model for direct/GitHub/PDF paths; default preview cap is 4k characters per stored item.
- get_web_content.maxCharacters caps the retrieval returned to the model; default is 50k characters. Omit it for normal full-context retrieval, lower it only for previews.
- Session storage is not a standalone project file. The durable handle shown in UI is the content id, and Pi persists it with the session history.
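The store-then-truncate behavior described above can be sketched as follows. This is only an illustration under stated assumptions: `ContentStore` is a hypothetical in-memory stand-in for Pi's session storage, the `web-<n>` id format is invented for the example, and 4k and 50k are taken to mean 4000 and 50000 characters.

```typescript
const DEFAULT_PREVIEW_CHARS = 4000;    // assumed reading of "4k characters"
const DEFAULT_RETRIEVAL_CHARS = 50000; // assumed reading of "50k characters"

interface FetchPreview {
  id: string;
  preview: string;
  label?: string; // only set when the preview was truncated
}

// Hypothetical in-memory store; the package persists content with the Pi session instead.
class ContentStore {
  private items = new Map<string, string>();
  private counter = 0;

  // Store the full text first, then derive the preview from it.
  put(fullText: string, previewCap: number = DEFAULT_PREVIEW_CHARS): FetchPreview {
    this.counter += 1;
    const id = `web-${this.counter}`; // illustrative id format only
    this.items.set(id, fullText);
    const preview = fullText.slice(0, previewCap);
    const label =
      preview.length < fullText.length
        ? `preview ${preview.length}/${fullText.length} chars`
        : undefined;
    return { id, preview, label };
  }

  // Read back stored text without refetching; maxCharacters caps what is returned.
  get(id: string, maxCharacters: number = DEFAULT_RETRIEVAL_CHARS): string | undefined {
    const text = this.items.get(id);
    return text === undefined ? undefined : text.slice(0, maxCharacters);
  }
}
```

The point of the design is that the preview cap never affects what is stored: a later `get` with the content id returns the stored text up to the retrieval cap, independent of how short the preview was.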
Settings
Settings are read from the vstack extension-manager namespace:
{
  "vstack": {
    "extensionManager": {
      "config": {
        "@vanillagreen/pi-web-tools": {
          "defaultProvider": "auto",
          "enabledProviders": "exa,perplexity,gemini,exa-mcp,duckduckgo,openai-native",
          "exaDeepResearchEnabled": true,
          "exaAdvancedEnabled": false,
          "htmlExtraction": { "jinaFallback": true },
          "pdfOcr": { "enabled": true, "maxPages": 5, "dpi": 150 },
          "githubClone": { "enabled": true, "maxRepoSizeMB": 350, "cloneTimeoutSeconds": 60, "cacheMaxAgeHours": 24 },
          "video": { "enabled": true },
          "browserCookieAccess": false,
          "browserCookies": { "preferredBrowser": "auto" }
        }
      }
    }
  }
}
Key toggles:
- exaAdvancedEnabled is required to expose web_answer, web_find_similar, and code_search to the active toolset (gated to keep the default surface small). Flip it on per-user or per-project to use them.
- web_search provider=auto tries keyed providers first (exa, perplexity, Gemini API), then no-key fallbacks (exa-mcp, duckduckgo), then Gemini Web cookie auth when enabled, then openai-native if supported. Simple searches default to 5 results; set numResults only when you need more. Edit enabledProviders to allow or remove providers; run /web-tools:provider:<name> to force one for the session.
- browserCookieAccess opts in to Gemini Web cookie scraping. With browserCookies.preferredBrowser set to auto (default), firefox, zen, chrome, or chromium, the package reads cookies from Firefox/Zen unencrypted SQLite or Chromium-family DBs (libsecret on Linux, Keychain on macOS, DPAPI master key + AES-GCM on Windows).
- pdfOcr.enabled controls whether pdftoppm rasterizes scanned PDFs into ImageContent blocks for vision OCR. Defaults on; set to false to skip rasterization on scanned PDFs.
- githubClone.enabled toggles the GitHub clone cache (default on). Repos exceeding maxRepoSizeMB automatically fall back to API-based extraction.
- video.enabled toggles YouTube and local video understanding via Gemini API/Web when available.
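The auto-mode ordering described in the toggles above can be sketched roughly like this. It is an illustration, not the package's resolver: `AutoContext`, `autoAttemptOrder`, the `gemini-api`/`gemini-web` attempt labels, and the `openaiNativeSupported` flag are all hypothetical names introduced for the example.

```typescript
// Hypothetical inputs for illustration; field names mirror the settings where possible.
interface AutoContext {
  enabledProviders: string; // comma-separated, as in the settings example
  keys: { exa?: string; perplexity?: string; gemini?: string };
  browserCookieAccess: boolean;
  openaiNativeSupported: boolean; // assumed flag: the active model supports the native rewrite
}

type Attempt =
  | "exa" | "perplexity" | "gemini-api"
  | "exa-mcp" | "duckduckgo"
  | "gemini-web" | "openai-native";

function autoAttemptOrder(ctx: AutoContext): Attempt[] {
  const enabled = new Set(
    ctx.enabledProviders.split(",").map((s) => s.trim()).filter((s) => s.length > 0),
  );
  const order: Attempt[] = [];

  // 1. Keyed providers, only when a key is present.
  if (enabled.has("exa") && ctx.keys.exa) order.push("exa");
  if (enabled.has("perplexity") && ctx.keys.perplexity) order.push("perplexity");
  if (enabled.has("gemini") && ctx.keys.gemini) order.push("gemini-api");

  // 2. No-key fallbacks.
  if (enabled.has("exa-mcp")) order.push("exa-mcp");
  if (enabled.has("duckduckgo")) order.push("duckduckgo");

  // 3. Gemini Web cookie auth, only when the user opted in.
  if (enabled.has("gemini") && ctx.browserCookieAccess) order.push("gemini-web");

  // 4. OpenAI-native last, because it has no normal Pi tool output.
  if (enabled.has("openai-native") && ctx.openaiNativeSupported) order.push("openai-native");

  return order;
}
```

With only a Perplexity key configured, cookie access off, and a supported OpenAI model, this sketch would try perplexity, exa-mcp, duckduckgo, and then openai-native.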
Secrets should be supplied with environment variables, project .env.local/.env files, or a private config file. Process environment variables win over values loaded from files:
- EXA_API_KEY
- PERPLEXITY_API_KEY
- GEMINI_API_KEY
- OPENAI_API_KEY
- JINA_API_KEY (optional; anonymous Jina Reader works without it but may rate-limit on heavy use)
- PI_WEB_TOOLS_CONFIG_FILE=/path/to/private.json
Shared Pi settings keys such as exaApiKey are loaded for compatibility but emit a warning.
API key values may be direct keys or 1Password references such as op://Private/Exa API Key/credential when the op CLI is installed and signed in.
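A minimal sketch of the precedence rule (process environment wins over file-loaded values) and of spotting a 1Password reference might look like this. `resolveSecrets` and `isOnePasswordReference` are hypothetical helpers; ignoring empty strings is an assumption, and actually resolving an op:// reference through the op CLI is left out.

```typescript
type SecretMap = Record<string, string | undefined>;

// Later sources override earlier ones, so the process environment is applied last.
// Assumption: empty strings are treated as "not set".
function resolveSecrets(fileValues: SecretMap, processEnv: SecretMap): Record<string, string> {
  const merged: Record<string, string> = {};
  for (const source of [fileValues, processEnv]) {
    for (const name of Object.keys(source)) {
      const value = source[name];
      if (value !== undefined && value !== "") merged[name] = value;
    }
  }
  return merged;
}

// Detects the op:// reference form only; resolving it would need the op CLI.
function isOnePasswordReference(value: string): boolean {
  return value.startsWith("op://");
}
```

For example, if a .env.local file sets EXA_API_KEY and the process environment also sets it, the merged result keeps the process environment value, while keys present only in the file are kept as loaded.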
Deep research modes
| Mode | Exa type | Default results | Text cap | Highlight cap | Notes |
|---|---|---:|---:|---:|---|
| lite | deep-lite | 15 | 10k chars/result | 600 chars/source | Fast, lower-cost spikes; no default structured output schema. |
| standard | deep-reasoning | 50 | 16k chars/result | 900 chars/source | Default for normal findings reports; requests Exa summaries and structured output. |
| full | deep-reasoning | 150 | 24k chars/result | 1200 chars/source | Runs the primary query plus each additionalQueries entry, then dedupes URLs; requests richer summaries/structured output. |
web_research uses Exa /search with deep search types, systemPrompt, text extraction, highlights, and (for standard/full) source summaries plus structured outputSchema. Clean Markdown reports use Exa output.content when present and keep raw provider payloads in sidecars. lite intentionally avoids the default output schema because live Exa deep-lite tests returned empty result sets when structured output was requested.
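The full-mode step of merging the primary query's results with those of each additionalQueries entry and then deduping URLs could be sketched as below. `SearchResult` and `dedupeByUrl` are hypothetical, and matching URLs by exact string is an assumption; the package may normalize URLs before comparing.

```typescript
// Hypothetical result shape; only the url field matters for deduplication here.
interface SearchResult {
  url: string;
  title: string;
}

// Takes one batch of results per query (primary first) and keeps the first
// occurrence of each URL, preserving the original order.
function dedupeByUrl(batches: SearchResult[][]): SearchResult[] {
  const seen = new Set<string>();
  const merged: SearchResult[] = [];
  for (const batch of batches) {
    for (const result of batch) {
      if (seen.has(result.url)) continue; // exact-string match (assumption)
      seen.add(result.url);
      merged.push(result);
    }
  }
  return merged;
}
```

Keeping the first occurrence means results from the primary query take priority over duplicates returned by the additional queries.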
Explicit tool arguments override mode defaults: type, numResults, textMaxCharacters, highlightsMaxCharacters, highlightNumSentences, highlightsPerUrl, summaryQuery, maxAgeHours, category, and outputSchema.
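The rule that explicit tool arguments override mode defaults can be illustrated with a small merge. The default values are read off the mode table (taking 10k, 16k, and 24k as 10000, 16000, and 24000, which is an assumption); `ResearchOptions` covers only four of the listed fields, and `effectiveOptions` is a hypothetical helper.

```typescript
type ResearchMode = "lite" | "standard" | "full";

// Subset of the overridable fields, for illustration.
interface ResearchOptions {
  type: string;
  numResults: number;
  textMaxCharacters: number;
  highlightsMaxCharacters: number;
}

// Values from the mode table; "k" is assumed to mean thousands of characters.
const MODE_DEFAULTS: Record<ResearchMode, ResearchOptions> = {
  lite: { type: "deep-lite", numResults: 15, textMaxCharacters: 10000, highlightsMaxCharacters: 600 },
  standard: { type: "deep-reasoning", numResults: 50, textMaxCharacters: 16000, highlightsMaxCharacters: 900 },
  full: { type: "deep-reasoning", numResults: 150, textMaxCharacters: 24000, highlightsMaxCharacters: 1200 },
};

// Explicit arguments win; anything left undefined falls back to the mode default.
function effectiveOptions(
  mode: ResearchMode,
  explicit: Partial<ResearchOptions> = {},
): ResearchOptions {
  const defaults = MODE_DEFAULTS[mode];
  return {
    type: explicit.type ?? defaults.type,
    numResults: explicit.numResults ?? defaults.numResults,
    textMaxCharacters: explicit.textMaxCharacters ?? defaults.textMaxCharacters,
    highlightsMaxCharacters: explicit.highlightsMaxCharacters ?? defaults.highlightsMaxCharacters,
  };
}
```

Calling `effectiveOptions("standard", { numResults: 30 })` keeps the standard text and highlight caps and changes only the result count.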
You can override mode defaults globally or per-project with pi-web-tools.exaResearchModes in Pi settings. The extension-manager UI stores this as a JSON string, while settings files may use either a JSON string or object:
{
  "lite": { "numResults": 8, "textMaxCharacters": 6000 },
  "standard": {
    "numResults": 30,
    "highlightsMaxCharacters": 700,
    "highlightsPerUrl": 2,
    "summaryQuery": "Summarize evidence relevant to the research question."
  },
  "full": { "numResults": 80, "maxAgeHours": 168 }
}
Migration
web_search moved here from pi-codex-minimal-tools. Install both updated packages together; pi-codex-minimal-tools now owns only image_generation, view_image, and apply_patch.
Attribution
This implementation was designed after reviewing the MIT-licensed Pi web-access and Exa extension patterns referenced in the project implementation plan. No source code was copied verbatim.
