pi-web-tools

v0.1.0

Published

22 days ago

Web search via Exa, content extraction, and GitHub repo cloning for the Pi coding agent

Vulnerabilities: 0 high, 0 medium, 0 low

coctostan

pi-package pi pi-coding-agent extension web-search exa fetch github

pi-web-tools

Web search, content extraction, and GitHub repo cloning for the Pi coding agent.

A lightweight extension providing three tools:

web_search — Search the web via Exa with snippet extraction
fetch_content — Fetch any URL and extract clean markdown (HTML via Readability, Jina Reader fallback, GitHub via clone)
get_search_content — Retrieve stored results from previous searches/fetches

Install

pi install npm:pi-web-tools

Or install from git:

pi install github:coctostan/pi-web-tools

Setup

Exa API Key (required for web_search)

Get a key at exa.ai and set it via environment variable:

export EXA_API_KEY="your-key-here"

Or add it to the config file ~/.pi/web-tools.json:

{
  "exaApiKey": "your-key-here"
}

The environment variable takes precedence over the config file.
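The precedence rule can be sketched as a small pure function. This is an illustration only: `resolveExaApiKey` and the `WebToolsConfig` type are hypothetical names, and treating an empty string as "not set" is an assumption rather than documented behavior.

```typescript
// Hypothetical shape of ~/.pi/web-tools.json; only exaApiKey matters here.
interface WebToolsConfig {
  exaApiKey?: string | null;
}

// Returns the key from the environment if present, otherwise the key
// from the config file, otherwise null.
function resolveExaApiKey(
  env: Record<string, string | undefined>,
  config: WebToolsConfig,
): string | null {
  const fromEnv = env.EXA_API_KEY?.trim();
  if (fromEnv) return fromEnv; // environment variable wins

  const fromFile = config.exaApiKey?.trim();
  return fromFile ? fromFile : null;
}
```

In Node, the first argument would typically be `process.env`.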

GitHub CLI (recommended for fetch_content)

For GitHub repo cloning, install the GitHub CLI:

# Debian/Ubuntu
sudo apt install gh

# Or install via conda, brew, etc., then authenticate
gh auth login

Without gh, the extension falls back to git clone (works for public repos).
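One way to picture the fallback is a function that picks the clone command based on whether `gh` is available. The sketch below is an assumption about how such a choice could look, not the extension's actual code; `buildCloneCommand` and `CloneCommand` are hypothetical names. It relies on `gh repo clone` forwarding flags after `--` to `git clone`.

```typescript
interface CloneCommand {
  command: string;
  args: string[];
}

// Builds a shallow-clone command, preferring gh when it is installed.
function buildCloneCommand(
  owner: string,
  repo: string,
  targetDir: string,
  hasGh: boolean,
): CloneCommand {
  if (hasGh) {
    // gh uses the credentials from `gh auth login`, so private repos can work.
    return {
      command: "gh",
      args: ["repo", "clone", `${owner}/${repo}`, targetDir, "--", "--depth", "1"],
    };
  }
  // Plain git over HTTPS; without credentials this only works for public repos.
  return {
    command: "git",
    args: ["clone", "--depth", "1", `https://github.com/${owner}/${repo}.git`, targetDir],
  };
}
```

The returned `command` and `args` are shaped so they could be passed to something like Node's `child_process.execFile`.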

Configuration

Config file: ~/.pi/web-tools.json (auto-reloaded every 30 seconds)

{
  "exaApiKey": "your-exa-key",
  "github": {
    "maxRepoSizeMB": 350,
    "cloneTimeoutSeconds": 30,
    "clonePath": "/tmp/pi-github-repos"
  }
}

| Option | Default | Description |
|--------|---------|-------------|
| exaApiKey | null | Exa API key (env EXA_API_KEY overrides) |
| github.maxRepoSizeMB | 350 | Skip cloning repos larger than this |
| github.cloneTimeoutSeconds | 30 | Abort clone after this many seconds |
| github.clonePath | /tmp/pi-github-repos | Where to store cloned repos |
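To illustrate how the defaults and the 30-second reload could fit together, here is a minimal sketch. The default values come from the options table above; everything else (`withDefaults`, `createConfigCache`, the injected `load` and `now` functions) is hypothetical and only approximates what a TTL-cached config loader might do.

```typescript
interface GitHubOptions {
  maxRepoSizeMB: number;
  cloneTimeoutSeconds: number;
  clonePath: string;
}

interface ResolvedConfig {
  exaApiKey: string | null;
  github: GitHubOptions;
}

type PartialConfig = {
  exaApiKey?: string | null;
  github?: Partial<GitHubOptions>;
};

// Defaults as listed in the options table.
const DEFAULTS: ResolvedConfig = {
  exaApiKey: null,
  github: {
    maxRepoSizeMB: 350,
    cloneTimeoutSeconds: 30,
    clonePath: "/tmp/pi-github-repos",
  },
};

// Fills in any option the user's file leaves out.
function withDefaults(raw: PartialConfig): ResolvedConfig {
  return {
    exaApiKey: raw.exaApiKey ?? DEFAULTS.exaApiKey,
    github: { ...DEFAULTS.github, ...raw.github },
  };
}

// Re-reads the config at most once per ttlMs; `load` would parse the JSON file.
function createConfigCache(
  load: () => PartialConfig,
  now: () => number = Date.now,
  ttlMs: number = 30_000,
): () => ResolvedConfig {
  let cached: ResolvedConfig | null = null;
  let loadedAt = 0;
  return () => {
    const t = now();
    if (cached === null || t - loadedAt >= ttlMs) {
      cached = withDefaults(load());
      loadedAt = t;
    }
    return cached;
  };
}
```

Injecting `load` and `now` keeps the sketch free of file-system and clock dependencies, which also makes the expiry behavior easy to test.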

Tools

`web_search`

Search the web using Exa. Returns results with snippets and source URLs.

| Parameter | Type | Description |
|-----------|------|-------------|
| query | string | Single search query |
| queries | string[] | Multiple queries (batch) |
| numResults | number | Results per query (default: 5, max: 20) |
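The parameters above could be normalized along these lines before a search is issued. This is a sketch under assumptions: `normalizeSearchParams` is a hypothetical helper, and merging `query` with `queries`, dropping blank queries, and clamping `numResults` (rather than rejecting out-of-range values) are illustrative choices, not documented behavior.

```typescript
interface WebSearchParams {
  query?: string;
  queries?: string[];
  numResults?: number;
}

interface NormalizedSearch {
  queries: string[];
  numResults: number;
}

const DEFAULT_NUM_RESULTS = 5;
const MAX_NUM_RESULTS = 20;

function normalizeSearchParams(params: WebSearchParams): NormalizedSearch {
  // Accept a single query, a batch, or both, and drop blank entries.
  const queries = [
    ...(params.query !== undefined ? [params.query] : []),
    ...(params.queries ?? []),
  ]
    .map((q) => q.trim())
    .filter((q) => q.length > 0);

  if (queries.length === 0) {
    throw new Error("Provide either query or queries");
  }

  // Fall back to the default, then clamp into the range 1..20.
  const requested = params.numResults;
  const numResults =
    requested === undefined || !Number.isFinite(requested)
      ? DEFAULT_NUM_RESULTS
      : Math.min(MAX_NUM_RESULTS, Math.max(1, Math.floor(requested)));

  return { queries, numResults };
}
```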

Example:

Search for "TypeScript 5.8 new features"

`fetch_content`

Fetch URL(s) and extract readable content as markdown.

| Parameter | Type | Description |
|-----------|------|-------------|
| url | string | Single URL to fetch |
| urls | string[] | Multiple URLs (parallel, max 3 concurrent) |
| forceClone | boolean | Force cloning large GitHub repos |
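The "max 3 concurrent" limit for `urls` can be pictured as a small worker pool. The helper below, `mapWithConcurrency`, is a hypothetical name and a generic sketch of the technique; how the extension enforces its limit is not stated here.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight,
// returning results in the same order as the input.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index.
  async function run(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.max(0, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => run()));
  return results;
}
```

A call such as `mapWithConcurrency(urls, 3, fetchOne)` would then fetch a batch three at a time, where `fetchOne` stands in for whatever per-URL fetch function is used. If one failing URL should not fail the whole batch, the worker would need to catch its own errors.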

Content extraction pipeline:

GitHub URLs → Clone repo (shallow, depth 1), generate tree + README
HTML pages → Readability extraction → Markdown conversion
Readability fails → Jina Reader fallback (r.jina.ai)
Non-HTML → Return raw text
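The ordering of the pipeline can be sketched as a single function that tries each stage in turn. All names here (`Extractors`, `extractContent`, and the individual stage functions) are hypothetical stand-ins injected as parameters; the sketch shows only the control flow described above, not the real extraction code.

```typescript
// Hypothetical stage implementations, supplied by the caller.
interface Extractors {
  isGitHubRepoUrl(url: string): boolean;
  cloneAndSummarize(url: string): Promise<string>;
  fetchPage(url: string): Promise<{ contentType: string; body: string }>;
  readability(html: string): string | null; // null signals extraction failure
  jinaReader(url: string): Promise<string>;
}

async function extractContent(url: string, x: Extractors): Promise<string> {
  // 1. GitHub repo URLs are cloned instead of fetched over HTTP.
  if (x.isGitHubRepoUrl(url)) return x.cloneAndSummarize(url);

  const page = await x.fetchPage(url);

  // 4. Anything that is not HTML is returned as raw text.
  if (!page.contentType.toLowerCase().includes("html")) return page.body;

  // 2. HTML goes through Readability and is converted to markdown.
  const markdown = x.readability(page.body);
  if (markdown !== null) return markdown;

  // 3. If Readability fails, fall back to Jina Reader.
  return x.jinaReader(url);
}
```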

Content over 30,000 characters is truncated with a pointer to get_search_content.
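A truncation step like this might look roughly as follows. The 30,000-character limit is from the text above; the function name `truncateContent`, the wording of the notice, and counting characters as UTF-16 code units (JavaScript's `string.length`) are assumptions made for illustration.

```typescript
const MAX_CONTENT_CHARS = 30_000;

// Cuts long content and appends a pointer to the stored full version.
function truncateContent(
  content: string,
  responseId: string,
  maxChars: number = MAX_CONTENT_CHARS,
): string {
  if (content.length <= maxChars) return content;

  const omitted = content.length - maxChars;
  return (
    content.slice(0, maxChars) +
    `\n\n[Truncated: ${omitted} more characters. ` +
    `Call get_search_content with responseId "${responseId}" for the full text.]`
  );
}
```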

`get_search_content`

Retrieve full content from a previous web_search or fetch_content result.

| Parameter | Type | Description |
|-----------|------|-------------|
| responseId | string | ID from a previous tool result |
| query | string | Filter by query text |
| queryIndex | number | Filter by query index |
| url | string | Filter by URL |
| urlIndex | number | Filter by URL index |
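As a rough picture of how the filter parameters might narrow a stored result, consider the sketch below. The storage shape (`StoredResponse`, `StoredItem`) and the rules (index takes priority over text, text must match exactly, no filter returns everything) are all hypothetical; the actual lookup semantics are not documented here.

```typescript
// Hypothetical stored shape: one item per query (search) or per URL (fetch).
interface StoredItem {
  key: string; // the query text or the URL
  content: string;
}

interface StoredResponse {
  kind: "search" | "fetch";
  items: StoredItem[];
}

interface Selector {
  query?: string;
  queryIndex?: number;
  url?: string;
  urlIndex?: number;
}

function selectContent(stored: StoredResponse, sel: Selector): StoredItem[] {
  // Pick the filter pair that matches the kind of stored response.
  const index = stored.kind === "search" ? sel.queryIndex : sel.urlIndex;
  const key = stored.kind === "search" ? sel.query : sel.url;

  if (index !== undefined) {
    const item = stored.items[index];
    return item ? [item] : [];
  }
  if (key !== undefined) {
    return stored.items.filter((item) => item.key === key);
  }
  return stored.items; // no filter: return everything stored under the ID
}
```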

How GitHub Cloning Works

When fetch_content receives a GitHub URL:

Parse — Extracts owner, repo, ref, path, type (root/blob/tree)
Size check — Queries repo size via gh api. Skips if over threshold (default 350MB)
Clone — Shallow clone (--depth 1) to temp directory, cached for the session
Generate — Based on URL type:
- Root: Full directory tree + README content
- Tree: Directory listing for the specified path
- Blob: File content (with binary detection and 100K truncation)

Non-code GitHub URLs (issues, PRs, discussions, etc.) are fetched as normal web pages.
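The parse step can be sketched with the standard `URL` class. This is a simplified illustration, not the extension's parser: `parseGitHubUrl` is a hypothetical name, it assumes the ref is a single path segment (so branch names containing `/` are not handled), and it does not screen out reserved top-level paths such as settings pages. Returning `null` stands for "treat as a normal web page".

```typescript
type GitHubUrlType = "root" | "blob" | "tree";

interface ParsedGitHubUrl {
  owner: string;
  repo: string;
  type: GitHubUrlType;
  ref?: string;
  path?: string;
}

function parseGitHubUrl(input: string): ParsedGitHubUrl | null {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return null; // not a valid absolute URL
  }
  if (url.hostname !== "github.com" && url.hostname !== "www.github.com") {
    return null;
  }

  const [owner, repoRaw, kind, ref, ...rest] = url.pathname
    .split("/")
    .filter((segment) => segment.length > 0);
  if (!owner || !repoRaw) return null;

  const repo = repoRaw.replace(/\.git$/, "");

  // https://github.com/owner/repo
  if (kind === undefined) return { owner, repo, type: "root" };

  // https://github.com/owner/repo/blob/<ref>/<path> or .../tree/<ref>/<path>
  if ((kind === "blob" || kind === "tree") && ref) {
    return { owner, repo, type: kind, ref, path: rest.join("/") };
  }

  // issues, pull, discussions, and so on
  return null;
}
```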

Architecture

index.ts          — Extension entry point, 3 tools, session management
├── config.ts     — Config with 30s TTL cache, env var overrides
├── storage.ts    — LRU storage (max 50 entries, session restore)
├── exa-search.ts — Exa API client
├── extract.ts    — Readability + Jina Reader content extraction
└── github-extract.ts — GitHub URL parsing, clone, tree/content generation
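For readers unfamiliar with the LRU idea mentioned for storage.ts, here is a minimal sketch built on the insertion order of `Map`. The class name `LruStore` and its methods are hypothetical; only the 50-entry cap comes from the listing above, and session restore is not modeled.

```typescript
// Least-recently-used store: when full, the entry touched longest ago is evicted.
class LruStore<V> {
  private readonly entries = new Map<string, V>();
  private readonly maxEntries: number;

  constructor(maxEntries: number = 50) {
    this.maxEntries = maxEntries;
  }

  get(key: string): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // Map iterates in insertion order, so the first key is the oldest.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  has(key: string): boolean {
    return this.entries.has(key);
  }

  get size(): number {
    return this.entries.size;
  }
}
```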

Development

# Install dependencies
npm install

# Run tests
npx vitest run

# Run tests in watch mode
npx vitest

# Load in pi for testing
pi -e ./index.ts

License

MIT
