pi-agent-browser
v0.1.0
Published
Browser automation tool for pi — interactive browsing, screenshots with inline vision, and session cleanup via agent-browser CLI
Maintainers
Readme
pi-agent-browser
Browser automation tool for pi. Gives the LLM a browser tool that drives a real browser via agent-browser.
Install
pi install npm:pi-agent-browserOr try it without installing:
pi -e npm:pi-agent-browserWhat it does
Registers a browser tool that the LLM can call to:
- Navigate —
open <url> - Inspect —
snapshot -i(returns interactive elements with@refhandles) - Interact —
click @e1,fill @e2 "search query",press Enter,scroll down - Read —
get text,get title,get url,get text @e3 - Screenshot — returns the image inline so the LLM can see the page
- Clean up —
close(also auto-closes on session shutdown)
Features
| Feature | Details |
|---|---|
| Inline screenshots | Screenshots are returned as base64 images — the LLM can describe what it sees |
| Output truncation | Large snapshot output is truncated to fit context windows, with full output saved to a temp file |
| Auto-install | If agent-browser isn't installed, prompts to install it (npm + Chromium download) |
| Session cleanup | Browser is automatically closed on pi session shutdown — no orphaned Chromium processes |
| TUI rendering | Compact display: shows command inline, element counts for snapshots, screenshot paths |
Example
You: Open hacker news and tell me the top 3 stories
browser open https://news.ycombinator.com
browser snapshot -i
browser close
The top 3 stories on Hacker News right now are:
1. ...
2. ...
3. ...Requirements
- agent-browser — installed automatically on first use, or manually:
npm install -g agent-browser agent-browser install # downloads Chromium - A vision-capable model (for screenshot descriptions): Claude Sonnet/Opus, GPT-4o, Gemini Pro, etc.
License
MIT
