agentwebai
v0.6.20
Published
MCP server that gives AI agents structured access to any webpage
Maintainers
Readme
AgentWeb
MCP server that gives AI agents structured access to any webpage. Returns clean text, links, forms, and metadata instead of raw HTML.
Works with Claude Code, Cursor, Windsurf, and any MCP-compatible AI coding agent.
Install
One command. Detects your AI tools, registers your account, and configures everything.
npx agentwebai setupRestart your AI tool. Done.
Manual install (advanced)
If you prefer to configure manually, add to your MCP config:
{
"mcpServers": {
"agentweb": {
"command": "npx",
"args": ["-y", "agentwebai"]
}
}
}Config file locations:
- Claude Code:
~/.claude.json - Cursor:
~/.cursor/mcp.json - Windsurf:
~/.codeium/windsurf/mcp_config.json
What it does
AgentWeb gives your AI agent a read_page tool that replaces built-in web fetching. Instead of raw HTML, the agent gets:
- Clean text extracted via Mozilla Readability (no HTML noise)
- All links on the page with absolute URLs
- Forms with field names, types, and required flags
- Page metadata (title, description, Open Graph tags)
Handles every content type: HTML, JSON, Markdown, CSV, XML, RSS, plain text, and more.
JS-rendered pages (React, Next.js, Vue) are handled automatically via Playwright when available.
llms.txt support
Sites that publish /llms.txt or /llms-full.txt (Stripe, Anthropic, Cloudflare, and 600+ others) are served their agent-friendly content directly. The site is explicitly opting in to agent access.
Cookie handling
AgentWeb maintains an in-memory cookie jar per session, just like a browser. Cookies from Set-Cookie headers are stored and sent on subsequent requests to the same domain. Cookies are never persisted to disk or shared across users.
Example
Your agent runs:
read_page("https://example.com/pricing")And gets back structured data:
{
"title": "Pricing - Example",
"text": "Start free. Scale as you grow. Free tier includes...",
"links": [
{ "text": "Sign up", "href": "https://example.com/signup" },
{ "text": "Contact sales", "href": "https://example.com/contact" }
],
"forms": [
{
"action": "https://example.com/signup",
"method": "POST",
"fields": [
{ "name": "email", "type": "email", "required": true }
]
}
],
"metadata": {
"description": "Simple, transparent pricing for teams of all sizes"
}
}Tools
read_page
Read any public webpage and return structured content.
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| url | string | yes | The URL to fetch |
| selector | string | no | CSS selector to scope extraction to a specific element |
agentweb_register
Register your email (if you didn't during setup).
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| email | string | yes | Your email address |
JS-rendered pages
For pages that require JavaScript (SPAs, React apps), install Playwright:
npx playwright install chromiumAgentWeb detects JS-rendered pages automatically and falls back to Playwright when the initial HTTP fetch returns minimal content.
Built-in protections
- SSRF guard - blocks requests to private/internal networks and cloud metadata endpoints
- robots.txt - respects site crawling rules
- Rate limiting - 10 requests/second per domain
- Content limits - 5MB max page size, 50k character text output
Usage data
AgentWeb collects anonymous usage telemetry to understand how agents use the web:
- Which domains are being accessed
- Whether pages required JS rendering
- Response times and success/failure rates
- A random device ID (not tied to your identity)
- Your email if you register
No page content is collected. No browsing history. This data helps us build better infrastructure for agent-native web access.
License
Proprietary. Copyright 2026 AgentWeb Labs. All rights reserved.
