mcp-reader
v0.2.0
MCP server for @vakra-dev/reader (scrape/crawl websites) with token-saving artifact truncation.
MCP server for @vakra-dev/reader: scrape and crawl websites using a real browser, and keep LLM context small with artifact offloading.
This server is built for Claude Code / OpenCode:
- Tool outputs are intentionally compact (summaries + artifact ids).
- Full payloads (manifests, markdown, html, url lists) are stored as artifacts.
- Agents fetch only what they need via reader_artifact_get (grep, range, head, tail).
Requirements
- Node: >=24
- @vakra-dev/reader is an optional dependency. If it fails to install (native deps), mcp-reader will still install, but scrape/crawl tools will fail until Reader is installed.
Recommended:
- Use Node 24.x for the smoothest native-dependency story.
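A preflight check for the Node requirement above could look like the following minimal sketch. The helper `meetsNodeRequirement` is hypothetical and not part of mcp-reader; it only compares the major version against the documented minimum of 24.

```typescript
// Hypothetical helper, not part of mcp-reader: checks a Node version string
// such as "24.1.0" against the documented minimum major version.
const MIN_NODE_MAJOR = 24;

function meetsNodeRequirement(version: string, minMajor: number = MIN_NODE_MAJOR): boolean {
  // Accept an optional leading "v", as in process.version ("v24.1.0").
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0] ?? "", 10);
  return Number.isInteger(major) && major >= minMajor;
}

// In a real script you would pass process.versions.node instead of a literal.
console.log(meetsNodeRequirement("24.1.0"));
console.log(meetsNodeRequirement("v22.11.0"));
```

Running a check like this before starting the server gives a clearer error than a failed native-dependency install.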
Install
npm install mcp-reader
If Reader did not install automatically (you will see warnings during npm install), install it explicitly:
npm install @vakra-dev/reader
Configure (Claude Code / OpenCode)
Recommended (Windows-safe): run through node to avoid .cmd wrapper spawning issues.
{
  "mcpServers": {
    "reader": {
      "command": "node",
      "args": ["./node_modules/mcp-reader/dist/cli.js"],
      "env": {
        "MCP_READER_LOG_LEVEL": "info",
        "MCP_READER_STORE": "file:.mcp-reader-artifacts",
        "MCP_READER_MAX_BYTES": "80000",
        "MCP_READER_PREVIEW_MAX_CHARS": "6000"
      }
    }
  }
}
Tools
- reader_scrape: Scrape 1+ URLs. Returns a summary and stores a scrape manifest + per-page markdown/html as artifacts.
- reader_crawl: Crawl a site (depth/maxPages/pattern filters) and optionally scrape discovered pages. Stores crawl manifest + URL list (+ scrape manifest if enabled).
- reader_challenge: Detect Cloudflare/anti-bot challenge and optionally wait for resolution.
- reader_status: Show server config + whether ReaderClient is initialized.
- reader_warmup: Warm up ReaderClient/browser core.
- reader_close: Close ReaderClient/browser pools.
Artifacts (token saving):
- reader_artifact_get: Fetch slices of stored artifacts (auto|head|tail|range|grep|full|json).
- reader_artifact_info: Artifact metadata.
- reader_artifact_list: List recent artifacts.
- reader_artifact_delete: Delete an artifact.
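To make the slice modes concrete, here is a local model of line-based slicing over a text artifact. It is a sketch for intuition only, not mcp-reader's implementation: the function `sliceText` is hypothetical, `headLines` follows the request example in this README, and the other parameter names (`tailLines`, `start`, `end`, `pattern`) and the 1-based inclusive range are assumptions. The auto and json modes are left out.

```typescript
// Illustrative model of the slice modes; not the server's actual code.
type SliceRequest =
  | { mode: "head"; headLines: number }
  | { mode: "tail"; tailLines: number } // assumed parameter name
  | { mode: "range"; start: number; end: number } // assumed: 1-based, inclusive
  | { mode: "grep"; pattern: string } // assumed parameter name
  | { mode: "full" };

function sliceText(text: string, req: SliceRequest): string {
  const lines = text.split("\n");
  switch (req.mode) {
    case "head":
      return lines.slice(0, req.headLines).join("\n");
    case "tail":
      return req.tailLines > 0 ? lines.slice(-req.tailLines).join("\n") : "";
    case "range":
      return lines.slice(Math.max(req.start - 1, 0), req.end).join("\n");
    case "grep": {
      // Keep only the lines that match the regular expression.
      const re = new RegExp(req.pattern);
      return lines.filter((line) => re.test(line)).join("\n");
    }
    case "full":
      return text;
  }
}

const sampleDoc = ["# Title", "intro", "## Install", "npm install x", "## Usage"].join("\n");
console.log(sliceText(sampleDoc, { mode: "head", headLines: 2 }));
console.log(sliceText(sampleDoc, { mode: "grep", pattern: "^## " }));
```

The point of the narrow modes is that an agent can pull section headings or a few lines first and request more only when the preview looks relevant.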
How To Use (Humans)
Read a single page (main content -> markdown):
{
  "url": "https://docs.reader.dev/documentation/overview",
  "formats": ["markdown"],
  "onlyMainContent": true
}
You will receive:
- a compact scrape summary
- a manifest artifact id with per-page markdownArtifactId/htmlArtifactId
Then fetch the markdown artifact:
{ "id": "art_...", "mode": "head", "headLines": 80 }
How To Use (Agents)
Start with the smallest signal:
- reader_crawl with preview: "summary" to get a URL inventory.
- Use reader_artifact_get with grep/range on the URL list artifact.
- reader_scrape only the handful of relevant URLs.
- Fetch only the needed slices of markdown via reader_artifact_get.
More: AGENTS.md
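The first two steps of the agent workflow can be sketched as below; the result is the shortlist that would go on to reader_scrape. Everything beyond the tool names is an assumption made for illustration: `ToolCaller` stands in for whatever MCP client the agent uses, and the `url` and `pattern` arguments plus the result fields `urlListArtifactId` and `text` are hypothetical. See docs/TOOLS.md for the real request and response shapes.

```typescript
// Sketch only: ToolCaller and all result field names are hypothetical.
type ToolCaller = (
  tool: string,
  args: Record<string, unknown>,
) => Promise<Record<string, unknown>>;

async function findRelevantUrls(
  callTool: ToolCaller,
  siteUrl: string,
  pattern: string,
  limit: number,
): Promise<string[]> {
  // Step 1: crawl with a summary preview to get a URL inventory.
  const crawl = await callTool("reader_crawl", { url: siteUrl, preview: "summary" });
  const urlListId = String(crawl["urlListArtifactId"]); // hypothetical field

  // Step 2: grep the URL list artifact instead of loading it whole.
  const hits = await callTool("reader_artifact_get", { id: urlListId, mode: "grep", pattern });
  const urls = String(hits["text"] ?? "")
    .split("\n")
    .filter((u) => u.length > 0);

  // Only a handful of URLs should go on to reader_scrape.
  return urls.slice(0, limit);
}

// In-memory fake so the sketch runs without a server.
const fakeCaller: ToolCaller = async (tool, args) => {
  const urlList = [
    "https://example.com/docs/a",
    "https://example.com/blog/b",
    "https://example.com/docs/c",
  ];
  if (tool === "reader_crawl") return { urlListArtifactId: "art_urls" };
  if (tool === "reader_artifact_get") {
    const re = new RegExp(String(args["pattern"]));
    return { text: urlList.filter((u) => re.test(u)).join("\n") };
  }
  throw new Error(`unexpected tool: ${tool}`);
};

findRelevantUrls(fakeCaller, "https://example.com", "/docs/", 2).then((urls) => console.log(urls));
```

Filtering on the URL list before scraping keeps page content out of the context until a page is known to be relevant.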
Artifact Storage + Truncation
Artifact store:
- MCP_READER_STORE: memory (default) or file:.mcp-reader-artifacts
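The two store forms can be told apart with a small parser. This is a sketch of how such a value might be interpreted, not the server's own parsing code; `parseStoreSpec` and the `StoreSpec` type are hypothetical names, and rejecting unknown values with an error is an assumption.

```typescript
// Hypothetical parser for the MCP_READER_STORE value; not mcp-reader's code.
type StoreSpec = { kind: "memory" } | { kind: "file"; dir: string };

function parseStoreSpec(value: string | undefined): StoreSpec {
  // Unset or empty falls back to the documented default, the memory store.
  if (value === undefined || value === "" || value === "memory") {
    return { kind: "memory" };
  }
  const prefix = "file:";
  if (value.startsWith(prefix) && value.length > prefix.length) {
    return { kind: "file", dir: value.slice(prefix.length) };
  }
  throw new Error(`Unsupported MCP_READER_STORE value: ${value}`);
}

console.log(parseStoreSpec(undefined));
console.log(parseStoreSpec("file:.mcp-reader-artifacts"));
```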
Truncation knobs:
- MCP_READER_MAX_BYTES: Offload threshold (default 80000)
- MCP_READER_PREVIEW_MAX_CHARS: Preview cap (default 6000)
- MCP_READER_HEAD_LINES / MCP_READER_TAIL_LINES: Preview slices (default 60 / 60)
- MCP_READER_TTL_SECONDS: Optional TTL for memory store
- MCP_READER_MAX_ARTIFACTS: Optional cap for memory store
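One way to picture how these knobs interact is the sketch below. It assumes, without confirmation from the source, that a payload over the byte threshold is offloaded and replaced by a head/tail preview capped at the preview character limit. The helpers `readTruncationConfig` and `buildPreview` are hypothetical; only the variable names and defaults come from the list above.

```typescript
// Sketch under stated assumptions; the real truncation logic may differ.
interface TruncationConfig {
  maxBytes: number;
  previewMaxChars: number;
  headLines: number;
  tailLines: number;
}

function readTruncationConfig(env: Record<string, string | undefined>): TruncationConfig {
  // Fall back to the documented default when a value is unset or not a positive number.
  const num = (key: string, fallback: number): number => {
    const parsed = Number(env[key]);
    return env[key] !== undefined && Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
  };
  return {
    maxBytes: num("MCP_READER_MAX_BYTES", 80000),
    previewMaxChars: num("MCP_READER_PREVIEW_MAX_CHARS", 6000),
    headLines: num("MCP_READER_HEAD_LINES", 60),
    tailLines: num("MCP_READER_TAIL_LINES", 60),
  };
}

function buildPreview(text: string, cfg: TruncationConfig): { offloaded: boolean; preview: string } {
  const bytes = new TextEncoder().encode(text).length;
  if (bytes <= cfg.maxBytes) return { offloaded: false, preview: text };
  const lines = text.split("\n");
  const head = lines.slice(0, cfg.headLines);
  // Start the tail after the head so short documents do not repeat lines.
  const tail = lines.slice(Math.max(lines.length - cfg.tailLines, cfg.headLines));
  const preview = [...head, "...", ...tail].join("\n").slice(0, cfg.previewMaxChars);
  return { offloaded: true, preview };
}

// In a real process you would pass process.env instead of a literal object.
const demoCfg = readTruncationConfig({ MCP_READER_MAX_BYTES: "10", MCP_READER_HEAD_LINES: "1", MCP_READER_TAIL_LINES: "1" });
console.log(buildPreview("line1\nline2\nline3\nline4", demoCfg));
```

Under this model, lowering MCP_READER_MAX_BYTES pushes more results into artifacts, while the head/tail and preview-cap settings control how much of each offloaded result still reaches the context.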
ReaderClient Configuration
Configure Reader via env (JSON):
- MCP_READER_BROWSER_POOL: { "size": 2, "retireAfterPages": 100, "retireAfterMinutes": 30, "maxQueueSize": 100 }
- MCP_READER_PROXIES: [ { "url": "http://user:pass@host:port", "country": "US" } ]
- MCP_READER_PROXY_ROTATION: round-robin or random
- MCP_READER_VERBOSE: true | false
- MCP_READER_SHOW_CHROME: true | false
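Because environment variables are plain strings, the structured values above have to be serialized as JSON text before they go into the "env" block of the MCP config. The sketch below shows one way to produce those strings from typed objects. The helper `buildReaderEnv` and the interface names are hypothetical; the field names mirror the examples in the list.

```typescript
// Hypothetical helper for generating env values; not part of mcp-reader.
interface BrowserPool {
  size: number;
  retireAfterPages: number;
  retireAfterMinutes: number;
  maxQueueSize: number;
}

interface ProxyEntry {
  url: string;
  country?: string;
}

function buildReaderEnv(
  pool: BrowserPool,
  proxies: ProxyEntry[],
  rotation: "round-robin" | "random" = "round-robin",
) {
  return {
    // Structured settings are passed as JSON strings.
    MCP_READER_BROWSER_POOL: JSON.stringify(pool),
    MCP_READER_PROXIES: JSON.stringify(proxies),
    MCP_READER_PROXY_ROTATION: rotation,
    MCP_READER_VERBOSE: "false",
    MCP_READER_SHOW_CHROME: "false",
  };
}

const readerEnv = buildReaderEnv(
  { size: 2, retireAfterPages: 100, retireAfterMinutes: 30, maxQueueSize: 100 },
  [{ url: "http://user:pass@host:port", country: "US" }],
);
console.log(readerEnv.MCP_READER_BROWSER_POOL);
```

Generating the strings with JSON.stringify avoids hand-escaping quotes when the values are embedded in a JSON config file.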
Documentation
- docs/TOOLS.md: tool-by-tool reference with examples
- docs/CONFIG.md: configuration and environment variables
- docs/TRUNCATION.md: artifact modes and token-saving patterns
- docs/RELEASE.md: CI/CD + release + npm publish
CI / Releases / npm publish
- CI: lint + typecheck + tests + build
- Releases: Release Please opens a PR with version bump + changelog
- Publish: on release creation, GitHub Actions publishes to npm
Required GitHub secrets:
- NPM_TOKEN: npm automation token with publish rights
Development
npm run lint
npm run typecheck
npm test
npm run build
License
GPL-3.0-only. See LICENSE.
