@zapfetchdev/mcp-server
v0.2.0
Published
MCP server for ZapFetch - APAC-native web scraping API for AI agents
Downloads
25
Maintainers
Readme
@zapfetchdev/mcp-server
MCP (Model Context Protocol) server for ZapFetch — APAC-native web scraping API for AI agents.
Use ZapFetch directly from Claude Desktop, Cursor, Windsurf, and any other MCP-compatible client.
Tools
zapfetch_scrape— scrape a single URLzapfetch_search— web search with optional content extractionzapfetch_crawl— crawl a website (async, returns job_id)zapfetch_crawl_status— poll crawl job progresszapfetch_map— discover URLs on a site (fast, no content)zapfetch_extract— extract structured data with a prompt + schemazapfetch_extract_status— poll extract job progress
Docs: https://docs.zapfetch.com
For docs lookups, Claude/Cursor/Windsurf can also use the auto-generated Mintlify MCP at https://docs.zapfetch.com/mcp.
Prerequisites
- Node.js 20+
- A ZapFetch API key (get one here)
Install
npm install -g @zapfetchdev/mcp-serverConfigure
Claude Desktop
Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
"mcpServers": {
"zapfetch": {
"command": "npx",
"args": ["-y", "@zapfetchdev/mcp-server"],
"env": {
"ZAPFETCH_API_KEY": "zf-your-api-key"
}
}
}
}Cursor
Edit ~/.cursor/mcp.json:
{
"mcpServers": {
"zapfetch": {
"command": "npx",
"args": ["-y", "@zapfetchdev/mcp-server"],
"env": { "ZAPFETCH_API_KEY": "zf-your-api-key" }
}
}
}Windsurf
Edit ~/.codeium/windsurf/mcp_config.json:
{
"mcpServers": {
"zapfetch": {
"command": "npx",
"args": ["-y", "@zapfetchdev/mcp-server"],
"env": { "ZAPFETCH_API_KEY": "zf-your-api-key" }
}
}
}Install via Smithery
For a one-command install that auto-writes the config for your MCP client:
npx -y @smithery/cli install @zapfetchdev/mcp-server --client claude
# or --client cursor, --client windsurfSmithery will prompt for your ZapFetch API key once, then register the server in the right config file for you. This invokes the package's STDIO entry (bin.zapfetch-mcp → dist/index.js) — identical to the manual configs above.
Environment Variables
| Variable | Required | Default | Description |
|---------------------|----------|---------------------------|------------------------------|
| ZAPFETCH_API_KEY | STDIO | — | Your ZapFetch API key (STDIO mode only — HTTP mode takes the key per-request via Authorization: Bearer) |
| ZAPFETCH_API_URL | no | https://api.zapfetch.com| Override for self-host / dev |
| PORT | HTTP | 3000 | HTTP server port (HTTP mode only) |
| ZAPFETCH_TRANSPORT| Docker | stdio | Inside the Docker image only, switches between stdio / http via entry.sh. Ignored by the zapfetch-mcp / zapfetch-mcp-http npm binaries. |
Usage Examples
After configuration, ask your AI assistant naturally — it will pick the right tool automatically.
Scrape a single page
"Scrape https://rakuten.co.jp and give me the main content as markdown."
Uses zapfetch_scrape. Best for a known URL where you want raw page content quickly. If the page is geo-blocked or returns sparse content, follow up with a search instead.
Search the web
"Find the top 5 recent blog posts about TypeScript 5.7 and summarize each one."
Uses zapfetch_search. Returns ranked results with optional content extraction. Useful when you don't have a specific URL yet, or as a fallback when a direct scrape comes up empty.
Crawl a site (multi-page)
"Crawl https://docs.example.com starting from the root, up to 50 pages, and summarize the authentication section."
Uses zapfetch_crawl to kick off an async job (returns a job_id), then zapfetch_crawl_status to poll until complete. The assistant handles the polling loop — you just wait for the result.
Tip: For large sites, map first (see below) to identify which URLs are worth crawling before committing.
Map a site (URL discovery)
"List all URLs under https://docs.example.com/api so I can decide which pages to scrape."
Uses zapfetch_map. Returns URLs only — no content fetched — so it's fast even on large sites. Pair with zapfetch_scrape to cherry-pick the pages you actually need:
"Map https://stripe.com/docs, then scrape the 3 pages most relevant to webhook setup."
Extract structured data
"Extract product name, price, currency, and stock status from these 5 rakuten.co.jp product URLs. Return as a JSON array."
Uses zapfetch_extract with a prompt and optional JSON schema. The job is async — zapfetch_extract_status polls it to completion. Good for turning arbitrary product pages, job listings, or articles into structured records at scale.
Poll extract job status
"Check whether the extract job job_abc123 is done."
Uses zapfetch_extract_status directly. You rarely need to ask for this by name — the assistant calls it automatically after zapfetch_extract — but it's useful if you started a job in a previous session and want to retrieve results later.
Combining tools
Tools compose naturally. A few common patterns:
- Survey then scrape: map a large site to get all URLs, filter to the relevant ones, scrape each.
- Search then scrape: search to find the canonical source for a topic, then scrape that page for full content.
- Scrape with fallback: if
zapfetch_scrapereturns thin content (e.g. JS-heavy page), the assistant can fall back tozapfetch_searchto find a cached or mirror version.
Migrating from Firecrawl MCP
ZapFetch is Firecrawl-compatible at the API level, but this MCP uses zapfetch_* tool names (not firecrawl_*) to avoid conflicts if you run both. Capabilities are 1:1 — just update prompts referring to tool names.
HTTP transport (self-hosted)
For hosted / multi-tenant deployments, run the HTTP server instead of STDIO. Each request carries its own API key via Authorization: Bearer, so a single deployment can serve many users without sharing credentials.
Docker
docker run -d \
-p 3000:3000 \
-e ZAPFETCH_TRANSPORT=http \
docker.io/zapfetchdev/mcp:latestFrom npm
npm install -g @zapfetchdev/mcp-server
# then run the HTTP entry (never the STDIO one in this mode)
zapfetch-mcp-httpEndpoints
| Path | Auth | Behavior |
|------|------|----------|
| GET /health | none | Returns {ok:true, version, transport:"http"}. Use for container health checks. |
| POST /mcp | Authorization: Bearer <key> required | MCP JSON-RPC endpoint. Requests MUST include Accept: application/json, text/event-stream per MCP spec. |
Example call
curl -X POST http://localhost:3000/mcp \
-H "Authorization: Bearer fc-YOUR-KEY" \
-H "Content-Type: application/json" \
-H "Accept: application/json, text/event-stream" \
-d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'Security notes
- Do not set
ZAPFETCH_API_KEYin HTTP mode. The server refuses to start (exit code 2) if the env var is present — this prevents a misconfigured container from silently serving every request from the operator's key. - The HTTP server is stateless: each request builds its own transport + client, with the Bearer token flowing to the MCP tool handler via the SDK's native
extra.authInfo.tokenchannel. No cross-request state, no session leaks. - Upstream ZapFetch API error strings are sanitized before transiting to HTTP clients — only a small allowlist of error codes (rate_limit, invalid_key, quota_exceeded, upstream_unavailable) passes through.
- The stderr access log is strict-allowlist: only
ts / method / path / status / ms / origin_ip. Bearer tokens, bodies, and headers are never logged.
Development
pnpm install # or npm install
npm run typecheck
npm run build # -> dist/Local test with Claude Desktop pointed at your build:
{
"mcpServers": {
"zapfetch-dev": {
"command": "node",
"args": ["/absolute/path/to/zapfetch-mcp/dist/index.js"],
"env": { "ZAPFETCH_API_KEY": "zf-..." }
}
}
}License
MIT
