mcp-scraper

v0.3.1, published 9 hours ago

MCP server for MCP Scraper web intelligence tools


vilovieta

MCP Scraper

MCP Scraper is an MCP server for live web intelligence tools backed by https://mcpscraper.dev.

Install

Use the MCPB Desktop Extension for the branded Claude Desktop install, or use the npm package from any MCP client that can run local stdio commands.

MCP Scraper ships three local stdio entrypoints plus human-facing helper CLIs:

mcp-scraper — live web intelligence, SERP, PAA, site extraction, YouTube, Facebook ads and organic video transcripts, Maps, directory, rank tracker blueprint, and credit tools.
browser-agent — agent-controlled browser tools for hosted cloud sessions or local Chrome profile mode, with screenshots, clicks, typing, scrolling, watch URLs, replay links, MP4 replay download, and local profile import/sync helpers.
mcp-scraper-combined — one context-aware command. In a human terminal it prints the branded ASCII install card; in an MCP client it runs the combined stdio server with both tool sets. This is the entrypoint used by the MCPB Desktop Extension.
mcp-scraper-install — explicit alias for the human-facing terminal installer card with the branded ASCII intro and copyable install commands.
mcp-scraper-cli — a human-facing CLI for setup checks, AI-agent config generation, workflow prompts, local SEO workflow runs, and local HTML reports. This command is safe to print because it is not an MCP stdio server.

Terminal installer

Run the combined command when you want the designed terminal install experience:

npx -y -p mcp-scraper@latest mcp-scraper-combined

In a human terminal, it prints the MCP Scraper banner, loaded tool groups, Desktop Extension download, Claude Code command, and Codex config. When launched by an MCP client, the same command detects non-interactive stdio and writes only valid JSON-RPC to stdout.

The explicit installer alias still works:

npx -y -p mcp-scraper@latest mcp-scraper-install

Human CLI

Run setup checks and generate agent wiring:

npx -y -p mcp-scraper@latest mcp-scraper-cli doctor
npx -y -p mcp-scraper@latest mcp-scraper-cli browser profiles --email you@example.com
npx -y -p mcp-scraper@latest mcp-scraper-cli browser import-chrome --email you@example.com --name seo-example-com
MCP_SCRAPER_API_KEY=sk_live_your_key npx -y -p mcp-scraper@latest mcp-scraper-cli agent install claude --apply
npx -y -p mcp-scraper@latest mcp-scraper-cli agent install codex
npx -y -p mcp-scraper@latest mcp-scraper-cli agent install claude-desktop --browser-mode local --browser-profile seo-example-com
npx -y -p mcp-scraper@latest mcp-scraper-cli agent prompt agent-packet

agent install claude --apply upserts the user-scope mcp-scraper entry in Claude Code so that it runs npx -y -p mcp-scraper@latest mcp-scraper-combined. After applying, fully exit Claude Code and open a new Claude Code terminal session; MCP servers are attached only when Claude Code starts.

Check usage and upgrade concurrency from a normal terminal:

MCP_SCRAPER_API_KEY=sk_live_your_key npx -y -p mcp-scraper@latest mcp-scraper-cli billing concurrency info
MCP_SCRAPER_API_KEY=sk_live_your_key npx -y -p mcp-scraper@latest mcp-scraper-cli billing concurrency checkout

Each account includes 1 base concurrent operation, and extra concurrency slots cost $5/month per slot. If an MCP tool call hits concurrency_limit_exceeded, the error includes active, limit, upgrade_url, and the mcp-scraper-cli billing concurrency checkout command, so the AI client can explain the next step without guessing.
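As a rough illustration of that error flow, here is a TypeScript sketch of an agent-side helper. Only the field names active, limit, and upgrade_url and the $5-per-extra-slot pricing come from this README; the flat error object shape and the function names are assumptions, not the documented wire format.

```typescript
// Hypothetical shape of the concurrency error. Only the field names
// active, limit, and upgrade_url are taken from the README.
interface ConcurrencyLimitError {
  error: "concurrency_limit_exceeded";
  active: number;
  limit: number;
  upgrade_url: string;
}

const BASE_SLOTS = 1; // each account includes one concurrent operation
const PRICE_PER_EXTRA_SLOT_USD = 5; // per extra slot, per month

// Monthly cost of running `totalSlots` concurrent operations.
function extraSlotMonthlyCost(totalSlots: number): number {
  const extra = Math.max(0, totalSlots - BASE_SLOTS);
  return extra * PRICE_PER_EXTRA_SLOT_USD;
}

// Build the explanation an agent could show instead of guessing.
function explainConcurrencyError(err: ConcurrencyLimitError): string {
  return [
    `All ${err.limit} concurrent slot(s) are in use (${err.active} active).`,
    `Wait for a running operation to finish, or add slots at ${err.upgrade_url}`,
    "or run: mcp-scraper-cli billing concurrency checkout",
  ].join(" ");
}
```

With this pricing, going from one to three concurrent operations means two extra slots, or $10/month.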

Run local workflow reports:

npx -y -p mcp-scraper@latest mcp-scraper-cli workflow list
npx -y -p mcp-scraper@latest mcp-scraper-cli workflow run agent-packet --keyword "roof repair Denver" --domain example.com
npx -y -p mcp-scraper@latest mcp-scraper-cli workflow run local-competitive-audit --query roofers --state TN --min-pop 100000 --per-city 20 --hydrate-top 5 --reviews 50
npx -y -p mcp-scraper@latest mcp-scraper-cli workflow run map-comparison --query roofers --location "Denver, CO" --per-city 20 --hydrate-top 5
npx -y -p mcp-scraper@latest mcp-scraper-cli workflow run serp-comparison --keyword "roof repair Denver" --domain example.com --extract-top 5
npx -y -p mcp-scraper@latest mcp-scraper-cli workflow run paa-expansion-brief --keyword "roof repair cost" --max-questions 80
npx -y -p mcp-scraper@latest mcp-scraper-cli workflow run ai-overview-language --keyword "best roof repair company" --domain example.com
npx -y -p mcp-scraper@latest mcp-scraper-cli report open last

Workflow runs save manifest.json, report.html, CSVs, Markdown, and evidence JSON under MCP_SCRAPER_OUTPUT_DIR/workflows, or under ~/Downloads/mcp-scraper/workflows when MCP_SCRAPER_OUTPUT_DIR is not set. The high-level workflow IDs are directory, agent-packet, local-competitive-audit, map-comparison, serp-comparison, paa-expansion-brief, and ai-overview-language.
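A sketch of that directory rule, assuming MCP_SCRAPER_OUTPUT_DIR is used as-is when set, that an empty value counts as unset, and that trailing slashes are dropped (the last two are assumptions). The function names are hypothetical; the environment and home directory are passed in so the sketch stays free of Node-specific APIs, and it uses POSIX-style separators.

```typescript
type Env = Record<string, string | undefined>;

// Root folder for saved output: MCP_SCRAPER_OUTPUT_DIR, else ~/Downloads/mcp-scraper.
function outputRoot(env: Env, homeDir: string): string {
  const configured = env.MCP_SCRAPER_OUTPUT_DIR?.trim();
  if (configured) return configured.replace(/\/+$/, "");
  return `${homeDir}/Downloads/mcp-scraper`;
}

// Workflow runs land in a "workflows" subfolder of the output root.
function workflowDir(env: Env, homeDir: string): string {
  return `${outputRoot(env, homeDir)}/workflows`;
}
```

The same root also holds browser replay downloads, which browser_replay_download saves under MCP_SCRAPER_OUTPUT_DIR/browser-replays.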

Manage hosted workflow schedules and hosted run history:

npx -y -p mcp-scraper@latest mcp-scraper-cli schedule create local-competitive-audit --weekly --query roofers --state TN --min-pop 100000 --per-city 20 --webhook https://example.com/mcp-scraper-hook
npx -y -p mcp-scraper@latest mcp-scraper-cli schedule list
npx -y -p mcp-scraper@latest mcp-scraper-cli schedule run <schedule-id>
npx -y -p mcp-scraper@latest mcp-scraper-cli schedule pause <schedule-id>
npx -y -p mcp-scraper@latest mcp-scraper-cli schedule resume <schedule-id>
npx -y -p mcp-scraper@latest mcp-scraper-cli runs list
npx -y -p mcp-scraper@latest mcp-scraper-cli runs status <run-id>
npx -y -p mcp-scraper@latest mcp-scraper-cli runs download <run-id>

The hosted workflow API is mounted under /workflows:

GET /definitions
POST /run
GET /runs
GET /runs/:id
GET /runs/:id/artifacts/:artifactId
POST /schedules
GET /schedules
PATCH /schedules/:id
DELETE /schedules/:id
POST /schedules/:id/run

Scheduled dispatch is handled by /workflows/cron/dispatch and the main /cron/tick route when CRON_SECRET is configured. Hosted artifacts are currently stored on the server filesystem for immediate run retrieval; a durable public rollout should add S3/R2-compatible object storage.
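A TypeScript sketch that maps the hosted workflow routes to request descriptors, assuming the /workflows prefix sits on the default MCP_SCRAPER_BASE_URL (https://mcpscraper.dev). Authentication and request bodies for POST /run and POST /schedules are not specified in this README, so the sketch only builds method/URL pairs and sends nothing; the type and function names are hypothetical.

```typescript
type HttpMethod = "GET" | "POST" | "PATCH" | "DELETE";

interface WorkflowRequest {
  method: HttpMethod;
  url: string;
}

// Assumes the API is mounted at <base>/workflows.
function workflowApi(baseUrl = "https://mcpscraper.dev") {
  const root = `${baseUrl.replace(/\/+$/, "")}/workflows`;
  const id = (value: string) => encodeURIComponent(value); // keep IDs path-safe
  const req = (method: HttpMethod, path: string): WorkflowRequest => ({
    method,
    url: `${root}${path}`,
  });
  return {
    definitions: () => req("GET", "/definitions"),
    run: () => req("POST", "/run"),
    listRuns: () => req("GET", "/runs"),
    getRun: (runId: string) => req("GET", `/runs/${id(runId)}`),
    getArtifact: (runId: string, artifactId: string) =>
      req("GET", `/runs/${id(runId)}/artifacts/${id(artifactId)}`),
    createSchedule: () => req("POST", "/schedules"),
    listSchedules: () => req("GET", "/schedules"),
    updateSchedule: (scheduleId: string) => req("PATCH", `/schedules/${id(scheduleId)}`),
    deleteSchedule: (scheduleId: string) => req("DELETE", `/schedules/${id(scheduleId)}`),
    runSchedule: (scheduleId: string) => req("POST", `/schedules/${id(scheduleId)}/run`),
  };
}
```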

Claude Desktop MCPB

Build the branded one-click bundle:

npm run build:mcpb

The generated bundle is written to build/mcpb/mcp-scraper-<version>.mcpb and copied to public/downloads/ for the hosted download. The current public bundle is https://mcpscraper.dev/downloads/mcp-scraper.mcpb (version 0.2.24, SHA-256 2aff0e961e3de920d1fc67230056988a4a9120735f26f234f0ada6242350a4f0). Install it by opening the file in Claude Desktop or dragging it into the Claude Desktop window. Claude Desktop then displays the MCP Scraper install card, icon, and API-key configuration field defined in the bundle manifest.
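Before installing, you can compare a downloaded bundle against the published checksum. A minimal sketch using Node's built-in node:crypto module; the function name is hypothetical, and the expected digest is the one published for the 0.2.24 bundle, so a rebuilt or newer bundle will not match it.

```typescript
import { createHash } from "node:crypto";

// Published SHA-256 for the 0.2.24 bundle, as listed in this README.
const EXPECTED_MCPB_SHA256 =
  "2aff0e961e3de920d1fc67230056988a4a9120735f26f234f0ada6242350a4f0";

// True when the bytes hash to the expected hex digest (case-insensitive).
function matchesSha256(bytes: Uint8Array, expectedHex: string): boolean {
  const actual = createHash("sha256").update(bytes).digest("hex");
  return actual === expectedHex.trim().toLowerCase();
}
```

To check a real download, pass the file contents, for example readFileSync("mcp-scraper.mcpb") from node:fs, together with EXPECTED_MCPB_SHA256.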

The MCPB install exposes the same web-intelligence tools as mcp-scraper plus all browser_* tools from browser-agent through one server.

Raw stdio config

Claude Desktop:

{
  "mcpServers": {
    "mcp-scraper": {
      "command": "npx",
      "args": ["-y", "-p", "mcp-scraper@latest", "mcp-scraper-combined"],
      "env": {
        "MCP_SCRAPER_API_KEY": "sk_live_your_key",
        "MCP_SCRAPER_BROWSER_MODE": "local",
        "MCP_SCRAPER_BROWSER_PROFILE": "work-accounts",
        "MCP_SCRAPER_BROWSER_EXECUTABLE": "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
      }
    }
  }
}

Existing MCP configs that use only npx -y mcp-scraper still work for the web intelligence server, but they do not automatically add browser tools. Switch to mcp-scraper-combined or add the second browser-agent config entry if you want browser tools. Use mcp-scraper@latest to force npm to resolve the newest published package whenever the MCP client starts a fresh npx process.
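Whether an existing config already exposes browser tools can be read from its args. A TypeScript sketch over a Claude Desktop-style mcpServers object like the JSON above; it treats any entry that runs mcp-scraper-combined or browser-agent as providing the browser_* tools. The interface and function names are hypothetical, and matching on the args array is an assumption about how such configs are written.

```typescript
interface ServerEntry {
  command: string;
  args?: string[];
  env?: Record<string, string>;
}

interface McpConfig {
  mcpServers: Record<string, ServerEntry>;
}

// Browser tools come from either the combined entrypoint or a browser-agent entry.
function exposesBrowserTools(config: McpConfig): boolean {
  return Object.values(config.mcpServers).some((entry) => {
    const args = entry.args ?? [];
    return args.includes("mcp-scraper-combined") || args.includes("browser-agent");
  });
}
```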

For local Chrome state, first clone a Chrome profile into MCP Scraper managed storage:

mcp-scraper-cli browser import-chrome --email you@example.com --name work-accounts
MCP_SCRAPER_API_KEY=sk_live_your_key mcp-scraper-cli agent install claude --apply --browser-mode local --browser-profile work-accounts

browser import-chrome copies the selected local Chrome profile into ~/.mcp-scraper/browser-profiles/<name>/user-data, skipping cache and lock files. The managed clone can include cookies, local storage, history, session storage, and Chrome password database files. It does not upload the profile. Re-run mcp-scraper-cli browser sync-profile <name> after logging into new sites in normal Chrome.

Set MCP_SCRAPER_BROWSER_MODE=local and MCP_SCRAPER_BROWSER_PROFILE=<name> when you want browser_open to launch local Google Chrome against that managed clone. A profile argument passed directly to browser_open overrides the default for that one session.
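A sketch of how a local browser_open call could resolve its profile, assuming the precedence described above (an explicit profile argument first, then MCP_SCRAPER_BROWSER_PROFILE) and the ~/.mcp-scraper/browser-profiles/<name>/user-data layout created by browser import-chrome. The interface, function, and field names are hypothetical, and what the server does when local mode has no profile at all is not documented here, so the sketch simply leaves userDataDir unset.

```typescript
type Env = Record<string, string | undefined>;

interface LaunchPlan {
  mode: "local" | "hosted";
  profile?: string;
  userDataDir?: string;
}

// Per-call profile argument overrides the MCP_SCRAPER_BROWSER_PROFILE default.
function planBrowserOpen(env: Env, homeDir: string, profileArg?: string): LaunchPlan {
  // Hosted mode unless local mode is explicitly requested.
  if (env.MCP_SCRAPER_BROWSER_MODE !== "local") return { mode: "hosted" };
  const profile = profileArg ?? env.MCP_SCRAPER_BROWSER_PROFILE;
  if (!profile) return { mode: "local" };
  return {
    mode: "local",
    profile,
    userDataDir: `${homeDir}/.mcp-scraper/browser-profiles/${profile}/user-data`,
  };
}
```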

Hosted browser profiles are still available. BROWSER_AGENT_PROFILE_NAME sets the default saved Kernel browser profile for hosted sessions. For first-time hosted setup, set BROWSER_AGENT_PROFILE_SAVE_CHANGES=true, open a browser with the profile, complete login through the watch URL, then call browser_close to persist cookies/local storage into the hosted Kernel profile.

Inside MCP clients, use browser_profile_list to inspect local Chrome account/profile metadata, browser_profile_import to clone a profile for local browser mode, browser_profile_sync to refresh an existing clone, and browser_profile_onboard only for hosted Kernel setup.

Claude Code one-command setup:

MCP_SCRAPER_API_KEY=sk_live_your_key npx -y -p mcp-scraper@latest mcp-scraper-cli agent install claude --apply

Then fully exit Claude Code, open a new Claude terminal, and verify:

claude mcp list

Manual Claude Code command:

claude mcp add mcp-scraper --scope user --env MCP_SCRAPER_API_KEY=sk_live_your_key -- npx -y -p mcp-scraper@latest mcp-scraper-combined

Split-server raw config still works:

claude mcp add mcp-scraper --scope user --env MCP_SCRAPER_API_KEY=sk_live_your_key -- npx -y mcp-scraper@latest
claude mcp add browser-agent --scope user --env MCP_SCRAPER_API_KEY=sk_live_your_key -- npx -y -p mcp-scraper@latest browser-agent

Codex config:

[mcp_servers.mcp-scraper]
command = "npx"
args = ["-y", "-p", "mcp-scraper@latest", "mcp-scraper-combined"]
env = { MCP_SCRAPER_API_KEY = "sk_live_your_key" }

Split-server Codex config:

[mcp_servers.mcp-scraper]
command = "npx"
args = ["-y", "mcp-scraper@latest"]
env = { MCP_SCRAPER_API_KEY = "sk_live_your_key" }

[mcp_servers.browser-agent]
command = "npx"
args = ["-y", "-p", "mcp-scraper@latest", "browser-agent"]
env = { MCP_SCRAPER_API_KEY = "sk_live_your_key" }

Tools

Web-intelligence tools

harvest_paa
search_serp
extract_url
map_site_urls
extract_site
youtube_harvest
youtube_transcribe
facebook_ad_search
facebook_page_intel
facebook_ad_transcribe — transcribe a direct Facebook ad video URL returned by facebook_page_intel.
facebook_video_transcribe — transcribe an organic Facebook reel, video, watch, post, or share URL, including fb.watch links. The tool renders the page, extracts the best matching public Facebook CDN MP4 URL, then returns transcript text, timestamped chunks, selected quality, video metadata, and the extracted MP4 URL for follow-up download.
maps_search — search Google Maps for multiple business/profile candidates. Use for GMB/GBP prospect lists, competitors, categories, and anything needing more than the Google 3-pack. In default proxyMode: "location", retryable failures rotate to a new residential proxy and new browser session for up to 5 attempts. maxResults defaults to 10 and is capped at 50.
maps_place_intel — hydrate one known/named Google Maps business with profile details and optional reviews. Use after maps_search when a selected candidate needs full details.
directory_workflow — build city-by-city directory/prospecting datasets from Census place selection plus Google Maps searches. Use it for requests like "all cities over 100k population in Tennessee, then get 20 roofers from Maps." In default proxyMode: "location", each city search rotates retryable failures to a new residential proxy and new browser session for up to 5 attempts. The saved CSV includes source_location, result_position, business_name, review_stars, review_count, category, address, phone, hours_status, website_url, directions_url, place_url, cid, cid_decimal, Census population, and ZIP groups.
workflow_list — list higher-level workflow IDs plus AI-facing recipes for market analysis, ICP research, forum/review acquisition, brand design briefings, CRO audits, positioning briefs, content gaps, and AI search visibility audits.
workflow_suggest — route a high-level business goal to the right workflow/tool chain before spending credits.
workflow_run — run hosted workflows such as agent-packet, local-competitive-audit, map-comparison, serp-comparison, paa-expansion-brief, and ai-overview-language; returns run metadata, summary, and artifact IDs.
workflow_status — reopen a workflow run and list its current status and artifacts.
workflow_artifact_read — pull generated workflow artifacts such as evidence.json, CSVs, Markdown briefs, and reports back into MCP context.
rank_tracker_blueprint — generate a database schema, cron/heartbeat plan, ingestion workflow, metrics list, and implementation prompt for building rank trackers. It has modes for Maps rankings via directory_workflow/maps_search, organic rankings via search_serp, AI Overview citation tracking, and PAA source presence tracking. This is a local planning tool and does not spend credits.
credits_info
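The maps_search result limit described above amounts to a clamp. A sketch, assuming non-integer values are floored and values below 1 are raised to 1; the README only states the default of 10 and the cap of 50, and the function name is hypothetical.

```typescript
const MAPS_DEFAULT_MAX_RESULTS = 10;
const MAPS_MAX_RESULTS_CAP = 50;

// Effective maxResults for a maps_search call.
function effectiveMaxResults(requested?: number): number {
  if (requested === undefined || Number.isNaN(requested)) return MAPS_DEFAULT_MAX_RESULTS;
  const whole = Math.floor(requested);
  return Math.min(MAPS_MAX_RESULTS_CAP, Math.max(1, whole));
}
```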

Browser-agent tools

browser_open — open a browser session. Hosted mode returns a human watch_url; local mode opens Google Chrome on this machine against an imported managed profile.
browser_profile_list — list local Chrome account/profile metadata and suggested managed profile names. This does not read cookies, passwords, browsing history, or copy local Chrome state.
browser_profile_import — clone a local Chrome profile into ~/.mcp-scraper/browser-profiles for local browser mode. This copies browser state files but skips cache and locks.
browser_profile_sync — refresh an existing managed local profile clone from its recorded source Chrome profile.
browser_profile_onboard — create or load a hosted Kernel browser profile and open a setup browser with profile saving enabled. The user logs in through the watch_url, then browser_close persists cookies/local storage into the hosted Kernel profile.
browser_screenshot — capture a screenshot plus visible text and clickable element center coordinates and DOM bounds.
browser_read — read the current page text and elements with center coordinates and DOM bounds, without an image.
browser_locate — locate exact visible DOM elements or text ranges and return screenshot-pixel bounds.
browser_goto
browser_click
browser_type
browser_scroll
browser_press
browser_replay_start — start an MP4 replay. Returns replay_id, view_url, and download_url when available.
browser_replay_stop — stop a replay. Returns the final view_url and download_url.
browser_list_replays — list replay videos for a session.
browser_replay_download — download and save the replay MP4 locally under MCP_SCRAPER_OUTPUT_DIR/browser-replays.
browser_replay_mark — while recording, locate a DOM target and return a replay-timed annotation object.
browser_replay_annotate — download a replay MP4, render timed boxes, circles, underlines, arrows, and labels using annotation objects from browser_replay_mark or exact bounds from browser_locate, and save a new annotated MP4 locally.
browser_close
browser_list_sessions

For accurate annotated videos, do not guess annotation times from a script. Start the replay, navigate until each target is visible and stable, call browser_replay_mark for each callout, then stop the replay and pass the returned annotations to browser_replay_annotate with the returned source_width and source_height.
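The call order above can be sketched as a plan of steps. The tool names come from this README; the step fields (url, target) and the mark-result shape are illustrative placeholders rather than the tools' real parameter names, so check each tool's input schema before wiring this up. Real flows may also click or scroll before a target is ready to mark.

```typescript
type ReplayStep =
  | { tool: "browser_replay_start" }
  | { tool: "browser_goto"; url: string }
  | { tool: "browser_replay_mark"; target: string }
  | { tool: "browser_replay_stop" }
  | { tool: "browser_replay_annotate" };

// Start recording, visit and mark each target, then stop and annotate.
function annotatedReplaySteps(stops: { url: string; target: string }[]): ReplayStep[] {
  const steps: ReplayStep[] = [{ tool: "browser_replay_start" }];
  for (const stop of stops) {
    steps.push({ tool: "browser_goto", url: stop.url });
    // Mark only once the target is visible and stable on screen.
    steps.push({ tool: "browser_replay_mark", target: stop.target });
  }
  steps.push({ tool: "browser_replay_stop" }, { tool: "browser_replay_annotate" });
  return steps;
}

// Hypothetical shape of one browser_replay_mark result.
interface MarkResult {
  annotation: Record<string, unknown>;
}

// Arguments for browser_replay_annotate, built from values returned earlier.
function annotateArgs(marks: MarkResult[], sourceWidth: number, sourceHeight: number) {
  return {
    annotations: marks.map((m) => m.annotation),
    source_width: sourceWidth,
    source_height: sourceHeight,
  };
}
```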

For general SERP tools (harvest_paa, search_serp, and hosted SERP capture), omit proxyMode for normal use. The default is proxyMode: "configured", which uses the configured browser-service proxy without city/ZIP targeting for the highest general success rate. Use proxyMode: "location" only when the user explicitly needs city/ZIP-targeted residential proxy evidence. When Google shows a CAPTCHA or challenge page, browser-service sessions first wait briefly for automatic challenge solving, then rotate to a new proxy and session if the challenge does not clear. Proxy tunnel failures and wrong-location evidence are also retried before the tool returns.

For Google Maps tools (maps_search and directory_workflow), keep proxyMode at its default, "location", for US city/state work. Retryable Maps failures (CAPTCHA, timeout, proxy tunnel failure, proxy unavailability, or browser-session death) rotate to a new residential proxy and a new browser session for up to 5 attempts. Successful structured responses include sanitized attempt telemetry, so callers can verify the proxy source, observed city/region, and attempt count without exposing full proxy or browser IDs.
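A sketch of the proxy defaults and the Maps retry policy described above. The per-tool defaults and the five-attempt cap come from this section; the result shape, the attemptOnce callback, and the function names are hypothetical. The server already rotates proxies and sessions itself, so this only illustrates the policy and is not something clients need to implement. It is synchronous for brevity; a real client would await each attempt.

```typescript
type ProxyMode = "configured" | "location";

// Defaults stated in this README; other tools are not covered here.
const DEFAULT_PROXY_MODE: Record<string, ProxyMode> = {
  harvest_paa: "configured",
  search_serp: "configured",
  maps_search: "location",
  directory_workflow: "location",
};

type AttemptResult<T> =
  | { ok: true; value: T }
  | { ok: false; retryable: boolean; reason: string };

const MAX_MAPS_ATTEMPTS = 5;

// Retries retryable failures up to 5 attempts in total; each attempt is
// assumed to use a fresh residential proxy and browser session.
function withRotation<T>(
  attemptOnce: (attempt: number) => AttemptResult<T>,
): { value: T; attempts: number } {
  let lastReason = "no attempt made";
  for (let attempt = 1; attempt <= MAX_MAPS_ATTEMPTS; attempt++) {
    const result = attemptOnce(attempt);
    if (result.ok) return { value: result.value, attempts: attempt };
    lastReason = result.reason;
    if (!result.retryable) break; // non-retryable failures stop immediately
  }
  throw new Error(`Maps request failed: ${lastReason}`);
}
```

The attempts count returned here plays the same role as the attempt count in the sanitized telemetry.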

The MCPB bundle and mcp-scraper-combined expose both sections through one local MCP server. The split mcp-scraper entrypoint exposes only the web-intelligence tools, and the split browser-agent entrypoint exposes only the browser-agent tools.

All local and hosted MCP tools expose output schemas and return structuredContent with the IDs, URLs, CSV paths, transcripts, browser session handles, replay paths, artifacts, recipe fields, or blueprint fields needed by the next step. Browser Agent tools keep a JSON text block for older clients, but structured data is the primary contract. All tools carry MCP annotations; file-writing tools such as replay downloads and annotations state their local file side effects.

The canonical tool inventory is generated at docs/mcp-tool-manifest.generated.json. The split local mcp-scraper stdio server exposes 22 web-intelligence/workflow tools. The split browser-agent stdio server exposes 21 browser tools. The combined local stdio server exposes 43 tools. The hosted MCP endpoint at https://mcpscraper.dev/mcp exposes the 22 web-intelligence/workflow tools plus capture_serp_snapshot and capture_serp_page_snapshots (24 total). Across all public local and hosted surfaces there are 45 unique tool names.

Resources

The mcp-scraper and mcp-scraper-combined NPX stdio servers also expose saved reports as MCP resources: resources/list returns the most recent Markdown reports from your output directory as report:// URIs, and resources/read returns their content — so an MCP client can pull prior research into context without re-scraping or spending credits. The hosted endpoint does not expose resources (it saves no files).
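A sketch of the two JSON-RPC requests involved. resources/list and resources/read (with a uri parameter) are standard MCP methods; the report URI in the usage below is a made-up placeholder because the exact report:// format is not documented here, and rejecting non-report URIs is a design choice of this sketch. The function names are hypothetical.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// List saved Markdown reports exposed as report:// resources.
function listReports(id: number): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "resources/list" };
}

// Read one report by a URI previously returned from resources/list.
function readReport(id: number, uri: string): JsonRpcRequest {
  if (!uri.startsWith("report://")) throw new Error(`not a report URI: ${uri}`);
  return { jsonrpc: "2.0", id, method: "resources/read", params: { uri } };
}
```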

Environment

MCP_SCRAPER_API_KEY is required.
MCP_SCRAPER_BASE_URL is optional and defaults to https://mcpscraper.dev.
MCP_SCRAPER_OUTPUT_DIR is optional and defaults to ~/Downloads/mcp-scraper.
MCP_SCRAPER_SAVE_REPORTS=false disables automatic Markdown report files.
MCP_SCRAPER_KEY_PATH is optional. When no API key env var is set, the server also reads ~/.mcp-scraper-key for compatibility with older installs.
MCP_SCRAPER_BROWSER_MODE=local makes browser_open use local Google Chrome instead of the hosted browser service.
MCP_SCRAPER_BROWSER_PROFILE sets the default managed local browser profile name created by mcp-scraper-cli browser import-chrome.
MCP_SCRAPER_BROWSER_EXECUTABLE optionally points local browser mode at a Chrome executable. On macOS, Google Chrome is recommended so Keychain-backed Chrome state works.
MCP_SCRAPER_BROWSER_PROFILE_DIR optionally points local browser mode at a direct Chrome-compatible user data directory instead of a managed profile manifest.
BROWSER_AGENT_PROFILE_NAME is optional and sets the default saved Kernel browser profile for hosted-mode browser-agent and mcp-scraper-combined stdio sessions. Aliases: BROWSER_SERVICE_PROFILE_NAME, KERNEL_BROWSER_PROFILE_NAME, KERNEL_PROFILE_NAME.
BROWSER_AGENT_PROFILE_SAVE_CHANGES=true is optional hosted setup mode. It persists cookies and local storage back to the named profile when browser_close deletes the hosted browser session. Aliases: BROWSER_SERVICE_PROFILE_SAVE_CHANGES, KERNEL_BROWSER_PROFILE_SAVE_CHANGES, KERNEL_PROFILE_SAVE_CHANGES.
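A sketch of how these variables might combine into one settings object. The defaults come from the list above; the API-key fallback order (MCP_SCRAPER_API_KEY, then the file at MCP_SCRAPER_KEY_PATH, then ~/.mcp-scraper-key), the alias precedence for the hosted profile name, and treating only the exact string "false" as disabling reports are assumptions. readFile is injected so the sketch does not touch the real filesystem; the names are hypothetical.

```typescript
type Env = Record<string, string | undefined>;

interface Settings {
  apiKey?: string;
  baseUrl: string;
  outputDir: string;
  saveReports: boolean;
  hostedProfileName?: string;
}

// First non-empty value among the listed variables.
function firstSet(env: Env, names: string[]): string | undefined {
  for (const name of names) {
    const value = env[name]?.trim();
    if (value) return value;
  }
  return undefined;
}

function loadSettings(
  env: Env,
  homeDir: string,
  readFile: (path: string) => string | undefined,
): Settings {
  const keyPath = env.MCP_SCRAPER_KEY_PATH ?? `${homeDir}/.mcp-scraper-key`;
  return {
    // The key file is only read when the env var is missing or empty.
    apiKey: env.MCP_SCRAPER_API_KEY?.trim() || readFile(keyPath)?.trim() || undefined,
    baseUrl: env.MCP_SCRAPER_BASE_URL ?? "https://mcpscraper.dev",
    outputDir: env.MCP_SCRAPER_OUTPUT_DIR ?? `${homeDir}/Downloads/mcp-scraper`,
    saveReports: env.MCP_SCRAPER_SAVE_REPORTS !== "false",
    hostedProfileName: firstSet(env, [
      "BROWSER_AGENT_PROFILE_NAME",
      "BROWSER_SERVICE_PROFILE_NAME",
      "KERNEL_BROWSER_PROFILE_NAME",
      "KERNEL_PROFILE_NAME",
    ]),
  };
}
```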

Every web intelligence tool call made through mcp-scraper or mcp-scraper-combined saves a full Markdown report locally by default and returns the file path in the MCP response. The hosted /mcp endpoint returns reports inline only and never writes files. Browser replay downloads are saved by browser_replay_download under MCP_SCRAPER_OUTPUT_DIR/browser-replays.

Updating Existing Installs

Hosted API and website changes deploy immediately to https://mcpscraper.dev. Local stdio MCP changes require publishing a new npm package version and restarting the MCP client. Running MCP server processes do not hot-update, and tool names/descriptions are loaded when the local server process starts.

Recommended config for update-friendly installs:

npx -y -p mcp-scraper@latest mcp-scraper-combined

This is context-aware: in a normal terminal it prints the visible installer and ASCII card; in an MCP client it runs as the silent stdio server. Use --stdio or MCP_SCRAPER_FORCE_STDIO=1 if you need to force server mode from a terminal.
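A sketch of that context-aware decision, assuming "human terminal" means both stdin and stdout are TTYs and that MCP_SCRAPER_FORCE_STDIO forces server mode only when set to "1". The TTY flags are passed in so the logic is testable; the interface, type, and function names are hypothetical.

```typescript
interface LaunchContext {
  argv: string[];
  env: Record<string, string | undefined>;
  stdinIsTTY: boolean;
  stdoutIsTTY: boolean;
}

type LaunchMode = "installer-card" | "stdio-server";

// Human terminal: show the installer card. MCP client pipes or forced flags: run the stdio server.
function chooseLaunchMode(ctx: LaunchContext): LaunchMode {
  const forced = ctx.argv.includes("--stdio") || ctx.env.MCP_SCRAPER_FORCE_STDIO === "1";
  if (forced) return "stdio-server";
  return ctx.stdinIsTTY && ctx.stdoutIsTTY ? "installer-card" : "stdio-server";
}
```

In Node, the two flags would typically be Boolean(process.stdin.isTTY) and Boolean(process.stdout.isTTY). Falling back to the stdio server whenever the context is unclear keeps stdout protocol-clean for MCP clients.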

Split-server config:

npx -y mcp-scraper@latest
npx -y -p mcp-scraper@latest browser-agent

If a user pinned a specific version such as mcp-scraper@<version>, installed it globally with npm install -g mcp-scraper, or installed it as a project dependency, they stay on that version until they update the config or reinstall:

npm update -g mcp-scraper
npm install mcp-scraper@latest

Users who do not update can keep using the tools their local package already advertises, but they will not see newly added local stdio tools, schemas, or AI-facing descriptions. For example, a client running an older local package cannot call rank_tracker_blueprint, directory_workflow, or browser tools through stdio even if the hosted API already supports adjacent endpoints. Users who configured only mcp-scraper must switch to mcp-scraper-combined or add browser-agent separately; MCP clients do not auto-create a second server entry from an existing config.

Branded One-Click Installs

Raw npx MCP server installs are command/config based. mcp-scraper-combined is context-aware: terminal TTY prints onboarding text and the ASCII card; MCP-client stdio pipes stay protocol-clean. Do not print marketing text to stdout from an active MCP stdio session; stdout is reserved for JSON-RPC protocol messages.
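MCP's stdio transport sends newline-delimited JSON-RPC messages on stdout, and a message must not contain embedded newlines, so diagnostics belong on stderr. A sketch of that split, with the writers injected so it runs anywhere; in Node they would wrap process.stdout.write and process.stderr.write. The function names are hypothetical.

```typescript
type Write = (chunk: string) => void;

// One JSON-RPC message per line. JSON.stringify without indentation escapes
// newlines inside strings, so the only raw newline is the delimiter.
function sendMessage(writeStdout: Write, message: object): void {
  writeStdout(JSON.stringify(message) + "\n");
}

// Banners, progress, and logs go to stderr so stdout stays protocol-clean.
function logDiagnostic(writeStderr: Write, text: string): void {
  writeStderr(text.endsWith("\n") ? text : text + "\n");
}
```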

For a branded Claude Desktop install, package MCP Scraper as an MCPB Desktop Extension. The repository now builds one combined MCPB bundle with a generated icon, manifest.json, bundled runtime dependencies, and user_config fields for API-key setup, API URL, and output folder.

npm run build:mcpb

The bundle uses mcp-scraper-combined internally, so the user installs MCP Scraper once and gets web-intelligence tools plus live browser tools in one MCP server.

Development

MCP Tool Quality Spec defines the shipping bar for model-facing tool names, descriptions, schemas, structured outputs, errors, packaging, and deployment.