anti-bot-sniffer
v0.1.0
Probe a URL and identify which anti-bot stack it uses — Cloudflare Bot Management, DataDome, PerimeterX, Akamai, Kasada, AWS WAF, Imperva, F5, Sucuri, Wordfence — plus the proxy tier you'll likely need to scrape it.
A one-shot CLI for the question every scraper asks before they commit to a target: "do I need datacenter, residential, or mobile proxies for this site?"
$ npx anti-bot-sniffer https://www.nike.com
https://www.nike.com
status 200 · 7 cookies set
Detected
● Akamai Bot Manager
via ak_bmsc cookie
Enterprise-grade. Behavior + IP scoring; carrier ASN avoids
most challenges.
Recommended proxy tier
▶ MOBILE CARRIER
Strict-tier detection: Akamai Bot Manager.
Datacenter and residential ASNs are likely blocked or challenged. Real
mobile carrier IPs (T-Mobile, Vodafone, Orange, etc.) blend with
millions of consumer phones and are the only tier this class of
anti-bot reliably trusts.
Install
Run once via npx:
npx anti-bot-sniffer https://example.com
Or install globally:
npm install -g anti-bot-sniffer
anti-bot-sniffer https://example.com
Node 18+ required (uses built-in fetch).
Usage
anti-bot-sniffer <url> [options]
--json Output JSON for piping into other tools
--timeout <ms> Probe timeout (default 15000)
--user-agent <ua> Override the User-Agent header
-h, --help Show help
-v, --version Show version
What it detects
| Platform | Tier | What we look for |
|---|---|---|
| Cloudflare Bot Management | mobile | cf-mitigated header · __cf_bm cookie · Turnstile challenge |
| DataDome | mobile | x-dd-b / x-datadome-cid headers · datadome cookie · js.datadome.co |
| PerimeterX / HUMAN | mobile | _px3 / _pxhd / _pxvid cookies · client.perimeterx.net |
| Akamai Bot Manager | mobile | _abck / bm_sz / ak_bmsc cookies · AkamaiGHost server |
| Kasada | mobile | x-kpsdk-cd / x-kpsdk-ct headers · KP_UIDz cookie |
| F5 / Shape | mobile | TS<hex> cookies in Shape pattern |
| AWS WAF | residential | aws-waf-token cookie · challenge page markers |
| Imperva / Incapsula | residential | incap_ses_ / visid_incap_ cookies · x-iinfo header |
| Cloudflare (base CDN) | residential | cf-ray · cf-cache-status · server: cloudflare |
| Sucuri | datacenter | server: Sucuri/Cloudproxy · x-sucuri-* headers |
| Wordfence | datacenter | wordfence_verifiedHuman cookie · wfwaf- cookies |
| reCAPTCHA / hCaptcha / Turnstile | informational | embedded challenge widget references |
How it works
The tool sends a single GET request with a normal browser-like
User-Agent, follows up to 5 redirects, reads up to 64KB of the response
body, then scans response headers, Set-Cookie names, and HTML markers
against a curated signature catalog (see src/signatures.ts).
This is a heuristic based on HTTP signals only. Real anti-bot products do most of their detection in browser-side JavaScript fingerprinting, which an HTTP probe can't see. What this tool catches is the outer wall: the CDN / WAF identity, the cookies set on the first response, and challenge widgets in the initial HTML. That's enough to make a decent first guess about IP-class requirements, which is usually the question that matters for choosing a proxy.
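The probe step described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: the `ProbeResult` shape, the `probe` and `readCapped` names, and the placeholder User-Agent string are all assumptions, and Set-Cookie parsing and the timeout are left out for brevity. Only the limits (5 redirects, 64KB) come from the description above.

```typescript
// Hypothetical result shape; the real tool's internal types may differ.
interface ProbeResult {
  finalUrl: string;
  status: number;
  redirects: number;
  headers: Record<string, string>; // header names are lower-cased by fetch
  body: string;
}

const MAX_REDIRECTS = 5;
const MAX_BODY_BYTES = 64 * 1024;
// Placeholder browser-like User-Agent (an assumption, not the tool's default).
const DEFAULT_UA =
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36";

// Read at most `limit` bytes from the response body, then stop downloading.
async function readCapped(res: Response, limit: number): Promise<string> {
  if (!res.body) return "";
  const reader = res.body.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;
  while (total < limit) {
    const { done, value } = await reader.read();
    if (done || !value) break;
    const slice = value.subarray(0, limit - total);
    chunks.push(slice);
    total += slice.length;
  }
  await reader.cancel().catch(() => undefined);
  const merged = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    merged.set(chunk, offset);
    offset += chunk.length;
  }
  return new TextDecoder().decode(merged);
}

// Follow redirects by hand so they can be counted and capped.
// `fetchFn` is injectable so the function can be exercised without a network.
async function probe(
  url: string,
  fetchFn: typeof fetch = fetch,
  userAgent: string = DEFAULT_UA,
): Promise<ProbeResult> {
  let current = url;
  for (let redirects = 0; ; redirects++) {
    const res = await fetchFn(current, {
      redirect: "manual",
      headers: { "user-agent": userAgent },
    });
    const location = res.headers.get("location");
    const isRedirect = res.status >= 300 && res.status < 400;
    if (isRedirect && location && redirects < MAX_REDIRECTS) {
      await res.body?.cancel();
      current = new URL(location, current).toString();
      continue;
    }
    const headers: Record<string, string> = {};
    res.headers.forEach((value, name) => {
      headers[name] = value;
    });
    const body = await readCapped(res, MAX_BODY_BYTES);
    return { finalUrl: current, status: res.status, redirects, headers, body };
  }
}
```

Handling redirects manually, rather than letting fetch follow them, is what makes the redirect count and the final URL observable.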
Recommendation rationale
Three tiers, in order of strictness:
- mobile: only real mobile carrier IPs (T-Mobile, Vodafone, Orange, etc.) reliably pass. The carrier ASN is shared with millions of consumer phones, which makes IP-class scoring unreliable for anti-bot platforms (blocking one IP blocks hundreds of real subscribers). This is the tier assigned to all the enterprise anti-bot stacks in the table above.
- residential: residential ISP-pool IPs blend with real home traffic at the ISP-ASN layer. Cheaper than mobile but easier to fingerprint; major social/retail platforms increasingly flag the well-known pool ASNs.
- datacenter: IPs owned by AWS, Hetzner, DigitalOcean, etc. Cheap, but blocked by anything that profiles IP class. Fine for documentation sites, public APIs, and low-trust crawls.
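One way to turn a list of detections into a single recommendation is to take the strictest tier present. The sketch below assumes that rule, assumes informational detections never drive the result, and assumes datacenter is the default when nothing tier-bearing is detected. The `recommendTier` name and the `Detection` shape are hypothetical; only the tier names come from this README.

```typescript
// Tier names come from the README; the ranking logic is an assumed sketch.
type Tier = "mobile" | "residential" | "datacenter" | "informational";
type ProxyTier = Exclude<Tier, "informational">;

interface Detection {
  name: string;
  tier: Tier;
}

// Higher number = stricter. "informational" never drives the recommendation.
const STRICTNESS: Record<Tier, number> = {
  informational: 0,
  datacenter: 1,
  residential: 2,
  mobile: 3,
};

// Return the strictest proxy tier implied by the detections.
// Assumption: with no tier-bearing detection, datacenter is the default.
function recommendTier(detections: Detection[]): ProxyTier {
  let best: ProxyTier = "datacenter";
  for (const d of detections) {
    if (d.tier !== "informational" && STRICTNESS[d.tier] > STRICTNESS[best]) {
      best = d.tier;
    }
  }
  return best;
}
```

Under this rule, a site that shows both a Cloudflare base CDN marker (residential) and an Akamai Bot Manager cookie (mobile) would be reported as mobile.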
For a longer breakdown — including when datacenter is actually the right answer despite the strict-tier name — see Mobile vs residential vs datacenter proxies.
JSON output
$ npx anti-bot-sniffer nike.com --json
{
"url": "https://nike.com",
"finalUrl": "https://www.nike.com/",
"status": 200,
"redirects": 1,
"cookieCount": 7,
"recommendedTier": "mobile",
"tierReason": "Strict-tier detection: Akamai Bot Manager.",
"tierExplainer": "Datacenter and residential ASNs are likely blocked or challenged. …",
"detections": [
{
"id": "akamai-bot-manager",
"name": "Akamai Bot Manager",
"tier": "mobile",
"marker": "ak_bmsc cookie",
"vendorUrl": "https://www.akamai.com/products/bot-manager"
}
]
}
Stable shape: pipe it into jq, your CI, or a target-tracking spreadsheet.
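For consumers that prefer typed code over jq, the output can be modeled roughly like this. The field names mirror the sample above; treating every field as required, and the `parseReport` and `summarize` helpers themselves, are assumptions made for illustration.

```typescript
// Field names mirror the sample output; optionality is an assumption.
interface SnifferDetection {
  id: string;
  name: string;
  tier: string;
  marker: string;
  vendorUrl: string;
}

interface SnifferReport {
  url: string;
  finalUrl: string;
  status: number;
  redirects: number;
  cookieCount: number;
  recommendedTier: string;
  tierReason: string;
  tierExplainer: string;
  detections: SnifferDetection[];
}

// Parse the CLI's --json output and fail loudly if key fields are missing.
function parseReport(json: string): SnifferReport {
  const data = JSON.parse(json) as Partial<SnifferReport>;
  if (typeof data.recommendedTier !== "string" || !Array.isArray(data.detections)) {
    throw new Error("unexpected anti-bot-sniffer JSON shape");
  }
  return data as SnifferReport;
}

// Tab-separated summary, suitable for a CI log line or a spreadsheet row.
function summarize(report: SnifferReport): string {
  const names = report.detections.map((d) => d.name).join(", ") || "none";
  return `${report.finalUrl}\t${report.recommendedTier}\t${names}`;
}
```

A script could read the CLI's stdout, pass it to `parseReport`, and append the `summarize` line to a tracking file.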
Limitations and honest caveats
- No JavaScript execution. A real browser would surface much more — fingerprint canvas, WebGL, audio context, behavior — none of which an HTTP probe can simulate. The tool catches the outer wall only.
- Sites change. Vendors deploy new defenses; we update signatures reactively. Open an issue or PR with a curl -i dump if a target produces a false negative.
- The mobile/residential/datacenter recommendation is industry consensus as of 2025–2026. A given site might block datacenter while another running the same Cloudflare tier doesn't. Treat the recommendation as a strong default, not a guarantee.
Contributing
Signatures live in src/signatures.ts. To add a platform, add a
Signature object with a detect(probe) => string | null function that
returns the matched marker (or null when nothing matches). PRs welcome,
especially with a real-world curl -iL example in the description so we
can pin the detection to observed traffic.
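As an illustration of what such an object might look like, here is a sketch that uses the Kasada markers from the table above. Only the `detect(probe) => string | null` contract comes from this README; the `Probe` fields and the other `Signature` fields (`id`, `name`, `tier`) are assumptions, so check src/signatures.ts for the real types before copying this.

```typescript
// Assumed shapes; the real Probe and Signature types may differ.
interface Probe {
  headers: Record<string, string>; // lower-cased response header names
  cookieNames: string[];           // names taken from Set-Cookie headers
  body: string;                    // leading portion of the HTML body
}

interface Signature {
  id: string;
  name: string;
  tier: "mobile" | "residential" | "datacenter" | "informational";
  // Returns a human-readable marker on a match, or null otherwise.
  detect: (probe: Probe) => string | null;
}

const kasada: Signature = {
  id: "kasada",
  name: "Kasada",
  tier: "mobile",
  detect(probe) {
    // Header markers listed in the detection table.
    for (const header of ["x-kpsdk-cd", "x-kpsdk-ct"]) {
      if (header in probe.headers) return `${header} header`;
    }
    // Cookie marker listed in the detection table.
    if (probe.cookieNames.includes("KP_UIDz")) return "KP_UIDz cookie";
    return null;
  },
};
```

Returning the specific marker rather than a boolean lets the CLI report why a platform matched, as in the "via ak_bmsc cookie" line of the sample output.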
License
MIT.
Built by
The team at Atheris — mobile and residential proxies, per gigabyte, no subscriptions. We wrote this because every prospective customer asked the same question first: "will mobile even matter for the site I'm scraping?" This tool answers it without needing them to buy from us.
