silentscraper
v1.0.2
Silent stealth-grade scraping and reverse engineering toolkit: rotating UAs, proxy rotation, TLS/JA3 fingerprint hints, cookie jar, session handling, HTML/JSON/XML/RSS/sitemap/robots parsers, JS deobfuscator, API endpoint discovery, network call sniffer h
SilentScraper
Silent. Stealth. Surgical. A complete scraping and reverse-engineering toolkit for Node.js.
SilentScraper is a batteries-included toolkit for engineers who scrape, crawl, and reverse-engineer web targets. It bundles a stealth HTTP client, rotating identities, parsers for every common format, and reverse-engineering helpers, usable as a library or via the silent CLI.
Install
npm install silentscraper
At a glance
import { Silent } from "silentscraper";
const s = new Silent({
  proxies: ["http://user:[email protected]:8080"],
  proxyStrategy: "round-robin",
  rateLimit: { rps: 2, jitterMs: 250 },
  concurrency: 8,
  retries: 4,
  stealth: true,
  deviceProfile: "desktop",
  logLevel: "info",
});
const page = await s.scrape("https://example.com");
console.log(page.text("h1"));
console.log(page.links());
console.log(page.tables());
const data = await s.json("https://api.example.com/v1/items");
console.log(data.query("$..price"));
const intel = await s.reverse("https://target.com", { followScripts: true, deobfuscate: true });
console.log(intel.protections);
console.log(intel.scan.endpoints);
console.log(intel.scan.secrets);
console.log(intel.scan.jsonBlobs);
await s.crawl("https://example.com", {
  maxDepth: 3,
  maxPages: 500,
  onPage: ({ url, parser }) => console.log(url, parser.text("title")),
});
Features
Stealth HTTP client
- Rotating User-Agents (desktop + mobile pools)
- Proxy rotation: round-robin / random / sticky, with health tracking and auto-bans
- Coherent sec-ch-ua client hints matched to UA
- HTTP/1.1 + HTTP/2 via undici, gzip / deflate / brotli auto-decoded
- Cookie jar with persistence (tough-cookie)
- Per-host token-bucket rate limiting with jitter
- Bounded-concurrency task queue
- Automatic retries with exponential backoff on configurable status codes
- Configurable timeouts and redirect limits
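To make the per-host token-bucket bullet concrete, here is a minimal sketch of how such a limiter could work. It is not SilentScraper's actual implementation: the `TokenBucket` and `HostRateLimiter` classes and the `delayFor` method are hypothetical names, and only the `rps` and `jitterMs` parameters mirror the `rateLimit` option shown in the example above. The clock and random source are passed in so the behavior is deterministic.

```typescript
// Hypothetical sketch; class and method names are illustrative, not the library's API.
class TokenBucket {
  private tokens: number;
  private last: number;
  private readonly rps: number;
  private readonly capacity: number;

  constructor(rps: number, capacity: number, now: number) {
    this.rps = rps;
    this.capacity = capacity;
    this.tokens = capacity;
    this.last = now;
  }

  // Returns the number of milliseconds to wait before sending (0 means send now).
  take(now: number): number {
    // Refill in proportion to elapsed time, capped at the bucket capacity.
    const elapsedMs = now - this.last;
    this.tokens = Math.min(this.capacity, this.tokens + (elapsedMs / 1000) * this.rps);
    this.last = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return 0;
    }
    // Not enough tokens: reserve one anyway and report how long until it is refilled.
    const waitMs = ((1 - this.tokens) / this.rps) * 1000;
    this.tokens -= 1;
    return waitMs;
  }
}

class HostRateLimiter {
  private readonly buckets = new Map<string, TokenBucket>();
  private readonly rps: number;
  private readonly jitterMs: number;
  private readonly random: () => number;

  constructor(rps: number, jitterMs: number, random: () => number = Math.random) {
    this.rps = rps;
    this.jitterMs = jitterMs;
    this.random = random;
  }

  // One bucket per host, plus up to jitterMs of random extra delay.
  delayFor(host: string, now: number = Date.now()): number {
    let bucket = this.buckets.get(host);
    if (!bucket) {
      bucket = new TokenBucket(this.rps, 1, now);
      this.buckets.set(host, bucket);
    }
    return bucket.take(now) + this.random() * this.jitterMs;
  }
}
```

A caller would sleep for the returned delay before issuing the request. Keeping one bucket per host means a slow target does not throttle requests to other hosts.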
Parsers
- HTML (CSS selectors, links/images/scripts/stylesheets, meta, JSON-LD, forms, tables, tabular extract)
- JSON (JSONPath via jsonpath-plus)
- XML (JSON-style + XPath)
- sitemap.xml + sitemapindex.xml
- robots.txt
- RSS 2.0 + Atom feeds
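As an illustration of what a robots.txt parser has to do, the sketch below groups rules by user agent and resolves a path with the longest-match rule, where Allow wins ties. It is a simplified, standalone example and not the parser shipped in this package: `parseRobots`, `isAllowed`, and `RobotsGroup` are hypothetical names, and wildcard (`*`, `$`) path patterns are deliberately left out.

```typescript
// Hypothetical sketch of robots.txt handling; prefix matching only, no wildcards.
interface RobotsGroup {
  agents: string[];
  allow: string[];
  disallow: string[];
}

function parseRobots(text: string): RobotsGroup[] {
  const groups: RobotsGroup[] = [];
  let current: RobotsGroup | null = null;
  let lastWasAgent = false;
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, "").trim(); // strip comments
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    const key = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (key === "user-agent") {
      // Consecutive User-agent lines share one group; otherwise start a new one.
      if (!current || !lastWasAgent) {
        current = { agents: [], allow: [], disallow: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      if (current && value !== "") {
        if (key === "allow") current.allow.push(value);
        if (key === "disallow") current.disallow.push(value);
      }
      lastWasAgent = false;
    }
  }
  return groups;
}

function isAllowed(groups: RobotsGroup[], agent: string, path: string): boolean {
  const name = agent.toLowerCase();
  // Prefer a group naming this agent; fall back to the "*" group.
  const group =
    groups.find((g) => g.agents.some((a) => a !== "*" && name.includes(a))) ??
    groups.find((g) => g.agents.includes("*"));
  if (!group) return true;
  // Length of the longest matching rule prefix, or -1 if nothing matches.
  const longest = (rules: string[]): number =>
    rules.filter((r) => path.startsWith(r)).reduce((max, r) => Math.max(max, r.length), -1);
  return longest(group.allow) >= longest(group.disallow);
}
```

A crawler would typically fetch `/robots.txt` once per origin, cache the parsed groups, and call `isAllowed` before queuing each URL.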
Reverse engineering
- Endpoint finder (fetch, XHR, axios, generic /api/, GraphQL, WebSockets, __NEXT_DATA__, Apollo, application/json scripts, route hints)
- Secret scanner (AWS, Google, Stripe, Slack, GitHub, JWT, generic)
- Protection detector (Cloudflare, Akamai, PerimeterX, DataDome, Imperva, Kasada, reCAPTCHA, hCaptcha, Turnstile, ShieldSquare)
- JS deobfuscator (escape decode, base64 extract, _0x rename, pretty-print)
- String extractor for long constants
- cURL parser + builder (DevTools "Copy as cURL" round-trip)
- HAR parser
- JA3 fingerprint hints
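The endpoint finder idea can be sketched with a handful of regular expressions run over JavaScript source. The example below is an assumption about the general technique, not the package's actual detector: `findEndpoints` is a hypothetical name, and it covers only fetch, axios, XHR `open`, WebSocket URLs, and literal `/api/` paths.

```typescript
// Hypothetical sketch: regex-based endpoint discovery over JavaScript source text.
function findEndpoints(source: string): string[] {
  const patterns: RegExp[] = [
    /fetch\(\s*["'`]([^"'`]+)["'`]/g, // fetch("...")
    /axios\.(?:get|post|put|patch|delete)\(\s*["'`]([^"'`]+)["'`]/g, // axios.get("...")
    /\.open\(\s*["'][A-Z]+["']\s*,\s*["'`]([^"'`]+)["'`]/g, // xhr.open("GET", "...")
    /["'`](wss?:\/\/[^"'`\s]+)["'`]/g, // WebSocket URLs
    /["'`](\/api\/[^"'`\s]*)["'`]/g, // generic /api/ string literals
  ];
  const found = new Set<string>();
  for (const pattern of patterns) {
    let match: RegExpExecArray | null;
    // exec() with the g flag advances through the source one match at a time.
    while ((match = pattern.exec(source)) !== null) {
      const url = match[1];
      if (url) found.add(url);
    }
  }
  // De-duplicated and sorted for stable output.
  return Array.from(found).sort();
}
```

Regexes like these miss URLs built by string concatenation or hidden behind obfuscation, which is one reason to deobfuscate and pretty-print scripts before scanning them.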
Output
- CSV, JSON, NDJSON exporters with streaming append
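Streaming append works for NDJSON and CSV because each record serializes to one self-contained line, so new rows can be appended without rewriting the file. The sketch below shows only the per-record serialization. The function names `toNdjsonLine`, `csvField`, and `toCsvLine` are hypothetical, not the package's exporter API, and the actual file write (for example with Node's `fs.appendFile`) is left out.

```typescript
// Hypothetical sketch: serialize one record per line so output can be appended incrementally.
type Cell = string | number | boolean | null;
type Row = Record<string, Cell>;

// NDJSON: one JSON document per line.
function toNdjsonLine(row: Row): string {
  return JSON.stringify(row) + "\n";
}

// CSV: quote a field only if it contains a quote, comma, or line break; double inner quotes.
function csvField(value: Cell): string {
  const text = value === null ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Emit fields in a fixed column order; missing columns become empty fields.
function toCsvLine(row: Row, columns: string[]): string {
  return columns.map((c) => csvField(row[c] ?? null)).join(",") + "\n";
}
```

A fixed column order matters for CSV: rows appended later must line up with the header written at the start of the file.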
CLI
silent fetch <url> [--proxy URL] [--ua STRING] [--out FILE]
silent scrape <url> --selector "h1" [--attr href]
silent crawl <url> [--depth 2] [--max 50] [--out file.ndjson]
silent sitemap <url>
silent robots <origin>
silent feed <url>
silent reverse <url> [--follow-scripts] [--deobfuscate]
silent deobfuscate <file.js> [--out file.js]
silent curl <file-or-->
silent endpoints <file>
silent protections <url>
Ethical use
This toolkit is intended for security research, accessibility, journalism, competitive intelligence on public data, and engineering on systems you have authorization to test. Respect robots.txt, terms of service, and applicable law.
License
MIT
