@vinikjkkj/wa-fetcher
v0.2.0
Headless scraper for web.whatsapp.com bundles — downloads every loaded JS chunk + emits a manifest. Pairs with @vinikjkkj/wa-mex and @vinikjkkj/wa-proto.
Headless scraper for web.whatsapp.com bundles. Downloads every loaded JS chunk to disk and writes a manifest — that's all. Per-domain extractors (proto, mex, diff, …) consume the raw dump independently.
Install
npm i @vinikjkkj/wa-fetcher
CLI
# Default: download every bundle + write manifest
npx wa-fetcher --out dump/
# Discovery only: just the URL list (no download). Useful when piping into
# tools that have their own downloader (e.g. wa-modules-loader).
npx wa-fetcher --urls-only --out urls.json
npx wa-fetcher --urls-only > urls.json

| Flag | Default | Notes |
|---|---|---|
| --out <path> | dump (dir) / stdout (urls-only) | Output destination |
| --urls-only | off | Skip download; emit only the discovered URL array (JSON) |
| --extra-wait <ms> | 5000 | Wait this long after network-idle for lazy chunks |
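
A downstream tool that consumes the `--urls-only` output mostly needs to parse the JSON array and guard against unexpected entries. The following is a minimal sketch, assuming the output is a plain JSON array of URL strings as the table describes. `parseUrlList` is a hypothetical helper, not part of wa-fetcher, and the host check is only a defensive repeat of the filtering the fetcher is said to do already.

```javascript
// Hypothetical helper (not part of wa-fetcher): parse the JSON emitted by
// `wa-fetcher --urls-only` and return a sorted, de-duplicated list of URLs
// whose host matches `allowedHost`.
function parseUrlList(jsonText, allowedHost = 'static.whatsapp.net') {
  const parsed = JSON.parse(jsonText)
  if (!Array.isArray(parsed)) {
    throw new TypeError('expected a JSON array of URL strings')
  }
  const kept = new Set()
  for (const entry of parsed) {
    if (typeof entry !== 'string') continue
    let url
    try {
      url = new URL(entry)
    } catch {
      continue // skip anything that is not a valid absolute URL
    }
    if (url.hostname === allowedHost) kept.add(url.href)
  }
  return [...kept].sort()
}
```

A consumer could call it as `parseUrlList(require('fs').readFileSync('urls.json', 'utf8'))` and hand the result to its own downloader.
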
Output layout:
dump/
├── manifest.json    { waVersion, fetchedAt, bundles[] }
└── raw/
    └── <wa-version>/
        ├── chunk-AAAA.js
        ├── chunk-BBBB.js
        └── …
Library
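
A dump that is already on disk can be read back with plain Node, without calling the fetcher again. The following is a minimal sketch, assuming `manifest.json` stores its `bundles` entries in the same `{ url, file, bytes }` shape documented for `fetchBundles`. `summarizeDump` is a hypothetical helper, not part of the package.

```javascript
const fs = require('fs')
const path = require('path')

// Hypothetical helper: read <dumpDir>/manifest.json and report totals.
// Assumes each bundles[] entry looks like { url, file, bytes }.
function summarizeDump(dumpDir) {
  const manifestPath = path.join(dumpDir, 'manifest.json')
  const manifest = JSON.parse(fs.readFileSync(manifestPath, 'utf8'))
  const bundles = Array.isArray(manifest.bundles) ? manifest.bundles : []
  // Treat a missing or non-numeric `bytes` field as 0 rather than NaN.
  const totalBytes = bundles.reduce((sum, b) => sum + (Number(b.bytes) || 0), 0)
  return {
    waVersion: manifest.waVersion ?? null,
    fetchedAt: manifest.fetchedAt ?? null,
    bundleCount: bundles.length,
    totalBytes,
  }
}
```

Calling `summarizeDump('dump')` after a CLI run gives a quick way to log how much was fetched before the extractors start.
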
const { discoverBundleUrls, fetchBundles } = require('@vinikjkkj/wa-fetcher')
// CommonJS has no top-level await, so the calls are wrapped in an async function.
async function main() {
  // Discovery only: no download. Returns the same URL list the full fetcher
  // would have downloaded (sorted, deduped, host-filtered to static.whatsapp.net).
  const { waVersion, urls } = await discoverBundleUrls()
  // waVersion    "2.3000.xxxxxxx" | null
  // urls         string[]
  // Discovery + download.
  const dump = await fetchBundles({ out: 'dump' })
  // dump.waVersion       "2.3000.xxxxxxx" | null
  // dump.bundles[]       [{ url, file, bytes }, ...]
  // dump.paths.raw       absolute path to dump/raw/<version>/
  // dump.paths.manifest  absolute path to dump/manifest.json
}
main().catch((err) => {
  console.error(err)
  process.exitCode = 1
})
GitHub Action
- uses: vinikjkkj/wa-spec/packages/fetcher@v1
  id: fetch
  with:
    out: dump
- run: npx wa-mex apply --bundles ${{ steps.fetch.outputs.raw-dir }}
- run: npx wa-proto apply --bundles ${{ steps.fetch.outputs.raw-dir }}
Caveats
- Lazy chunks that the SPA only loads via UI interaction (Settings, Profile, Premium) won't be in the dump; the data-sjs rsrcMap covers most of them but not 100%.
- Anti-bot: puppeteer-real-browser works today but Meta can tighten detection. If the fetcher returns blank pages, re-evaluate the strategy.
- No extraction: this package is intentionally dumb. The extractors live in @vinikjkkj/wa-mex and @vinikjkkj/wa-proto so adding a new artifact never requires touching the fetcher.
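
Because a blocked session can produce an empty or near-empty dump, a CI job may want to fail fast instead of passing bad input to the extractors. The following is a minimal sketch, assuming the `dump` object has the shape shown in the Library section. `findDumpProblems` and its `minBundles` threshold are hypothetical and should be tuned to what a healthy run looks like.

```javascript
// Hypothetical sanity check (not part of wa-fetcher): list the reasons a dump
// looks suspicious, e.g. after anti-bot detection served a blank page.
function findDumpProblems(dump, { minBundles = 1 } = {}) {
  const problems = []
  const bundles = Array.isArray(dump && dump.bundles) ? dump.bundles : []
  if (!dump || dump.waVersion == null) {
    problems.push('waVersion is missing')
  }
  if (bundles.length < minBundles) {
    problems.push(`only ${bundles.length} bundle(s), expected at least ${minBundles}`)
  }
  // A bundle whose byte count is absent or zero was probably not downloaded.
  const empty = bundles.filter((b) => !(b.bytes > 0))
  if (empty.length > 0) {
    problems.push(`${empty.length} bundle(s) have zero bytes`)
  }
  return problems
}
```

A script could run this on the value returned by `fetchBundles` and exit non-zero when the list is non-empty, so a blank dump never reaches wa-mex or wa-proto.
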
