@vinikjkkj/wa-fetcher

v0.2.0

Published

2 months ago

Headless scraper for web.whatsapp.com bundles — downloads every loaded JS chunk + emits a manifest. Pairs with @vinikjkkj/wa-mex and @vinikjkkj/wa-proto.

vinikjkkj

whatsapp whatsapp-web scraper fetcher puppeteer puppeteer-real-browser reverse-engineering wa-spec

Headless scraper for web.whatsapp.com bundles. Downloads every loaded JS chunk to disk and writes a manifest — that's all. Per-domain extractors (proto, mex, diff, …) consume the raw dump independently.

Install

npm i @vinikjkkj/wa-fetcher

CLI

# Default: download every bundle + write manifest
npx wa-fetcher --out dump/

# Discovery only: just the URL list (no download). Useful when piping into
# tools that have their own downloader (e.g. wa-modules-loader).
npx wa-fetcher --urls-only --out urls.json
npx wa-fetcher --urls-only > urls.json

| Flag | Default | Notes |
|---|---|---|
| --out <path> | dump (dir) / stdout (urls-only) | Output destination |
| --urls-only | off | Skip download; emit only the discovered URL array (JSON) |
| --extra-wait <ms> | 5000 | Wait this long after network-idle for lazy chunks |

Output layout:

dump/
├── manifest.json                  { waVersion, fetchedAt, bundles[] }
└── raw/
    └── <wa-version>/
        ├── chunk-AAAA.js
        ├── chunk-BBBB.js
        └── …
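
As a minimal sketch of how a downstream tool might consume this layout, the snippet below loads `manifest.json` and totals the bundle sizes. The helper names `readManifest` and `summarizeManifest` are hypothetical (not part of this package), and the code assumes each `bundles[]` entry in the manifest has the same `{ url, file, bytes }` shape that the library returns.

```javascript
const fs = require('node:fs')
const path = require('node:path')

// Hypothetical helper: parse <dumpDir>/manifest.json from disk.
function readManifest(dumpDir) {
    const file = path.join(dumpDir, 'manifest.json')
    return JSON.parse(fs.readFileSync(file, 'utf8'))
}

// Hypothetical helper: reduce a parsed manifest to a few headline numbers.
// Assumes bundles[] entries look like { url, file, bytes }.
function summarizeManifest(manifest) {
    const bundles = Array.isArray(manifest.bundles) ? manifest.bundles : []
    const totalBytes = bundles.reduce((sum, b) => sum + (Number(b.bytes) || 0), 0)
    return {
        waVersion: manifest.waVersion ?? null,
        fetchedAt: manifest.fetchedAt ?? null,
        bundleCount: bundles.length,
        totalBytes,
    }
}
```

Calling `summarizeManifest(readManifest('dump'))` after a CLI run would then give a quick size check before handing the raw directory to an extractor.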

Library

const { discoverBundleUrls, fetchBundles } = require('@vinikjkkj/wa-fetcher')

// CommonJS modules have no top-level await, so the calls run inside an async function.
async function main() {
    // Discovery only, no download. Returns the same URL list the full fetcher
    // would have downloaded (sorted, deduped, host-filtered to static.whatsapp.net).
    const { waVersion, urls } = await discoverBundleUrls()
    //   waVersion   "2.3000.xxxxxxx" | null
    //   urls        string[]

    // Discovery + download.
    const dump = await fetchBundles({ out: 'dump' })
    //   dump.waVersion              "2.3000.xxxxxxx" | null
    //   dump.bundles[]              [{ url, file, bytes }, ...]
    //   dump.paths.raw              absolute path to dump/raw/<version>/
    //   dump.paths.manifest         absolute path to dump/manifest.json
}

main().catch((err) => {
    console.error(err)
    process.exitCode = 1
})
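
Because the discovered URL list is a plain `string[]`, two runs can be compared with a simple set difference. The following is a rough sketch of that idea; `diffUrlLists` is a hypothetical helper, not something this package exports.

```javascript
// Hypothetical helper: compare the URL lists from two discovery runs.
// Both arguments are plain string arrays, e.g. the `urls` field returned by
// discoverBundleUrls() or the JSON array written by `--urls-only`.
function diffUrlLists(previous, current) {
    const prev = new Set(previous)
    const curr = new Set(current)
    return {
        // URLs present now that were absent from the earlier run.
        added: current.filter((url) => !prev.has(url)),
        // URLs from the earlier run that no longer appear.
        removed: previous.filter((url) => !curr.has(url)),
    }
}
```

A non-empty `added` or `removed` list is a cheap signal that the bundle set changed between runs and that a full download may be worth doing.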

GitHub Action

- uses: vinikjkkj/wa-spec/packages/fetcher@v1
  id: fetch
  with:
      out: dump
- run: npx wa-mex apply --bundles ${{ steps.fetch.outputs.raw-dir }}
- run: npx wa-proto apply --bundles ${{ steps.fetch.outputs.raw-dir }}

Caveats

Lazy chunks that the SPA only loads via UI interaction (Settings, Profile, Premium) may be missing from the dump. The data-sjs rsrcMap covers most of them, but not all.
Anti-bot — puppeteer-real-browser works today but Meta can tighten detection. If the fetcher returns blank pages, re-evaluate the strategy.
No extraction — this package is intentionally dumb. The extractors live in @vinikjkkj/wa-mex and @vinikjkkj/wa-proto so adding a new artifact never requires touching the fetcher.
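
Given the anti-bot caveat, a caller may want a sanity check before passing a dump to the extractors. The sketch below is one possible guard. `checkDump` is a hypothetical helper, and it rests on an assumption the caveats do not state: that a blank page would show up as an empty `bundles` list, a `null` `waVersion`, or zero-byte files.

```javascript
// Hypothetical guard: return a list of human-readable problems found in the
// object returned by fetchBundles(). An empty array means nothing looked wrong.
// Assumption: a blocked or blank page surfaces as no bundles, a null version,
// or empty files; the package itself does not document this behavior.
function checkDump(dump, { minBundles = 1 } = {}) {
    if (!dump) return ['dump is missing']
    const problems = []
    const bundles = Array.isArray(dump.bundles) ? dump.bundles : []
    if (bundles.length < minBundles) {
        problems.push(`expected at least ${minBundles} bundle(s), got ${bundles.length}`)
    }
    if (dump.waVersion == null) {
        problems.push('waVersion could not be detected')
    }
    const empty = bundles.filter((b) => !b.bytes)
    if (empty.length > 0) {
        problems.push(`${empty.length} bundle(s) have zero bytes`)
    }
    return problems
}
```

In a CI job, a non-empty result could fail the step early instead of letting the extractors run against an incomplete dump.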
