mallmaverick-store-scraper

v0.2.0

MCP server + CLI for scraping shopping mall store directories. Hours-first layered pipeline + image classification.

mall-scraper-mcp

Layered scraper for shopping-mall store directories. Works as:

  • MCP server — coworkers drive scrapes from Claude Desktop / Claude Code
  • CLI — direct command-line use (node src/main.js)

Both share the same v5 pipeline: deterministic hours extraction (JSON-LD → DOM patterns → labeled section → sync-with-mall → focused LLM → external follow), per-page image classification with logo/brand/storefront separation, brand-site fallback for problematic logos.
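
The "first layer that produces hours wins" idea can be sketched roughly as below. This is a minimal illustration, not the code in hoursPipeline.js: the function names (fromJsonLd, fromTextPattern, extractHours) and the two simplified layers are assumptions made for the example, and the real pipeline has more layers and works on a live page rather than an HTML string.

```javascript
// Hypothetical sketch: try each hours layer in order and stop at the first hit.
// Layer names echo the README; the function bodies are illustrative only.

// Layer 1 stand-in: read schema.org `openingHours` from JSON-LD blocks.
function fromJsonLd(html) {
  const re = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  let match;
  while ((match = re.exec(html)) !== null) {
    try {
      const data = JSON.parse(match[1]);
      const items = Array.isArray(data) ? data : [data];
      for (const item of items) {
        if (item && item.openingHours) {
          // openingHours may be a string or an array; normalize to an array.
          return [].concat(item.openingHours);
        }
      }
    } catch (err) {
      // Malformed JSON-LD is skipped so the next block or layer can try.
    }
  }
  return null;
}

// Layer 2 stand-in: one text pattern such as "Mon-Fri 10:00-21:00".
function fromTextPattern(html) {
  const day = "(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)[a-z]*";
  const re = new RegExp(`\\b${day}\\s*-\\s*${day}\\s+\\d{1,2}:\\d{2}\\s*-\\s*\\d{1,2}:\\d{2}`);
  const m = html.match(re);
  return m ? [m[0]] : null;
}

// Ordered list of [name, layer]; cheaper deterministic layers come first.
const layers = [
  ["json-ld", fromJsonLd],
  ["text-pattern", fromTextPattern],
];

// Returns the first non-empty result plus the name of the layer that found it.
function extractHours(html, orderedLayers) {
  for (const [name, layer] of orderedLayers) {
    const hours = layer(html);
    if (hours && hours.length > 0) {
      return { source: name, hours };
    }
  }
  return { source: null, hours: [] };
}
```

Recording which layer produced the result (the `source` field here) is a design choice that makes it easier to debug a store whose hours look wrong.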


How coworkers install it

Once the package is published to npm and the Cloudflare Worker is deployed, each coworker runs one command in their terminal:

claude mcp add mall-scraper \
  --env MALL_SCRAPER_PROXY_URL=https://mall-scraper-openai-proxy.YOURSUB.workers.dev \
  --env MALL_SCRAPER_TOKEN=YOUR_SHARED_SECRET \
  -- npx -y mallmaverick-store-scraper@latest

Then in Claude they say things like:

Scrape https://grasslands.ca/store-directory/, first 10 stores. Save as CSV.

Claude calls the scrape_directory tool, gets the data back, and can then do follow-up work (write a CSV, find missing fields, retry specific stores).

What requires no setup on coworker machines

  • ❌ No git clone
  • ❌ No OpenAI API key (it lives in your Worker)
  • ❌ No zip to download or replace on updates
  • ✅ npm/npx + Node 18+ (most have this; otherwise nodejs.org)
  • ✅ The shared-secret token (you give them)

The first scrape downloads Chromium (~170 MB, one-time, automatic via Puppeteer).


How YOU set it up (one-time)

1. Deploy the Cloudflare Worker (10 min)

See cloudflare-worker/README.md. The short version:

cd cloudflare-worker
npm install
npx wrangler login          # browser auth to your Cloudflare account
npx wrangler deploy
npx wrangler secret put OPENAI_API_KEY     # paste your real OpenAI key
npx wrangler secret put SHARED_SECRET      # paste a long random string

You now have:

  • MALL_SCRAPER_PROXY_URL = https://mall-scraper-openai-proxy.YOURSUB.workers.dev
  • MALL_SCRAPER_TOKEN = (whatever you put as SHARED_SECRET)

Free tier covers ~300 mall scrapes/day. Cost = whatever your OpenAI bill is (~$0.005/store at gpt-5.4-mini).

2. Publish the npm package

# Log in to npm
npm login

# Sanity check
npm pack --dry-run                # see exactly what would be published

# First publish
npm publish --access public

If the name mall-scraper-mcp is taken, change "name" in package.json to something available, or use a scope like @yourname/mall-scraper-mcp. Scoped packages are private by default, so publish them with npm publish --access public.

3. Share the install command with coworkers

Send them the one-line claude mcp add command above, with your actual proxy URL and shared secret pasted in.


How you ship updates

This is the workflow that makes "easy updates" actually easy:

# Make changes
git commit -am "improve hours layer 4 for X site"

# Bump the version (run one of these, not both)
npm version patch                 # 0.1.0 → 0.1.1   (bug fixes)
npm version minor                 # 0.1.0 → 0.2.0   (new features)

# Publish
npm publish

Coworkers get the new version automatically on their next Claude session because the install command uses npx -y mallmaverick-store-scraper@latest — npx re-resolves to the latest published version every time.

If you want stricter pinning (for example, to buy time to revert after publishing a buggy version), tell them to replace @latest with an exact version, in the form mallmaverick-store-scraper@<version>.

Worker updates (less frequent)

cd cloudflare-worker
npx wrangler deploy

Live in seconds. No coworker action needed.


CLI usage (you, or fallback)

cd path/to/mall-scraper-mcp
npm install
echo "OPENAI_API_KEY=sk-..." > .env    # or set MALL_SCRAPER_* env vars
./run.sh

CLI prompts for: directory URL, model, max stores, concurrency, threshold, vision yes/no. Output lands in extracted_stores/.


MCP tools exposed

| Tool | Use when |
|---|---|
| scrape_directory | User wants the full per-store extraction across a directory listing |
| get_store_hours | Debugging: quick hours-only check on a single store URL |
| validate_image_url | A logo isn't loading in the CMS; confirm whether the URL itself is bad |

All three accept JSON inputs documented in their schemas; Claude figures out the args from the conversation.


File layout

mall-scraper-mcp/
├── package.json             ← bin entry → src/mcp-server.js
├── src/
│   ├── mcp-server.js        ← MCP stdio server (entry for `npx mallmaverick-store-scraper`)
│   ├── main.js              ← CLI entry
│   ├── openai-proxy.js      ← chooses direct OpenAI vs Worker proxy from env
│   ├── browser.js           ← Puppeteer wrapper + XHR intercept
│   ├── discovery.js         ← directory URL discovery + logo map
│   ├── hoursParser.js       ← canonical hours parsing / validation
│   ├── hoursPipeline.js     ← 7-layer hours extraction
│   ├── mallContext.js       ← mall hours + socials + chrome images detection
│   ├── imageExtraction.js   ← logo/brand/storefront classifier
│   ├── brandSiteFallback.js ← brand-site logo when mall has GIF/missing
│   ├── deterministic.js     ← phone, socials, website, status flags
│   ├── storeExtractor.js    ← LLM extraction for non-deterministic fields
│   ├── retryStrategy.js     ← 3-attempt escalating page loads
│   ├── storeModel.js        ← 40-field schema + CSV writer (CRLF/BOM)
│   └── output.js            ← (legacy, unused by mcp server)
├── cloudflare-worker/
│   ├── worker.js            ← OpenAI proxy (30 LOC)
│   ├── wrangler.toml
│   └── README.md
└── test/
    └── hoursParser.test.js  ← 40+ unit tests
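
The "CSV writer (CRLF/BOM)" note on storeModel.js refers to a common spreadsheet-friendly CSV convention: a UTF-8 byte order mark at the start and CRLF line endings. A minimal sketch of that convention follows. The names toCsv and escapeCell and the two-column example are assumptions for illustration; the real writer and its 40-field schema are not shown here.

```javascript
// Hypothetical sketch of a CSV writer that emits a UTF-8 BOM and CRLF line endings.
const BOM = "\uFEFF";

// Quote a cell only when it contains a comma, a quote, or a line break;
// embedded quotes are doubled, per the usual CSV convention (RFC 4180).
function escapeCell(value) {
  const text = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// columns: array of field names; rows: array of plain objects keyed by field name.
function toCsv(columns, rows) {
  const lines = [columns.map(escapeCell).join(",")];
  for (const row of rows) {
    lines.push(columns.map((name) => escapeCell(row[name])).join(","));
  }
  // Every record, including the last one, ends with CRLF.
  return BOM + lines.join("\r\n") + "\r\n";
}

const csv = toCsv(
  ["name", "hours"],
  [{ name: 'Joe\'s "Deli"', hours: "Mon-Fri, 10-9" }]
);
```

The BOM helps spreadsheet tools detect UTF-8 so accented store names are not garbled, and missing fields are written as empty cells so every row keeps the same column count.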

Auth modes

The scraper supports two ways to reach OpenAI; it picks the first that's configured:

  1. Proxy mode (production / coworker default). MALL_SCRAPER_PROXY_URL + MALL_SCRAPER_TOKEN set → calls go through the Cloudflare Worker, which holds your real OpenAI key.

  2. Direct mode (your local dev fallback). OPENAI_API_KEY set → calls go straight to api.openai.com. Useful when developing without spinning up the Worker.

If neither is set, the scraper refuses to start with a clear error.
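
The selection rule above can be sketched as a small pure function. This is an illustration under assumptions, not the contents of openai-proxy.js: the function name resolveAuthMode, the shape of the returned object, and the error wording are all hypothetical. Only the environment variable names and the api.openai.com host come from this README.

```javascript
// Hypothetical sketch: pick proxy mode first, then direct mode, else fail fast.
function resolveAuthMode(env) {
  const proxyUrl = (env.MALL_SCRAPER_PROXY_URL || "").trim();
  const token = (env.MALL_SCRAPER_TOKEN || "").trim();
  const apiKey = (env.OPENAI_API_KEY || "").trim();

  // Proxy mode needs both the Worker URL and the shared-secret token.
  if (proxyUrl && token) {
    return {
      mode: "proxy",
      baseUrl: proxyUrl.replace(/\/+$/, ""), // drop trailing slashes
      credential: token,
    };
  }

  // Direct mode is the local-development fallback.
  if (apiKey) {
    return { mode: "direct", baseUrl: "https://api.openai.com", credential: apiKey };
  }

  // Neither mode is configured: refuse to start with an actionable message.
  throw new Error(
    "No OpenAI access configured: set MALL_SCRAPER_PROXY_URL and MALL_SCRAPER_TOKEN, or OPENAI_API_KEY."
  );
}
```

In a real entry point this would typically be called once at startup as resolveAuthMode(process.env), so a misconfigured machine fails before any browser is launched.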


Troubleshooting

Logo URL returns HTML in coworker's CMS: Ask Claude to run validate_image_url on the failing URL. This confirms whether the URL itself returns a real image. If it does, the issue is on the CMS side (the shopcurrents-style empty property_manager_id case is a known example).
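
For intuition, one way to tell "real image" from "HTML error page" is to look at the first bytes of the response body. The sketch below shows that general technique only. It is not the implementation of validate_image_url, and the function names (sniffImageType, looksLikeHtml, classifyBody) are hypothetical.

```javascript
// Hypothetical sketch: classify a response body by its leading bytes.

// Returns "png", "jpeg", "gif", "webp", or null based on well-known file signatures.
function sniffImageType(bytes) {
  const startsWith = (sig, offset = 0) => sig.every((b, i) => bytes[offset + i] === b);
  if (startsWith([0x89, 0x50, 0x4e, 0x47])) return "png"; // \x89PNG
  if (startsWith([0xff, 0xd8, 0xff])) return "jpeg";
  if (startsWith([0x47, 0x49, 0x46, 0x38])) return "gif"; // GIF8
  if (startsWith([0x52, 0x49, 0x46, 0x46]) && startsWith([0x57, 0x45, 0x42, 0x50], 8)) {
    return "webp"; // RIFF....WEBP
  }
  return null;
}

// True when the body begins like an HTML document (ignoring leading whitespace).
function looksLikeHtml(bytes) {
  const head = Buffer.from(bytes.slice(0, 256)).toString("utf8").trimStart().toLowerCase();
  return head.startsWith("<!doctype html") || head.startsWith("<html");
}

// Combines both checks into a single verdict.
function classifyBody(bytes) {
  const type = sniffImageType(bytes);
  if (type) return { ok: true, kind: type };
  return { ok: false, kind: looksLikeHtml(bytes) ? "html" : "unknown" };
}
```

Checking the bytes rather than trusting the URL's file extension or the Content-Type header catches the case where a server answers with status 200 and an HTML page at a .png URL.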

Coworker gets "unauthorized" from the Worker: Their MALL_SCRAPER_TOKEN doesn't match the current SHARED_SECRET. Either update MALL_SCRAPER_TOKEN on their side, or run npx wrangler secret put SHARED_SECRET to set the Worker's secret to the value they have.

First scrape takes 2-3 minutes: Puppeteer is downloading Chromium on first run (~170 MB). Subsequent scrapes run at normal speed.

npx mallmaverick-store-scraper not found: They need Node 18+ in PATH. Run node --version to check.


License

MIT.