@itiden/check-sitemap
v0.1.0
CLI tool that crawls an XML sitemap in headless Chrome and reports HTTP, console, and HTML validation issues.
check-sitemap
CLI tool that crawls an XML sitemap, loads every URL in a headless Chrome browser, and reports problems — HTTP errors, console errors, and HTML validation issues.
Install
bun install
This will also download a local Chrome binary via Puppeteer.
To run directly without installing globally:
bunx @itiden/check-sitemap https://example.com/sitemap.xml
Usage
bunx @itiden/check-sitemap <sitemap-url> [options]
Options
| Flag | Description | Default |
| ------------------------ | ---------------------------------------- | -------- |
| -c, --concurrency <n> | Number of concurrent page checks | 3 |
| -t, --timeout <ms> | Page load timeout in milliseconds | 30000 |
| -a, --auth <user:pass> | Basic auth credentials | none |
| -l, --limit <n> | Only check the first N URLs | all |
| --no-validate-html | Skip HTML validation | validate |
| -v, --verbose | Show problem details inline during crawl | false |
| -h, --help | Show help | |
Examples
# Check a sitemap
bunx @itiden/check-sitemap https://example.com/sitemap.xml
# With basic auth and higher concurrency
bunx @itiden/check-sitemap https://staging.example.com/sitemap.xml --auth admin:secret -c 5
# Fast check — skip HTML validation, verbose output
bunx @itiden/check-sitemap https://example.com/sitemap.xml --no-validate-html -v
# Custom timeout for slow pages
bunx @itiden/check-sitemap https://example.com/sitemap.xml -t 60000
# Test with only the first 10 URLs
bunx @itiden/check-sitemap https://example.com/sitemap.xml --limit 10
What it checks
- Sitemap resolution: recursively follows <sitemapindex> children to collect all <url><loc> entries
- HTTP status: flags any response with status >= 400
- Console errors: captures console.error output and uncaught JS exceptions from the page
- HTML validation: runs html-validate with recommended rules against the rendered page source
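The first two checks can be sketched in a few lines. This is a minimal illustration, not the package's actual implementation: the helper names (`extractLocs`, `isSitemapIndex`, `resolveSitemap`, `isHttpProblem`) and the injected `fetchXml` function are hypothetical, and the regex-based parsing is a simplification that ignores CDATA sections and does not decode XML entities.

```typescript
// Hypothetical helper names; the real tool may be structured differently.
type FetchXml = (url: string) => Promise<string>;

/** Pull the trimmed text of every <loc> element out of a sitemap document. */
function extractLocs(xml: string): string[] {
  const locs: string[] = [];
  const pattern = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(xml)) !== null) {
    locs.push(match[1]);
  }
  return locs;
}

/** A sitemap index lists child sitemaps instead of pages. */
function isSitemapIndex(xml: string): boolean {
  return /<sitemapindex[\s>]/.test(xml);
}

/** Recursively follow index files and return page URLs in document order. */
async function resolveSitemap(
  url: string,
  fetchXml: FetchXml,
  seen: Set<string> = new Set(),
): Promise<string[]> {
  if (seen.has(url)) return []; // guard against index files that reference each other
  seen.add(url);

  const xml = await fetchXml(url);
  const locs = extractLocs(xml);
  if (!isSitemapIndex(xml)) return locs; // plain <urlset>: these are page URLs

  const pages: string[] = [];
  for (const child of locs) {
    pages.push(...(await resolveSitemap(child, fetchXml, seen)));
  }
  return pages;
}

/** Mirrors the HTTP rule above: any status >= 400 counts as a problem. */
function isHttpProblem(status: number): boolean {
  return status >= 400;
}
```

Passing the fetch function in as a parameter keeps the recursion easy to exercise against in-memory XML strings, without a network or a browser.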
Output
Each page is logged with a status as it completes. At the end, a summary lists all pages with problems grouped by URL.
Exits with code 1 if any problems were found, 0 otherwise.
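As a rough sketch of the summary and exit-code behaviour described above: the `Problem` shape and the function names below are assumptions made for illustration, since the tool's internal types are not documented here.

```typescript
// Hypothetical data shape; the tool's real internal types may differ.
type ProblemKind = "http" | "console" | "html";

interface Problem {
  url: string;
  kind: ProblemKind;
  message: string;
}

/** Group problems by page URL, keeping URLs in first-seen order. */
function groupByUrl(problems: Problem[]): Map<string, Problem[]> {
  const grouped = new Map<string, Problem[]>();
  for (const problem of problems) {
    const bucket = grouped.get(problem.url) ?? [];
    bucket.push(problem);
    grouped.set(problem.url, bucket);
  }
  return grouped;
}

/** 1 if any problems were found, 0 otherwise. */
function exitCodeFor(problems: Problem[]): 0 | 1 {
  return problems.length > 0 ? 1 : 0;
}
```

Because the exit code is non-zero whenever a problem is found, the command can act as a failing step in a CI pipeline without any extra output parsing.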
Automated release with GitHub Releases
This repository includes three GitHub workflows:
- .github/workflows/pr-labeler.yml auto-labels pull requests
- .github/workflows/release-drafter.yml updates the upcoming release draft with merged PRs
- .github/workflows/release.yml publishes to npm when you manually publish a GitHub Release
One-time setup
- Create an npm automation token with publish access.
- In GitHub, add it as a repository secret named NPM_TOKEN.
Release flow
- Merge PRs to main: each merged PR is added as a row in the draft release notes.
- Bump version in package.json (for example 0.1.1) and push to main when you are ready.
- Open GitHub Releases and publish the draft (or create/publish a release) with tag v0.1.1 (or 0.1.1).
- The publish workflow validates that the tag matches the version in package.json, then publishes to npm.
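The tag/version check in the last step could look roughly like the sketch below. The workflow's actual logic is not shown in this README, so the function names are hypothetical and the rule (an optional leading "v" is ignored) is an assumption based on the two tag forms mentioned above.

```typescript
/** Strip an optional leading "v" so "v0.1.1" and "0.1.1" normalize to the same string. */
function normalizeTag(tag: string): string {
  const trimmed = tag.trim();
  return trimmed.startsWith("v") ? trimmed.slice(1) : trimmed;
}

/** True when the release tag refers to the same version as package.json. */
function tagMatchesVersion(tag: string, packageVersion: string): boolean {
  return normalizeTag(tag) === packageVersion.trim();
}
```

Under this assumption, a release tagged v0.1.1 while package.json still says 0.1.0 would be rejected before anything reaches npm.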
