# @posty5/screenshoter

v2.1.0
Crawl any website, discover all its pages, and take full-page screenshots — saved in a folder structure that mirrors the URL path. Built with Crawlee and Playwright.
## 🌟 What is @posty5/screenshoter?

@posty5/screenshoter is a TypeScript package that automates capturing screenshots of every page on a website. It covers four tasks:

- **Crawl** — discover all URLs on a website using Crawlee's `PlaywrightCrawler` or sitemap parsing
- **Capture** — take a screenshot of each URL using Playwright and save it to disk
- **Single page** — capture a screenshot of a single URL without crawling
- **Batch capture** — capture screenshots of a list of URLs you provide
Screenshots are saved in a folder structure that mirrors the URL path:

```
https://posty5.com/en/social-media-publisher
→ screenshots/posty5.com/en/social-media-publisher/capture.webp

https://posty5.com/en/qr-code-generator
→ screenshots/posty5.com/en/qr-code-generator/capture.webp

https://posty5.com/
→ screenshots/posty5.com/capture.webp
```

**Use cases:**
- 📸 Visual regression testing
- 🗂️ Website archiving and documentation
- 🖼️ Generating preview images for SEO / social sharing
- 📊 Auditing website pages at scale
- 🔍 Collecting URLs from a website for analysis
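The URL-to-path mapping shown above can be sketched as a small helper. Note that `urlToPath` is a hypothetical name for illustration; the package computes output paths internally and its exact logic may differ:

```typescript
import { join } from "node:path";

// Illustrative sketch: map a page URL to its screenshot path.
// Hostname becomes the top-level folder, each path segment becomes
// a subfolder, and the file is always named capture.<format>.
function urlToPath(rawUrl: string, outputDir = "screenshots", format = "webp"): string {
  const { hostname, pathname } = new URL(rawUrl);
  // Drop empty segments so "https://posty5.com/" maps to the host folder itself.
  const segments = pathname.split("/").filter(Boolean);
  return join(outputDir, hostname, ...segments, `capture.${format}`);
}

console.log(urlToPath("https://posty5.com/en/social-media-publisher"));
// screenshots/posty5.com/en/social-media-publisher/capture.webp (POSIX separators)
```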
## 📦 Installation

```bash
npm install @posty5/screenshoter
```

After installing, make sure Playwright's Chromium browser is available:

```bash
npx playwright install chromium
```

## 🚀 Quick Start
### Programmatic API

```typescript
import { captureWebsite } from "@posty5/screenshoter";

const result = await captureWebsite({
  url: "https://posty5.com",
  outputDir: "./screenshots",
  format: "webp",
  excludePatterns: ["/api/**", "/user/*"],
});

console.log(`Captured ${result.captured} of ${result.totalUrls} pages`);
```

### CLI

```bash
# Take screenshots of all pages
npx screenshoter capture https://posty5.com -o ./screenshots

# Collect URLs only (save to file)
npx screenshoter collect-urls https://posty5.com --file urls.txt
```

## 📚 API Reference
### captureWebsite(config)

Crawl a website, discover all pages, and take a screenshot of each one.

Returns: `Promise<CaptureResult>`

```typescript
import { captureWebsite } from "@posty5/screenshoter";

const result = await captureWebsite({
  url: "https://posty5.com",
  outputDir: "./screenshots",
  format: "webp",
  viewport: { width: 1440, height: 900 },
  maxPages: 50,
  maxDepth: 3,
  excludePatterns: ["/api/**", "/user/*", "*.pdf"],
  concurrency: 3,
  waitAfterLoad: 1000,
  headless: true,
});

console.log(result);
// {
//   totalUrls: 42,
//   captured: 40,
//   failed: 2,
//   skipped: 0,
//   pages: [
//     { url: 'https://posty5.com/en', filePath: './screenshots/posty5.com/en/capture.webp', status: 'ok' },
//     { url: 'https://posty5.com/en/about', filePath: './screenshots/posty5.com/en/about/capture.webp', status: 'ok' },
//     ...
//   ]
// }
```

### collectUrls(config)
Crawl a website and save all discovered URLs to a text file, one URL per line.

Config type: `CollectUrlsConfig` — URL collection options only (no screenshot settings).

Returns: `Promise<string[]>`

```typescript
import { collectUrls } from "@posty5/screenshoter";

const urls = await collectUrls({
  url: "https://posty5.com",
  outputFile: "./urls.txt",
  maxPages: 100,
  excludePatterns: ["/api/**"],
});

console.log(`Found ${urls.length} URLs`);
// urls.txt:
// https://posty5.com
// https://posty5.com/en
// https://posty5.com/en/social-media-publisher
// https://posty5.com/en/qr-code-generator
// ...
```

### capturePage(config)
Capture a screenshot of a single page, with no crawling involved.

Config type: `CapturePageConfig` — screenshot options plus `url`.

Returns: `Promise<PageResult>`

```typescript
import { capturePage } from "@posty5/screenshoter";

const result = await capturePage({
  url: "https://posty5.com/en/about",
  outputDir: "./screenshots",
  format: "png",
  viewport: { width: 1440, height: 710 },
  waitAfterLoad: 1500,
});

console.log(result);
// {
//   url: 'https://posty5.com/en/about',
//   filePath: './screenshots/posty5.com/en/about/capture.png',
//   status: 'ok'
// }
```

### capturePages(urls, config?)
Capture screenshots for a list of URLs. Reuses a single browser instance and supports concurrency.

Config type: `CaptureConfig` — screenshot options only (no `url` or crawl settings).

Returns: `Promise<CaptureResult>`

```typescript
import { capturePages } from "@posty5/screenshoter";

const result = await capturePages(
  ["https://posty5.com", "https://posty5.com/en/about", "https://posty5.com/en/social-media-publisher"],
  {
    outputDir: "./screenshots",
    format: "png",
    viewport: { width: 1440, height: 710 },
    concurrency: 2,
    waitAfterLoad: 1500,
  }
);

console.log(`Captured ${result.captured} of ${result.totalUrls} pages`);
```

### scrollToEnd(page)
A `beforeScreenshot` utility that smoothly scrolls to the bottom of the page and waits for scroll-triggered animations to settle before the screenshot is taken. Useful for pages with fade-in or reveal animations driven by `IntersectionObserver`.

Returns: `Promise<void>`

```typescript
import { capturePages, scrollToEnd } from "@posty5/screenshoter";

const result = await capturePages(urls, {
  outputDir: "./screenshots",
  format: "png",
  fullPage: true,
  beforeScreenshot: scrollToEnd,
});
```

## ⚙️ Configuration
The configuration is split into two concerns:

### URL Collection Options (`CollectUrlsConfig`)

Used by `collectUrls()` and `captureWebsite()`.
| Option | Type | Default | Description |
| ----------------- | -------------------------- | ------------------- | -------------------------------------------------- |
| url | string | (required) | Root URL to start crawling from |
| strategy | 'crawl' \| 'sitemap' | 'crawl' | URL discovery strategy |
| sitemapUrl | string | <url>/sitemap.xml | Custom sitemap URL (sitemap strategy only) |
| maxPages | number | 100 | Maximum number of pages to crawl |
| maxDepth | number | 5 | Maximum crawl depth from the root URL |
| excludePatterns | string[] | [] | Glob patterns for URL paths to skip |
| includePatterns | string[] | [] | Glob patterns — only matching URLs are captured |
| sameDomainOnly | boolean | true | Only capture URLs on the same domain |
| headless | boolean | true | Run browser in headless mode (crawl strategy) |
| shouldCapture | (url: string) => boolean | — | Programmatic filter — return false to skip a URL |
For `collectUrls()`, one additional option is available:
| Option | Type | Default | Description |
| ------------ | -------- | ------------ | ------------------------------ |
| outputFile | string | 'urls.txt' | File path to save the URL list |
### Screenshot Options (`CaptureConfig`)

Used by `capturePage()`, `capturePages()`, and `captureWebsite()`.
| Option | Type | Default | Description |
| --- | --- | --- | --- |
| outputDir | string | './screenshots' | Directory to save screenshots |
| format | 'webp' \| 'png' \| 'jpeg' | 'webp' | Screenshot image format |
| viewport | { width, height } | { width: 1440, height: 900 } | Browser viewport size |
| fullPage | boolean | false | Capture the full scrollable page instead of just the viewport |
| waitAfterLoad | number | 1000 | Milliseconds to wait after page load before screenshot |
| concurrency | number | 3 | Number of parallel browser pages |
| headless | boolean | true | Run browser in headless mode |
| scrollToEnd | boolean | false | Smoothly scroll to the bottom before screenshot (triggers scroll animations). Auto-enabled when fullPage is true |
| beforeScreenshot | (page: Page) => Promise<void> | — | Callback to run before each screenshot |
`captureWebsite()` accepts both sets of options combined (`CaptureWebsiteConfig`).
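To make the `excludePatterns` / `includePatterns` semantics concrete, here is a minimal glob-to-regex sketch of how such patterns are commonly interpreted (`*` matches within one path segment, `**` crosses segments). This is an illustration only; the package's actual matcher may handle edge cases differently:

```typescript
// Convert a simple glob into a RegExp: "*" stays within one path
// segment, "**" matches across segments.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .replace(/[.+^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*\*/g, "\u0000")           // placeholder so "**" survives the next step
    .replace(/\*/g, "[^/]*")              // "*" matches within a single segment
    .replace(/\u0000/g, ".*");            // "**" matches across segments
  return new RegExp(`^${escaped}$`);
}

const excluded = ["/api/**", "/user/*"].map(globToRegExp);
const isExcluded = (path: string) => excluded.some((re) => re.test(path));

console.log(isExcluded("/api/v1/users"));     // true  ("**" crosses segments)
console.log(isExcluded("/user/alice"));       // true
console.log(isExcluded("/user/alice/posts")); // false ("*" stops at "/")
```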
## 🚫 Filtering Dynamic Pages

Three mechanisms control which pages get captured:
### 1. Exclude Patterns

Skip URLs matching glob patterns:

```typescript
await captureWebsite({
  url: "https://example.com",
  excludePatterns: [
    "/api/**",  // Skip all API routes
    "/user/*",  // Skip user profile pages
    "/*/edit",  // Skip edit pages
    "*.pdf",    // Skip PDF links
    "/auth/**", // Skip auth pages
  ],
});
```

### 2. Include Patterns (Whitelist)
When set, only matching URLs are captured:

```typescript
await captureWebsite({
  url: "https://example.com",
  includePatterns: [
    "/en/**",      // Only capture English pages
    "/products/*", // Only capture product pages
  ],
});
```

### 3. Custom Filter Callback
For complex logic:

```typescript
await captureWebsite({
  url: "https://example.com",
  shouldCapture: (url) => {
    const parsed = new URL(url);
    // Skip URLs with query parameters
    if (parsed.search) return false;
    // Skip paths with more than 4 segments
    if (parsed.pathname.split("/").filter(Boolean).length > 4) return false;
    return true;
  },
});
```

## 🖥️ CLI Reference

### capture — Take Screenshots

```bash
npx screenshoter capture <url> [options]
```

| Option | Description | Default |
| ------------------------- | ---------------------------------------- | --------------- |
| -o, --output <dir> | Output directory | ./screenshots |
| -f, --format <format> | Image format: webp, png, jpeg | webp |
| --width <number> | Viewport width | 1440 |
| --height <number> | Viewport height | 900 |
| --max-pages <number> | Maximum pages to crawl | 100 |
| --max-depth <number> | Maximum crawl depth | 5 |
| --exclude <patterns...> | Glob patterns to exclude | — |
| --include <patterns...> | Glob patterns to include | — |
| --concurrency <number> | Parallel browser pages | 3 |
| --wait <number> | Wait ms after page load | 1000 |
| --no-headless | Show browser window | — |
| --urls-file <path> | Use URLs from a file instead of crawling | — |
Examples:

```bash
# Basic usage
npx screenshoter capture https://posty5.com

# Custom output and format
npx screenshoter capture https://posty5.com -o ./my-screenshots -f png

# Exclude dynamic pages
npx screenshoter capture https://posty5.com --exclude "/api/**" "/user/*" "/auth/**"

# Only capture specific sections
npx screenshoter capture https://posty5.com --include "/en/**"

# Limit crawl depth and page count
npx screenshoter capture https://posty5.com --max-pages 20 --max-depth 2

# Use a pre-collected URLs file
npx screenshoter capture https://posty5.com --urls-file urls.txt
```

### collect-urls — Collect URLs Only

```bash
npx screenshoter collect-urls <url> [options]
```

| Option | Description | Default |
| ------------------------- | ------------------------ | ---------- |
| --file <path> | Output file path | urls.txt |
| --max-pages <number> | Maximum pages to crawl | 100 |
| --max-depth <number> | Maximum crawl depth | 5 |
| --exclude <patterns...> | Glob patterns to exclude | — |
| --include <patterns...> | Glob patterns to include | — |
| --no-headless | Show browser window | — |
Examples:

```bash
# Collect all URLs
npx screenshoter collect-urls https://posty5.com

# Save to custom file with filters
npx screenshoter collect-urls https://posty5.com --file sitemap.txt --exclude "/api/**"
```

## 🔧 Advanced Usage

### Scroll Animations (Fade-in on Scroll)

Some pages reveal content as you scroll (via `IntersectionObserver` or CSS fade-in classes). A viewport-only screenshot would capture those elements as invisible or empty.

Use `scrollToEnd: true` or the built-in `scrollToEnd` utility to scroll through the page first:
```typescript
import { capturePages, scrollToEnd } from "@posty5/screenshoter";

// Option 1 — config flag (auto-enabled when fullPage: true)
const resultA = await capturePages(urls, {
  fullPage: true, // scrollToEnd is automatically true
  waitAfterLoad: 1000,
});

// Option 2 — explicit flag
const resultB = await capturePages(urls, {
  scrollToEnd: true,
  waitAfterLoad: 1000,
});

// Option 3 — use the scrollToEnd utility as a beforeScreenshot hook
const resultC = await capturePages(urls, {
  fullPage: true,
  beforeScreenshot: scrollToEnd,
});
```

### Before Screenshot Hook
Dismiss cookie banners, close popups, or otherwise interact with the page before taking the screenshot:

```typescript
await captureWebsite({
  url: "https://example.com",
  beforeScreenshot: async (page) => {
    // Dismiss cookie consent
    const cookieBtn = page.locator('button:has-text("Accept")');
    if (await cookieBtn.isVisible()) {
      await cookieBtn.click();
      await page.waitForTimeout(500);
    }
  },
});
```

### Two-Step Workflow: Collect → Edit → Capture
First collect URLs, manually edit the list, then capture only the URLs you want:

```bash
# Step 1: Collect all URLs
npx screenshoter collect-urls https://posty5.com --file urls.txt

# Step 2: Edit urls.txt — remove any pages you don't want

# Step 3: Capture only the remaining URLs
npx screenshoter capture https://posty5.com --urls-file urls.txt
```

Or programmatically, with separated configs:
```typescript
import { collectUrls, capturePages } from "@posty5/screenshoter";

// Step 1: Collect URLs (CollectUrlsConfig only)
const urls = await collectUrls({
  url: "https://posty5.com",
  strategy: "sitemap",
  maxPages: 500,
  excludePatterns: ["/api/**"],
  outputFile: "./urls.txt",
});

// Step 2: Filter programmatically
const staticPages = urls.filter((u) => !u.includes("/trends/"));

// Step 3: Capture with a separate config (CaptureConfig only)
const result = await capturePages(staticPages, {
  outputDir: "./screenshots",
  format: "png",
  viewport: { width: 1440, height: 710 },
  concurrency: 3,
});
```

## 📂 Output Structure
The output folder mirrors the URL path structure:

```
screenshots/
├── posty5.com/
│   ├── capture.webp                 ← https://posty5.com
│   ├── en/
│   │   ├── capture.webp             ← https://posty5.com/en
│   │   ├── social-media-publisher/
│   │   │   └── capture.webp         ← https://posty5.com/en/social-media-publisher
│   │   ├── qr-code-generator/
│   │   │   └── capture.webp         ← https://posty5.com/en/qr-code-generator
│   │   └── url-shortener/
│   │       └── capture.webp         ← https://posty5.com/en/url-shortener
│   └── ar/
│       ├── capture.webp             ← https://posty5.com/ar
│       └── social-media-publisher/
│           └── capture.webp         ← https://posty5.com/ar/social-media-publisher
```

## 💻 Requirements
- Node.js: >= 18.0.0
- TypeScript: Full type definitions included
- Browser: Chromium (auto-installed via Playwright)
## 📖 Resources
- Website: https://posty5.com
- Dashboard: https://studio.posty5.com
- GitHub: https://github.com/Posty5/Screenshoter
- Support: https://posty5.com/contact-us
## 🆘 Support
- Contact Us: https://posty5.com/contact-us
- GitHub Issues: Report bugs or request features
## 🤝 Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Make your changes
4. Commit your changes: `git commit -m 'Add amazing feature'`
5. Push to the branch: `git push origin feature/amazing-feature`
6. Submit a pull request
## 📄 License
MIT License - see LICENSE file for details.
Made with ❤️ by the Posty5 team
