# @posty5/screenshoter

v2.1.0
Crawl any website, discover all its pages, and take full-page screenshots — saved in a folder structure that mirrors the URL path. Built with Crawlee and Playwright.
## 🌟 What is @posty5/screenshoter?

@posty5/screenshoter is a TypeScript package that automates capturing screenshots of every page on a website. It covers four tasks:

- **Crawl** — discover all URLs on a website using Crawlee's `PlaywrightCrawler` or sitemap parsing
- **Capture** — take a screenshot of each URL using Playwright and save it to disk
- **Single page** — capture a screenshot of a single URL without crawling
- **Batch capture** — capture screenshots of a list of URLs you provide
Screenshots are saved in a folder structure that mirrors the URL path:

```
https://posty5.com/en/social-media-publisher
→ screenshots/posty5.com/en/social-media-publisher/capture.webp

https://posty5.com/en/qr-code-generator
→ screenshots/posty5.com/en/qr-code-generator/capture.webp

https://posty5.com/
→ screenshots/posty5.com/capture.webp
```

**Use cases:**
- 📸 Visual regression testing
- 🗂️ Website archiving and documentation
- 🖼️ Generating preview images for SEO / social sharing
- 📊 Auditing website pages at scale
- 🔍 Collecting URLs from a website for analysis
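The URL-to-path mapping shown above can be sketched as a small helper. Note that `urlToPath` is a hypothetical name for illustration; the package computes output paths internally and its exact logic may differ:

```typescript
import { join } from "node:path";

// Illustrative sketch: map a page URL to its screenshot path.
// Hostname becomes the top-level folder, each path segment becomes
// a subfolder, and the file is always named capture.<format>.
function urlToPath(rawUrl: string, outputDir = "screenshots", format = "webp"): string {
  const { hostname, pathname } = new URL(rawUrl);
  // Drop empty segments so "https://posty5.com/" maps to the host folder itself.
  const segments = pathname.split("/").filter(Boolean);
  return join(outputDir, hostname, ...segments, `capture.${format}`);
}

console.log(urlToPath("https://posty5.com/en/social-media-publisher"));
// screenshots/posty5.com/en/social-media-publisher/capture.webp (POSIX separators)
```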
## 📦 Installation

```bash
npm install @posty5/screenshoter
```

After installing, make sure Playwright's Chromium browser is available:

```bash
npx playwright install chromium
```

## 🚀 Quick Start
### Programmatic API

```typescript
import { captureWebsite } from "@posty5/screenshoter";

const result = await captureWebsite({
  url: "https://posty5.com",
  outputDir: "./screenshots",
  format: "webp",
  excludePatterns: ["/api/**", "/user/*"],
});

console.log(`Captured ${result.captured} of ${result.totalUrls} pages`);
```

### CLI

```bash
# Take screenshots of all pages
npx screenshoter capture https://posty5.com -o ./screenshots

# Collect URLs only (save to file)
npx screenshoter collect-urls https://posty5.com --file urls.txt
```

## 📚 API Reference
### captureWebsite(config)

Crawl a website, discover all pages, and take a screenshot of each one.

Returns: `Promise<CaptureResult>`

```typescript
import { captureWebsite } from "@posty5/screenshoter";

const result = await captureWebsite({
  url: "https://posty5.com",
  outputDir: "./screenshots",
  format: "webp",
  viewport: { width: 1440, height: 900 },
  maxPages: 50,
  maxDepth: 3,
  excludePatterns: ["/api/**", "/user/*", "*.pdf"],
  concurrency: 3,
  waitAfterLoad: 1000,
  headless: true,
});

console.log(result);
// {
//   totalUrls: 42,
//   captured: 40,
//   failed: 2,
//   skipped: 0,
//   pages: [
//     { url: 'https://posty5.com/en', filePath: './screenshots/posty5.com/en/capture.webp', status: 'ok' },
//     { url: 'https://posty5.com/en/about', filePath: './screenshots/posty5.com/en/about/capture.webp', status: 'ok' },
//     ...
//   ]
// }
```

### collectUrls(config)
Crawl a website and save all discovered URLs to a text file, one URL per line.

Config type: `CollectUrlsConfig` — URL collection options only (no screenshot settings).

Returns: `Promise<string[]>`

```typescript
import { collectUrls } from "@posty5/screenshoter";

const urls = await collectUrls({
  url: "https://posty5.com",
  outputFile: "./urls.txt",
  maxPages: 100,
  excludePatterns: ["/api/**"],
});

console.log(`Found ${urls.length} URLs`);
// urls.txt:
// https://posty5.com
// https://posty5.com/en
// https://posty5.com/en/social-media-publisher
// https://posty5.com/en/qr-code-generator
// ...
```

### capturePage(config)
Capture a screenshot of a single page, with no crawling involved.

Config type: `CapturePageConfig` — screenshot options plus `url`.

Returns: `Promise<PageResult>`

```typescript
import { capturePage } from "@posty5/screenshoter";

const result = await capturePage({
  url: "https://posty5.com/en/about",
  outputDir: "./screenshots",
  format: "png",
  viewport: { width: 1440, height: 710 },
  waitAfterLoad: 1500,
});

console.log(result);
// {
//   url: 'https://posty5.com/en/about',
//   filePath: './screenshots/posty5.com/en/about/capture.png',
//   status: 'ok'
// }
```

### capturePages(urls, config?)
Capture screenshots for a list of URLs. Reuses a single browser instance and supports concurrency.

Config type: `CaptureConfig` — screenshot options only (no `url` or crawl settings).

Returns: `Promise<CaptureResult>`

```typescript
import { capturePages } from "@posty5/screenshoter";

const result = await capturePages(
  ["https://posty5.com", "https://posty5.com/en/about", "https://posty5.com/en/social-media-publisher"],
  {
    outputDir: "./screenshots",
    format: "png",
    viewport: { width: 1440, height: 710 },
    concurrency: 2,
    waitAfterLoad: 1500,
  }
);

console.log(`Captured ${result.captured} of ${result.totalUrls} pages`);
```

### scrollToEnd(page)
A `beforeScreenshot` utility that smoothly scrolls to the bottom of the page and waits for scroll-triggered animations to settle before the screenshot is taken. Useful for pages with fade-in or reveal animations driven by `IntersectionObserver`.

Returns: `Promise<void>`

```typescript
import { capturePages, scrollToEnd } from "@posty5/screenshoter";

const result = await capturePages(urls, {
  outputDir: "./screenshots",
  format: "png",
  fullPage: true,
  beforeScreenshot: scrollToEnd,
});
```

## ⚙️ Configuration
The configuration is split into two concerns:

### URL Collection Options (`CollectUrlsConfig`)

Used by `collectUrls()` and `captureWebsite()`.
| Option | Type | Default | Description |
| ----------------- | -------------------------- | ------------------- | -------------------------------------------------- |
| url | string | (required) | Root URL to start crawling from |
| strategy | 'crawl' \| 'sitemap' | 'crawl' | URL discovery strategy |
| sitemapUrl | string | <url>/sitemap.xml | Custom sitemap URL (sitemap strategy only) |
| maxPages | number | 100 | Maximum number of pages to crawl |
| maxDepth | number | 5 | Maximum crawl depth from the root URL |
| excludePatterns | string[] | [] | Glob patterns for URL paths to skip |
| includePatterns | string[] | [] | Glob patterns — only matching URLs are captured |
| sameDomainOnly | boolean | true | Only capture URLs on the same domain |
| headless | boolean | true | Run browser in headless mode (crawl strategy) |
| shouldCapture | (url: string) => boolean | — | Programmatic filter — return false to skip a URL |
For `collectUrls()`, one additional option is available:
| Option | Type | Default | Description |
| ------------ | -------- | ------------ | ------------------------------ |
| outputFile | string | 'urls.txt' | File path to save the URL list |
### Screenshot Options (`CaptureConfig`)

Used by `capturePage()`, `capturePages()`, and `captureWebsite()`.
| Option | Type | Default | Description |
| --- | --- | --- | --- |
| outputDir | string | './screenshots' | Directory to save screenshots |
| format | 'webp' \| 'png' \| 'jpeg' | 'webp' | Screenshot image format |
| viewport | { width, height } | { width: 1440, height: 900 } | Browser viewport size |
| fullPage | boolean | false | Capture the full scrollable page instead of just the viewport |
| waitAfterLoad | number | 1000 | Milliseconds to wait after page load before screenshot |
| concurrency | number | 3 | Number of parallel browser pages |
| headless | boolean | true | Run browser in headless mode |
| scrollToEnd | boolean | false | Smoothly scroll to the bottom before screenshot (triggers scroll animations). Auto-enabled when fullPage is true |
| beforeScreenshot | (page: Page) => Promise<void> | — | Callback to run before each screenshot |
`captureWebsite()` accepts both sets of options combined (`CaptureWebsiteConfig`).
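To make the `excludePatterns` / `includePatterns` semantics concrete, here is a minimal glob-to-regex sketch of how such patterns are commonly interpreted (`*` matches within one path segment, `**` crosses segments). This is an illustration only; the package's actual matcher may handle edge cases differently:

```typescript
// Convert a simple glob into a RegExp: "*" stays within one path
// segment, "**" matches across segments.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .replace(/[.+^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*\*/g, "\u0000")           // placeholder so "**" survives the next step
    .replace(/\*/g, "[^/]*")              // "*" matches within a single segment
    .replace(/\u0000/g, ".*");            // "**" matches across segments
  return new RegExp(`^${escaped}$`);
}

const excluded = ["/api/**", "/user/*"].map(globToRegExp);
const isExcluded = (path: string) => excluded.some((re) => re.test(path));

console.log(isExcluded("/api/v1/users"));     // true  ("**" crosses segments)
console.log(isExcluded("/user/alice"));       // true
console.log(isExcluded("/user/alice/posts")); // false ("*" stops at "/")
```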
## 🚫 Filtering Dynamic Pages

Three mechanisms control which pages get captured:
### 1. Exclude Patterns

Skip URLs matching glob patterns:

```typescript
await captureWebsite({
  url: "https://example.com",
  excludePatterns: [
    "/api/**",  // Skip all API routes
    "/user/*",  // Skip user profile pages
    "/*/edit",  // Skip edit pages
    "*.pdf",    // Skip PDF links
    "/auth/**", // Skip auth pages
  ],
});
```

### 2. Include Patterns (Whitelist)
When set, only matching URLs are captured:

```typescript
await captureWebsite({
  url: "https://example.com",
  includePatterns: [
    "/en/**",      // Only capture English pages
    "/products/*", // Only capture product pages
  ],
});
```

### 3. Custom Filter Callback
For complex logic:

```typescript
await captureWebsite({
  url: "https://example.com",
  shouldCapture: (url) => {
    const parsed = new URL(url);
    // Skip URLs with query parameters
    if (parsed.search) return false;
    // Skip paths with more than 4 segments
    if (parsed.pathname.split("/").filter(Boolean).length > 4) return false;
    return true;
  },
});
```

## 🖥️ CLI Reference

### capture — Take Screenshots

```bash
npx screenshoter capture <url> [options]
```

| Option | Description | Default |
| ------------------------- | ---------------------------------------- | --------------- |
| -o, --output <dir> | Output directory | ./screenshots |
| -f, --format <format> | Image format: webp, png, jpeg | webp |
| --width <number> | Viewport width | 1440 |
| --height <number> | Viewport height | 900 |
| --max-pages <number> | Maximum pages to crawl | 100 |
| --max-depth <number> | Maximum crawl depth | 5 |
| --exclude <patterns...> | Glob patterns to exclude | — |
| --include <patterns...> | Glob patterns to include | — |
| --concurrency <number> | Parallel browser pages | 3 |
| --wait <number> | Wait ms after page load | 1000 |
| --no-headless | Show browser window | — |
| --urls-file <path> | Use URLs from a file instead of crawling | — |
Examples:

```bash
# Basic usage
npx screenshoter capture https://posty5.com

# Custom output and format
npx screenshoter capture https://posty5.com -o ./my-screenshots -f png

# Exclude dynamic pages
npx screenshoter capture https://posty5.com --exclude "/api/**" "/user/*" "/auth/**"

# Only capture specific sections
npx screenshoter capture https://posty5.com --include "/en/**"

# Limit crawl depth and page count
npx screenshoter capture https://posty5.com --max-pages 20 --max-depth 2

# Use a pre-collected URLs file
npx screenshoter capture https://posty5.com --urls-file urls.txt
```

### collect-urls — Collect URLs Only

```bash
npx screenshoter collect-urls <url> [options]
```

| Option | Description | Default |
| ------------------------- | ------------------------ | ---------- |
| --file <path> | Output file path | urls.txt |
| --max-pages <number> | Maximum pages to crawl | 100 |
| --max-depth <number> | Maximum crawl depth | 5 |
| --exclude <patterns...> | Glob patterns to exclude | — |
| --include <patterns...> | Glob patterns to include | — |
| --no-headless | Show browser window | — |
Examples:

```bash
# Collect all URLs
npx screenshoter collect-urls https://posty5.com

# Save to custom file with filters
npx screenshoter collect-urls https://posty5.com --file sitemap.txt --exclude "/api/**"
```

## 🔧 Advanced Usage

### Scroll Animations (Fade-in on Scroll)

Some pages reveal content as you scroll (via `IntersectionObserver` or CSS fade-in classes). A viewport-only screenshot would capture those elements as invisible or empty.

Use `scrollToEnd: true` or the built-in `scrollToEnd` utility to scroll through the page first:
```typescript
import { capturePages, scrollToEnd } from "@posty5/screenshoter";

// Option 1 — config flag (auto-enabled when fullPage: true)
const resultA = await capturePages(urls, {
  fullPage: true, // scrollToEnd is automatically true
  waitAfterLoad: 1000,
});

// Option 2 — explicit flag
const resultB = await capturePages(urls, {
  scrollToEnd: true,
  waitAfterLoad: 1000,
});

// Option 3 — use the scrollToEnd utility as a beforeScreenshot hook
const resultC = await capturePages(urls, {
  fullPage: true,
  beforeScreenshot: scrollToEnd,
});
```

### Before Screenshot Hook
Dismiss cookie banners, close popups, or otherwise interact with the page before taking the screenshot:

```typescript
await captureWebsite({
  url: "https://example.com",
  beforeScreenshot: async (page) => {
    // Dismiss cookie consent
    const cookieBtn = page.locator('button:has-text("Accept")');
    if (await cookieBtn.isVisible()) {
      await cookieBtn.click();
      await page.waitForTimeout(500);
    }
  },
});
```

### Two-Step Workflow: Collect → Edit → Capture
First collect URLs, manually edit the list, then capture only the URLs you want:

```bash
# Step 1: Collect all URLs
npx screenshoter collect-urls https://posty5.com --file urls.txt

# Step 2: Edit urls.txt — remove any pages you don't want

# Step 3: Capture only the remaining URLs
npx screenshoter capture https://posty5.com --urls-file urls.txt
```

Or programmatically, with separated configs:
```typescript
import { collectUrls, capturePages } from "@posty5/screenshoter";

// Step 1: Collect URLs (CollectUrlsConfig only)
const urls = await collectUrls({
  url: "https://posty5.com",
  strategy: "sitemap",
  maxPages: 500,
  excludePatterns: ["/api/**"],
  outputFile: "./urls.txt",
});

// Step 2: Filter programmatically
const staticPages = urls.filter((u) => !u.includes("/trends/"));

// Step 3: Capture with a separate config (CaptureConfig only)
const result = await capturePages(staticPages, {
  outputDir: "./screenshots",
  format: "png",
  viewport: { width: 1440, height: 710 },
  concurrency: 3,
});
```

## 📂 Output Structure
The output folder mirrors the URL path structure:

```
screenshots/
├── posty5.com/
│   ├── capture.webp                 ← https://posty5.com
│   ├── en/
│   │   ├── capture.webp             ← https://posty5.com/en
│   │   ├── social-media-publisher/
│   │   │   └── capture.webp         ← https://posty5.com/en/social-media-publisher
│   │   ├── qr-code-generator/
│   │   │   └── capture.webp         ← https://posty5.com/en/qr-code-generator
│   │   └── url-shortener/
│   │       └── capture.webp         ← https://posty5.com/en/url-shortener
│   └── ar/
│       ├── capture.webp             ← https://posty5.com/ar
│       └── social-media-publisher/
│           └── capture.webp         ← https://posty5.com/ar/social-media-publisher
```

## 💻 Requirements
- Node.js: >= 18.0.0
- TypeScript: Full type definitions included
- Browser: Chromium (auto-installed via Playwright)
## 📖 Resources
- Website: https://posty5.com
- Dashboard: https://studio.posty5.com
- GitHub: https://github.com/Posty5/Screenshoter
- Support: https://posty5.com/contact-us
## 🆘 Support
- Contact Us: https://posty5.com/contact-us
- GitHub Issues: Report bugs or request features
## 🤝 Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Make your changes
4. Commit your changes: `git commit -m 'Add amazing feature'`
5. Push to the branch: `git push origin feature/amazing-feature`
6. Submit a pull request
## 📄 License
MIT License - see LICENSE file for details.
Made with ❤️ by the Posty5 team
