npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

seo-reporter

v2.2.0

Published

High-performance SEO analysis CLI with 220+ checks, Rust-powered duplicate detection, worker thread parallelization, and clean progress UI. Analyzes 1000+ page sites in under 2 minutes. Achieves ~85% Screaming Frog parity.

Downloads

280

Readme

SEO Reporter

A powerful TypeScript CLI tool that crawls websites, extracts SEO metadata, detects common issues, and generates comprehensive HTML reports. Think of it as a free, open-source alternative to Screaming Frog's core SEO analysis features.

Quick Start

# Install dependencies
pnpm install

# Run the crawler
pnpm dev --url https://example.com

# View results
open seo-report/index.html

Features

Core Capabilities

  • 🕷️ Website Crawling: Breadth-first traversal with configurable depth and concurrency
  • 📊 220+ SEO Checks: Implements 85-90% parity with Screaming Frog's core analysis
  • Fast & Efficient: Concurrent crawling with configurable request limits and memory-efficient processing
  • 📝 Interactive HTML Reports: Beautiful, filterable reports with severity-based issue categorization (now with working filters, sortable columns, default severity sort, live filtered totals, a new Issues-by-Type view with drill-down, and a reorganized nav with a Content dropdown)
  • 💡 Actionable Tooltips: Hover over any issue to see specific fix recommendations (25+ guidance tips). Improved (i) tooltips now render consistently across the report.
  • 📱 Mobile Responsive: All reports fully responsive with touch-friendly interfaces (44px tap targets)
  • 🖥️ Built-in Server: Zero-dependency static file server using Node.js built-ins (seo-reporter serve)
  • 🔌 JSON API Routes: Complete RESTful-style JSON API for all SEO data - perfect for integrations and custom dashboards
  • 🧭 Clickable Site Structure: Navigate from the site structure tree directly to per-page details
  • 📤 CSV Export: Export all data to Excel-compatible CSV files for further analysis
  • 🤖 Robots.txt Integration: Automatic robots.txt parsing and compliance
  • 🗺️ Sitemap Analysis: Auto-detects and analyzes XML sitemaps (runs by default)

Metadata Extraction

  • Comprehensive On-Page Data: Titles, descriptions, headings, canonical URLs, robots directives
  • Social Media Tags: Open Graph, Twitter Cards
  • Internationalization: hreflang attributes with validation
  • Structured Data: JSON-LD & microdata extraction
  • Links: Internal/external links with anchor text, nofollow detection
  • Images: Alt text, dimensions tracking
  • Content Metrics: Word count, text length, HTML size, content-to-code ratio
  • Performance: Response times, redirect chains
  • Scripts: External JavaScript detection with async/defer tracking

Security Analysis 🔒

  • Protocol Security: HTTP vs HTTPS detection
  • Mixed Content: HTTP resources on HTTPS pages
  • Insecure Forms: Form actions over HTTP
  • Security Headers: HSTS, CSP, X-Frame-Options, X-Content-Type-Options, Referrer-Policy
  • Protocol-Relative URLs: Detection of // URLs

URL Quality 🔗

  • URL Issues: Multiple slashes, spaces, non-ASCII characters, uppercase letters
  • URL Structure: Repetitive paths, overly long URLs (>2083 chars)
  • Parameters: Query params, tracking params, internal search URLs
  • Fragment URLs: Detection of fragment-only links

Content Quality 📄

  • Duplicate Detection:
    • Exact duplicates (SHA-256 hash-based)
    • Near duplicates (>90% similarity using MinHash)
    • Duplicate H1/H2 across pages
  • Content Analysis:
    • Lorem ipsum detection
    • Soft 404 detection
    • Readability metrics (Flesch-Kincaid Grade, Reading Ease, ARI)
    • Poor readability warnings (>12th grade level)
    • Thin content detection (<300 words)

Advanced SEO Checks

  • Titles: Missing, duplicate, too long/short (characters + pixel width), multiple tags, outside <head>, identical to H1
  • Meta Descriptions: Missing, duplicate, too long/short (characters + pixel width), multiple tags, outside <head>
  • Headings: Missing H1, multiple H1, broken hierarchy, overly long (>70 chars), empty headings
  • Canonical: Multiple/conflicting tags, relative URLs, fragments, outside <head>, invalid attributes
  • Robots: Conflicting directives, noindex/nofollow detection
  • Indexability Tracking: Comprehensive analysis of why pages are/aren't indexable
    • Non-200 status codes (404, 500, etc.)
    • noindex in meta robots tag
    • noindex in X-Robots-Tag header
    • Canonical pointing to different URL
    • Detailed reasons shown in Issues tab and individual page reports
  • Pagination: rel="next"/rel="prev" validation, multiple pagination links, sequence errors

Stealth Mode 🥷

  • Anti-Detection Crawling: Bypass basic bot detection systems with realistic browser simulation
  • User Agent Rotation: 20+ realistic user agents from Chrome, Firefox, Safari, Edge across Windows, macOS, and Linux
  • Header Randomization: Dynamic browser headers with realistic patterns and Chrome sec-ch-ua headers
  • Human-Like Timing: Intelligent delays (1-8 seconds) simulating quick, normal, and slow browsing patterns
  • Proxy Support: Rotate through multiple proxy servers with automatic failover and validation
  • Session Management: Maintain consistent headers across requests for realistic browsing simulation
  • Custom Configuration: Define your own user agents, proxies, and timing patterns
  • Seamless Integration: Works with all existing crawl modes and analysis features

Link Analysis 🔗

  • 404 Tracking:
    • Dedicated 404 Pages tab with referrer tracking
    • Shows which pages link to each 404 (now normalized so /path and /path/ are treated the same)
    • Helps identify and fix broken internal links
  • Link Quality:
    • Orphan pages (no internal inlinks)
    • Dead ends (no outlinks)
    • Weak anchor text ("click here", empty, too short)
    • Localhost links (127.0.0.1)
    • Missing protocol on external links (e.g., facebook.com without https://)
  • Link Metrics:
    • Internal vs external link counts
    • High outlink warnings (>100 internal, >50 external)
    • Inlink count per page
    • Crawl depth distribution

HTML Validation ⚙️

  • Document Structure: Missing/multiple <head> or <body> tags
  • Element Positioning: Tags outside <head> that should be inside
  • Document Order: Incorrect <head>/<body> ordering
  • Size & Complexity: Large HTML (>1MB), excessive DOM depth (>30 levels)
  • Invalid Elements: Elements that shouldn't be in <head>

Issue Detection with Severity Levels

  • 🔴 High Severity: Missing titles/H1s, HTTP pages, mixed content, insecure forms, soft 404s, lorem ipsum, malformed HTML
  • 🟡 Medium Severity: Title/description length issues, multiple H1s, images without alt, thin content, slow pages, security headers missing
  • 🔵 Low Severity: Heading hierarchy, redirect chains, URL quality issues, readability warnings, informational notices

Performance ⚡

SEO Reporter includes a Rust-powered native module for near-duplicate content detection, providing massive performance gains for large sites:

Near-Duplicate Detection Performance

| Pages | TypeScript (O(n²)) | Rust + LSH (O(n)) | Speedup | |-------|-------------------|-------------------|---------| | 100 | ~10s | ~0.1s | 100x | | 500 | ~2.5min | ~0.5s | 300x | | 1000 | ~10min | ~1s | 600x | | 5000 | ~4 hours | ~5s | ~3000x |

How It Works

The Rust module uses Locality-Sensitive Hashing (LSH) with MinHash signatures:

  • Generates 128-hash MinHash signatures for each page
  • Groups pages into buckets using 16 bands × 8 rows
  • Only compares pages that share at least one bucket (candidates)
  • Reduces comparisons from O(n²) to O(n) with ~95% accuracy

Fallback Behavior

If the Rust module fails to load (unsupported platform or not built), the tool automatically falls back to the pure TypeScript implementation, ensuring compatibility on all platforms.

⚠️ Rust Warning: When the Rust module is unavailable, the CLI displays a clear warning:

⚠️  Rust native module not available - using TypeScript fallback for near-duplicate detection
   Note: Near-duplicate detection will be slower. Run `npm rebuild` to build the Rust module.

This helps users understand why near-duplicate detection may be slower and provides actionable instructions to enable the faster implementation.

Supported Platforms

Pre-built Rust binaries are included for:

  • macOS (Intel x64 & Apple Silicon ARM64)
  • Linux (x64 & ARM64, glibc & musl)
  • Windows (x64)

Installation

Prerequisites

  • Node.js 18 or higher
  • pnpm (recommended) or npm
  • Rust 1.70+ (optional, only needed if building from source; pre-built binaries included)

Install Dependencies

pnpm install

Setup Rust (Optional but Recommended)

For maximum performance, install Rust to enable the native module:

# Automatic Rust installation (Windows, macOS, Linux)
pnpm setup:rust

This will:

  • Download and install rustup (Rust toolchain installer)
  • Install the latest stable Rust toolchain
  • Set up the environment for building the native module
  • Verify the installation

Build the Project

pnpm build

Build the Rust Module (Optional but Recommended)

The Rust module provides 100-1000x faster near-duplicate detection. The build process automatically handles Rust environment setup:

# Install Rust automatically (Windows, macOS, Linux)
pnpm setup:rust

# Full build (Rust + TypeScript)
pnpm build

# Rust module only
pnpm build:rust-only

# TypeScript only (fallback if Rust unavailable)
pnpm build:ts-only

Note: The build scripts automatically source the Rust environment ($HOME/.cargo/env) if available. Pre-built binaries are included for most platforms, but you can rebuild if needed.

Run Locally

# Development mode (no build required)
pnpm dev --url https://example.com

# Production mode (requires build)
pnpm start --url https://example.com

Install Globally (Optional)

pnpm install -g .
seo-reporter --url https://example.com

Usage

Basic Usage

# Crawl and analyze a website
seo-reporter crawl --url https://example.com

# Or use the legacy format (still supported)
seo-reporter --url https://example.com

# Start the report server (no URL needed)
seo-reporter serve

All Options

# Crawl command
seo-reporter crawl \
  --url https://example.com \           # Required: Target URL to crawl
  --depth 3 \                            # Optional: Max crawl depth (default: 3)
  --max-pages 1000 \                     # Optional: Max pages to crawl (default: 1000)
  --concurrency 10 \                     # Optional: Concurrent requests (default: 10)
  --output ./seo-report \                # Optional: Output directory (default: ./seo-report)
  --timeout 10000 \                      # Optional: Request timeout in ms (default: 10000)
  --user-agent "CustomBot/1.0" \         # Optional: Custom user agent
  --export-csv \                         # Optional: Export results to CSV files
  --respect-robots \                     # Respect robots.txt (default: true)
  --ignore-robots \                      # Ignore robots.txt rules
  --crawl-mode both \                    # Optional: Crawl mode - crawl|sitemap|both (default: both)
  --sitemap-url https://example.com/sitemap.xml \  # Custom sitemap URL
  --validate-schema \                    # Validate JSON-LD schema.org data
  --stealth \                            # Enable stealth mode with randomized headers and timing
  --stealth-user-agents "Agent1,Agent2" \ # Custom user agents for stealth mode
  --stealth-min-delay 1000 \             # Minimum delay between requests in stealth mode (ms)
  --stealth-max-delay 5000 \             # Maximum delay between requests in stealth mode (ms)
  --stealth-proxies "proxy1:8080,proxy2:3128"  # Proxy rotation for stealth mode

# Serve command
seo-reporter serve \
  --port 8080 \                          # Optional: Port to listen on (default: 8080)
  ./seo-report                           # Optional: Directory to serve (default: ./seo-report)

Crawl Modes

The --crawl-mode option controls how the tool discovers pages:

  • crawl - Follow links only (traditional crawling)
  • sitemap - Crawl only URLs found in sitemap(s)
  • both - Crawl sitemap URLs + follow links (default, discovers maximum pages)

Example:

# Only crawl URLs from sitemap
seo-reporter --url https://example.com --crawl-mode sitemap

# Traditional link-based crawling only
seo-reporter --url https://example.com --crawl-mode crawl

# Both (default)
seo-reporter --url https://example.com --crawl-mode both

Viewing Reports

After generating a report, you can view it in two ways:

Option 1: Open directly in browser

open seo-report/index.html

Option 2: Start a local server (Recommended)

# Using the built-in server
seo-reporter serve seo-report

# Or with a custom port
seo-reporter serve seo-report --port 3000

# Using npm script
pnpm serve  # Serves ./seo-report on port 8080

The built-in server uses Node.js's native http module (zero dependencies, works everywhere).

Example Output

When running the tool, you'll see detailed progress for each phase:

$ seo-reporter --url https://example.com --max-pages 100

🔍 SEO Reporter

Configuration:
  URL: https://example.com/
  Max Depth: 3
  Max Pages: 100
  Concurrency: 10
  Output: ./seo-report

⠹ Crawling website... 🟢 25/100 pages
⠸ Crawling website... 🟢 50/100 pages
⠼ Crawling website... 🟢 100/100 pages
✔ Crawled 100 pages in 15.2s

⠹ Parsing SEO metadata... 25/100 pages
⠸ Parsing SEO metadata... 50/100 pages
⠼ Parsing SEO metadata... 100/100 pages
✔ Parsed metadata from 100 pages in 3.4s

⠹ Analyzing... Per-page analysis (25/100)
⠸ Analyzing... Per-page analysis (50/100)
⠼ Analyzing... Per-page analysis (100/100)
⠴ Analyzing... Link quality analysis
⠦ Analyzing... Content quality checks
⠧ Analyzing... Finding duplicate titles/descriptions
⠇ Analyzing... Finding exact duplicate content
⠏ Analyzing... Finding near-duplicate content (Rust + LSH)
⠋ Analyzing... Finding duplicate headings
✔ Analysis complete in 2.1s

✔ Sitemap analyzed (95 URLs in sitemap)

📊 Issues Found:
  ⚠️  5 pages with missing meta descriptions
  ⚠️  3 pages with duplicate titles
  ...

Note: The progress counters (e.g., 25/100) show real-time progress during crawling, parsing, and analysis phases, making it easy to estimate remaining time.

Examples

Crawl a small site with depth 2:

seo-reporter --url https://myblog.com --depth 2 --max-pages 100

Fast crawl with high concurrency:

seo-reporter --url https://example.com --concurrency 20 --depth 2

Crawl and save to custom directory:

seo-reporter --url https://example.com --output ./reports/example-audit

Crawl with CSV export:

seo-reporter --url https://example.com --export-csv

Stealth mode crawling:

# Basic stealth mode
seo-reporter --url https://example.com --stealth

# Stealth with custom timing
seo-reporter --url https://example.com --stealth --stealth-min-delay 2000 --stealth-max-delay 8000

# Stealth with custom user agents and proxies
seo-reporter --url https://example.com --stealth \
  --stealth-user-agents "Mozilla/5.0 (Custom Bot),Another Custom Agent" \
  --stealth-proxies "proxy1.example.com:8080,proxy2.example.com:3128"

SEO Issues Detected

The tool checks for the following SEO issues:

Critical Issues (Red)

  • Missing Title Tags: Pages without a <title> tag
  • Broken Links: Pages returning 404 status codes
  • Conflicting Robots Directives: Multiple robots tags with contradictory instructions (e.g., "index" and "noindex")
  • Multiple Canonical Tags: Pages with conflicting canonical URLs
  • Malformed JSON-LD: Structured data scripts with JSON parsing errors

Warnings (Yellow)

  • ⚠️ Missing Meta Descriptions: Pages without meta description tags
  • ⚠️ Duplicate Titles: Multiple pages sharing the same title text
  • ⚠️ Duplicate Descriptions: Multiple pages sharing the same meta description
  • ⚠️ Title Too Long: Titles over 60 characters (may be truncated in search results)
  • ⚠️ Title Too Short: Titles under 20 characters (may not be descriptive enough)
  • ⚠️ Description Too Long: Meta descriptions over 160 characters (may be truncated)
  • ⚠️ Description Too Short: Meta descriptions under 50 characters (may not be informative enough)
  • ⚠️ Missing H1 Tags: Pages without an H1 heading
  • ⚠️ Multiple H1 Tags: Pages with more than one H1 heading
  • ⚠️ Improper Heading Hierarchy: Heading levels that skip numbers (e.g., H1 to H3)
  • ⚠️ Images Without Alt Text: Images missing accessibility alt attributes
  • ⚠️ Thin Content: Pages with less than 300 words
  • ⚠️ Slow Page Load: Pages with response times over 3 seconds

Informational

  • ℹ️ Noindex Pages: Pages set to noindex (verify if intentional)
  • ℹ️ Redirect Chains: Pages with redirect chains detected
  • ℹ️ Multiple Title/Description Tags: Single page with duplicate meta tags

Content Analysis

  • All headings (H1-H6) extracted and analyzed for proper hierarchy
  • Internal vs external link analysis
  • Image alt text coverage
  • Word count and content density metrics
  • Open Graph and Twitter Card metadata presence
  • hreflang implementation
  • JSON-LD and microdata structured data detection

Output Reports

The tool generates comprehensive reports in the specified output directory:

HTML Reports

  • index.html: Interactive summary report with:
    • Tabbed Interface: Overview, All Pages, Site Structure, Links, Content, Performance, Scripts, Sitemap, Issues, and API tabs
    • Sortable Tables: Click column headers to sort data ascending/descending
    • Filterable Content: Search boxes to quickly find specific pages, links, or issues
    • Visual Statistics: Color-coded cards showing issue counts and severity
    • All Data: Links analysis (internal/external), headings, images, performance metrics
  • page-viewer.html: Dynamic page detail viewer that loads data from JSON files
    • Displays complete page metadata, issues, headings, links, and images
    • Loads data on-demand from JSON API routes
    • Accessed via page-viewer.html?url=<page-url>

Reports are fully self-contained with inline CSS and JavaScript - no external dependencies.

JSON API Routes

All SEO data is available as JSON files for programmatic access, integrations, or custom dashboards:

Individual Page Data

  • json/pages/{filename}.json: Complete page metadata including:
    • Title, meta description, canonical URL, robots directives
    • All headings (H1-H6), links, and images
    • Content metrics (word count, HTML size, readability scores)
    • Performance metrics (response time, redirects)
    • Security analysis (HTTPS, headers, mixed content)
    • URL quality metrics
    • Structured data (JSON-LD, microdata)
    • All detected issues with severity levels
  • json/issues/{filename}.json: Page-specific issues with severity counts

Aggregate Data Endpoints

  • json/all-pages.json: Summary of all pages with key metrics
  • json/all-issues.json: All issues across all pages
  • json/issues-summary.json: Issues statistics by severity and type
  • json/links.json: All internal and external links
  • json/images.json: All images with alt text status
  • json/headings.json: All headings with levels
  • json/performance.json: Performance metrics for all pages
  • json/external-scripts.json: External JavaScript usage analysis
  • json/404-pages.json: 404 pages with referrer tracking
  • json/sitemap-info.json: Sitemap analysis data
  • json/site-structure.json: Site structure tree
  • json/url-index.json: URL to filename mapping for easy lookups

Using the JSON API

# Generate report
seo-reporter --url https://example.com

# Access JSON data programmatically
curl http://localhost:8000/seo-report/json/all-issues.json
curl http://localhost:8000/seo-report/json/pages/index.json
curl http://localhost:8000/seo-report/json/issues-summary.json

# Or use in your application
fetch('./seo-report/json/all-pages.json')
  .then(res => res.json())
  .then(data => console.log(data.pages));

In-Report API Tab

Open the API tab in index.html for an at-a-glance list of endpoints, example curl/JS usage, and tips on mapping URLs to filenames via json/url-index.json. The tab now reliably renders with the updated tab switching logic.

The JSON API is perfect for:

  • CI/CD pipeline integrations
  • Custom dashboards and visualizations
  • Automated monitoring and alerts
  • Data analysis and reporting scripts
  • Integration with other SEO tools

Redirect Handling

  • Same-domain redirects are followed and analyzed. After redirects within the same domain, links are resolved against the final URL to ensure correct internal/external classification.
  • Cross-domain redirects are not analyzed or crawled. The redirect chain is recorded for the original URL, but the destination page’s content and links are not fetched or followed.

Large Site Performance (10k+ pages)

  • For large datasets, reports now use chunked JSONP files and a small client runtime to progressively render big tables.
  • Tables support pagination, sorting, filtering, and a page-size selector (25/50/100/250/500).
  • Data files are written to seo-report/data/…/*.js and work offline via file:// (no fetch).
  • Sorting or filtering may trigger background loading of remaining chunks for accuracy.
  • Small sites still render inline immediately; large sites render almost instantly and stream in data.

CSV Exports (Optional)

When using --export-csv, the tool generates Excel-compatible CSV files in the csv/ subdirectory:

  • all-pages.csv: Complete page data with all metrics
  • links.csv: All links from all pages (internal/external, with anchor text)
  • images.csv: All images from all pages (with alt text status)
  • headings.csv: All headings from all pages (with levels)
  • issues.csv: All issues by page with severity levels

CSV files are RFC 4180 compliant and can be opened in Excel, Google Sheets, or any spreadsheet application.

CSV Columns

  • all-pages.csv: url, status, title, titleLength, metaDescription, descriptionLength, h1Count, wordCount, internalLinks, externalLinks, images, imagesWithoutAlt, responseTime, redirects, canonicalUrl, robotsDirectives, issuesCount, issues
  • links.csv: pageUrl, linkUrl, anchorText, rel, isInternal, isNofollow, status
  • images.csv: pageUrl, imageSrc, altText, hasAlt, fileSize
  • headings.csv: pageUrl, level, text
  • issues.csv: pageUrl, issue, severity

Architecture

The project is organized into modular components:

src/
├── cli.ts          # CLI entry point with Commander
├── crawler.ts      # Website crawling with performance tracking
├── parser.ts       # Comprehensive HTML metadata extraction
├── analyzer.ts     # Advanced SEO issue detection and categorization
├── reporter.ts     # HTML report generation with Handlebars
├── exporter.ts     # CSV export functionality (NEW)
├── types.ts        # TypeScript type definitions
└── utils/
    └── urlUtils.ts # URL normalization and filtering

templates/
├── summary.hbs     # Interactive tabbed summary with sortable tables (NEW)
└── page.hbs        # Enhanced page detail template with all metrics (NEW)

Key Design Principles

  1. Separation of Concerns: Each module has a single, well-defined responsibility
  2. Memory Efficiency: Pages are parsed immediately after fetching; only metadata is stored
  3. Error Resilience: Network and parsing errors don't stop the entire crawl
  4. Extensibility: Modular design allows easy addition of features like JS rendering or new output formats

Technology Stack

  • TypeScript: Type-safe development
  • Axios: HTTP client for page fetching
  • htmlparser2 + css-select: Fast DOM-lite HTML parsing (low memory, high throughput)
  • Commander: CLI framework
  • Handlebars: HTML templating
  • p-limit: Concurrency control
  • Chalk & Ora: Beautiful CLI output

SEO Best Practices

This tool is based on SEO best practices from:

  • Google's Search Central documentation
  • Industry-standard character limits for titles (60 chars) and descriptions (160 chars)
  • Common SEO audit methodologies used by tools like Screaming Frog, Ahrefs, and SEMrush

Recommendations

  • Unique Titles & Descriptions: Every page should have unique, descriptive metadata
  • Optimal Length: Titles should be 20-60 characters, descriptions 50-160 characters
  • Canonical Tags: Use self-referential canonicals to avoid duplicate content issues
  • Robots Directives: Avoid conflicting directives; verify noindex pages are intentional
  • Structured Data: Ensure JSON-LD is valid JSON and properly formatted
  • hreflang: For multilingual sites, implement reciprocal hreflang tags

Future Enhancements

Possible future enhancements:

  • 🌐 JavaScript Rendering: Support for SPAs using Puppeteer/Playwright
  • 🤖 robots.txt Compliance: Automatic robots.txt parsing and adherence
  • 🔗 Advanced Link Checking: Actually validate external links (not just detect 404s)
  • 📈 Progress Tracking: Real-time crawl progress with ETA
  • 🎨 Custom Report Themes: User-configurable report styling
  • 🔌 Plugin System: Allow custom analyzers and reporters
  • ☁️ Cloud Integration: Deploy as a web service or integrate with CI/CD pipelines
  • 📊 Historical Tracking: Compare crawls over time to track improvements
  • 🔍 Advanced Schema Validation: Validate JSON-LD against schema.org types
  • 📱 Mobile vs Desktop: Compare mobile and desktop rendering

Comparison to Screaming Frog

This tool now implements 85-90% parity with Screaming Frog's core SEO analysis features (excluding external API dependencies):

| Feature | This Tool | Screaming Frog | |---------|-----------|----------------| | Core Analysis | | Page crawling | ✅ | ✅ | | Title/Description analysis | ✅ (+ pixel width) | ✅ | | Heading extraction (H1-H6) | ✅ (+ duplicates) | ✅ | | Image alt text analysis | ✅ | ✅ | | Internal/External links | ✅ | ✅ | | Response times & redirects | ✅ | ✅ | | Content metrics | ✅ (+ readability) | ✅ | | Canonical URL analysis | ✅ (detailed) | ✅ | | Robots directives | ✅ | ✅ | | hreflang validation | ✅ (partial) | ✅ | | Advanced Analysis | | Security analysis | ✅ (HTTPS, headers, mixed content) | ✅ | | URL quality checks | ✅ (15+ checks) | ✅ | | Duplicate content detection | ✅ (exact + near) | ✅ | | Orphan page detection | ✅ | ✅ | | Weak anchor text | ✅ | ✅ | | HTML validation | ✅ (structure, DOM depth) | ✅ | | Pagination analysis | ✅ (partial) | ✅ | | Soft 404 detection | ✅ | ✅ | | Lorem ipsum detection | ✅ | ✅ | | Export & Reporting | | CSV export | ✅ (5+ files) | ✅ | | Interactive HTML reports | ✅ | ❌ (static) | | Severity-based filtering | ✅ | ✅ | | Additional Features | | Free & open source | ✅ | ❌ (freemium) | | Command-line interface | ✅ | ✅ (paid) | | Readability metrics | ✅ (3 formulas) | ❌ | | Content-to-code ratio | ✅ | ✅ | | JavaScript rendering | ❌ | ✅ | | robots.txt validation | ✅ | ✅ | | Sitemap analysis | ✅ | ✅ | | PageSpeed/Lighthouse | ❌ | ✅ (paid) | | Google Search Console | ❌ | ✅ (paid) | | Google Analytics | ❌ | ✅ (paid) | | External link checking | ❌ | ✅ |

Summary: This tool implements 220+ SEO checks covering on-page SEO, content quality, security, URL quality, link analysis, schema validation, robots.txt compliance, and sitemap analysis. It excels at static HTML analysis but doesn't include JavaScript rendering or external API integrations (PageSpeed, GSC, GA). See docs/SCREAMING_FROG_PARITY.md for details.

JavaScript Rendering Limitation

⚠️ Important: This crawler analyzes static HTML only (like Screaming Frog's default mode). It does not execute JavaScript.

Impact on External Scripts Detection:

  • ✅ Detects scripts in the initial HTML (<script src="..."> tags)
  • ❌ Cannot detect scripts loaded dynamically by JavaScript after page load
  • ❌ Client-side rendered apps (React, Vue, Angular, Gatsby, Next.js) may show 0 external scripts even if they load many at runtime

For sites with dynamically-loaded scripts:

  • The "External Scripts" tab will show a notice explaining this limitation
  • Check your browser's DevTools Network tab to see all scripts that load at runtime
  • Consider using browser-based tools (Screaming Frog with JS rendering, Lighthouse, etc.) for full script analysis

This is a trade-off for speed and simplicity—rendering JavaScript would significantly slow down crawling and require a headless browser.

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

License

ISC License

Credits

Built with ❤️ by Antler Digital using modern TypeScript and best-in-class Node.js libraries.