@nelsonlaidev/scoutly v0.2.0

Scoutly

A fast, lightweight CLI website crawler and SEO analyzer built with Rust. Scoutly is inspired by Scrutiny and helps you analyze websites for broken links, SEO issues, and overall site health.

Features

  • Website Crawling: Recursively crawl websites with configurable depth limits
  • Link Checking: Validate all internal and external links, detect broken links (404s, 500s)
  • SEO Analysis:
    • Check for missing or poorly optimized title tags
    • Validate meta descriptions
    • Detect missing or multiple H1 tags
    • Find images without alt text
    • Identify thin content
  • Configuration Files: Support for JSON, TOML, and YAML configuration files with automatic detection
  • Flexible Reporting: Output results in human-readable text or JSON format
  • Fast & Concurrent: Built with Tokio for async I/O and parallel link checking
  • robots.txt Support: Respects robots.txt rules by default

Prerequisites

Building from source requires a recent Rust toolchain (rustc and cargo), which you can install with rustup.

Optional Development Tools

  • Lefthook - Git hooks manager for running linters and formatters automatically
    # macOS
    brew install lefthook
    
    # After installation, initialize hooks
    lefthook install

Installation

From Source

# Clone the repository
git clone https://github.com/nelsonlaidev/scoutly.git
cd scoutly

# Build the project
cargo build --release

# The binary will be at target/release/scoutly

Usage

Basic Usage

# Crawl a website with default settings (depth: 5, max pages: 200)
scoutly https://example.com

# Specify custom depth and page limits
scoutly https://example.com --depth 3 --max-pages 100

# Enable verbose output to see progress
scoutly https://example.com --verbose

Advanced Options

# Follow external links (by default, only internal links are followed)
scoutly https://example.com --external

# Ignore redirect issues in the report
scoutly https://example.com --ignore-redirects

# Treat URLs with fragment identifiers (#) as unique links
scoutly https://example.com --keep-fragments

# Output results in JSON format
scoutly https://example.com --output json

# Save report to a file
scoutly https://example.com --save report.json

# Combine options
scoutly https://example.com --depth 4 --max-pages 200 --verbose --ignore-redirects --save report.json

Configuration Files

Scoutly supports configuration files in JSON, TOML, or YAML format. Configuration files allow you to set default values for options without having to specify them on the command line every time.

Default Configuration Paths

Scoutly automatically looks for configuration files in the following locations (in order of priority):

  1. Current directory:

    • scoutly.json
    • scoutly.toml
    • scoutly.yaml
    • scoutly.yml
  2. User config directory:

    • Linux/macOS: ~/.config/scoutly/config.{json,toml,yaml,yml}
    • Windows: %APPDATA%\scoutly\config.{json,toml,yaml,yml}

Example Configuration Files

All configuration fields are optional. You can provide only the fields you want to customize.

JSON (scoutly.json):

{
  "depth": 10,
  "max_pages": 500,
  "output": "json",
  "external": true,
  "verbose": true,
  "ignore_redirects": false,
  "keep_fragments": false,
  "rate_limit": 2.0,
  "concurrency": 10,
  "respect_robots_txt": true
}

TOML (scoutly.toml):

depth = 10
max_pages = 500
output = "json"
external = true
verbose = true
ignore_redirects = false
keep_fragments = false
rate_limit = 2.0
concurrency = 10
respect_robots_txt = true

YAML (scoutly.yaml):

depth: 10
max_pages: 500
output: json
external: true
verbose: true
ignore_redirects: false
keep_fragments: false
rate_limit: 2.0
concurrency: 10
respect_robots_txt: true

Using a Custom Config File

You can specify a custom configuration file path using the --config option:

scoutly https://example.com --config ./my-config.json

Configuration Priority

Command-line arguments always take precedence over configuration file values. For example:

# If scoutly.json sets depth to 10, this command will use depth 15
scoutly https://example.com --depth 15

This allows you to set sensible defaults in your config file while still being able to override them when needed.

Command Line Options

Usage: scoutly [OPTIONS] <URL>

Arguments:
  <URL>  The URL to start crawling from

Options:
  -d, --depth <DEPTH>              Maximum crawl depth (default: 5)
  -m, --max-pages <MAX_PAGES>      Maximum number of pages to crawl (default: 200)
  -o, --output <OUTPUT>            Output format: text or json [default: text]
  -s, --save <SAVE>                Save report to file
  -e, --external                   Follow external links
  -v, --verbose                    Verbose output
      --ignore-redirects           Ignore redirect issues in the report
      --keep-fragments             Treat URLs with fragment identifiers (#) as unique links
  -r, --rate-limit <RATE_LIMIT>    Rate limit for requests per second
  -c, --concurrency <CONCURRENCY>  Number of concurrent requests (default: 5)
      --respect-robots-txt         Respect robots.txt rules (default: true)
      --config <CONFIG>            Path to configuration file (JSON, TOML, or YAML)
  -h, --help                       Print help

Example Output

Text Report

================================================================================
Scoutly - Crawl Report
================================================================================

Start URL: https://example.com
Timestamp: 2025-11-03T16:05:29.911833+00:00

Summary
  Total Pages Crawled: 15
  Total Links Found:   127
  Broken Links:        2
  Errors:              3
  Warnings:            8
  Info:                5

Pages with Issues

  URL: https://example.com/about
    Status: 200
    Depth:  1
    Title:  About Us
    Issues:
      [WARN ] Page is missing a meta description
      [WARN ] 3 image(s) missing alt text

  URL: https://example.com/contact
    Status: 200
    Depth:  1
    Title:  Contact
    Issues:
      [ERROR] Broken link: https://example.com/old-page (HTTP 404)

JSON Report

Use --output json to get machine-readable output suitable for integration with other tools or CI/CD pipelines.
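As a rough illustration, a CI step could parse a saved report and fail the build when broken links are found. The field names below ("summary", "broken_links") are assumptions about the schema, not documented output; check your own report.json before relying on them. The sketch uses serde_json.

// Hypothetical CI helper: parse a saved report and fail the build if any
// broken links were reported. The "summary"/"broken_links" fields are
// assumptions; inspect your report.json for the real schema.
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let raw = fs::read_to_string("report.json")?;
    let report: serde_json::Value = serde_json::from_str(&raw)?;

    let broken = report["summary"]["broken_links"].as_u64().unwrap_or(0);
    if broken > 0 {
        eprintln!("{broken} broken link(s) reported");
        std::process::exit(1);
    }
    Ok(())
}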

How It Works

  1. Crawling: Starting from the provided URL, Scoutly fetches each page and extracts all links from various HTML elements (anchor tags, iframes, media elements, embeds, etc.)
  2. Link Discovery: Internal links (same domain) are queued for crawling based on depth limits
  3. Link Validation: All discovered links are checked asynchronously for HTTP status codes
  4. SEO Analysis: Each page is analyzed for common SEO issues
  5. Report Generation: Results are compiled into a comprehensive report
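To make the flow concrete, here is a conceptual sketch of such a crawl loop: a breadth-first queue bounded by the depth and page limits. The fetch_page and extract_links functions are hypothetical stand-ins, not Scoutly's real internals.

// Conceptual sketch of the crawl loop; `fetch_page` and `extract_links`
// are hypothetical stand-ins, not Scoutly's real internals.
use std::collections::{HashSet, VecDeque};

fn fetch_page(url: &str) -> Option<String> {
    // Placeholder for an HTTP GET; returns the page body on success.
    println!("GET {url}");
    Some(String::new())
}

fn extract_links(_html: &str) -> Vec<String> {
    // Placeholder for HTML parsing; returns the links found on the page.
    Vec::new()
}

fn crawl(start: &str, max_depth: usize, max_pages: usize) {
    let mut queue = VecDeque::from([(start.to_string(), 0usize)]);
    let mut seen: HashSet<String> = HashSet::from([start.to_string()]);
    let mut crawled = 0;

    while let Some((url, depth)) = queue.pop_front() {
        if crawled >= max_pages {
            break;
        }
        let Some(html) = fetch_page(&url) else { continue };
        crawled += 1;

        // Only links within the depth limit are queued for further crawling;
        // every discovered link is still validated for its status code.
        if depth < max_depth {
            for link in extract_links(&html) {
                if seen.insert(link.clone()) {
                    queue.push_back((link, depth + 1));
                }
            }
        }
    }
}

fn main() {
    crawl("https://example.com", 5, 200);
}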

Link Extraction

Scoutly extracts links from multiple HTML elements:

  • <a href> - Standard hyperlinks
  • <iframe src> - Embedded content
  • <video src> and <source src> - Video content
  • <audio src> - Audio content
  • <embed src> - Embedded plugins
  • <object data> - Embedded objects
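For illustration, the snippet below shows one way to collect the same attributes with the scraper crate (add scraper to Cargo.toml); Scoutly's actual extraction code may differ.

// Illustrative only: one way to pull these attributes with the `scraper`
// crate. Not necessarily how Scoutly does it internally.
use scraper::{Html, Selector};

fn extract_links(html: &str) -> Vec<String> {
    // Each (CSS selector, attribute) pair mirrors an element listed above.
    let targets = [
        ("a[href]", "href"),
        ("iframe[src]", "src"),
        ("video[src]", "src"),
        ("source[src]", "src"),
        ("audio[src]", "src"),
        ("embed[src]", "src"),
        ("object[data]", "data"),
    ];

    let document = Html::parse_document(html);
    let mut links = Vec::new();
    for (css, attr) in targets {
        let selector = Selector::parse(css).expect("static selectors are valid");
        for element in document.select(&selector) {
            if let Some(value) = element.value().attr(attr) {
                links.push(value.to_string());
            }
        }
    }
    links
}

fn main() {
    let html = r#"<a href="/about">About</a> <iframe src="/widget"></iframe>"#;
    println!("{:?}", extract_links(html));
}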

SEO Checks Performed

  • Title Tag

    • Missing title
    • Title too short (< 50 characters, recommended: 50-60)
    • Title too long (> 60 characters, recommended: 50-60)
  • Meta Description

    • Missing meta description
    • Description too short (< 150 characters, recommended: 150-160)
    • Description too long (> 160 characters, recommended: 150-160)
  • Headings

    • Missing H1 tag
    • Multiple H1 tags
  • Images

    • Missing alt attributes
  • Content

    • Thin content detection (flags pages with fewer than 5 content indicators)
  • Links

    • Broken links (4xx and 5xx status codes)
    • Redirect detection (3xx status codes)
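As an example of how a rule like the title check can be expressed, here is a minimal sketch using the thresholds listed above; it is not Scoutly's actual rule code.

// Minimal sketch of the title-tag check using the thresholds above;
// not Scoutly's actual rule code.
fn check_title(title: Option<&str>) -> Option<String> {
    let title = title.map(str::trim).unwrap_or("");
    let len = title.chars().count();
    if title.is_empty() {
        Some("Page is missing a title tag".to_string())
    } else if len < 50 {
        Some(format!("Title too short: {len} characters (recommended: 50-60)"))
    } else if len > 60 {
        Some(format!("Title too long: {len} characters (recommended: 50-60)"))
    } else {
        None
    }
}

fn main() {
    println!("{:?}", check_title(None));
    println!("{:?}", check_title(Some("Contact")));
}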

Performance

  • Asynchronous I/O for fast crawling
  • Concurrent link checking
  • Configurable limits to prevent excessive resource usage
  • Typical crawl speed: 10-20 pages per second (depending on target site and network)
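A common way to get this kind of bounded concurrency with Tokio is to map URLs to futures and drive them with buffer_unordered from the futures crate. The sketch below (using reqwest for the requests, with tokio, futures, and reqwest added to Cargo.toml) illustrates the pattern rather than Scoutly's exact implementation.

// Sketch of bounded-concurrency link checking with Tokio, futures, and reqwest;
// it shows the pattern, not Scoutly's exact code.
use futures::stream::{self, StreamExt};

#[tokio::main]
async fn main() {
    let urls = vec![
        "https://example.com/".to_string(),
        "https://example.com/about".to_string(),
    ];
    let client = reqwest::Client::new();
    let concurrency = 5; // mirrors the --concurrency default

    let results: Vec<(String, Option<u16>)> = stream::iter(urls)
        .map(|url| {
            let client = client.clone();
            async move {
                // A failed request yields None; otherwise record the status code.
                let status = client
                    .get(url.as_str())
                    .send()
                    .await
                    .ok()
                    .map(|resp| resp.status().as_u16());
                (url, status)
            }
        })
        .buffer_unordered(concurrency)
        .collect()
        .await;

    for (url, status) in results {
        println!("{url} -> {status:?}");
    }
}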

Limitations (Basic Version)

  • No JavaScript rendering (only parses initial HTML)
  • Basic content analysis (no detailed text analysis)
  • No authentication support
  • No sitemap generation (planned for future versions)

Future Enhancements

  • JavaScript rendering with headless browser support
  • Sitemap generation (XML)
  • Authentication support
  • More advanced SEO checks (keyword density, structured data)
  • Progress bar for long-running crawls
  • HTML validation
  • Accessibility checks (WCAG compliance)
  • PDF and document crawling

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Donation

If you find this project helpful, consider supporting me by sponsoring the project.

License

This project is open source and available under the MIT License.