
@nelsonlaidev/scoutly

v0.4.0


Scoutly

A fast, lightweight CLI website crawler and SEO analyzer built with Rust. Scoutly is inspired by Scrutiny and helps you analyze websites for broken links, SEO issues, and overall site health.

Features

  • Website Crawling: Recursively crawl websites with configurable depth limits
  • Link Checking: Validate all internal and external links, detect broken links (404s, 500s), and record transport failures explicitly
  • SEO Analysis:
    • Check for missing or poorly optimized title tags
    • Validate meta descriptions
    • Detect missing or multiple H1 tags
    • Find images without alt text
    • Identify thin content
  • Configuration Files: Support for JSON, TOML, and YAML configuration files with automatic detection
  • Default TUI + CLI: Launch an interactive terminal UI by default, or force the text/JSON CLI when needed
  • Fast & Concurrent: Built with Tokio for async I/O and parallel link checking
  • robots.txt Support: Respects robots.txt rules by default

Prerequisites

Optional Development Tools

  • Lefthook - Git hooks manager for running linters and formatters automatically

    # macOS
    brew install lefthook
    
    # After installation, initialize hooks
    lefthook install

Installation

From Source

# Clone the repository
git clone https://github.com/nelsonlaidev/scoutly.git
cd scoutly

# Build the project
cargo build --release

# The binary will be at target/release/scoutly

Release Process

Release and packaging instructions live in RELEASE.md.

Usage

Default TUI

# Launch the interactive TUI (default in an interactive terminal)
scoutly

# Launch the TUI with a pre-filled URL and start immediately
scoutly https://example.com

# Specify custom depth and page limits
scoutly https://example.com --depth 3 --max-pages 100

# Force the TUI explicitly
scoutly https://example.com --tui

CLI and JSON Modes

# Force the text report instead of the TUI
scoutly https://example.com --cli

# Output machine-readable JSON instead of launching the TUI
scoutly https://example.com --output json

# Save the final report to a file
scoutly https://example.com --cli --save report.json

More Options

# Follow external links (by default, only internal links are followed)
scoutly https://example.com --external

# Ignore redirect issues in the report
scoutly https://example.com --ignore-redirects

# Treat URLs with fragment identifiers (#) as unique links
scoutly https://example.com --keep-fragments

# Combine options
scoutly https://example.com --cli --depth 4 --max-pages 200 --verbose --ignore-redirects --save report.json

TUI Key Bindings

The default TUI is keyboard-first and intentionally close to tools like llmfit. If you launch scoutly without a URL, the TUI opens a URL input first:

| Key                        | Action                           |
| -------------------------- | -------------------------------- |
| j / k or Up / Down         | Move within active section       |
| Tab / Shift-Tab            | Switch result section            |
| /                          | Enter search mode                |
| f                          | Cycle severity filter (By Page)  |
| s                          | Cycle sort mode (By Page)        |
| Enter                      | Toggle the detail pane           |
| q / Esc                    | Quit                             |

When Scoutly is not attached to an interactive terminal, it automatically falls back to the CLI unless you explicitly pass --tui.

Configuration Files

Scoutly supports configuration files in JSON, TOML, or YAML format. Configuration files allow you to set default values for options without having to specify them on the command line every time.

Default Configuration Paths

Scoutly automatically looks for configuration files in the following locations (in order of priority):

  1. Current directory:

    • scoutly.json
    • scoutly.toml
    • scoutly.yaml
    • scoutly.yml
  2. User config directory:

    • Linux/macOS: ~/.config/scoutly/config.{json,toml,yaml,yml}
    • Windows: %APPDATA%\scoutly\config.{json,toml,yaml,yml}
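
As an illustration of that lookup order, here is a minimal Python sketch (the real logic lives in Scoutly's Rust source; the filenames and priority mirror the list above, and the `XDG_CONFIG_HOME` fallback is an assumption for the Linux/macOS case):

```python
import os
from pathlib import Path

# Filenames checked in the current directory, in priority order.
NAMES = ["scoutly.json", "scoutly.toml", "scoutly.yaml", "scoutly.yml"]

def find_config():
    # 1. Current directory is checked first.
    for name in NAMES:
        candidate = Path.cwd() / name
        if candidate.is_file():
            return candidate
    # 2. Then the user config directory (Linux/macOS path shown;
    #    Windows would use %APPDATA%\scoutly instead).
    base = Path(os.environ.get("XDG_CONFIG_HOME", str(Path.home() / ".config")))
    for ext in ("json", "toml", "yaml", "yml"):
        candidate = base / "scoutly" / f"config.{ext}"
        if candidate.is_file():
            return candidate
    return None  # no config file; built-in defaults apply
```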

Example Configuration Files

All configuration fields are optional. You can provide only the fields you want to customize.

JSON (scoutly.json):

{
  "depth": 10,
  "max_pages": 500,
  "cli": true,
  "output": "json",
  "external": true,
  "verbose": true,
  "ignore_redirects": false,
  "keep_fragments": false,
  "rate_limit": 2.0,
  "concurrency": 10,
  "respect_robots_txt": true
}

TOML (scoutly.toml):

depth = 10
max_pages = 500
cli = true
output = "json"
external = true
verbose = true
ignore_redirects = false
keep_fragments = false
rate_limit = 2.0
concurrency = 10
respect_robots_txt = true

YAML (scoutly.yaml):

depth: 10
max_pages: 500
cli: true
output: json
external: true
verbose: true
ignore_redirects: false
keep_fragments: false
rate_limit: 2.0
concurrency: 10
respect_robots_txt: true

Using a Custom Config File

You can specify a custom configuration file path using the --config option:

scoutly https://example.com --config ./my-config.json

Configuration Priority

Command-line arguments always take precedence over configuration file values. For example:

# If scoutly.json sets depth to 10, this command will use depth 15
scoutly https://example.com --depth 15

This allows you to set sensible defaults in your config file while still being able to override them when needed.
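
The precedence can be pictured as three layers merged in order. This Python sketch is illustrative only (field names mirror the JSON example above; the defaults are the documented ones):

```python
import json

# Built-in defaults, per the "Command Line Options" section.
DEFAULTS = {"depth": 5, "max_pages": 200, "concurrency": 5}

def effective_options(config_text, cli_args):
    opts = dict(DEFAULTS)
    opts.update(json.loads(config_text))  # config file overrides defaults
    # CLI arguments win over both; unset CLI flags (None) are skipped.
    opts.update({k: v for k, v in cli_args.items() if v is not None})
    return opts

# scoutly.json sets depth to 10, but --depth 15 on the command line wins:
opts = effective_options('{"depth": 10, "max_pages": 500}', {"depth": 15})
# opts["depth"] == 15, opts["max_pages"] == 500, opts["concurrency"] == 5
```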

Command Line Options

Usage: scoutly [OPTIONS] [URL]

Arguments:
  [URL]  The URL to start crawling from (optional in TUI mode)

Options:
  -d, --depth <DEPTH>              Maximum crawl depth (default: 5)
  -m, --max-pages <MAX_PAGES>      Maximum number of pages to crawl (default: 200)
  -o, --output <OUTPUT>            CLI output format: text or json
      --cli                        Force CLI mode instead of launching the TUI
      --tui                        Force the interactive TUI
  -s, --save <SAVE>                Save report to file
  -e, --external                   Follow external links
  -v, --verbose                    Verbose output
      --ignore-redirects           Ignore redirect issues in the report
      --keep-fragments             Treat URLs with fragment identifiers (#) as unique links
  -r, --rate-limit <RATE_LIMIT>    Rate limit for requests per second
  -c, --concurrency <CONCURRENCY>  Number of concurrent requests (default: 5)
      --respect-robots-txt <RESPECT_ROBOTS_TXT>
                                   Respect robots.txt rules (default: true)
      --config <CONFIG>            Path to configuration file (JSON, TOML, or YAML)
  -h, --help                       Print help

Example Output

Default TUI

Running scoutly https://example.com in an interactive terminal opens the Ratatui dashboard with:

  • a live status/header bar
  • pages / links / error / warning counters
  • four result sections: By Page, By Link URL, By Status, and All Links
  • searchable browsing across the active section
  • page-only severity/sort controls in the By Page section
  • a detail pane for the selected page, link URL group, status bucket, or individual link
  • a footer showing the active mode and available keys

Text Report

================================================================================
Scoutly - Crawl Report
================================================================================

Start URL: https://example.com
Timestamp: 2025-11-03T16:05:29.911833+00:00

Summary
  Total Pages Crawled: 15
  Total Links Found:   127
  Broken Links:        2
  Errors:              3
  Warnings:            8
  Info:                5

Pages with Issues

  URL: https://example.com/about
    Status: 200
    Depth:  1
    Title:  About Us
    Issues:
      [WARN ] Page is missing a meta description
      [WARN ] 3 image(s) missing alt text

  URL: https://example.com/contact
    Status: 200
    Depth:  1
    Title:  Contact
    Issues:
      [ERROR] Broken link: https://example.com/old-page (HTTP 404)

JSON Report

Use --output json to get machine-readable output suitable for integration with other tools or CI/CD pipelines. In JSON mode, Scoutly writes the report JSON to stdout and keeps human-oriented progress/status messages off stdout so the output stays parseable. Link objects also include an optional check_error field when a link fails due to a transport-level error instead of an HTTP response.
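
For example, a CI step could fail the build when the report contains broken links. The field names below ("summary", "broken_links") are assumptions about the report schema, not a documented contract; check the JSON your scoutly version emits and adjust accordingly:

```python
import json

# Hypothetical CI gate: return True when the report is clean.
# "summary" / "broken_links" are assumed field names, not a stable API.
def gate(report_text):
    report = json.loads(report_text)
    broken = report.get("summary", {}).get("broken_links", 0)
    return broken == 0

# In CI you might capture stdout from:
#   scoutly https://example.com --output json
# and exit non-zero when gate(...) is False.
```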

How It Works

graph TD
    A([Start URL]) --> B[Crawler]
    B -->|Fetch HTML| C{Parse Page}
    C -->|Extract Links| D[Link Queue]
    C -->|Extract Metadata| E[Page Data]
    D -->|Under Depth Limit?| B
    D --> F[Link Checker]
    F -->|Concurrent Requests| G[Link Status]
    E --> H[SEO Analyzer]
    H -->|Check Rules| I[SEO Issues]
    G --> J[Report Generator]
    I --> J
    J --> K([TUI / CLI / JSON Output])

  1. Crawling: Starting from the provided URL, Scoutly fetches each page and extracts all links from various HTML elements (anchor tags, iframes, media elements, embeds, etc.)
  2. Link Discovery: Internal links (same domain) are queued for crawling based on depth limits
  3. Link Validation: All discovered links are checked asynchronously for HTTP status codes
  4. SEO Analysis: Each page is analyzed for common SEO issues
  5. Report Generation: Results are compiled into a comprehensive report
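
Steps 1-2 amount to a breadth-first crawl that only queues same-domain links up to the depth limit. This is a language-agnostic Python sketch of that loop (the real crawler is concurrent Rust; `fetch_links` stands in for "download the page and extract its links"):

```python
from collections import deque
from urllib.parse import urljoin, urlparse

def crawl(start_url, fetch_links, depth_limit=5, max_pages=200):
    domain = urlparse(start_url).netloc
    queue = deque([(start_url, 0)])   # (url, depth) pairs
    seen = {start_url}
    pages = []
    while queue and len(pages) < max_pages:
        url, depth = queue.popleft()
        links = fetch_links(url)      # step 1: fetch page, extract links
        pages.append(url)
        if depth >= depth_limit:
            continue                  # record the page, but stop following links
        for link in links:            # step 2: queue internal links only
            absolute = urljoin(url, link)
            if urlparse(absolute).netloc == domain and absolute not in seen:
                seen.add(absolute)
                queue.append((absolute, depth + 1))
    return pages
```

External links still get validated in step 3; they just aren't crawled further.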

Link Extraction

Scoutly extracts links from multiple HTML elements:

  • <a href> - Standard hyperlinks
  • <iframe src> - Embedded content
  • <video src> and <source src> - Video content
  • <audio src> - Audio content
  • <embed src> - Embedded plugins
  • <object data> - Embedded objects
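
The element/attribute pairs above can be sketched with Python's stdlib HTML parser (Scoutly itself parses HTML in Rust; this is purely illustrative):

```python
from html.parser import HTMLParser

# Tag -> attribute pairs matching the list above.
LINK_ATTRS = {
    "a": "href", "iframe": "src", "video": "src", "source": "src",
    "audio": "src", "embed": "src", "object": "data",
}

class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = LINK_ATTRS.get(tag)
        if wanted:
            for name, value in attrs:
                if name == wanted and value:
                    self.links.append(value)

extractor = LinkExtractor()
extractor.feed('<a href="/about">About</a><img src="x.png"><iframe src="/embed"></iframe>')
# extractor.links == ["/about", "/embed"]  (img src is not in the list above)
```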

SEO Checks Performed

  • Title Tag

    • Missing title
    • Title too short (< 50 characters, recommended: 50-60)
    • Title too long (> 60 characters, recommended: 50-60)
  • Meta Description

    • Missing meta description
    • Description too short (< 150 characters, recommended: 150-160)
    • Description too long (> 160 characters, recommended: 150-160)
  • Headings

    • Missing H1 tag
    • Multiple H1 tags
  • Images

    • Missing alt attributes
  • Content

    • Thin content detection (checks if page has fewer than 5 content indicators)
  • Links

    • Broken links (4xx and 5xx status codes)
    • Redirect detection (3xx status codes)
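
The title and meta-description rules are simple length thresholds. A minimal sketch, using exactly the character limits listed above (the real analyzer is in Rust, and its issue messages may differ):

```python
def check_title(title):
    if not title:
        return "Missing title"
    if len(title) < 50:
        return "Title too short (< 50 characters, recommended: 50-60)"
    if len(title) > 60:
        return "Title too long (> 60 characters, recommended: 50-60)"
    return None  # title length is within the recommended range

def check_meta_description(desc):
    if not desc:
        return "Missing meta description"
    if len(desc) < 150:
        return "Description too short (< 150 characters, recommended: 150-160)"
    if len(desc) > 160:
        return "Description too long (> 160 characters, recommended: 150-160)"
    return None
```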

Performance

  • Asynchronous I/O for fast crawling
  • Concurrent link checking
  • Configurable limits to prevent excessive resource usage
  • Typical crawl speed: 10-20 pages per second (depending on target site and network)

Limitations (Basic Version)

  • No JavaScript rendering (only parses initial HTML)
  • Basic content analysis (no detailed text analysis)
  • No authentication support
  • No sitemap generation (planned for future versions)

Future Enhancements

  • JavaScript rendering with headless browser support
  • Sitemap generation (XML)
  • Authentication support
  • More advanced SEO checks (keyword density, structured data)
  • Additional TUI views and filters
  • HTML validation
  • Accessibility checks (WCAG compliance)
  • PDF and document crawling

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Author

Donation

If you find this project helpful, consider supporting me by sponsoring the project.

License

This project is open source and available under the MIT License.