@nelsonlaidev/scoutly
v0.4.0
Scoutly
A fast, lightweight CLI website crawler and SEO analyzer built with Rust. Scoutly is inspired by Scrutiny and helps you analyze websites for broken links, SEO issues, and overall site health.
Features
- Website Crawling: Recursively crawl websites with configurable depth limits
- Link Checking: Validate all internal and external links, detect broken links (404s, 500s), and record transport failures explicitly
- SEO Analysis:
- Check for missing or poorly optimized title tags
- Validate meta descriptions
- Detect missing or multiple H1 tags
- Find images without alt text
- Identify thin content
- Configuration Files: Support for JSON, TOML, and YAML configuration files with automatic detection
- Default TUI + CLI: Launch an interactive terminal UI by default, or force the text/JSON CLI when needed
- Fast & Concurrent: Built with Tokio for async I/O and parallel link checking
- robots.txt Support: Respects robots.txt rules by default
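Scoutly itself is written in Rust, but the robots.txt behavior described above can be sketched with Python's standard-library parser. The rules string and URLs below are illustrative; a real crawler fetches robots.txt from the target host before crawling.

```python
from urllib.robotparser import RobotFileParser

# Parse a robots.txt body (normally fetched from https://example.com/robots.txt)
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A polite crawler checks every candidate URL before fetching it
print(parser.can_fetch("*", "https://example.com/private/page"))  # False
print(parser.can_fetch("*", "https://example.com/about"))         # True
```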
Prerequisites
- Rust (1.91 or later) - Install Rust
- Cargo (comes with Rust)
Optional Development Tools
Lefthook - Git hooks manager for running linters and formatters automatically
```sh
# macOS
brew install lefthook

# After installation, initialize hooks
lefthook install
```
Installation
From Source
```sh
# Clone the repository
git clone https://github.com/nelsonlaidev/scoutly.git
cd scoutly

# Build the project
cargo build --release

# The binary will be at target/release/scoutly
```
Release Process
Release and packaging instructions live in RELEASE.md.
Usage
Default TUI
```sh
# Launch the interactive TUI (default in an interactive terminal)
scoutly

# Launch the TUI with a pre-filled URL and start immediately
scoutly https://example.com

# Specify custom depth and page limits
scoutly https://example.com --depth 3 --max-pages 100

# Force the TUI explicitly
scoutly https://example.com --tui
```
CLI and JSON Modes
```sh
# Force the text report instead of the TUI
scoutly https://example.com --cli

# Output machine-readable JSON instead of launching the TUI
scoutly https://example.com --output json

# Save the final report to a file
scoutly https://example.com --cli --save report.json
```
More Options
```sh
# Follow external links (by default, only internal links are followed)
scoutly https://example.com --external

# Ignore redirect issues in the report
scoutly https://example.com --ignore-redirects

# Treat URLs with fragment identifiers (#) as unique links
scoutly https://example.com --keep-fragments

# Combine options
scoutly https://example.com --cli --depth 4 --max-pages 200 --verbose --ignore-redirects --save report.json
```
TUI Key Bindings
The default TUI is keyboard-first and intentionally close to tools like llmfit. If you launch scoutly without a URL, the TUI opens a URL input first:
| Key | Action |
| -------------------------- | ---------------------- |
| j / k or Up / Down | Move within active section |
| Tab / Shift-Tab | Switch result section |
| / | Enter search mode |
| f | Cycle severity filter (By Page) |
| s | Cycle sort mode (By Page) |
| Enter | Toggle the detail pane |
| q / Esc | Quit |
When Scoutly is not attached to an interactive terminal, it automatically falls back to the CLI unless you explicitly pass --tui.
Configuration Files
Scoutly supports configuration files in JSON, TOML, or YAML format. Configuration files allow you to set default values for options without having to specify them on the command line every time.
Default Configuration Paths
Scoutly automatically looks for configuration files in the following locations (in order of priority):
1. Current directory:
   - scoutly.json
   - scoutly.toml
   - scoutly.yaml
   - scoutly.yml
2. User config directory:
   - Linux/macOS: ~/.config/scoutly/config.{json,toml,yaml,yml}
   - Windows: %APPDATA%\scoutly\config.{json,toml,yaml,yml}
Example Configuration Files
All configuration fields are optional. You can provide only the fields you want to customize.
JSON (scoutly.json):
```json
{
  "depth": 10,
  "max_pages": 500,
  "cli": true,
  "output": "json",
  "external": true,
  "verbose": true,
  "ignore_redirects": false,
  "keep_fragments": false,
  "rate_limit": 2.0,
  "concurrency": 10,
  "respect_robots_txt": true
}
```
TOML (scoutly.toml):
```toml
depth = 10
max_pages = 500
cli = true
output = "json"
external = true
verbose = true
ignore_redirects = false
keep_fragments = false
rate_limit = 2.0
concurrency = 10
respect_robots_txt = true
```
YAML (scoutly.yaml):
```yaml
depth: 10
max_pages: 500
cli: true
output: json
external: true
verbose: true
ignore_redirects: false
keep_fragments: false
rate_limit: 2.0
concurrency: 10
respect_robots_txt: true
```
Using a Custom Config File
You can specify a custom configuration file path using the --config option:
```sh
scoutly https://example.com --config ./my-config.json
```
Configuration Priority
Command-line arguments always take precedence over configuration file values. For example:
```sh
# If scoutly.json sets depth to 10, this command will use depth 15
scoutly https://example.com --depth 15
```
This allows you to set sensible defaults in your config file while still being able to override them when needed.
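The precedence rule above (built-in defaults < config file < command-line flags) can be sketched as a layered dictionary merge. This is an illustrative Python sketch of the general pattern, not Scoutly's actual Rust implementation; the option names are taken from the examples in this README.

```python
def effective_options(defaults: dict, config_file: dict, cli_args: dict) -> dict:
    """Later sources override earlier ones; CLI flags win."""
    merged = dict(defaults)
    merged.update(config_file)
    # Only flags the user actually passed (non-None) override the config file
    merged.update({k: v for k, v in cli_args.items() if v is not None})
    return merged

defaults = {"depth": 5, "max_pages": 200}
config_file = {"depth": 10}                    # e.g. loaded from scoutly.json
cli_args = {"depth": 15, "max_pages": None}    # user passed only --depth 15

print(effective_options(defaults, config_file, cli_args))
# {'depth': 15, 'max_pages': 200}
```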
Command Line Options
```
Usage: scoutly [OPTIONS] [URL]

Arguments:
  [URL]  The URL to start crawling from (optional in TUI mode)

Options:
  -d, --depth <DEPTH>              Maximum crawl depth (default: 5)
  -m, --max-pages <MAX_PAGES>      Maximum number of pages to crawl (default: 200)
  -o, --output <OUTPUT>            CLI output format: text or json
      --cli                        Force CLI mode instead of launching the TUI
      --tui                        Force the interactive TUI
  -s, --save <SAVE>                Save report to file
  -e, --external                   Follow external links
  -v, --verbose                    Verbose output
      --ignore-redirects           Ignore redirect issues in the report
      --keep-fragments             Treat URLs with fragment identifiers (#) as unique links
  -r, --rate-limit <RATE_LIMIT>    Rate limit for requests per second
  -c, --concurrency <CONCURRENCY>  Number of concurrent requests (default: 5)
      --respect-robots-txt <RESPECT_ROBOTS_TXT>
                                   Respect robots.txt rules (default: true)
      --config <CONFIG>            Path to configuration file (JSON, TOML, or YAML)
  -h, --help                       Print help
```
Example Output
Default TUI
Running scoutly https://example.com in an interactive terminal opens the Ratatui dashboard with:
- a live status/header bar
- pages / links / error / warning counters
- four result sections: By Page, By Link URL, By Status, and All Links
- searchable browsing across the active section
- page-only severity/sort controls in the By Page section
- a detail pane for the selected page, link URL group, status bucket, or individual link
- a footer showing the active mode and available keys
Text Report
```
================================================================================
Scoutly - Crawl Report
================================================================================

Start URL: https://example.com
Timestamp: 2025-11-03T16:05:29.911833+00:00

Summary
  Total Pages Crawled: 15
  Total Links Found:   127
  Broken Links:        2
  Errors:              3
  Warnings:            8
  Info:                5

Pages with Issues

URL: https://example.com/about
  Status: 200
  Depth: 1
  Title: About Us
  Issues:
    [WARN ] Page is missing a meta description
    [WARN ] 3 image(s) missing alt text

URL: https://example.com/contact
  Status: 200
  Depth: 1
  Title: Contact
  Issues:
    [ERROR] Broken link: https://example.com/old-page (HTTP 404)
```
JSON Report
Use --output json to get machine-readable output suitable for integration with other tools or CI/CD pipelines. In JSON mode, Scoutly writes the report JSON to stdout and keeps human-oriented progress/status messages off stdout so the output stays parseable. Link objects also include an optional check_error field when a link fails due to a transport-level error instead of an HTTP response.
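Because the report goes to stdout as clean JSON, downstream tools can consume it directly. The sketch below filters a report down to problem links; apart from check_error, which this README documents, the field names (links, status, url) are illustrative assumptions rather than Scoutly's published schema.

```python
import json

# Hypothetical report excerpt: only "check_error" is a documented field name;
# the rest of the shape is assumed for illustration.
report_json = """
{
  "links": [
    {"url": "https://example.com/ok", "status": 200},
    {"url": "https://example.com/old-page", "status": 404},
    {"url": "https://example.com/down", "status": null,
     "check_error": "connection timed out"}
  ]
}
"""

report = json.loads(report_json)

# A link is a problem if it failed at the transport level (check_error set)
# or returned a 4xx/5xx HTTP status.
problems = [
    link for link in report["links"]
    if link.get("check_error") or (link["status"] or 0) >= 400
]
for link in problems:
    print(link["url"])
```

In a pipeline, the same filter could run over `scoutly https://example.com --output json` captured from stdout.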
How It Works
```mermaid
graph TD
    A([Start URL]) --> B[Crawler]
    B -->|Fetch HTML| C{Parse Page}
    C -->|Extract Links| D[Link Queue]
    C -->|Extract Metadata| E[Page Data]
    D -->|Under Depth Limit?| B
    D --> F[Link Checker]
    F -->|Concurrent Requests| G[Link Status]
    E --> H[SEO Analyzer]
    H -->|Check Rules| I[SEO Issues]
    G --> J[Report Generator]
    I --> J
    J --> K([TUI / CLI / JSON Output])
```
- Crawling: Starting from the provided URL, Scoutly fetches each page and extracts all links from various HTML elements (anchor tags, iframes, media elements, embeds, etc.)
- Link Discovery: Internal links (same domain) are queued for crawling based on depth limits
- Link Validation: All discovered links are checked asynchronously for HTTP status codes
- SEO Analysis: Each page is analyzed for common SEO issues
- Report Generation: Results are compiled into a comprehensive report
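The crawl loop in the steps above is essentially a breadth-first traversal bounded by depth and page limits. Here is a minimal Python sketch over a toy in-memory link graph (the real crawler fetches and parses HTML instead of reading a dict, and Scoutly does this asynchronously in Rust):

```python
from collections import deque

# Toy "site": page -> outgoing internal links. Stands in for fetch + parse.
SITE = {
    "/": ["/about", "/blog"],
    "/about": ["/contact"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": [],
    "/contact": [],
}

def crawl(start: str, max_depth: int, max_pages: int) -> list[str]:
    visited, order = {start}, []
    queue = deque([(start, 0)])
    while queue and len(order) < max_pages:
        url, depth = queue.popleft()
        order.append(url)  # "fetch" the page
        if depth >= max_depth:
            continue  # depth limit: do not enqueue this page's links
        for link in SITE.get(url, []):
            if link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return order

print(crawl("/", max_depth=1, max_pages=200))
# ['/', '/about', '/blog']
```

The visited set is what prevents re-crawling pages that are linked from multiple places.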
Link Extraction
Scoutly extracts links from multiple HTML elements:
- `<a href>` - Standard hyperlinks
- `<iframe src>` - Embedded content
- `<video src>` and `<source src>` - Video content
- `<audio src>` - Audio content
- `<embed src>` - Embedded plugins
- `<object data>` - Embedded objects
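Extraction like this amounts to a table of element/attribute pairs applied while walking the parse tree. A minimal Python sketch using the standard-library HTML parser (Scoutly's Rust implementation will differ, but the element list matches the one above):

```python
from html.parser import HTMLParser

# Element -> attribute pairs listed above as link sources
LINK_ATTRS = {
    "a": "href", "iframe": "src", "video": "src", "source": "src",
    "audio": "src", "embed": "src", "object": "data",
}

class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = LINK_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(value)

extractor = LinkExtractor()
extractor.feed('<a href="/about">About</a><iframe src="/embed"></iframe>')
print(extractor.links)  # ['/about', '/embed']
```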
SEO Checks Performed
Title Tag
- Missing title
- Title too short (< 50 characters, recommended: 50-60)
- Title too long (> 60 characters, recommended: 50-60)
Meta Description
- Missing meta description
- Description too short (< 150 characters, recommended: 150-160)
- Description too long (> 160 characters, recommended: 150-160)
Headings
- Missing H1 tag
- Multiple H1 tags
Images
- Missing alt attributes
Content
- Thin content detection (checks if page has fewer than 5 content indicators)
Links
- Broken links (4xx and 5xx status codes)
- Redirect detection (3xx status codes)
Performance
- Asynchronous I/O for fast crawling
- Concurrent link checking
- Configurable limits to prevent excessive resource usage
- Typical crawl speed: 10-20 pages per second (depending on target site and network)
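The --rate-limit option caps requests per second; a common way to implement such a cap is a token bucket. This is an illustrative blocking Python sketch (Scoutly's Rust/Tokio implementation is asynchronous and may use a different scheme):

```python
import time

class RateLimiter:
    """Allow at most `rate` requests per second (simple token bucket)."""
    def __init__(self, rate: float):
        self.rate = rate
        self.tokens = 1.0  # one token up front so the first request is immediate
        self.last = time.monotonic()

    def acquire(self) -> None:
        while True:
            now = time.monotonic()
            # Refill tokens in proportion to elapsed time, capped at `rate`
            self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Sleep just long enough for the next token to accrue
            time.sleep((1 - self.tokens) / self.rate)

limiter = RateLimiter(rate=100.0)
start = time.monotonic()
for _ in range(11):
    limiter.acquire()
print(f"{time.monotonic() - start:.2f}s")  # roughly 0.10s at 100 req/s
```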
Limitations (Basic Version)
- No JavaScript rendering (only parses initial HTML)
- Basic content analysis (no detailed text analysis)
- No authentication support
- No sitemap generation (planned for future versions)
Future Enhancements
- JavaScript rendering with headless browser support
- Sitemap generation (XML)
- Authentication support
- More advanced SEO checks (keyword density, structured data)
- Additional TUI views and filters
- HTML validation
- Accessibility checks (WCAG compliance)
- PDF and document crawling
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Author
Donation
If you find this project helpful, consider supporting me by sponsoring the project.
License
This project is open source and available under the MIT License.
