Safe Coder CLI
Standalone CLI tool for documentation crawling with SPA support, error detection, and code validation.
Overview
@gulibs/safe-coder-cli is an independent command-line tool that crawls documentation websites and generates structured output. It supports both static sites and Single Page Applications (SPAs) using browser automation.
This CLI is designed to work standalone or as part of the Safe Coder ecosystem, where it's called by the @gulibs/safe-coder MCP Server.
Features
- HTTP & Browser Crawling: Supports both static HTTP crawling and browser-based rendering for SPAs
- Intelligent Content Extraction: Cleans and structures documentation content
- Parallel Processing: Multi-worker support for faster crawling
- Progress Reporting: Real-time progress updates via stderr
- JSON Output: Machine-readable JSON output for programmatic use
- Skill Generation: Generates AI-ready SKILL files from documentation
- Checkpoint Support: Resume interrupted crawls
- Proxy Support: Configure HTTP/HTTPS proxies
Installation
Global Installation (Recommended)
npm install -g @gulibs/safe-coder-cli
Or using yarn:
yarn global add @gulibs/safe-coder-cli
Or using pnpm:
pnpm add -g @gulibs/safe-coder-cli
Verify Installation
safe-coder-cli --version
safe-coder-cli --help
Usage
Basic Crawl
safe-coder-cli crawl https://react.dev
Crawl with Options
# Limit pages and depth
safe-coder-cli crawl https://react.dev --max-pages 50 --max-depth 3
# Use multiple workers for faster crawling
safe-coder-cli crawl https://react.dev --workers 5
# Force browser automation for SPAs
safe-coder-cli crawl https://spa-site.com --spa-strategy auto --browser playwright
# Save output to directory
safe-coder-cli crawl https://react.dev --output-dir ./skills
JSON Output (for MCP Integration)
# Output machine-readable JSON
safe-coder-cli crawl https://react.dev --output-format json
# Capture output to file
safe-coder-cli crawl https://react.dev --output-format json > output.json
Command Reference
crawl <url> [options]
Crawl a documentation website and optionally generate a skill file.
Options
- -c, --config <path> - Path to configuration file
- -b, --browser <type> - Browser type: puppeteer | playwright
- -d, --max-depth <number> - Maximum crawl depth (default: 3)
- -p, --max-pages <number> - Maximum number of pages to crawl (default: 50)
- -w, --workers <number> - Number of parallel workers (default: 1)
- --spa-strategy <type> - SPA strategy: smart | auto | manual (default: smart)
- -o, --output-dir <path> - Output directory for skill files
- -f, --filename <name> - Skill name for directory and file names
- --checkpoint - Enable checkpoint/resume functionality
- --resume - Resume from last checkpoint if available
- --rate-limit <ms> - Delay in milliseconds between requests (default: 500)
- --output-format <format> - Output format: json | pretty (default: pretty)
- --include-paths <paths> - Additional path patterns to include (comma-separated)
- --exclude-paths <paths> - Path patterns to exclude (comma-separated)
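For example, a long documentation crawl can combine the configuration, checkpoint, and path-filtering options above. The sketch below is illustrative only; the URL, path patterns, and file names are placeholders, and it assumes a .doc-crawler.json like the one shown under Configuration File:
# Resumable crawl driven by a config file, limited to API and guide pages
# (all values below are placeholders)
safe-coder-cli crawl https://docs.example.com \
  --config .doc-crawler.json \
  --checkpoint --resume \
  --include-paths "/api,/guide" \
  --exclude-paths "/blog,/changelog" \
  --rate-limit 1000 \
  --output-dir ./skills \
  --filename example-docs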
detect-errors <file> [options]
Detect errors and warnings in code files.
safe-coder-cli detect-errors ./src/app.ts
safe-coder-cli detect-errors ./src/app.ts --format json
validate-code <file> [options]
Validate and optionally fix code errors.
safe-coder-cli validate-code ./src/app.ts
safe-coder-cli validate-code ./src/app.ts --output ./src/app.fixed.ts
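A typical review workflow chains the two commands: report issues, write proposed fixes to a separate file, and inspect the diff before adopting them. A minimal sketch with illustrative file paths:
# Report issues as JSON, write proposed fixes next to the original, review the diff
safe-coder-cli detect-errors ./src/app.ts --format json > errors.json
safe-coder-cli validate-code ./src/app.ts --output ./src/app.fixed.ts
diff -u ./src/app.ts ./src/app.fixed.ts
Configuration File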
Create a .doc-crawler.json file in your project root:
{
"browser": "puppeteer",
"spaStrategy": "smart",
"crawl": {
"maxDepth": 3,
"maxPages": 200,
"workers": 5,
"rateLimit": 300,
"checkpoint": {
"enabled": true,
"interval": 50
}
},
"proxy": "http://127.0.0.1:7890"
}
Output Format
JSON Output Structure
When using --output-format json, the CLI outputs:
{
"success": true,
"data": {
"source": {
"url": "https://react.dev",
"crawledAt": "2024-01-15T10:30:00.000Z",
"pageCount": 50,
"depth": 3
},
"pages": [
{
"url": "https://react.dev/learn",
"title": "Learn React",
"content": "...",
"wordCount": 1500,
"codeBlocks": 5,
"headings": ["Getting Started", "Components"]
}
],
"metadata": {
"technology": "react.dev",
"categories": ["tutorial", "api", "guide"]
},
"statistics": {
"totalPages": 50,
"maxDepthReached": 3,
"errors": 0
},
"skill": {
"skillMd": "...",
"quality": 85
}
}
}
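This structure is meant to be consumed programmatically; for example, the generated SKILL markdown and the crawl statistics can be pulled out of a saved result with jq (the file name output.json is just an example):
# Extract the generated SKILL markdown and the crawl statistics from a saved run
jq -r '.data.skill.skillMd' output.json > SKILL.md
jq '.data.statistics' output.json
Progress Output (stderr)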
Progress information is output to stderr in JSON format:
{"type":"progress","message":"Crawled 10/50 pages","timestamp":"...","current":10,"total":50,"percentage":20}Browser Setup
For SPA crawling, you need Chrome/Chromium installed:
macOS
brew install --cask google-chrome
Windows
winget install Google.Chrome
Linux
sudo apt install google-chrome-stable   # requires Google's apt repository; Chromium also works
Custom Browser Path
export CHROME_PATH=/path/to/chrome
Environment Variables
- CHROME_PATH - Path to Chrome executable
- HTTP_PROXY - HTTP proxy URL
- HTTPS_PROXY - HTTPS proxy URL
- LOG_LEVEL - Log level (INFO, DEBUG, ERROR)
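For example, a crawl behind a proxy with verbose logging might be launched like this (the proxy address and Chrome path are placeholders):
# Placeholder proxy address and browser path; adjust for your environment
export HTTPS_PROXY=http://127.0.0.1:7890
export CHROME_PATH=/usr/bin/google-chrome
LOG_LEVEL=DEBUG safe-coder-cli crawl https://docs.example.com --max-pages 20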
Integration with MCP Server
The CLI is designed to be called by the @gulibs/safe-coder MCP Server. The MCP Server:
- Checks that the CLI is installed
- Spawns the CLI with appropriate parameters
- Monitors progress via stderr
- Parses JSON output from stdout
- Post-processes results and generates SKILL guidance
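A rough shell approximation of that flow, useful when scripting around the CLI without the MCP Server (file names and error handling are illustrative):
# Spawn the CLI, keep progress (stderr) and the result (stdout) separate,
# and only post-process the JSON if the crawl exits cleanly
command -v safe-coder-cli >/dev/null || npm install -g @gulibs/safe-coder-cli
if safe-coder-cli crawl https://docs.example.com --output-format json \
    > result.json 2> progress.log; then
  jq '.data.statistics' result.json
else
  echo "crawl failed; see progress.log" >&2
fi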
Examples
Simple Documentation Crawl
safe-coder-cli crawl https://docs.example.com --max-pages 30
Fast Parallel Crawl
safe-coder-cli crawl https://docs.example.com --workers 8 --max-pages 200
SPA Site with Browser
safe-coder-cli crawl https://spa-site.com --spa-strategy auto --browser playwright
Generate SKILL and Save
safe-coder-cli crawl https://react.dev \
--output-dir ~/.cursor/skills \
--filename react-docs \
  --max-pages 100
JSON Output for Scripting
safe-coder-cli crawl https://docs.example.com \
--output-format json \
--max-pages 20 > output.json
# Process with jq
cat output.json | jq '.data.statistics'
Troubleshooting
CLI Not Found
After installation, if safe-coder-cli is not found:
# Check npm global bin path
npm config get prefix
# Add to PATH if needed (macOS/Linux)
export PATH="$(npm config get prefix)/bin:$PATH"
Browser Not Found
If you see "Chrome/Chromium not found":
- Install Chrome (see Browser Setup above)
- Set the CHROME_PATH environment variable (see the sketch below)
- Or install the full puppeteer package, which bundles its own Chromium:
npm install -g puppeteer
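If Chrome is installed but still not detected, point CHROME_PATH at the binary explicitly; a sketch for macOS/Linux with example paths:
# Locate an installed Chrome/Chromium binary and point the CLI at it (example paths)
command -v google-chrome chromium chromium-browser
export CHROME_PATH=/usr/bin/google-chrome
safe-coder-cli crawl https://spa-site.com --browser puppeteer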
Permission Errors
On Linux/macOS, you may need sudo for global installation:
sudo npm install -g @gulibs/safe-coder-cli
Or use a version manager like nvm to avoid sudo.
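Another option that avoids both sudo and a version manager is a per-user npm prefix (the directory name below is just a convention):
# Install global packages into a user-writable prefix instead of a system directory
npm config set prefix "$HOME/.npm-global"
export PATH="$HOME/.npm-global/bin:$PATH"   # add this line to your shell profile to persist it
npm install -g @gulibs/safe-coder-cli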
Development
# Clone repository
git clone <repository-url>
cd safe-coder-cli
# Install dependencies
npm install
# Build
npm run build
# Link for local testing
npm link
# Test
safe-coder-cli --version
License
MIT
Related Projects
- @gulibs/safe-coder - MCP Server that orchestrates this CLI
