websnap-reader

v1.0.1

Published

3 months ago

Convert any webpage to clean Markdown from your terminal -- reader mode CLI with AI summaries, Chrome CDP support, batch processing, and JSON output


Convert any webpage to clean Markdown from your terminal. Reader mode for the command line with AI-powered summaries.

Strip away ads, navigation, and clutter -- get just the article content as clean Markdown. Supports JavaScript-heavy sites via Chrome CDP, AI-powered summaries, batch processing, and structured JSON output.

Demo

$ websnap https://paulgraham.com/greatwork.html

# How to Do Great Work

The first step is to decide what to work on. The work you choose
needs to have three qualities: it has to be something you have a
natural aptitude for, that you have a deep interest in, and that
offers scope to do great work.

[... clean markdown continues ...]

$ websnap https://arxiv.org/abs/2301.00001 --summary

Summary: This paper introduces a novel approach to neural network
pruning that achieves 40% compression with minimal accuracy loss.
The method uses gradient-based importance scoring during training.
Results show state-of-the-art performance on ImageNet benchmarks.

$ websnap https://example.com --json | jq '.title, .wordCount'
"Example Article"
1234

Install

npm install -g websnap-reader

Or run without installing:

npx websnap-reader https://example.com

Usage

Convert a URL to Markdown

websnap https://blog.example.com/post

Save to a file

websnap https://example.com -o article.md

Get structured JSON output

websnap https://example.com --json

Returns:

{
  "url": "https://example.com",
  "title": "Example Article",
  "author": "John Doe",
  "date": "March 15, 2026",
  "content": "# Example Article\n\nArticle content in markdown...",
  "wordCount": 1234,
  "readingTime": "6 min read",
  "extractedAt": "2026-03-15T10:30:00.000Z"
}
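
For downstream scripts, the JSON shape above maps naturally onto a small typed record. The following is a minimal Python sketch, assuming the output contains exactly the fields shown in the sample; the `Article` class and `parse_article` helper are hypothetical names for illustration and are not part of websnap-reader.

```python
import json
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical record mirroring the sample JSON fields shown above."""
    url: str
    title: str
    author: str
    date: str
    content: str
    word_count: int
    reading_time: str
    extracted_at: str


def parse_article(raw: str) -> Article:
    """Parse one JSON document in the shape printed by the --json flag."""
    data = json.loads(raw)
    return Article(
        url=data["url"],
        title=data["title"],
        # Assumption: metadata such as author or date may be missing or null
        # on some pages, so these fall back to an empty string.
        author=data.get("author") or "",
        date=data.get("date") or "",
        content=data["content"],
        word_count=int(data["wordCount"]),
        reading_time=data.get("readingTime") or "",
        extracted_at=data.get("extractedAt") or "",
    )


# The sample output from this README, kept as a raw string so the \n escapes
# stay valid JSON escapes.
SAMPLE = r"""
{
  "url": "https://example.com",
  "title": "Example Article",
  "author": "John Doe",
  "date": "March 15, 2026",
  "content": "# Example Article\n\nArticle content in markdown...",
  "wordCount": 1234,
  "readingTime": "6 min read",
  "extractedAt": "2026-03-15T10:30:00.000Z"
}
"""

article = parse_article(SAMPLE)
print(article.title, article.word_count)
```

In a real pipeline the raw text would come from the command's standard output, for example by passing `sys.stdin.read()` to `parse_article`.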

AI-powered summary

websnap https://example.com --summary

Generates a concise 3-sentence summary. Supports multiple AI backends:

| Backend | Setup | Default Model |
|---------|-------|---------------|
| OpenAI | export OPENAI_API_KEY=sk-... | gpt-4o-mini |
| Anthropic | export ANTHROPIC_API_KEY=sk-... | claude-sonnet-4-20250514 |
| Ollama | Run ollama serve locally | llama3.2 |
| Fallback | No setup needed | Extractive summary |

Batch processing

# Process multiple URLs from a file
websnap batch urls.txt --outdir ./articles

# Batch with JSON output
websnap batch urls.txt --json

# Batch with summaries
websnap batch urls.txt --outdir ./summaries --summary
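
The input file for batch mode can be prepared with a short script. This README does not spell out the file format, so the sketch below assumes one URL per line; `clean_url_list` and `write_url_file` are hypothetical helpers, not part of websnap-reader. It drops blank lines, comment lines, non-HTTP entries, and duplicates before writing the list.

```python
from urllib.parse import urlsplit


def clean_url_list(lines):
    """Return unique http(s) URLs in first-seen order."""
    seen = set()
    urls = []
    for line in lines:
        candidate = line.strip()
        # Skip blank lines and lines used as notes in a working list.
        if not candidate or candidate.startswith("#"):
            continue
        parts = urlsplit(candidate)
        # Keep only absolute http/https URLs that name a host.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue
        if candidate not in seen:
            seen.add(candidate)
            urls.append(candidate)
    return urls


def write_url_file(urls, path="urls.txt"):
    """Write the cleaned list to disk."""
    # Assumption: `websnap batch` reads one URL per line.
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(urls) + "\n")
```

Calling `write_url_file(clean_url_list(open("raw-links.txt")))` would produce a `urls.txt` suitable for the commands above, where `raw-links.txt` is a placeholder name for an unfiltered list.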

Chrome CDP Integration (JavaScript-heavy sites)

For SPAs, JavaScript-rendered pages, or login-required sites, websnap can connect to your running Chrome browser:

# Start Chrome with remote debugging
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222

# websnap automatically detects and uses CDP
websnap https://spa-site.com

# Use a custom CDP endpoint
websnap https://example.com --cdp http://localhost:9333

Login-required pages work automatically if you are already logged in to Chrome. If Chrome is not running, websnap falls back to a plain HTTP fetch.
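
Because the fallback to plain HTTP is silent, it can help to confirm that Chrome is actually listening before fetching a JavaScript-heavy page. The sketch below is one way to check, using Chrome's `/json/version` DevTools HTTP endpoint; `version_url` and `cdp_version` are hypothetical helper names, and this is not necessarily how websnap performs its own detection.

```python
import json
import urllib.request


def version_url(endpoint: str) -> str:
    """Build the URL of Chrome's DevTools version endpoint."""
    return endpoint.rstrip("/") + "/json/version"


def cdp_version(endpoint: str = "http://127.0.0.1:9222", timeout: float = 2.0):
    """Return Chrome's version info as a dict, or None if nothing usable answers."""
    try:
        with urllib.request.urlopen(version_url(endpoint), timeout=timeout) as resp:
            data = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON reply:
        # treat all of these as "no CDP endpoint here".
        return None
    return data if isinstance(data, dict) else None
```

When Chrome was started with `--remote-debugging-port=9222`, `cdp_version()` should return a dictionary that includes a `Browser` entry; a result of `None` suggests websnap would use the plain HTTP path.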

Why websnap-reader over alternatives?

| Feature | websnap-reader | readability-cli | percollate | trafilatura |
|---------|:--------------:|:---------------:|:----------:|:-----------:|
| Clean Markdown output | Yes | No (HTML) | No (PDF) | Yes |
| AI summaries | Yes | No | No | No |
| Chrome CDP (JS sites) | Yes | No | Yes | No |
| JSON structured output | Yes | No | No | Partial |
| Batch processing | Yes | No | Yes | Yes |
| Login-required pages | Yes | No | No | No |
| Pipe-friendly | Yes | Partial | No | Yes |
| Zero config | Yes | Yes | Yes | Yes |

All Options

| Flag | Description |
|------|-------------|
| --json | Output structured JSON |
| --summary | Generate AI-powered 3-sentence summary |
| --raw | Output raw extracted HTML |
| -o, --output <file> | Write output to file |
| --cdp <endpoint> | Chrome CDP endpoint (default: http://127.0.0.1:9222) |
| --timeout <ms> | Page load timeout (default: 15000) |
| --user-agent <str> | Custom User-Agent string |
| -V, --version | Show version number |
| -h, --help | Show help |
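
When websnap is driven from another program, the flags listed above can be assembled into an argument list instead of a hand-built shell string. The following is a minimal sketch; `build_websnap_args` is a hypothetical helper, and it assumes the URL comes first with flags after it, as in the examples in this README.

```python
import shlex


def build_websnap_args(url, *, json_output=False, summary=False, raw=False,
                       output=None, cdp=None, timeout_ms=None, user_agent=None):
    """Build an argv list for websnap from the documented flags."""
    args = ["websnap", url]
    if json_output:
        args.append("--json")
    if summary:
        args.append("--summary")
    if raw:
        args.append("--raw")
    if output is not None:
        args += ["-o", str(output)]
    if cdp is not None:
        args += ["--cdp", cdp]
    if timeout_ms is not None:
        args += ["--timeout", str(int(timeout_ms))]
    if user_agent is not None:
        args += ["--user-agent", user_agent]
    return args


# shlex.join shows the equivalent shell command with safe quoting.
print(shlex.join(build_websnap_args("https://example.com", json_output=True)))
```

The resulting list can be passed directly to `subprocess.run(args, capture_output=True, text=True, check=True)`, which avoids shell quoting problems with URLs that contain characters such as `?` or `&`.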

Batch Options

| Flag | Description |
|------|-------------|
| --outdir <dir> | Write each result as a separate file |
| --delay <ms> | Delay between requests (default: 1000) |

Environment Variables

| Variable | Description |
|----------|-------------|
| OPENAI_API_KEY | OpenAI API key for summaries |
| OPENAI_MODEL | OpenAI model (default: gpt-4o-mini) |
| ANTHROPIC_API_KEY | Anthropic API key for summaries |
| ANTHROPIC_MODEL | Anthropic model (default: claude-sonnet-4-20250514) |
| OLLAMA_URL | Ollama server URL (default: http://127.0.0.1:11434) |
| OLLAMA_MODEL | Ollama model (default: llama3.2) |
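
To see which values would apply in a given shell session, the documented defaults can be combined with the current environment. The sketch below is illustrative only: `effective_settings` is a hypothetical helper, it treats an empty variable as unset (an assumption), and it makes no claim about which backend websnap prefers when several are configured.

```python
import os

# Defaults copied from the table above.
DEFAULTS = {
    "OPENAI_MODEL": "gpt-4o-mini",
    "ANTHROPIC_MODEL": "claude-sonnet-4-20250514",
    "OLLAMA_URL": "http://127.0.0.1:11434",
    "OLLAMA_MODEL": "llama3.2",
}


def effective_settings(env=None):
    """Resolve each documented variable against its documented default."""
    env = os.environ if env is None else env
    # Assumption: an empty string counts as "not set".
    settings = {name: env.get(name) or default for name, default in DEFAULTS.items()}
    # Report only whether keys are present, never their values.
    settings["has_openai_key"] = bool(env.get("OPENAI_API_KEY"))
    settings["has_anthropic_key"] = bool(env.get("ANTHROPIC_API_KEY"))
    return settings


for name, value in effective_settings().items():
    print(f"{name} = {value}")
```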

Examples

# Quick read of a blog post
websnap https://paulgraham.com/greatwork.html

# Save an article as JSON for processing
websnap https://arxiv.org/abs/2301.00001 --json -o paper.json

# Get a quick AI summary
websnap 'https://news.ycombinator.com/item?id=12345' --summary  # quote URLs containing ? so the shell does not glob them

# Batch scrape a list of articles
websnap batch research-urls.txt --outdir ./research --json

# Use with jq for data extraction
websnap https://example.com --json | jq '.title, .wordCount'

# Pipe to other tools
websnap https://example.com | glow -        # render with glow
websnap https://example.com | pbcopy        # copy to clipboard (macOS)
websnap https://example.com | llm summarize # pipe to LLM CLI

License

MIT
