websnap-reader
v1.0.1
Convert any webpage to clean Markdown from your terminal -- reader mode CLI with AI summaries, Chrome CDP support, batch processing, and JSON output
Convert any webpage to clean Markdown from your terminal. Reader mode for the command line with AI-powered summaries.
Strip away ads, navigation, and clutter to get just the article content as clean Markdown. Supports JavaScript-heavy sites via Chrome CDP, AI-powered summaries, batch processing, and structured JSON output.
Demo
$ websnap https://paulgraham.com/greatwork.html
# How to Do Great Work
The first step is to decide what to work on. The work you choose
needs to have three qualities: it has to be something you have a
natural aptitude for, that you have a deep interest in, and that
offers scope to do great work.
[... clean markdown continues ...]
$ websnap https://arxiv.org/abs/2301.00001 --summary
Summary: This paper introduces a novel approach to neural network
pruning that achieves 40% compression with minimal accuracy loss.
The method uses gradient-based importance scoring during training.
Results show state-of-the-art performance on ImageNet benchmarks.
$ websnap https://example.com --json | jq '.title, .wordCount'
"Example Article"
1234
Install
npm install -g websnap-reader
Or run without installing:
npx websnap-reader https://example.com
Usage
Convert a URL to Markdown
websnap https://blog.example.com/post
Save to a file
websnap https://example.com -o article.md
Get structured JSON output
websnap https://example.com --json
Returns:
{
"url": "https://example.com",
"title": "Example Article",
"author": "John Doe",
"date": "March 15, 2026",
"content": "# Example Article\n\nArticle content in markdown...",
"wordCount": 1234,
"readingTime": "6 min read",
"extractedAt": "2026-03-15T10:30:00.000Z"
}
AI-powered summary
websnap https://example.com --summary
Generates a concise 3-sentence summary. Supports multiple AI backends:
| Backend | Setup | Default Model |
|---------|-------|---------------|
| OpenAI | export OPENAI_API_KEY=sk-... | gpt-4o-mini |
| Anthropic | export ANTHROPIC_API_KEY=sk-... | claude-sonnet-4-20250514 |
| Ollama | Run ollama serve locally | llama3.2 |
| Fallback | No setup needed | Extractive summary |
Batch processing
# Process multiple URLs from a file
websnap batch urls.txt --outdir ./articles
# Batch with JSON output
websnap batch urls.txt --json
# Batch with summaries
websnap batch urls.txt --outdir ./summaries --summary
Chrome CDP Integration (JavaScript-heavy sites)
For SPAs, JavaScript-rendered pages, or login-required sites, websnap can connect to your running Chrome browser:
# Start Chrome with remote debugging
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222
# websnap automatically detects and uses CDP
websnap https://spa-site.com
# Use a custom CDP endpoint
websnap https://example.com --cdp http://localhost:9333
Login-required pages work automatically if you are already logged in to Chrome. Falls back to plain HTTP if Chrome is not running.
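To check from a script whether a Chrome debugging endpoint is reachable before relying on CDP, one option is to probe Chrome's `/json/version` HTTP endpoint. The following is a minimal Python sketch, not a description of how websnap itself detects Chrome; the function name `cdp_available` is hypothetical, and the default endpoint is taken from the `--cdp` option's documented default.

```python
import http.client
import json
import urllib.request


def cdp_available(endpoint: str = "http://127.0.0.1:9222", timeout: float = 2.0) -> bool:
    """Return True if a Chrome DevTools HTTP endpoint answers at `endpoint`."""
    url = endpoint.rstrip("/") + "/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            info = json.load(response)
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, malformed URL, or a non-JSON reply.
        return False
    # Chrome includes its debugger WebSocket URL in the version document.
    return isinstance(info, dict) and "webSocketDebuggerUrl" in info


if __name__ == "__main__":
    print("CDP reachable" if cdp_available() else "CDP not reachable")
```

If the probe returns `False`, Chrome is either not running or was started without `--remote-debugging-port`.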
Why websnap-reader over alternatives?
| Feature | websnap-reader | readability-cli | percollate | trafilatura |
|---------|:--------------:|:---------------:|:----------:|:-----------:|
| Clean Markdown output | Yes | No (HTML) | No (PDF) | Yes |
| AI summaries | Yes | No | No | No |
| Chrome CDP (JS sites) | Yes | No | Yes | No |
| JSON structured output | Yes | No | No | Partial |
| Batch processing | Yes | No | Yes | Yes |
| Login-required pages | Yes | No | No | No |
| Pipe-friendly | Yes | Partial | No | Yes |
| Zero config | Yes | Yes | Yes | Yes |
All Options
| Flag | Description |
|------|-------------|
| --json | Output structured JSON |
| --summary | Generate AI-powered 3-sentence summary |
| --raw | Output raw extracted HTML |
| -o, --output <file> | Write output to file |
| --cdp <endpoint> | Chrome CDP endpoint (default: http://127.0.0.1:9222) |
| --timeout <ms> | Page load timeout (default: 15000) |
| --user-agent <str> | Custom User-Agent string |
| -V, --version | Show version number |
| -h, --help | Show help |
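When calling websnap from another program, the flags above can be assembled as an argument list rather than a shell string. This is a minimal Python sketch, assuming `websnap` is on `PATH` and that `--json` prints a single JSON object to stdout in the shape shown under "Get structured JSON output". The helper names `build_command` and `fetch_article` are hypothetical and not part of websnap-reader.

```python
import json
import subprocess
from typing import List, Optional


def build_command(
    url: str,
    *,
    timeout_ms: Optional[int] = None,
    cdp: Optional[str] = None,
    user_agent: Optional[str] = None,
) -> List[str]:
    """Assemble a websnap invocation that requests JSON output."""
    cmd = ["websnap", url, "--json"]
    if timeout_ms is not None:
        cmd += ["--timeout", str(timeout_ms)]
    if cdp is not None:
        cmd += ["--cdp", cdp]
    if user_agent is not None:
        cmd += ["--user-agent", user_agent]
    return cmd


def fetch_article(url: str, **options) -> dict:
    """Run websnap and parse its stdout as JSON (assumes the CLI is installed)."""
    result = subprocess.run(
        build_command(url, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```

For example, `fetch_article("https://example.com", timeout_ms=30000)["title"]` would return the extracted title. Passing a list to `subprocess.run` avoids shell quoting problems with URLs that contain `?` or `&`.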
Batch Options
| Flag | Description |
|------|-------------|
| --outdir <dir> | Write each result as a separate file |
| --delay <ms> | Delay between requests (default: 1000) |
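The input file for `websnap batch` can be generated from a script. The Python sketch below rests on two assumptions that this README does not confirm: that the file holds one URL per line, and that `--delay` is applied once between each pair of consecutive requests. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def write_url_list(urls: Iterable[str], path: str = "urls.txt") -> List[str]:
    """Write unique, non-empty URLs to `path`, one per line (assumed format)."""
    seen = set()
    unique: List[str] = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            unique.append(url)
    Path(path).write_text("".join(u + "\n" for u in unique), encoding="utf-8")
    return unique


def pacing_seconds(count: int, delay_ms: int = 1000) -> float:
    """Total time spent waiting, assuming one delay per gap between URLs."""
    return max(count - 1, 0) * delay_ms / 1000
```

Under those assumptions, a list of 50 URLs with the default delay spends about 49 seconds waiting, in addition to the page load time of each request.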
Environment Variables
| Variable | Description |
|----------|-------------|
| OPENAI_API_KEY | OpenAI API key for summaries |
| OPENAI_MODEL | OpenAI model (default: gpt-4o-mini) |
| ANTHROPIC_API_KEY | Anthropic API key for summaries |
| ANTHROPIC_MODEL | Anthropic model (default: claude-sonnet-4-20250514) |
| OLLAMA_URL | Ollama server URL (default: http://127.0.0.1:11434) |
| OLLAMA_MODEL | Ollama model (default: llama3.2) |
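To see which values would be in effect for a given environment, the defaults in the table above can be mirrored in a small lookup. This Python sketch is illustrative only: the function name `summary_settings` is hypothetical, treating an empty variable as unset is an assumption, and it does not model how websnap chooses between backends, which this README does not specify.

```python
import os
from typing import Dict, Mapping, Optional

# Defaults copied from the environment variable table.
DEFAULTS = {
    "OPENAI_MODEL": "gpt-4o-mini",
    "ANTHROPIC_MODEL": "claude-sonnet-4-20250514",
    "OLLAMA_URL": "http://127.0.0.1:11434",
    "OLLAMA_MODEL": "llama3.2",
}


def summary_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Resolve model and URL settings, falling back to the documented defaults."""
    env = os.environ if env is None else env
    # Assumption: an empty value is treated the same as an unset variable.
    settings: Dict[str, object] = {
        name: env.get(name) or default for name, default in DEFAULTS.items()
    }
    settings["has_openai_key"] = bool(env.get("OPENAI_API_KEY"))
    settings["has_anthropic_key"] = bool(env.get("ANTHROPIC_API_KEY"))
    return settings
```

Calling `summary_settings()` with no arguments reads the current process environment. Passing a plain dictionary makes the lookup easy to test.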
Examples
# Quick read of a blog post
websnap https://paulgraham.com/greatwork.html
# Save an article as JSON for processing
websnap https://arxiv.org/abs/2301.00001 --json -o paper.json
# Get a quick AI summary
websnap 'https://news.ycombinator.com/item?id=12345' --summary
# Batch scrape a list of articles
websnap batch research-urls.txt --outdir ./research --json
# Use with jq for data extraction
websnap https://example.com --json | jq '.title, .wordCount'
# Pipe to other tools
websnap https://example.com | glow - # render with glow
websnap https://example.com | pbcopy # copy to clipboard (macOS)
websnap https://example.com | llm summarize # pipe to LLM CLI
License
MIT
