web2md-cli

v1.0.0

Published 24 days ago

Convert any URL to clean Markdown — works with Reddit, YouTube, arXiv, GitHub and more

Vulnerabilities: 0 High, 0 Medium, 0 Low

web2md.org

markdown web scraping llm ai cli

web2md

Convert any URL to clean Markdown — from the command line.

Zero install. Works with Wikipedia, arXiv, Hacker News, GitHub, dev.to, Substack, and more. For Reddit and restricted sites, use an API key.

Usage

```bash
npx web2md <url>
```

Examples

```bash
# Basic conversion
npx web2md https://arxiv.org/abs/2501.12345

# Pipe to an LLM (quote URLs that contain ? or & so the shell leaves them alone)
npx web2md 'https://news.ycombinator.com/item?id=123' | llm "summarize"

# Save to file
npx web2md https://example.com -o article.md

# Add YAML frontmatter
npx web2md https://example.com --meta

# JSON output (pipe to jq)
npx web2md https://example.com --json | jq '.metadata.wordCount'

# Batch from file (one URL per line; -n 1 runs web2md once per URL)
xargs -n 1 npx web2md < urls.txt

# Reddit (requires API key)
WEB2MD_API_KEY=w2m_xxx npx web2md 'https://reddit.com/r/programming/comments/xxx'
```
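The batch example above writes every converted page to stdout, one after another. If you would rather keep one Markdown file per URL, a small loop around `-o` and `-q` does it. This is a minimal sketch, not part of web2md: `url_to_filename` and `batch_convert` are hypothetical helper names, and the file-naming scheme (scheme dropped, runs of `/ ? & = :` turned into `_`) is an arbitrary choice.

```bash
# Sketch: convert each URL in a list file to its own Markdown file.
# url_to_filename and batch_convert are hypothetical helpers, not part of web2md.

# Derive a file name from a URL: drop the scheme, collapse runs of
# "/", "?", "&", "=" and ":" into "_", trim a trailing "_", append ".md".
url_to_filename() {
  printf '%s\n' "$1" | sed -E 's#^[A-Za-z]+://##; s#[/?&=:]+#_#g; s#_+$##; s#$#.md#'
}

# Read one URL per line from the given file; skip blank lines and "#" comments.
batch_convert() {
  local list="$1" url
  while IFS= read -r url || [ -n "$url" ]; do
    case "$url" in ''|'#'*) continue ;; esac
    # </dev/null keeps npx from reading (and consuming) the URL list on stdin,
    # for example if it prompts before installing the package.
    # Assumption: web2md exits non-zero on failure (not stated in the README).
    npx web2md "$url" -q -o "$(url_to_filename "$url")" </dev/null \
      || printf 'failed: %s\n' "$url" >&2
  done < "$list"
}
```

Usage: `batch_convert urls.txt`. Reading the list with `while read` instead of `xargs` makes it easy to skip comment lines and to report failures per URL.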

Options

| Option | Description |
|--------|-------------|
| --no-images | Strip image references |
| --no-links | Strip hyperlinks |
| --meta | Add YAML frontmatter (title, source, wordCount) |
| --json | Output as { markdown, metadata } JSON |
| -o, --output | Write to file instead of stdout |
| -q, --quiet | Suppress progress messages |
| --version | Print version |
| -h, --help | Show help |

Environment Variables

| Variable | Description |
|----------|-------------|
| WEB2MD_API_KEY | API key (w2m_xxx) that enables Reddit and server-side rendering |
| WEB2MD_API_URL | Override API base URL |

How it works

Without API key (default): Fetches the URL locally and converts HTML to Markdown using the same extraction engine as the Web2MD Chrome extension. Fast, no rate limits.

With API key: Routes through the Web2MD server, which handles Reddit (bypasses IP blocks), Fandom/Wikia via MediaWiki API, and other restricted sites.

Get an API key at web2md.org/dashboard/api-keys
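Because Reddit only works through the server path, a wrapper script may want to fail early instead of attempting a local fetch that is likely to be blocked. A minimal sketch of that decision in bash, assuming the behaviour described above (key set means server, otherwise local); `web2md_mode`, `is_reddit_url` and `fetch_md` are hypothetical helpers, and the Reddit check is a rough pattern match on the URL rather than a real URL parser.

```bash
# Hypothetical helpers illustrating the two paths described above;
# they are not part of web2md itself.

# Which path the CLI takes: "server" when WEB2MD_API_KEY is non-empty,
# "local" otherwise.
web2md_mode() {
  if [ -n "${WEB2MD_API_KEY:-}" ]; then
    echo server
  else
    echo local
  fi
}

# Rough check for Reddit URLs with a path, on reddit.com or any
# subdomain such as www. or old.
is_reddit_url() {
  case "$1" in
    http://reddit.com/*|https://reddit.com/*) return 0 ;;
    http://*.reddit.com/*|https://*.reddit.com/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Return 2 with a message instead of attempting a Reddit fetch without a key.
fetch_md() {
  if is_reddit_url "$1" && [ "$(web2md_mode)" = local ]; then
    echo "web2md: $1 requires WEB2MD_API_KEY" >&2
    return 2
  fi
  npx web2md "$1"
}
```

With no key exported, `fetch_md 'https://reddit.com/r/programming/comments/xxx'` prints an error and returns 2; any other URL is passed straight to `npx web2md`.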

Supported Sites

Works on any HTML page. Optimized extraction for:

- Wikipedia, arXiv, MDN
- Hacker News (with comments)
- GitHub Issues / PRs
- Stack Overflow
- dev.to, Medium, Substack
- OpenAI Docs, Mintlify-based docs
- Reddit (requires API key)

For AI Agents (Claude Code, Cursor, etc.)

Add to your CLAUDE.md or project instructions:

```markdown
Use `npx web2md <url>` to fetch web pages as clean Markdown.
Supports arXiv, Wikipedia, HN, GitHub, dev.to, Substack.
For Reddit: WEB2MD_API_KEY=w2m_xxx npx web2md <url>
Output goes to stdout, errors to stderr.
```
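The note above says web2md writes Markdown to stdout and errors to stderr. An agent or script that shells out to it can keep the two apart and surface the error text only when something goes wrong. A minimal sketch with a hypothetical `capture` helper; it assumes, which the README does not state, that web2md exits non-zero when a fetch or conversion fails.

```bash
# Hypothetical helper: run a command, print its stdout on success; on failure
# print "exit N: <captured stderr>" to stderr and return the same exit code.
capture() {
  local errfile out rc
  errfile=$(mktemp) || return 1
  # Assign separately from "local" so the command's exit status is preserved.
  if out=$("$@" 2>"$errfile"); then
    rc=0
  else
    rc=$?
  fi
  if [ "$rc" -eq 0 ]; then
    printf '%s\n' "$out"
  else
    printf 'exit %s: %s\n' "$rc" "$(cat "$errfile")" >&2
  fi
  rm -f "$errfile"
  return "$rc"
}
```

Usage: `capture npx web2md 'https://arxiv.org/abs/2501.12345' -q > paper.md`. On success the Markdown lands in paper.md; on failure the message appears on stderr as `exit N: ...` and the exit code is passed through to the caller.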

License

MIT
