web2md-cli
v1.0.0
Convert any URL to clean Markdown — works with Reddit, YouTube, arXiv, GitHub and more
web2md
Convert any URL to clean Markdown — from the command line.
Zero install. Works with Wikipedia, arXiv, Hacker News, GitHub, dev.to, Substack, and more. For Reddit and restricted sites, use an API key.
Usage
npx web2md <url>
Examples
# Basic conversion
npx web2md https://arxiv.org/abs/2501.12345
# Pipe to an LLM
npx web2md https://news.ycombinator.com/item?id=123 | llm "summarize"
# Save to file
npx web2md https://example.com -o article.md
# Add YAML frontmatter
npx web2md https://example.com --meta
# JSON output (pipe to jq)
npx web2md https://example.com --json | jq '.metadata.wordCount'
# Batch from file
cat urls.txt | xargs -n 1 npx web2md
# Reddit (requires API key)
WEB2MD_API_KEY=w2m_xxx npx web2md 'https://reddit.com/r/programming/comments/xxx'
Options
| Option | Description |
|--------|-------------|
| --no-images | Strip image references |
| --no-links | Strip hyperlinks |
| --meta | Add YAML frontmatter (title, source, wordCount) |
| --json | Output as { markdown, metadata } JSON |
| -o, --output | Write to file instead of stdout |
| -q, --quiet | Suppress progress messages |
| --version | Print version |
| -h, --help | Show help |
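The options above can be combined in a script. The following is a minimal sketch of a batch loop that runs one conversion per URL, so it does not depend on the CLI accepting several URLs in a single call. The function name `convert_all`, the `page-N.md` naming scheme, and the assumption that `web2md` exits with a non-zero status on failure are illustrative and are not taken from this README.

```shell
#!/bin/sh
# convert_all OUTDIR: read URLs from stdin (one per line) and write each
# page to OUTDIR/page-N.md. Hypothetical helper, not part of web2md.
convert_all() {
  outdir="$1"
  mkdir -p "$outdir" || return 1
  n=0
  failed=0
  while IFS= read -r url; do
    # Skip blank lines
    [ -n "$url" ] || continue
    n=$((n + 1))
    # -q suppresses progress messages; -o writes to a file instead of stdout.
    # stdin is redirected so the command cannot consume the URL list.
    # Assumption: a failed conversion returns a non-zero exit status.
    if ! npx web2md "$url" -q -o "$outdir/page-$n.md" </dev/null; then
      echo "failed: $url" >&2
      failed=$((failed + 1))
    fi
  done
  # Succeed only if every URL converted
  [ "$failed" -eq 0 ]
}
```

After sourcing the function, a call such as `convert_all out < urls.txt` would write `out/page-1.md`, `out/page-2.md`, and so on, reporting any failed URLs on stderr.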
Environment Variables
| Variable | Description |
|----------|-------------|
| WEB2MD_API_KEY | API key (w2m_xxx) — enables Reddit + server-side rendering |
| WEB2MD_API_URL | Override API base URL |
How it works
Without API key (default): Fetches the URL locally and converts HTML to Markdown using the same extraction engine as the Web2MD Chrome extension. Fast, no rate limits.
With API key: Routes through the Web2MD server, which handles Reddit (bypasses IP blocks), Fandom/Wikia via MediaWiki API, and other restricted sites.
Get an API key at web2md.org/dashboard/api-keys
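Because Reddit is only reachable with an API key, a script may want to fail early instead of attempting a local fetch. The following is a sketch under stated assumptions: the function names `needs_api_key` and `web2md_checked` are hypothetical, the host patterns are inferred from the Reddit example in this README rather than from the CLI's actual routing rules, and the exit status `2` is an arbitrary choice.

```shell
#!/bin/sh
# needs_api_key URL: succeed if the URL's host is reddit.com or a subdomain.
# Assumption: these host patterns approximate what the CLI treats as Reddit.
needs_api_key() {
  host=${1#*://}     # drop the scheme
  host=${host%%/*}   # drop the path
  case "$host" in
    reddit.com|*.reddit.com) return 0 ;;
    *) return 1 ;;
  esac
}

# web2md_checked URL [OPTIONS...]: refuse to run for Reddit URLs when
# WEB2MD_API_KEY is empty; otherwise pass all arguments through to web2md.
web2md_checked() {
  if needs_api_key "$1" && [ -z "${WEB2MD_API_KEY:-}" ]; then
    echo "WEB2MD_API_KEY is not set; required for $1" >&2
    return 2
  fi
  npx web2md "$@"
}
```

With the functions sourced, `web2md_checked https://example.com --meta` behaves like the plain command, while a Reddit URL without a key stops before any request is made.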
Supported Sites
Works on any HTML page. Optimized extraction for:
- Wikipedia, arXiv, MDN
- Hacker News (with comments)
- GitHub Issues / PRs
- Stack Overflow
- dev.to, Medium, Substack
- OpenAI Docs, Mintlify-based docs
- Reddit (requires API key)
For AI Agents (Claude Code, Cursor, etc.)
Add to your CLAUDE.md or project instructions:
Use `npx web2md <url>` to fetch web pages as clean Markdown.
Supports arXiv, Wikipedia, HN, GitHub, dev.to, Substack.
For Reddit: WEB2MD_API_KEY=w2m_xxx npx web2md <url>
Output goes to stdout, errors to stderr.
License
MIT
