webpage-content-downloader
v1.0.0
CLI tool to extract main content from webpages and save as markdown
A command-line tool that extracts the main content from web pages and saves it as nicely formatted markdown files. It removes ads, navigation, and other distracting elements, giving you clean, readable content.
Installation
npm install -g webpage-content-downloader
This installs the dl command globally on your system.
Usage
Basic usage:
dl <url> [output]
Examples
Extract an article and let the tool generate a filename based on the title:
dl https://example.com
Extract an article and specify a custom filename:
dl https://example.com my-article.md
Output Location
By default, all downloaded content is saved in a .dl directory in your current working directory. The tool will:
- Create the .dl directory if it doesn't exist
- Save the markdown file inside this directory
- Generate a filename based on the article title if none is specified
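The steps above could be sketched in Node.js roughly as follows. This is a minimal illustration, not the tool's actual source: the function name saveToDlDirectory and its parameters are hypothetical, and only the .dl directory name comes from this README.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: writes markdown into a .dl directory under cwd.
// The real tool may handle errors and existing files differently.
function saveToDlDirectory(filename, markdown, cwd = process.cwd()) {
  const outputDir = path.join(cwd, '.dl');
  // recursive: true creates the directory if needed and does not
  // throw when it already exists.
  fs.mkdirSync(outputDir, { recursive: true });
  const outputPath = path.join(outputDir, filename);
  fs.writeFileSync(outputPath, markdown, 'utf8');
  return outputPath;
}
```

Calling saveToDlDirectory('my-article.md', content) from a project folder would then leave the file at .dl/my-article.md inside that folder.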
File Naming
When no output filename is specified:
- The filename is generated from the article's title
- Special characters are replaced with hyphens
- The .md extension is added automatically
For example:
- URL: https://example.com/my-great-article
- Article title: "My Great Article About Code"
- Generated filename: .dl/my-great-article-about-code.md
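One way such a naming rule might be implemented is shown below. It is a sketch under assumptions: the function name titleToFilename is hypothetical, and lowercasing, collapsing runs of special characters into a single hyphen, and the "untitled" fallback are guesses consistent with the example above rather than documented behavior.

```javascript
// Hypothetical helper: the tool's real naming rules may differ.
function titleToFilename(title) {
  const slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // each run of other characters becomes one hyphen
    .replace(/^-+|-+$/g, '');    // drop leading and trailing hyphens
  // Assumed fallback when the title has no usable characters.
  return (slug || 'untitled') + '.md';
}

console.log(titleToFilename('My Great Article About Code'));
// prints "my-great-article-about-code.md"
```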
Output Format
Each markdown file includes:
- YAML frontmatter with metadata
- The main content converted to markdown
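As a rough illustration of how a file with this layout could be assembled, consider the sketch below. The function name buildMarkdownDocument and its parameters are hypothetical. The field names title, source, and date_extracted follow this README's sample output, and no YAML escaping is attempted, so a title containing a colon would need quoting in a real implementation.

```javascript
// Hypothetical helper: field names mirror the README's sample output.
// Values are written as-is, without YAML escaping.
function buildMarkdownDocument({ title, source, body, extractedAt = new Date() }) {
  const frontmatter = [
    '---',
    `title: ${title}`,
    `source: ${source}`,
    `date_extracted: ${extractedAt.toISOString()}`, // ISO 8601 timestamp in UTC
    '---',
  ].join('\n');
  // Frontmatter first, then a top-level heading, then the converted content.
  return `${frontmatter}\n# ${title}\n${body}\n`;
}
```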
Example output:
---
title: Article Title
source: https://example.com/article
date_extracted: 2023-08-10T12:34:56.789Z
---
# Article Title
Article content in clean markdown format...
Features
- Clean content extraction using Mozilla's Readability
- Automatic filename generation
- YAML frontmatter with metadata
- Organized file storage in .dl directory
- Converts HTML content to well-formatted markdown
- Preserves important formatting:
  - Headers
  - Lists
  - Links
  - Code blocks
  - Tables
  - Strike-through text
  - And more...
Requirements
- Node.js 14.0.0 or higher
- npm 6.0.0 or higher
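To check the Node.js requirement from a script, a comparison along these lines may help. The helper meetsMinimum is hypothetical and handles only plain major.minor.patch versions. process.versions.node is the standard Node.js property holding the running version without a leading "v".

```javascript
// Hypothetical helper: compares dotted versions numerically, part by part.
// Missing or non-numeric parts are treated as 0.
function meetsMinimum(version, minimum) {
  const have = version.split('.').map(Number);
  const need = minimum.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    const h = have[i] || 0;
    const n = need[i] || 0;
    if (h !== n) return h > n; // first differing part decides
  }
  return true; // versions are equal
}

// Reports whether the running Node.js satisfies the 14.0.0 minimum.
console.log(meetsMinimum(process.versions.node, '14.0.0'));
```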
License
MIT
