page2md
v1.0.7
A Node.js utility to convert HTML content to Markdown.
page2md
page2md converts web pages into clean, well-structured Markdown. It renders each page in a headless browser, so it captures both static HTML and content generated by client-side JavaScript. This makes it useful for developers, content creators, and researchers who need to extract web-based information into a portable, text-based format.
Features
- Dynamic Page Rendering: Utilizes a headless browser to fully render dynamic web pages, ensuring accurate capture of content generated by JavaScript or other client-side scripts.
- Markdown Output: Converts web pages into clean, readable Markdown format, suitable for documentation, note-taking, or further processing.
- Dual Usage Modes: Supports both API-based integration for programmatic access and a command-line interface (CLI) for quick, manual conversions.
- Flexible and Robust: Handles a wide range of web content, including complex layouts, dynamic elements, and modern web frameworks.
- Lightweight and Efficient: Optimized for performance, ensuring fast conversions even for content-heavy pages.
Installation
To get started with page2md, follow these steps:
Prerequisites:
- Node.js (version 14 or higher)
- npm (Node Package Manager)
Install via npm:
npm install -g page2md
Verify Installation:
page2md --version
Usage
page2md offers two primary ways to convert web pages to Markdown: via the command-line interface (CLI) or through its API for programmatic integration.
Command-Line Usage
The CLI provides a straightforward way to convert web pages to Markdown directly from your terminal.
Basic Command:
page2md <url> -o <output-file>.md
Example:
page2md https://example.com -o output.md
Options:
- -o, --output <file>: Specify the output Markdown file (default: output.md).
- -t, --timeout <ms>: Set the maximum time to wait for page loading (default: 30000 ms).
- --no-js: Disable JavaScript execution for static content only.
- -h, --help: Display help information.
Advanced Example:
page2md https://example.com --timeout 5000 --no-js -o example-static.md
API Usage
For developers looking to integrate page2md into their applications, the API provides a flexible and programmatic way to convert web pages.
Installation:
npm install page2md
Example Code:
const page2md = require('page2md');

async function convertPage() {
  try {
    const markdown = await page2md.convert({
      url: 'https://example.com',
      options: {
        timeout: 30000,
        disableJavaScript: false,
      },
    });
    console.log(markdown);
  } catch (error) {
    console.error('Error converting page:', error);
  }
}

convertPage();
API Parameters:
- url (string): The URL of the web page to convert.
- options (object):
  - timeout (number): Maximum time to wait for page loading (in milliseconds).
  - disableJavaScript (boolean): Disable JavaScript execution for static content.
  - outputPath (string, optional): File path to save the Markdown output.
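Because these parameters are passed as a plain object, it can help to validate them before starting a conversion. The following is a minimal sketch, not part of page2md: `normalizeConvertParams` is a hypothetical helper, the 30000 ms default is borrowed from the CLI's documented default and may not match the API's, and `outputPath` is assumed to live inside `options`.

```javascript
// Hypothetical helper: checks the documented parameters before they are
// handed to page2md.convert(). It is an illustration, not page2md code.
function normalizeConvertParams({ url, options = {} } = {}) {
  if (typeof url !== 'string' || url.length === 0) {
    throw new TypeError('url must be a non-empty string');
  }
  // new URL() throws a TypeError when the string is not an absolute URL.
  const parsed = new URL(url);
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new TypeError('url must use http or https');
  }

  // Assumption: the API uses the same 30000 ms default as the CLI.
  const { timeout = 30000, disableJavaScript = false, outputPath } = options;
  if (!Number.isFinite(timeout) || timeout <= 0) {
    throw new TypeError('options.timeout must be a positive number of milliseconds');
  }
  if (typeof disableJavaScript !== 'boolean') {
    throw new TypeError('options.disableJavaScript must be a boolean');
  }
  if (outputPath !== undefined && typeof outputPath !== 'string') {
    throw new TypeError('options.outputPath must be a string when provided');
  }

  // Only include outputPath when the caller supplied one.
  const normalized = { timeout, disableJavaScript };
  if (outputPath !== undefined) normalized.outputPath = outputPath;
  return { url, options: normalized };
}
```

The returned object has the same shape as the argument in the example code, so it could be passed straight to `page2md.convert(...)`.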
How It Works
page2md operates by launching a headless browser to dynamically load and render the target web page. This approach ensures that all content, including elements generated by JavaScript, is fully loaded before conversion. The tool then parses the rendered DOM, extracting text, headings, links, images, and other elements, and transforms them into a structured Markdown format. The result is a clean, portable document that preserves the essential content and structure of the original page.
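The DOM-to-Markdown step described above can be sketched without a browser. The example below is an illustration under stated assumptions rather than page2md's actual implementation: the node shape (`tag`, `attrs`, `children`, with plain strings as text) and the function names are hypothetical, and only headings, paragraphs, links, images, emphasis, and unordered lists are handled.

```javascript
// Simplified stand-in for a rendered DOM node:
//   { tag: 'h1', attrs: {...}, children: [...] }, or a plain string for text.
// This shape is an assumption for illustration only.

// Tags treated as containers whose children are rendered as separate blocks.
const CONTAINERS = new Set(['body', 'div', 'section', 'article', 'main']);

// Render inline content (text, links, images, emphasis) to a Markdown string.
function renderInline(node) {
  if (typeof node === 'string') return node;
  const inner = (node.children || []).map(renderInline).join('');
  const attrs = node.attrs || {};
  switch (node.tag) {
    case 'a': return `[${inner}](${attrs.href || ''})`;
    case 'img': return `![${attrs.alt || ''}](${attrs.src || ''})`;
    case 'strong': return `**${inner}**`;
    case 'em': return `*${inner}*`;
    default: return inner; // other tags contribute only their text
  }
}

// Render one block-level node to a Markdown block.
function renderBlock(node) {
  if (typeof node === 'string') return node.trim();
  const heading = /^h([1-6])$/.exec(node.tag);
  if (heading) return `${'#'.repeat(Number(heading[1]))} ${renderInline(node)}`;
  if (node.tag === 'ul') {
    return (node.children || []).map((li) => `- ${renderInline(li)}`).join('\n');
  }
  if (CONTAINERS.has(node.tag)) return toMarkdown(node.children || []);
  return renderInline(node); // <p> and anything else becomes one paragraph
}

// Join non-empty blocks with a blank line between them.
function toMarkdown(nodes) {
  return nodes.map(renderBlock).filter((block) => block.length > 0).join('\n\n');
}

const page = [
  { tag: 'h1', children: ['Example Domain'] },
  { tag: 'p', children: ['See ', { tag: 'a', attrs: { href: 'https://example.com' }, children: ['the site'] }, '.'] },
  { tag: 'ul', children: [{ tag: 'li', children: ['one'] }, { tag: 'li', children: ['two'] }] },
];
console.log(toMarkdown(page));
```

A real converter would also need to handle ordered and nested lists, tables, code blocks, and escaping of Markdown special characters, which this sketch leaves out.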
Use Cases
- Documentation: Convert web-based documentation into Markdown for offline use or integration into tools like GitHub, GitLab, or wikis.
- Content Archiving: Archive dynamic web content in a lightweight, text-based format for long-term storage.
- Research and Data Collection: Extract content from web pages for analysis or reporting.
- Automation: Integrate page2md into workflows to automate content extraction and conversion.
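For the automation case, one possible approach is to convert a list of URLs with a bounded number of conversions in flight, since each conversion launches browser work. This is a sketch under assumptions: `convertAll` is a hypothetical helper, and the `convert` function is injected by the caller so the example does not depend on page2md's exact API.

```javascript
// Convert many URLs with at most `concurrency` conversions running at once.
// `convert` must take a URL and return a Promise that resolves to Markdown.
async function convertAll(urls, convert, concurrency = 2) {
  const results = new Array(urls.length);
  let next = 0;

  // Each worker claims the next unprocessed index until the list is exhausted.
  async function worker() {
    while (next < urls.length) {
      const index = next++;
      const url = urls[index];
      try {
        results[index] = { url, ok: true, markdown: await convert(url) };
      } catch (error) {
        // A failing page is recorded and does not abort the rest of the batch.
        const message = error instanceof Error ? error.message : String(error);
        results[index] = { url, ok: false, error: message };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, urls.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // same order as the input URLs
}
```

With the API shown in the example code, `convert` could be `(url) => page2md.convert({ url, options: { timeout: 30000 } })`. Results come back in input order, each marked `ok: true` with its Markdown or `ok: false` with an error message.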
Contributing
We welcome contributions to page2md! To contribute:
- Fork the repository on GitHub.
- Create a new branch for your feature or bug fix.
- Submit a pull request with a clear description of your changes.
Please ensure your code follows the project's coding standards and includes appropriate tests.
License
page2md is licensed under the MIT License. See the LICENSE file for details.
Support
For issues, feature requests, or questions, please open an issue on the GitHub repository.
Happy converting! 🚀
