@aiquants/html-to-markdown

v1.2.0

HTML to Markdown converter

A tool to dynamically fetch a web page from a given URL and convert it to Markdown. Now with Model Context Protocol (MCP) server support!

This tool fetches the fully rendered HTML after JavaScript execution and converts it to Markdown. It supports complex pages, including footnotes and table structures.

Key Features

  • Dynamic Content Fetching: Accurately fetches content from dynamic sites like SPAs (Single Page Applications) using Playwright.
  • High-Fidelity Conversion: Performs powerful AST (Abstract Syntax Tree)-based conversion using rehype and remark.
  • Preserves Table Content: Retains HTML tags within table cells (<td>, <th>) as much as possible to maintain rich formatting.
  • Link Normalization: Automatically converts relative links on the page to absolute links to prevent broken links.
  • Multiple Interfaces: Usable as a Node.js library, command-line tool, and MCP server.
  • MCP Server Support: Provides Model Context Protocol server functionality for AI assistants.
  • Streamable MCP Support: Supports streamable MCP protocols for real-time progress updates.

Installation

npm install @aiquants/html-to-markdown

Usage

As a Command-Line Tool

You can run the tool directly using npx without installation. By default, the converted Markdown will be saved in the .outputs/raw directory, but you can specify a custom output path using the --output option.

npx @aiquants/html-to-markdown <URL> [--locale <locale>] [--output <path>]

Or convert HTML content directly:

npx @aiquants/html-to-markdown --html-content <HTML_TEXT> [--locale <locale>] [--output <path>]

Options:

  • --locale <locale>: Set the locale for the browser context and console messages (en-US or ja-JP). Defaults to en-US.
  • --output <path>, -o <path>: Specify the output file path. If not specified, the file will be saved in the .outputs/raw directory with an auto-generated filename.
  • --html-content <HTML_TEXT>, -h <HTML_TEXT>: Convert HTML content directly instead of fetching from a URL.
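When the CLI is driven from a script, it can help to assemble the argument list in one place. The following is a minimal sketch that only emits the flags documented above. `buildCliArgs` is a hypothetical helper, not part of the package, and preferring `htmlContent` when both inputs are given is an assumption.

```javascript
// Hypothetical helper: builds the argument list for `npx` from the
// documented CLI options. Only flags listed above are emitted.
function buildCliArgs({ url, htmlContent, locale, output } = {}) {
  if (!url && !htmlContent) {
    throw new Error('Provide either url or htmlContent');
  }
  const cliArgs = ['@aiquants/html-to-markdown'];
  if (htmlContent) {
    // Assumption: when both inputs are supplied, HTML content wins.
    cliArgs.push('--html-content', htmlContent);
  } else {
    cliArgs.push(url);
  }
  if (locale) cliArgs.push('--locale', locale);
  if (output) cliArgs.push('--output', output);
  return cliArgs;
}

const exampleCliArgs = buildCliArgs({
  url: 'https://ja.wikipedia.org/wiki/Node.js',
  locale: 'ja-JP',
  output: './nodejs-ja.md',
});
console.log(['npx', ...exampleCliArgs].join(' '));
```

The resulting array can be passed as the argument list to Node's `child_process.spawn('npx', ...)`, which avoids shell-quoting problems with inline HTML.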

Examples:

# Convert a Wikipedia page with the Japanese locale
npx @aiquants/html-to-markdown https://ja.wikipedia.org/wiki/Node.js --locale ja-JP

# Convert an English page (locale defaults to en-US)
npx @aiquants/html-to-markdown https://en.wikipedia.org/wiki/Node.js

# Save to a specific file
npx @aiquants/html-to-markdown https://en.wikipedia.org/wiki/Node.js --output ./my-output.md

# Use short option for output
npx @aiquants/html-to-markdown https://ja.wikipedia.org/wiki/Node.js --locale ja-JP -o ./nodejs-ja.md

# Convert HTML content directly
npx @aiquants/html-to-markdown --html-content '<html><body><h1>Sample Title</h1><p>This is a sample paragraph.</p></body></html>' --output ./sample.md

# Convert HTML content with Japanese locale
npx @aiquants/html-to-markdown --html-content '<html><body><h1>サンプルタイトル</h1><p>これはサンプルの段落です。</p></body></html>' --locale ja-JP -o ./sample-ja.md

As a Library

import { htmlToMarkdown } from '@aiquants/html-to-markdown';
import fs from 'fs';

async function main() {
  // Example 1: Convert from URL
  const url = 'https://en.wikipedia.org/wiki/Node.js';
  const options = {
    locale: 'en-US', // 'en-US' (default) or 'ja-JP'
  };

  try {
    const { markdown } = await htmlToMarkdown(url, options);
    fs.writeFileSync('output.md', markdown);
    console.log('Markdown file has been saved as output.md');
  } catch (error) {
    console.error('Error converting HTML to Markdown:', error);
  }

  // Example 2: Convert HTML content directly
  const htmlContent = `
    <html>
      <body>
        <h1>Sample Title</h1>
        <p>This is a sample paragraph.</p>
        <ul>
          <li>Item 1</li>
          <li>Item 2</li>
        </ul>
      </body>
    </html>
  `;

  try {
    const { markdown } = await htmlToMarkdown('', {
      locale: 'en-US',
      htmlContent: htmlContent
    });
    fs.writeFileSync('output-from-html.md', markdown);
    console.log('Markdown file has been saved as output-from-html.md');
  } catch (error) {
    console.error('Error converting HTML to Markdown:', error);
  }
}

main();

As an MCP Server

This package can be used as a Model Context Protocol (MCP) server, allowing AI assistants and MCP-compatible applications to convert web pages to Markdown.

MCP Server Configuration

For VS Code with an MCP extension, add the following to your mcp.json:

{
  "html-to-md": {
    "command": "npx",
    "args": [
      "--package=@aiquants/html-to-markdown",
      "aiq-html2md-mcp"
    ],
    "type": "stdio"
  },
  "html-to-md-streamable": {
    "url": "http://127.0.0.1:4001/mcp",
    "type": "http",
    "_comment": "Note: You need to start the streamable MCP server separately using 'npx --package=@aiquants/html-to-markdown aiq-html2md-mcp-stream'"
  }
}

For Claude Desktop or other MCP clients, add the following to your configuration file:

{
  "mcpServers": {
    "html-to-markdown": {
      "command": "npx",
      "args": ["--package=@aiquants/html-to-markdown", "aiq-html2md-mcp"],
      "description": "Convert HTML content from URLs to Markdown format"
    }
  }
}

Note: The streamable MCP server runs as an HTTP server on port 4001 and needs to be started separately:

# Start the streamable MCP server (requires global installation)
npm install -g @aiquants/html-to-markdown
aiq-html2md-mcp-stream

Alternatively, you can use npx to run without global installation:

# Using npx with the streamable server binary
npx --package=@aiquants/html-to-markdown aiq-html2md-mcp-stream

MCP Tools Available

  • html_to_markdown: Convert a web page (given by URL) or an HTML string to Markdown format

    • url (required*): The URL of the web page to convert
    • html_content (required*): HTML content as a string to convert (alternative to URL)
    • locale (optional): Browser locale (en-US or ja-JP, defaults to en-US)

    *Either url or html_content is required

  • save_content_to_file: Save text content to a file with specified path or directory

    • content (required): Text content to save to file (will be saved as Markdown format)
    • save_path (optional): Complete file path including filename to save content
    • save_directory (optional): Directory path to save content with auto-generated filename
    • filename (optional): Base filename to use when save_directory is specified (extension .md will be added automatically)
  • url_to_markdown_file: Convert web pages or HTML strings directly to Markdown files (combines conversion and file saving)

    • url (required*): The URL of the web page to convert and save
    • html_content (required*): HTML content as a string to convert and save (alternative to URL)
    • locale (optional): Browser locale (en-US or ja-JP, defaults to en-US)
    • save_path (optional): Complete file path including filename to save the converted content
    • save_directory (optional): Directory path to save the converted content with auto-generated filename
    • filename (optional): Base filename to use when save_directory is specified (extension .md will be added automatically)

    *Either url or html_content is required, and either save_path or save_directory is required
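The argument rules for url_to_markdown_file can be checked on the client before a call is sent. This is a minimal sketch of those rules as stated in this README. `validateUrlToMarkdownFileArgs` is a hypothetical name, and the server's own validation and error messages may differ.

```javascript
// Hypothetical validator mirroring the documented rules for
// url_to_markdown_file. Returns a list of problems (empty when valid).
function validateUrlToMarkdownFileArgs(toolArgs) {
  const problems = [];
  if (!toolArgs.url && !toolArgs.html_content) {
    problems.push('either url or html_content is required');
  }
  if (!toolArgs.save_path && !toolArgs.save_directory) {
    problems.push('either save_path or save_directory is required');
  }
  if (toolArgs.save_path && toolArgs.save_directory) {
    problems.push('save_path and save_directory cannot be used together');
  }
  return problems;
}

// A valid call: URL input plus a target directory.
console.log(validateUrlToMarkdownFileArgs({
  url: 'https://example.com/article',
  save_directory: '/path/to/output/',
}));

// An invalid call: no input, and both save options at once.
console.log(validateUrlToMarkdownFileArgs({
  save_path: '/path/to/my-file.md',
  save_directory: '/path/to/output/',
}));
```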

Streamable MCP Tools (for real-time progress updates):

  • html_to_markdown_streamable: Same as html_to_markdown but with real-time progress updates
  • save_content_to_file: Same file saving functionality as in standard MCP
  • url_to_markdown_file_streamable: Same as url_to_markdown_file but with real-time progress updates and streaming support

File Saving Options:

  • save_path: Specify the complete file path including filename and extension where you want to save the content.

    • Example: /path/to/output/my-page.md
    • The directory will be created automatically if it doesn't exist.
    • You cannot use both save_path and save_directory at the same time.
  • save_directory: Specify only the directory where you want to save the file. The filename will be auto-generated based on the URL.

    • Example: /path/to/output/ (the filename is auto-generated, for example page.md)
    • For URLs like https://example.com/articles/my-article, the filename becomes my-article.md
    • The directory will be created automatically if it doesn't exist.
    • You cannot use both save_path and save_directory at the same time.

File Saving Behavior:

  • All content is saved as Markdown format (.md files)
  • The directory will be created automatically if it doesn't exist
  • With save_path: File is saved to the exact specified path
  • With save_directory: File is saved with auto-generated filename based on the URL or custom filename if provided
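The saving rules above can be summarized as a small path-resolution function. This is a sketch under stated assumptions, not the package's code: `deriveFilename` and `resolveSavePath` are hypothetical names, taking the last URL path segment follows the my-article example, and falling back to page.md when the URL has no path segment is a guess based on the page.md example.

```javascript
// Hypothetical sketch of the documented save rules; not the package's code.
function deriveFilename(url, fallback = 'page') {
  if (!url) return `${fallback}.md`;
  // Use the last non-empty path segment of the URL as the base name.
  const segments = new URL(url).pathname.split('/').filter(Boolean);
  const base = segments.length > 0 ? segments[segments.length - 1] : fallback;
  return base.endsWith('.md') ? base : `${base}.md`;
}

function resolveSavePath({ url, save_path, save_directory, filename }) {
  if (save_path && save_directory) {
    throw new Error('Use either save_path or save_directory, not both');
  }
  if (save_path) return save_path; // exact path, used as given
  if (!save_directory) {
    throw new Error('Either save_path or save_directory is required');
  }
  // A custom filename gets the .md extension appended automatically.
  const name = filename ? `${filename}.md` : deriveFilename(url);
  // Simplified POSIX-style join; real code would likely use node:path.
  return `${save_directory.replace(/\/+$/, '')}/${name}`;
}

console.log(resolveSavePath({
  url: 'https://example.com/articles/my-article',
  save_directory: '/path/to/output/',
})); // /path/to/output/my-article.md
```

Creating the target directory is left out here; in Node.js that step is typically a call to `fs.mkdirSync(dir, { recursive: true })` before writing the file.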

MCP Usage Examples

Example 1: Basic HTML to Markdown conversion

{
  "method": "tools/call",
  "params": {
    "name": "html_to_markdown",
    "arguments": {
      "url": "https://example.com",
      "locale": "en-US"
    }
  }
}

Example 2: Save content to a specific file

{
  "method": "tools/call",
  "params": {
    "name": "save_content_to_file",
    "arguments": {
      "content": "# My Content\n\nThis is my markdown content.",
      "save_path": "/path/to/my-file.md"
    }
  }
}

Example 3: Convert URL directly to Markdown file (One-step operation)

{
  "method": "tools/call",
  "params": {
    "name": "url_to_markdown_file",
    "arguments": {
      "url": "https://example.com/article",
      "save_directory": "/path/to/output/",
      "filename": "my-article",
      "locale": "en-US"
    }
  }
}

Example 4: Streamable conversion with real-time progress

{
  "method": "tools/call",
  "params": {
    "name": "url_to_markdown_file_streamable",
    "arguments": {
      "url": "https://example.com/large-page",
      "save_path": "/path/to/large-page.md",
      "locale": "ja-JP"
    }
  }
}
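The examples above show only the method and params fields. MCP messages are JSON-RPC 2.0 requests, so a client that talks to the server directly also sends a jsonrpc version and a request id. Here is a minimal sketch of wrapping a tool call in that envelope; `buildToolCall` and the incrementing id counter are illustrative choices, and an MCP client library would normally do this for you.

```javascript
let nextRequestId = 1;

// Wraps a tool name and its arguments in a JSON-RPC 2.0 "tools/call" request.
function buildToolCall(name, toolArguments) {
  return {
    jsonrpc: '2.0',
    id: nextRequestId++, // any id that is unique per request works
    method: 'tools/call',
    params: { name, arguments: toolArguments },
  };
}

const exampleRequest = buildToolCall('html_to_markdown', {
  url: 'https://example.com',
  locale: 'en-US',
});
console.log(JSON.stringify(exampleRequest, null, 2));
```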

Using MCP Server Programmatically

import { createMcpServer, createStreamableMcpServer } from '@aiquants/html-to-markdown';

// Create standard MCP server
const mcpServer = createMcpServer();
await mcpServer.start(8000);  // Port 8000

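// Create streamable MCP server
// (a different port from the standard server so both can run side by side;
// 4001 matches the streamable URL in the mcp.json example)
const streamableMcpServer = createStreamableMcpServer();
await streamableMcpServer.start(4001);  // Port 4001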

API

htmlToMarkdown(urlOrHtml, options?)

Converts the HTML content of a given URL or HTML string to Markdown. Returns a promise that resolves to an object whose markdown property holds the converted text.

  • urlOrHtml (string, required): The URL of the web page to convert. When the htmlContent option is set, no page is fetched and this value can be any string (it is commonly used as an identifier).
  • options (object, optional): Options for the conversion process.

options object

  • locale (string, optional): Specifies the locale to use for the browser context and console messages.
    • 'en-US' (default)
    • 'ja-JP'
  • htmlContent (string, optional): HTML content as a string instead of fetching from URL. When provided, the function will convert this HTML content directly instead of fetching content from the urlOrHtml parameter.
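To make the two calling modes concrete, the sketch below reports which mode a given call would use and fills in the documented default locale. `describeCall` is a hypothetical pre-flight helper. Rejecting unknown locales and non-URL strings is this helper's own behavior, and the library may handle those cases differently.

```javascript
const KNOWN_LOCALES = ['en-US', 'ja-JP'];

// Hypothetical pre-flight helper: applies the documented default locale and
// reports which input mode a call to htmlToMarkdown would use.
function describeCall(urlOrHtml, options = {}) {
  const locale = options.locale ?? 'en-US';
  if (!KNOWN_LOCALES.includes(locale)) {
    throw new RangeError(`Unsupported locale: ${locale}`);
  }
  const mode = typeof options.htmlContent === 'string' ? 'html' : 'url';
  if (mode === 'url') {
    new URL(urlOrHtml); // throws TypeError if this is not an absolute URL
  }
  return { mode, locale };
}

console.log(describeCall('https://en.wikipedia.org/wiki/Node.js'));
console.log(describeCall('', {
  htmlContent: '<h1>Sample Title</h1>',
  locale: 'ja-JP',
}));
```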

How It Works

This tool follows these steps for conversion:

  1. Fetch HTML (Playwright):

    • Launches a browser with Playwright using the specified locale.
    • Navigates to the page and waits for Playwright's networkidle state, so that dynamically loaded content has been rendered before the HTML is captured.
  2. Pre-process HTML (Rehype):

    • rehype-parse: Parses the HTML into a HAST (HTML Abstract Syntax Tree).
    • rehype-raw: Preserves elements like script and style.
    • rehypeSanitizeHtml (custom plugin): Removes empty comment nodes and unnecessary whitespace.
    • rehypeAbsoluteLinks (custom plugin): Converts relative paths in href and src attributes to absolute paths.
    • rehypeWikipediaFootnotes (custom plugin): Transforms Wikipedia footnotes into standard Markdown format.
    • rehypeSlug & rehypeAutolinkHeadings: Adds IDs to headings and automatically generates anchor links.
  3. Convert to Markdown (Remark):

    • rehype-remark: Converts the HAST to an MDAST (Markdown Abstract Syntax Tree), with custom handling for links (<a>) and table cells (<td>, <th>).
    • remark-gfm: Adds support for GitHub Flavored Markdown (GFM), including tables and strikethrough.
    • remark-stringify: Serializes the MDAST into a Markdown string.
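As an illustration of the pre-processing step, here is a sketch of what a link-normalizing pass such as rehypeAbsoluteLinks might do. It is not the package's implementation: it walks a plain object shaped like a HAST tree instead of running inside unified, and `absolutizeLinks` is a hypothetical name. Relative values are resolved with the standard `URL` constructor.

```javascript
// Hypothetical stand-in for a plugin like rehypeAbsoluteLinks, operating on a
// plain HAST-shaped object: { type, tagName, properties, children }.
function absolutizeLinks(node, baseUrl) {
  if (node.type === 'element' && node.properties) {
    for (const attr of ['href', 'src']) {
      const value = node.properties[attr];
      if (typeof value === 'string') {
        try {
          // Resolves relative paths against the page URL; absolute URLs pass through.
          node.properties[attr] = new URL(value, baseUrl).href;
        } catch {
          // Leave values that cannot be parsed as URLs untouched.
        }
      }
    }
  }
  for (const child of node.children ?? []) {
    absolutizeLinks(child, baseUrl);
  }
  return node;
}

const exampleTree = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'a',
      properties: { href: '/wiki/V8' },
      children: [{ type: 'text', value: 'V8' }],
    },
    {
      type: 'element',
      tagName: 'img',
      properties: { src: 'logo.png' },
      children: [],
    },
  ],
};

absolutizeLinks(exampleTree, 'https://en.wikipedia.org/wiki/Node.js');
console.log(exampleTree.children[0].properties.href); // https://en.wikipedia.org/wiki/V8
console.log(exampleTree.children[1].properties.src); // https://en.wikipedia.org/wiki/logo.png
```

In a real unified pipeline the same traversal would usually be written with unist-util-visit, which appears in the dependency list below.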

Dependencies and Licenses

This project is built upon the following open-source software. We are grateful to the developers of these libraries.

| Package                    | License    |
| -------------------------- | ---------- |
| @modelcontextprotocol/sdk  | MIT        |
| cors                       | MIT        |
| express                    | MIT        |
| github-slugger             | ISC        |
| happy-dom                  | MIT        |
| hast                       | MIT        |
| hast-util-to-html          | MIT        |
| playwright                 | Apache-2.0 |
| rehype-parse               | MIT        |
| rehype-raw                 | MIT        |
| rehype-remark              | MIT        |
| rehype-slug                | MIT        |
| remark-gfm                 | MIT        |
| remark-stringify           | MIT        |
| unified                    | MIT        |
| unist-util-visit           | MIT        |
| yargs-parser               | ISC        |

This list is generated based on the dependencies in package.json. For the most accurate and up-to-date license information, please refer to the individual packages.

Author

License

MIT