@markitdownjs/html
v0.2.0
HTML converter for MarkItDownJS
HTML-to-AST converter for MarkItDownJS. It uses Mozilla Readability for main-content extraction, which makes it well suited to web pages, blog posts, and documentation sites.
Install
npm install @markitdownjs/html
Usage
import { MarkItDown } from "@markitdownjs/core";
import { HtmlConverter } from "@markitdownjs/html";
const parser = new MarkItDown();
parser.registerConverter(new HtmlConverter());
// htmlString is the HTML document to convert, supplied by the caller as a string
const result = await parser.convert({ source: htmlString, mimeType: "text/html" });
console.log(result.markdown);
Key Exports
| Export | Description |
|---|---|
| HtmlConverter | Converter plugin — register with MarkItDown |
What Gets Extracted
- Main article content via Mozilla Readability (strips nav, ads, footers)
- Headings, paragraphs, lists, tables, blockquotes, and code blocks
- <img> alt text and src preserved as ImageNode
- Page <title> and <meta> description captured as metadata
Options
parser.registerConverter(new HtmlConverter({
useReadability: true, // set false to parse full document without extraction
baseUrl: "https://example.com", // used to resolve relative image/link URLs
}));
Accepted MIME Types
- text/html
- application/xhtml+xml
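When the HTML comes from an HTTP response, the Content-Type header usually carries parameters such as a charset, so it may not match the accepted types literally. The sketch below shows one way a caller might reduce a header value to a bare MIME type before passing it on. The helper name toAcceptedHtmlMimeType and the constant are hypothetical and not part of @markitdownjs/html. Whether the converter itself tolerates parameters or mixed case is not stated in this README.

```typescript
// MIME types listed in this README as accepted by the HTML converter.
const ACCEPTED_HTML_MIME_TYPES = ["text/html", "application/xhtml+xml"] as const;

type HtmlMimeType = (typeof ACCEPTED_HTML_MIME_TYPES)[number];

// Hypothetical helper (an assumption, not a library export): reduces a raw
// Content-Type header value to a bare MIME type and returns it only when it
// is one of the accepted types; otherwise returns null.
function toAcceptedHtmlMimeType(contentType: string | null): HtmlMimeType | null {
  if (contentType === null) return null;
  // Drop parameters such as "; charset=utf-8", then normalize whitespace and case.
  const [first = ""] = contentType.split(";");
  const bare = first.trim().toLowerCase();
  const match = ACCEPTED_HTML_MIME_TYPES.find((t) => t === bare);
  return match ?? null;
}
```

A non-null result can then be supplied as the mimeType field of the object passed to parser.convert, as in the Usage example.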
Part of the MarkItDownJS monorepo.
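For the baseUrl option described under Options, it can help to see what resolving a relative URL against a base means in practice. The sketch below uses the standard WHATWG URL constructor. It is an illustration under the assumption that resolution follows the usual URL rules, and the helper resolveAgainstBase is hypothetical rather than something this package exports.

```typescript
// Hypothetical helper (an assumption, not a library export): resolves a
// possibly relative href or src against a base URL with the standard URL
// constructor. If the pair cannot be parsed, the input is returned unchanged.
function resolveAgainstBase(ref: string, baseUrl: string): string {
  try {
    return new URL(ref, baseUrl).href;
  } catch {
    return ref;
  }
}

const base = "https://example.com/docs/guide.html";

// Root-relative path: replaces the whole path of the base.
console.log(resolveAgainstBase("/img/logo.png", base));
// Document-relative path: resolved against the directory of the base.
console.log(resolveAgainstBase("figure.png", base));
// Absolute URL: the base is ignored.
console.log(resolveAgainstBase("https://cdn.example.org/a.png", base));
```

Note that a trailing file name in the base (guide.html here) is dropped during resolution, so a base pointing at a page and a base pointing at its directory resolve document-relative paths the same way.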
