mdka
v2.1.5
An HTML to Markdown converter, written in Rust, that balances conversion quality with runtime efficiency
An HTML to Markdown converter written in Rust.

mdka balances conversion quality with runtime efficiency:
readable output from real-world HTML, without sacrificing speed or memory.
The "ka" in the name comes from the Japanese suffix "化 (か)", which denotes conversion.
Why mdka?
There are several good HTML-to-Markdown converters in the Rust ecosystem. mdka's specific focus is:
- Reliable output from diverse HTML sources. It is built on scraper, which uses html5ever — the HTML5 parser from the Servo browser engine. html5ever applies the same parsing algorithm that web browsers use, so it handles malformed tags, deeply nested structures, CMS output, and SPA-rendered DOM without special-casing.
- Crash resistance. Conversion uses non-recursive DFS throughout. There is no stack overflow, no matter the nesting depth.
- Configurable pre-processing. Five conversion modes let you tune what gets kept or stripped — from noise-free LLM input to lossless archiving.
- Multi-language. The same Rust implementation is accessible from Node.js (napi-rs) and Python (PyO3).
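To make the crash-resistance point concrete, here is a minimal sketch of a non-recursive depth-first traversal. It is not mdka's actual code: the `Tree` and `NodeData` types are hypothetical stand-ins for an arena-backed DOM, and the only idea it illustrates is replacing recursive calls with an explicit heap-allocated stack.

```rust
/// Toy arena-backed tree: nodes live in one Vec and refer to children by index.
/// A stand-in for illustration only, not the type mdka or scraper uses.
struct Tree {
    nodes: Vec<NodeData>,
}

struct NodeData {
    tag: &'static str,
    children: Vec<usize>,
}

impl Tree {
    /// Creates a tree whose root (index 0) has the given tag.
    fn new(root_tag: &'static str) -> Self {
        Tree { nodes: vec![NodeData { tag: root_tag, children: Vec::new() }] }
    }

    /// Appends a child under `parent` and returns the new node's index.
    fn add_child(&mut self, parent: usize, tag: &'static str) -> usize {
        let id = self.nodes.len();
        self.nodes.push(NodeData { tag, children: Vec::new() });
        self.nodes[parent].children.push(id);
        id
    }

    /// Pre-order depth-first traversal driven by an explicit stack on the heap,
    /// so nesting depth is limited by memory rather than by the call stack.
    fn preorder_tags(&self) -> Vec<&'static str> {
        let mut out = Vec::with_capacity(self.nodes.len());
        let mut stack = vec![0usize]; // index 0 is the root
        while let Some(id) = stack.pop() {
            let node = &self.nodes[id];
            out.push(node.tag);
            // Push children in reverse so the first child is visited first.
            stack.extend(node.children.iter().rev().copied());
        }
        out
    }
}
```

Because the pending work sits in a `Vec`, a document nested hundreds of thousands of levels deep costs heap memory proportional to its size, whereas a recursive visitor would need one call frame per level.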
Quick Start
Try it from the command line
cargo (the Rust package manager) must be installed.

cargo install mdka-cli

echo '<h1>Hello</h1><p><strong>world</strong></p>' | mdka
# # Hello
#
# **world**

mdka page.html # → page.md (same directory)
mdka --mode minimal --drop-shell *.html # strip nav/header/footer
mdka --help # full option list

Add to a Rust project
# Cargo.toml
[dependencies]
mdka = "2"

use mdka::html_to_markdown;

let md = html_to_markdown("<h1>Hello</h1><p><em>world</em></p>");
// "# Hello\n\n*world*\n"

With options:

use mdka::html_to_markdown_with;
use mdka::options::{ConversionMode, ConversionOptions};

// `html` is the input HTML string
let mut opts = ConversionOptions::for_mode(ConversionMode::Minimal);
opts.drop_interactive_shell = true;
let md = html_to_markdown_with(html, &opts);

Add to a Node.js project
npm install mdka

const { htmlToMarkdown, htmlToMarkdownWith, htmlToMarkdownWithAsync } = require('mdka')

const md = htmlToMarkdown('<h1>Hello</h1>')

With options (async):

// `html` is the input HTML string; run this inside an async function
const md = await htmlToMarkdownWithAsync(html, {
  mode: 'minimal',
  dropInteractiveShell: true,
})

Add to a Python project
pip install mdka

import mdka

md = mdka.html_to_markdown('<h1>Hello</h1>')

With options:

# `html` is the input HTML string
md = mdka.html_to_markdown_with(
    html,
    mode=mdka.ConversionMode.Minimal,
    drop_interactive_shell=True,
)

Conversion Modes
| Mode | Use when |
|---|---|
| Balanced | General use — default |
| Strict | Debugging, diff comparison |
| Minimal | LLM input, text extraction |
| Semantic | SPA content, ARIA-aware pipelines |
| Preserve | Archiving, audit trails |
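As a rough illustration of how an application might let users pick one of these modes by name, the sketch below parses a mode string and falls back to Balanced when none is given. The `Mode` enum and `resolve_mode` function are hypothetical local stand-ins, not mdka's own `ConversionMode` API, and the lowercase spellings other than `minimal` are assumed by analogy.

```rust
use std::str::FromStr;

/// Local stand-in mirroring the five mode names in the table above;
/// not mdka's own `ConversionMode` type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum Mode {
    #[default]
    Balanced,
    Strict,
    Minimal,
    Semantic,
    Preserve,
}

impl FromStr for Mode {
    type Err = String;

    /// Parses a case-insensitive mode name such as "minimal".
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "balanced" => Ok(Mode::Balanced),
            "strict" => Ok(Mode::Strict),
            "minimal" => Ok(Mode::Minimal),
            "semantic" => Ok(Mode::Semantic),
            "preserve" => Ok(Mode::Preserve),
            other => Err(format!("unknown mode: {other}")),
        }
    }
}

/// Returns the requested mode, or the default (Balanced) when none is given.
fn resolve_mode(arg: Option<&str>) -> Result<Mode, String> {
    match arg {
        Some(name) => name.parse(),
        None => Ok(Mode::default()),
    }
}
```

Rejecting unknown names with an error, rather than silently falling back, keeps a typo from quietly switching a noise-stripping pipeline to a different mode.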
Learn More
Full documentation lives in the docs/ folder and is published on GitHub Pages:
https://nabbisen.github.io/mdka-rs/
| Topic | Link |
|---|---|
| Installation | /getting-started/installation |
| Rust Usage & Examples | /getting-started/usage-rust |
| Node.js Usage | /getting-started/usage-nodejs |
| Python Usage | /getting-started/usage-python |
| CLI Reference | /getting-started/usage-cli |
| API Reference | /api/index |
| Conversion Modes | /api/modes |
| ConversionOptions | /api/options |
| Supported Elements | /api/elements |
| Design Philosophy | /design/philosophy |
| Performance Characteristics | /design/performance-characteristics |
| Architecture | /design/architecture |
| Features | /design/features |
Open-source, with care
This project is lovingly built and maintained by volunteers.
We hope it helps streamline your work.
Please understand that the project has its own direction — while we welcome feedback, it might not fit every edge case 🌱
Acknowledgements
Depends on scraper (+ html5ever), ego-tree, rayon, tikv-jemallocator / tikv-jemalloc-ctl, thiserror.
The Node.js bindings are built with napi-rs, and the Python bindings with PyO3's pyo3 and maturin.
