@thi.ng/hiccup-html-parse
v0.3.111
Well-formed HTML parsing and customizable transformation to nested JS arrays in @thi.ng/hiccup format
[!NOTE] This is one of 214 standalone projects, maintained as part of the @thi.ng/umbrella monorepo and anti-framework.
🚀 Please help me to work full-time on these projects by sponsoring me on GitHub. Thank you! ❤️
About
Well-formed HTML parsing and customizable transformation to nested JS arrays in @thi.ng/hiccup format.
Note: This parser is intended to work with well-formed HTML and will likely fail for any "quirky" (aka malformed/dodgy) markup...
Basic usage
import { parseHtml } from "@thi.ng/hiccup-html-parse";
const src = `<!doctype html>
<html lang="en">
<head>
<script lang="javascript">
console.log("</"+"script>");
</script>
<style>
body { margin: 0; }
</style>
</head>
<body>
<div id="foo" bool data-xyz="123" empty=''>
<a href="#bar">baz <b>bold</b></a><br/>
</div>
</body>
</html>`;
const result = parseHtml(src);
console.log(result.type);
// "success"
console.log(result.result);
// [
//   ["html", { lang: "en" },
//     ["head", {},
//       ["script", { lang: "javascript" }, "console.log(\"</\"+\"script>\");" ],
//       ["style", {}, "body { margin: 0; }"] ],
//     ["body", {},
//       ["div", { id: "foo", bool: true, "data-xyz": "123" },
//         ["a", { href: "#bar" },
//           "baz ",
//           ["b", {}, "bold"]],
//         ["br", {}]]]]
// ]
Parsing & transformation options
Parser behavior & results can be customized via supplied options and user transformation functions:
| Option | Description | Default |
|------------------|-----------------------------------------------------|---------|
| ignoreElements | Array of element names to ignore | [] |
| ignoreAttribs | Array of attribute names to ignore | [] |
| dataAttribs | Keep data attribs | true |
| comments | Keep <!-- ... --> comments | false |
| doctype | Keep <!doctype ...> element | false |
| whitespace | Keep whitespace-only text bodies | false |
| collapse | Collapse whitespace(1) | true |
| unescape | Replace named & numeric HTML entities(1) | true |
| tx | Element transform/filter function | |
| txBody | Plain text transform/filter function | |
- (1) Not applied in CDATA content sections, e.g. inside <script> or <style> elements
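The `tx` and `txBody` hooks run during parsing, and their exact signatures are not listed in the table above. As a rough sketch of the same idea applied after parsing, the following TypeScript walks a result tree like the one shown under Basic usage, drops unwanted elements (similar in spirit to `ignoreElements`) and collects link targets. The local types and the helpers `dropElements` and `collectHrefs` are hypothetical names made up for this illustration and are not part of the package.

```typescript
// Simplified local types for illustration only (NOT the library's own typings).
type Attribs = Record<string, unknown>;
type HiccupNode = string | HiccupElement;
type HiccupElement = [string, Attribs, ...HiccupNode[]];

/** Hypothetical helper: copies `nodes`, skipping elements whose tag is listed in `names`. */
function dropElements(nodes: HiccupNode[], names: string[]): HiccupNode[] {
  const out: HiccupNode[] = [];
  for (const node of nodes) {
    if (typeof node === "string") {
      out.push(node); // plain text bodies are kept as-is
      continue;
    }
    const [tag, attribs, ...body] = node;
    if (names.indexOf(tag) !== -1) continue; // drop the element and its subtree
    out.push([tag, attribs, ...dropElements(body, names)]);
  }
  return out;
}

/** Hypothetical helper: collects the `href` values of all <a> elements, depth-first. */
function collectHrefs(nodes: HiccupNode[]): string[] {
  const hrefs: string[] = [];
  for (const node of nodes) {
    if (typeof node === "string") continue;
    const [tag, attribs, ...body] = node;
    const href = attribs["href"];
    if (tag === "a" && typeof href === "string") hrefs.push(href);
    hrefs.push(...collectHrefs(body));
  }
  return hrefs;
}

// Hand-written tree in the same shape as the Basic usage output (script body shortened).
const tree: HiccupNode[] = [
  ["html", { lang: "en" },
    ["head", {},
      ["script", { lang: "javascript" }, 'console.log("hi");'],
      ["style", {}, "body { margin: 0; }"]],
    ["body", {},
      ["div", { id: "foo", bool: true, "data-xyz": "123" },
        ["a", { href: "#bar" }, "baz ", ["b", {}, "bold"]],
        ["br", {}]]]],
];

const cleaned = dropElements(tree, ["script", "style"]);
const hrefs = collectHrefs(cleaned);
console.log(hrefs); // ["#bar"]
```

Filtering after the fact costs an extra pass over the tree, so for large documents the parse-time options in the table are likely the better fit; a post-hoc walk is mainly useful when the same parse result feeds several different consumers.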
Status
ALPHA - bleeding edge / work-in-progress
Search or submit any issues for this package
Related packages
- @thi.ng/hiccup-html - 100+ type-checked HTML5 element functions for @thi.ng/hiccup related infrastructure
- @thi.ng/hiccup-markdown - Markdown parser & serializer from/to Hiccup format
- @thi.ng/zipper - Functional tree editing, manipulation & navigation
Installation
yarn add @thi.ng/hiccup-html-parse
ESM import:
import * as hp from "@thi.ng/hiccup-html-parse";
Browser ESM import:
<script type="module" src="https://esm.run/@thi.ng/hiccup-html-parse"></script>
For Node.js REPL:
const hp = await import("@thi.ng/hiccup-html-parse");
Package sizes (brotli'd, pre-treeshake): ESM: 1.18 KB
Dependencies
Note: @thi.ng/api is in most cases a type-only import (not used at runtime)
Usage examples
One project in this repo's /examples directory is using this package:
| Screenshot | Description | Live demo | Source |
|:-----------|:--------------------------------------------------------------------------------------------------------|:----------|:-------|
| | Mastodon API feed reader with support for different media types, fullscreen media modal, HTML rewriting | Demo | Source |
API
TODO
Benchmarks
Results from the benchmark parsing the HTML of the thi.ng website (MBA M1 2021, 16GB RAM, Node.js v20.5.1):
benchmarking: thi.ng html (87.97 KB)
warmup... 1951.76ms (100 runs)
total: 19375.49ms, runs: 1000 (@ 1 calls/iter)
mean: 19.38ms, median: 19.26ms, range: [18.12..28.45]
q1: 18.75ms, q3: 19.68ms
sd: 4.66%
Authors
If this project contributes to an academic publication, please cite it as:
@misc{thing-hiccup-html-parse,
title = "@thi.ng/hiccup-html-parse",
author = "Karsten Schmidt",
note = "https://thi.ng/hiccup-html-parse",
year = 2023
}
License
© 2023 - 2026 Karsten Schmidt // Apache License 2.0
