@tkeron/html-parser

v1.5.3

Published

3 days ago

A fast and lightweight HTML parser for Bun

Downloads

1,574

Vulnerabilities: 0 High, 0 Medium, 0 Low

tkeronadmin

lemaguilera

html parser dom bun tokenizer

HTML Parser

A fast and lightweight HTML parser for Bun that converts HTML strings into DOM Document objects. It is built on a custom tokenizer optimized for the Bun runtime.

Features

⚡ Custom Tokenizer: Purpose-built tokenizer optimized for the Bun runtime
🚀 Ultra Fast: Leverages Bun's native optimizations
🪶 Lightweight: Zero external dependencies
🌐 Standards Compliant: Returns standard DOM Document objects
🔧 TypeScript Support: Full TypeScript definitions included
✅ Well Tested: Comprehensive test suite (5660+ tests passing)
🎯 HTML5 Spec: Implements the adoption agency algorithm so that misnested formatting elements (such as <b> and <i>) are handled as the spec requires
🧩 Fragment Parsing: Parse HTML fragments with context element support

Installation

npm install @tkeron/html-parser

Or with Bun:

bun add @tkeron/html-parser

Usage

import { parseHTML } from "@tkeron/html-parser";

// Parse HTML string into DOM Document
const html =
  "<html><head><title>Test</title></head><body><h1>Hello World</h1></body></html>";
const document = parseHTML(html);

// Use standard DOM methods
const title = document.querySelector("title")?.textContent;
const heading = document.querySelector("h1")?.textContent;

console.log(title); // "Test"
console.log(heading); // "Hello World"

Simple Example

import { parseHTML } from "@tkeron/html-parser";

const html = `
  <div class="container">
    <p>Hello, world!</p>
    <span id="info">This is a test</span>
  </div>
`;

const doc = parseHTML(html);
const container = doc.querySelector(".container");
const info = doc.getElementById("info");

console.log(container?.children.length); // 2
console.log(info?.textContent); // "This is a test"

API

`parseHTML(html: string): Document`

Parses an HTML string and returns a DOM Document object.

Parameters:

html (string): The HTML string to parse

Returns:

Document: A standard DOM Document object with all the usual methods like querySelector, getElementById, etc.
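As a small illustration of working with the returned Document, the sketch below wraps querySelector in a helper that returns trimmed text or a fallback. The names `textOf` and `QueryRoot` are hypothetical and not part of the package; the interface assumes only the standard `querySelector` and `textContent` members, so the snippet stays self-contained.

```typescript
// Hypothetical helper, not part of @tkeron/html-parser.
// QueryRoot describes only the two standard DOM members this sketch relies on.
interface QueryRoot {
  querySelector(selector: string): { textContent: string | null } | null;
}

// Return the trimmed text of the first match, or `fallback` when nothing matches.
function textOf(root: QueryRoot, selector: string, fallback = ""): string {
  const text = root.querySelector(selector)?.textContent;
  return text == null ? fallback : text.trim();
}
```

A Document returned by `parseHTML` should satisfy this shape, so `textOf(parseHTML(html), "title", "(untitled)")` would be one way to read a page title without repeating the optional chaining at each call site.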

`parseHTMLFragment(html: string, contextTagName: string): Node[]`

Parses an HTML string as a fragment within a context element. Useful for parsing innerHTML-style content.

Parameters:

html (string): The HTML string to parse
contextTagName (string): The tag name of the context element (e.g., "div", "body")

Returns:

Node[]: An array of parsed nodes

Example:

import { parseHTMLFragment } from "@tkeron/html-parser";

const nodes = parseHTMLFragment("<b>Hello</b> <i>World</i>", "div");
console.log(nodes.length); // 3 (b element, text node, i element)
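Because the result is a plain array rather than a single root, ordinary array methods apply to it. The sketch below joins the text of each top-level node; `fragmentText` and `TextSource` are hypothetical names, and the only assumption about the parsed nodes is that they expose the standard `textContent` property.

```typescript
// Hypothetical helper, not part of @tkeron/html-parser.
// TextSource is the only part of a DOM Node this sketch relies on.
interface TextSource {
  textContent: string | null;
}

// Concatenate the text of each top-level node, in document order.
function fragmentText(nodes: ReadonlyArray<TextSource>): string {
  return nodes.map((node) => node.textContent ?? "").join("");
}
```

Given the three nodes from the example above, `fragmentText(nodes)` would be expected to give "Hello World", assuming the parser's text nodes expose `textContent` as in the standard DOM.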

Development

This project is built with Bun. To get started:

# Install dependencies
bun install

# Run tests
bun test

Testing

Run the test suite:

bun test

License

MIT

Contributing

1. Fork the repository
2. Create your feature branch (git checkout -b feature/amazing-feature)
3. Commit your changes (git commit -m 'Add some amazing feature')
4. Push to the branch (git push origin feature/amazing-feature)
5. Open a Pull Request

Support

If you encounter any issues or have questions, please file an issue on the GitHub repository.
