@caboodle-tech/simple-html-parser
v2.3.0
Lightweight HTML/CSS parser with DOM-like manipulation.
Simple HTML Parser
A lightweight, DOM-like HTML and CSS parser for Node.js. It builds a simple tree structure (the Simple Object Model, or SOM) that is easy to manipulate and to serialize back to HTML/CSS strings. 21 KB minified, zero dependencies.
Features
- HTML Parsing: Parse HTML into a tree structure with proper handling of nested elements
- CSS Parsing: Parse inline <style> tags with support for modern CSS features
- DOM Manipulation: Insert, move, replace, and remove nodes
- Query Selectors: Find elements using CSS-like selectors
- Preserves Formatting: Maintains whitespace and indentation when manipulating nodes
- No Dependencies: Pure JavaScript implementation
Installation
Add to your project via pnpm or npm:
pnpm install simple-html-parser
# or
npm install simple-html-parser
Or include manually by downloading the minified ESM dist/simple-html-parser.min.js file.
Quick Start
import { SimpleHtmlParser } from 'simple-html-parser';
const parser = new SimpleHtmlParser();
const dom = parser.parse('<div id="app"><h1>Hello World</h1></div>');
// Query elements
const app = dom.querySelector('#app');
const heading = dom.querySelector('h1');
// Manipulate
heading.setAttribute('class', 'title');
// Output
console.log(dom.toHtml());
// <div id="app"><h1 class="title">Hello World</h1></div>
API Reference
SimpleHtmlParser
parse(html: string): Node
Parses an HTML string into a SOM tree structure.
const parser = new SimpleHtmlParser();
const dom = parser.parse('<div>Hello</div>');
version(): string
Returns the parser version.
Node
The core building block of the SOM tree. Every element, text node, and comment is a Node.
Properties
- type: 'root' | 'tag-open' | 'tag-close' | 'text' | 'comment'
- name: Tag name (for element nodes)
- attributes: Object containing element attributes
- children: Array of child nodes
- parent: Reference to parent node
- content: Text content (for text/comment nodes)
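To make the property list concrete, the sketch below builds a small tree out of plain objects that carry the same fields. The `makeNode` and `attach` helpers are hypothetical and exist only for illustration; they are not part of the library, and real code should use the library's own `Node` class. The closing tag is stored as a sibling of the opening tag, as described under Node Structure.

```javascript
// Hypothetical helper: builds a plain object with the documented fields.
// This is an illustrative model, not the library's Node class.
function makeNode(type, name = '', attributes = {}, content = '') {
  return { type, name, attributes, children: [], parent: null, content };
}

// Hypothetical helper: attach a child and keep its parent reference in sync.
function attach(parent, child) {
  child.parent = parent;
  parent.children.push(child);
  return child;
}

// Model of: <p class="intro">Hi</p>
const root = makeNode('root');
const p = attach(root, makeNode('tag-open', 'p', { class: 'intro' }));
attach(p, makeNode('text', '', {}, 'Hi'));
attach(root, makeNode('tag-close', 'p'));

console.log(root.children.map((n) => n.type)); // [ 'tag-open', 'tag-close' ]
console.log(p.children[0].content);            // Hi
console.log(p.children[0].parent === p);       // true
```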
Querying Methods
querySelector(selector: string): Node | null
Find the first element matching a CSS selector.
const div = dom.querySelector('div');
const byId = dom.querySelector('#myId');
const byClass = dom.querySelector('.myClass');
const complex = dom.querySelector('div.container > p');
Supported selectors:
- Tag names: div, p, span
- IDs: #myId
- Classes: .myClass, .class1.class2
- Attributes: [data-id], [data-id="value"]
- Descendant: div p (p inside div)
- Pseudo-classes: :not(selector)
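As a rough illustration of what matching one of these selectors involves, the sketch below checks a single compound selector (tag, ID, classes, attributes) against a plain object with `name` and `attributes` fields. The `matchesCompound` function and its `TOKEN` pattern are hypothetical; they are not the library's implementation, and combinators and `:not()` are deliberately left out.

```javascript
// Illustrative matcher for one compound selector (no combinators, no :not).
// Assumes nodes shaped like { name, attributes }; not the library's code.
const TOKEN = /^(?:([a-zA-Z][\w-]*)|#([\w-]+)|\.([\w-]+)|\[([\w-]+)(?:="([^"]*)")?\])/;

function matchesCompound(node, selector) {
  const attrs = node.attributes || {};
  const classes = (attrs.class || '').split(/\s+/).filter(Boolean);
  let rest = selector.trim();
  while (rest.length > 0) {
    const m = TOKEN.exec(rest);
    if (!m) throw new Error(`Unsupported selector part: ${rest}`);
    const [whole, tag, id, cls, attr, value] = m;
    if (tag !== undefined && node.name !== tag) return false;
    if (id !== undefined && attrs.id !== id) return false;
    if (cls !== undefined && !classes.includes(cls)) return false;
    if (attr !== undefined) {
      if (!(attr in attrs)) return false;                            // [attr]
      if (value !== undefined && attrs[attr] !== value) return false; // [attr="value"]
    }
    rest = rest.slice(whole.length); // consume the matched token
  }
  return true;
}

const node = {
  name: 'div',
  attributes: { id: 'app', class: 'card active', 'data-id': '7' },
};
console.log(matchesCompound(node, 'div.card.active'));   // true
console.log(matchesCompound(node, '#app[data-id="7"]')); // true
console.log(matchesCompound(node, 'span'));              // false
console.log(matchesCompound(node, '[data-id="8"]'));     // false
```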
querySelectorAll(selector: string): Node[]
Find all elements matching a CSS selector.
const allDivs = dom.querySelectorAll('div');
const allLinks = dom.querySelectorAll('a[href]');
findAllByAttr(attrName: string): Node[]
Find all nodes with a specific attribute.
const withDataId = dom.findAllByAttr('data-id');
Manipulation Methods
appendChild(...nodes: Node[]): Node[]
Add child nodes to this node.
const div = dom.querySelector('div');
const p = new Node('tag-open', 'p', {}, div);
div.appendChild(p);
insertBefore(...nodes: Node[]): Node
Insert nodes before this node (outside the element).
Note: target.insertBefore(node) inserts node before target.
const b = dom.querySelector('#B');
const a = dom.querySelector('#A');
a.insertBefore(b); // Inserts B before A
insertAfter(...nodes: Node[]): Node
Insert nodes after this node (outside the element).
Note: target.insertAfter(node) inserts node after target.
const a = dom.querySelector('#A');
const b = dom.querySelector('#B');
b.insertAfter(a); // Inserts A after B
replaceWith(...nodes: Node[]): Node
Replace this node with other nodes.
const old = dom.querySelector('#old');
const newNode = dom.querySelector('#new');
old.replaceWith(newNode); // Removes old, replaces with new
remove(): Node
Remove this node from the tree. Automatically removes matching closing tags.
const div = dom.querySelector('div');
div.remove();
Attribute Methods
getAttribute(name: string): string | undefined
Get an attribute value.
const href = link.getAttribute('href');
setAttribute(name: string, value: string): void
Set an attribute value.
div.setAttribute('class', 'container');
removeAttribute(name: string): void
Remove an attribute.
div.removeAttribute('class');
updateAttribute(name: string, value: string, separator?: string): void
Append to an attribute value.
div.updateAttribute('class', 'active'); // class="container active"
CSS Methods
CSS methods are available when parsing <style> tags.
cssFindAtRules(name?: string): Node[]
Find at-rules (@media, @keyframes, @supports, etc.) in the CSS tree.
// Find all @media rules
const mediaRules = style.cssFindAtRules('media');
// Find all at-rules
const allAtRules = style.cssFindAtRules();
cssFindRules(selector: string, options?: object): Node[]
Find CSS rules matching a selector.
Options:
- includeCompound (default: true) - Include compound selectors like .card.active
- shallow (default: false) - Exclude nested children and descendant selectors
// Find all .card rules (includes .card.active)
const cardRules = style.cssFindRules('.card');
// Find only exact .card rules
const exactCard = style.cssFindRules('.card', { includeCompound: false });
// Find #wrapper rules, excluding nested rules
const wrapperOnly = style.cssFindRules('#wrapper', { shallow: true });
cssFindVariable(name: string, rule?: Node): string | null
Find a specific CSS variable (custom property) by name.
// Find --primary-color
const primary = style.cssFindVariable('--primary-color');
// Find variable without -- prefix
const spacing = style.cssFindVariable('spacing');
cssFindVariables(options?: object): Array
Find all CSS variables with their scope paths.
Options:
- includeRoot (default: false) - Include 'root' in scope path for root-level variables
const vars = style.cssFindVariables();
// [{name: '--primary', value: '#007bff', scope: ':root', rule: Node}]
cssToString(nodes?: Node|Node[], options?: object): string
Convert CSS rules to a formatted CSS string.
Behavior:
- Called with nodes: Converts those specific nodes
- Called on HTML node: Finds and combines all <style> tags
- Called on CSS/style node: Converts this node's CSS tree
Options:
- includeComments (default: false) - Include CSS comments
- includeNestedRules (default: true) - Include nested rules within parent rules
- flattenNested (default: false) - Flatten nested rules to separate top-level rules with full selectors
- includeBraces (default: true) - Include { } around declarations
- includeSelector (default: true) - Include the selector
- combineDeclarations (default: true) - Merge declarations from multiple rules
- singleLine (default: false) - Output on single line
- indent (default: 0) - Indentation level in spaces
const style = dom.querySelector('style');
// Convert specific rules
const rules = style.cssFindRules('.card');
const cardCss = style.cssToString(rules, { includeNestedRules: false });
// Convert entire style tag
const flatCss = style.cssToString({ flattenNested: true });
// Combine all styles in document
const allCss = dom.cssToString();
// Just declarations
const declarations = style.cssToString(rules, {
  includeSelector: false,
  includeBraces: false
});
// "background: white; padding: 1rem;"
Output Methods
toHtml(showComments?: boolean): string
Convert the node tree back to an HTML string.
const html = dom.toHtml();
const htmlWithComments = dom.toHtml(true);
toString(): string
Alias for toHtml(true).
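The following sketch models how a tree with the documented node fields could be turned back into a string, with comments emitted only on request. It is an assumption-laden illustration, not the library's serializer: `serialize` and `serializeAll` are hypothetical names, closing tags are assumed to be siblings of their opening tags (see Node Structure), and a comment's `content` is assumed to exclude the `<!--` and `-->` delimiters.

```javascript
// Illustrative serializer over plain objects; not the library's toHtml().
function serialize(node, showComments = false) {
  switch (node.type) {
    case 'text':
      return node.content;
    case 'comment':
      // Assumption: content holds only the text between <!-- and -->.
      return showComments ? `<!--${node.content}-->` : '';
    case 'tag-close':
      return `</${node.name}>`;
    case 'tag-open': {
      const attrs = Object.entries(node.attributes || {})
        .map(([key, value]) => ` ${key}="${value}"`)
        .join('');
      // Element content lives in the opening tag's children array.
      return `<${node.name}${attrs}>` + serializeAll(node.children, showComments);
    }
    default: // 'root'
      return serializeAll(node.children, showComments);
  }
}

function serializeAll(nodes = [], showComments = false) {
  return nodes.map((child) => serialize(child, showComments)).join('');
}

const tree = {
  type: 'root',
  children: [
    { type: 'comment', content: ' note ' },
    {
      type: 'tag-open',
      name: 'p',
      attributes: { class: 'x' },
      children: [{ type: 'text', content: 'Hi' }],
    },
    { type: 'tag-close', name: 'p' },
  ],
};

console.log(serialize(tree));       // <p class="x">Hi</p>
console.log(serialize(tree, true)); // <!-- note --><p class="x">Hi</p>
```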
Iteration
Nodes are iterable, allowing depth-first traversal:
for (const node of dom) {
  if (node.type === 'tag-open') {
    console.log(node.name);
  }
}
Advanced Usage
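The depth-first iteration described in the Iteration section can be modeled with a recursive generator. The sketch below is an assumption about the general technique, not the library's code: `walk` is a hypothetical name, the tree is made of plain objects, and whether the real iterator yields the root node itself is not stated in this README.

```javascript
// Illustrative depth-first (pre-order) walk over nodes with a children array.
function* walk(node) {
  yield node; // visit the node before its descendants
  for (const child of node.children || []) {
    yield* walk(child);
  }
}

// Model of: <div><p>Hi</p><span></span></div>
const tree = {
  type: 'root',
  children: [
    {
      type: 'tag-open',
      name: 'div',
      children: [
        { type: 'tag-open', name: 'p', children: [{ type: 'text', content: 'Hi' }] },
        { type: 'tag-close', name: 'p' },
        { type: 'tag-open', name: 'span', children: [] },
        { type: 'tag-close', name: 'span' },
      ],
    },
    { type: 'tag-close', name: 'div' },
  ],
};

// Making the object iterable lets it be used directly in for...of.
tree[Symbol.iterator] = function () {
  return walk(this);
};

const names = [];
for (const node of tree) {
  if (node.type === 'tag-open') names.push(node.name);
}
console.log(names); // [ 'div', 'p', 'span' ]
```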
Moving Elements
const table = dom.querySelector('table');
const rowA = dom.querySelector('#rowA');
const rowB = dom.querySelector('#rowB');
// Swap rows - insert B before A
rowA.insertBefore(rowB); // B now comes before A
Creating New Elements
const div = new Node('tag-open', 'div', { class: 'new' });
const text = new Node('text');
text.content = 'Hello';
div.appendChild(text);
const parent = dom.querySelector('#parent');
parent.appendChild(div);
CSS Manipulation
const style = dom.querySelector('style');
// Get all CSS variables
const variables = style.cssFindVariables();
console.log(variables);
// [{ name: '--primary', value: '#007bff', scope: ':root', rule: Node }]
// Find specific variable
const primaryColor = style.cssFindVariable('--primary-color');
// Get .card rules (shallow - no nested)
const rules = style.cssFindRules('.card', { shallow: true });
// Convert to CSS string without nested rules
const css = style.cssToString(rules, { includeNestedRules: false });
// ".card { background: white; padding: 1rem; }"
Special Tag Handling
The parser treats certain tags specially:
- Void elements (img, br, hr, input, etc.): No closing tag created
- Style tags: Contents parsed as CSS
- Script tags: Can be configured via specialTags parameter
const parser = new SimpleHtmlParser(['script', 'custom-tag']);
Node Structure
The parser creates a tree where:
- Opening and closing tags are siblings in the parent's children array
- Element content is in the opening tag's children array
- Text nodes (including whitespace) are preserved
Example:
<div>
  <p>Hello</p>
</div>
Becomes:
root
└─ <div>
   ├─ text "\n  "
   ├─ <p>
   │  └─ text "Hello"
   ├─ </p>
   ├─ text "\n"
   └─ </div>
Performance Considerations
- Regex patterns are extracted to module-level constants for reuse
- Whitespace-only text nodes are only checked during manipulation, not parsing
- Methods use private helpers to avoid duplication
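The first two points can be illustrated with a short sketch. The names `WHITESPACE_ONLY` and `isWhitespaceText` are hypothetical and chosen for this example only; the library's actual constants and helpers are not shown in this README.

```javascript
// Defined once at module level and shared by every call, instead of
// writing the pattern inline inside each function that needs it.
const WHITESPACE_ONLY = /^\s*$/;

// Hypothetical helper: the whitespace check runs only when a manipulation
// asks for it, so parsing itself never pays for the test.
function isWhitespaceText(node) {
  return node.type === 'text' && WHITESPACE_ONLY.test(node.content);
}

console.log(isWhitespaceText({ type: 'text', content: '\n  ' }));  // true
console.log(isWhitespaceText({ type: 'text', content: 'Hello' })); // false
console.log(isWhitespaceText({ type: 'comment', content: ' ' }));  // false
```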
License
Commons Clause with MIT
Contributing
Contributions welcome! Please ensure all tests pass and add tests for new features.
Author
Christopher Keers - caboodle-tech
