html5parser-fork

v1.1.5-beta

A fast, accurate AST parser for HTML5

html5parser

A simple and fast HTML5 parser. The resulting AST can be manipulated much like an ECMAScript ESTree, especially the attributes.

Introduction

Currently, popular public parsers such as htmlparser2 and parse5 cannot be used to manipulate attributes. For example, htmlparser2 exposes startIndex and endIndex for tags and text, but no range information for attribute names and values. This project solves that problem: it adds ranges for tags, text, and attribute names and values, and also records each attribute's quote type (unquoted, ', or ").

Install

# via npm
npm install html5parser -S

# via yarn
yarn add html5parser

Quick Start

import * as html from 'html5parser';

const input = `
<!DOCTYPE html>
<html>
  <body>
    <h1 id="hello">Hello world</h1>
  </body>
</html>
`;

const ast = html.parse(input);

html.walk(ast, {
  enter: (node) => {
    if (node.type === html.SyntaxKind.Tag) {
      for (const attr of node.attributes) {
        if (attr.value !== void 0) {
          // The range lets you slice the raw attribute value out of the input.
          console.log(input.substring(attr.value.start, attr.value.end));
          // You can also read the value directly:
          console.log(attr.value.value);
        }
      }
    }
  },
});

// Should output (the range includes the quotes, the value does not):
// "hello"
// hello
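
Because every attribute value carries a range, the source text can be rewritten in place without re-serializing the whole tree. The following is only a minimal sketch of that idea: `applyEdits` and `RangeEdit` are hypothetical names that are not part of html5parser, and the offsets are computed with `indexOf` so the snippet runs on its own. In real use they would come from `attr.value.start` and `attr.value.end`.

```typescript
// Hypothetical helper types and functions; not part of html5parser.
interface RangeEdit {
  start: number; // inclusive, same convention as IBaseNode.start
  end: number; // exclusive, same convention as IBaseNode.end
  text: string; // replacement text for input[start, end)
}

function applyEdits(input: string, edits: RangeEdit[]): string {
  // Apply edits from the last range to the first,
  // so that earlier offsets remain valid after each splice.
  const sorted = edits.slice().sort((a, b) => b.start - a.start);
  let output = input;
  for (const edit of sorted) {
    output = output.slice(0, edit.start) + edit.text + output.slice(edit.end);
  }
  return output;
}

const source = '<h1 id="hello">Hello world</h1>';

// Stand-in for attr.value.start / attr.value.end (the range includes the quotes).
const quoted = '"hello"';
const valueStart = source.indexOf(quoted);
const valueEnd = valueStart + quoted.length;

const rewritten = applyEdits(source, [
  { start: valueStart, end: valueEnd, text: '"greeting"' },
]);

console.log(rewritten);
```

Applying edits back to front is a deliberate choice: it avoids having to shift the remaining offsets after each replacement.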

API

// Top level API, parse html to ast tree
export function parse(input: string, options?: ParseOptions): INode[];

export interface ParseOptions {
  // Create an attribute map for each tag.
  // If true, the ITag.attributeMap property is set
  // as a `Record<string, IAttribute>`.
  // See {ITag#attributeMap} below.
  setAttributeMap: boolean;
}

// Low level API, get tokens
export function tokenize(input: string): IToken[];

// Utils API, walk the ast tree
export function walk(ast: INode[], options: IWalkOptions): void;
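
To picture what a `Record<string, IAttribute>` lookup offers, here is a rough, self-contained sketch. It uses simplified local copies of the interfaces from the Abstract Syntax Tree Spec section, and `toAttributeMap` is a hypothetical helper, not the library's implementation. How the library treats duplicate attribute names is not stated in this README, so "first occurrence wins" is purely an assumption of the sketch. The node offsets are written by hand for illustration.

```typescript
// Simplified local copies of the README's interfaces (the `type` field is omitted).
interface IBaseNode { start: number; end: number; }
interface IText extends IBaseNode { value: string; }
interface IAttributeValue extends IBaseNode {
  value: string;
  quote: "'" | '"' | undefined;
}
interface IAttribute extends IBaseNode {
  name: IText;
  value: IAttributeValue | undefined;
}

// Hypothetical helper: index attributes by their name literal.
// Assumption: when a name is repeated, the first occurrence wins.
function toAttributeMap(attributes: IAttribute[]): Record<string, IAttribute> {
  // A prototype-free object, so names like "toString" are not found by accident.
  const map: Record<string, IAttribute> = Object.create(null);
  for (const attr of attributes) {
    if (!(attr.name.value in map)) {
      map[attr.name.value] = attr;
    }
  }
  return map;
}

// Hand-written attributes for: <a href="/docs" disabled>
const tagSource = '<a href="/docs" disabled>';
const attributes: IAttribute[] = [
  {
    start: 3,
    end: 15,
    name: { start: 3, end: 7, value: "href" },
    value: { start: 8, end: 15, value: "/docs", quote: '"' },
  },
  {
    start: 16,
    end: 24,
    name: { start: 16, end: 24, value: "disabled" },
    value: undefined, // attribute without a value
  },
];

const attributeMap = toAttributeMap(attributes);
console.log(attributeMap["href"]?.value?.value);
console.log(attributeMap["disabled"]?.value);
```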

Abstract Syntax Tree Spec

  1. IBaseNode: the base struct for all the nodes:

    export interface IBaseNode {
      start: number; // the start position of the node (inclusive)
      end: number; // the end position of the node (exclusive)
    }
  2. IText: The text node struct:

    export interface IText extends IBaseNode {
      type: SyntaxKind.Text;
      value: string; // text value
    }
  3. ITag: The tag node struct:

    export interface ITag extends IBaseNode {
      type: SyntaxKind.Tag;
      open: IText;
      name: string;
      attributes: IAttribute[];
      // The attribute map, set when `options.setAttributeMap` is `true`.
      // It is a Record whose key is the attribute name literal
      // and whose value is the attribute itself.
      attributeMap: Record<string, IAttribute> | undefined;
      body:
        | Array<ITag | IText> // with close tag
        | undefined // self closed
        | null; // EOF before open tag end
      close:
        | IText // with close tag
        | undefined // self closed
        | null; // EOF before end or without close tag
    }
  4. IAttribute: the attribute struct:

    export interface IAttribute extends IBaseNode {
      name: IText; // the name of the attribute
      value: IAttributeValue | void; // the value of the attribute
    }
  5. IAttributeValue: the attribute value struct:

    // NOTE: the range (start and end) includes the quotes.
    export interface IAttributeValue extends IBaseNode {
      value: string; // the value text, excluding the leading and trailing `'` or `"`
      quote: "'" | '"' | void; // the quote type
    }
  6. INode: the exposed nodes:

    export type INode = ITag | IText;
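
Since `INode` is a union discriminated by `type`, and `body` may be an array, `undefined`, or `null`, a recursive consumer has to handle all three cases. The sketch below is one possible way to do that. It uses trimmed local copies of the interfaces, a stand-in `SyntaxKind` object whose values are assumptions (the library's real values may differ), a hypothetical `collectText` helper, and a hand-built tree.

```typescript
// Stand-in for the library's SyntaxKind; the actual values are an assumption.
const SyntaxKind = { Text: "Text", Tag: "Tag" } as const;

// Trimmed local copies of the README's interfaces (open, close, attributes omitted).
interface IBaseNode { start: number; end: number; }
interface IText extends IBaseNode {
  type: typeof SyntaxKind.Text;
  value: string;
}
interface ITag extends IBaseNode {
  type: typeof SyntaxKind.Tag;
  name: string;
  body: Array<ITag | IText> | undefined | null;
}
type INode = ITag | IText;

// Hypothetical helper: concatenate every text node in document order.
function collectText(nodes: INode[]): string {
  let text = "";
  for (const node of nodes) {
    if (node.type === SyntaxKind.Text) {
      text += node.value;
    } else if (node.body) {
      // Skips self-closed tags (undefined) and tags cut off by EOF (null).
      text += collectText(node.body);
    }
  }
  return text;
}

// Hand-built tree for: <p>Hi <b>there</b><br /></p>
const tree: INode[] = [
  {
    type: SyntaxKind.Tag, name: "p", start: 0, end: 28,
    body: [
      { type: SyntaxKind.Text, value: "Hi ", start: 3, end: 6 },
      {
        type: SyntaxKind.Tag, name: "b", start: 6, end: 18,
        body: [{ type: SyntaxKind.Text, value: "there", start: 9, end: 14 }],
      },
      { type: SyntaxKind.Tag, name: "br", start: 18, end: 24, body: undefined },
    ],
  },
];

const collected = collectText(tree);
console.log(collected);
```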

Warnings

This parser targets HTML5, which means:

  1. All tags like <? ... ?> and <! ... > (except for <!doctype ...>, case insensitive) are treated as comments, so a CDATASection is also treated as a comment.
  2. Special tag names:
    • "!doctype" (case insensitive): the doctype declaration
    • "!": short comment
    • "!--": normal comment
    • "" (empty string): short comment, for <? ... >; the leading ? is treated as comment content
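
The special names above can be checked with a small classifier when walking the tree. This is a sketch based only on the rules listed here: `classifyTagName` and `TagCategory` are hypothetical names, not exports of html5parser.

```typescript
// Hypothetical helper; the categories mirror the special tag names listed above.
type TagCategory = "doctype" | "comment" | "element";

function classifyTagName(name: string): TagCategory {
  // "!doctype" is matched case-insensitively.
  if (name.toLowerCase() === "!doctype") {
    return "doctype";
  }
  // "!" and "" are short comments, "!--" is a normal comment.
  if (name === "!" || name === "!--" || name === "") {
    return "comment";
  }
  return "element";
}

const categories = ["!DOCTYPE", "!--", "!", "", "div"].map(classifyTagName);
console.log(categories.join(","));
```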

Benchmark

Thanks to htmlparser-benchmark. I created a pull request at pulls/7, and the result on my MacBook Pro is:

$ npm test

> [email protected] test ~/htmlparser-benchmark
> node execute.js

gumbo-parser failed (exit code 1)
high5 failed (exit code 1)
html-parser        : 28.6524 ms/file ± 21.4282
html5              : 130.423 ms/file ± 161.478
html5parser        : 2.37975 ms/file ± 3.30717
htmlparser         : 16.6576 ms/file ± 109.840
htmlparser2-dom    : 3.45602 ms/file ± 5.05830
htmlparser2        : 2.61135 ms/file ± 4.33535
hubbub failed (exit code 1)
libxmljs failed (exit code 1)
neutron-html5parser: 2.89331 ms/file ± 2.94316
parse5 failed (exit code 1)
sax                : 10.2110 ms/file ± 13.5204

License

MIT