parse5-to-json

v0.0.1-alpha.6

Convert HTML into a custom JSON format using parse5.

parse5 ships with a built-in defaultTreeAdapter that converts HTML into a JSON-format AST. The output of defaultTreeAdapter can be inspected directly in AST Explorer.
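
As a rough illustration of that AST, the sketch below hand-writes the tree for the fragment `<p>Hi</p><!-- note -->`. The interface names (AstAttr, AstElement and so on) are made up for this example, and the shapes are a simplified subset of what defaultTreeAdapter produces: fields such as parentNode and namespaceURI are left out.

```typescript
// Simplified, hypothetical typings for nodes in parse5's default tree.
interface AstAttr { name: string; value: string }
interface AstText { nodeName: '#text'; value: string }
interface AstComment { nodeName: '#comment'; data: string }
interface AstElement {
  nodeName: string
  tagName: string
  attrs: AstAttr[]
  childNodes: AstChild[]
}
type AstChild = AstElement | AstText | AstComment
interface AstFragment { nodeName: '#document-fragment'; childNodes: AstChild[] }

// Hand-written tree for `<p>Hi</p><!-- note -->` (not produced by running parse5 here).
const fragment: AstFragment = {
  nodeName: '#document-fragment',
  childNodes: [
    {
      nodeName: 'p',
      tagName: 'p',
      attrs: [],
      childNodes: [{ nodeName: '#text', value: 'Hi' }],
    },
    { nodeName: '#comment', data: ' note ' },
  ],
}

console.log(JSON.stringify(fragment, null, 2))
```

Every node carries a nodeName, elements list their attributes as name/value pairs in attrs, and children live in childNodes. These are the fields the conversion code in this README relies on.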

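To produce custom JSON, you could write your own TreeAdapter modeled on defaultTreeAdapter, but that is fairly costly. For example, parse5-htmlparser2-tree-adapter is a library that implements a TreeAdapter specifically to convert HTML into htmlparser2's AST format.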

So instead we convert parse5's AST directly into the format we want: traverse the AST depth-first and convert each node into the custom JSON format.
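
The depth-first traversal itself can be sketched in isolation. The TreeNode type, the collectNodeNames function and the sample tree below are hypothetical and only mimic the nodeName / childNodes fields of a parse5 node.

```typescript
// Hypothetical minimal node shape: just the fields the traversal needs.
interface TreeNode {
  nodeName: string
  childNodes?: TreeNode[]
}

// Pre-order depth-first traversal: record the node, then walk each child subtree in order.
function collectNodeNames(node: TreeNode, out: string[] = []): string[] {
  out.push(node.nodeName)
  for (const child of node.childNodes || []) {
    collectNodeNames(child, out)
  }
  return out
}

// Hand-built tree resembling the AST of `<h1>…</h1><p>…</p>`.
const sampleTree: TreeNode = {
  nodeName: '#document-fragment',
  childNodes: [
    { nodeName: 'h1', childNodes: [{ nodeName: '#text' }] },
    { nodeName: 'p', childNodes: [{ nodeName: '#text' }] },
  ],
}

console.log(collectNodeNames(sampleTree))
// [ '#document-fragment', 'h1', '#text', 'p', '#text' ]
```

The conversion follows the same recursion, except that each visit returns a converted node instead of appending a name.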

Simple implementation

A simple version of the conversion logic:

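import { parseFragment } from 'parse5'

// Maps tag names to custom node types. Only the entries needed for the sample
// output below are listed; this table is an assumption, extend it as needed.
const nodeName2Type: Record<string, string> = {
  h1: 'heading',
  p: 'paragraph',
}

// Recursively (depth-first) converts a parse5 node into the custom JSON format.
export function ast2CustomizedJson(ast: any): any {
  if (ast.nodeName === '#document' || ast.nodeName === '#document-fragment') {
    return {
      type: 'doc',
      content: ast.childNodes.map(ast2CustomizedJson).filter(Boolean),
    }
  } else if (ast.nodeName === '#text') {
    return {
      type: 'text',
      text: ast.value,
    }
  } else if (ast.nodeName === '#comment') {
    // Comments are dropped; the null is filtered out by the parent
    return null
  } else {
    return {
      type: nodeName2Type[ast.nodeName],
      // Filter here too, so comments nested inside elements do not leave nulls
      content: ast.childNodes.map(ast2CustomizedJson).filter(Boolean),
    }
  }
}

export const html2json = (html: string) => {
  const ast = parseFragment(html)

  return ast2CustomizedJson(ast)
}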

This turns the original AST into the JSON format we want:

{
  "type": "doc",
  "content": [
    {
      "type": "text",
      "text": "\n\n\n\n\n    "
    },
    {
      "type": "heading",
      "content": [
        {
          "type": "text",
          "text": "My First Heading"
        }
      ]
    },
    {
      "type": "text",
      "text": "\n    "
    },
    {
      "type": "paragraph",
      "content": [
        {
          "type": "text",
          "text": "My first paragraph."
        }
      ]
    },
    {
      "type": "text",
      "text": "\n\n\n\n"
    }
  ]
}

Complex logic

Here are a few special cases, used as examples, that extend the conversion logic.

<div class="ct-note" data-type="info" data-hidden-title style="text-align: center">
  <div class="ct-note-title"><span class="ct-note-title-text"></span></div>
  <div class="ct-note-content">内容</div>
</div>

The HTML above is converted to:

{
  "type": "doc",
  "content": [
    {
      "type": "note",
      "content": [
        {
          "type": "text",
          "text": "\n  "
        },
        {
          "type": "note_title",
          "content": [
            {
              "type": "note_title_text",
              "content": [],
              "attrs": {}
            }
          ],
          "attrs": {}
        },
        {
          "type": "text",
          "text": "\n  "
        },
        {
          "type": "note_content",
          "content": [
            {
              "type": "text",
              "text": "内容"
            }
          ],
          "attrs": {}
        },
        {
          "type": "text",
          "text": "\n"
        }
      ],
      "attrs": {
        "type": "info",
        "hiddenTitle": true,
        "textAlign": " center"
      }
    },
    {
      "type": "text",
      "text": "\n"
    }
  ]
}

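This example adds several pieces of logic:

  1. data-* attributes are parsed into the attrs field, with the kebab-case names converted to camel case.
  2. A class name of the form ct-* in the class attribute becomes the node's type field, converted to snake case.
  3. The style attribute is parsed into the attrs field, with the style property names converted to camel case.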
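
// Assumes camelCase and snakeCase come from a utility library such as lodash,
// and that Attribute is parse5's { name, value } attribute type.
export function ast2CustomizedJson(ast: any): any {
  if (ast.nodeName === '#document' || ast.nodeName === '#document-fragment') {
    // ... same as in the simple implementation
  } else if (ast.nodeName === '#text') {
    // ... same as in the simple implementation
  } else if (ast.nodeName === '#comment') {
    return null
  } else {
    const attrs: Record<string, string | boolean> = {}
    let type: string | null = null
    ast.attrs.forEach((attr: Attribute) => {
      if (attr.name.startsWith('data-')) {
        // data-* attribute: strip the prefix and camel-case the rest;
        // a valueless attribute such as data-hidden-title becomes true
        attrs[camelCase(attr.name.slice(5))] = attr.value || true
      } else if (attr.name === 'class') {
        // Class names: a ct-* class determines the node type
        attr.value.split(' ').forEach((className: string) => {
          if (className.startsWith('ct-')) {
            type = snakeCase(className.slice(3))
          }
        })
      } else if (attr.name === 'style') {
        // Styles: each "key: value" declaration becomes a camel-cased attr
        attr.value.split(';').forEach((style: string) => {
          const [key, value] = style.split(':')
          // Skip empty declarations, e.g. after a trailing semicolon
          if (!key.trim() || value === undefined) return
          // The value is stored as written, so "text-align: center" keeps its leading space
          attrs[camelCase(key)] = value
        })
      }
    })

    // Return after the loop; a return inside the forEach callback would be discarded
    return {
      type: type || nodeName2Type[ast.nodeName],
      content: ast.childNodes.map(ast2CustomizedJson).filter(Boolean),
      attrs,
    }
  }
}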
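
The attribute handling can also be tried on its own, without parse5 or a utility library. The sketch below is an illustration under assumptions, not the package's actual code: RawAttr, toCamelCase, toSnakeCase and convertAttrs are hypothetical names, the two case helpers are minimal stand-ins that only handle lowercase kebab-case input, and style values are trimmed, so textAlign comes out as "center" rather than " center" as in the sample output above.

```typescript
interface RawAttr { name: string; value: string }

// Minimal stand-ins for camelCase / snakeCase (kebab-case input only).
function toCamelCase(input: string): string {
  return input.trim().replace(/-+([a-z0-9])/gi, (_match, ch: string) => ch.toUpperCase())
}

function toSnakeCase(input: string): string {
  return input.trim().replace(/-+/g, '_')
}

// Applies the three rules: data-* -> attrs, ct-* class -> type, style -> attrs.
function convertAttrs(rawAttrs: RawAttr[]): { type: string | null; attrs: Record<string, string | boolean> } {
  const attrs: Record<string, string | boolean> = {}
  let type: string | null = null
  for (const attr of rawAttrs) {
    if (attr.name.startsWith('data-')) {
      // A valueless data attribute has an empty string value and becomes true.
      attrs[toCamelCase(attr.name.slice(5))] = attr.value || true
    } else if (attr.name === 'class') {
      for (const className of attr.value.split(/\s+/)) {
        if (className.startsWith('ct-')) type = toSnakeCase(className.slice(3))
      }
    } else if (attr.name === 'style') {
      for (const declaration of attr.value.split(';')) {
        // Split on the first colon only, so values that contain ':' stay intact.
        const separator = declaration.indexOf(':')
        if (separator === -1) continue
        const key = declaration.slice(0, separator)
        const value = declaration.slice(separator + 1)
        attrs[toCamelCase(key)] = value.trim()
      }
    }
  }
  return { type, attrs }
}

// Attributes of the outer <div class="ct-note" ...> from the example above.
const result = convertAttrs([
  { name: 'class', value: 'ct-note' },
  { name: 'data-type', value: 'info' },
  { name: 'data-hidden-title', value: '' },
  { name: 'style', value: 'text-align: center' },
])

console.log(JSON.stringify(result))
// {"type":"note","attrs":{"type":"info","hiddenTitle":true,"textAlign":"center"}}
```

Splitting each declaration on the first colon is a deliberate difference from style.split(':'): it keeps values such as URLs whole.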