x-html-refine

v1.0.1

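Cleans complex HTML content structures into a standard rich-text structure. Mainly intended for rich-text editors, web crawlers, and AI comprehension scenarios.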

x-html-refine

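1. Introduction

x-html-refine is a web-oriented engine for cleaning HTML structure data. It turns complex, irregular HTML content nodes into simple, well-formed content HTML nodes. Typical use cases are rich-text editors, cleaning scraped data, and preparing data for AI comprehension.

Features

  • Cleans irregular HTML nodes into simple, standard HTML nodes
  • Modular design with support for on-demand imports
  • Good TypeScript support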

2. Usage

Installation

npm install x-html-refine
# or
yarn add x-html-refine
# or
pnpm add x-html-refine

Calling the API

import { HtmlRefine } from "x-html-refine";
// import { JSDOM } from "jsdom"; // required in Node; not needed in the browser

const refine = new HtmlRefine({
    // dom_parser: new JSDOM().window.DOMParser(), // required in Node; not needed in the browser
    is_base_64: true, // defaults to true
    right_align_prefix:[
        "来源:","来源:","来源 :","图文:","图文:",
        "图文 :","图片:", "(原载","(图/文","文:",
        "图:","图:","(文/","(文/图","(图文:",
        "文字:","图/文:","(文/图:","文/图",
        "【图/文:","(供稿:","院办:","编辑:",
        "供稿:","(文图:","文图:","(文:","供稿者:"
    ] // defaults to []
});

const html = `
    <div>
      <div>内容一</div>
      <sapn>内容二</sapn>
      <div>
        <a href="/a.html">我是超链接</a>
      </div>
      <span>
        <span>
          <div>多 span 嵌套</div>
        </span>
      </span>
      <p>我是正常文本</p>
      <p>
        我是
        <span>分开的文本</span>
        <span>呀!</span>
      </p>
      <img src="/1.png" alt="" />
      <video src="/1.png" alt=""></video>
      <table>
        <tr>
          <th>标题 1</th>
          <th>标题 2</th>
          <th>标题 3</th>
        </tr>
        <tr>
          <td>内容 1</td>
          <td>内容 2</td>
          <td>内容 3</td>
        </tr>
      </table>
      <table>
        <tr>
          <td>
            <p>我是表格布局的内容 1</p>
            <p>我是表格布局的内容 2</p>
            <p>我是表格布局的内容 3</p>
            <img src="/表格里的图片.png" alt="" />
          </td>
        </tr>
      </table>
    </div>
`

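// Process the input and return the result as HTML text
const disposeHtmlText = refine.dispose(html);
// Process the input and return the result as an HTML DOM
// const disposeHtmlDom = refine.disposeDOM(html);

// Print the result
console.log(disposeHtmlText);

// Example result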
{
    html:`
      <p>内容一</p>
      <p>内容二</p>
      <p><a href="/a.html" target="_blank">我是超链接</a></p>
      <p>多 span 嵌套</p>
      <p>我是正常文本</p>
      <p>我是分开的文本呀!</p>
      <p><img src="/1.png" alt=""></p>
      <p><video src="/1.png" poster="" controls=""></video></p>
      <p>
      <table>
        <tbody>
          <tr>
            <th>标题 1</th>
            <th>标题 2</th>
            <th>标题 3</th>
          </tr>
          <tr>
            <td>内容 1</td>
            <td>内容 2</td>
            <td>内容 3</td>
          </tr>
        </tbody>
      </table>
      </p>
      <p>我是表格布局的内容 1</p>
      <p>我是表格布局的内容 2</p>
      <p>我是表格布局的内容 3</p>
      <p><img src="/表格里的图片.png" alt=""></p>
    `
}
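
For the crawler and AI comprehension use cases mentioned in the introduction, the cleaned output often needs to be flattened into plain text. The following is a minimal sketch of one way to do that, assuming the result has the `{ html: string }` shape shown in the example result. `RefineResult` and `paragraphTexts` are hypothetical names used for illustration and are not part of x-html-refine.

```typescript
// Hypothetical helper, NOT part of x-html-refine.
// Assumed result shape, taken from the example result above.
interface RefineResult {
  html: string;
}

// Returns one plain-text string per <p>...</p> block.
// Only matches bare <p> tags (no attributes), as in the example result.
function paragraphTexts(result: RefineResult): string[] {
  const blocks = result.html.match(/<p>[\s\S]*?<\/p>/g) ?? [];
  return blocks
    .map((block) => block.replace(/<[^>]+>/g, " ")) // replace every tag with a space
    .map((text) => text.replace(/\s+/g, " ").trim()) // collapse runs of whitespace
    .filter((text) => text.length > 0); // drop paragraphs holding only media
}

// A shortened copy of the example result, used as sample input.
const sample: RefineResult = {
  html: `
    <p>内容一</p>
    <p><a href="/a.html" target="_blank">我是超链接</a></p>
    <p><img src="/1.png" alt=""></p>
  `,
};

console.log(paragraphTexts(sample));
```

The regular expressions are enough here only because the cleaned output is flat and regular. For arbitrary HTML, a real DOM parser would be the safer choice.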