htmlclean

v3.0.8

Since v3, the CLI has been separated into its own package, htmlclean-cli.


A simple and safe HTML/SVG cleaner that minifies without changing the document's structure.

For example, two or more whitespace characters on a line (even when they are separated by tags) are reduced to one.

Before:

<p>The <strong> clean <span> <em> HTML is here. </em> </span> </strong> </p>

After:

<p>The <strong>clean <span><em>HTML is here.</em></span></strong></p>

The whitespace to the right of <strong> was removed, and the whitespace to its left was kept. The whitespace on both sides of <em> was removed.
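The rule illustrated above can be approximated in a few lines. The following is a simplified sketch only, not htmlclean's actual algorithm: the function names and the short list of block-level tags are assumptions made for illustration, and protected elements (pre, script, and so on), comments and attribute values containing > are ignored.

```javascript
// Simplified sketch only; NOT htmlclean's real algorithm.
// Assumption: this list of block-level tags is hypothetical and incomplete.
const BLOCK_TAGS = new Set(['p', 'div', 'ul', 'ol', 'li', 'h1', 'h2', 'h3', 'body']);

const isTag = (token) => token.startsWith('<');

function isBlockTag(token) {
  const m = /^<\/?([a-zA-Z0-9]+)/.exec(token);
  return m !== null && BLOCK_TAGS.has(m[1].toLowerCase());
}

function collapseWhitespace(html) {
  // Split into tags and text runs (the capture group keeps the tags),
  // then squeeze every whitespace run inside a text run to one space.
  const tokens = html
    .split(/(<[^>]+>)/)
    .filter((t) => t !== '')
    .map((t) => (isTag(t) ? t : t.replace(/\s+/g, ' ')));

  // Forward pass: drop a leading space when the previous text already
  // ended in a space, or when a block boundary (or the start) precedes it.
  let afterSpace = true;
  for (let i = 0; i < tokens.length; i++) {
    if (isTag(tokens[i])) {
      if (isBlockTag(tokens[i])) afterSpace = true;
      continue;
    }
    if (afterSpace && tokens[i].startsWith(' ')) tokens[i] = tokens[i].slice(1);
    if (tokens[i] !== '') afterSpace = tokens[i].endsWith(' ');
  }

  // Backward pass: drop a trailing space when only inline tags separate it
  // from a block boundary (or from the end of the input).
  let beforeBoundary = true;
  for (let i = tokens.length - 1; i >= 0; i--) {
    if (isTag(tokens[i])) {
      if (isBlockTag(tokens[i])) beforeBoundary = true;
      continue;
    }
    if (beforeBoundary && tokens[i].endsWith(' ')) tokens[i] = tokens[i].slice(0, -1);
    if (tokens[i] !== '') beforeBoundary = false;
  }
  return tokens.join('');
}

console.log(collapseWhitespace(
  '<p>The <strong> clean <span> <em> HTML is here. </em> </span> </strong> </p>'
));
```

The two passes mirror the description: the first whitespace of a run survives, later ones are dropped even across inline tags, and whitespace next to a block boundary disappears entirely.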

As another example, unneeded whitespace in SVG path data is reduced. In the case of one SVG file, 4,784 bytes were removed without changing its structure.

Removing

htmlclean removes the following text.

  • Leading and trailing whitespace (including tabs and line breaks)
  • Unneeded whitespace between HTML/SVG tags
  • Runs of two or more whitespace characters (reduced to one space)
  • HTML/SVG comments
  • Unneeded whitespace, meaningless zeros, numbers, signs, etc. in SVG path data (e.g. the d attribute of the path element, the path attribute of the animateMotion element, etc.)
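To give a feel for the last item, here is a minimal sketch of whitespace removal in path data. It is not htmlclean's real path-data handling: tidyPathData is a hypothetical helper, it only handles separators (not meaningless zeros or redundant numbers), and it assumes the input is valid path data.

```javascript
// Simplified sketch only; NOT htmlclean's real path-data handling.
// tidyPathData is a hypothetical helper name.
function tidyPathData(d) {
  return d
    .trim()
    // Runs of whitespace and commas are all just separators: use one space.
    .replace(/[\s,]+/g, ' ')
    // A command letter separates numbers by itself: drop spaces around it.
    .replace(/ ?([a-zA-Z]) ?/g, '$1')
    // A minus sign also starts a new number: drop the space before it.
    .replace(/ -/g, '-');
}

console.log(tidyPathData('  M 10, 20 L 30 , -40   Z '));
// prints "M10 20L30-40Z"
```

The space between 10 and 20 has to stay, because nothing else separates those two numbers.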

Protecting

The following text is protected (excluded from the Removing list).

  • Text in textarea, script and style elements, and text nodes in pre elements
  • Quoted text in tag attributes, except SVG path data
  • Text in SSI tags (PHP, JSP, ASP/ASP.NET and Apache SSI)
  • IE conditional comments (e.g. <!--[if lt IE 7]>)
  • Text between <!--[htmlclean-protect]--> and <!--[/htmlclean-protect]-->
  • Text matched by the protect option

Usage

cleanCode = htmlclean(sourceCode[, options])

require('htmlclean') returns a Function. This Function accepts HTML/SVG source code and returns the cleaned HTML/SVG source code. You can pass an options Object as the second argument (see Options).

var htmlclean = require('htmlclean');
html = htmlclean(html);

// Or
html = require('htmlclean')(html);

Options

You can pass an options Object as the second argument. This Object can have the following properties.

protect

Type: RegExp or Array

Text matched by this RegExp is protected, in addition to the Protecting list. Multiple RegExps can be specified via an Array.

unprotect

Type: RegExp or Array

Text matched by this RegExp is cleaned even if it is included in the Protecting list. Multiple RegExps can be specified via an Array.

For example, HTML source code used as a template in <script type="text/x-handlebars-template"> is cleaned via the following code:

html = htmlclean(html, {
  unprotect: /<script [^>]*\btype="text\/x-handlebars-template"[\s\S]+?<\/script>/ig
});

The x-handlebars-template in the type attribute above applies to the Handlebars template framework. Other frameworks use other values; for example, AngularJS requires ng-template instead.

NOTE: The RegExp must match text that is not merely a part of a protected text. For example, if the RegExp matches color: red; in a <style> element, that text is not cleaned, because all text in the <style> element is protected and color: red; is only a part of it. The RegExp has to match the whole <style> element, like /<style[\s\S]+?<\/style>/.
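The difference between the two kinds of pattern in the note can be seen with plain RegExp calls. This sketch does not call htmlclean; coversWholeText is a hypothetical check written only for illustration, and it says nothing about how htmlclean itself decides what to unprotect.

```javascript
const styleElement = '<style>\n  p { color: red; }\n</style>';

// Hypothetical check for illustration: does the first match span the whole text?
function coversWholeText(re, text) {
  const m = re.exec(text);
  return m !== null && m.index === 0 && m[0].length === text.length;
}

// Matches only a fragment inside the protected <style> element.
const partialPattern = /color: red;/;
// Matches the <style> element from its open tag to its close tag.
const wholePattern = /<style[\s\S]+?<\/style>/;

console.log(coversWholeText(partialPattern, styleElement)); // false
console.log(coversWholeText(wholePattern, styleElement));   // true
```

Only a pattern of the second kind is suitable as an unprotect value for this element.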

edit

Type: Function

This Function makes further edits to the HTML/SVG source code.
Protected text is hidden from the HTML/SVG source code before that code is passed to this Function, so this Function can't break the protected text. The protected text is restored in the HTML/SVG source code that this Function returns.

NOTE: Markers \fID\x07 (\f is the "form feed" \x0C code, \x07 is "bell", ID is a number) are inserted into the HTML/SVG source code in place of the protected text. This Function can remove those markers, but can't add new ones. (Invalid markers are simply removed.)
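The hide, edit and restore cycle described above can be sketched as follows. This is an illustration of the general placeholder technique under stated assumptions, not htmlclean's source code: editWithProtection and its parameters are hypothetical names, the patterns are assumed to carry the g flag, and nested protected text is not handled.

```javascript
// Illustrative sketch of the marker technique; NOT htmlclean's real code.
// editWithProtection is a hypothetical helper name.
function editWithProtection(source, protectPatterns, edit) {
  const saved = [];
  let hidden = source;

  // Hide: swap every protected text for a \fID\x07 marker.
  for (const re of protectPatterns) {
    hidden = hidden.replace(re, (match) => {
      saved.push(match);
      return '\f' + (saved.length - 1) + '\x07';
    });
  }

  // Edit: the callback only ever sees the markers, not the protected text.
  const edited = edit(hidden);

  // Restore: put the protected text back; unknown IDs are simply dropped.
  return edited.replace(/\f(\d+)\x07/g, (marker, id) => {
    const text = saved[Number(id)];
    return text === undefined ? '' : text;
  });
}

const sample = '<p>eggs</p><pre>  eggs  </pre>';
const restored = editWithProtection(
  sample,
  [/<pre>[\s\S]*?<\/pre>/g],
  (html) => html.replace(/\beggs\b/g, 'omelets')
);
console.log(restored);
```

In this sketch the word inside the pre element is untouched because the edit callback never sees it, and a marker that the callback invents has no saved text to restore, so it is dropped.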

Example

See the source HTML file and the resulting HTML files in the examples directory.

var htmlclean = require('htmlclean'),
  fs = require('fs'),
  htmlBefore = fs.readFileSync('./before.html', {encoding: 'utf8'});

var htmlAfter1 = htmlclean(htmlBefore);
fs.writeFileSync('./after1.html', htmlAfter1);

var htmlAfter2 = htmlclean(htmlBefore, {
  protect: /<\!--%fooTemplate\b.*?%-->/g,
  unprotect: /<script [^>]*\btype="text\/x-handlebars-template"[\s\S]+?<\/script>/ig,
  edit: function(html) { return html.replace(/\begg(s?)\b/ig, 'omelet$1'); }
});
fs.writeFileSync('./after2.html', htmlAfter2);

Note

Malformed Nested Tags, and Close Tags in Script

htmlclean may not be able to parse malformed nested tags like <p>foo<pre>bar</p>baz</pre> precisely. The same applies to close tags inside script code, such as <script>var foo = '</script>';</script>, ?> in PHP code, and so on.
Some language parsers are also confused by these, so they recommend writing code like '<' + '/script>'. This is better even if htmlclean is not used.
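A small sketch shows why the split form is safer. firstScriptBody is a hypothetical, deliberately naive scanner that ends the element at the first close tag it finds, without looking at JavaScript string literals; it is not htmlclean's parser.

```javascript
// Naive scanner for illustration: the first "</script>" ends the element,
// regardless of whether it sits inside a JavaScript string literal.
function firstScriptBody(html) {
  const open = '<script>';
  const close = '</script>';
  const start = html.indexOf(open);
  if (start === -1) return null;
  const end = html.indexOf(close, start + open.length);
  return end === -1 ? null : html.slice(start + open.length, end);
}

const risky = "<script>var foo = '</script>';</script>";
const safe = "<script>var foo = '<' + '/script>';</script>";

console.log(firstScriptBody(risky)); // prints "var foo = '" (cut off mid-string)
console.log(firstScriptBody(safe));  // prints "var foo = '<' + '/script>';"
```

With the split literal, the close tag never appears inside the script body, so a scanner of this kind sees the whole statement.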

SSI Tags in HTML Comments

htmlclean removes HTML/SVG comments that include SSI tags, like <!-- Info for admin - Foo:<?= expression ?> -->. I think this is not a problem, because htmlclean is used to minify HTML. If such an SSI tag includes code that is important for logic, use the protect option, or <!--[htmlclean-protect]--> and <!--[/htmlclean-protect]-->.

htmlclean's Job

htmlclean never changes the structure of the document, even if elements or attributes look meaningless, because they might be used by your program, and restructuring is not htmlclean's job. It should not unexpectedly break the data after all your efforts.
If you would like to enforce rules relating to code style, check out documents such as a code style guide.
Also, htmlclean assumes valid HTML code. Since htmlclean never checks the syntax, it might not work correctly when an invalid document is passed. (See also: Malformed Nested Tags, and Close Tags in Script)

See Also

If you want to control the details of the editing, HtmlCompressor, HTMLMinifier and others are better choices.


Thanks for images: Wikimedia Commons