nlp-escape

v1.0.1

Generic pre-escaping and post-recovery of text tags for NLP/ML pipelines. Use at your own risk!

Readme

NLP Escape

Context

NLP (natural language processing), by definition, deals with human language as it is spoken and written. Here we consider any written language.
At the same time, the text to be processed can be tagged in many ways: HTML, XML, POS tagging, etc.

Since you are here, you are most likely using external NLP libraries that have their own logic and assumptions; i.e., external libraries don't know about your data, so you should remove tags before processing the natural language.

But why do you need to remove tags before processing? Simply because tags are very likely to confuse machine learning models or lower their effectiveness.

Again, there might be a solution for a particular case and library, but I think there is no general way to strip tags from text, do the processing, and then recover the initial structure. But why?

Having:

Hello world <tag>blablablabla hellooooo</tag> and so on. That was a vEⓡ𝔂 𝔽𝕌Ňℕy ţ乇𝕏𝓣. Not only, I am leaving my credit card for You: <red>4929 9425 8354 2322 - Visa</red> here you have it !

You could think of indexing the tags, processing the text, and then recovering the tags. But NLP is more than capitalizing text. Imagine running the text through several external processing libraries.

These nice libraries will change the text, and some of them will shrink or grow it!
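
To see why indexing the tags is not enough, here is a minimal sketch. It is not code from NLP-Escape: stripTags and restoreTags are hypothetical helpers, and the "Hello" to "Hi" replacement is only a stand-in for an NLP step that changes the text length.

```javascript
// Hypothetical illustration; none of these names come from nlp-escape.
const input = "Hello <b>world</b> !";

// Strip tags, remembering each tag and its offset in the stripped text.
function stripTags(text) {
  const tags = [];
  let plain = "";
  let last = 0;
  for (const m of text.matchAll(/<\/?[a-z]+>/gi)) {
    plain += text.slice(last, m.index);
    tags.push({ tag: m[0], offset: plain.length });
    last = m.index + m[0].length;
  }
  plain += text.slice(last);
  return { plain, tags };
}

// Re-insert tags at the recorded offsets, right to left,
// so that earlier offsets stay valid while inserting.
function restoreTags(plain, tags) {
  let out = plain;
  for (const { tag, offset } of [...tags].reverse()) {
    out = out.slice(0, offset) + tag + out.slice(offset);
  }
  return out;
}

const { plain, tags } = stripTags(input);

// Without any processing, the round trip works.
const unchanged = restoreTags(plain, tags); // "Hello <b>world</b> !"

// Stand-in for an NLP step that shortens the text.
const processed = plain.replace("Hello", "Hi"); // "Hi world !"

// The recorded offsets no longer point at the right places.
const broken = restoreTags(processed, tags); // "Hi wor<b>ld !</b>"
```

The offsets were computed against the original text, so any step that inserts or deletes characters before a tag silently moves that tag to the wrong place.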

NLP-Escape is a generic solution that will make your life easier :)

Considerations

NLP-Escape simply maps each tag to a unique code built from the null character (written \0 in JavaScript) and replaces the tag with that code.
My first version encodes text by replacing tags with a succession of \0 (this might change in future versions). As you may have guessed, this assumes there is no \0 already in the initial text, and it comes with some obvious costs:

  • Replacing a tag with a succession of \0 can shrink or grow the text to be processed.
  • The succession of \0 might confuse the NLP libraries as well (just like the tags themselves). I think this is unlikely (the null character rarely gets any special handling), but YOU MUST RUN TESTS TO VALIDATE THIS.
  • Because of the first point, this is open to a kind of zip-bomb attack! A text like <a><b><c>...<i>...<n> would grow considerably.
  • The remedy is to build a limited tag dictionary (this is what we do here: you specify the tags to be escaped manually).
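
The idea above can be sketched roughly like this. This is my own illustration, not the NLP-Escape API: escapeTags and recoverTags are hypothetical names, and the coding scheme (the tag at dictionary index i becomes a run of i + 1 null characters) is an assumption made for the example. The sketch also refuses two tags placed directly next to each other, since their runs would merge into one.

```javascript
// Hypothetical sketch; escapeTags and recoverTags are illustrative names,
// not the nlp-escape API.
const NUL = "\0";

// Escape regex metacharacters so a tag can be used inside a RegExp.
const quote = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

function escapeTags(text, tags) {
  if (text.includes(NUL)) {
    throw new Error("input already contains a null character");
  }
  if (tags.length === 0) return text;
  // Longest tags first, so a tag that is a prefix of another cannot shadow it.
  const ordered = [...tags].sort((a, b) => b.length - a.length);
  const pattern = new RegExp(ordered.map(quote).join("|"), "g");
  let prevEnd = -1;
  return text.replace(pattern, (match, offset) => {
    if (offset === prevEnd) {
      throw new Error("adjacent tags would merge into one run");
    }
    prevEnd = offset + match.length;
    // Assumed scheme: dictionary index i -> run of i + 1 null characters.
    return NUL.repeat(tags.indexOf(match) + 1);
  });
}

function recoverTags(text, tags) {
  return text.replace(/\0+/g, (run) => {
    const tag = tags[run.length - 1];
    if (tag === undefined) {
      throw new Error(`no tag for a run of ${run.length} null characters`);
    }
    return tag;
  });
}

// Limited tag dictionary, specified manually.
const tags = ["<tag>", "</tag>", "<red>", "</red>"];
const input = "Hello <tag>world</tag>, card: <red>4929</red> !";

const escaped = escapeTags(input, tags);
// Stand-in for an NLP step that changes the text length.
const processed = escaped.replace("Hello", "Hi");
const output = recoverTags(processed, tags);
// output: "Hi <tag>world</tag>, card: <red>4929</red> !"
```

With a manually specified dictionary, the longest run is bounded by the dictionary size (4 characters here), which is what keeps the growth under control. Whether your NLP libraries leave the null characters untouched is still something you have to test.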