ez-regex-patterns

v1.0.2

Readable, composable patterns that compile to native RegExp.

Regex isn't unreadable because of its primitives — [\w-]+ is terse and fine. It's unreadable because it has no naming and no composition: one undifferentiated expression, no sub-parts you can name and reuse, capture groups far from their meaning.

This library adds exactly that and nothing else. You write fragments in a small, grammar-like language, name them as ordinary variables, and compose them by interpolation. The result compiles — once, at definition time — to a real regex.

Install

npm install ez-regex-patterns

Usage

import { pattern } from "ez-regex-patterns";

const alpha = pattern`'a'..'z' | 'A'..'Z'`;
const digit = pattern`'0'..'9'`;
const word  = pattern`${alpha} | ${digit} | '_' | '-'`;
const ident = pattern`${word}+`;

ident.source;   // "[a-zA-Z0-9_\\-]+"

A Pattern is usable anywhere a RegExp is — it carries the native surface (exec, lastIndex, test, source, flags, and the Symbol.replace/match/split hooks) and drives the String methods directly:

ident.test(input);
ident.exec(input);                         // and `pattern.lastIndex = i` to scan from a cursor
input.replace(ident, (m) => m.toUpperCase());

// The DSL also adds a parts-returning matcher:
ident.match(input);                        // { text, index, parts } | undefined
ident.matchAll(input);                     // every match, with parts
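
An object that is not itself a RegExp can still drive the String methods, because those methods dispatch through well-known symbols. The sketch below uses only native JavaScript to illustrate the idea. The class name RegExpLike is hypothetical, and this is not the library's implementation, just one way such a wrapper could delegate to an inner RegExp.

```javascript
// Hypothetical wrapper: forwards the RegExp surface to an inner native RegExp.
class RegExpLike {
  constructor(source, flags = "") {
    this.inner = new RegExp(source, flags);
  }
  get source() { return this.inner.source; }
  get flags() { return this.inner.flags; }
  get lastIndex() { return this.inner.lastIndex; }
  set lastIndex(i) { this.inner.lastIndex = i; }
  test(s) { return this.inner.test(s); }
  exec(s) { return this.inner.exec(s); }
  // String.prototype.replace / match / split look these methods up on their argument.
  [Symbol.replace](s, replacer) { return this.inner[Symbol.replace](s, replacer); }
  [Symbol.match](s) { return this.inner[Symbol.match](s); }
  [Symbol.split](s, limit) { return this.inner[Symbol.split](s, limit); }
}

const identLike = new RegExpLike("[a-zA-Z0-9_\\-]+");
const shouted = "foo-bar baz".replace(identLike, (m) => m.toUpperCase()); // "FOO-BAR baz"
const pieces = "a, b".split(new RegExpLike(",\\s*"));                     // ["a", "b"]
const firstMatch = "  hello".match(identLike);                            // match "hello" at index 2
```

Because the string methods only look for the symbol-keyed methods, the wrapper never needs to subclass RegExp.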

Patterns are flag-agnostic; ask for a flag only where you need it. .global, .ignoreCase, and .sticky each return a flag-flavoured copy (the node is shared, only the flag differs, and they compose):

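// `grammar` is imported from "ez-regex-patterns"; `input`, `fix`, `tag`, and `scanner` are placeholders for your own values.
const stmt = grammar`r = start 'use ' word+ ';'`.r;
input.replace(stmt.global, fix);           // every occurrence
tag.ignoreCase.test('<SCRIPT>');           // case-insensitive
scanner.sticky;                            // anchors each match at lastIndex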

.toRegExp(flags) is still there for an explicit copy under a one-off flag set.
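
For reference, this is how the three flags behave on native RegExp objects. The sketch does not call the library. It assumes that .global, .ignoreCase, and .sticky correspond to the native g, i, and y flags, which the description above suggests but does not spell out.

```javascript
// Native RegExp only; the sample strings are illustrative.
const lower = "[a-z]+";

const once = "ab cd ef".replace(new RegExp(lower), "X");        // "X cd ef"
const every = "ab cd ef".replace(new RegExp(lower, "g"), "X");  // "X X X"

const insensitive = new RegExp("<script>", "i").test("<SCRIPT>"); // true

// Sticky: a match must begin exactly at lastIndex, with no scanning ahead.
const sticky = new RegExp(lower, "y");
sticky.lastIndex = 3;
const atCursor = sticky.exec("ab cd ef");   // matches "cd" at index 3
sticky.lastIndex = 2;
const offCursor = sticky.exec("ab cd ef");  // null: index 2 is a space
```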

The emitter does the two things hand-written regex makes you do by eye:

  • Merges an alternation of single chars / ranges into one class — 'a'..'z' | 'A'..'Z' | '_' becomes [a-zA-Z_], not (?:[a-z]|[A-Z]|_).
  • Parenthesizes only where precedence demands it: ${word}+ becomes [a-zA-Z0-9_\-]+, and a quantified alternation is auto-grouped, so the + never silently binds to the last branch.
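
To see why both rules matter, the sketch below uses plain native RegExp objects (not the library) to compare a merged class against the equivalent alternation, and to show how a quantifier binds when an alternation is left ungrouped. The sample strings are illustrative.

```javascript
// Native RegExp only; this does not call ez-regex-patterns.
// A merged class and the equivalent grouped alternation accept the same input.
const merged = /^[a-zA-Z_]+$/;
const alternated = /^(?:[a-z]|[A-Z]|_)+$/;

const sample = "snake_Case";
const mergedOk = merged.test(sample);          // true
const alternatedOk = alternated.test(sample);  // true

// Without a group, `+` binds only to the last branch: (^a) or (b+$).
const ungrouped = /^a|b+$/;
// With a group, `+` applies to the whole alternation.
const grouped = /^(?:a|b)+$/;

const ungroupedCab = ungrouped.test("cab");  // true: "b+$" matches the trailing "b"
const groupedCab = grouped.test("cab");      // false: "c" is not in the alternation
const groupedAbab = grouped.test("abab");    // true: every character is "a" or "b"
```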

Grammar

| Form | Meaning | Example |
| --- | --- | --- |
| 'x' | literal | '=', '.' |
| 'a'..'z' | character range | '0'..'9' |
| char(202A) | character by Unicode codepoint | char(00A0) |
| char(a)..char(b) | codepoint range | char(00C0)..char(00D6) |
| a b | sequence (juxtaposition) | ${sigil} ${ident} |
| a \| b | alternation | '.' \| '#' \| ':' |
| ( … ) | grouping | ('=' ${ident})? |
| a+ a* a? | quantifiers | ${word}+ |
| a{n} a{n..} a{n..m} | bounded repetition | ${word}{1..6} |
| !a | negate a char / class | !digit, !('a' \| '_') |
| before x, after x | lookahead / lookbehind (zero-width) | before ')' |
| same name | backreference to a captured rule | quote ... same quote |
| until x | lazy run up to (not eating) x | until '</script>' |
| ${fragment} | splice another pattern as an atom | ${word} |
| // … | line comment (stripped) | |

Whitespace between tokens is insignificant, so patterns can be laid out and annotated — the free-spacing mode regex literals never had.

char — codepoints and ranges

char(HEX) names a single character by its Unicode codepoint, emitting \uXXXX (BMP) or \u{…} (astral) — the readable way to put an otherwise-invisible character (a bidi control, a zero-width space) into a pattern. char(a)..char(b) is a codepoint range. Both are class-safe (they merge into […] and negate like a literal); an astral codepoint forces the engine's unicode (u) flag on automatically.

grammar`r = char(202A)`.r.source;                  // "\\u202a"
grammar`r = char(00C0)..char(00D6)`.r.source;      // "[\\u00c0-\\u00d6]"
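
The emitted escapes can be checked against native RegExp directly. The sketch below does not call the library; it takes the sources shown above as given and adds an astral codepoint (U+1F600, chosen for illustration) to show why the u flag is needed in that case.

```javascript
// Native RegExp only.
const bidi = /\u202a/;                          // LEFT-TO-RIGHT EMBEDDING, invisible in most editors
const hasBidi = bidi.test("abc\u202Adef");      // true

const latinUpper = /^[\u00c0-\u00d6]$/;
const inRange = latinUpper.test("\u00C9");      // true: U+00C9 lies inside the range
const outOfRange = latinUpper.test("e");        // false

// An astral codepoint needs the \u{...} form, which is only a codepoint escape under the u flag.
const astral = /^\u{1F600}$/u;
const withFlag = astral.test("\u{1F600}");      // true

// Without the u flag the same text is not read as a codepoint escape.
const noFlag = /^\u{1F600}$/;
const withoutFlag = noFlag.test("\u{1F600}");   // false
```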

Bounded quantifiers

Beyond +, *, and ?, {…} gives explicit counts, reusing .. for the range. Counts are decimal, unlike the hex argument of char:

grammar`r = ${word}{3}`.r.source;     // "…{3}"      exactly 3
grammar`r = ${word}{2..}`.r.source;   // "…{2,}"     2 or more
grammar`r = ${word}{2..5}`.r.source;  // "…{2,5}"    2 to 5
grammar`r = ${word}{..5}`.r.source;   // "…{0,5}"    up to 5
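
The four emitted forms behave as follows on native RegExp. This sketch substitutes a simple [a-z] class for the elided operand and anchors each pattern so the counts are visible; it does not call the library.

```javascript
// Native RegExp only; [a-z] stands in for the operand.
const exactly3 = /^[a-z]{3}$/;
const atLeast2 = /^[a-z]{2,}$/;
const from2to5 = /^[a-z]{2,5}$/;
const upTo5 = /^[a-z]{0,5}$/;

const results = {
  exactly3: [exactly3.test("ab"), exactly3.test("abc"), exactly3.test("abcd")], // [false, true, false]
  atLeast2: [atLeast2.test("a"), atLeast2.test("abcdefg")],                     // [false, true]
  from2to5: [from2to5.test("ab"), from2to5.test("abcdef")],                     // [true, false]
  upTo5: [upTo5.test(""), upTo5.test("abcdef")],                                // [true, false]
};
```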

Lookaround

before x / after x are zero-width positive lookahead / lookbehind; ! in front gives the negatives:

grammar`r = before ')'`.r.source;     // "(?=\\))"
grammar`r = !before ')'`.r.source;    // "(?!\\))"
grammar`r = after '('`.r.source;      // "(?<=\\()"
grammar`r = !after '('`.r.source;     // "(?<!\\()"
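
Because lookarounds are zero-width, the text they inspect never becomes part of the match. The sketch below shows this with native RegExp on an illustrative input; the \w+ around each lookaround is added here only so there is something to capture.

```javascript
// Native RegExp only; "call(arg)" is an illustrative input.
const input = "call(arg)";

const beforeClose = input.match(/\w+(?=\))/)[0];    // "arg": followed by ")", which is not consumed
const notBeforeClose = input.match(/\w+(?!\))/)[0]; // "call": followed by "(", not ")"
const afterOpen = input.match(/(?<=\()\w+/)[0];     // "arg": preceded by "("
const notAfterOpen = input.match(/(?<!\()\w+/)[0];  // "call": nothing precedes it
```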

Backreference

same name re-matches the exact text that an earlier capture of that name (a rule reference) matched. It emits \k<name>, so it is only meaningful inside a grammar where that named capture exists:

const quoted = grammar`
  quote = '"' | char(27)
  r = quote 'hi' same quote
`.r;
quoted.test('"hi"');   // true   — closing quote must match the opener
quoted.test('"hi\'');  // false
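
A hand-written native equivalent makes the \k<name> mechanics visible. This is a sketch of what such a regex could look like, not necessarily the exact source the library emits.

```javascript
// Native RegExp only: a named group for the opener, \k<quote> for the closer.
const quotedNative = /(?<quote>["'])hi\k<quote>/;

const doubleQuoted = quotedNative.test('"hi"');   // true
const singleQuoted = quotedNative.test("'hi'");   // true
const mismatched = quotedNative.test('"hi\'');    // false: closer differs from opener

const opener = quotedNative.exec("'hi'").groups.quote; // "'"
```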

Negation

!a matches one character that is not a. It compiles to a negated character class, so its operand must be a single character or a class — a shorthand, a literal, a range, or an alternation of those:

pattern`!'a'..'z'`.source;          // "[^a-z]"
pattern`!('a' | '_')`.source;       // "[^a_]"
grammar`x = !digit`.x.source;       // "\\D"  — named terminals resolve inside a grammar
grammar`x = !whitespace+`.x.source; // "\\S+" — a trailing quantifier binds to the negation

Negating a sequence (!('a' 'b')) is an error: regex has no single consuming token for "not the string ab" — that's a negative lookahead (!before 'ab').
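
The difference between negating a class and rejecting a sequence shows up clearly in native RegExp. The sketch below does not call the library; the two-letter inputs are illustrative.

```javascript
// Native RegExp only.
// A negated class consumes exactly one character that is not in the set.
const notLower = /^[^a-z]$/;
const upperOk = notLower.test("A");    // true
const lowerOk = notLower.test("a");    // false

// "Not the string ab" is a zero-width check in front of whatever follows.
const notAb = /^(?!ab)[a-z]{2}$/;
const rejectsAb = notAb.test("ab");    // false
const acceptsBa = notAb.test("ba");    // true
const acceptsAc = notAb.test("ac");    // true: starts with "a" but is not "ab"
```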

until

until x matches the shortest run of any characters (newlines included) up to the first x, without consuming x — the readable form of the [\s\S]*? block-grab you'd otherwise hand-write:

const g = grammar`
  body   = until '</script>'
  script = '<script>' body '</script>'
`;
g.body.source;                                  // "[\\s\\S]*?(?=</script>)"
g.script.match("<script>a</script>b</script>").parts.body;   // "a"  — stops at the first
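
For comparison, here is the same block-grab written by hand in native RegExp, next to a greedy variant that overshoots. The named group body is added here to read the result; the library's emitted source for the script rule may differ.

```javascript
// Native RegExp only.
const text = "<script>a</script>b</script>";

// Lazy run plus lookahead: stops at the first closing tag.
const lazy = /<script>(?<body>[\s\S]*?(?=<\/script>))<\/script>/;
const lazyBody = lazy.exec(text).groups.body;      // "a"

// Greedy run: backtracks from the end, so it stops at the last closing tag.
const greedy = /<script>(?<body>[\s\S]*)<\/script>/;
const greedyBody = greedy.exec(text).groups.body;  // "a</script>b"

// [\s\S] also crosses newlines, unlike "." without the s flag.
const multiline = lazy.exec("<script>x\ny</script>").groups.body; // "x\ny"
```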

Built-in terminals

A handful of names resolve to common character classes and zero-width anchors without you defining them. They emit inline (a terminal, never a capture), and a rule of the same name in your own grammar shadows the built-in:

| Name | Compiles to | Matches |
| --- | --- | --- |
| whitespace | \s | space, tab, newline, … |
| digit | \d | 0–9 |
| word | \w | [A-Za-z0-9_] |
| letter | [a-zA-Z] | an ASCII letter |
| boundary | \b | a word boundary (zero-width) |
| start | ^ | start of input/line (zero-width) |
| end | $ | end of input/line (zero-width) |

Built-in terminals only resolve inside a grammar block. A bare pattern`…` has no terminal scope; it composes by interpolating fragments you defined yourself (pattern`${word}+`), so pattern`boundary` throws an unresolved-reference error.

Unicode property classes

Unicode.* is a namespace of engine-supplied character categories — the General_Category set plus the identifier properties — that can't be enumerated by hand. Each emits a \p{…} escape (class-safe, so it merges and negates like a shorthand) and forces the u flag on:

grammar`r = Unicode.Letter.Uppercase`.r.source;   // "\\p{Lu}"
grammar`r = Unicode.Number.Decimal`.r.source;     // "\\p{Nd}"
grammar`r = Unicode.Identifier.Start`.r.source;   // "\\p{ID_Start}"

Names follow the categories: Unicode.Letter(.Uppercase/.Lowercase/…), Unicode.Number, Unicode.Mark, Unicode.Punctuation, Unicode.Symbol, Unicode.Separator, Unicode.Other, and Unicode.Identifier.Start/.Continue.
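
The emitted \p{…} escapes can be exercised with native RegExp, which also shows why the u flag has to be on. The sketch below does not call the library; the sample characters are illustrative.

```javascript
// Native RegExp only; property escapes require the u flag.
const upper = /^\p{Lu}$/u;
const upperAccented = upper.test("\u00C9");   // true: U+00C9 is an uppercase letter
const lowerAccented = upper.test("\u00E9");   // false: U+00E9 is lowercase

const decimal = /^\p{Nd}+$/u;
const arabicIndic = decimal.test("\u0663\u0664"); // true: Arabic-Indic digits three and four

const identifier = /^\p{ID_Start}\p{ID_Continue}*$/u;
const validIdent = identifier.test("na\u00EFve_1"); // true
const leadingDigit = identifier.test("1abc");       // false: a digit is not ID_Start

// Without the u flag, \p is not a property escape at all.
const noFlagUpper = /^\p{Lu}$/.test("\u00C9");      // false
```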

Grammar blocks & captures

A grammar block defines named rules that reference each other by bare name. A reference becomes a named capture, so a match hands back parts keyed by rule name — no separate capture syntax, the rule name is the key.

import { grammar } from "ez-regex-patterns";

const g = grammar`
  alpha = 'a'..'z' | 'A'..'Z'
  digit = '0'..'9'
  word  = alpha | digit | '_' | '-'
  ident = word+
  sigil = '.' | '#' | ':'
  name  = ident
  value = ident
  part  = sigil name ('=' value)?
`;

const m = g.part.match("#hp=low");
m.parts;   // { sigil: "#", name: "hp", value: "low" }

A rule reused under two names (name, value both ident) gets distinct keys, and nested references flatten to plain regex so they never collide. Recursion is rejected — regex can't recurse.
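
To make the mapping from rule names to parts concrete, here is one plausible hand-written native equivalent of the part rule, using named capture groups. It is an assumption about the shape of the output, not the library's actual emitted source, and it reads the result from the native groups object rather than the library's parts.

```javascript
// Native RegExp only: sigil, name, and value become named groups; ident is inlined.
const partNative = /(?<sigil>[.#:])(?<name>[a-zA-Z0-9_-]+)(?:=(?<value>[a-zA-Z0-9_-]+))?/;

const full = partNative.exec("#hp=low");
const fullParts = { ...full.groups };            // { sigil: "#", name: "hp", value: "low" }

// The optional group leaves its key undefined when it does not participate.
const bare = partNative.exec(".title");
const bareValue = bare.groups.value;             // undefined
const bareName = bare.groups.name;               // "title"
```

Note that name and value share the same inlined class but keep distinct group names, which is the property the paragraph above describes.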

License

MIT