literate-regex

v0.6.11 (published 3 months ago)

Write readable (literate) regex sources with # comments, then normalize them to JS RegExp.source at the type level.


motrohi

regex regexp literate types typescript pcre jsdoc


literate-regex

A literate, typed JavaScript regex toolkit — powered by TypeScript.

Write regex like a human. Ship it like a machine.

✨ Motivation

Regular expressions are absurdly powerful — a tiny automaton you can carry in your pocket.
But once a regex grows beyond “a few tokens and a prayer”, it becomes:

- hard to read
- easy to break
- painful to maintain

This library exists to keep regexes literate.

Write a multi-line, commented, PCRE-style regex source (with # notes), then normalize it into a compact JavaScript RegExp.source while preserving the normalized source as a TypeScript string literal type.

That gives you two superpowers:

- Human-friendly editing (readable formatting + comments)
- Machine-friendly safety (typed, normalized sources that flow through your codebase)

In short: fewer regex jump-scares, more confidence.

Before

const re = /^(?:\s*\/\*\*\s+|\s+\*?\s+)(?:(?=@(...))|...)/gm;

After

import { compilePCREStyleRegExpLiteral } from "literate-regex";

const RE_SOURCE = `
/^         # start
(?: ... )  # jsdoc start
...        # more notes
/gm` as const;

const re = compilePCREStyleRegExpLiteral(RE_SOURCE);
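Once comments and whitespace are stripped, the After source is an ordinary `/pattern/flags` string that still has to be split before `new RegExp` can use it. A minimal sketch of that final step follows. It assumes the input is already normalized, and `splitRegExpLiteral` / `compileNormalizedLiteral` are hypothetical names, not literate-regex exports.

```typescript
// Hypothetical shape for the two halves of a RegExp literal.
interface LiteralParts {
  pattern: string;
  flags: string;
}

// Hypothetical helper: split an already-normalized "/pattern/flags" string.
function splitRegExpLiteral(literal: string): LiteralParts {
  // The closing delimiter is the LAST "/", so "\/" inside the pattern is safe.
  const last = literal.lastIndexOf("/");
  if (!literal.startsWith("/") || last <= 0) {
    throw new SyntaxError(`Not a RegExp literal: ${literal}`);
  }
  const flags = literal.slice(last + 1);
  if (!/^[dgimsuvy]*$/.test(flags)) {
    throw new SyntaxError(`Unknown RegExp flags: ${flags}`);
  }
  return { pattern: literal.slice(1, last), flags };
}

// Hypothetical helper: build the RegExp from the split parts.
function compileNormalizedLiteral(literal: string): RegExp {
  const { pattern, flags } = splitRegExpLiteral(literal);
  return new RegExp(pattern, flags);
}

const re = compileNormalizedLiteral("/^\\s*\\*\\s+/gm");
console.log(re.source, re.flags); // prints: ^\s*\*\s+ gm
```

Taking the last `/` as the delimiter keeps escaped slashes such as `\/` inside the pattern intact; invalid patterns are still rejected by the `RegExp` constructor itself.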

✨ Features

PCRE-style regex source:
- multi-line formatting
- # ... line comments
- \# escape for literal #
Type-level normalization:
- derive normalized JS RegExp.source as a string literal type
Optional global augmentation:
- opt-in only (import "literate-regex/global")
Designed to reduce TypeScript instantiation pain:
- line-oriented normalization (helps avoid ts(2589), "Type instantiation is excessively deep and possibly infinite", compared to naive full-string scanning)
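To make the line-oriented idea concrete, here is a minimal type-level sketch. It is not the library's PCREStyleToJsRegExpSource: every type name is hypothetical, only space, tab and carriage return count as whitespace, and it needs TypeScript 4.7+ for `infer ... extends`. Splitting on newlines first means each recursive step works on one short line, and the accumulator style keeps the recursion in tail position, where TypeScript (4.5+) can evaluate conditional types without growing the instantiation depth.

```typescript
// Hypothetical type-level sketch; not literate-regex's PCREStyleToJsRegExpSource.
// Whitespace is limited to space, tab and CR here to keep it short.

// 1. Split into lines (accumulator style, so recursion stays in tail position).
type SplitLines<S extends string, Acc extends string[] = []> =
  S extends `${infer Head}\n${infer Tail}`
    ? SplitLines<Tail, [...Acc, Head]>
    : [...Acc, S];

// 2. Cut a line at its first unescaped "#"; "\#" becomes a literal "#".
type StripComment<L extends string, Acc extends string = ""> =
  L extends `${infer Before}#${infer After}`
    ? Before extends `${infer Kept}\\`
      ? StripComment<After, `${Acc}${Kept}#`>
      : `${Acc}${Before}`
    : `${Acc}${L}`;

// 3. Drop whitespace one character at a time.
type Ws = " " | "\t" | "\r";
type StripWs<S extends string, Acc extends string = ""> =
  S extends `${infer C}${infer Rest}`
    ? StripWs<Rest, C extends Ws ? Acc : `${Acc}${C}`>
    : Acc;

// 4. Normalize each line on its own, then concatenate.
type NormalizeLines<Lines extends string[], Acc extends string = ""> =
  Lines extends [infer H extends string, ...infer T extends string[]]
    ? NormalizeLines<T, `${Acc}${StripWs<StripComment<H>>}`>
    : Acc;

type Normalize<S extends string> = NormalizeLines<SplitLines<S>>;

// Hypothetical exact-equality helper (a common TypeScript idiom).
type Equal<A, B> =
  (<T>() => T extends A ? 1 : 2) extends (<T>() => T extends B ? 1 : 2) ? true : false;

const src = `
^        # start
\\#\\w+  # literal "#", then word characters
` as const;

// Fails to compile unless the computed type is exactly "^#\\w+".
const ok: Equal<Normalize<typeof src>, "^#\\w+"> = true;
console.log(ok); // prints: true
```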

📦 Install

npm i literate-regex
# or
pnpm add literate-regex
# or
yarn add literate-regex

🚀 Quick Start

import type { PCREStyleToJsRegExpSource } from "literate-regex";

// Opt-in: only if you want the global type augmentation
import "literate-regex/global";

🧠 Type-level normalization

1) Write a readable PCRE-style source

- # starts a line comment (unless escaped)
- \# is kept as a literal #
- whitespace characters are stripped during normalization
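The three rules above can be sketched at runtime in a few lines. This is a hypothetical `normalizeLiterate` helper, not the library's `normalizePCREStyleSource`; it assumes comments end at any ECMAScript line terminator and uses JavaScript's own `\s` for whitespace, which matches the WhiteSpace ∪ LineTerminator set this package cites.

```typescript
// Hypothetical runtime sketch of the three rules; not literate-regex's own code.
// Simplification: "\\#" (escaped backslash, then "#") is still read as an escaped "#".
function normalizeLiterate(source: string): string {
  return source
    // Comments end at any ECMAScript LineTerminator.
    .split(/[\n\r\u2028\u2029]/)
    .map((line) =>
      line
        // Drop everything from the first "#" that is not preceded by "\".
        .replace(/(^|[^\\])#.*$/, "$1")
        // Turn the remaining "\#" escapes into a literal "#".
        .replace(/\\#/g, "#")
        // Strip whitespace; JS \s is WhiteSpace ∪ LineTerminator.
        .replace(/\s+/g, ""),
    )
    .join("");
}

const src = `
^        # start
\\#\\w+  # literal "#"
`;

const normalized = normalizeLiterate(src);
console.log(normalized); // prints: ^#\w+
console.log(new RegExp(normalized).test("#tag")); // prints: true
```

Because every whitespace character is removed, a literal space in the pattern has to be written as an escape such as `\x20`, as the compile sample further down does.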

import type { PCREStyleToJsRegExpSource } from "literate-regex";

// sample 1
const RE_SOURCE = `
^           # start
(?:\\#\\w+) # literal "#"
\\s+        # whitespace
` as const;

// type JsSource = "^(?:#\\w+)\\s+" 
type JsSource = PCREStyleToJsRegExpSource<typeof RE_SOURCE>;

Tip: Use as const so the source keeps its string literal type. If it widens to string, there is nothing left for the type-level normalization to work on.
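The difference is easy to see with a hypothetical `IsLiteral` helper (not part of literate-regex): a `let` binding widens the template literal to `string`, and a plain `string` carries no characters for a type-level normalizer to inspect.

```typescript
// Hypothetical helper: true for a string literal type, false for plain `string`.
type IsLiteral<S extends string> = string extends S ? false : true;

const kept = `^\\d+  # digits` as const; // type: "^\\d+  # digits"
let widened = `^\\d+  # digits`;         // type: string (let widens the literal)

const keptIsLiteral: IsLiteral<typeof kept> = true;
const widenedIsLiteral: IsLiteral<typeof widened> = false;

console.log(keptIsLiteral, widenedIsLiteral); // prints: true false
```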

🔧 Runtime normalization (optional)

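PCREStyleToJsRegExpSource<...> is purely type-level and produces no runtime value. If you also need the normalized string at runtime, it must follow the same rules; normalizePCREStyleSource does this, and its result is typed as the normalized literal: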

import { normalizePCREStyleSource } from "literate-regex";
// import type { PCREStyleToJsRegExpSource } from "literate-regex";

// sample 2
const src = `
^        # start
\\#\\w+  # literal
` as const;

// '^#\\w+'
// const normalized: "^#\\w+"
const normalized = normalizePCREStyleSource(src);
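Keeping runtime and type-level rules in step can be illustrated on a single rule. In this hypothetical sketch (not how literate-regex is implemented), whitespace stripping limited to ASCII space, tab, LF and CR is written once as a type and once as a `replace` call, and the function returns the type-level result, similar in spirit to the typed `normalized` value in sample 2.

```typescript
// Hypothetical sketch: one rule (strip ASCII space/tab/LF/CR) written twice.
type Ws = " " | "\t" | "\n" | "\r";

// Type-level version: walk the string one character at a time.
type StripWs<S extends string, Acc extends string = ""> =
  S extends `${infer C}${infer Rest}`
    ? StripWs<Rest, C extends Ws ? Acc : `${Acc}${C}`>
    : Acc;

// Runtime version: must remove exactly the same characters as the type above.
function stripWs<S extends string>(source: S): StripWs<S> {
  // The cast is unchecked: TypeScript cannot prove the two versions agree.
  return source.replace(/[ \t\n\r]/g, "") as unknown as StripWs<S>;
}

const normalized = stripWs(`
  ^ \\d+
  $
`);
// Hover type of `normalized`: "^\\d+$"
console.log(normalized); // prints: ^\d+$
console.log(new RegExp(normalized).test("2024")); // prints: true
```

The unchecked cast is the weak point of this pattern: if the regex and the type ever disagree, the declared type silently lies, which is why the rules have to be mirrored exactly.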

🔧 Runtime creation: compile a PCRE-style RegExp literal

import {
  TypedRegExp,
  // normalizePCREStyleSource,
  compilePCREStyleRegExpLiteral,
} from "literate-regex";
import type {
  RegExpLiteralParts,
  PCREStyleToJsRegExpSource,
  RegExpExecArrayFixedPretty,
  ReplacerFunctionSignature,
} from "literate-regex";

//
// sample of compilePCREStyleRegExpLiteral
//
const pcreStyledRegex = `/
(\\(\\?\\#[\\s\\S]*?(?<!\\\\)\\)(?=\\s*$|.))        # multi line comment
|
(?:^(?:\\s+|))?(?<![\\\\])(\\#(?:\\s|[\\s\\S])*?$)  # single line comment
|
(?<regexFragment>
  (?:^\\s+)?(?:[^\\s]+)
)+                                                  # regex fragment
|
([\\r\\n]+|[\\x20\\t]+(?=$)?)                       # whitespace
/gm`;

const jsRegex = compilePCREStyleRegExpLiteral(pcreStyledRegex);

type TPcreStyledRegex = typeof pcreStyledRegex;
type TJsRegexSource = PCREStyleToJsRegExpSource<TPcreStyledRegex>;
type TJsRegexLiteralParts = RegExpLiteralParts<TJsRegexSource>;

type TJsRegexExecArray = RegExpExecArrayFixedPretty<
  TypedRegExp<TJsRegexLiteralParts["pattern"]>
>;
type TJsRegexStringReplacer = ReplacerFunctionSignature<
  TypedRegExp<TJsRegexLiteralParts["pattern"]>
>;
const m = jsRegex.exec(pcreStyledRegex);
type Test0 = TJsRegexExecArray extends typeof m ? true : false;
type Test1 = typeof m extends TJsRegexExecArray ? true : false;

const replacer: TJsRegexStringReplacer = (...args) => "";
pcreStyledRegex.replace(jsRegex, replacer);
pcreStyledRegex.replace(jsRegex, "");
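ReplacerFunctionSignature types the callback passed to String.prototype.replace. In standard JavaScript that callback receives the whole match, then one argument per capture group, then the match offset, then the full input string, and finally a groups object when the pattern contains named groups. The plain-JavaScript sketch below spells out that order by hand for a small hypothetical date pattern; it does not use literate-regex.

```typescript
// Hypothetical pattern with two named capture groups.
const dateRe = /(?<year>\d{4})-(?<month>\d{2})/g;

// Argument order for a String.prototype.replace callback.
function toMonthFirst(
  _match: string,                          // whole match, e.g. "2024-01"
  year: string,                            // capture group 1
  month: string,                           // capture group 2
  offset: number,                          // index of the match in the input
  _input: string,                          // the full input string
  groups: { year: string; month: string }, // passed because named groups exist
): string {
  // Positional and named access point at the same captures.
  if (groups.year !== year || groups.month !== month) {
    throw new Error("group mismatch");
  }
  return `${month}/${year}@${offset}`;
}

const text = "from 2024-01 to 2025-06";
console.log(text.replace(dateRe, toMonthFirst));
// prints: from 01/2024@5 to 06/2025@16
```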

🌍 Global augmentation (opt-in)

This package provides an optional global augmentation entry:

import "literate-regex/global";

This is intentionally opt-in to avoid unexpected type pollution across projects.

⚠️ Notes & limitations

This is not a full PCRE parser. It focuses on:
- line comments (# ...)
- escaping \#
- whitespace stripping
Very large type-level inputs may still hit TypeScript's recursion and instantiation limits (such as ts(2589)), depending on your TypeScript version and environment. If that happens, split your regex source into smaller pieces.
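One way to split, sketched with plain RegExp literals standing in for separately normalized pieces (the piece names and patterns are hypothetical, loosely modeled on the Before example): build each piece on its own and join their `.source` strings at runtime. Flags on the individual pieces are discarded, and named groups must stay unique across pieces.

```typescript
// Hypothetical pieces; each could come from its own small literate source.
const jsdocOpen = /\/\*\*\s+/;    // "/** " opener
const continuation = /\s+\*?\s+/; // " * " continuation prefix
const tag = /@(?<tag>\w+)/;       // "@param", "@returns", ...

// Join the pieces; the (?:...) wrapper keeps "|" from leaking into the tag part.
const combined = new RegExp(
  `^(?:${jsdocOpen.source}|${continuation.source})${tag.source}`,
  "m",
);

console.log(combined.exec(" * @param name description")?.groups?.tag); // prints: param
console.log(combined.exec("/** @returns value")?.groups?.tag);        // prints: returns
```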

📚 References

This library’s whitespace set is based on the ECMAScript definition used by RegExp \s (WhiteSpace ∪ LineTerminator).

ECMA-262: White Space (Table 33) https://tc39.es/ecma262/#sec-white-space
ECMA-262: Line Terminators (Table 34) https://tc39.es/ecma262/#sec-line-terminators
MDN: RegExp character classes (\s equivalence) https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Character_classes
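Since the stripped whitespace set follows the definition behind RegExp `\s`, the code points from the two ECMA-262 tables listed above can be checked directly against `/\s/`. This is a standalone sketch, not code from literate-regex; the Zs entries reflect current Unicode data.

```typescript
// ECMAScript WhiteSpace: TAB, VT, FF, ZWNBSP, plus Unicode category Zs.
const whiteSpace: string[] = [
  "\u0009", "\u000B", "\u000C", "\uFEFF",
  "\u0020", "\u00A0", "\u1680",
  // U+2000 through U+200A
  ...Array.from({ length: 11 }, (_, i) => String.fromCharCode(0x2000 + i)),
  "\u202F", "\u205F", "\u3000",
];

// ECMAScript LineTerminator: LF, CR, LINE SEPARATOR, PARAGRAPH SEPARATOR.
const lineTerminators: string[] = ["\u000A", "\u000D", "\u2028", "\u2029"];

// RegExp \s = WhiteSpace ∪ LineTerminator, so every entry should match.
const allMatch = [...whiteSpace, ...lineTerminators].every((ch) => /^\s$/.test(ch));

// ZERO WIDTH SPACE (U+200B) is a format character, not whitespace.
const zwspMatches = /\s/.test("\u200B");

console.log(allMatch, zwspMatches); // prints: true false
```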

📜 License

Released under the Apache-2.0 License.
See LICENSE for details.
