@fajarnugraha37/ts-rex
v2.0.0
A TypeScript library for building Regex patterns.
Type-Safe Regex (TS-Rex) without the headache
Tech Stack
- TypeScript: The core logic and advanced generic type system.
- Bun: Used for dependency management, test running, and linting.
- tsup: Bundler for UMD, CJS, ESM, and type declarations.
- expect-type: Static type assertion utility for unit testing.
Architecture
TS-Rex is not intended to beat raw RegExp literals in hot-path microbenchmarks. Its primary value is correctness, composability, type-safe captures, and safer regex construction.
[!NOTE] In most common scenarios, the overhead is measured in nanoseconds. However, the Bun extremely-complex case exposes a remaining optimization target, likely around generated pattern shape, capture access strategy, or JavaScriptCore-specific object allocation behavior.
TS-Rex is built upon four architectural pillars:
- AST Generation: Instead of manipulating strings that can easily become malformed, every chained method appends an AST node to an internal array.
- Immutability: Each method call creates and returns a completely new `RegexBuilder` instance. This allows you to safely branch regex definitions (e.g., saving a base pattern into a variable and extending it multiple times) without unintended side effects.
- Phantom Type State: As you chain methods like `.capture()`, `.optional()`, or `.or()`, TypeScript infers and records the resulting group names and optionality within the generic state. This happens entirely during compilation, imposing zero runtime memory overhead.
- Runtime Compilation: Calling `.compile()` collapses the AST into a native JavaScript `RegExp` instance and binds the execution context flags, returning a strict type-safe execution wrapper.
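The four pillars can be illustrated with a deliberately tiny sketch. `MiniBuilder`, `MiniNode`, and every method name below are hypothetical and are not the library's actual implementation; the sketch only assumes that an immutable AST builder with a phantom capture-name type parameter can be collapsed into a native `RegExp`.

```typescript
// Hypothetical miniature of the AST + immutability + phantom-type idea.
// None of these names come from ts-rex itself.
type MiniNode =
  | { kind: "literal"; text: string }
  | { kind: "digits" }
  | { kind: "capture"; name: string; inner: readonly MiniNode[] };

// Escape characters that are special outside a character class.
const escapeLiteral = (text: string): string =>
  text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

class MiniBuilder<Names extends string = never> {
  // `Names` is a phantom type parameter: it exists only at compile time.
  private readonly nodes: readonly MiniNode[];

  private constructor(nodes: readonly MiniNode[]) {
    this.nodes = nodes;
  }

  static create(): MiniBuilder<never> {
    return new MiniBuilder<never>([]);
  }

  // Every method returns a new instance; `this.nodes` is never mutated.
  literal(text: string): MiniBuilder<Names> {
    return new MiniBuilder<Names>([...this.nodes, { kind: "literal", text }]);
  }

  digits(): MiniBuilder<Names> {
    return new MiniBuilder<Names>([...this.nodes, { kind: "digits" }]);
  }

  capture<N extends string, Inner extends string>(
    name: N,
    inner: MiniBuilder<Inner>,
  ): MiniBuilder<Names | N | Inner> {
    return new MiniBuilder<Names | N | Inner>([
      ...this.nodes,
      { kind: "capture", name, inner: inner.nodes },
    ]);
  }

  // Collapse the AST into a pattern string.
  toPattern(): string {
    return this.nodes.map(render).join("");
  }

  compile(): RegExp {
    return new RegExp(this.toPattern());
  }
}

function render(node: MiniNode): string {
  switch (node.kind) {
    case "literal":
      return escapeLiteral(node.text);
    case "digits":
      return "\\d+";
    case "capture":
      return `(?<${node.name}>${node.inner.map(render).join("")})`;
  }
}

// Branching from a saved base pattern leaves the base untouched.
const base = MiniBuilder.create().literal("id=");
const withId = base.capture("id", MiniBuilder.create().digits());

console.log(base.toPattern()); // id=
console.log(withId.toPattern()); // id=(?<id>\d+)
```

Because each call copies the node array, `base` can be extended any number of times without one branch affecting another.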
Features
- Fluent API: Chain methods to build regex patterns programmatically.
- Static Type Inference: Named capturing groups are automatically inferred as strongly-typed objects at compile time.
- Stateless Execution: The `.exec()` wrapper is externally stateless: repeated calls do not leak `lastIndex` state to the caller. Internally, single-match execution may reuse a private, cached `RegExp` instance for performance. Global iteration uses an isolated `RegExp` instance per iterator to avoid cross-iterator state corruption.
- Deep Optionality: Quantifiers (`.optional()`, `.zeroOrMore()`) and alternations (`.or()`) correctly map captured properties to `Partial` or union types.
- Zero Dependencies: Built entirely on standard TypeScript and native `RegExp`.
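The `lastIndex` leak that stateless execution guards against is easy to reproduce with a native global `RegExp`. The sketch below uses only standard JavaScript. `execFresh` is a hypothetical helper showing one simple way to stay stateless (a new `RegExp` per call); it is not the caching strategy the library uses internally.

```typescript
// A shared global RegExp remembers where the previous match ended.
const shared = new RegExp("\\d+", "g");

const firstCall = shared.test("abc 123"); // true, lastIndex is now 7
const secondCall = shared.test("abc 123"); // false, the search resumed at index 7
const thirdCall = shared.test("abc 123"); // true, lastIndex was reset to 0 by the failure

// Hypothetical stateless helper: a fresh RegExp per call, so no state survives.
function execFresh(source: string, flags: string, input: string): string | null {
  const match = new RegExp(source, flags).exec(input);
  return match ? match[0] : null;
}

const freshOne = execFresh("\\d+", "g", "abc 123"); // "123"
const freshTwo = execFresh("\\d+", "g", "abc 123"); // "123"

console.log(firstCall, secondCall, thirdCall); // true false true
console.log(freshOne, freshTwo); // 123 123
```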
Core Entities
To effectively use this library, you should understand these core entities:
- `rx()`: The factory function that initializes a fresh, empty builder.
- `RegexBuilder`: The immutable builder class. Exposes dozens of chainable, strictly-typed methods (`.literal()`, `.digit()`, `.capture()`, etc.).
- `CompiledRegex`: The object returned by `.compile()`. It contains:
  - `pattern`: The raw string representation of the regex.
  - `native`: The native JavaScript `RegExp` instance.
  - `exec(string)`: The type-safe extractor method.
- `MatchResult`: The discriminated union returned by `.exec()`.
  - On failure: `{ isMatch: false, match: null, [captureName]: undefined }`
  - On success: `{ isMatch: true, match: string, ...[Your Captures] }`. (If the `.global()` flag is set, this becomes an `IterableIterator` of successful matches.)
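To make the discriminated-union idea concrete, here is a minimal sketch built on a native `RegExp`. The `MiniMatch` type and the `execTyped` helper are assumptions modelled on the description above, not the library's own definitions; the point is only how checking `isMatch` narrows the capture types.

```typescript
// Hypothetical shape modelled on the MatchResult description; not the library's type.
type MiniMatch<Names extends string> =
  | ({ isMatch: true; match: string } & Record<Names, string>)
  | ({ isMatch: false; match: null } & Record<Names, undefined>);

// Wrap a native RegExp with named groups in the discriminated union.
function execTyped<Names extends string>(
  re: RegExp,
  names: readonly Names[],
  input: string,
): MiniMatch<Names> {
  const m = re.exec(input);
  const captures: Record<string, string | undefined> = {};
  for (const name of names) {
    captures[name] = m && m.groups ? m.groups[name] : undefined;
  }
  const result = m
    ? { isMatch: true, match: m[0], ...captures }
    : { isMatch: false, match: null, ...captures };
  // The cast is needed because the object is assembled dynamically.
  return result as unknown as MiniMatch<Names>;
}

const dateRe = new RegExp("(?<year>\\d{4})-(?<month>\\d{2})");
const hit = execTyped(dateRe, ["year", "month"] as const, "on 2024-05-01");
const miss = execTyped(dateRe, ["year", "month"] as const, "no date here");

if (hit.isMatch) {
  // Inside this branch, `year` and `month` are narrowed to string.
  console.log(hit.year, hit.month); // 2024 05
}
console.log(miss.isMatch, miss.match); // false null
```

Outside the `isMatch` check, each capture has type `string | undefined`, which forces callers to handle the failure case before using a capture as a string.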
Philosophy: Auto-Escaping and Safety First
To protect against malformed regular expressions, ts-rex enforces automatic escaping.
If you use `.literal('http://')` or `.anyOf('a-z')`, the library will automatically escape all special regex characters. For example, `rx().anyOf('a-z')` compiles to `[a\-z]`, meaning it matches the literal characters "a", "-", and "z", not a range.
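The difference between an escaped class and a range can be checked with native `RegExp` objects. In the sketch below, `escapeForClass` is a hypothetical helper that escapes the characters with special meaning inside `[...]`; the library's actual escaping rules may be broader.

```typescript
// Hypothetical helper: escape characters that are special inside [...].
function escapeForClass(chars: string): string {
  return chars.replace(/[\\\]\[\^\-]/g, "\\$&");
}

const classSource = `[${escapeForClass("a-z")}]`;
console.log(classSource); // [a\-z]

const escapedClass = new RegExp(classSource); // literal "a", "-", or "z"
const rangeClass = new RegExp("[a-z]"); // any lowercase ASCII letter

console.log(escapedClass.test("-")); // true
console.log(escapedClass.test("m")); // false
console.log(rangeClass.test("m")); // true
console.log(rangeClass.test("-")); // false
```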
[!WARNING]
Do not attempt to inject raw regex strings into builder methods. To build complex character classes (like `[a-zA-Z0-9.-]`), you must compose them using the type-safe methods:

```typescript
// Correct way to compose ranges
const myClass = rx()
  .range("a", "z")
  .or(rx().range("A", "Z"))
  .or(rx().range("0", "9"))
  .or(rx().anyOf(".-"));
// Compiles to: (?:(?:(?:[a-z]|[A-Z])|[0-9])|[.\-])
```

This verbose compilation behaves identically to `[a-zA-Z0-9.-]` in regex engines but guarantees syntactic safety.
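The equivalence claimed above can be spot-checked with native `RegExp` objects alone. The sketch compares the two pattern strings quoted in the warning across every ASCII character; the `^...$` anchors are an addition made here so that each test covers exactly one character.

```typescript
const composed = new RegExp("^(?:(?:(?:[a-z]|[A-Z])|[0-9])|[.\\-])$");
const compact = new RegExp("^[a-zA-Z0-9.-]$");

// Collect every ASCII character on which the two patterns disagree.
const disagreements: string[] = [];
for (let code = 0; code < 128; code++) {
  const ch = String.fromCharCode(code);
  if (composed.test(ch) !== compact.test(ch)) {
    disagreements.push(ch);
  }
}

console.log(disagreements.length); // 0
```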
[!NOTE] If you find the strict composition syntax too limiting and need to inject raw, unescaped regex strings, `ts-rex` provides two escape hatches for power users:

- `.rawClass(str: string)`: Generates `[str]` exactly as typed without any auto-escaping protection. (Example: `rx().rawClass('a-zA-Z0-9.-')` -> `[a-zA-Z0-9.-]`).
- `.raw<NewCaptures>(str: string)`: Allows you to freely inject any raw regex pattern directly into the AST. The optional generic parameter allows you to manually register named capture groups for full type safety.

```typescript
const parser = rx().raw<{ userId: string }>("(?<userId>\\d+)").compile();
```
Getting Started
Installation
```bash
# via npmjs
bun add @fajarnugraha37/ts-rex
npm install @fajarnugraha37/ts-rex
pnpm add @fajarnugraha37/ts-rex

# via jsr
bunx jsr add @fajar/ts-rex
npx jsr add @fajar/ts-rex
pnpm i jsr:@fajar/ts-rex
```

[!NOTE]
This library requires TypeScript version 5.0 or higher for full advanced type inference support.
Usage
Basic Capturing
Named capture groups automatically populate the output signature of .exec().
```typescript
import { rx } from "@fajarnugraha37/ts-rex";

// Build a pattern
const pattern = rx()
  .startOfInput()
  .capture("firstName", rx().oneOrMore(rx().wordChar()))
  .whitespace()
  .capture("lastName", rx().oneOrMore(rx().wordChar()))
  .endOfInput()
  .compile();

// Execute the pattern
const result = pattern.exec("John Doe");

if (result.isMatch) {
  // Types are fully inferred based on the captures defined above
  console.log(result.firstName); // "John"
  console.log(result.lastName); // "Doe"
  console.log(result.match); // "John Doe" (The full match)
}
```

Global Iteration
Flags dynamically modify the return type of .exec(). Using the .global() flag changes the result from a single object to an IterableIterator. Execution is fully stateless to avoid native lastIndex bugs.
```typescript
const pattern = rx()
  .capture("num", rx().oneOrMore(rx().digit()))
  .global()
  .compile();

const results = pattern.exec("I have 3 apples and 42 bananas");

for (const result of results) {
  console.log(result.num); // "3", then "42"
}
```

Match Indices
Using the .withIndices() flag (ECMAScript d flag) injects an indices property containing [start, end] tuples for the full match and every captured group.
```typescript
const pattern = rx().capture("val", rx().wordChar()).withIndices().compile();

const result = pattern.exec("a");

if (result.isMatch) {
  console.log(result.indices.match); // [0, 1]
  console.log(result.indices.val); // [0, 1]
}
```

Alternation and Optionality
Using .or() creates a union type, ensuring mutually exclusive access to capturing groups.
```typescript
const pattern = rx()
  .capture("a", rx().literal("A"))
  .or(rx().capture("b", rx().literal("B")))
  .compile();

const result = pattern.exec("A");

if (result.isMatch) {
  // TypeScript enforces that either 'a' is a string and 'b' is undefined, or vice versa.
  console.log(result.a);
}
```

Supported Regex Operations and Tokens
ts-rex supports almost the entire ECMAScript (ES2024) Regular Expression syntax.
Assertions and Boundaries
| Regex | API Method | Description |
| :------- | :----------------------------- | :------------------------------------------- |
| ^ | .startOfInput() | Matches the beginning of the input. |
| $ | .endOfInput() | Matches the end of the input. |
| \b | .wordBoundary() | Matches a word boundary. |
| \B | .nonWordBoundary() | Matches a non-word boundary. |
| (?=y) | .lookahead(builder) | Matches only if followed by the pattern. |
| (?!y) | .negativeLookahead(builder) | Matches only if NOT followed by the pattern. |
| (?<=y) | .lookbehind(builder) | Matches only if preceded by the pattern. |
| (?<!y) | .negativeLookbehind(builder) | Matches only if NOT preceded by the pattern. |
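For readers less familiar with zero-width assertions, the following sketch exercises a few of the native forms from the Regex column directly. It uses plain `RegExp` rather than the builder methods, so nothing here depends on the library's API.

```typescript
// Lookahead: digits only when followed by "px"; the unit itself is not consumed.
const pxValue = new RegExp("\\d+(?=px)");
// Negative lookbehind: digits not preceded by "$" or by another digit.
const notPrice = new RegExp("(?<![$\\d])\\d+");
// Word boundary: "cat" as a whole word only.
const wholeCat = new RegExp("\\bcat\\b");

const pxMatch = pxValue.exec("width: 120px");
const plainMatch = notPrice.exec("$30 for 12 items");

console.log(pxMatch ? pxMatch[0] : null); // 120
console.log(plainMatch ? plainMatch[0] : null); // 12
console.log(wholeCat.test("concatenate")); // false
console.log(wholeCat.test("a cat sat")); // true
```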
Character Classes and Escapes
| Regex | API Method | Description |
| :--------- | :-------------------------- | :------------------------------------------------------- |
| . | .anyChar() | Matches any single character except line terminators (unless the s flag is set). |
| \d | .digit() | Matches any digit (0-9). |
| \D | .notDigit() | Matches any character that is not a digit. |
| \w | .wordChar() | Matches any ASCII letter, digit, or underscore. |
| \W | .notWordChar() | Matches any non-word character. |
| \s | .whitespace() | Matches a single white space character. |
| \S | .notWhitespace() | Matches a single non-white space character. |
| [abc] | .anyOf('abc') | Matches any enclosed character (auto-escapes internals). |
| [^abc] | .noneOf('abc') | Matches anything not enclosed. |
| [a-z] | .range('a', 'z') | Matches a character in the specified range. |
| \xNN | .hex('NN') | Matches a character by its 2-digit hex code. |
| \uNNNN | .unicodeChar('NNNN') | Matches a character by its 4-digit Unicode hex value. |
| \u{N} | .unicodeCodePoint('NNNN') | Matches a Unicode code point. |
| \p{P} | .unicodeProperty('...') | Matches a character based on its Unicode category. |
| \n, \t | .newline(), .tab(), etc | Named control characters. |
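Two details of the native escapes in this table are worth seeing in action: `\p{...}` and `\u{...}` only work when the `u` (or `v`) flag is set, and `\w` is ASCII-only. The sketch below uses native `RegExp` only; how the builder decides to set the Unicode flag is not covered here.

```typescript
// \p{...} and \u{...} are only recognised when the "u" (or "v") flag is set.
const upper = new RegExp("\\p{Lu}", "u");
const grinning = new RegExp("\\u{1F600}", "u");
// \w is ASCII-only: letters, digits, and underscore.
const word = new RegExp("^\\w$");

const upperE = "\u00C9"; // LATIN CAPITAL LETTER E WITH ACUTE
const lowerE = "\u00E9"; // LATIN SMALL LETTER E WITH ACUTE
const face = "\u{1F600}"; // GRINNING FACE

console.log(upper.test(upperE)); // true
console.log(upper.test(lowerE)); // false
console.log(grinning.test(face)); // true
console.log(word.test("_")); // true
console.log(word.test(lowerE)); // false
```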
Quantifiers
| Regex | API Method | Description |
| :------ | :------------------------ | :---------------------------------------------------------- |
| * | .zeroOrMore(builder) | Matches 0 or more times. Maps nested captures to Partial. |
| + | .oneOrMore(builder) | Matches 1 or more times. |
| ? | .optional(builder) | Matches 0 or 1 times. Maps nested captures to Partial. |
| {n} | .times(n, builder) | Matches exactly "n" occurrences. |
| {n,} | .atLeast(n, builder) | Matches at least "n" occurrences. |
| {n,m} | .between(n, m, builder) | Matches between "n" and "m" occurrences. |
| *? | .lazy() | Appended to quantifiers to make them non-greedy. |
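The effect of a lazy quantifier is easiest to see next to its greedy counterpart. This sketch uses the native syntax from the Regex column; the sample string is made up for illustration.

```typescript
const html = "<b>bold</b> and <i>italic</i>";

// Greedy: .+ grabs as much as possible, then backtracks to the last ">".
const greedy = new RegExp("<.+>").exec(html);
// Lazy: .+? stops at the first ">" that lets the match succeed.
const lazy = new RegExp("<.+?>").exec(html);

const exactlyThree = new RegExp("^\\d{3}$");
const twoToFour = new RegExp("^\\d{2,4}$");

console.log(greedy ? greedy[0] : null); // <b>bold</b> and <i>italic</i>
console.log(lazy ? lazy[0] : null); // <b>
console.log(exactlyThree.test("123"), exactlyThree.test("1234")); // true false
console.log(twoToFour.test("12345")); // false
```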
Groups and Logic
| Regex | API Method | Description |
| :-------- | :----------------------- | :---------------------------------------------------------- |
| (?:x) | .group(builder) | Non-capturing group. |
| (?<N>x) | .capture('N', builder) | Named capturing group. Extracts to the TS output object. |
| \k<N> | .matchPrevious('N') | Matches exact text captured previously. Statically checked. |
| x\|y | .or(builder) | Matches either branch. Resolves to a TS Union type. |
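The runtime behaviour behind the backreference and union typing can be observed with native named groups. The sketch below is plain `RegExp`; it shows why a capture in an unmatched alternation branch has to be typed as possibly `undefined`.

```typescript
// Backreference: the closing quote must be the same character as the opening one.
const quoted = new RegExp("(?<quote>['\"])(?<body>.*?)\\k<quote>");
const quotedMatch = quoted.exec("say 'hi' now");
const body = quotedMatch && quotedMatch.groups ? quotedMatch.groups.body : null;

// Alternation: only the branch that matched fills its group.
const either = new RegExp("(?<a>A)|(?<b>B)");
const eitherMatch = either.exec("B");
const groupA = eitherMatch && eitherMatch.groups ? eitherMatch.groups.a : null;
const groupB = eitherMatch && eitherMatch.groups ? eitherMatch.groups.b : null;

console.log(body); // hi
console.log(quoted.test("'oops\"")); // false, the quotes do not match
console.log(groupA); // undefined
console.log(groupB); // B
```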
Flags
| Regex | API Method | Description |
| :------------ | :------------------------------------------ | :--------------------------------------------------------------------- |
| g | .global() | Global iteration. Changes .exec() return type to IterableIterator. |
| i | .ignoreCase() | Case-insensitive match. |
| m | .multiline() | Makes ^ and $ match at the start and end of each line. |
| s | .dotAll() | Allows . to match newlines. |
| d | .withIndices() | Appends .indices tuple objects into the .exec() return type. |
| v, y, u | .unicodeSets(), .sticky(), .unicode() | Other modern ES context flags. |
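As a quick reference for what these flags do natively, the sketch below exercises `m`, `s`, `y`, and `d` on plain `RegExp` objects. The `IndexedMatch` interface is a local assumption added so the example type-checks even when the configured TypeScript `lib` predates ES2022; the `d` flag itself needs an ES2022-capable runtime.

```typescript
// "m": ^ and $ also match at line breaks, not only at the ends of the input.
const multiline = new RegExp("^b", "m").test("a\nb"); // true
const singleLine = new RegExp("^b").test("a\nb"); // false

// "s": the dot also matches line terminators.
const dotAll = new RegExp("a.b", "s").test("a\nb"); // true
const dotDefault = new RegExp("a.b").test("a\nb"); // false

// "y": a match must start exactly at lastIndex.
const sticky = new RegExp("foo", "y");
const stickyAtZero = sticky.test("barfoo"); // false, "foo" is not at index 0
sticky.lastIndex = 3;
const stickyAtThree = sticky.test("barfoo"); // true

// "d": the match gains an `indices` array of [start, end] pairs.
// Local, assumed typing so the sketch does not depend on the ES2022 lib.
interface IndexedMatch {
  indices: Array<[number, number]> & {
    groups?: Record<string, [number, number]>;
  };
}
const indexed = new RegExp("(?<val>\\w+)", "d").exec(
  "  abc",
) as unknown as IndexedMatch | null;
const fullSpan = indexed ? indexed.indices[0] : null; // [2, 5]
const valSpan =
  indexed && indexed.indices.groups ? indexed.indices.groups.val : null; // [2, 5]

console.log(multiline, singleLine, dotAll, dotDefault);
console.log(stickyAtZero, stickyAtThree);
console.log(fullSpan, valSpan);
```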
Advanced Example: URL Parser
This example demonstrates how nested captures, alternations, deep optionality, and character classes seamlessly merge into a strictly typed result object. Notice how we compose small regex builders into larger ones.
```typescript
import { rx } from "@fajarnugraha37/ts-rex";

// Matches 'http' or 'https'
const protocol = rx().capture(
  "protocol",
  rx().literal("http").optional(rx().literal("s")),
);

// Combine ranges and specific characters safely
const alphanumeric = rx()
  .range("a", "z")
  .or(rx().range("A", "Z"))
  .or(rx().range("0", "9"));

// Password allows alphanumeric and special characters
const passwordChars = alphanumeric.or(rx().anyOf("!@#$%^&*"));

const auth = rx().capture(
  "auth",
  rx()
    .capture("username", rx().oneOrMore(rx().wordChar()))
    .literal(":")
    .capture("password", rx().oneOrMore(passwordChars))
    .literal("@"),
);

// Domain allows lowercase letters, numbers, dot, and hyphen
const domainChars = rx()
  .range("a", "z")
  .or(rx().range("0", "9"))
  .or(rx().anyOf(".-"));

const urlParser = rx()
  .startOfInput()
  .group(protocol)
  .literal("://")
  .optional(auth) // Automatically makes auth, username, and password types Partial
  .capture("domain", rx().oneOrMore(domainChars))
  .optional(rx().literal(":").capture("port", rx().oneOrMore(rx().digit())))
  .optional(
    rx().literal("/").capture("path", rx().zeroOrMore(rx().notWhitespace())),
  )
  .endOfInput()
  .compile();

const parsed = urlParser.exec(
  "https://admin:secret123@api.example.com:8080/v1/users",
);

if (parsed.isMatch) {
  // Types are fully mapped based on `.optional()` wrappers!
  console.log(parsed.protocol); // "https" (Type: string)
  console.log(parsed.domain); // "api.example.com" (Type: string)

  // Auth details are typed as string | undefined because of `.optional(auth)`
  if (parsed.auth) {
    console.log(parsed.username); // "admin"
    console.log(parsed.password); // "secret123"
  }

  console.log(parsed.port); // "8080" (Type: string | undefined)
  console.log(parsed.path); // "v1/users" (Type: string | undefined)
}
```

Structure
The project relies on interface inheritance and declaration merging across multiple files to maintain a fluent API while keeping files small.
- `/src/core/builder.ts`: Contains the foundational `RegexBuilder` class and compilation logic.
- `/src/core/types.ts`: Centralized types and the core interface extended by modular syntax files.
- `/src/syntax/*.ts`: Modular implementation files attaching prototype methods (e.g., `alternation.ts`, `boundaries.ts`, `quantifiers.ts`).
- `/src/index.ts`: The main entry point.
- `/tests/*.test.ts`: Categorized behavioral and static type tests.
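The declaration-merging pattern described above can be sketched in a single file. `TinyBuilder` and its methods are hypothetical stand-ins, not the project's actual code; in a multi-file layout the interface extension would presumably sit in each syntax module (for example inside a `declare module` block) rather than next to the class.

```typescript
// Hypothetical single-file miniature of the class + interface + prototype pattern.
class TinyBuilder {
  readonly source: string;

  constructor(source: string = "") {
    this.source = source;
  }

  append(fragment: string): TinyBuilder {
    return new TinyBuilder(this.source + fragment);
  }
}

// Declaration merging: an interface with the same name adds members to the class type.
interface TinyBuilder {
  digit(): TinyBuilder;
  wordBoundary(): TinyBuilder;
}

// The implementations are attached to the prototype separately.
TinyBuilder.prototype.digit = function (this: TinyBuilder): TinyBuilder {
  return this.append("\\d");
};
TinyBuilder.prototype.wordBoundary = function (this: TinyBuilder): TinyBuilder {
  return this.append("\\b");
};

const tiny = new TinyBuilder().wordBoundary().digit().digit().wordBoundary();

console.log(tiny.source); // \b\d\d\b
console.log(new RegExp(tiny.source).test("room 42")); // true
```

The trade-off of this layout is that the compiler trusts the interface: if a prototype assignment is forgotten, the method type-checks but fails at runtime, which is one reason behavioral tests matter for such a design.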
Development Workflow
- Use `bun run test` for running the isolated test suite.
- Use `bun run lint` to enforce formatting and style.
- Use `bun run build` to output common module formats into the `dist/` directory using `tsup`.
Testing
Unit Testing
| Module | % Funcs | % Lines | Uncovered Line #s |
| :-------------------------------- | :--------: | :-------: | :-------------------------------------: |
| src/core/builder.ts | 100.00 | 100.00 | |
| src/index.ts | 100.00 | 100.00 | |
| src/syntax/alternation.ts | 100.00 | 100.00 | |
| src/syntax/boundaries.ts | 100.00 | 100.00 | |
| src/syntax/character-classes.ts | 100.00 | 100.00 | |
| src/syntax/flags.ts | 100.00 | 100.00 | |
| src/syntax/groups.ts | 100.00 | 100.00 | |
| src/syntax/lookarounds.ts | 100.00 | 100.00 | |
| src/syntax/quantifiers.ts | 100.00 | 97.33 | 134 (Lazy quantifier verification hook) |
| src/utils/escape.ts | 100.00 | 100.00 | |
| All files | 100.00 | 99.87 | |
Performance Baseline: Library vs Native
We benchmarked the execution path across four complexity levels.
The tables below show the execution time and memory allocation per iteration for our current implementation compared to raw native RegExp.
Results on Bun (v1.3.13)
| Complexity Level | Library Execution | Native Raw RegExp Baseline |
| :------------------------ | :-------------------- | :------------------------- |
| Simple (No captures) | ~45 ns / ~0 bytes | ~46 ns / ~0 bytes |
| Medium (Email parser) | ~650 ns / ~0 bytes | ~224 ns / ~0 bytes |
| Complex (URL parser) | ~511 ns / ~9 bytes | ~295 ns / ~0 bytes |
| Extremely Complex | ~7.58 µs / ~181 bytes | ~0.24 µs / ~1.6 bytes |
Results on Node.js (v24.14.0)
| Complexity Level | Library Execution | Native Raw RegExp Baseline |
| :------------------------ | :------------------- | :------------------------- |
| Simple (No captures) | ~104 ns / ~129 bytes | ~99 ns / ~128 bytes |
| Medium (Email parser) | ~355 ns / ~488 bytes | ~220 ns / ~384 bytes |
| Complex (URL parser) | ~59 ns / ~80 bytes | ~297 ns / ~528 bytes |
| Extremely Complex | ~391 ns / ~504 bytes | ~246 ns / ~400 bytes |
Note: Native execution only returns the raw RegExpExecArray. The library returns a strongly-typed object mapping capture groups to properties. Interestingly, on Node.js, V8 heavily optimizes the monomorphic class allocation for the Complex scenario, occasionally resulting in faster execution times than standard raw array mapping.
License
This project is licensed under the MIT License. (Everything belongs to Allah - Aldi Taher)
