oniguruma-parser

v0.12.1

Parse, validate, traverse, transform, and optimize Oniguruma regular expressions

Downloads

7,586,467

oniguruma-parser 🌿

A TypeScript library for parsing, validating, traversing, transforming, and optimizing Oniguruma regular expressions.

Note: Oniguruma is a regular expression engine written in C that's used in Ruby (via a fork named Onigmo), PHP (mb_ereg, etc.), TextMate grammars (used by VS Code, Shiki, etc.), and many other tools.

This library has been battle-tested by Oniguruma-To-ES and tm-grammars, which are used by Shiki to process tens of thousands of real-world Oniguruma regexes.

🕹️ Install and use

npm install oniguruma-parser

import {toOnigurumaAst} from 'oniguruma-parser';

The following modules are available in addition to the root 'oniguruma-parser' export:

  • Parser module: Includes numerous functions and types for constructing and working with OnigurumaAst nodes. Also includes the parse function, wrapped by toOnigurumaAst.
  • Traverser module: Traverse and transform an OnigurumaAst.
  • Generator module: Convert an OnigurumaAst to pattern and flags strings.
  • Optimizer module: Minify and improve the performance of Oniguruma regexes.

🌿 Convert a pattern to an AST

To parse an Oniguruma pattern (with optional flags and compile-time options) and return an AST, call toOnigurumaAst, which uses the following type definition:

function toOnigurumaAst(
  pattern: string,
  options?: {
    flags?: string;
    rules?: {
      captureGroup?: boolean;
      singleline?: boolean;
    };
  }
): OnigurumaAst;

For example:

import {toOnigurumaAst} from 'oniguruma-parser';

const ast = toOnigurumaAst('A.*');
console.log(ast);
/* →
{ type: 'Regex',
  body: [
    { type: 'Alternative',
      body: [
        { type: 'Character',
          value: 65,
        },
        { type: 'Quantifier',
          kind: 'greedy',
          min: 0,
          max: Infinity,
          body: {
            type: 'CharacterSet',
            kind: 'dot',
          },
        },
      ],
    },
  ],
  flags: {
    type: 'Flags',
    ignoreCase: false,
    dotAll: false,
    extended: false,
    digitIsAscii: false,
    posixIsAscii: false,
    spaceIsAscii: false,
    wordIsAscii: false,
    textSegmentMode: null,
  },
}
*/

An error is thrown if the provided pattern or flags aren't valid in Oniguruma.

Note: toOnigurumaAst is a wrapper around the parser module's parse function that makes it easier to use by automatically providing the appropriate Unicode property validation data.
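
To make the shape of the printed AST above more concrete, here is a minimal sketch that walks it by hand. It doesn't import the library: `SketchNode` and `countNodeTypes` are hypothetical names invented for this illustration, and the type covers only the fields visible in the example output (`type` and `body`). For real work, the traverser module is the supported way to walk an OnigurumaAst.

```typescript
// Illustrative local type, NOT the library's exported type. It assumes only what
// the printed AST shows: each node has a `type`, and `body` (when present) is
// either a single node or an array of nodes.
type SketchNode = {
  type: string;
  body?: SketchNode | SketchNode[];
  [key: string]: unknown;
};

// Hypothetical helper: count how often each node type occurs at or below `node`.
function countNodeTypes(
  node: SketchNode,
  counts: Record<string, number> = {}
): Record<string, number> {
  counts[node.type] = (counts[node.type] || 0) + 1;
  const children =
    node.body === undefined ? [] : Array.isArray(node.body) ? node.body : [node.body];
  for (const child of children) {
    countNodeTypes(child, counts);
  }
  return counts;
}

// The AST printed above for the pattern 'A.*', copied by hand (flags omitted).
const sampleAst: SketchNode = {
  type: 'Regex',
  body: [
    { type: 'Alternative',
      body: [
        { type: 'Character', value: 65 },
        { type: 'Quantifier', kind: 'greedy', min: 0, max: Infinity,
          body: { type: 'CharacterSet', kind: 'dot' },
        },
      ],
    },
  ],
};

const typeCounts = countNodeTypes(sampleAst);
console.log(typeCounts);
```

Note that `Quantifier` holds a single node in `body` while `Regex` and `Alternative` hold arrays, which is why the sketch normalizes both cases before recursing.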

🌀 Traverse and transform an AST

See details and examples in the traverser module's readme.

↩️ Convert an AST to a pattern

See details and examples in the generator module's readme.

🪄 Optimize regexes

This library includes one of the few implementations (for any regex flavor) of a "regex optimizer" that can minify and improve the performance and readability of regexes prior to use.

Example:

(?x) (?:\!{1,}) (\b(?:ark|arm|art)\b) [[^0-9A-Fa-f]\P{^Nd}\p{ Letter }]

Becomes:

!+\b(ar[kmt])\b[\H\d\p{L}]

Optimized regexes always match exactly the same strings as the originals.

See more details and examples in the optimizer module's readme.
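
As a rough spot check of the equivalence claim, the sketch below compares two fragments of the example using JavaScript's built-in RegExp rather than Oniguruma or this library. It assumes these particular fragments behave the same in both flavors, and it drops the spaces that `(?x)` would ignore. `sameFirstMatch` is a hypothetical helper, and agreeing on a handful of inputs is evidence, not proof.

```typescript
// JavaScript stand-ins for the first two parts of the example above.
const originalFragment = /(?:\!{1,})(\b(?:ark|arm|art)\b)/;
const optimizedFragment = /!+\b(ar[kmt])\b/;

// Hypothetical helper: true if both regexes agree on the match position,
// the matched text, and the first capture for the given input.
function sameFirstMatch(a: RegExp, b: RegExp, input: string): boolean {
  const ma = a.exec(input);
  const mb = b.exec(input);
  if (ma === null || mb === null) {
    // Agreement here means neither regex matched.
    return ma === mb;
  }
  return ma.index === mb.index && ma[0] === mb[0] && ma[1] === mb[1];
}

const samples = ['!!art', '!arts', 'x!arm y', 'art', '!are', ''];
const allAgree = samples.every((s) =>
  sameFirstMatch(originalFragment, optimizedFragment, s)
);
console.log(allAgree);
```

Moving `\b` outside the capturing group is safe here because word boundaries are zero-width, so the captured text is unchanged.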

[!TIP] 🧪 Try the optimizer demo.

🆚 Known differences

Known differences will be resolved in future versions.

Unsupported features

The following rarely-used features throw errors since they aren't yet supported:

  • Rarely-used character specifiers: Non-A-Za-z with \cx \C-x, meta \M-x \M-\C-x, octal code points \o{…}, and octal encoded bytes ≥ \200.
  • Code point sequences: \x{H H …} \o{O O …}.
  • Absence expressions (?~|…|…), stoppers (?~|…), and clearers (?~|).
  • Conditionals: (?(…)…), etc.
  • Non-built-in callouts: (?{…}), etc.
  • Numbered forward backreferences (incl. relative \k<+N>) and backreferences with recursion level (\k<N+N>, etc.).
  • Flags D P S W y{g} y{w} within pattern modifiers, and whole-pattern modifiers C I L.

Despite these gaps, more than 99.99% of real-world Oniguruma regexes are supported, based on a sample of ~55k regexes used in TextMate grammars (conditionals were used in three regexes, and other unsupported features weren't used at all). Some of the Oniguruma features above are so exotic that they aren't used in any public code on GitHub.

This library currently treats it as an error if a numbered backreference comes before its referenced group. This is a rare issue because:

  • Most such placements are mistakes and can never match, due to Oniguruma's behavior for backreferences to nonparticipating groups.
  • Erroring matches the correct behavior of named backreferences.
  • For unenclosed backreferences, this only affects \1 to \9 since it's not a backreference in the first place if using \10 or higher and not as many capturing groups are defined to the left (it's an octal or identity escape).
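
The last point can be restated as a small decision function. This is only a reading of the rules described above, not code from this library. `classifyUnenclosedNumber` and `EscapeKind` are hypothetical names, and the sketch doesn't distinguish between octal and identity escapes.

```typescript
type EscapeKind = 'backreference' | 'error' | 'octal-or-identity-escape';

// Hypothetical helper: classify an unenclosed \N escape, given how many
// capturing groups are defined to its left in the pattern.
function classifyUnenclosedNumber(num: number, groupsToLeft: number): EscapeKind {
  if (groupsToLeft >= num) {
    return 'backreference';
  }
  // \1 to \9 are always backreferences, so a missing group to the left is an error.
  // \10 and higher aren't backreferences at all unless enough groups precede them.
  return num <= 9 ? 'error' : 'octal-or-identity-escape';
}

console.log(classifyUnenclosedNumber(2, 1));
console.log(classifyUnenclosedNumber(10, 3));
```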

The following don't yet throw errors, but should:

  • Special characters that are invalid in backreference names even when referencing a valid group with that name.
    • Named backreferences should use a more limited set of allowed characters than named groups and subroutines.
    • Note that an error is already correctly thrown for any backreference name that includes - or + (which is separate from how these symbols are used in relative numbered backreferences).
  • Subroutines used in ways that resemble infinite recursion (#5).
    • Such subroutines error at compile time in Oniguruma.

Behavior differences

Unenclosed four-digit backreferences

Although any number of digits is supported for enclosed \k<…>/\k'…' backreferences (assuming the backreference refers to a valid capturing group), unenclosed backreferences currently support only up to three digits (\999). In other words, \1000 is handled as \100 followed by 0 even if 1,000+ captures appear to the left.

Note: An apparent bug in vscode-oniguruma (v2.0.1 tested) prevents any regex with more than 999 captures from working. Such regexes silently fail to match anything.
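
The three-digit limit for unenclosed backreferences can be pictured with a tiny splitter. This is a sketch of the described behavior only, not the library's tokenizer. `splitUnenclosedBackref` is a hypothetical name, and the sketch assumes enough capturing groups exist for the number to count as a backreference.

```typescript
// Hypothetical helper: read at most three digits after the backslash and
// return the backreference number plus whatever text follows it.
function splitUnenclosedBackref(source: string): { ref: number; rest: string } | null {
  const m = /^\\([1-9]\d{0,2})([\s\S]*)$/.exec(source);
  if (m === null) {
    return null;
  }
  return { ref: Number(m[1]), rest: m[2] };
}

// '\\1000' is the four-digit escape \1000 written as a string literal.
console.log(splitUnenclosedBackref('\\1000'));
```

Under this reading, \1000 yields backreference 100 with a literal 0 left over, matching the description above.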

Erroring on patterns that trigger Oniguruma bugs

This library intentionally doesn't reproduce bugs, and it currently throws errors for several edge cases that trigger Oniguruma bugs and undefined behavior.

Although nested absence functions like (?~(?~…)) don't throw an error in Oniguruma, they produce self-described "strange" results, and Oniguruma's docs state that "nested absent functions are not supported and the behavior is undefined".

In this library, nested absence functions throw an error. In future versions, parsing of nested absence functions will follow Oniguruma and no longer error.

In Oniguruma, \x is an escape for the NUL character (equivalent to \0, \x00, etc.) if it's not followed by { or a hexadecimal digit.

In this library, bare \x throws an error.

Additional behavior details for \x in Oniguruma:

  • \x is an error if followed by a { that's followed by a hexadecimal digit but doesn't form a valid \x{…} code point escape. Ex: \x{F and \x{0,2} are errors.
  • \x matches a literal x if followed by a { that isn't followed by a hexadecimal digit. Ex: \x{ matches x{, \x{G matches x{G, and \x{,2} matches 0–2 x characters, since {,2} is a quantifier with an implicit 0 min.
  • In Oniguruma 6.9.10 and earlier (report), \x matches a literal x if it appears at the very end of a pattern. This is a bug.

In future versions, parsing of \x will follow the Oniguruma rules above (excluding bugs), removing some cases where it currently errors.
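
The Oniguruma rules for \x listed above can be summarized as a classifier over the text that follows \x. This is a simplified sketch and not how this library or Oniguruma is implemented. `classifyAfterBackslashX` and its result labels are hypothetical, and it assumes a valid \x{…} is just a run of hexadecimal digits closed by }, ignoring digit-count limits and code point sequences. The end-of-pattern bug is deliberately excluded.

```typescript
type BareXKind = 'hex-escape' | 'code-point-escape' | 'error' | 'literal-x' | 'nul';

// Hypothetical helper: `rest` is the pattern text immediately after `\x`.
function classifyAfterBackslashX(rest: string): BareXKind {
  if (/^[0-9A-Fa-f]/.test(rest)) {
    return 'hex-escape'; // \xH or \xHH
  }
  if (/^\{[0-9A-Fa-f]/.test(rest)) {
    // Simplifying assumption: only hex digits followed by } count as valid.
    return /^\{[0-9A-Fa-f]+\}/.test(rest) ? 'code-point-escape' : 'error';
  }
  if (rest.charAt(0) === '{') {
    return 'literal-x'; // { not followed by a hex digit
  }
  return 'nul'; // bare \x, including at the end of the pattern
}

// The examples from the list above.
for (const rest of ['{F', '{0,2}', '{', '{G', '{,2}', '']) {
  console.log(JSON.stringify(rest), classifyAfterBackslashX(rest));
}
```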

Normally, any incomplete \uHHHH (including bare \u) throws an error. However, in Oniguruma 6.9.10 and earlier (report), bare \u matches a literal u if it appears at the very end of a pattern. This is a bug.

In this library, incomplete \u is always an error.

Context: Unlike \uHHHH and enclosed \x{H…} (which match code points), Oniguruma's unenclosed \xHH represents an encoded byte, which means that, unlike in other regex flavors, \x80 to \xFF are treated as fragments of a multibyte-encoded code point. Ex: [\0-\xE2\x82\xAC] is equivalent to [\0-\u20AC].
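
The equivalence in that example follows from UTF-8 itself: the bytes E2 82 AC are the UTF-8 encoding of U+20AC. A quick check using the standard TextDecoder and TextEncoder APIs (not this library), assuming a runtime such as Node.js or a browser where they are available as globals:

```typescript
// Decode the three bytes from the example as UTF-8.
const euroBytes = new Uint8Array([0xe2, 0x82, 0xac]);
const decoded = new TextDecoder('utf-8').decode(euroBytes);

// U+20AC is in the Basic Multilingual Plane, so one UTF-16 code unit holds it.
const decodedCodePoint = decoded.charCodeAt(0);

// Going the other way: encode U+20AC and list its bytes.
const reencoded = Array.from(new TextEncoder().encode('\u20AC'));

console.log(decodedCodePoint.toString(16), reencoded.map((b) => b.toString(16)));
```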

Invalid standalone encoded bytes should throw an error, but several related bugs are present in Oniguruma 6.9.10 and earlier (report).

In this library, they always throw an error.

Behavior details in Oniguruma:

  • Standalone \x80 to \xF4 throw an error.
  • Standalone \xF5 to \xFF fail to match anything, but don't throw. This is a bug.
  • When used as the end value of a character class range:
    • Standalone \x80 to \xBF and \xF5 to \xFF are treated as \x7F. This is a bug.
    • If the range is within a negated, non-nested character class (ex: [^\0-\xFF]), \xF5 to \xFF are treated as \x{10FFFF}. This is a bug.

🔢 Oniguruma version

All versions of this library to date have followed the rules of Oniguruma 6.9.10 (released 2025-01-01), which uses Unicode 16.0.0.

At least since Oniguruma 6.0.0 (released 2016-05-09), regex syntax changes in new versions have been backward compatible. Some versions added new syntax that was previously an error (such as new Unicode property names), and in a few cases, edge case parsing bugs were fixed.

Oniguruma 6.9.8 (released 2022-04-29) is an important baseline for JavaScript projects, since that's the version used by vscode-oniguruma 1.7.0 to the latest 2.0.1. It's therefore used in recent versions of various projects, including VS Code and Shiki. However, the regex syntax differences between Oniguruma 6.9.8 and 6.9.10 are so minor that this is a non-issue.

🧩 Contributing

Contributions are welcome. See the guide to help you get started.

🏷️ About

Created by Steven Levithan and contributors.

If you want to support this project, I'd love your help by contributing improvements (guide), sharing it with others, or sponsoring ongoing development.

MIT License.