@hgargg-0710/regex
v0.1.0
Published
A JavaScript-flavoured regular expression parser, generator and AST-construction API.
regex
regex is a JavaScript library for parsing, generating and constructing ASTs of
regular expressions, as defined by the JavaScript flavour.
NOTE: the library depends on the parsers.js package for building its parsers.
Installation
npm install @hgargg-0710/regex
Documentation
The package has the following exports:
- parse (function)
- generate (function)
- parser (submodule)
- generator (submodule)
- tree (submodule)
- tokens (submodule)
parse
function parse(regex: string): Flags
A function that takes a string containing a regular expression and returns its AST.
generate
function generate(AST: Flags): string
Takes the given AST node (not necessarily Flags; the full type is too long to express here)
and returns a string representing it.
NOTE: partial nodes will give only partial results. For example, passing a PatternEnd will give "$".
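A minimal usage sketch of the two top-level functions follows. It assumes only what the signatures above state: parse takes a string and returns an AST, and generate turns an AST node back into a string. The helper name roundTrip and its lib parameter are illustrative and not part of the package. The library object is passed in so that the helper does not depend on how the package is loaded, and the exact input form expected by parse (with or without delimiters and flags) is left open here.

```javascript
// Hypothetical helper (not part of the package): parses a regular expression
// string and generates a string back from the resulting AST.
// `lib` is assumed to expose `parse` and `generate` as documented above.
function roundTrip(lib, source) {
  const ast = lib.parse(source)
  const regenerated = lib.generate(ast)
  return { ast, regenerated }
}

// Assumed usage, with the package installed:
//   const regex = require("@hgargg-0710/regex")
//   const { ast, regenerated } = roundTrip(regex, someRegexString)
```

Whether regenerated is character-for-character identical to the input is not stated in this README, so the sketch returns both values rather than comparing them.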
parser
APIs of the various parsing layers
| export | description |
| ------------------ | --------------------------------------------------------- |
| ExpressionParser | Function. Parses an Expression, initially tokenizing it |
| boundry | Submodule. Handles parsing of boundaries |
| chars | Submodule. Handles tokenization |
| classes | Submodule. Handles parsing of character classes |
| deflag | Submodule. Handles removal of flags |
| disjunction | Submodule. Handles parsing of disjunction expressions |
| escaped | Submodule. Handles parsing of escape-sequences |
| group | Submodule. Handles recursion within a regular expression |
| nogreedy | Submodule. Handles the "no-greedy" quantifiers |
| quantifier | Submodule. Handles the quantifiers |
The submodule exports are part of the parse function's final definition.
The layers are applied within the parse function in the following order:
1. deflag
2. chars
3. classes
4. escaped
5. boundry
6. group (recursive, looped)
7. quantifier
8. nogreedy
9. disjunction
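The layer order above can be pictured as a chain in which each layer consumes the previous layer's output. The sketch below is only an illustration of that ordering: runLayers and tracingLayers are hypothetical names, the stand-in layers do no parsing, and the recursive, looped behaviour of the group layer is not modelled. The package itself builds its layers with parsers.js.

```javascript
// Order of the parsing layers, as listed above.
const layerOrder = [
  "deflag", "chars", "classes", "escaped", "boundry",
  "group", "quantifier", "nogreedy", "disjunction"
]

// Hypothetical illustration: calls `layers[name]` for each name in order,
// feeding each layer's output to the next one.
function runLayers(layers, input) {
  return layerOrder.reduce((current, name) => layers[name](current), input)
}

// Stand-in layers that only record their own name, to make the order visible.
const tracingLayers = Object.fromEntries(
  layerOrder.map((name) => [name, (trace) => [...trace, name]])
)

console.log(runLayers(tracingLayers, []).join(" -> "))
// deflag -> chars -> classes -> escaped -> boundry -> group -> quantifier -> nogreedy -> disjunction
```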
deflag
| export | description |
| --------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| DeFlag | Function for de-flagging a string containing a regular expression. Returns a Flags object, with the .expression field containing the expression's string |
| flagTable | Table for identification of flags with appropriate TokenInstances |
| flagInstance | Function based off flagTable. Returns the TokenType of a given flag string |
| identifyFlags | Maps flagInstance to an array of strings |
chars
| export | description |
| --------------------- | --------------------------------------------------------------------------------------- |
| ExpressionTokenizer | A PatternTokenizer for tokenizing the given Pattern with a regular expression in it |
| tokenizerMap | The RegExpMap, on which ExpressionTokenizer is based |
classes
| export | description |
| ----------------------- | ----------------------------------------------------------------------------------- |
| CharacterClassParser | Main parser for character classes |
| classLimit | Limits the given stream up to the next RectOp from the current element |
| classMap | TypeMap, on which CharacterClassParser is based |
| HandleClass | The handler for the RectOp token inside the classMap |
| ClassHandler | A multistep function, serving as the main component of HandleClass |
| EscapeInner | A parser function, first component of the ClassHandler. Escapes inside characters |
| HandleEscaped | Handler for the escaped characters, main part of the EscapeInner |
| IdentifyRanges | Second parsing function of ClassHandler. Identifies and parses ranges |
| HandleRange | The main component of IdentifyRanges, parses encountered ranges |
| InClassEscapedHandler | A slightly modified version of the escapedMap from escaped module for escaping |
escaped
| export | description |
| -------------------------- | ---------------------------------------------------------------------- |
| EscapedParser | Main parser of the escaped characters |
| escapePreface | The TypeMap, on which EscapedParser is based |
| escapeMap | The ValueMap that defines the global-scope escaping |
| escapedHandler | Creates a function for handling escaped characters based off given map |
| parseBackreference | Returns a Backreference based on given arguments of curr, input |
| parseMultControl | Returns a ControlCharacter of lengths 4-5 based on curr, input |
| parseDoubleControl | Returns a ControlCharacter of length 2 based on curr, input |
| parseSingleControl | Returns a ControlCharacter of length 1 based on curr, input |
| readUnicodeClassProperty | Parses a UnicodeClassProperty based on curr, input |
| readBraced | Reads the given Stream, until a ClBrace is encountered |
| readNamedBackreference | Reads a NamedBackreference based on readIdentifier |
| readUBrace | Reads a sequence of {hhhh} or {hhhhh} where isHex(h) === true |
| readu | Reads a sequence of hhhh, where isHex(h) === true |
| readx | Reads a sequence of hh, where isHex(h) === true |
| isHex | Returns whether the given character is a hexadecimal digit |
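To make the hh and hhhh reading rules above concrete, here is a small string-based sketch. It is not the package's implementation: the real readx, readu and isHex operate on the parser's Streams and tokens, whereas the hypothetical isHexChar and readHexDigits below work on plain strings and indices.

```javascript
// Hypothetical stand-ins, not the package's implementation.
const isHexChar = (char) => /^[0-9a-fA-F]$/.test(char)

// Reads exactly `count` hex digits from `text`, starting at index `start`.
// Returns the digits read, or null if the text ends early or a non-hex
// character is encountered.
function readHexDigits(text, start, count) {
  const digits = text.slice(start, start + count)
  if (digits.length !== count || ![...digits].every(isHexChar)) return null
  return digits
}

// The strings below contain a literal backslash followed by x or u.
console.log(readHexDigits("\\x41", 2, 2))   // 41
console.log(readHexDigits("\\u00e9", 2, 4)) // 00e9
console.log(readHexDigits("\\xZ1", 2, 2))   // null
```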
boundry
| export | description |
| --------------- | ----------------------------------------------------------------------- |
| BoundryParser | Main parser of the submodule. Separates boundaries into TokenInstances |
| boundryMap | The TypeMap, on which the BoundryParser is based |
| HandleEscaped | Handles the NonWordBoundry TokenInstances |
group
| export | description |
| --------------------------- | --------------------------------------------------------------------------------------------------------------- |
| EndParser | The main parser of the submodule. The ExpressionParser ends with it |
| GroupParser | The first parsing layer of the EndParser. Recursive. Handles recursion, groups/captures, look-aheads/-behinds |
| groupMap | The TypeMap, on which the GroupParser is based |
| GroupHandler | The main component of the groupMap |
| nestedBrack | Function for limiting the current-level nested bracket-expression |
| CollectionHandler | Function for handling current collection |
| HandleQMark | Function for handling "collections" starting with ? ((?<!...), (?<...>...), ...) |
| HandleCollectionBase | Function for recursively handling a capture group |
| QMarkHandler | Underlying TableParser of HandleQMark |
| HandleQMarkExclMark | Handles a negative look-ahead |
| HandleQMarkEq | Handles a look-ahead |
| HandleLeftAngular | Handles all "collections" starting with < ((?<...>...), (?<=...), ...) |
| HandleColon | Handles a no-capture group |
| LeftAngularHandler | Underlying TableParser for HandleLeftAngular |
| HandleLeftAngularBase | Handles a named capture |
| HandleLeftAngularExclMark | Handles a negative look-behind |
| HandleLeftAngularEq | Handles a look-behind |
| readIdentifier | Reads an identifier (for the named capture/backreference) |
quantifier
| export | description |
| ------------------- | ---------------------------------------------------------------------------- |
| QuantifierParser | Main parser of the submodule. Parses quantifiers |
| QuantifierHandler | A TableParser, main component of the QuantifierParser |
| HandlePlus | Handles a Plus token encountered |
| HandleStar | Handles a Star token encountered |
| HandleQMark | Handles a QMark token encountered |
| BraceHandler | Handles an OpBrace token encountered |
| HandleBraced | Returns a handling function for either one of NtoM, NPlus, or NOnly |
| readNumber | Reads a number from the given Stream (note: up to the first isNaN token) |
| limitBraced | Limits the given Stream up to the point of the first encountered ClBrace |
nogreedy
| export | description |
| ------------------- | -------------------------------------------------------------- |
| ParseNoGreedy | Main parser of the submodule. Parses NoGreedy tokens |
| noGreedyMap | The TypeMap, on which ParseNoGreedy is based |
| HandleQuantifier | Handler for quantifiers |
| QuantifierHandler | The underlying TableParser-function of HandleQuantifier |
| HandleQMark | Handles QMark following a quantifier (no-greedy quantifiers) |
disjunction
| export | description |
| ---------------------- | ----------------------------------------------------------------------------------------------------------------- |
| DisjunctionParser | The main export of the submodule. Parses disjunctions |
| EmptyFixer | First parsing layer of DisjunctionParser. Fixes empty expressions \|\| |
| DisjunctionTokenizer | Second parsing layer of DisjunctionParser. Puts non-Pipe bits of current Stream into DisjunctionArguments |
| DisjunctionDelimiter | Third and final parsing layer of DisjunctionParser. Delimits the Stream based off Pipe tokens |
| hasDisjunctions | Checks whether a given Stream has disjunctions to parse from given point on |
| limitPipe | Limits the given Stream until the moment the next Pipe is encountered |
| skipTilPipes | Skips Stream until a Pipe is discovered |
generator
Provides regex-generation related exports based off the package's AST
| export | description |
| ------------------------------ | -------------------------------------------------------------------------------------------------- |
| RegexGenerator | The SourceGenerator for the package's AST (generate is based on it) |
| generatorMap | The TypeMap, on which RegexGenerator is based |
| GenerateBackspaceClass | Generates a regex for BackspaceClass |
| GenerateWordBoundry | Generates a regex for WordBoundry |
| GenerateNonWordBoundry | Generates a regex for NonWordBoundry |
| GenerateNewline | Generates a regex for Newline |
| GenerateCarriageReturn | Generates a regex for CarriageReturn |
| GenerateWordClass | Generates a regex for WordClass |
| GenerateNonWordClass | Generates a regex for NonWordClass |
| GenerateFormFeed | Generates a regex for FormFeed |
| GenerateDigitClass | Generates a regex for DigitClass |
| GenerateNonDigitClass | Generates a regex for NonDigitClass |
| GenerateNULClass | Generates a regex for NULClass |
| GenerateVerticalTab | Generates a regex for VerticalTab |
| GenerateHorizontalTab | Generates a regex for HorizontalTab |
| GenerateNonWhitespaceClass | Generates a regex for NonWhitespaceClass |
| GenerateWhitespaceClass | Generates a regex for WhitespaceClass |
| GenerateEmptyExpression | Generates a regex for EmptyExpression |
| GenerateMatchIndicies | Generates a regex for MatchIndicies flag |
| GenerateGlobalSearch | Generates a regex for GlobalSearch flag |
| GenerateCaseInsensitive | Generates a regex for CaseInsensitive flag |
| GenerateMultline | Generates a regex for Multiline flag |
| GenerateDotAll | Generates a regex for DotAll flag |
| GenerateUnicode | Generates a regex for Unicode flag |
| GenerateUnicodeSets | Generates a regex for UnicodeSets flag |
| GenerateSticky | Generates a regex for Sticky flag |
| GeneratePatterStart | Generates a regex for PatternStart |
| GeneratePatternEnd | Generates a regex for PatternEnd |
| GenerateFlags | Generates a regex for Flags |
| GenerateExpression | Generates a regex for Expression |
| GenerateNOnly | Generates a regex for NOnly |
| GenerateNtoM | Generates a regex for NtoM |
| GenerateNPlus | Generates a regex for NPlus |
| GenerateEscaped | Generates a regex for Escaped |
| GenerateBackreference | Generates a regex for Backreference |
| GenerateUnicodeClassProperty | Generates a regex for UnicodeClassProperty |
| GenerateControlCharacter | Generates a regex for ControlCharacter |
| GenerateNamedBackreference | Generates a regex for NamedBackreference |
| GenerateClassRange | Generates a regex for ClassRange |
| GenerateNoGreedy | Generates a regex for NoGreedy |
| GenerateOptional | Generates a regex for Optional |
| GenerateZeroPlus | Generates a regex for ZeroPlus |
| GenerateOnePlus | Generates a regex for OnePlus |
| GenerateClass | Generates a regex for CharacterClass |
| GenerateNegClass | Generates a regex for NegCharacterClass |
| GenerateDisjunction | Generates a regex for Disjunction |
| GenerateDisjunctionArgument | Generates a regex for DisjunctionArgument |
| GenerateNonCaptureGroup | Generates a regex for NonCaptureGroup |
| GenerateCaptureGroup | Generates a regex for CaptureGroup |
| GenerateLookAhead | Generates a regex for LookAhead |
| GenerateLookBehind | Generates a regex for LookBehind |
| GenerateNegLookAhead | Generates a regex for NegLookAhead |
| GenerateNegLookBehind | Generates a regex for NegLookBehind |
| GenerateNamedCapture | Generates a regex for NamedCapture |
| GenerateWildcard | Generates a regex for Wildcard |
| GeneratePipe | Generates a regex for Pipe |
| GenerateComma | Generates a regex for Comma |
| GenerateTrivial | Generates a regex for anything else not in the table already (with a typeof .value === 'string') |
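The table above describes generation as a lookup from a node's type to a generating function, with GenerateTrivial as the fallback for nodes carrying a string .value. The sketch below imitates that idea on a tiny scale. Everything in it is an assumption made for illustration: the node shape { type, value }, the idea that a quantifier node keeps its operand in .value, and the names sketchGenerators and sketchGenerate. Only the type strings are taken from the tokens tables in this README.

```javascript
// Hypothetical sketch of type-based generation; not the package's generatorMap.
const sketchGenerators = {
  start: () => "^",
  end: () => "$",
  wildcard: () => ".",
  // Assumption: a quantifier node stores the quantified node in `.value`.
  "zero-plus": (node) => sketchGenerate(node.value) + "*",
  "one-plus": (node) => sketchGenerate(node.value) + "+",
  optional: (node) => sketchGenerate(node.value) + "?"
}

function sketchGenerate(node) {
  const handler = sketchGenerators[node.type]
  if (handler) return handler(node)
  // Fallback in the spirit of GenerateTrivial: any node with a string value.
  if (typeof node.value === "string") return node.value
  throw new TypeError(`No generator for node type "${node.type}"`)
}

// A partial node gives a partial result:
console.log(sketchGenerate({ type: "end" })) // $
// A quantified symbol:
console.log(sketchGenerate({ type: "one-plus", value: { type: "symbol", value: "a" } })) // a+
```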
tree
| export | description |
| ------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| RegexStream | A TreeStream for the library's AST (note: accepts THE AST ITSELF) |
| RegexTree | A Tree interface implementation for the library's AST |
| treeMap | The TypeMap, on which RegexTree is based |
| NamedCaptureTree | The function for conversion of a NamedCapture to a Tree |
| ExpressionTree | The function for conversion of an Expression to a Tree |
| FlagTree | The function for conversion of a Flags to a Tree |
| SeveralTree | The function for conversion of NOnly, NtoM and NPlus to a Tree |
| SingleTree | The function for conversion of ZeroPlus, OnePlus, Optional, LookAhead, LookBehind, NegLookAhead, NegLookBehind, NamedBackreference to a Tree |
| ValueTree | The function for conversion of ClassRange, DisjunctionArgument, CharacterClass, NegCharacterClass and Disjunction to a Tree |
| ChildlessTree | The function for conversion of the rest of the tokens to a Tree |
tokens
The tokens module has the same submodule structure as the parser module.
| submodule | description |
| ------------- | ------------------------------------------------ |
| boundry | Various boundary tokens |
| chars | Various basic (first-order) tokens |
| classes | Tokens for representation of character classes |
| deflag | Flags and expressions representation tokens |
| disjunction | Disjunction-related tokens |
| escaped | Escape-sequence-related tokens |
| group | Tokens for groups and other recursive structures |
| nogreedy | Tokens for non-greedy quantifiers |
| quantifier | Tokens for quantifiers |
deflag
| TokenType/TokenInstance | represents | type |
| --------------------------- | ------------------------------------------------------------------------- | -------------------- |
| MatchIndicies | The d flag | "indicies" |
| GlobalSearch | The g flag | "global" |
| CaseInsensitive | The i flag | "case-insensitive" |
| Multiline | The m flag | "multiline" |
| DotAll | The s flag | "dot-all" |
| Unicode | The u flag | "unicode" |
| UnicodeSets | The v flag | "unicode-sets" |
| Sticky | The y flag | "sticky" |
| Flags | The complete regular expression with flags | "flags" |
| Expression | A partial expression, without flags (can have other Expressions inside) | "expression" |
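The correspondence between flag characters and type strings in the table above can be written as a plain lookup. This is a sketch for orientation only: flagTypes and flagTypesOf are hypothetical names, and the package's own flagTable, flagInstance and identifyFlags work with TokenInstances rather than bare strings.

```javascript
// Flag characters mapped to the `type` strings from the table above.
const flagTypes = {
  d: "indicies",
  g: "global",
  i: "case-insensitive",
  m: "multiline",
  s: "dot-all",
  u: "unicode",
  v: "unicode-sets",
  y: "sticky"
}

// Hypothetical helper: maps a flags string such as "gi" to the matching types.
function flagTypesOf(flags) {
  return [...flags].map((flag) => {
    if (!Object.prototype.hasOwnProperty.call(flagTypes, flag))
      throw new RangeError(`Unknown flag "${flag}"`)
    return flagTypes[flag]
  })
}

console.log(flagTypesOf("gi").join(", ")) // global, case-insensitive
```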
chars
| TokenType | represents | type |
| -------------- | --------------- | ------------ |
| Escape | \\ | "escape" |
| RectOp | [ | "rop" |
| RectCl | ] | "rcl" |
| Hyphen | - | "hyphen" |
| Pipe | \| | "pipe" |
| OpBrack | ( | "opbrack" |
| ClBrack | ) | "clbrack" |
| QMark | ? | "qmark" |
| ExclMark | ! | "emark" |
| Eq | = | "eq" |
| Wildcard | . | "wildcard" |
| Star | * | "star" |
| Plus | + | "plus" |
| OpBrace | { | "opbrc" |
| ClBrace | } | "clbrc" |
| Colon | : | "colon" |
| Comma | , | "comma" |
| LeftAngular | < | "lang" |
| RightAngular | > | "rang" |
| Dollar | $ | "dollar" |
| Xor | ^ | "xor" |
| RegexSymbol | everything else | "symbol" |
classes
| TokenType | represents | type |
| ------------------- | ----------------------------------- | ----------------- |
| CharacterClass | A character class [...] | "charclass" |
| NegCharacterClass | A negative character class [^...] | "neg-charclass" |
| ClassRange | A character class range X-Y | "class-range" |
escaped
| TokenType/TokenInstance | represents | type |
| --------------------------- | ---------------------------------------------------- | ------------------------ |
| ControlCharacter | \cX, \xhh, \uhhhh, \u{hhhh} or \u{hhhhh} | "control-char" |
| Backreference | \N - numeric backreference | "backref" |
| NamedBackreference | \k<name> - named backreference | "named-backref" |
| UnicodeClassProperty | \p{...} - unicode class property | "uniprop" |
| RegexIdentifier | name - identifier in named captures/backreferences | "identifier" |
| CarriageReturn | \r - carriage return | "cr" |
| NonWordBoundry | \B - non-word boundary (outside classes) | "non-word-boundry" |
| WordBoundry | \b - word boundary | "word-boundry" |
| NULClass | \0 - NUL class | "nul-class" |
| FormFeed | \f - form feed | "form-feed" |
| DigitClass | \d - digit class | "digit-class" |
| NonDigitClass | \D - non-digit class | "non-digit-class" |
| WordClass | \w - word-class | "word-class" |
| NonWordClass | \W - non-word class | "non-word-class" |
| WhitespaceClass | \s - whitespace class | "whitespace-class" |
| NonWhitespaceClass | \S - non-whitespace class | "non-whitespace-class" |
| HorizontalTab | \t - horizontal tab | "tab" |
| VerticalTab | \v - vertical tab | "vtab" |
| BackspaceClass | \b - backspace | "backspace" |
| Newline | \n - newline | "newline" |
| Escaped | Any other escaped character | "escaped" |
boundry
| TokenInstance | represents | type |
| --------------- | ---------- | --------- |
| PatternStart | ^ | "start" |
| PatternEnd | $ | "end" |
group
| TokenType | represents | type |
| ---------------- | ------------- | ------------------ |
| CaptureGroup | (...) | "capture" |
| NoCaptureGroup | (?:...) | "non-capture" |
| NamedCapture | (?<name>...) | "named-capture" |
| LookAhead | (?=...) | "lookahead" |
| LookBehind | (?<=...) | "lookbehind" |
| NegLookAhead | (?!...) | "neg-lookahead" |
| NegLookBehind | (?<!...) | "neg-lookbehind" |
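For readers less familiar with the constructs these tokens represent, the snippet below shows each of them in action using JavaScript's built-in RegExp. It does not use this package; it only illustrates the syntax that the group tokens stand for.

```javascript
// Capture group: (...)
const capture = /(\d+)px/.exec("12px")[1] // "12"

// Non-capturing group: (?:...)
const nonCapture = /(?:ab)+/.exec("ababc")[0] // "abab"

// Named capture: (?<name>...)
const named = /(?<year>\d{4})-(?<month>\d{2})/.exec("2024-05").groups.year // "2024"

// Look-ahead: (?=...) matches "foo" only when "bar" follows, without consuming it
const lookAhead = /foo(?=bar)/.exec("foobar")[0] // "foo"

// Negative look-ahead: (?!...)
const negLookAhead = /foo(?!bar)/.test("foobar") // false

// Look-behind: (?<=...) matches digits only when preceded by "$"
const lookBehind = /(?<=\$)\d+/.exec("cost: $42")[0] // "42"

// Negative look-behind: (?<!...) skips the number preceded by "$"
const negLookBehind = /(?<!\$)\b\d+/.exec("$42 or 17")[0] // "17"
```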
quantifier
| TokenType | represents | type |
| ----------- | -------------- | ------------- |
| ZeroPlus | ...* | "zero-plus" |
| OnePlus | ...+ | "one-plus" |
| Optional | ...? | "optional" |
| NOnly | ...{...} | "n-only" |
| NPlus | ...{...,} | "n-plus" |
| NtoM | ...{...,...} | "n-to-m" |
nogreedy
| export | description | type |
| -------------- | ------------------------------------------------------------------------------------ | ------------ |
| NoGreedy | A TokenType representing no-greedy operators | "nogreedy" |
| isQuantifier | A predicate returning true only for tokens with types from the quantifier module | |
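The difference between a plain quantifier and its no-greedy form (a quantifier followed by ?) is easiest to see on a concrete match. The snippet below uses JavaScript's built-in RegExp rather than this package, purely to illustrate the semantics that the quantifier and NoGreedy tokens represent.

```javascript
const text = "<a><b>"

// Greedy: .+ takes as much as possible, so the match runs to the last ">".
const greedy = text.match(/<.+>/)[0] // "<a><b>"

// No-greedy: .+? takes as little as possible, so the match stops at the first ">".
const lazy = text.match(/<.+?>/)[0] // "<a>"

// Braced quantifiers: {n}, {n,} and {n,m}
const exactlyTwo = /^a{2}$/.test("aa") // true
const twoOrMore = /^a{2,}$/.test("aaaa") // true
const twoToThree = /^a{2,3}$/.test("aaaa") // false

// A braced quantifier can be made no-greedy as well.
const lazyBraced = "aaaa".match(/a{2,3}?/)[0] // "aa"
```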
disjunction
| TokenType/TokenInstance | represents | type |
| --------------------------- | -------------------------------------------- | ------------------- |
| Disjunction | ...\|...\|... | "disjunction" |
| DisjunctionArgument | An element of a Disjunction | "disjunction-arg" |
| EmptyExpression | An empty element of a Disjunction (\|\|) | "empty" |
