re-build

v1.0.0

RE-Build

Build regular expressions with natural language.

Introduction

Have you ever dealt with complex regular expressions like the following one?

var ipMatch = /(?:(?:1\d\d|2[0-4]\d|25[0-5]|[1-9]\d|\d)\.){3}(?:1\d\d|2[0-4]\d|25[0-5]|[1-9]\d|\d)\b/;

Using a meaningful variable name can help, and writing comments helps even more, but what is always hard to understand is what the regular expression actually does. Such expressions are left as a sort of magic trick that is never updated, because their syntax is so obscure that even the authors themselves hardly feel like facing them again. Debugging a regular expression often means rewriting it from scratch.

RE-Build's aim is to change that, converting the process of creating a regular expression to combining nice natural language expressions. The above regex would be composed as

var ipNumber = RE.group(
        RE  ("1").then.digit.then.digit
        .or ("2").then.oneOf.range("0", "4").then.digit
        .or ("25").then.oneOf.range("0", "5")
        .or .oneOf.range("1", "9").then.digit
        .or .digit
    ),

    ipMatch = RE.matching.exactly(3).group( ipNumber.then(".") )
                .then(ipNumber).then.wordBoundary.regex;

This approach is definitely more verbose, but also much clearer and less error prone.
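Whichever form is used to write it, the resulting pattern can be exercised against known inputs. The sketch below uses only the native regular expression from the introduction and does not load RE-Build; the helper name `findIp` and the sample strings are made up for illustration.

```javascript
// The regular expression from the introduction, written as a native literal.
const ipPattern = /(?:(?:1\d\d|2[0-4]\d|25[0-5]|[1-9]\d|\d)\.){3}(?:1\d\d|2[0-4]\d|25[0-5]|[1-9]\d|\d)\b/;

// Hypothetical helper: returns the first IPv4-looking substring, or null.
function findIp(text) {
    const match = ipPattern.exec(text);
    return match ? match[0] : null;
}

console.log(findIp("gateway at 192.168.1.254, port 80")); // 192.168.1.254
console.log(findIp("no address here"));                   // null
console.log(findIp("999.1.1.1"));                         // 99.1.1.1
```

The last call shows the kind of detail that is easy to miss in the compact form: the pattern has no leading boundary, so it can start matching in the middle of a longer run of digits.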

Another module for the same purpose is VerbalExpressions, but it doesn't allow building just any regular expression. RE-Build aims to fill that gap too.

Remember, as a general rule, that RE-Build does not care whether your environment supports certain RegExp features (for example, the sticky flag or extended Unicode escape sequences): the corresponding source code is generated anyway. Of course, you'll get an error when you try to get a RegExp object out of it in an environment that lacks the feature.
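The difference between producing pattern source and obtaining a RegExp object can be seen without RE-Build at all. The following is a plain JavaScript sketch; `tryCompile` is a hypothetical helper, not part of the package. It relies only on the fact that the native `RegExp` constructor throws a `SyntaxError` when it is given a flag it does not accept.

```javascript
// Hypothetical helper: compile a source/flags pair, or return null when the
// current environment rejects it.
function tryCompile(source, flags) {
    try {
        return new RegExp(source, flags);
    } catch (error) {
        if (error instanceof SyntaxError) return null;
        throw error; // anything else is unexpected, so let it surface
    }
}

// A RegExp where the sticky flag is supported, null where it is not.
const stickyDigits = tryCompile("\\d+", "y");

// "q" is not a valid flag, so the constructor throws and we get null.
const bogusDigits = tryCompile("\\d+", "q");

console.log(stickyDigits === null ? "sticky flag unsupported" : stickyDigits.flags);
console.log(bogusDigits); // null
```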

Installation

Via npm:

npm install re-build

Via bower:

bower install re-build

The package can be loaded as a CommonJS module (node.js, io.js), as an AMD module (RequireJS, ...) or as a standalone script:

<script src="re-build.min.js"></script>

Usage

For detailed documentation, check the reference sheet. Keep in mind that RE-Build is a tool to help with building, understanding and debugging regular expressions, and does not prevent you from creating incorrect ones.

Basics

The core point is the RE object (or whatever variable name you assigned to it), together with the matching method:

var RE = require("re-build");
var builder = RE.matching("xyz");

The output is not, however, a regular expression, but a regular expression builder that can be extended, or used as an extension for other builders. To get the corresponding regular expression, use the regex property or the toRegExp()/valueOf() methods.

var start = RE.matching.theStart.then(builder).toRegExp(); // /^xyz/

var foo = RE.matching(builder).then.oneOrMore.digit.regex; // /xyz\d+/

As you can see, you can append further matching blocks using the then word, which is also a function: any arguments passed to it are added as blocks too. The arguments can be strings (which are backslash-escaped), regular expressions or RE-Build'ers, whose source property is added to the builder unescaped.
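The distinction between escaped strings and unescaped sources can be sketched in plain JavaScript. Both helpers below (`escapeLiteral` and `toSource`) are hypothetical and only mimic the rule described above; the exact set of characters RE-Build escapes is an assumption here.

```javascript
// Hypothetical helper: backslash-escape characters that are special in a pattern.
function escapeLiteral(text) {
    return text.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");
}

// Hypothetical helper: strings are escaped, RegExp objects contribute their
// source as-is.
function toSource(block) {
    return block instanceof RegExp ? block.source : escapeLiteral(String(block));
}

const priceSource = toSource("$") + toSource(/\d+/) + toSource(".") + toSource(/\d{2}/);
const price = new RegExp(priceSource);

console.log(price.source);         // \$\d+\.\d{2}
console.log(price.test("$19.99")); // true
console.log(price.test("$19x99")); // false
```

Had the "." been added unescaped, it would have matched any character and the last test would have passed as well.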

The or word has a similar meaning, but adds an alternative block to the source:

var hex = RE.matching.digit
            .or.oneOf.range("A", "F")
            .regex;  // /\d|[A-F]/

Regex builders are immutable

Regular expression builders are immutable objects, meaning that when extending a builder we get a new builder instance:

var bld1 = RE.matching.digit;
var bld2 = bld1.or.oneOf.range("A", "F");
bld1 === bld2; // => false
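As an illustration of what immutability means here, the following is a deliberately tiny builder written in plain JavaScript. `tinyBuilder` and its methods are hypothetical and are not how RE-Build is implemented; the sketch only shows the pattern of returning a new frozen object from every extending call.

```javascript
// Hypothetical miniature builder, only to illustrate immutability.
function tinyBuilder(source) {
    return Object.freeze({
        source: source,
        // Each method returns a NEW frozen builder; the receiver is untouched.
        then: function (more) { return tinyBuilder(source + more); },
        or: function (alternative) { return tinyBuilder(source + "|" + alternative); },
        toRegExp: function () { return new RegExp(source); }
    });
}

const digits = tinyBuilder("\\d");
const hexDigit = digits.or("[A-F]");

console.log(digits === hexDigit); // false
console.log(digits.source);       // \d
console.log(hexDigit.source);     // \d|[A-F]
```

Because the original builder never changes, it can safely be reused as a building block in several larger expressions.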

Special classes, aliases and escaping

RE-Build uses specific names to address common regex character classes:

Name           | Result | Notes
---------------|--------|------
digit          | \d     | from 0 to 9
alphaNumeric   | \w     | digits, uppercase and lowercase letters and the underscore
whiteSpace     | \s     | white space characters
wordBoundary   | \b     |
anyChar        | .      | universal matcher
theStart       | ^      |
theEnd         | $      |
cReturn        | \r     | carriage return
newLine        | \n     |
tab            | \t     |
vTab           | \v     | vertical tab
formFeed       | \f     |
null           | \0     |
slash          | \/     |
backslash      | \\     |
backspace      | \b     | can be used in character sets [...] only

The first four names can be negated by prefixing them with not to get the complementary meaning:

  • not.digit for \D;
  • not.alphaNumeric for \W;
  • not.whiteSpace for \S;
  • not.wordBoundary for \B.

Single characters can be defined by escape sequences:

Function       | Result               | Meaning
---------------|----------------------|--------
ascii(n)       | \xhh                 | ASCII character corresponding to n
codePoint(n)   | \uhhhh / \u{hhhhhh}  | Unicode character corresponding to n
control(a)     | \ca                  | Control sequence corresponding to the letter a

With the exception of wordBoundary, theStart and theEnd, all of the previous words can be used inside character sets (see the Character sets section below).
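For reference, this is how the native escapes listed above behave in plain JavaScript. The sketch does not use RE-Build; the variable names and sample characters are chosen only for illustration.

```javascript
// Native counterparts of the escape forms listed above.
const byHex = /\x41/;              // \xhh: character with code 0x41, "A"
const byUnicode = /\u00e9/;        // \uhhhh: U+00E9, e with acute accent
const byCodePoint = /\u{1F600}/u;  // \u{hhhhhh}: requires the u flag
const byControl = /\cJ/;           // \ca: Control-J, i.e. line feed

console.log(byHex.test("A"));               // true
console.log(byUnicode.test("caf\u00e9"));   // true
console.log(byCodePoint.test("\u{1F600}")); // true
console.log(byControl.test("\n"));          // true

// Negated classes: \D is the complement of \d.
console.log(/^\D+$/.test("abc")); // true
console.log(/^\D+$/.test("a1c")); // false

// Inside a character set \b means backspace (U+0008), not a word boundary.
console.log(/[\b]/.test("\u0008")); // true
```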

Flags

You can set the flags of the regex by prefixing matching with one or more of the flagging options:

  • globally for a global regex;
  • anyCase for a case-insensitive regex;
  • fullText for a "multiline" regex (the m flag; in JavaScript this makes ^ and $ match at line breaks as well, and does not change what the dot '.' matches);
  • withUnicode for a regex with extended Unicode support;
  • stickily for a "sticky" regex.

Alternatively, you can set the flags with the withFlags method of the RE object.

// The following regexes are equivalent: /[a-f]/gi
var foo = RE.globally.anyCase.matching.oneOf.range("a", "f").regex;
var bar = RE.withFlags("gi").matching.oneOf.range("a", "f").regex;

You can't change a regex builder's flags, as builders are immutable, but you can create a copy of a builder with different flags:

var foo = RE.matching.oneOrMore.alphaNumeric;  // /\w+/
var bar = RE.globally.matching(foo);           // /\w+/g
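The same ideas can be checked on native RegExp objects, which are also what the builders eventually produce. This is a plain JavaScript sketch with illustrative variable names; it does not call RE-Build.

```javascript
// Flags are reported in a canonical order by the native flags property.
const lowerHex = /[a-f]/gi;
console.log(lowerHex.flags); // gi

// "Copying with different flags", done by hand on native objects.
const word = /\w+/;
const globalWord = new RegExp(word.source, "g");
console.log(word.flags);                   // (empty string)
console.log(globalWord.flags);             // g
console.log("a bc def".match(globalWord)); // [ 'a', 'bc', 'def' ]

// The native m flag changes where ^ and $ may match, not what the dot matches.
console.log(/^b/.test("a\nb"));   // false
console.log(/^b/m.test("a\nb"));  // true
console.log(/a.b/m.test("a\nb")); // false
```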

If you don't need to set any flags, you can use a shorter form and drop the matching word:

// These are equivalent:
RE.matching("abc").then.digit;
RE("abc").then.digit;

This becomes useful when defining the content of groups, character sets or look-aheads.

Grouping

Use the group word to define a non-capturing group, and capture for a capturing group:

var amount = RE.matching("$").then.capture(
    RE.oneOrMore.digit
      .then.noneOrOne.group(".", RE.oneOrMore.digit)
).regex;
// /\$(\d+(?:\.\d+)?)/

The group and capture words are functions, and the resulting groups will embrace everything passed as arguments. Just like then and or, arguments can be strings, regular expressions or other RE-Build'ers.

Backreferences for capturing groups are obtained using the reference function, passing the reference number:

var quote = RE.matching.capture( RE.oneOf("'\"") )
              .then.anyAmountOf.alphaNumeric
              .then.reference(1);
// /(['"])\w*\1/
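To see what the two kinds of groups and the backreference do at match time, the native patterns from the comments above can be run directly. This sketch is plain JavaScript, and the sample strings are invented for illustration.

```javascript
// Capturing vs non-capturing groups: only (...) adds an entry to the match.
const amount = /\$(\d+(?:\.\d+)?)/;
const amountMatch = amount.exec("Total: $12.50");
console.log(amountMatch[0]);     // $12.50
console.log(amountMatch[1]);     // 12.50
console.log(amountMatch.length); // 2

// Backreference: \1 must repeat whatever the first capture matched.
const quoted = /(['"])\w*\1/;
console.log(quoted.test("'abc'"));  // true
console.log(quoted.test("'abc\"")); // false
```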

Character sets

Character sets ([...]) are introduced by the word oneOf. Several characters can be included, separated by the word and. Additionally, one can include a character interval, using the function range and giving the initial and final character of the interval.

Exclusive (negated) character sets can be obtained by prefixing oneOf with the word not.

var hexColor = RE.matching("#").then.exactly(6)
                 .oneOf.digit.and.range("a", "f").and.range("A", "F");
// /#[\da-fA-F]{6}/

var hours = RE.oneOf("01").then.digit.or("2").then.oneOf.range("0", "3");
// /[01]\d|2[0-3]/

var quote = RE.matching('"').then.oneOrMore.not.oneOf('"').then('"');
// /"[^"]+"/
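The resulting native patterns can be tried out as follows. This is a plain JavaScript sketch with made-up inputs; the hours pattern is wrapped in ^(?:...)$ here, which is an addition for the sake of validating a whole string and is not part of the example above.

```javascript
const hexColor = /#[\da-fA-F]{6}/;
console.log(hexColor.test("#1a2B3c")); // true
console.log(hexColor.test("#12345g")); // false

// Anchored variant of the hours pattern (the anchors are an assumption
// added for whole-string validation).
const hoursOnly = /^(?:[01]\d|2[0-3])$/;
console.log(hoursOnly.test("23")); // true
console.log(hoursOnly.test("24")); // false

// A negated set stops at the first closing quote.
const quotedText = /"[^"]+"/;
console.log(quotedText.exec('say "hi" now')[0]); // "hi"
```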

Quantifiers

Quantifiers can be defined by prefixing the quantified block with one of these constructs:

Construct       | Result
----------------|-------
anyAmountOf     | *
oneOrMore       | +
noneOrOne       | ?
atLeast(n)      | {n,}
atMost(n)       | {,n}
exactly(n)      | {n}
between(n, m)   | {n,m}

Quantification is smart enough to translate constructs into their most compact form (e.g., .atLeast(1) becomes +, .between(0, 1) becomes ? and so on).

Lazy quantifiers can be obtained by putting the word lazily before the quantifier.

var number = RE.oneOrMore.digit; //  /\d+/

var hexnumber = RE.exactly(2).oneOf.digit.and.range("a", "f");
// /[\da-f]{2}/

var macAddress = RE.anyCase.matching(hexnumber).then.exactly(5).group(
                    RE("-").then(hexnumber)
                 );
// /[\da-f]{2}(?:-[\da-f]{2}){5}/i

var quoteAlt = RE.matching.capture(RE.oneOf("'\""))
                 .then.lazily.anyAmountOf.anyChar
                 .then.reference(1);
// /(['"]).*?\1/
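The difference between greedy and lazy quantification is easiest to see on a string with several candidates. The sketch below runs the native patterns only, with sample strings invented for illustration. It also notes one native JavaScript detail: an upper bound without a lower bound is not parsed as a quantifier without the u flag, so "at most n" has to be written as {0,n}. Whether RE-Build emits that form is not verified here.

```javascript
// Greedy vs lazy: same pattern, different amount of text consumed.
const greedyQuote = /(['"]).*\1/;
const lazyQuote = /(['"]).*?\1/;
const sample = "'a' and 'b'";
console.log(greedyQuote.exec(sample)[0]); // 'a' and 'b'
console.log(lazyQuote.exec(sample)[0]);   // 'a'

// The MAC address pattern needs six pairs in total.
const macAddress = /[\da-f]{2}(?:-[\da-f]{2}){5}/i;
console.log(macAddress.test("00-1A-2b-3C-4d-5E")); // true
console.log(macAddress.test("00-1A-2b-3C-4d"));    // false

// Without the u flag, {,2} is read as literal characters; {0,2} is the quantifier.
console.log(/^a{,2}$/.test("aa"));  // false
console.log(/^a{0,2}$/.test("aa")); // true
```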

Look-aheads

Look-aheads are introduced by the function followedBy (optionally prefixed by not for negative look-aheads).

var euro = RE.matching.oneOrMore.digit.followedBy("€");
// /\d+(?=€)/

var foo = RE("a").or.not.followedBy("b").then("c");
// /a|(?!b)c/
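A look-ahead checks what follows without consuming it, which shows up in the matched text. The sketch below uses native patterns only; the `qWithoutU` pattern and the sample strings are additional illustrations that are not taken from the examples above.

```javascript
// Positive look-ahead: the euro sign is required but not part of the match.
const euro = /\d+(?=€)/;
console.log(euro.exec("price: 25€")[0]); // 25
console.log(euro.test("price: 25$"));    // false

// Negative look-ahead: a "q" that is not followed by a "u".
const qWithoutU = /q(?!u)/;
console.log(qWithoutU.test("Iraq")); // true
console.log(qWithoutU.test("quit")); // false
```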

Compatibility

  • Internet Explorer 9+
  • Firefox 4+
  • Safari 5+
  • Chrome
  • Opera 11.60+
  • node.js

Basically, every JavaScript environment that supports Object.defineProperties should be fine.

Tests

The unit tests are built on top of mocha. Once the package is installed, run npm install from the package's root directory in order to locally install mocha, then npm run test to execute the tests. Open index.html with a browser to perform the tests on the client side.

If mocha is installed globally, server-side tests can be run with just the command mocha from the package's root directory.

To do

  • More natural language alternatives
  • Plurals, articles
  • CLI tool to translate regexes to and from RE-Build's syntax
  • More examples
  • Consider IE8 support

License

MIT @ Massimo Artizzu 2015-2016. See LICENSE.