npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

parseit.js

v0.1.5

Published

Customizable parsing library for converting a list of tokens into an AST tree

Downloads

11

Readme

Parsing Toolkit for Programming Languages

ParseIT.js is a JavaScript library for generating recursive-descent parsers that work with tokens. It provides a declarative way to define grammar rules for your language, as well as AST nodes to build the resulting tree with. It is availible on npm and works on variety of JavaScript and TypeScript runtimes, including web browsers.

Getting Started

First, add ParseIT to your project using npm

npm install parseit.js

Congratulations, you're good to go!

Creating grammar

Let's start by creating a new file named grammar.js, where we will define the grammar rules for our language:

import { Grammar, either } from 'parseit.js';
const grammar = new Grammar();

export default grammar;

To create a new rule, we can use the rule() method, present in the grammar object. ParseIT works with tokens, which have a type and, optionally, a value. To refer to a certain token within our rule, we use this notation: @TOKENTYPE or @TOKENTYPE:TOKENVALUE.

With that being said, we are ready to define our first rule:

grammar.rule("number").from("@NUMBER").pass();

The rule above looks for any token with the type NUMBER and passes it onto the parent rule, or if there's no parent rule, returns it as the result. However, the true power of recursive-descent parsers comes with nesting these rules together. To refer to a rule within another rule, we use this notation: $ruleName.

Let's define another rule:

grammar.rule("term").from("$number", "@MUL", "$number").pass();
grammar.rule("term").from("$number", "@DIV", "$number").pass();
grammar.rule("term").from("$number").pass();

As you can see, we are defining this new term rule multiple times. This will define 3 variations of the rule. When parsing, it will check for every rule variation, until it finds the one that matches. Ordering rule variations correctly is very important, as the parser scans them in the same order they're defined in the grammar file.

Anyways, the term rule looks for these three patterns:

  • 2 numbers separated by a * symbol (MUL token)
  • 2 numbers separated by a / symbol (DIV token)
  • a single number

Now, before we add support for + and - symbols, there's one more thing that needs to be solved. In our current grammatical setup, we cannot combine operations together (Ex: 2 * 2 * 3 / 4). Another words, our term rule has to be left-recursive. Luckily, ParseIT provides us with a concept called loops! There are two types of loops: binary loops, which allow for left-recursive grammar and block loops, which we'll talk about later. Let's add a binary loop to the rule:

grammar.rule("term").from("$number").binaryLoop(["@MUL", "$number"]).pass();
grammar.rule("term").from("$number").binaryLoop(["@DIV", "$number"]).pass();
grammar.rule("term").from("$number").pass();

This makes it possible to write expressions like these:

2 * 2 * 2
12 / 2 / 3
1 * 2 * 3 * 4 * 5 * 6

However, there's still one problem, being that we cannot combine both operators into a single expression. Luckily, there's another concept in ParseIT, that makes this possible. Let's rewrite the term rule once more:

grammar.rule("term").from("$number").binaryLoop([either("@MUL", "@DIV"), "$number"]).pass();
grammar.rule("term").from("$number").pass();

The either function allows for multiple choices in a single rule, without breaking the loop. Finally, we are finished with term! Let's add other rules to complete our grammar:

import { Grammar, either } from 'parseit.js';
const grammar = new Grammar();

grammar.rule("expr").from("$term").binaryLoop([either("@ADD", "@SUB"), "$term"]).pass();
grammar.rule("expr").from("$term").pass();

grammar.rule("term").from("$factor").binaryLoop([either("@MUL", "@DIV"), "$factor"]).pass();
grammar.rule("term").from("$factor").pass();

grammar.rule("factor").from("@ADD", "$factor").pass();
grammar.rule("factor").from("@SUB", "$factor").pass();
grammar.rule("factor").from("$number").pass();

grammar.rule("number").from("@NUMBER").pass();
grammar.rule("number").from("@OPAREN", "$expr", "@CPAREN").select(1);

export default grammar;

This set of rules makes it possible to use - or + unary operators before the numbers (notice how the factor rule is right-recursive), and to use parenthesis (OPAREN and CPAREN tokens) to change the order of operations. Notice the new select(number) method. It only passes the expr rule to the AST, ignoring parenthesis tokens.

Creating AST Nodes

Now, before testing our grammar rules, we need something to build our AST (Abstract Syntax Tree) with. In ParseIT, we use a special ASTNode class for that. Let's create a new file titled ast.js. We would need 4 types of nodes in our language:

  • Atom Node (for single numbers)
  • Binary Operation Node (for addition, subtraction, multiplication and division)
  • Unary Operation Node (for factor rules)
  • Block Node (for blocks of expressions)

Let's define them like so:

import { ASTNode, Token} from 'parseit.js';

export class AtomNode extends ASTNode {
    type = "atom";
    // token: Token
    constructor (token) {
        super();
        this.token = token;
    }
}

export class UnaryOpNode extends ASTNode {
    type = "unaryOp";
    // operator: Token, node: ASTNode
    constructor (operator, node) {
        super();
        this.operator = operator;
        this.node     = node;
    }
}

export class BinaryOpNode extends ASTNode {
    type = "binaryOp";
    // left: ASTNode, operator: Token, right: ASTNode
    constructor (left, operator, right) {
        super();
        this.left     = left;
        this.operator = operator;
        this.right    = right;
    }
}

export class BlockNode extends ASTNode {
    type = "block";
    // nodes: ASTNode[]
    constructor (...nodes) {
        super();
        this.nodes = nodes;
    }
}

The constructor arguments for a node have to be defined in the same order they appear in the grammar of your language if you plan to use them with the as(...) method.

Before we add these nodes to our grammar rules, let's define a small helper function:

function binaryOpOrPass (data) {
    return data.length == 1 
        ? data[0] 
        : new BinaryOpNode(...data);
}

This would make our AST a bit cleaner.

Now we can incorporate the nodes into the grammar:

import { Grammar, either } from 'parseit.js';
import { AtomNode, UnaryOpNode, BinaryOpNode, BlockNode } from './ast.js';
const grammar = new Grammar();

function binaryOpOrPass (data) {
    return data.length == 1 
        ? data[0] 
        : new BinaryOpNode(...data);
}

grammar.rule("expr").from("$term").binaryLoop([either("@ADD", "@SUB"), "$term"]).decide(binaryOpOrPass);
grammar.rule("expr").from("$term").pass();

grammar.rule("term").from("$factor").binaryLoop([either("@MUL", "@DIV"), "$factor"]).decide(binaryOpOrPass);
grammar.rule("term").from("$factor").pass();

grammar.rule("factor").from("@ADD", "$factor").as(UnaryOpNode);
grammar.rule("factor").from("@SUB", "$factor").as(UnaryOpNode);
grammar.rule("factor").from("$number").pass();

grammar.rule("number").from("@NUMBER").as(AtomNode);
grammar.rule("number").from("@OPAREN", "$expr", "@CPAREN").select(1);

export default grammar;

Our language is almost complete, but we are going to add one more thing.

Blocks

Blocks make it possible to group multiple statements into an array. Blocks also have a special separator rule, which is usually just a single token, used to separate the statements between one another. In our language we can use the new line token (NEWL) as a separator between expressions.

grammar.rule("block").blockLoop("$expr", "@NEWL").as(BlockNode);
grammar.rule("block").from("$expr").as(BlockNode);

We also have to tell the parser which rule should it start from. To do that, let's add this command after the rules:

grammar.startFrom("block");

And just like that, we are done with our grammar, hooray!

import { Grammar, either } from 'parseit.js';
import { AtomNode, UnaryOpNode, BinaryOpNode, BlockNode } from './ast.js';
const grammar = new Grammar();

function binaryOpOrPass (data) {
    return data.length == 1 
        ? data[0] 
        : new BinaryOpNode(...data);
}

grammar.rule("block").blockLoop("$expr", "@NEWL").as(BlockNode);
grammar.rule("block").from("$expr").as(BlockNode);

grammar.rule("expr").from("$term").binaryLoop([either("@ADD", "@SUB"), "$term"]).decide(binaryOpOrPass);
grammar.rule("expr").from("$term").pass();

grammar.rule("term").from("$factor").binaryLoop([either("@MUL", "@DIV"), "$factor"]).decide(binaryOpOrPass);
grammar.rule("term").from("$factor").pass();

grammar.rule("factor").from("@ADD", "$factor").as(UnaryOpNode);
grammar.rule("factor").from("@SUB", "$factor").as(UnaryOpNode);
grammar.rule("factor").from("$number").pass();

grammar.rule("number").from("@NUMBER").as(AtomNode);
grammar.rule("number").from("@OPAREN", "$expr", "@CPAREN").select(1);

grammar.startFrom("block");

export default grammar;

Parsing

Let's create a main.js file where we'll be testing our language:

import { Token, Parser} from "parseit.js";
import grammar from "./grammar.js";

const parser = new Parser(grammar);

const result = parser.parse([
    new Token("NUMBER", 2), new Token("ADD"), new Token("NUMBER", 2)
]);

console.log(JSON.stringify(result));

If you did everything right, you should see this as the output:

{
  "isNode": true,
  "type": "block",
  "nodes": [
    {
      "isNode": true,
      "type": "binaryOp",
      "left": {
        "isNode": true,
        "type": "atom",
        "token": {
          "type": "NUMBER",
          "value": "2"
        }
      },
      "operator": {
        "type": "ADD"
      },
      "right": {
        "isNode": true,
        "type": "atom",
        "token": {
          "type": "NUMBER",
          "value": "2"
        }
      }
    }
  ]
}

At this point, you should connect some sort of a lexer to convert plain text to tokens and test your language on a few other cases, such as:

2 + 2 * 2
(2 + 2) * 2
3 - 5 / 2 + 1 * 5
4 - -4 + +10 - (2 * (1 + 2))