
meta-parser-generator

v1.1.2

A PEG parser generator with packrat parsing written in JavaScript for JavaScript

Meta Parser Generator

npm install meta-parser-generator

Meta Parser Generator helps you generate an efficient parser from a grammar and a token definition. Meta programming is used to generate a single self-contained parser file.

Characteristics

  • PEG parser (Parsing Expression Grammar) with ordered choice
  • Packrat parsing with memoization for linear time complexity
  • Direct left recursion support using Guido van Rossum's algorithm
  • Parser code is generated from a grammar
  • Good parsing performance (O(n) with memoization)
  • Excellent error reporting with context
  • Small source code (~600 lines of code), no dependencies

Important: Grammar Order Matters

Unlike LL or LR parsers, PEG parsers use ordered choice. The first matching alternative is selected, and no backtracking occurs across alternatives. This means:

// This grammar will NEVER match 'number' because 'name' matches first!
'VALUE': [
  ['name'],    // matches ANY identifier including '123abc'
  ['number'],  // NEVER reached if name is defined as /^[\w]+/
]

// Correct order: more specific rules first
'VALUE': [
  ['number'],  // try number first
  ['name'],    // then try name
]
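The effect of ordered choice can be illustrated with a tiny standalone sketch (illustrative only; this is not the generated parser's code, and the name/number matcher functions are invented for the example):

```javascript
// Minimal sketch of PEG ordered choice. Each alternative is a function
// that returns a match object or null; the first success wins and the
// remaining alternatives are never tried.
function orderedChoice(alternatives, input) {
  for (const alt of alternatives) {
    const result = alt(input);
    if (result !== null) return result; // first match wins
  }
  return null;
}

// 'name' is greedy: it matches any identifier-like input, digits included
const name = (s) => (/^[\w]+/.test(s) ? { type: 'name' } : null);
const number = (s) => (/^[0-9]+(\.[0-9]*)?$/.test(s) ? { type: 'number' } : null);

// Wrong order: 'name' shadows 'number'
orderedChoice([name, number], '123').type; // 'name'
// Correct order: the more specific rule comes first
orderedChoice([number, name], '123').type; // 'number'
```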

How to generate and use a parser

This will generate a parser for mathematical operations:

// generator.js
const { generateParser } = require('meta-parser-generator');
const path = require('path');

// only 3 possible tokens
const tokensDefinition = {
  'number': { reg: /^[0-9]+(\.[0-9]*)?/ },
  'math_operator': { reg: /^(\+|-|\*|%)/ },
  'newline': { str: '\n' }
}

const grammar = {
  // START is the convention keyword for the entry point of the grammar
  'START': [
    // needed so the first line (which has no leading newline) can be MATH
    ['MATH', 'LINE*', 'EOS'], // EOS is the End Of Stream token, added automatically by the tokenizer
    // * is the repeating modifier {0,∞}. Better than recursion as it does not use the call stack
    ['LINE*', 'EOS'],
  ],
  'LINE': [
    // we define a line as always starting with a newline
    ['newline', 'MATH'],
    ['newline'],
  ],
  'MATH': [
    // direct left recursion here
    ['MATH', 'math_operator', 'number'],
    ['number'],
  ],
};

module.exports = { tokensDefinition, grammar };

Then execute this script with node generate.js

// generate.js
const { generateParser } = require('meta-parser-generator');
const path = require('path');
const { tokensDefinition, grammar } = require('./generator');
// this generates the executable parser file
generateParser(grammar, tokensDefinition, path.resolve(__dirname, './parser.js'));
console.log('parser generated');

Then you can use the generated parser this way:

const parser = require('./parser');
const { tokensDefinition, grammar } = require('./generator');
const { displayError } = require('meta-parser-generator');

function parse(input) {
  const tokens = parser.tokenize(tokensDefinition, input);
  const ast = parser.parse(tokens);
  if (!ast.success) {
    displayError(tokens, tokensDefinition, grammar, ast);
  }
  return ast;
}

let ast = parse('9+10-190.3');
console.log(ast);

How does the generated parser work?

Each grammar rule you write is transformed into a function, and those grammar functions call each other until the input parsing is successful. The parser uses:

  1. PEG Ordered Choice: For each rule with multiple alternatives, tries them in order and returns the first match
  2. Packrat Parsing: Memoization prevents re-parsing the same position, guaranteeing O(n) time complexity
  3. Left Recursion Handling: Uses a special memoization strategy based on Guido van Rossum's algorithm
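The memoization idea behind point 2 can be sketched in a few lines (an illustrative sketch, not the library's actual internals; makePackrat and the rule shape are invented for the example):

```javascript
// Packrat sketch: results are cached per (rule, position), so each rule
// body runs at most once per input position during a parse.
function makePackrat(rules) {
  const memo = new Map();
  let evaluations = 0; // counts how many times a rule body actually ran
  function apply(ruleName, pos, input) {
    const key = ruleName + '@' + pos;
    if (memo.has(key)) return memo.get(key); // cache hit: no re-parse
    evaluations += 1;
    const result = rules[ruleName](pos, input, apply);
    memo.set(key, result);
    return result;
  }
  return { apply, stats: () => evaluations };
}

// A trivial rule that matches one digit at a position
const rules = {
  digit: (pos, input) =>
    /[0-9]/.test(input[pos]) ? { ok: true, end: pos + 1 } : { ok: false },
};

const p = makePackrat(rules);
p.apply('digit', 0, '7');
p.apply('digit', 0, '7'); // answered from the memo table
p.stats(); // 1 — the rule body ran only once
```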

The generated parser uses the JavaScript call stack, so a very recursive grammar can trigger a "Maximum call stack size exceeded" error on large inputs.

In our example, the MATH grammar rule has left recursion, allowing you to parse expressions like 1+2+3+4+5+...X, where X is limited by V8's stack size.

To find out the default maximum stack size of V8, run node --v8-options | grep stack-size. If the default size is not enough, you can extend it or rewrite your grammar.

Best practice: Use modifiers (*, +, ?) instead of recursion when possible - they don't use the call stack and handle large inputs better.
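The reason modifiers scale better can be seen in a sketch of loop-based repetition (matchStar here is a hypothetical helper, not the generated parser's code):

```javascript
// A '*' modifier can be matched with a loop instead of recursive calls:
// the loop consumes any number of matches at constant stack depth, so
// input length is not limited by V8's stack size.
function matchStar(matchOne, input, pos) {
  const children = [];
  let current = pos;
  while (true) {
    const result = matchOne(input, current);
    if (!result) break; // {0,∞}: zero matches is still a success
    children.push(result);
    current = result.end;
  }
  return { children, end: current };
}

const digit = (input, pos) =>
  /[0-9]/.test(input[pos]) ? { value: input[pos], end: pos + 1 } : null;

// 100000 repetitions, no recursion, no stack growth
const big = '9'.repeat(100000);
matchStar(digit, big, 0).children.length; // 100000
```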

Note: For very large files, the memoization cache can grow significantly. The parser clears the cache between parse calls, but memory usage during parsing is proportional to input size × grammar complexity.

AST interface

type ASTNode = RuleNode | Token

export interface RuleNode {
  stream_index: number                // position of the first token for this rule in the token stream
  type: string                        // name of the rule
  sub_rule_index: number              // index of the matched alternative within this rule's list of alternatives
  children: ASTNode[]                 // list of children
  named: { [key: string]: ASTNode; }  // named elements within this rule, see named aliases
}

At the leaves of the AST you will find the final tokens. They have a slightly different interface:

export interface Token {
  stream_index: number // position of the token in the token stream
  type: string         // name of token
  value: string        // the value of the token
  len: number          // shortcut for value.length
  line_start: number   // line start position in the input
  column_start: number // column start position in the input
  start: number        // character start position in the input 
}

Modifiers

There are 3 modifiers you can add at the end of a rule or token:

* is the {0,∞} repeating modifier 
+ is the {1,∞} repeating modifier
? is the {0,1} conditional modifier

Example

['PREPOSITION', 'ADJECTIVE*', 'NAME']

Named alias

To facilitate your work with the AST, you can name a rule or a token using a colon:

'MATH': [
  ['MATH', 'math_operator:operator', 'number:num'], // tokens math_operator and number
                                                    // are named with operator and num
  ['number:num'],                                   // here only number is named with num
]

Then in the corresponding RuleNode you will find the math_operator among the children, but also in the named object. This makes it easier to handle and differentiate your grammar rules:

// a function that handles both MATH grammar rules defined above
function handle_MATH_node(node: RuleNode): string {
  const named = node.named
  // if there is an operator, we matched sub rule 0 (the left-recursive one)
  if (named['operator']) {
    const left_recursion = handle_MATH_node(node.children[0] as RuleNode)
    return `${left_recursion} ${named['operator'].value} ${named['num'].value}`
  }
  // otherwise sub rule 1: a lone number
  return named['num'].value
}

Errors

The util function displayError will display detailed information about a tokenizer or parsing error. The hint given is based on the first grammar rule found that consumes the most tokens from the stream.
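One common way to implement such a hint (a sketch of the general "furthest failure" technique; the library's internals may differ) is to record the deepest token index any alternative reached before failing:

```javascript
// Furthest-failure tracking for PEG error hints. Every time an
// alternative fails, remember the deepest token index reached and which
// rule got there — that rule usually points at the real problem.
function makeFailureTracker() {
  let furthest = -1;
  let expected = null;
  return {
    fail(streamIndex, ruleName) {
      if (streamIndex > furthest) {
        furthest = streamIndex;
        expected = ruleName;
      }
    },
    hint: () => ({ furthest, expected }),
  };
}

const tracker = makeFailureTracker();
tracker.fail(2, 'LINE');  // one alternative died at token 2
tracker.fail(5, 'MATH');  // another got further, to token 5
tracker.fail(3, 'VALUE'); // an earlier failure does not override
tracker.hint(); // { furthest: 5, expected: 'MATH' }
```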

Projects using this parser

  • The Blop language https://github.com/batiste/blop-language