@mattwca/little-parser-lib

v1.0.4

Published

2 days ago

A lightweight, flexible TypeScript library for building parsers using parser combinators. Create powerful parsers by combining simple, reusable parsing functions.


mattwca

parser combinators parser-combinators parsing


Features

🚀 Parser Combinators: Build complex parsers from simple building blocks
🔍 Built-in Tokenizer: Flexible tokenization with regex and string matching
📝 TypeScript First: Full type safety and IntelliSense support
🎯 Backtracking Support: Automatic position restoration on parse failures
📦 Zero Dependencies: Lightweight with no external runtime dependencies
✨ Widely Compatible: Packaged with tsdown

Installation

```bash
npm install @mattwca/little-parser-lib
```

Quick Start

```typescript
import { Tokenizer, TokenStream, anyOf, and, many, runParser } from '@mattwca/little-parser-lib';

// 1. Define your tokenizer
const tokenizer = new Tokenizer()
  .withTokenType('letter', /[a-zA-Z]/)
  .withTokenType('digit', /[0-9]/)
  .withTokenType('whitespace', /\s/);

// 2. Tokenize your input
const tokens = tokenizer.tokenize('hello123');
const stream = new TokenStream(tokens);

// 3. Create a parser using combinators
const parser = and(
  many(anyOf('letter')),
  many(anyOf('digit'))
);

// 4. Run the parser
const result = runParser(parser, stream);
console.log(result); // { result: [[...letters], [...digits]] }
```

Core Concepts

Tokenizer

The Tokenizer class converts raw input strings into tokens. Each token has a type, value, and position.

```typescript
const tokenizer = new Tokenizer()
  .withTokenType('number', /[0-9]/)
  .withTokenType('operator', /[+\-*/]/)
  .withTokenType('whitespace', /\s/);

const tokens = tokenizer.tokenize('1 + 2');
// [
//   { type: 'number', value: '1', position: { line: 1, column: 1 } },
//   { type: 'whitespace', value: ' ', position: { line: 1, column: 2 } },
//   { type: 'operator', value: '+', position: { line: 1, column: 3 } },
//   ...
// ]
```
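The examples register single-character patterns and report 1-based line and column numbers. The following sketch of the core loop assumes that each token covers exactly one character, as the `'1 + 2'` output above suggests, and that rules are tried in registration order. `tokenize` and `Rule` are hypothetical names, not the library's API.

```typescript
// Hypothetical sketch of a single-character tokenizer; not the library's implementation.
interface SourcePos { line: number; column: number; }
interface Token { type: string; value: string; position: SourcePos; }
interface Rule { type: string; pattern: RegExp | string; }

function tokenize(input: string, rules: Rule[]): Token[] {
  const tokens: Token[] = [];
  let line = 1;
  let column = 1;
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    // The first rule, in registration order, that matches the character wins.
    const rule = rules.find((r) =>
      typeof r.pattern === 'string' ? r.pattern === ch : r.pattern.test(ch)
    );
    if (!rule) {
      throw new Error(`Unexpected character '${ch}' at ${line}:${column}`);
    }
    tokens.push({ type: rule.type, value: ch, position: { line, column } });
    // A newline moves to the next line and resets the column.
    if (ch === '\n') {
      line += 1;
      column = 1;
    } else {
      column += 1;
    }
  }
  return tokens;
}

const tokens = tokenize('1 + 2', [
  { type: 'number', pattern: /[0-9]/ },
  { type: 'operator', pattern: /[+\-*/]/ },
  { type: 'whitespace', pattern: /\s/ },
]);
console.log(tokens.map((t) => `${t.type}@${t.position.column}`).join(' '));
// number@1 whitespace@2 operator@3 whitespace@4 number@5
```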

Parser Functions

A parser function (ParseFn<T>) takes a TokenStream and returns a ParserResult<T>, which can be either:

SuccessfulParserResult<T>: Contains the parsed result
FailedParserResult: Contains error message and position
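A discriminated union with a type guard is one natural way to model these two result shapes. It is similar in spirit to the `isSuccessfulResult` helper listed in the API reference. The field names `success`, `result`, and `message` below are assumptions for illustration; check the library's exported types for the real ones.

```typescript
// Hypothetical result shapes; the library's real field names may differ.
interface SourcePos { line: number; column: number; }
interface SuccessfulParserResult<T> { success: true; result: T; }
interface FailedParserResult { success: false; message: string; position: SourcePos; }
type ParserResult<T> = SuccessfulParserResult<T> | FailedParserResult;

// A type guard lets TypeScript narrow the union before fields are read.
function isSuccessful<T>(r: ParserResult<T>): r is SuccessfulParserResult<T> {
  return r.success;
}

function summarize(r: ParserResult<number>): string {
  if (isSuccessful(r)) {
    return `parsed ${r.result}`;
  }
  // Here `r` has been narrowed to FailedParserResult.
  return `failed at ${r.position.line}:${r.position.column}: ${r.message}`;
}

console.log(summarize({ success: true, result: 42 })); // parsed 42
console.log(summarize({ success: false, message: 'expected digit', position: { line: 1, column: 3 } }));
// failed at 1:3: expected digit
```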

Parser Combinators

`and(...parsers)`

Combines multiple parsers in sequence. All parsers must succeed.

```typescript
const parser = and(
  anyOf('keyword'),
  anyOf('identifier'),
  anyOf('semicolon')
);
```
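Conceptually, `and` threads a single token stream through each parser in turn. It collects their results and stops at the first failure. The sketch below uses a hypothetical stand-in model instead of the library's `TokenStream`: a `Stream` is just a token array plus an index, and a `tok` helper matches one token type. This sketch does not rewind on failure.

```typescript
// Hypothetical stand-in model, not the library's TokenStream or ParseFn types.
interface Token { type: string; value: string; }
type Failure = { ok: false; message: string; at: number };
type Result<T> = { ok: true; value: T } | Failure;
type Stream = { tokens: Token[]; pos: number };
type Parser<T> = (s: Stream) => Result<T>;

// Matches exactly one token of the given type and advances the stream.
const tok = (type: string): Parser<Token> => (s) => {
  const t = s.tokens[s.pos];
  if (t && t.type === type) {
    s.pos += 1;
    return { ok: true, value: t };
  }
  return { ok: false, message: `expected ${type}`, at: s.pos };
};

// Runs the parsers in sequence; the first failure aborts the whole sequence.
function and<T>(...parsers: Parser<T>[]): Parser<T[]> {
  return (s) => {
    const values: T[] = [];
    for (const p of parsers) {
      const r = p(s);
      if (!r.ok) return r;
      values.push(r.value);
    }
    return { ok: true, value: values };
  };
}

// Builds a stream whose token values simply repeat their types.
const streamOf = (...types: string[]): Stream => ({
  tokens: types.map((type) => ({ type, value: type })),
  pos: 0,
});

const decl = and(tok('keyword'), tok('identifier'), tok('semicolon'));
console.log(decl(streamOf('keyword', 'identifier', 'semicolon')).ok); // true

const partial = decl(streamOf('keyword', 'semicolon'));
if (!partial.ok) console.log(`${partial.message} at ${partial.at}`); // expected identifier at 1
```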

`or(...parsers)`

Tries parsers in order and returns the first successful result. If all of them fail, it returns the deepest error, that is, the error from the alternative that got furthest into the input.

```typescript
const parser = or(
  anyOf('keyword'),
  anyOf('identifier'),
  anyOf('operator')
);
```
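The deepest-error rule matters when an alternative consumes some input before failing. Its error usually says more than one from an alternative that failed on the first token. The sketch below uses a hypothetical stand-in model: every alternative restarts from the same position, and if all fail, the failure with the largest position wins. `seq` is a minimal two-step helper added only for the demonstration.

```typescript
// Hypothetical stand-in model, not the library's actual types.
interface Token { type: string; value: string; }
type Failure = { ok: false; message: string; at: number };
type Result<T> = { ok: true; value: T } | Failure;
type Stream = { tokens: Token[]; pos: number };
type Parser<T> = (s: Stream) => Result<T>;

const tok = (type: string): Parser<Token> => (s) => {
  const t = s.tokens[s.pos];
  if (t && t.type === type) {
    s.pos += 1;
    return { ok: true, value: t };
  }
  return { ok: false, message: `expected ${type}`, at: s.pos };
};

// Minimal two-step sequence, just enough to build alternatives that consume input.
const seq = (a: Parser<Token>, b: Parser<Token>): Parser<Token[]> => (s) => {
  const ra = a(s);
  if (!ra.ok) return ra;
  const rb = b(s);
  if (!rb.ok) return rb;
  return { ok: true, value: [ra.value, rb.value] };
};

// Tries each alternative from the same start position. If all fail, it reports
// the failure with the largest position, i.e. the one that got furthest.
function or<T>(...parsers: Parser<T>[]): Parser<T> {
  return (s) => {
    const start = s.pos;
    let deepest: Failure | undefined;
    for (const p of parsers) {
      s.pos = start;
      const r = p(s);
      if (r.ok) return r;
      if (!deepest || r.at > deepest.at) deepest = r;
    }
    s.pos = start;
    if (deepest) return deepest;
    return { ok: false, message: 'no alternatives given', at: start };
  };
}

const streamOf = (...types: string[]): Stream => ({
  tokens: types.map((type) => ({ type, value: type })),
  pos: 0,
});

const stmt = or(seq(tok('keyword'), tok('identifier')), seq(tok('number'), tok('operator')));
const outcome = stmt(streamOf('keyword', 'number'));
if (!outcome.ok) console.log(`${outcome.message} at ${outcome.at}`); // expected identifier at 1
```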

`many(parser)`

Applies a parser repeatedly until it fails (requires at least one success).

```typescript
const parser = many(anyOf('digit')); // Parses one or more digits
```
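The sketch below shows the one-or-more behaviour on a hypothetical stand-in model. The first match is mandatory, a later failure simply ends the loop, and the stream is left just after the last successful match. The `lex` helper is hypothetical and classifies each character as `digit` or `letter`.

```typescript
// Hypothetical stand-in model, not the library's actual types.
interface Token { type: string; value: string; }
type Failure = { ok: false; message: string; at: number };
type Result<T> = { ok: true; value: T } | Failure;
type Stream = { tokens: Token[]; pos: number };
type Parser<T> = (s: Stream) => Result<T>;

const tok = (type: string): Parser<Token> => (s) => {
  const t = s.tokens[s.pos];
  if (t && t.type === type) {
    s.pos += 1;
    return { ok: true, value: t };
  }
  return { ok: false, message: `expected ${type}`, at: s.pos };
};

// One or more repetitions of `p`.
function many<T>(p: Parser<T>): Parser<T[]> {
  return (s) => {
    const first = p(s);
    if (!first.ok) return first; // zero matches counts as a failure
    const values: T[] = [first.value];
    while (true) {
      const before = s.pos;
      const r = p(s);
      if (!r.ok) {
        s.pos = before; // undo any partial consumption by the failed attempt
        break;
      }
      values.push(r.value);
      if (s.pos === before) break; // stop if `p` succeeded without consuming anything
    }
    return { ok: true, value: values };
  };
}

// Hypothetical helper: digits become 'digit' tokens, everything else 'letter'.
const lex = (input: string): Stream => ({
  tokens: input.split('').map((ch) => ({ type: /[0-9]/.test(ch) ? 'digit' : 'letter', value: ch })),
  pos: 0,
});

const digits = many(tok('digit'));
const source = lex('123abc');
const parsed = digits(source);
if (parsed.ok) console.log(parsed.value.map((t) => t.value).join(''), source.pos); // 123 3
```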

`optional(parser, shouldBacktrack?)`

Makes a parser optional. Returns null if it fails.

```typescript
const parser = optional(anyOf('sign')); // Sign is optional
```
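On a hypothetical stand-in model, `optional` can be written as a wrapper that converts a failure into a successful `null` and restores the stream position. This sketch always rewinds and does not model the library's `shouldBacktrack` parameter.

```typescript
// Hypothetical stand-in model, not the library's actual types.
interface Token { type: string; value: string; }
type Failure = { ok: false; message: string; at: number };
type Result<T> = { ok: true; value: T } | Failure;
type Stream = { tokens: Token[]; pos: number };
type Parser<T> = (s: Stream) => Result<T>;

const tok = (type: string): Parser<Token> => (s) => {
  const t = s.tokens[s.pos];
  if (t && t.type === type) {
    s.pos += 1;
    return { ok: true, value: t };
  }
  return { ok: false, message: `expected ${type}`, at: s.pos };
};

// Turns failure into a successful `null`, rewinding anything the attempt consumed.
function optional<T>(p: Parser<T>): Parser<T | null> {
  return (s) => {
    const start = s.pos;
    const r = p(s);
    if (r.ok) return r;
    s.pos = start;
    return { ok: true, value: null };
  };
}

const streamOf = (...types: string[]): Stream => ({
  tokens: types.map((type) => ({ type, value: type })),
  pos: 0,
});

const sign = optional(tok('sign'));
const signed = streamOf('sign', 'digit');
const unsigned = streamOf('digit');
console.log(sign(signed).ok, signed.pos);     // true 1
console.log(sign(unsigned).ok, unsigned.pos); // true 0
```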

`attempt(parser)`

Wraps a parser with automatic backtracking on failure.

```typescript
const parser = attempt(
  and(anyOf('keyword'), anyOf('identifier'))
);
```
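Backtracking here means restoring the position the wrapped parser started from when it fails. A partially matched sequence therefore does not leave tokens consumed. The sketch below uses a hypothetical stand-in model and includes a minimal two-step `seq` helper, which makes the difference observable:

```typescript
// Hypothetical stand-in model, not the library's actual types.
interface Token { type: string; value: string; }
type Failure = { ok: false; message: string; at: number };
type Result<T> = { ok: true; value: T } | Failure;
type Stream = { tokens: Token[]; pos: number };
type Parser<T> = (s: Stream) => Result<T>;

const tok = (type: string): Parser<Token> => (s) => {
  const t = s.tokens[s.pos];
  if (t && t.type === type) {
    s.pos += 1;
    return { ok: true, value: t };
  }
  return { ok: false, message: `expected ${type}`, at: s.pos };
};

const seq = (a: Parser<Token>, b: Parser<Token>): Parser<Token[]> => (s) => {
  const ra = a(s);
  if (!ra.ok) return ra;
  const rb = b(s);
  if (!rb.ok) return rb;
  return { ok: true, value: [ra.value, rb.value] };
};

function attempt<T>(p: Parser<T>): Parser<T> {
  return (s) => {
    const start = s.pos; // remember where this attempt began
    const r = p(s);
    if (!r.ok) s.pos = start; // on failure, restore the saved position
    return r;
  };
}

const streamOf = (...types: string[]): Stream => ({
  tokens: types.map((type) => ({ type, value: type })),
  pos: 0,
});

const plain = seq(tok('keyword'), tok('identifier'));
const guarded = attempt(seq(tok('keyword'), tok('identifier')));

const s1 = streamOf('keyword', 'number');
plain(s1);
console.log(s1.pos); // 1 (the keyword stays consumed)

const s2 = streamOf('keyword', 'number');
guarded(s2);
console.log(s2.pos); // 0 (rewound to the start)
```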

`map(parser, mapFn)`

Transforms the result of a parser using a mapping function.

```typescript
const digitParser = anyOf('digit');
const numberParser = map(
  many(digitParser),
  (tokens) => parseInt(tokens.map(t => t.value).join(''), 10)
);
```
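`map` only transforms successful results; failures pass through unchanged. The sketch below uses a hypothetical stand-in model with minimal `tok` and `many` helpers of its own. It rebuilds the digits-to-number example and passes an explicit radix of 10 to `parseInt`.

```typescript
// Hypothetical stand-in model, not the library's actual types.
interface Token { type: string; value: string; }
type Failure = { ok: false; message: string; at: number };
type Result<T> = { ok: true; value: T } | Failure;
type Stream = { tokens: Token[]; pos: number };
type Parser<T> = (s: Stream) => Result<T>;

const tok = (type: string): Parser<Token> => (s) => {
  const t = s.tokens[s.pos];
  if (t && t.type === type) {
    s.pos += 1;
    return { ok: true, value: t };
  }
  return { ok: false, message: `expected ${type}`, at: s.pos };
};

function many<T>(p: Parser<T>): Parser<T[]> {
  return (s) => {
    const first = p(s);
    if (!first.ok) return first;
    const values: T[] = [first.value];
    while (true) {
      const before = s.pos;
      const r = p(s);
      if (!r.ok || s.pos === before) {
        s.pos = before;
        break;
      }
      values.push(r.value);
    }
    return { ok: true, value: values };
  };
}

function map<A, B>(p: Parser<A>, fn: (a: A) => B): Parser<B> {
  return (s) => {
    const r = p(s);
    if (!r.ok) return r; // failures pass through untouched
    return { ok: true, value: fn(r.value) };
  };
}

const numberParser = map(many(tok('digit')), (tokens) =>
  parseInt(tokens.map((t) => t.value).join(''), 10)
);

// Hypothetical helper: every character becomes a 'digit' token.
const digitsOf = (input: string): Stream => ({
  tokens: input.split('').map((ch) => ({ type: 'digit', value: ch })),
  pos: 0,
});

const parsed = numberParser(digitsOf('2024'));
if (parsed.ok) console.log(parsed.value + 1); // 2025
```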

`label(label, parser)`

Adds a custom label to parser errors for better debugging.

```typescript
const parser = label(
  'function declaration',
  and(anyOf('function'), anyOf('identifier'))
);
```
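A label wrapper can leave successes and the failure position alone and rewrite only the error message. The sketch below uses a hypothetical stand-in model, and its `"<label>: <original message>"` format is an assumption for illustration; the library's wording may differ.

```typescript
// Hypothetical stand-in model, not the library's actual types.
interface Token { type: string; value: string; }
type Failure = { ok: false; message: string; at: number };
type Result<T> = { ok: true; value: T } | Failure;
type Stream = { tokens: Token[]; pos: number };
type Parser<T> = (s: Stream) => Result<T>;

const tok = (type: string): Parser<Token> => (s) => {
  const t = s.tokens[s.pos];
  if (t && t.type === type) {
    s.pos += 1;
    return { ok: true, value: t };
  }
  return { ok: false, message: `expected ${type}`, at: s.pos };
};

function label<T>(name: string, p: Parser<T>): Parser<T> {
  return (s) => {
    const r = p(s);
    if (r.ok) return r;
    // Same position, but the message now names the construct being parsed.
    return { ok: false, message: `${name}: ${r.message}`, at: r.at };
  };
}

const fnName = label('function declaration', tok('identifier'));
const outcome = fnName({ tokens: [{ type: 'number', value: '42' }], pos: 0 });
if (!outcome.ok) console.log(outcome.message); // function declaration: expected identifier
```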

Built-in Parsers

`anyOf(...types)`

Parses any token matching the specified type(s).

```typescript
const parser = anyOf('letter', 'digit', 'underscore');
```

`anyExcept(...types)`

Parses any token NOT matching the specified type(s).

```typescript
const parser = anyExcept('whitespace', 'newline');
```
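Both built-ins inspect a single token. They differ only in whether the type list acts as an allow list or a deny list. The sketch below uses a hypothetical stand-in model; in it, `anyExcept` also fails at end of input, since there is no token to accept.

```typescript
// Hypothetical stand-in model, not the library's actual types.
interface Token { type: string; value: string; }
type Failure = { ok: false; message: string; at: number };
type Result<T> = { ok: true; value: T } | Failure;
type Stream = { tokens: Token[]; pos: number };
type Parser<T> = (s: Stream) => Result<T>;

// Accepts one token whose type is in the allow list.
function anyOf(...types: string[]): Parser<Token> {
  return (s) => {
    const t = s.tokens[s.pos];
    if (t && types.indexOf(t.type) !== -1) {
      s.pos += 1;
      return { ok: true, value: t };
    }
    return { ok: false, message: `expected one of: ${types.join(', ')}`, at: s.pos };
  };
}

// Accepts one token whose type is NOT in the deny list; fails at end of input.
function anyExcept(...types: string[]): Parser<Token> {
  return (s) => {
    const t = s.tokens[s.pos];
    if (t && types.indexOf(t.type) === -1) {
      s.pos += 1;
      return { ok: true, value: t };
    }
    return { ok: false, message: `unexpected ${t ? t.type : 'end of input'}`, at: s.pos };
  };
}

const streamOf = (...types: string[]): Stream => ({
  tokens: types.map((type) => ({ type, value: type })),
  pos: 0,
});

const identChar = anyOf('letter', 'digit', 'underscore');
const nonSpace = anyExcept('whitespace', 'newline');
console.log(identChar(streamOf('underscore')).ok); // true
console.log(nonSpace(streamOf('newline')).ok);     // false
console.log(nonSpace(streamOf()).ok);              // false (no token left)
```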

`endOfInput()`

Ensures the end of input has been reached.

```typescript
const parser = and(
  myMainParser,
  endOfInput() // Ensure nothing left to parse
);
```
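Without such a check, a parser that matches only a prefix of the input still succeeds, and trailing tokens are silently ignored. On a hypothetical stand-in model, the check reduces to comparing the stream position with the token count:

```typescript
// Hypothetical stand-in model, not the library's actual types.
interface Token { type: string; value: string; }
type Failure = { ok: false; message: string; at: number };
type Result<T> = { ok: true; value: T } | Failure;
type Stream = { tokens: Token[]; pos: number };
type Parser<T> = (s: Stream) => Result<T>;

const tok = (type: string): Parser<Token> => (s) => {
  const t = s.tokens[s.pos];
  if (t && t.type === type) {
    s.pos += 1;
    return { ok: true, value: t };
  }
  return { ok: false, message: `expected ${type}`, at: s.pos };
};

// Succeeds (with null) only when every token has been consumed.
function endOfInput(): Parser<null> {
  return (s) => {
    if (s.pos >= s.tokens.length) return { ok: true, value: null };
    const next = s.tokens[s.pos];
    return { ok: false, message: `unexpected ${next?.type}, expected end of input`, at: s.pos };
  };
}

const streamOf = (...types: string[]): Stream => ({
  tokens: types.map((type) => ({ type, value: type })),
  pos: 0,
});

const digit = tok('digit');
const trailing = streamOf('digit', 'letter');
digit(trailing); // consumes the digit and leaves the letter behind
const check = endOfInput()(trailing);
if (!check.ok) console.log(check.message); // unexpected letter, expected end of input
```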

Running Parsers

`runParser(parser, tokenStream)`

Runs a parser on a token stream. Throws ParsingError on failure.

```typescript
import { ParsingError, runParser } from '@mattwca/little-parser-lib';

try {
  const result = runParser(myParser, tokenStream);
  console.log(result.result);
} catch (error) {
  if (error instanceof ParsingError) {
    console.error(`Parse error at ${error.position.line}:${error.position.column}`);
  }
}
```
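The throw-on-failure contract can be sketched as a thin wrapper that turns a failed result into an exception carrying the position. The `ParsingError` class, the result shape, and `runParserSketch` below are hypothetical stand-ins that mirror the fields used in the catch block above. They are not the library's exports.

```typescript
// Hypothetical stand-ins, not the library's exports.
interface SourcePos { line: number; column: number; }
type Failure = { ok: false; message: string; position: SourcePos };
type Result<T> = { ok: true; value: T } | Failure;

// Error class carrying the failure position.
class ParsingError extends Error {
  position: SourcePos;

  constructor(message: string, position: SourcePos) {
    super(message);
    this.name = 'ParsingError';
    this.position = position;
    // Keeps `instanceof` working even when compiled to ES5.
    Object.setPrototypeOf(this, ParsingError.prototype);
  }
}

// Runs a parse step and throws instead of returning a failed result.
function runParserSketch<T>(parse: () => Result<T>): T {
  const r = parse();
  if (!r.ok) throw new ParsingError(r.message, r.position);
  return r.value;
}

try {
  runParserSketch<number>(() => ({ ok: false, message: 'expected digit', position: { line: 2, column: 7 } }));
} catch (error) {
  if (error instanceof ParsingError) {
    console.error(`Parse error at ${error.position.line}:${error.position.column}`); // Parse error at 2:7
  }
}
```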

`runParserOnString(parser, input, tokenizer)`

Convenience method to tokenize and parse in one step.

```typescript
const result = runParserOnString(myParser, 'input string', tokenizer);
```

Utilities

The library provides utility functions to help with common parser result manipulation tasks.

`unwrapResult(items)`

Flattens nested arrays that result from combining parsers such as `and` and `many`. This is particularly useful with deeply nested parser structures when you need a flat array of results.

```typescript
import { and, anyOf, many, runParser, unwrapResult } from '@mattwca/little-parser-lib';

// Parser results can be nested
const parser = and(
  many(anyOf('letter')),
  many(anyOf('digit'))
);

// `stream` is a TokenStream, created as in the Quick Start
const result = runParser(parser, stream);
// result.result might be: [[token1, token2], [token3, token4]]

const flattened = unwrapResult(result.result);
// flattened is: [token1, token2, token3, token4]
```

Parameters:

items: (T | T[])[] - An array that may contain nested arrays

Returns:

T[] - A flattened array with all nested items extracted

Example Use Cases:

```typescript
// Use with map to process flattened results
const tokenParser = map(
  and(many(anyOf('letter')), many(anyOf('digit'))),
  (results) => unwrapResult(results).map(t => t.value).join('')
);
```
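The flattening itself is a short recursive walk. The sketch below flattens to any depth; `flattenDeep` and `Nested` are hypothetical names. The library's `unwrapResult` may differ in detail, for example in how many levels it unwraps.

```typescript
// A value that is either an item or an array of (possibly nested) items.
type Nested<T> = T | Nested<T>[];

// Hypothetical recursive flatten; not the library's unwrapResult implementation.
function flattenDeep<T>(items: Nested<T>[]): T[] {
  const out: T[] = [];
  for (const item of items) {
    if (Array.isArray(item)) {
      out.push(...flattenDeep<T>(item)); // descend into nested arrays
    } else {
      out.push(item);
    }
  }
  return out;
}

console.log(flattenDeep<string>([['h', 'i'], ['4', ['2']]]).join('')); // hi42
```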

Example: Simple Expression Parser

```typescript
import {
  Tokenizer,
  anyOf,
  and,
  or,
  many,
  map,
  optional,
  runParserOnString
} from '@mattwca/little-parser-lib';

// Define tokenizer
const tokenizer = new Tokenizer()
  .withTokenType('digit', /[0-9]/)
  .withTokenType('plus', '+')
  .withTokenType('minus', '-')
  .withTokenType('whitespace', /\s/);

// Define parsers
const digit = anyOf('digit');
const number = map(
  many(digit),
  (tokens) => parseInt(tokens.map(t => t.value).join(''), 10)
);

const operator = or(
  anyOf('plus'),
  anyOf('minus')
);

const expression = and(
  number,
  optional(anyOf('whitespace')),
  operator,
  optional(anyOf('whitespace')),
  number
);

// Parse: '10 + 5' contains both spaces, so neither optional() yields null here
const result = runParserOnString(expression, '10 + 5', tokenizer);
console.log(result.result); // [10, {whitespace token}, {plus token}, {whitespace token}, 5]
```
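To turn the parsed sequence into a value, the result can be folded with `map`. The hypothetical `evaluate` function below assumes the five-element result shape shown above: a number, an optional whitespace token, a `plus` or `minus` token, another optional whitespace token, and a number. Assuming `and` yields that shape, it could then be applied with something like `map(expression, evaluate)`.

```typescript
// Token shape as described in the Tokenizer section (position omitted for brevity).
interface Token { type: string; value: string; }

// [left, whitespace?, operator, whitespace?, right]
type BinaryExpr = [number, Token | null, Token, Token | null, number];

// Hypothetical evaluator for the two operators the example tokenizer defines.
function evaluate([left, , op, , right]: BinaryExpr): number {
  switch (op.type) {
    case 'plus':
      return left + right;
    case 'minus':
      return left - right;
    default:
      throw new Error(`Unsupported operator: ${op.type}`);
  }
}

const ws: Token = { type: 'whitespace', value: ' ' };
console.log(evaluate([10, ws, { type: 'plus', value: '+' }, ws, 5])); // 15
```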

Error Handling

The library provides detailed error messages with position information:

```typescript
import { ParsingError, runParser } from '@mattwca/little-parser-lib';

try {
  const result = runParser(myParser, stream);
} catch (error) {
  if (error instanceof ParsingError) {
    console.error(`
      Error: ${error.message}
      Line: ${error.position.line}
      Column: ${error.position.column}
      Position: ${error.position.position}
    `);
  }
}
```

API Reference

Classes

Tokenizer: Converts input strings into tokens
TokenStream: Manages token consumption and backtracking
ParsingError: Error thrown when parsing fails

Types

Token: Represents a single token with type, value, and position
TokenType: String identifier for token types
ParseFn<T>: Function that takes a TokenStream and returns ParserResult
ParserResult<T>: Union of SuccessfulParserResult and FailedParserResult

Combinators

and(...parsers): Sequential combination
or(...parsers): Alternative combination
many(parser): One or more repetitions
optional(parser, shouldBacktrack?): Optional parser
attempt(parser): Parser with backtracking
map(parser, fn): Transform parser result
label(label, parser): Add error label

Parsers

anyOf(...types): Match any of specified token types
anyExcept(...types): Match any token except specified types
endOfInput(): Match end of input

Utilities

runParser(parser, stream): Execute parser on token stream
runParserOnString(parser, input, tokenizer): Execute parser on string
isSuccessfulResult(result): Type guard for successful results
isFailedResult(result): Type guard for failed results
unwrapResult(items): Flatten nested parser results

License

MIT

Author

@mattwca
