@mattwca/little-parser-lib
v1.0.4
A lightweight, flexible TypeScript library for building parsers using parser combinators. Create powerful parsers by combining simple, reusable parsing functions.
Features
- 🚀 Parser Combinators: Build complex parsers from simple building blocks
- 🔍 Built-in Tokenizer: Flexible tokenization with regex and string matching
- 📝 TypeScript First: Full type safety and IntelliSense support
- 🎯 Backtracking Support: Automatic position restoration on parse failures
- 📦 Zero Dependencies: Lightweight with no external runtime dependencies
- ✨ Widely Compatible: Packaged with tsdown
Installation
npm install @mattwca/little-parser-lib
Quick Start
import { Tokenizer, TokenStream, anyOf, and, many, runParser } from '@mattwca/little-parser-lib';
// 1. Define your tokenizer
const tokenizer = new Tokenizer()
  .withTokenType('letter', /[a-zA-Z]/)
  .withTokenType('digit', /[0-9]/)
  .withTokenType('whitespace', /\s/);
// 2. Tokenize your input
const tokens = tokenizer.tokenize('hello123');
const stream = new TokenStream(tokens);
// 3. Create a parser using combinators
const parser = and(
  many(anyOf('letter')),
  many(anyOf('digit'))
);
// 4. Run the parser
const result = runParser(parser, stream);
console.log(result); // { result: [[...letters], [...digits]] }
Core Concepts
Tokenizer
The Tokenizer class converts raw input strings into tokens. Each token has a type, value, and position.
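To make the type, value, and position fields concrete, here is a minimal, self-contained sketch of a character-by-character tokenizer. It is not the library's implementation: `tokenizeChars`, `Rule`, and the one-character-per-token behaviour are assumptions made for illustration, chosen to match the shape of the sample output in this section.

```typescript
// Hypothetical shapes for illustration; the library's own types may differ.
type Position = { line: number; column: number };
type Token = { type: string; value: string; position: Position };
type Rule = { type: string; matcher: RegExp | string };

// Assumption: every token is exactly one character long.
function tokenizeChars(input: string, rules: Rule[]): Token[] {
  const tokens: Token[] = [];
  let line = 1;
  let column = 1;
  for (let i = 0; i < input.length; i++) {
    const char = input.charAt(i);
    // The first rule that matches the character decides the token type.
    const rule = rules.find((r) =>
      typeof r.matcher === 'string' ? r.matcher === char : r.matcher.test(char)
    );
    if (rule === undefined) {
      throw new Error(`Unexpected character '${char}' at ${line}:${column}`);
    }
    tokens.push({ type: rule.type, value: char, position: { line, column } });
    // Positions are 1-based; a newline starts a new line at column 1.
    if (char === '\n') {
      line += 1;
      column = 1;
    } else {
      column += 1;
    }
  }
  return tokens;
}

const rules: Rule[] = [
  { type: 'number', matcher: /[0-9]/ },
  { type: 'operator', matcher: /[+\-*/]/ },
  { type: 'whitespace', matcher: /\s/ },
];
const tokens = tokenizeChars('1 + 2', rules);
// tokens[2] is { type: 'operator', value: '+', position: { line: 1, column: 3 } }
```

The library's own Tokenizer is configured as follows: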
const tokenizer = new Tokenizer()
  .withTokenType('number', /[0-9]/)
  .withTokenType('operator', /[+\-*/]/)
  .withTokenType('whitespace', /\s/);
const tokens = tokenizer.tokenize('1 + 2');
// [
//   { type: 'number', value: '1', position: { line: 1, column: 1 } },
//   { type: 'whitespace', value: ' ', position: { line: 1, column: 2 } },
//   { type: 'operator', value: '+', position: { line: 1, column: 3 } },
//   ...
// ]
Parser Functions
A parser function (ParseFn<T>) takes a TokenStream and returns a ParserResult<T>, which can be either:
- SuccessfulParserResult<T>: Contains the parsed result
- FailedParserResult: Contains error message and position
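The result union can be pictured with a small self-contained model. The field names below (`result`, `errorMessage`, `position`) and the helper names `isSuccessful` and `describeResult` are assumptions for illustration; only the overall success/failure split comes from the description above, and the library's real definitions may differ.

```typescript
// Hypothetical shapes modelled on the description above.
type Position = { line: number; column: number };
type SuccessfulParserResult<T> = { result: T };
type FailedParserResult = { errorMessage: string; position: Position };
type ParserResult<T> = SuccessfulParserResult<T> | FailedParserResult;

// Type guard: narrows the union by checking for the `result` field.
function isSuccessful<T>(r: ParserResult<T>): r is SuccessfulParserResult<T> {
  return 'result' in r;
}

// Turns either branch of the union into a readable string.
function describeResult<T>(r: ParserResult<T>): string {
  return isSuccessful(r)
    ? `ok: ${JSON.stringify(r.result)}`
    : `failed at ${r.position.line}:${r.position.column}: ${r.errorMessage}`;
}

const ok = describeResult<number>({ result: 42 });
const failed = describeResult<number>({
  errorMessage: 'expected digit',
  position: { line: 1, column: 3 },
});
```

The library exports its own guards, isSuccessfulResult and isFailedResult, which presumably play the role that `isSuccessful` plays in this sketch.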
Parser Combinators
and(...parsers)
Combines multiple parsers in sequence. All parsers must succeed.
const parser = and(
  anyOf('keyword'),
  anyOf('identifier'),
  anyOf('semicolon')
);
or(...parsers)
Tries parsers in order and returns the first successful result. If all fail, returns the deepest error.
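One plausible reading of "deepest error" is the failure that occurred furthest into the input, since that alternative got closest to matching. The sketch below illustrates that selection rule only; `Failure`, `isDeeper`, and `deepestFailure` are hypothetical names, and the library may compare errors differently.

```typescript
// Hypothetical shapes for illustration.
type Position = { line: number; column: number };
type Failure = { message: string; position: Position };

// True when `a` lies further into the input than `b`.
function isDeeper(a: Position, b: Position): boolean {
  return a.line > b.line || (a.line === b.line && a.column > b.column);
}

// Picks the failure that progressed furthest; on ties the earliest one wins.
function deepestFailure(failures: Failure[]): Failure {
  if (failures.length === 0) {
    throw new Error('no failures to compare');
  }
  return failures.reduce((best, candidate) =>
    isDeeper(candidate.position, best.position) ? candidate : best
  );
}

const reported = deepestFailure([
  { message: 'expected keyword', position: { line: 1, column: 1 } },
  { message: 'expected identifier', position: { line: 1, column: 8 } },
  { message: 'expected operator', position: { line: 1, column: 1 } },
]);
// reported.message is 'expected identifier'
```

With the library's own combinator, usage looks like this: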
const parser = or(
  anyOf('keyword'),
  anyOf('identifier'),
  anyOf('operator')
);
many(parser)
Applies a parser repeatedly until it fails (requires at least one success).
const parser = many(anyOf('digit')); // Parses one or more digits
optional(parser, shouldBacktrack?)
Makes a parser optional. Returns null if it fails.
const parser = optional(anyOf('sign')); // Sign is optional
attempt(parser)
Wraps a parser with automatic backtracking on failure.
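Backtracking here means that a failed parser leaves the stream where it started, so the next alternative sees the same tokens. The following self-contained sketch shows the idea on a toy stream. `MiniStream`, `MiniParser`, `tokenOfType`, `sequence`, and `withBacktracking` are all hypothetical stand-ins, not the library's TokenStream or attempt.

```typescript
type Token = { type: string; value: string };

// Toy stream: an array of tokens plus a cursor that can be saved and restored.
class MiniStream {
  private tokens: Token[];
  private index = 0;
  constructor(tokens: Token[]) {
    this.tokens = tokens;
  }
  get position(): number {
    return this.index;
  }
  restore(position: number): void {
    this.index = position;
  }
  next(): Token | undefined {
    return this.tokens[this.index++];
  }
}

// In this sketch a parser returns its value, or null on failure.
type MiniParser<T> = (stream: MiniStream) => T | null;

const tokenOfType = (type: string): MiniParser<Token> => (stream) => {
  const token = stream.next();
  return token !== undefined && token.type === type ? token : null;
};

// Runs parsers in order; stops at the first failure without rewinding.
const sequence = <T>(...parsers: MiniParser<T>[]): MiniParser<T[]> => (stream) => {
  const results: T[] = [];
  for (const parser of parsers) {
    const value = parser(stream);
    if (value === null) return null;
    results.push(value);
  }
  return results;
};

// Saves the cursor before running the parser and rewinds if it fails.
const withBacktracking = <T>(parser: MiniParser<T>): MiniParser<T> => (stream) => {
  const saved = stream.position;
  const value = parser(stream);
  if (value === null) stream.restore(saved);
  return value;
};

const input: Token[] = [
  { type: 'keyword', value: 'let' },
  { type: 'digit', value: '1' },
];
const keywordThenIdentifier = sequence(tokenOfType('keyword'), tokenOfType('identifier'));

const plainStream = new MiniStream(input);
keywordThenIdentifier(plainStream); // fails, leaving the cursor at 2

const guardedStream = new MiniStream(input);
withBacktracking(keywordThenIdentifier)(guardedStream); // fails, cursor back at 0
```

With the library, the wrapper is applied like this: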
const parser = attempt(
  and(anyOf('keyword'), anyOf('identifier'))
);
map(parser, mapFn)
Transforms the result of a parser using a mapping function.
const digitParser = anyOf('digit');
const numberParser = map(
  many(digitParser),
  (tokens) => parseInt(tokens.map(t => t.value).join(''))
);
label(label, parser)
Adds a custom label to parser errors for better debugging.
const parser = label(
  'function declaration',
  and(anyOf('function'), anyOf('identifier'))
);
Built-in Parsers
anyOf(...types)
Parses any token matching the specified type(s).
const parser = anyOf('letter', 'digit', 'underscore');
anyExcept(...types)
Parses any token NOT matching the specified type(s).
const parser = anyExcept('whitespace', 'newline');
endOfInput()
Ensures the end of input has been reached.
const parser = and(
  myMainParser,
  endOfInput() // Ensure nothing left to parse
);
Running Parsers
runParser(parser, tokenStream)
Runs a parser on a token stream. Throws ParsingError on failure.
try {
  const result = runParser(myParser, tokenStream);
  console.log(result.result);
} catch (error) {
  if (error instanceof ParsingError) {
    console.error(`Parse error at ${error.position.line}:${error.position.column}`);
  }
}
runParserOnString(parser, input, tokenizer)
Convenience method to tokenize and parse in one step.
const result = runParserOnString(myParser, 'input string', tokenizer);
Utilities
The library provides utility functions to help with common parser result manipulation tasks.
unwrapResult(items)
Flattens nested arrays that result from combining parsers like and and many. This is particularly useful when you have deeply nested parser structures and need a flat array of results.
import { unwrapResult } from '@mattwca/little-parser-lib';
// Parser results can be nested
const parser = and(
  many(anyOf('letter')),
  many(anyOf('digit'))
);
const result = runParser(parser, stream);
// result.result might be: [[token1, token2], [token3, token4]]
const flattened = unwrapResult(result.result);
// flattened is: [token1, token2, token3, token4]
Parameters:
- items: (T | T[])[] - An array that may contain nested arrays
Returns:
- T[] - A flattened array with all nested items extracted
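For intuition, a function with the documented signature can be sketched as a one-level flatten. `flattenOneLevel` is a hypothetical name used here for illustration, and whether the library's unwrapResult recurses into deeper nesting is not stated in this README, so the sketch covers only the single level the parameter type implies.

```typescript
// Sketch only: flattens exactly one level, matching (T | T[])[] -> T[].
function flattenOneLevel<T>(items: (T | T[])[]): T[] {
  const flat: T[] = [];
  for (const item of items) {
    if (Array.isArray(item)) {
      // Nested array: copy its elements into the output in order.
      flat.push(...item);
    } else {
      // Plain item: keep it as-is.
      flat.push(item);
    }
  }
  return flat;
}

const mixed: (string | string[])[] = [['a', 'b'], 'c', ['d']];
const flattenedSketch = flattenOneLevel(mixed);
// flattenedSketch is ['a', 'b', 'c', 'd']
```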
Example Use Cases:
// Use with map to process flattened results
const tokenParser = map(
  and(many(anyOf('letter')), many(anyOf('digit'))),
  (results) => unwrapResult(results).map(t => t.value).join('')
);
Example: Simple Expression Parser
import {
  Tokenizer,
  TokenStream,
  anyOf,
  and,
  or,
  many,
  optional,
  map,
  runParserOnString
} from '@mattwca/little-parser-lib';
// Define tokenizer
const tokenizer = new Tokenizer()
  .withTokenType('digit', /[0-9]/)
  .withTokenType('plus', '+')
  .withTokenType('minus', '-')
  .withTokenType('whitespace', /\s/);
// Define parsers
const digit = anyOf('digit');
const number = map(
  many(digit),
  (tokens) => parseInt(tokens.map(t => t.value).join(''))
);
const operator = or(
  anyOf('plus'),
  anyOf('minus')
);
const expression = and(
  number,
  optional(anyOf('whitespace')),
  operator,
  optional(anyOf('whitespace')),
  number
);
// Parse
const result = runParserOnString(expression, '10 + 5', tokenizer);
console.log(result.result); // [10, {...}, {...}, {...}, 5] (number, whitespace token, operator token, whitespace token, number)
Error Handling
The library provides detailed error messages with position information:
try {
  const result = runParser(myParser, stream);
} catch (error) {
  if (error instanceof ParsingError) {
    console.error(`
      Error: ${error.message}
      Line: ${error.position.line}
      Column: ${error.position.column}
      Position: ${error.position.position}
    `);
  }
}
API Reference
Classes
- Tokenizer: Converts input strings into tokens
- TokenStream: Manages token consumption and backtracking
- ParsingError: Error thrown when parsing fails
Types
- Token: Represents a single token with type, value, and position
- TokenType: String identifier for token types
- ParseFn<T>: Function that takes a TokenStream and returns ParserResult
- ParserResult<T>: Union of SuccessfulParserResult and FailedParserResult
Combinators
- and(...parsers): Sequential combination
- or(...parsers): Alternative combination
- many(parser): One or more repetitions
- optional(parser): Optional parser
- attempt(parser): Parser with backtracking
- map(parser, fn): Transform parser result
- label(label, parser): Add error label
Parsers
- anyOf(...types): Match any of specified token types
- anyExcept(...types): Match any token except specified types
- endOfInput(): Match end of input
Utilities
- runParser(parser, stream): Execute parser on token stream
- runParserOnString(parser, input, tokenizer): Execute parser on string
- isSuccessfulResult(result): Type guard for successful results
- isFailedResult(result): Type guard for failed results
- unwrapResult(results): Unwrap nested parser results
License
MIT
Author
@mattwca
