unitas

v0.2.5

Published

a month ago

sovrin

combinator functional parser

unitas — composing parsers into a unified whole

A lightweight, TypeScript-first parser combinator library for building expressive and composable parsers.

Features

Parser Combinators: Compose small parsers into complex ones using combinators like many, choice, sequence, and more
Terminals: Factory functions for common patterns (char, string, regex, etc.)
Primitives: Pre-built parser instances ready to use (digit, letter, whitespace, etc.)
TypeScript: Full TypeScript support with generic types and inference
Tree-shakeable: ESM-only with separate exports for combinators, terminals, primitives, and utils
No dependencies: Zero external runtime dependencies

Note: This library is in active development. The API may change before v1.0.0.

Installation

npm install unitas

Quick Start

CSV parser — parsing comma-separated values with quoted fields

import { grammar, run } from 'unitas';
import { choice, inner, separatedBy } from 'unitas/combinators';
import { char, regex } from 'unitas/terminals';
import { letters } from 'unitas/primitives';

const csv = grammar({
    row: (p) => separatedBy(p.value, char(',')),
    value: (p) => choice(p.quoted, p.unquoted),
    quoted: () => inner(char('"'), regex(/^[^"]*/), char('"')),
    unquoted: () => letters,
});

run(csv.row, 'a,b,c'); // ['a', 'b', 'c']
run(csv.row, '"a,b",c'); // ['a,b', 'c']

JSON value parser — parsing simple JSON values

import { grammar, run } from 'unitas';
import { choice, map, quoted } from 'unitas/combinators';
import { string } from 'unitas/terminals';
import { bool, digits, letters } from 'unitas/primitives';

const json = grammar({
    value: (p) => choice(p.string, p.number, p.bool, p.null),
    string: () => quoted(letters),
    number: () => digits,
    bool: () => bool,
    null: () => map(string('null'), () => null),
});

run(json.value, '"hello"'); // 'hello'
run(json.value, '42'); // 42
run(json.value, 'true'); // true
run(json.value, 'null'); // null

Query string parser — parsing URL query parameters

import { grammar, run } from 'unitas';
import { map, outer, separatedBy } from 'unitas/combinators';
import { char } from 'unitas/terminals';
import { letters } from 'unitas/primitives';

type Query = {
    params: Record<string, string>;
    param: [string, string];
    key: string;
    value: string;
};

const query = grammar<Query>({
    params: (p) =>
        map(separatedBy(p.param, char('&')), (pairs) =>
            Object.fromEntries(pairs),
        ),
    param: (p) => outer(p.key, char('='), p.value),
    key: () => letters,
    value: () => letters,
});

run(query.params, 'foo=bar&baz=qux'); // { foo: 'bar', baz: 'qux' }

INI file section — parsing section headers and key-value pairs

import { grammar, run } from 'unitas';
import { map, outer, sequence } from 'unitas/combinators';
import { char, regex } from 'unitas/terminals';
import { letters, nl } from 'unitas/primitives';
import { pick } from 'unitas/utils';

const ini = grammar({
    section: (p) =>
        map(
            sequence(char('['), p.name, char(']'), nl, p.entry),
            pick(1, 4),
            ([name, entry]) => ({ name, entry }),
        ),
    name: () => letters,
    entry: (p) => outer(p.key, char('='), p.value),
    key: () => letters,
    value: () => regex(/^[^\n]+/),
});

run(ini.section, '[database]\nhost=localhost'); // { name: 'database', entry: ['host', 'localhost'] }

Core Concepts

The Parser Type

A Parser<T> is a function that takes an input string and returns a Result<T>. The generic T represents the type of value the parser produces.

type Parser<T> = (input: string) => Result<T>;

The Result Type

Every parser returns a Result<T> which is either:

Success — The parser matched and produced a value
Failure — The parser did not match

type Success<T> = { ok: true; value: T; remaining: string };
type Failure = { ok: false; error?: string };
type Result<T> = Success<T> | Failure;

The remaining string is crucial — it represents what input is left after the parser has done its work. This is how we "consume" input and chain parsers together.
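As a rough illustration of how remaining lets parsers chain, the sketch below redefines the types locally (so it runs without unitas) and feeds the leftover input of one parser into the next. The helpers exactText and andThen are hypothetical names used only for this example; they are not unitas exports.

```typescript
// Local copies of the documented types, so the sketch is self-contained.
type Success<T> = { ok: true; value: T; remaining: string };
type Failure = { ok: false; error?: string };
type Result<T> = Success<T> | Failure;
type Parser<T> = (input: string) => Result<T>;

// Hypothetical helper: matches a fixed piece of text at the start of the input.
const exactText = (expected: string): Parser<string> => (input) =>
    input.slice(0, expected.length) === expected
        ? { ok: true, value: expected, remaining: input.slice(expected.length) }
        : { ok: false, error: `expected "${expected}"` };

// Hypothetical helper: runs `first`, then runs `second` on whatever `first` left over.
const andThen = <A, B>(first: Parser<A>, second: Parser<B>): Parser<[A, B]> => (input) => {
    const a = first(input);
    if (!a.ok) return a; // the first failure stops the chain
    const b = second(a.remaining); // `remaining` is the hand-off point
    if (!b.ok) return b;
    return { ok: true, value: [a.value, b.value], remaining: b.remaining };
};

const ab = andThen(exactText('a'), exactText('b'));
const abResult = ab('abc'); // succeeds with ['a', 'b'] and leaves 'c'
const axResult = ab('ax'); // fails at the second parser
```

Each parser only ever sees the part of the input that earlier parsers did not consume, which is why no separate cursor or index is needed in this model.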

Creating a Parser

Use create to wrap a parsing function:

import { create, success, failure } from 'unitas';

const parser = create<string>((input) => {
    if (input.startsWith('hello')) {
        return success('hello', input.slice(5));
    }
    return failure('expected "hello"');
});

Understanding the Monadic Nature

Parsers are monadic: with pure as the unit and bind as the chaining operation, they follow three laws that make them composable:

Left identity: bind(pure(a), f) behaves like f(a)
Right identity: bind(parser, pure) behaves like parser
Associativity: bind(bind(parser, f), g) behaves like bind(parser, (x) => bind(f(x), g))

The practical implication is that you can chain and combine parsers predictably.
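The laws can be checked on a small model. The sketch below uses local stand-ins for pure and bind that mirror the call shapes shown in the Combinators section; they are written for this example and are not the library's implementations. oneChar, pairWithNext, and upper are hypothetical helpers.

```typescript
type Success<T> = { ok: true; value: T; remaining: string };
type Failure = { ok: false; error?: string };
type Result<T> = Success<T> | Failure;
type Parser<T> = (input: string) => Result<T>;

// Local stand-in: succeeds with `value` and consumes nothing.
const pure = <T>(value: T): Parser<T> => (input) => ({ ok: true, value, remaining: input });

// Local stand-in: runs `parser`, then the parser chosen by `next` on the leftover input.
const bind = <A, B>(parser: Parser<A>, next: (value: A) => Parser<B>): Parser<B> => (input) => {
    const result = parser(input);
    return result.ok ? next(result.value)(result.remaining) : result;
};

// Hypothetical building blocks for the check.
const oneChar: Parser<string> = (input) =>
    input.length > 0
        ? { ok: true, value: input.slice(0, 1), remaining: input.slice(1) }
        : { ok: false, error: 'unexpected end of input' };
const pairWithNext = (c: string): Parser<string> => bind(oneChar, (d) => pure(c + d));
const upper = (s: string): Parser<string> => pure(s.toUpperCase());

// Results are plain data, so structural comparison via JSON is enough here.
const same = (a: unknown, b: unknown): boolean => JSON.stringify(a) === JSON.stringify(b);

const sample = 'abc';
const monadLaws = {
    leftIdentity: same(bind(pure('x'), pairWithNext)(sample), pairWithNext('x')(sample)),
    rightIdentity: same(bind(oneChar, (c) => pure(c))(sample), oneChar(sample)),
    associativity: same(
        bind(bind(oneChar, pairWithNext), upper)(sample),
        bind(oneChar, (c) => bind(pairWithNext(c), upper))(sample),
    ),
};
```

This only spot-checks one input, so it demonstrates the laws rather than proving them.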

Success Results

When a parser successfully matches, it returns:

{ ok: true, value: 'hello', remaining: ' world' }
       │           │                   │
       │           │                   └── What's left to parse
       │           └── The parsed value
       └── Always true for success

Failure Results

When a parser fails, it returns:

{ ok: false }                      // Generic failure
{ ok: false, error: 'expected a' } // Failure with message

The error field is optional — you can always add meaningful error messages later using label.
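A minimal sketch of consuming a Result: because ok is a literal true or false, checking it narrows the union in TypeScript. The helpers summarize and withMessage are hypothetical; withMessage only imitates the idea of attaching a message after the fact and is not the library's label.

```typescript
type Success<T> = { ok: true; value: T; remaining: string };
type Failure = { ok: false; error?: string };
type Result<T> = Success<T> | Failure;
type Parser<T> = (input: string) => Result<T>;

// Checking `ok` narrows the union: `value` and `remaining` exist only on the success branch.
const summarize = <T>(result: Result<T>): string =>
    result.ok
        ? `parsed ${JSON.stringify(result.value)} with ${result.remaining.length} chars left`
        : `failed: ${result.error ?? 'no message'}`;

// Hypothetical wrapper: replaces any failure of `parser` with a fixed message.
const withMessage = <T>(parser: Parser<T>, message: string): Parser<T> => (input) => {
    const result = parser(input);
    return result.ok ? result : { ok: false, error: message };
};

// A bare parser that fails without saying why.
const startsWithA: Parser<string> = (input) =>
    input.slice(0, 1) === 'a'
        ? { ok: true, value: 'a', remaining: input.slice(1) }
        : { ok: false };

const okSummary = summarize(startsWithA('abc'));
const bareSummary = summarize(startsWithA('xyz'));
const describedSummary = summarize(withMessage(startsWithA, 'expected a')('xyz'));
```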

Core (`unitas`)

Core provides the fundamental types and functions for building parsers.

failure('unexpected input'); // { ok: false, error: 'unexpected input' }

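import { grammar, run } from 'unitas';
import { chainLeft1, choice, map, sequence } from 'unitas/combinators';
import { char } from 'unitas/terminals';
import { digits } from 'unitas/primitives';

type Math = {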
    expr: number;
    term: number;
    value: number;
};
const g = grammar<Math>({
    expr: (p) =>
        chainLeft1(
            p.term,
            map(char('+'), () => (l, r) => l + r),
        ),
    term: (p) =>
        choice(
            p.value,
            map(sequence(char('('), p.expr, char(')')), ([, v]) => v),
        ),
    value: () => digits,
});
run(g.expr, '1+2'); // 3
run(g.expr, '1+2+3'); // 6
run(g.expr, '(1+2)'); // 3

label(char('x'), 'letter x')(''); // { ok: false, error: 'expected letter x' }

lazy(() => char('a'))('abc'); // { ok: true, value: 'a', remaining: 'bc' }

match(success('hello', ''), { success: (v) => v, failure: () => 'failed' }); // 'hello'

const memoDigits = memoize(digits);
memoDigits('123'); // { ok: true, value: 123, remaining: '' }

create((input) => success('parsed', input.slice(6)))('hello world'); // { ok: true, value: 'parsed', remaining: 'world' }

run(string('hello'), 'hello'); // 'hello'

success('hello', ' world'); // { ok: true, value: 'hello', remaining: ' world' }

Terminals (`unitas/terminals`)

Terminals are the basic building blocks that match specific parts of the input. They don't combine other parsers — they directly inspect the input string.
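To make "directly inspect the input string" concrete, here is a sketch of what a predicate-based terminal could look like. takeWhileSketch is a hypothetical local function, not the library's takeWhile, and its choice to fail on an empty match is an assumption of this example.

```typescript
type Success<T> = { ok: true; value: T; remaining: string };
type Failure = { ok: false; error?: string };
type Result<T> = Success<T> | Failure;
type Parser<T> = (input: string) => Result<T>;

// Hypothetical terminal factory: consumes the longest prefix whose characters all pass `test`.
// Assumption of this sketch: an empty match is a failure. The library may behave differently.
const takeWhileSketch = (test: (c: string) => boolean): Parser<string> => (input) => {
    let end = 0;
    while (end < input.length && test(input.charAt(end))) end += 1;
    return end > 0
        ? { ok: true, value: input.slice(0, end), remaining: input.slice(end) }
        : { ok: false, error: 'no matching characters' };
};

const isDigitChar = (c: string): boolean => c >= '0' && c <= '9';
const digitRun = takeWhileSketch(isDigitChar);

const digitRunResult = digitRun('2024-01'); // matches '2024', leaves '-01'
const noDigitResult = digitRun('-01'); // fails: the first character is not a digit
```

Note that the factory takes configuration (a predicate), not other parsers; that is the distinction from combinators.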

char('A')('ABC'); // { ok: true, value: 'A', remaining: 'BC' }

charOf(['a', 'b', 'c'])('abc'); // { ok: true, value: 'a', remaining: 'bc' }

noneOf(['a', 'b', 'c'])('xyz'); // { ok: true, value: 'x', remaining: 'yz' }

oneOf(['hello', 'hell', 'help'])('helpful'); // { ok: true, value: 'help', remaining: 'ful' }

regex(/^\w+/)('hello world'); // { ok: true, value: 'hello', remaining: ' world' }

satisfy((c) => c === 'a')('abc'); // { ok: true, value: 'a', remaining: 'bc' }

string('hello')('hello world'); // { ok: true, value: 'hello', remaining: ' world' }

stringOf('abc')('abcdef'); // { ok: true, value: 'a', remaining: 'bcdef' }

take(3)('abcdef'); // { ok: true, value: 'abc', remaining: 'def' }

takeWhile((c) => c !== 'x')('abcx'); // { ok: true, value: 'abc', remaining: 'x' }

token('let')('let x'); // { ok: true, value: 'let', remaining: 'x' }
token('let')('let1'); // { ok: true, value: 'let', remaining: '1' }
token('let')('let  x'); // { ok: true, value: 'let', remaining: 'x' }

word('let')('let x'); // { ok: true, value: 'let', remaining: 'x' }
word('let')('let1'); // { ok: false }
word('if')('if (x)'); // { ok: true, value: 'if', remaining: '(x)' }

Primitives (`unitas/primitives`)

Primitives are pre-built parser instances ready to use. Unlike terminals, which are factory functions that you call to get a parser (like char('x')), primitives are already parsers: constants you can pass directly to combinators.
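The difference is easiest to see side by side. In this sketch, oneCharFrom plays the role of a terminal-style factory and vowel the role of a primitive-style constant; both names, and the twice combinator, are hypothetical and exist only for illustration.

```typescript
type Success<T> = { ok: true; value: T; remaining: string };
type Failure = { ok: false; error?: string };
type Result<T> = Success<T> | Failure;
type Parser<T> = (input: string) => Result<T>;

// Terminal style: a factory that must be called with arguments to produce a parser.
const oneCharFrom = (allowed: string): Parser<string> => (input) => {
    const c = input.slice(0, 1);
    return c !== '' && allowed.indexOf(c) !== -1
        ? { ok: true, value: c, remaining: input.slice(1) }
        : { ok: false, error: `expected one of ${allowed}` };
};

// Primitive style: a parser built once and shared as a constant.
const vowel: Parser<string> = oneCharFrom('aeiou');

// Hypothetical combinator that applies the same parser two times in a row.
const twice = <T>(parser: Parser<T>): Parser<[T, T]> => (input) => {
    const a = parser(input);
    if (!a.ok) return a;
    const b = parser(a.remaining);
    if (!b.ok) return b;
    return { ok: true, value: [a.value, b.value], remaining: b.remaining };
};

const twoVowels = twice(vowel)('eat'); // constant passed as-is, no call needed
const twoBrackets = twice(oneCharFrom('[]'))('[]x'); // factory called first, then passed
```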

alphaNum('a1'); // { ok: true, value: 'a', remaining: '1' }
alphaNum('1a'); // { ok: true, value: '1', remaining: 'a' }

alphaNums('abc123'); // { ok: true, value: 'abc123', remaining: '' }

anyChar('abc'); // { ok: true, value: 'a', remaining: 'bc' }

bool('true'); // { ok: true, value: true, remaining: '' }
bool('false'); // { ok: true, value: false, remaining: '' }
bool('trueABC'); // { ok: true, value: true, remaining: 'ABC' }

crlf('\r\nabc'); // { ok: true, value: '\r\n', remaining: 'abc' }

digit('5abc'); // { ok: true, value: 5, remaining: 'abc' }

digits('123abc'); // { ok: true, value: 123, remaining: 'abc' }

eof(''); // { ok: true, value: null, remaining: '' }

eol('\nabc'); // { ok: true, value: '\n', remaining: 'abc' }

float('1.23'); // { ok: true, value: 1.23, remaining: '' }
float('-2.5'); // { ok: true, value: -2.5, remaining: '' }
float('1.23abc'); // { ok: true, value: 1.23, remaining: 'abc' }

hexDigit('fF9'); // { ok: true, value: 'f', remaining: 'F9' }

hexDigits('deadbeef'); // { ok: true, value: 'deadbeef', remaining: '' }

identifier('variable_name'); // { ok: true, value: 'variable_name', remaining: '' }

integer('42'); // { ok: true, value: 42, remaining: '' }
integer('-7'); // { ok: true, value: -7, remaining: '' }
integer('123abc'); // { ok: true, value: 123, remaining: 'abc' }

letter('abc'); // { ok: true, value: 'a', remaining: 'bc' }

letters('abc123'); // { ok: true, value: 'abc', remaining: '123' }

line('hello\nworld'); // { ok: true, value: 'hello', remaining: '\nworld' }

literal('foo-bar'); // { ok: true, value: 'foo-bar', remaining: '' }
literal('123abc'); // { ok: true, value: '123abc', remaining: '' }

lowercase('abc'); // { ok: true, value: 'a', remaining: 'bc' }

lowercases('abcDEF'); // { ok: true, value: 'abc', remaining: 'DEF' }

nl('\ntext'); // { ok: true, value: '\n', remaining: 'text' }

number('42'); // { ok: true, value: 42, remaining: '' }
number('3.14'); // { ok: true, value: 3.14, remaining: '' }
number('-7'); // { ok: true, value: -7, remaining: '' }
number('-2.5'); // { ok: true, value: -2.5, remaining: '' }

octDigit('7abc'); // { ok: true, value: '7', remaining: 'abc' }

octDigits('0777abc'); // { ok: true, value: '0777', remaining: 'abc' }

position('abc'); // { ok: true, value: 3, remaining: 'abc' }

rest('hello'); // { ok: true, value: 'hello', remaining: '' }

space(' abc'); // { ok: true, value: ' ', remaining: 'abc' }

spaces('   abc'); // { ok: true, value: '   ', remaining: 'abc' }

tab('\ttext'); // { ok: true, value: '\t', remaining: 'text' }

uppercase('ABC'); // { ok: true, value: 'A', remaining: 'BC' }

uppercases('ABCdef'); // { ok: true, value: 'ABC', remaining: 'def' }

whitespace(' abc'); // { ok: true, value: ' ', remaining: 'abc' }

whitespaces('  abc'); // { ok: true, value: '  ', remaining: 'abc' }

Combinators (`unitas/combinators`)

Combinators are functions that take one or more parsers and return a new parser. They are the "glue" that lets you compose complex parsers from simple ones.
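As a sketch of the idea, two combinators can be written in a few lines each on top of the Parser type. The names either, repeat, and lit are hypothetical and local to this example; they approximate the spirit of choice and many without claiming to match the library's behavior in edge cases.

```typescript
type Success<T> = { ok: true; value: T; remaining: string };
type Failure = { ok: false; error?: string };
type Result<T> = Success<T> | Failure;
type Parser<T> = (input: string) => Result<T>;

// Hypothetical leaf parser: matches a fixed piece of text.
const lit = (text: string): Parser<string> => (input) =>
    input.slice(0, text.length) === text
        ? { ok: true, value: text, remaining: input.slice(text.length) }
        : { ok: false, error: `expected "${text}"` };

// Tries `a`; if it fails, tries `b` on the same, untouched input.
const either = <A, B>(a: Parser<A>, b: Parser<B>): Parser<A | B> => (input) => {
    const result = a(input);
    return result.ok ? result : b(input);
};

// Applies `parser` until it fails, collecting values. Never fails itself.
const repeat = <T>(parser: Parser<T>): Parser<T[]> => (input) => {
    const values: T[] = [];
    let remaining = input;
    for (;;) {
        const result = parser(remaining);
        // Stop on failure, or on a success that consumed nothing (avoids an infinite loop).
        if (!result.ok || result.remaining.length === remaining.length) break;
        values.push(result.value);
        remaining = result.remaining;
    }
    return { ok: true, value: values, remaining };
};

// Combinators return ordinary parsers, so they nest freely.
const abRun = repeat(either(lit('a'), lit('b')));
const abRunResult = abRun('abba!'); // collects ['a', 'b', 'b', 'a'], leaves '!'
```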

attempt(string('hello'))('hello world'); // { ok: true, value: 'hello', remaining: ' world' }

bind(digits, (n) => take(n))('3abc'); // { ok: true, value: 'abc', remaining: '' }

braced(string('hi'))('{hi}'); // { ok: true, value: 'hi', remaining: '' }

bracketed(string('hi'))('[hi]'); // { ok: true, value: 'hi', remaining: '' }

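// `operation` is not defined in this README; the chain examples assume it is a parser
// that matches +, -, * or / and yields the corresponding binary function.
chainLeft(digits, operation)('1+2+3'); // { ok: true, value: 6, remaining: '' }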
chainLeft(digits, operation)('10-3+2'); // { ok: true, value: 9, remaining: '' }

chainLeft1(digits, operation)('1+2+3'); // { ok: true, value: 6, remaining: '' }
chainLeft1(digits, operation)('8/2*3'); // { ok: true, value: 12, remaining: '' }

chainRight(digits, operation)('2-1-1'); // { ok: true, value: 2, remaining: '' }
chainRight(digits, operation)('4/2/2'); // { ok: true, value: 4, remaining: '' }

chainRight1(digits, operation)('2-1-1'); // { ok: true, value: 2, remaining: '' }
chainRight1(digits, operation)('4/2/2'); // { ok: true, value: 4, remaining: '' }

choice(string('hello'), string('world'))('hello'); // { ok: true, value: 'hello', remaining: '' }

concat(many(letter))('abc123'); // { ok: true, value: 'abc', remaining: '123' }
concat(many(letter), '-')('abc123'); // { ok: true, value: 'a-b-c', remaining: '123' }

consume(string('hello'))('hello world'); // { ok: true, value: null, remaining: ' world' }

endBy(string('item'), char(';'))('item;item;item;'); // { ok: true, value: ['item', 'item', 'item'], remaining: '' }

endBy1(string('item'), char(';'))('item;item;item;'); // { ok: true, value: ['item', 'item', 'item'], remaining: '' }

exactly(char('a'), 3)('aaa'); // { ok: true, value: ['a', 'a', 'a'], remaining: '' }

first(sequence(char('a'), digit))('a1bc'); // { ok: true, value: 'a', remaining: 'bc' }

flag(string('*'))('*abc'); // { ok: true, value: true, remaining: 'abc' }
flag(string('*'))('abc'); // { ok: true, value: false, remaining: 'abc' }

fold(digit, [], (acc, d) => [...acc, d])('123'); // { ok: true, value: [1, 2, 3], remaining: '' }

fold1(digit, 0, (acc, d) => acc + d)('123'); // { ok: true, value: 6, remaining: '' }

foldRight(digit, [], (acc, d) => [...acc, d])('123'); // { ok: true, value: [3, 2, 1], remaining: '' }

foldRight1(digit, [], (acc, d) => [...acc, d])('123'); // { ok: true, value: [3, 2, 1], remaining: '' }

fuse(char('a'), char('b'), char('c'))('abc'); // { ok: true, value: 'abc', remaining: '' }
fuse(string('hello'), char(' '), string('world'))('hello world'); // { ok: true, value: 'hello world', remaining: '' }

guard(true, string('hello'))('hello'); // { ok: true, value: 'hello', remaining: '' }
guard(false, string('hello'))('hello'); // { ok: false }

inner(char('('), string('hi'), char(')'))('(hi)'); // { ok: true, value: 'hi', remaining: '' }

interleaved(char('a'), char(','))('a,a,a'); // { ok: true, value: ['a', ',', 'a', ',', 'a'], remaining: '' }

last(sequence(char('a'), char('b')))('ab'); // { ok: true, value: 'b', remaining: '' }

left(string('hello'), string('world'))('helloworld'); // { ok: true, value: 'hello', remaining: '' }

lexeme(string('hello'))('hello   world'); // { ok: true, value: 'hello', remaining: 'world' }

many(char('a'))('aaa'); // { ok: true, value: ['a', 'a', 'a'], remaining: '' }

many1(char('a'))('aaa'); // { ok: true, value: ['a', 'a', 'a'], remaining: '' }

manyAtLeast(char('a'), 2)('aaa'); // { ok: true, value: ['a', 'a', 'a'], remaining: '' }

manyAtMost(char('a'), 2)('aaa'); // { ok: true, value: ['a', 'a'], remaining: 'a' }

manyBetween(char('a'), 2, 3)('aaa'); // { ok: true, value: ['a', 'a', 'a'], remaining: '' }

manyTill(char('a'), char('b'))('aaab'); // { ok: true, value: ['a', 'a', 'a'], remaining: '' }

map(string('hello'), (v) => v.toUpperCase())('hello'); // { ok: true, value: 'HELLO', remaining: '' }

node('binop', { left: digits, op: char('+'), right: digits })('1+2'); // { ok: true, value: { type: 'binop', left: 1, op: '+', right: 2 }, remaining: '' }
node('number', { value: digits })('123'); // { ok: true, value: { type: 'number', value: 123 }, remaining: '' }

not(string('hello'))('world'); // { ok: true, value: null, remaining: 'world' }

nth(sequence(char('a'), char('b'), char('c')), 1)('abc'); // { ok: true, value: 'b', remaining: '' }

optional(string('hello'))('hello'); // { ok: true, value: 'hello', remaining: '' }
optional(string('hello'))('world'); // { ok: true, value: null, remaining: 'world' }

optionalConsume(string('hello'))('hello world'); // { ok: true, value: undefined, remaining: ' world' }
optionalConsume(string('hello'))('world'); // { ok: true, value: undefined, remaining: 'world' }

optionalSeparatedBy(digits, char(','))('1,2'); // { ok: true, value: [1, 2], remaining: '' }
optionalSeparatedBy(digits, char(','))(',1'); // { ok: true, value: [null, 1], remaining: '' }
optionalSeparatedBy(digits, char(','))('1,'); // { ok: true, value: [1], remaining: '' }

outer(char('('), string('hi'), char(')'))('(hi)'); // { ok: true, value: ['(', ')'], remaining: '' }

padded(string('hi'))('   hi   '); // { ok: true, value: 'hi', remaining: '' }

parenthesized(string('hi'))('(hi)'); // { ok: true, value: 'hi', remaining: '' }

peek(string('hello'))('hello world'); // { ok: true, value: 'hello', remaining: 'hello world' }

postfix(
    char('a'),
    map(char('!'), () => (x) => x),
)('a!'); // { ok: true, value: 'a', remaining: '' }

prefix(
    map(char('-'), () => (x) => -x),
    digit,
)('-5'); // { ok: true, value: -5, remaining: '' }

pure(42)('abc'); // { ok: true, value: 42, remaining: 'abc' }
pure('ok')(''); // { ok: true, value: 'ok', remaining: '' }

quoted(string('hello'))('"hello"'); // { ok: true, value: 'hello', remaining: '' }

recover(string('hello'), 'default')('world'); // { ok: true, value: 'default', remaining: 'world' }

right(string('hello'), string('world'))('helloworld'); // { ok: true, value: 'world', remaining: '' }

separatedBy(char('a'), char(','))('a,a,a'); // { ok: true, value: ['a', 'a', 'a'], remaining: '' }

separatedBy1(char('a'), char(','))('a,a,a'); // { ok: true, value: ['a', 'a', 'a'], remaining: '' }

separatedEndBy(char('a'), char(';'))('a;a;a;'); // { ok: true, value: ['a', 'a', 'a'], remaining: '' }

separatedEndBy1(char('a'), char(';'))('a;a;a;'); // { ok: true, value: ['a', 'a', 'a'], remaining: '' }

separatedUntil(char('a'), char(','), char(';'))('a,a,a;'); // { ok: true, value: ['a', 'a', 'a'], remaining: '' }

sequence(char('a'), char('b'), char('c'))('abc'); // { ok: true, value: ['a', 'b', 'c'], remaining: '' }

skip(char('a'), 2)('aabc'); // { ok: true, value: null, remaining: 'bc' }

skipMany(char('a'))('aaabc'); // { ok: true, value: null, remaining: 'bc' }

skipMany1(char('a'))('aaabc'); // { ok: true, value: null, remaining: 'bc' }

surrounded(char('['), string('hi'), char(']'))('[hi]'); // { ok: true, value: 'hi', remaining: '' }
surrounded(char('a'), char('b'), char('c'))('abc'); // { ok: true, value: 'b', remaining: '' }

unless(false, string('hello'))('hello'); // { ok: true, value: 'hello', remaining: '' }
unless(true, string('hello'))('hello'); // { ok: true, value: null, remaining: 'hello' }

until(char('a'), char('b'))('baaa'); // { ok: true, value: [], remaining: 'baaa' }
until(char('a'), char('b'))('aaba'); // { ok: true, value: ['a', 'a'], remaining: 'ba' }

validate(digit, (n) => n > 5)('7'); // { ok: true, value: 7, remaining: '' }
validate(digit, (n) => n > 5)('3'); // { ok: false }

value(string('true'), true)('true'); // { ok: true, value: true, remaining: '' }
value(string('null'), null)('null'); // { ok: true, value: null, remaining: '' }

when(flag(char('*')), pure('many'), pure('one'))('*rest'); // { ok: true, value: 'many', remaining: 'rest' }
when(flag(char('*')), pure('many'), pure('one'))('abc'); // { ok: true, value: 'one', remaining: 'abc' }

Utils (`unitas/utils`)

Utils are curried helpers for working with parser results, arrays, and function composition: each is first called with its options and returns a function that takes the data.
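The examples in this section all use a configure-then-apply call shape. A rough sketch of why that shape is convenient follows; pickAt and joinWith are hypothetical local helpers modeled on the pick and join examples, not the library's code.

```typescript
// Hypothetical curried helper: keeps only the items at the given indices, in the order listed.
const pickAt = (...indices: number[]) => <T>(items: readonly T[]): T[] =>
    indices.filter((i) => i >= 0 && i < items.length).map((i) => items[i] as T);

// Hypothetical curried helper: joins items into a string with an optional separator.
const joinWith = (separator = '') => (items: readonly unknown[]): string => items.join(separator);

// The configured helper is a plain one-argument function, so it can be stored,
// passed as a callback, or applied immediately.
const nameAndEntry = pickAt(1, 4);

// Shaped like the five-element sequence output in the INI example above.
const sequenceOutput = ['[', 'database', ']', '\n', 'host=localhost'];
const pickedParts = nameAndEntry(sequenceOutput); // ['database', 'host=localhost']
const summaryLine = joinWith(': ')(pickedParts); // 'database: host=localhost'
```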

filter([1, 2, 3])([1, 2, 3, 4, 5]); // [4, 5]
filter([1, 2], true)([1, false, 3]); // [3]

flatten()([1, [2, [3]]]); // [1, 2, [3]]
flatten(2)([1, [2, [3]]]); // [1, 2, 3]

join()([1, 2, 3]); // '123'
join('-')([1, 2, 3]); // '1-2-3'

pick(0, 2)(['a', 'b', 'c']); // ['a', 'c']
pick(2, 4)(['a', 'b', 'c', 'd', 'e']); // ['c', 'e']

pipe(lexeme)(letters)('xyz   abc'); // { ok: true, value: 'xyz', remaining: 'abc' }

pop()([1, 2, 3]); // 3

shift()([1, 2, 3]); // 1

spread()(1, 2, 3); // [1, 2, 3]

License

MIT
