@typesugar/parser
v0.1.1
Published
🧊 Compile-time parser generation from PEG grammars — Boost.Spirit for TypeScript
Maintainers
Readme
@typesugar/parser
Compile-time parser generation from PEG grammars. Think Boost.Spirit, but for TypeScript.
Define grammars with tagged templates or programmatic combinators, get recursive-descent parsers.
Current Status: Phase 1 is production-ready with runtime-interpreted grammars. Phase 2 will add compile-time code generation for zero-cost parsing.
Quick Start
Grammar template
import { grammar } from "@typesugar/parser";
const ident = grammar`
ident = letter (letter | digit)*
letter = 'a'..'z' | 'A'..'Z' | '_'
digit = '0'..'9'
`;
ident.parseAll("hello_42"); // succeeds
ident.parseAll("42nope"); // throws ParseErrorCombinator API
import {
literal,
char,
charRange,
many,
many1,
alt,
seq,
map,
sepBy,
between,
lazy,
integer,
token,
} from "@typesugar/parser";
// CSV row: comma-separated integers
const row = sepBy(integer(), char(","));
row.parseAll("1,2,3"); // [1, 2, 3]
// Arithmetic with correct precedence
const expr = lazy(() =>
map(seq(term, many(seq(token(alt(char("+"), char("-"))), term))), ([first, rest]) =>
rest.reduce((a, [op, b]) => (op === "+" ? a + b : a - b), first)
)
);
const term = lazy(() =>
map(seq(factor, many(seq(token(alt(char("*"), char("/"))), factor))), ([first, rest]) =>
rest.reduce((a, [op, b]) => (op === "*" ? a * b : a / b), first)
)
);
const factor = lazy(() => alt(between(char("("), expr, char(")")), token(integer())));
expr.parseAll("2 + 3 * 4"); // 14
expr.parseAll("(2 + 3) * 4"); // 20Grammar Syntax
PEG grammar definitions support these constructs:
| Syntax | Meaning |
| -------------------- | ------------------------------------------------------------- |
| rule = expr | Rule definition |
| a b c | Sequence |
| a \| b | Ordered alternation (first match wins) |
| a* | Zero or more |
| a+ | One or more |
| a? | Optional |
| "text" or 'text' | String literal |
| 'a'..'z' | Character range (inclusive) |
| . | Any single character |
| !a b | Negative lookahead: succeed only if a fails, then match b |
| (group) | Grouping |
| ruleName | Reference to another rule |
| // comment | Line comment |
The first rule defined is the start rule by default.
Combinator Reference
Primitives
| Function | Description |
| --------------------- | ------------------------------------- |
| literal(s) | Match exact string s |
| char(c) | Match single character c |
| charRange(from, to) | Match character in range [from, to] |
| anyChar() | Match any single character |
| regex(pattern) | Match regex at current position |
| eof() | Match end of input |
Combinators
| Function | Description |
| ------------------------- | --------------------------------------------- |
| seq(a, b) | Sequence: match a then b, return [A, B] |
| seq3(a, b, c) | Sequence of three, return [A, B, C] |
| alt(a, b) | Ordered alternation: try a, then b |
| many(p) | Zero or more repetitions |
| many1(p) | One or more repetitions |
| optional(p) | Optional: returns T \| null |
| not(p) | Negative lookahead (no consumption) |
| map(p, f) | Transform result with function f |
| sepBy(item, sep) | Zero or more items separated by sep |
| sepBy1(item, sep) | One or more items separated by sep |
| between(open, p, close) | Parse p between delimiters |
| lazy(f) | Lazy parser for recursive grammars |
Convenience
| Function | Description |
| ---------------- | ---------------------------------------- |
| digit() | ASCII digit [0-9] |
| letter() | ASCII letter [a-zA-Z] |
| whitespace() | One or more whitespace characters |
| token(p) | Parse p, skip surrounding whitespace |
| integer() | Integer with optional leading - |
| float() | Floating-point number |
| quotedString() | Double-quoted string with escape support |
PEG vs CFG
This package uses PEG (Parsing Expression Grammar) semantics:
- Ordered alternation:
a | btriesafirst. Ifasucceeds,bis never tried. No ambiguity. - Greedy repetition:
a*matches as many as possible. - No left recursion: PEG parsers cannot handle left-recursive rules. The grammar parser detects this and throws a helpful error suggesting how to rewrite.
How It Works
Phase 1 (current): The grammar tagged template parses the PEG definition at runtime into a grammar IR, then builds a recursive-descent parser from that IR. The combinator API builds parsers directly.
Phase 2 (planned -- not yet implemented): The grammar macro will generate inlined recursive-descent parser code at compile time via the typesugar transformer, producing zero-overhead parsers with no runtime grammar interpretation. Currently, the grammar macro validates grammar syntax at compile time but delegates parsing to the runtime interpreter.
Error Reporting
Parse failures include the position (line/col) and what was expected:
Parse error at line 3, col 5: expected "}" or ","The parser tracks the "furthest failure" position across all alternation branches, so error messages point to the most likely problem location.
Comparison to PEG.js / Peggy
| Feature | PEG.js / Peggy | @typesugar/parser |
| ------------------ | ---------------------- | ---------------------------------------------------------------------------- |
| Grammar definition | Separate .pegjs file | Inline tagged template or combinators |
| Code generation | Separate build step | Phase 2 (planned): compile-time macro |
| Type safety | Manual typing | Full TypeScript inference |
| Runtime overhead | Pre-generated parser | Phase 1: interpreted; Phase 2 (planned): zero-cost via compile-time inlining |
| Integration | Standalone tool | Part of your TypeScript build |
