@timkendrick/syntax
v0.0.1
Published
TypeScript parser combinator toolkit
Readme
@timkendrick/syntax
A TypeScript parser combinator library for building type-safe parsers with declarative grammar definitions.
Installation
npm install @timkendrick/syntaxQuick Start
import { syntax } from '@timkendrick/syntax';
// Define a simple arithmetic grammar
const parser = syntax(`
NUMBER ::= /\\d+/
PLUS ::= "+"
MULTIPLY ::= "*"
OPEN_PAREN ::= "("
CLOSE_PAREN ::= ")"
<Formula> ::= {
expression: <expression>
}
<expression> ::= <Addition> | <term>
<Addition> ::= {
left: <term>,
: PLUS,
right: <expression>
}
<term> ::= <Multiplication> | <Number>
<Multiplication> ::= {
left: <Number>,
: MULTIPLY,
right: <term>
}
<Number> ::= {
value: <- NUMBER
}
`);
// Parse input
const result = parser.parse("2+3*4");
console.log(result);Grammar Syntax
The high-level DSL for defining grammars uses BNF grammar syntax as a starting point, and extends this with syntax for automatic construction of AST nodes and lists.
Token Definitions
Define tokens using the ::= operator:
TOKEN_NAME ::= "string_literal" // Exact string match
TOKEN_NAME ::= /regex_pattern/ // Regular expressionExamples:
SEMICOLON ::= ";"
QUOTE ::= "\""
IDENTIFIER ::= /[a-zA-Z_][a-zA-Z0-9_]*/
NUMBER ::= /\d+(\.\d+)?/
WHITESPACE ::= /\s+/Helper aliases
Non-terminal helper abstractions can be defined using angle brackets:
<foo> ::= FOONote: Alias names must start with a lowercase letter to avoid collisions with AST node names
Sequences
Within non-terminal rules, patterns can be chained by separating them with spaces:
<parenthesized_expression> ::= OPEN_PAREN <expression> CLOSE_PARENChoice Syntax
Use | for alternatives:
<expression> ::= <number> | <string> | <identifier>Empty Productions
Use "" for empty productions (useful for defining optional rules):
<optional_type> ::= <type> | ""List Syntax
Use [item, separator] for delimited lists:
<argument_list> ::= [<expression>, COMMA]
<statement_list> ::= [<statement>, SEMICOLON]The matched patterns will be parsed as arrays, omitting the separators from the parsed value.
AST Node Definitions
Define AST nodes using angle brackets and struct syntax:
<NodeName> ::= {
property_name: <- TOKEN_NAME, // String property
other_property: <OtherNode>, // Nested node
: IGNORED_TOKEN // Consumed but not stored
}Note: AST node names must start with a capital letter
Node Property Syntax
property: TOKEN- Store token in propertyproperty: <- TOKEN- Store token's text content in property: TOKEN- Match token and store raw token object in propertyproperty: <Node>- Store nested node in property
All grammars must define at least one AST node type. The first AST node declaration will be used as the root node of the overall AST.
API Reference
Top-level Functions
syntax(grammarDefinition: string)
Creates a parser from a BNF-style grammar definition string.
Returns: SyntaxParser object with:
parse(input: string)- Parse input string into an ASTtokens- Token factory functionsnodes- Node factory functionsgrammar- Underlying grammar object
Automatic Type Generation
Use the syntax-codegen CLI tool to generate TypeScript declarations for your grammar files:
npx @timkendrick/syntax-codegen ./src/**/*.grammar.tsThis will locate all syntax definitions within the provided source files, and generates *.generated.d.ts files containing complete type definitions for the custom grammar.
Type Inference Helpers
import type {
InferSyntaxNodeType,
InferSyntaxTokenType,
InferSyntaxRootNode,
InferSyntaxNodeTypeLookup,
InferSyntaxTokenTypeLookup
} from '@timkendrick/syntax';
// Generate a parser via the high-level DSL
const parser = syntax("[...language grammar]")
// Extract AST types from parser (relies on codegen)
type MyNode = InferSyntaxNodeType<typeof parser>; // Union of all node types
type MyToken = InferSyntaxTokenType<typeof parser>; // Union of all token types
type MyRootNode = InferSyntaxRootNode<typeof parser>; // AST root node type
// Look up individual node types by name
type AstNodes = InferSyntaxNodeTypeLookup<typeof parser>;
type ExpressionNode = AstNodes["Expression"];createGrammar(definition)
For advanced use cases, create grammars programmatically using parser combinators directly instead of the BNF-style DSL. This provides full control over parser construction.
import { createGrammar } from '@timkendrick/syntax';
const grammar = createGrammar({
tokens: {
NUMBER: /\d+/,
PLUS: '+',
MULTIPLY: '*'
},
rules: (tokens) => ({
Expression: (rules) => choice(
rules.Addition,
rules.Number
),
Addition: (rules) => struct(
field('left', rules.Number),
field(null, token(tokens.PLUS)),
field('right', rules.Expression)
),
Number: (rules) => struct(
field('value', text(token(tokens.NUMBER)))
),
})
});Parser Combinators
The library provides low-level combinators for building grammars programmatically:
Primitive Combinators
token<T>(type: T): ParserRule<Token<T>>- Match specific token typeempty(): ParserRule<null>- Always succeeds, consumes nothing
Sequencing
sequence<T>([...rules]): ParserRule<T>- All rules must succeed in orderchoice<T>(...rules): ParserRule<T>- First successful rule wins
Repetition
optional<T>(rule): ParserRule<T | null>- Zero or one matches (never fails)zeroOrMore<T>(rule): ParserRule<T[]>- Zero or more matchesoneOrMore<T>(rule): ParserRule<T[]>- One or more matches
Lists
list<T, S>(itemRule, separatorRule): ParserRule<T[]>- Separated list parsing
Transformation
map<T, U>(rule, fn): ParserRule<U>- Transform parse resulttext(rule): ParserRule<string>- Extract source text from tokensstruct(...fields): ParserRule<object>- Build structured objects
Node Creation
node<T>(type, rule): ParserRule<AstNode<T>>- Create AST nodefield<T>(key, rule): FieldDescriptor<T>- Named field for struct
Custom Combinators
For complex patterns that cannot be expressed using the existing parser combinators, you can create a custom rule parser.
The rule factory will be passed an object containing all the rules in the grammar, allowing this rule to invoke others (or itself, recursively).
The rule factory returns a ParserRule function, which accepts the parser state and a set of helpers for reading input tokens.
const expressionPair = (rules) => (state, helpers) => {
const { readToken, eof } = helpers;
// Keep track of the parser state and any consumed tokens
let currentState = state;
const tokens = [];
// Parse the left expression
const leftResult = rules.expression(currentState, helpers);
if (leftResult.type === ResultType.Error) return leftResult;
const { value: leftValue, tokens: leftTokens, ...leftState } = leftResult.value;
currentState = { ...leftState, tokens: [...currentState.tokens, ...leftTokens] };
// Parse the separator token
const separatorToken = readToken(currentState.index);
if (separatorToken === null || separatorToken.type !== 'COMMA') {
return {
type: ResultType.Error,
error: new ParseError('Expected comma separator', input, separatorToken?.location ?? eof)
};
}
currentState.currentIndex++;
tokens.push(separatorToken);
// Parse the right expression
const rightResult = rules.expression(currentState, helpers);
if (rightResult.type === ResultType.Error) return rightResult;
const { value: rightValue, tokens: rightTokens, ...rightState } = rightResult.value;
currentState = { ...rightState, tokens: [...currentState.tokens, ...rightTokens] };
// Return the collected value and latest parser state
return {
type: ResultType.Success,
value: {
...currentState,
value: { left: leftValue, right: rightValue },
tokens
}
};
};See ARCHITECTURE.md for more details on how parser rules are implemented.
Custom Token Parsers
For tokens that cannot be expressed as strings or regular expressions, you can create a custom parser:
import { createTokenParser } from '@timkendrick/syntax';
const customToken = createTokenParser((state) => {
// Custom parsing logic
// Return new state or null
});Grammar Extension
import { extendGrammar } from '@timkendrick/syntax';
const extendedGrammar = extendGrammar(baseGrammar, {
NewNodeType: (rules) => struct(
field('property', token('TOKEN'))
)
});Error Handling
The parser provides detailed error information:
try {
const result = parser.parse(input);
console.log(result);
} catch (error) {
if (error instanceof ParserRuleError) {
console.log(`Parse error at position ${error.location.start}: ${error.message}`);
}
}Tips for writing grammars
- The parser uses recursive descent with backtracking
- Choice combinators try alternatives in order
- Consider left-recursion elimination for better performance (and avoiding infinite loops)
- Use specific token types early in choice alternatives
Code Organization
Key Files
src/grammar.ts: Core types and grammar creationsrc/parser.ts: Parsing engine and result handlingsrc/combinators/: Individual combinator implementationssrc/syntax.ts: Public high-level API and DSL
Architecture
For detailed information about the library's internal design and implementation, see ARCHITECTURE.md.
