antlr-denter-js
v0.1.2
Python-like indentation tokens for ANTLR4 JavaScript runtime
ANTLR-Denter: Python-like indentation tokens for ANTLR4 JavaScript Runtime
This project adds INDENT and DEDENT tokens to autogenerated ANTLR4 parsers so that grammars can express Python-like scopes. It defines a DenterHelper that can be added to an ANTLR4 grammar.
This is a JavaScript port of the original ANTLR-Denter project, adapted for use with the ANTLR4 JavaScript runtime.
Overview
This is a plugin that is spliced into an ANTLR grammar's lexer, and allows that lexer to make use of INDENT and DEDENT to represent Python-like scope entry and termination.
Features
Using INDENT and DEDENT tokens in a parser
When DenterHelper injects DEDENT tokens, it prefixes any run of them with a single NL. A single NL is also inserted before the EOF token if there are no DEDENTs to insert (that is, if the last line of the source file is not indented). An NL is not inserted before an INDENT, since an indent always implies a newline before it (which would make the newline token meaningless).
For example, given this input:
hello
    world
        universe
dolly
This would be parsed as:
"hello"
INDENT
"world"
INDENT
"universe"
NL
DEDENT
DEDENT
"dolly"
NL
<eof>
This approach lets you define expressions, single-line statements, and block statements naturally.
- Expressions in your parser grammar should not end in newlines. This makes compound expressions work naturally.
- Single-line statements in your grammar should end in newlines. For example, an assignment expression might be identifier '=' expression NL.
- Blocks are bookended by INDENT and DEDENT, without mentioning extra newlines: block: INDENT statement+ DEDENT.
  - You should not include a newline before the INDENT.
  - An if would be something like if expression ':' block. (Note the lack of NL after the ':'.)
In the example above, universe and dolly represent simple expressions, and you can imagine that the grammar would contain something like statement: expression NL | helloBlock;.
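The NL, INDENT and DEDENT rules described in this section can be sketched in a few lines of plain JavaScript. This is only an illustration under simplifying assumptions, not the antlr-denter-js implementation: the hypothetical function sketchDenterTokens works on whole lines and leading spaces rather than on an ANTLR token stream, skips blank lines, and assumes the first line is unindented and that every dedent returns to a previously seen indentation level.

```javascript
// Simplified illustration only: NOT the antlr-denter-js implementation.
// It treats each non-blank line as one token and measures leading spaces.
function sketchDenterTokens(source) {
  const tokens = [];
  const indents = [0]; // stack of currently open indentation widths
  const lines = source.split('\n').filter((line) => line.trim() !== '');

  lines.forEach((line, i) => {
    const width = line.length - line.trimStart().length;
    if (i > 0) {
      if (width > indents[indents.length - 1]) {
        // Deeper line: emit INDENT only, with no NL before it.
        indents.push(width);
        tokens.push('INDENT');
      } else {
        // Same or shallower line: one NL, then one DEDENT per closed scope.
        tokens.push('NL');
        while (width < indents[indents.length - 1]) {
          indents.pop();
          tokens.push('DEDENT');
        }
      }
    }
    tokens.push(JSON.stringify(line.trim()));
  });

  // End of input: a single NL, then close any scopes that are still open.
  tokens.push('NL');
  while (indents.length > 1) {
    indents.pop();
    tokens.push('DEDENT');
  }
  tokens.push('<eof>');
  return tokens;
}

console.log(sketchDenterTokens('hello\n  world\n    universe\ndolly').join(' '));
// prints: "hello" INDENT "world" INDENT "universe" NL DEDENT DEDENT "dolly" NL <eof>
```

Running it on the hello/world/universe/dolly input reproduces the token sequence listed earlier in this section.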
Handling and asserting indentation
The DenterHelper processor asserts correct indentation on DEDENT. Take the following example:
someStatement()
if foo():
    if bar():
        fooAndBar()
  bogusLine()
bogusLine() does not dedent to the indentation of any valid scope: it is not indented enough to be part of the scope of if foo(): and is too indented to share a scope with someStatement(). In Python this is expressed as an IndentationError.
The DenterHelper processor handles this by inserting two tokens: a DEDENT followed immediately by an INDENT (the total sequence here would actually be two DEDENTs followed by an INDENT, since bogusLine() is twice-dedented from fooAndBar()). The rationale is that the line has dedented to its parent, and then indented.
As a consequence, the DenterHelper processor will also assert correct indentation for all lines where an INDENT is not expected. Take the following example in a Python-like grammar of two method calls:
someStatement()
    bogusLine()
This would be illegal due to no INDENTs being expected after someStatement().
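The dedent-then-indent behaviour can be sketched as a small function over a stack of open indentation widths. The helper tokensForLineStart below is hypothetical and is not part of the antlr-denter-js API; it only mirrors the behaviour described in this section, and the widths used (0, 4 and 8 spaces, with bogusLine() at 2) are assumptions chosen for illustration.

```javascript
// Hypothetical helper, NOT part of the antlr-denter-js API.
// `indents` is the stack of open indentation widths; it is mutated in place.
// `width` is the indentation of the line that is about to start.
function tokensForLineStart(indents, width) {
  const top = () => indents[indents.length - 1];
  if (width > top()) {
    // Deeper than the current scope: a plain INDENT (no NL before it).
    indents.push(width);
    return ['INDENT'];
  }
  // Same level or shallower: one NL, then one DEDENT per closed scope.
  const out = ['NL'];
  while (width < top()) {
    indents.pop();
    out.push('DEDENT');
  }
  if (width > top()) {
    // The line landed between two known levels. Report it as
    // "dedented to the parent, then indented" so that a grammar which
    // does not expect an INDENT at this point rejects the line.
    indents.push(width);
    out.push('INDENT');
  }
  return out;
}

// First example: scopes open at 0, 4 and 8 spaces; bogusLine() sits at 2.
console.log(tokensForLineStart([0, 4, 8], 2)); // [ 'NL', 'DEDENT', 'DEDENT', 'INDENT' ]
// Second example: only the top-level scope is open; bogusLine() sits at 4.
console.log(tokensForLineStart([0], 4)); // [ 'INDENT' ]
```

In both cases the parser, not the lexer, reports the error: the token stream contains an INDENT at a position where the grammar does not allow one.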
Installation
npm install antlr-denter-js
Usage
In an ANTLR grammar definition MyGrammar.g4, use the following:
tokens { INDENT, DEDENT }
@lexer::header {
  import { DenterHelper } from 'antlr-denter-js';
}
@lexer::members {
  this.denter = DenterHelper.builder()
    .nl(SimpleCalcLexer.NL)
    .indent(SimpleCalcLexer.INDENT)
    .dedent(SimpleCalcLexer.DEDENT)
    .pullToken(() => super.nextToken());
  this.nextToken = () => this.denter.nextToken();
}
NL: ('\r'? '\n' ' '*); // For tabs just switch out ' '* with '\t'*
Note: SimpleCalcLexer stands for the lexer class that ANTLR generates from your grammar; substitute the name of your own lexer. The exact syntax for @lexer::header and @lexer::members may vary depending on your ANTLR4 JavaScript target version. Adjust accordingly.
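In the snippet above, the value passed to pullToken is a zero-argument function that returns the next raw token from the underlying lexer. For experimenting outside a generated lexer, such a function could be built over an array, as in the sketch below. The helper makeArrayPuller and the token-like objects are hypothetical and not part of antlr-denter-js; which token fields DenterHelper actually reads is not specified in this README, so treat the object shape as an assumption.

```javascript
// Hypothetical helper, NOT part of antlr-denter-js: wraps an array of
// token-like objects in a zero-argument "pull" function.
const EOF_TYPE = -1; // the ANTLR4 runtimes use -1 as the EOF token type

function makeArrayPuller(tokens) {
  let index = 0;
  return () => {
    if (index < tokens.length) {
      return tokens[index++];
    }
    // Keep returning EOF once the input is exhausted, as a lexer would.
    return { type: EOF_TYPE, text: '<EOF>' };
  };
}

// Token types 1 and 2 are made-up values for this illustration.
const pull = makeArrayPuller([
  { type: 1, text: 'hello' },
  { type: 2, text: '\n  ' },
]);
console.log(pull().text); // hello
```

A function like this could then stand in for () => super.nextToken() when building a DenterHelper for experiments, assuming the token objects carry whatever fields the helper needs.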
Example
See the example/ directory for a complete working example with a simple calculator grammar that uses indentation.
API Reference
DenterHelper
The main class that handles indentation processing.
Static Methods
DenterHelper.builder(): Returns a new builder instance for creating a DenterHelper.
Instance Methods
nextToken(): Returns the next token, handling indentation as needed.
getOptions(): Returns a DenterOptions instance for configuring behavior.
DenterOptions
Options for configuring DenterHelper behavior.
Methods
ignoreEof(): Don't do any special handling for EOFs; they'll just be passed through normally. This is useful when the lexer will be used to parse rules that are within a line, such as expressions.
Builder Pattern
Use the builder pattern to create a DenterHelper instance:
const denter = DenterHelper.builder()
  .nl(NL_TOKEN_TYPE)
  .indent(INDENT_TOKEN_TYPE)
  .dedent(DEDENT_TOKEN_TYPE)
  .pullToken(pullTokenFunction);
License
MIT License - see the LICENSE file for details.
Acknowledgements
Many thanks to yshavit for developing the original ANTLR-Denter project, which this JavaScript port is based on.
Related Work
- The original ANTLR-Denter for Java.
- ANTLR4, the language toolkit.
- antlr-denter-cs for C#.
