@elasticpath/epcc-search-ast-generator-js
v1.8.0
Published
This project contains all the logic for processing [filtering in EPCC](https://documentation.elasticpath.com/commerce-cloud/docs/api/basics/filtering.html). This document provides an introduction into how this conceptually works.
Keywords
Readme
Filtering Introduction.
This project contains all the logic for processing filtering in EPCC. This document provides an introduction into how this conceptually works.
A problem we often have is that we have a string, and want to classify it as being valid or invalid. For instance here are some things you might care about strings:
- Does a string start with a capital letter.
- Is a string a valid email address.
- Is a string a palindrome (written the same forwards and backwards like redivider, civic, radar, level)
- Is a string a valid Java program.
There are different techniques that can be used to check the above, including writing code directly and using a regular expression. But some things are more complicated, like case #3 and #4.
In the case of the last one, you would likely define a grammar which is a set of rules that say what a valid string looks like. A common technique is to proceed in two phases:
- Lexing - Converting the string into chunks of things called tokens, each token representing a some part of the string (e.g., in a java program you might have a token for each keyword). More details are in the
lexer.jsfile - Parsing - Write rules for checking the output of the lexer. We use a recursive decent parser, which means that each grammar rule calls a function that checks for a match and may recursively call other functions.
The grammar that we are attempting to parse in Augmented Backus-Naur Form (https://www.rfc-editor.org/rfc/rfc5234) is the following:
(Some of the way this is structured is to make the code easier to read)
root = or_expression eof
# Or support
or_expression = and_expression *( "|" and_expression )
# And support
and_expression = group_expression *( ":" group_expression )
and_expression =/ group_expression *( "," group_expression ) // Legacy chaining operator
# Parenthesis support
group_expression =/ search_term
group_expression =/ "(" or_expression ")"
search_term = binary_term
search_term =/ unary_term
search_term =/ vararg_term
binary_term = binary_operator "(" literal "," literal ")"
unary_term = unary_operator "(" literal ")"
# The code slightly deviates from this to handle weird cases for backwards compatability reasons, I hate
vararg_term = vararg_operator "(" literal "," literal *( "," literal ) ")"
# Everything below is largely handled as a function of the lexer
binary_operator = "eq" / "like" / "gt" / "ge " / "lt" / "le" / "text"
unary_operator = "is_null"
vararg_operator = "in"
; these literal represent standard unencoded literals.
literal = 1*(ALPHA / DIGIT / "$" / "-" / "_" / "*" / "." / " " / "{" / "}" / "@" / "|")
; we treat binary, unary & vararg operators as literals in cases where people want to do something like eq(foo, ne).
literal =/ binary_operator
literal =/ unary_operator
literal =/ vararg_operator
; 0x22 is a quotation mark
literal =/ 0x22 *string_characters 0x22
; there is probably a better way to write this in ABNF involving hex like 0x01-0x21 0x23-0x5B 0x5D-0x7E
string_character = any_character but not " or \
string_character =/ escape_sequence
escape_sequence = "\" 0x22A more interested reader is encouraged to read: Domain-Specific Languages, by Fowler, the two techniques we use here are: (Regex Table Lexer p.239-244, Recursive Descent Parser p.245-255).
Released Package
https://www.npmjs.com/package/@elasticpath/epcc-search-ast-generator-js
