@hollowsolve/match
v1.1.0
A pattern matching language that replaces regular expressions
Match
A pattern matching language that replaces regular expressions. Website · Docs · Playground
Free for personal, educational, and open-source use. Commercial use requires a one-time license.
Install
npm install @hollowsolve/match
ESM and CommonJS. Node 18+.
Quick start
import { run, formatTree, formatFailure } from '@hollowsolve/match'
const grammar = `
key: one or more letters
value: one or more digits
pair: key then equals then value
`
const ok = run(grammar, 'name=42')
console.log(formatTree(ok.tree))
// pair [0..7]
// ├── key [0..4] "name"
// └── value [5..7] "42"
const bad = run(grammar, 'name=abc')
console.log(formatFailure(bad))
// match failed at byte 5 (line 1, column 6):
// expected: digit
// found: "a" (0x61)
// in: pair > value
Find all matches in a string:
import { parse, find } from '@hollowsolve/match'
const program = parse('main: one or more digits')
find(program, 'port 8080 and port 443')
// [{ start: 5, end: 9, text: "8080" }, { start: 19, end: 22, text: "443" }]
Compile once, match many:
import { parse, match } from '@hollowsolve/match'
const program = parse('main: 4 digits then hyphen then 2 digits then hyphen then 2 digits')
match(program, '2025-01-15') // matched
match(program, '25-1-5') // failed
The language
Characters have names
No escape sequences. Anywhere. Ever.
semicolon -- ;
double quote -- "
backslash -- \
newline -- line feed
space -- space
dot -- .
dash -- -
bang -- !
Text blocks
Literal strings:
"hello world"
"http://"
"SELECT * FROM"
Single-character quoted strings ("a", "Z", "7") match exactly that character and can be used in ranges.
To match strings containing literal double quotes, use the double quote named character:
-- matches: "hello"
main: double quote then "hello" then double quote
-- matches: she said "hi"
main: "she said " then double quote then "hi" then double quote
Character sets
any of (letter, digit, underscore)
any of (printable except (double quote, backslash), tab)
none of (double quote, newline)
any of matches one character from the set. none of matches one character not in the set.
Combining rules
key then equals then value -- sequence
key, equals, value -- comma shorthand (same thing)
token or quoted value -- first match wins (PEG ordered choice)
param joined by semicolon -- separated list
param joined by semicolon lenient -- trailing separator ok
Repetition
Prefix form (preferred):
one or more digits
zero or more letters
4 digits
between 2 and 10 letters
optional hyphen
Infix form:
one digit or more
Shorthands for repeated sets:
one or more of (letter, digit, underscore) -- repeated any of
one or more characters except (double quote, newline) -- repeated none of
Plural class names work with repetition: digits, letters, hex digits, alphanumerics, word characters, any characters.
Negation
any character isn't newline -- any char except newline
(printable isn't "—") one or more -- printable chars until —
digit isn't "0" -- digit that isn't zero
Consume until a terminator
until consumes input until a terminator pattern is found. The terminator can be any pattern.
any character until including newline -- consumes the newline
any character until excluding "END" -- stops before END
any character until including (digit then digit) -- two consecutive digits
any character until excluding closing tag -- rule reference as terminator
until vs none of: until consumes a run of characters up to a boundary. none of matches a single character not in a set. Use until when your terminator is multi-character or a pattern. Use none of when you need a character-set negation.
Extract
Tag rules with extract to pull matched text into result.extracted:
num: one or more digits
main: "value=" then extract num
-- result.extracted[0].text === "42"
extract accepts prefix repetition directly:
extract digit -- single atom
extract num -- rule reference
extract one or more digits -- prefix repetition
extract (digit or letter) -- parenthesized compound
Extracts are collected left to right in sequences and in iteration order in loops. Failed or branches contribute nothing. Sub-rule extracts bubble up. As a result, extracted[0], extracted[1] indexing is reliable.
Rules
Rules name patterns. The last rule is the entry point.
field: one or more characters except (comma, newline)
row: field joined by comma
csv: row joined by newline
Multi-word rule names are supported: token char, quoted value, hex pair.
Left recursion is detected at parse time and rejected.
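One common way to detect left recursion is to follow only the leftmost rule reference of each rule and look for a cycle. The sketch below is not taken from the package's source: findLeftRecursion and its input format (a map from rule name to the rule names that can appear in leftmost position) are hypothetical, and it ignores optional or possibly-empty prefixes.

```javascript
// Hypothetical input: rule name -> names of rules that can appear in leftmost position.
// Returns the cycle as a list of rule names, or null when no left recursion is found.
function findLeftRecursion(leftmost) {
  const state = new Map() // rule -> 'visiting' | 'done'
  const visit = (rule, path) => {
    if (state.get(rule) === 'visiting') {
      // We reached a rule that is still on the current path: that is a cycle.
      return [...path.slice(path.indexOf(rule)), rule]
    }
    if (state.get(rule) === 'done') return null
    state.set(rule, 'visiting')
    for (const next of leftmost[rule] ?? []) {
      const cycle = visit(next, [...path, rule])
      if (cycle) return cycle
    }
    state.set(rule, 'done')
    return null
  }
  for (const rule of Object.keys(leftmost)) {
    const cycle = visit(rule, [])
    if (cycle) return cycle
  }
  return null
}

// expr: expr then plus then term (directly left-recursive)
const direct = findLeftRecursion({ expr: ['expr', 'term'], term: [] })
console.log(direct) // cycle: expr -> expr

// csv: row joined by newline, row: field joined by comma (no cycle)
const clean = findLeftRecursion({ csv: ['row'], row: ['field'], field: [] })
console.log(clean) // null
```

Indirect cycles (a starts with b, b starts with a) are reported the same way, with the full path of rule names.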
Modules
Import rules from other grammars with use:
use "email" (local, domain)
main: local then at then domain
The use statement imports named rules (and their dependencies) from a module. Modules are resolved at parse time via a resolve map:
import { run } from '@hollowsolve/match'
const emailGrammar = `
local: one or more letters
domain: one or more letters joined by period
`
const result = run(`
use "email" (local, domain)
main: local then at then domain
`, '[email protected]', {
resolve: { email: emailGrammar }
})
Dependencies are auto-resolved: if local references another rule in the module, it gets pulled in too. Grammars without use work exactly as before.
Precedence
From tightest to loosest:
- Repetition (one or more, zero or more, optional, N, between N and M)
- isn't
- then / , (sequence)
- joined by
- or (alternation)
So a then b or c then d means (a then b) or (c then d).
joined by binds the element as a full sequence and the separator as a full sequence.
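The grouping of then and or can be illustrated outside the library. The following is a minimal sketch in plain JavaScript, not the package's implementation: lit, seq, and choice are hypothetical helper names, and each parser takes (input, pos) and returns the new position or -1. It shows a then b or c then d behaving as (a then b) or (c then d), with the first matching alternative winning.

```javascript
// Hypothetical mini-combinators for illustration only; not part of @hollowsolve/match.
// A parser is a function (input, pos) => new position, or -1 on failure.
const lit = (s) => (input, pos) => (input.startsWith(s, pos) ? pos + s.length : -1)

// Sequence ("then"): run each parser where the previous one stopped.
const seq = (...parsers) => (input, pos) => {
  for (const p of parsers) {
    pos = p(input, pos)
    if (pos === -1) return -1
  }
  return pos
}

// Ordered choice ("or"): try alternatives in order; the first success wins.
const choice = (...parsers) => (input, pos) => {
  for (const p of parsers) {
    const next = p(input, pos)
    if (next !== -1) return next
  }
  return -1
}

// "a then b or c then d" groups as (a then b) or (c then d).
const grouped = choice(seq(lit('a'), lit('b')), seq(lit('c'), lit('d')))
console.log(grouped('ab', 0)) // 2
console.log(grouped('cd', 0)) // 2
console.log(grouped('ad', 0)) // -1: neither full sequence matches

// The other possible grouping, a then (b or c) then d, rejects "ab".
const other = seq(lit('a'), choice(lit('b'), lit('c')), lit('d'))
console.log(other('ab', 0)) // -1
```

When a different grouping is intended, parentheses in the grammar make it explicit, as in the parenthesized examples elsewhere in this README.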
API
All functions are named exports from @hollowsolve/match.
Core
run(source: string, input: string): MatchResult
Parse a grammar and match it against input in one call. Returns MatchSuccess or MatchFailure.
parse(source: string): MatchProgram
match(program: MatchProgram, input: string): MatchResult
Separate compilation from matching. parse compiles and validates a grammar. match runs a compiled grammar against input. Use this when matching the same grammar against many inputs.
Search
find(program: MatchProgram, input: string): FindMatch[]
Find all non-overlapping matches of a pattern within a string. Returns an array of { start, end, text, tree }.
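As a usage sketch, the array that find returns can drive a replacement pass. replaceMatches is a hypothetical helper, not a library function. It assumes the matches are in input order, do not overlap, and carry byte offsets as RuleMatch does, and it works on a hand-written array in the documented shape instead of calling the library.

```javascript
// Hypothetical helper: rebuild the input with each match replaced by fn(match).
// Works on UTF-8 bytes because start/end are assumed to be byte offsets.
function replaceMatches(input, matches, fn) {
  const bytes = new TextEncoder().encode(input)
  const decoder = new TextDecoder()
  let out = ''
  let cursor = 0
  for (const m of matches) {
    // Copy the untouched text before this match, then the replacement.
    out += decoder.decode(bytes.subarray(cursor, m.start)) + fn(m)
    cursor = m.end
  }
  // Copy whatever follows the last match.
  return out + decoder.decode(bytes.subarray(cursor))
}

// Stand-in for the result of find on 'port 8080 and port 443' with a digits pattern.
const matches = [
  { start: 5, end: 9, text: '8080' },
  { start: 19, end: 22, text: '443' },
]

const replaced = replaceMatches('port 8080 and port 443', matches, (m) => `<${m.text}>`)
console.log(replaced)
// port <8080> and port <443>
```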
searchFile(program: MatchProgram, path: string, options?: SearchOptions): SearchResult
searchFolder(program: MatchProgram, path: string, options?: SearchOptions): SearchResult
searchFolderStream(program: MatchProgram, path: string, options?: SearchOptions): AsyncGenerator<LineMatch | SearchError>
Line-oriented search. searchFolder is recursive and skips binary files, hidden dirs, and node_modules. searchFolderStream yields results as it walks instead of buffering.
Diagnostics
formatFailure(failure: MatchFailure, input?: string): string
formatTree(tree: RuleMatch): string
formatFailure produces a human-readable diagnostic with a source pointer. formatTree produces a tree visualization.
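To make the tree format concrete, here is a rough sketch of a renderer over the RuleMatch shape documented under Results. renderTree is a hypothetical helper, not the library's formatTree, and its layout is inferred from the quick-start output (leaf nodes show their text, inner nodes only their span); the real formatter may differ.

```javascript
// Hypothetical re-implementation for illustration; use formatTree in real code.
// Node shape assumed from the README: { rule, start, end, text, children }.
function renderTree(node, prefix = '', connector = '') {
  const isLeaf = node.children.length === 0
  const span = `${node.rule} [${node.start}..${node.end}]`
  const label = isLeaf ? `${span} ${JSON.stringify(node.text)}` : span
  const lines = [prefix + connector + label]
  // Below a last child we pad with spaces; below other children we keep the vertical bar.
  let childPrefix = prefix
  if (connector === '└── ') childPrefix += '    '
  else if (connector === '├── ') childPrefix += '│   '
  node.children.forEach((child, i) => {
    const last = i === node.children.length - 1
    lines.push(renderTree(child, childPrefix, last ? '└── ' : '├── '))
  })
  return lines.join('\n')
}

// Hand-written tree matching the quick-start example for 'name=42'.
const tree = {
  rule: 'pair', start: 0, end: 7, text: 'name=42',
  children: [
    { rule: 'key', start: 0, end: 4, text: 'name', children: [] },
    { rule: 'value', start: 5, end: 7, text: '42', children: [] },
  ],
}

console.log(renderTree(tree))
// pair [0..7]
// ├── key [0..4] "name"
// └── value [5..7] "42"
```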
Fast path
compile(program: MatchProgram): CompiledProgram
fastMatch(cp: CompiledProgram, input: Uint8Array): number
Boolean-only matching. Returns bytes consumed on success, -1 on failure. Skips tree building entirely.
Partial parsing
tryParse(source: string, input: string): MatchSuccess | PartialResult
Like run, but on failure returns a PartialResult with bytes_consumed, partial_tree, and extracted from the furthest-progressed branch. Intended for editor/IDE integration.
Results
Success
{
matched: true,
bytes_consumed: number,
tree: RuleMatch,
extracted: RuleMatch[]
}
The tree is a full parse tree. Every rule produces a node:
{
rule: string, // rule name
start: number, // byte offset (inclusive)
end: number, // byte offset (exclusive)
text: string, // matched text
children: RuleMatch[]
}
Failure
{
matched: false,
offset: number, // byte offset where the failure occurred
line: number, // 1-based line number
column: number, // 1-based column number
expected: string[], // patterns the parser expected
found: string, // what was actually there
rule_stack: string[] // rule call stack (outermost first)
}
formatFailure(failure, input?) renders this as a human-readable string with a source pointer:
match failed at byte 47 (line 3, column 12):
expected: digit, hyphen, or end of input
found: "x" (0x78)
in: forwarded > element > param > value > token
...invalid=x;more
^
CLI
npx match-search "pattern" in file path.log
npx match-search "pattern" in folder ./logs
npx match-search "pattern" in folder ./logs --glob "*.log"
npx match-search "pattern" in file app.log lines 100 to 200
cat server.log | npx match-search "pattern"
Respects NO_COLOR. Disables color automatically when piped.
Characters reference
Every character has a name. No escape sequences exist.
Symbols:
exclamation (bang) ! · double quote " · hash # · dollar $ · percent % · ampersand & · single quote ' · open paren ( · close paren ) · asterisk * · plus + · comma , · hyphen (dash) - · period (dot) . · slash / · colon : · semicolon ; · less than < · equals = · greater than > · question ? · at @ · open bracket [ · backslash \ · close bracket ] · caret ^ · underscore _ · backtick ` · open brace { · pipe | · close brace } · tilde ~
Whitespace: space · tab · newline · carriage return
Other: null · byte 0xHH
Classes: letter (letters) · uppercase · lowercase · digit (digits) · hex digit (hex digits) · whitespace · visible · printable · alphanumeric (alphanumerics) · word character (word characters) · any character (any characters)
Quoted characters: "a" "Z" "7" — for ranges: "a" to "z" · "0" to "9" · byte 0x80 to byte 0xFF
Unicode. any character and none of consume one UTF-8 codepoint (1-4 bytes). café is 4 any character matches, not 5. All other classes (letter, digit, etc.) match ASCII bytes only.
Byte ranges. byte 0x80 to byte 0xFF operates byte-by-byte. Mixing codepoint-aware constructs (any character, none of) with high byte ranges (>= 0x80) in the same rule is a compile error; split them into separate rules instead.
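Because result offsets are byte offsets and any character counts codepoints, JavaScript string indices (UTF-16 code units) can disagree with both once the input contains non-ASCII text. A small sketch using only the standard TextEncoder and TextDecoder globals; sliceBytes is a hypothetical helper, not a library export:

```javascript
const input = 'caf\u00e9=1' // "café=1"
const bytes = new TextEncoder().encode(input) // UTF-8 bytes

console.log(bytes.length)      // 7: the accented letter takes 2 bytes
console.log([...input].length) // 6 codepoints

// Hypothetical helper: recover text from byte offsets such as RuleMatch start/end.
function sliceBytes(utf8, start, end) {
  return new TextDecoder().decode(utf8.subarray(start, end))
}

console.log(sliceBytes(bytes, 0, 5)) // "café"
console.log(input.slice(0, 5))       // "café=": string indices are not byte offsets
```

For pure ASCII input the two kinds of offset coincide, so plain string slicing is fine there.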
Stability
The following are stable public API as of v1.0:
- MatchSuccess, MatchFailure, PartialResult, RuleMatch: field names, types, and semantics
- formatFailure output format: structure and field layout
- All exported function signatures
These will not change in backward-incompatible ways without a major version bump.
License
Free for personal, educational, and open-source use. Commercial use requires a one-time license at matchlang.com. See LICENSE.
