sqlite3-parser
v0.7.1
Published
Parse SQLite query syntax using a LALR(1) parser derived from SQLite's parse.y grammar. 1.5x-200x faster than other JavaScript SQL parsers.
Maintainers
Readme
sqlite3-parser
Parse SQLite query syntax.
- Fast: 2x-200x faster than other JavaScript SQL parsers, see benchmarks.
- Light: Pure JavaScript, no WebAssembly overhead. Ships ~32 KB gzipped and runs unchanged in Node, Bun, and the browser.
- Faithful: The parser based on SQLite's
parse.ygrammar file using a patched version of the Lemon parser generator to emit TypeScript code. - Helpful: Improved error messages, extending the canonical
near "X": syntax errorwording with source location, a list of terminals that would have been accepted, and a grammar-aware hint for common mistakes (unclosed groups with a pointer at the opener, trailing commas, keywords-used-as-identifiers, FILTER-before-OVER, etc.).
Usage
Parse SQL script with multiple statements, or return an error:
import { parse } from "sqlite3-parser"
const result = parse(`
INSERT INTO users VALUES (1, 'Douglas');
SELECT id, name FROM users WHERE active = 1
`)
// result is a union of ParseOk and ParseErr
// type ParseOk = { status: "ok"; root: CmdList; ... }
// type ParseErr = { status: "error"; errors: readonly ParseDiagnostic[]; ... }
if (result.status === "ok") {
// Root is a CmdListNode containing all top-level statements
const { cmds } = result.root
console.log(cmds.length) // -> 2
console.log(cmds[0].type) // -> InsertStmt
console.log(cmds[1].type) // -> SelectStmt
} else {
throw "expected ok"
}Parse single statement at a time, or return an error:
import { parseStmt } from "sqlite3-parser"
const result = parseStmt("SELECT id, name FROM users WHERE active = 1")
if (result.status === "ok") {
// Root is a StmtNode containing the first top-level statement
console.log(result.root.type) // -> SelectStmt
}
// By default, parseStmt rejects trailing content (second statements, garbage tokens)
const result2 = parseStmt("SELECT id, name FROM users WHERE active = 1; SELECT 1")
if (result2.status === "error") {
console.error(result2.errors.join("\n"))
}
// Use `allowTrailing: true` to incrementally parse a multi-statement script
function* parseEach(sql: string) {
while (sql) {
const result = parseStmt(sql, { allowTrailing: true })
yield result
if (result.status === "error") {
return
}
sql = sql.slice(result.tail)
}
}Parse or throw an error:
import { parseOrThrow, parseStmtOrThrow } from "sqlite3-parser"
const { root: cmds } = parseOrThrow("SELECT id, name FROM users WHERE active = 1; SELECT 1")
console.log(cmds.type) // -> CmdList
const { root: stmt } = parseStmtOrThrow("SELECT id, name FROM users WHERE active = 1")
console.log(stmt.type) // -> SelectStmtErrors
Parse failures are modeled as "diagnostics". These are not sub-classes of Error, so constructing them is cheap since no stack trace is captured.
type ParseDiagnostic = {
/** Error message */
readonly message: string
/** Location of the error */
readonly span: Span
/** Optional: the token that caused the error */
readonly token?: Token
/** Optional: additional hints about the error (eg, solutions) */
readonly hints?: readonly DiagnosticHint[]
/** Optional: the filename used in error messages, if provided at parse time */
readonly filename?: string
/** Format the diagnostic as a string with source code citations rendered as code blocks */
format(): string
/** Alias of {@link format} */
toString(): string
}
interface DiagnosticHint {
/** Hint message */
readonly message: string
/** Optional: the source location the hint is referring to */
readonly span?: Span
}parse and parseStmt return an array of structured diagnostics when status === "error". Because diagnostics implement toString, you can use errors.join("\n") to get a ready-to-print block with source code and carets.
const result = parse("SELECT FROM users")
if (result.status === "error") {
for (const err of result.errors) {
// access fields:
// err.message → "near \"FROM\": syntax error"
// err.span → { offset, length, line, col }
// err.token → the offending Token, when available
// err.hints → readonly { message, span? }[]
}
// Same as result.errors.map(e => e.format()).join("\n")
console.error(result.errors.join("\n")) // -v
// near "FROM": syntax error
//
// At 1:7:
// 1│ SELECT FROM users
// │ ^^^^
//
// hint: expected a result expression before FROM
} else {
throw "expected error"
}parseOrThrow and parseStmtOrThrow throw a Sqlite3ParserDiagnosticError with the diagnostics formatted as the error message.
import { parseOrThrow, Sqlite3ParserDiagnosticError } from "sqlite3-parser"
try {
parseOrThrow("SELECT FROM users")
} catch (e) {
if (e instanceof Sqlite3ParserDiagnosticError) {
console.error(e.errors.length) // -> 1
console.error(e.errors[0].message) // -> "near \"FROM\": syntax error"
console.error(e.message) // -v
// near "FROM": syntax error
//
// At 1:7:
// 1│ SELECT FROM users
// │ ^^^^
//
// hint: expected a result expression before FROM
} else {
throw e
}
}AST Viewer
Use the included sqlite3-parser command-line tool to quicly view AST of queries as JSON (default) or as S-expressions with --pretty:
$ sqlite3-parser 'select 1' --pretty
(CmdList
(SelectStmt
(Select
(SelectFrom
(ExprResultColumn
(NumericLiteral :value "1"))))))$ sqlite3-parser --pretty "SELECT 1"You can chain with the sqllogictest-parser tool to extract queries from a sqllogictest file for parsing:
$ sqllogictest-parser test/examples/expr.sqllogictest --sql --idx 4:5 |
sqlite3-parser --pretty
(CmdList
(SelectStmt
(Select
(SelectFrom
(ExprResultColumn
(FunctionCallExpr :distinctness "Distinct"
(Id :name "group_concat")
(Id :name "d")
(SortListFunctionCallOrder
(SortedColumn
(Id :name "a")))))
(FromClause
(TableSelectTable
(QualifiedName
(Name :text "t1")))))))
(SelectStmt
(Select
(SelectFrom
(ExprResultColumn
(FunctionCallExpr
(Id :name "percentile_cont")
(NumericLiteral :value "0.5")
(WithinGroupFunctionCallOrder
(Id :name "a"))))
(FromClause
(TableSelectTable
(QualifiedName
(Name :text "t1"))))))))There are many example queries in the sqlite3-parser test suite:
- SQL:
test/examples/expr.sqllogictest - Generated AST:
test/examples/__snapshots__/expr.generated.test.ts.snap
AST Traversal
traverse exposes an ESTree-style walker. A depth-first walk fires enter on every node, recurses into child slots in their surface-syntax order, then fires leave:
import { parseOrThrow, traverse } from "sqlite3-parser"
const { root } = parseOrThrow("SELECT a, b FROM t")
const tables: string[] = []
traverse(root, {
// All args optional.
enter(node, parent) {
// Runs before `nodes` handler on every node.
},
nodes: {
// Runs before visiting children.
TableSelectTable(node, _parent) {
// TableSelectTable.tblName → QualifiedName.objName → Name.text
tables.push(node.tblName.objName.text)
// From any handler:
// return "skip" to stop descending into the current node
// return "break" to halt the whole walk
// return "continue" / nothing to proceed
return "skip"
},
},
leave(node, parent) {
// Runs after visiting children.
},
keys: {
// Override traversal order & keys for a specific node type.
Select: ["with", "select", "compounds"],
},
})
console.log(tables) // -> ["t"]A few things worth knowing:
- Per-type handlers. Instead of a single
enterwith a big switch, passnodes: { SelectStmt(node) { … }, BinaryExpr(node, parent) { … }, … }. Each handler is strongly typed to that node type. - Visit control. Callbacks may return
"skip"to stop descending into the current node (but still fireleave),"break"to halt the whole walk, or nothing /"continue"to proceed. - Visitor keys.
VisitorKeysis the per-type list of child-bearing property names in traversal order. You can passkeys: { CaseExpr: ["base", "elseExpr"] }to override traversal for specific types per call. - Parent tracking. Callbacks receive
(node, parent)where parent is the enclosing AST node, orundefinedat the root.
The type AstNodeMap maps every type discriminator string to its interface. traverse uses that map to type per-type handlers; user code can too — e.g. type NodeOfType<T extends keyof AstNodeMap> = AstNodeMap[T].
Benchmarks
tl;dr:
~2.5x faster than liteparser (wasm)
~6x faster than @guanmingchiu/sqlparser-ts (wasm)
~10x faster than node-sql-parser
~100x faster than pgsql-ast-parser
~200x faster than sqlite-parser
~250x faster than @appland/sql-parser- Results are averages from
bun run bench:compareon one machine:- CPU: Apple M4 Max
- Runtime: Bun 1.3.11 (
arm64-darwin)
- The compared parsers do not produce the same AST shape.
- WebAssembly libraries pay some bridging overhead.
Cases
| Case | Description |
| -------- | ----------------------------------------------------------------------------------------------- |
| tiny | Single SELECT 1; statement. |
| small | Single CREATE TABLE users (...) statement. |
| medium | WITH query with joins, aggregate, window function, FILTER, grouping, ordering, and LIMIT. |
| large | Wide CREATE TABLE analytics_events (...) statement with 96 metric columns. |
| deep | Nested expressions with a subquery. |
| broken | Invalid input with a trailing comma before FROM; used for the syntax-error path. |
Results
Avg per-iteration parse time across the five inputs, compared to ours.
| Parser | tiny | small | medium | large | deep |
| ----------------------------------- | ------------------ | ------------------ | ------------------ | ------------------ | ----------------- |
| Ours | 0.93 µs | 3.12 µs | 19.90 µs | 61.15 µs | 9.80 µs |
| liteparser (wasm) | 2.05 µs (2.2×) | 4.92 µs (1.6×) | 49.88 µs (2.5×) | 86.14 µs (1.4×) | 31.71 µs (3.2×) |
| @guanmingchiu/sqlparser-ts (wasm) | 5.97 µs (6.4×) | 14.42 µs (4.6×) | 143.50 µs (7.2×) | 186.30 µs (3.0×) | 71.35 µs (7.3×) |
| node-sql-parser | 10.83 µs (12×) | 27.53 µs (8.8×) | 290.78 µs (15×) | 567.35 µs (9.3×) | 1.12 ms (114×) |
| pgsql-ast-parser | 53.05 µs (57×) | 61.80 µs (20×) | 2.73 ms (137×) | 925.68 µs (15×) | 1.05 ms (108×) |
| sqlite-parser | 472.37 µs (509×) | 614.04 µs (197×) | 5.59 ms (281×) | 7.09 ms (116×) | 3.30 ms (337×) |
| @appland/sql-parser | 561.18 µs (605×) | 715.16 µs (229×) | 6.81 ms (342×) | 7.99 ms (131×) | 3.73 ms (380×) |
Libraries compared:
- Ours - TypeScript parser derived from SQLite's parse.y grammar, uses Lemon LALR(1) parser generator to dump parser tables as JSON, which is used to drive a small TypeScript emitter.
@sqliteai/liteparser— C/WebAssembly parser extracted from the original SQLite parser (parse.y) with minimal modifications. It uses the same Lemon LALR(1) parser generator and a hand-written tokenizer derived from SQLite's own lexer.sqlite-parser— PEG.js-generated pure-JS parser for SQLite (Code School, 2015-2017).@appland/sql-parser— AppLand's maintained fork ofsqlite-parserwith additional grammar coverage.node-sql-parser— PEG.js-derived multi-dialect parser; run here withdatabase: "sqlite".pgsql-ast-parser— nearley + moo Postgres grammar.@guanmingchiu/sqlparser-ts— Rust/WebAssembly wrapper arounddatafusion-sqlparser-rs; run with its"sqlite"dialect.
Lemon JSON
Lemon is a LALR(1) parser generator similar to Yacc, written by SQLite's author, Dr. Richard Hipp. SQLite's own parser is generated with Lemon.
Our patched lemon.c can emit a JSON dump of the parser tables, making Lemon usable for any programming language.
- The dumped parse tables are described by a JSON Schema at generated/json-schema/v1/parser.dev.schema.json.
- Our parse.y grammar file: vendor/patched/3.53.0/src/parse.y.
- Example dumped parse tables: generated/3.53.0/parser.dev.json.
- Our TypeScript emitter script: scripts/emit-ts-parser.ts.
- The output TypeScript parser: generated/3.53.0/parse.ts.
To implement a SQLite parser in another language you need roughly:
- A lexer. Eg our
src/tokenize.ts, SQLite's src/tokenize.c — ~1000 LoC. - An LALR(1) driver. Eg our
src/lempar.ts, SQLite's tool/lempar.c — ~400 LoC. Consumes the tables inparser.dev.json, emits shift/reduce events. - parse.y, updated for your AST & language. Eg our parse.y,SQLite's parse.y. Start with one and replace action bodies with your own.
Contributing
├── src/ runtime (TypeScript)
│ ├── lempar.ts LALR(1) engine — 1:1 port of sqlite/tool/lempar.c
│ ├── parser.ts AST-building parser layered on lempar
│ ├── tokenize.ts lexer — port of sqlite/src/tokenize.c
│ ├── util.ts pure helpers from sqlite/src/util.c, string unquoting
│ ├── diagnostics.ts Diagnostics, hints
│ ├── errors.ts Error subclasses
│ ├── ast/ AST definitions + helpers
│ │ ├── nodes.ts every node interface, plus `AstNode` / `AstNodeMap`
│ │ ├── parseActions.ts node constructors invoked from grammar actions
│ │ ├── parseState.ts per-parse mutable state threaded to the reducer
│ │ └── traverse.ts ESTree-style AST walker + helpers
│ ├── cli/ shared runner used by the bin/ entry points
│ └── sqllogictest/ sqllogictest reader + test emitter (sqlite3-parser/sqllogictest)
├── generated/ codegen + tables (checked into git)
│ ├── current.ts re-export shim for the default version
│ ├── template/ per-version source templates (e.g. `index.ts`) consumed by codegen
│ ├── <version>/ one directory per tracked SQLite release
│ │ ├── index.ts per-version TS wrapper (codegen)
│ │ ├── parse.ts LALR tables + per-rule reducer (codegen)
│ │ ├── keywords.ts keyword table (codegen)
│ │ ├── parser.dev.json full grammar dump (debug/introspection)
│ │ └── keywords.{dev,prod}.json
│ └── json-schema/v1/ formal JSON Schemas for the dumps
├── vendor/ vendored SQLite sources + patches
│ ├── manifest.json which versions we track, their hashes & commits
│ ├── upstream/<version>/ pristine SQLite sources
│ ├── patched/<version>/ our patched lemon.c + mkkeywordhash.c + parse.y
│ └── README.md details on the vendor tree
├── test/ test suites (see Testing, below)
│ ├── examples/ curated SQL corpora (`.sqllogictest`) + generated AST snapshot tests
│ ├── sqlite/ tokenizer, lexer, literal, and keyword tests ported from SQLite
│ └── sqllogictest/ large external sqllogictest corpora (evidence, index, random)
├── scripts/ build, codegen, bench, and profiling tooling
├── bin/ standalone CLI entry points
└── Makefile pattern rules for the per-version build graphDevelopment commands:
| Command | Purpose |
| ------------------------ | ------------------------------------------------------------------------------------------------ |
| bun run test | Run the full test suite. |
| bun run typecheck | tsc --noEmit over source + tests + scripts. |
| bun run lint | Run oxlint. |
| bun run fmt | Run oxfmt (formatter). |
| bun run coverage | Run the suite with lcov coverage, then bun run coverage:rules for the grammar rule-hit report. |
| bun run build | Full build: build:gen then build:dist. |
| bun run build:gen | Codegen only — alias for make gen (regenerate everything under generated/). |
| bun run build:dist | Dist only — tsc -p tsconfig.build.json emits a 1:1 JS mirror of the source tree into dist/. |
| bun run bench | Run the internal benchmarks (tokenize, parse, error path). :node runs under Node. |
| bun run bench:compare | Compare performance against other libraries. :node runs under Node. |
| bun run bench:dist | Run bench against the built dist/ (rather than source). :compare / :node variants exist. |
| bun run prof | CPU-profile the parser hot loop. :node runs under Node. |
| bun run prof:report | Render the most recent bun run prof run into a text report. |
| bun run dist-test | Run the external-consumer smoke tests against the built dist/ (use :build to rebuild first). |
| bun run vendor <ref> | Onboard a new SQLite version (see below). |
| make gen | Regenerate all per-version outputs under generated/ (same as build:gen). |
| make generated/<ver>/… | Regenerate a specific output without re-running the whole vendor flow. |
| make help | List every pattern target Make knows about. |
Anything touching the vendored SQLite sources is explained in vendor/README.md.
Testing
bun run test runs the full suite. Tests are organized into three clusters:
test/examples/— curated SQL corpora (.sqllogictest) paired with generated AST snapshot tests. Good place to add regression coverage for a new grammar case.test/sqlite/— tokenizer, lexer, literal, and keyword tests ported from SQLite's own test vectors.test/sqllogictest/— large external sqllogictest corpora for broad acceptance coverage.
Adding new SQLite versions
bun run vendor 3.55.0This:
- Adds a submodule at
vendor/submodule/3.55.0/pointing at SQLite's GitHub mirror, checked out at tagversion-3.55.0. - Copies
tool/{lemon.c, mkkeywordhash.c, lempar.c}andsrc/parse.yintovendor/upstream/3.55.0/. - 3-way merges our patches against the most recent previous version's patched + upstream, writing the result to
vendor/patched/3.55.0/. On conflict the script leaves<<<<<<<markers in the patched file and exits non-zero; resolve by hand and re-run. - Rebuilds the tools (
build/lemon-3.55.0,build/mkkeywordhash-3.55.0) and runs them to producegenerated/3.55.0/{parse.ts, parser.dev.json, keywords.{dev,prod}.json}, validated against the JSON Schemas. - Regenerates
generated/3.55.0/index.tsandgenerated/current.ts. The new version becomes the current one invendor/manifest.json; old versions are preserved forever.
After the script reports success, review the merged patches, run bun test, and commit. The script never auto-commits — it's easy to bury a silent regression otherwise.
Flags: --no-submodule --from <path> bring your own SQLite source tree (useful for building against an unreleased development snapshot), --commit <sha> overrides the commit hash recorded in the manifest.
Releases
Every push to main is automatically published to npm by CI via GitHub OIDC trusted publishing.
- New
package.jsonversions publish as "latest" dist-tag; default fornpm install(stable) - Existing versions publish as
${version}-next.<sha>to the "next" dist-tag; usenpm install <pkg>@nextto install (preview) - See all published versions
Stability
- AST shape may change on minor versions before v1.0.0.
