sqlite3-parser

v0.7.1

Published

7 days ago

Parse SQLite query syntax using a LALR(1) parser derived from SQLite's parse.y grammar. 1.5x-200x faster than other JavaScript SQL parsers.

0High
0Medium
0Low

jitl

sql sqlite parser ast lalr syntax tokenizer sqlite3 libsql turso lemon

sqlite3-parser

Parse SQLite query syntax.

Fast: 2x-200x faster than other JavaScript SQL parsers, see benchmarks.
Light: Pure JavaScript, no WebAssembly overhead. Ships ~32 KB gzipped and runs unchanged in Node, Bun, and the browser.
Faithful: The parser based on SQLite's parse.y grammar file using a patched version of the Lemon parser generator to emit TypeScript code.
Helpful: Improved error messages, extending the canonical near "X": syntax error wording with source location, a list of terminals that would have been accepted, and a grammar-aware hint for common mistakes (unclosed groups with a pointer at the opener, trailing commas, keywords-used-as-identifiers, FILTER-before-OVER, etc.).

Usage

Parse SQL script with multiple statements, or return an error:

import { parse } from "sqlite3-parser"

const result = parse(`
  INSERT INTO users VALUES (1, 'Douglas');
  SELECT id, name FROM users WHERE active = 1
`)

// result is a union of ParseOk and ParseErr
// type ParseOk = { status: "ok"; root: CmdList; ... }
// type ParseErr = { status: "error"; errors: readonly ParseDiagnostic[]; ... }
if (result.status === "ok") {
  // Root is a CmdListNode containing all top-level statements
  const { cmds } = result.root
  console.log(cmds.length) // -> 2
  console.log(cmds[0].type) // -> InsertStmt
  console.log(cmds[1].type) // -> SelectStmt
} else {
  throw "expected ok"
}

Parse single statement at a time, or return an error:

import { parseStmt } from "sqlite3-parser"

const result = parseStmt("SELECT id, name FROM users WHERE active = 1")
if (result.status === "ok") {
  // Root is a StmtNode containing the first top-level statement
  console.log(result.root.type) // -> SelectStmt
}

// By default, parseStmt rejects trailing content (second statements, garbage tokens)
const result2 = parseStmt("SELECT id, name FROM users WHERE active = 1; SELECT 1")
if (result2.status === "error") {
  console.error(result2.errors.join("\n"))
}

// Use `allowTrailing: true` to incrementally parse a multi-statement script
function* parseEach(sql: string) {
  while (sql) {
    const result = parseStmt(sql, { allowTrailing: true })
    yield result
    if (result.status === "error") {
      return
    }
    sql = sql.slice(result.tail)
  }
}

Parse or throw an error:

import { parseOrThrow, parseStmtOrThrow } from "sqlite3-parser"

const { root: cmds } = parseOrThrow("SELECT id, name FROM users WHERE active = 1; SELECT 1")
console.log(cmds.type) // -> CmdList
const { root: stmt } = parseStmtOrThrow("SELECT id, name FROM users WHERE active = 1")
console.log(stmt.type) // -> SelectStmt

Errors

Parse failures are modeled as "diagnostics". These are not sub-classes of Error, so constructing them is cheap since no stack trace is captured.

type ParseDiagnostic = {
  /** Error message */
  readonly message: string
  /** Location of the error */
  readonly span: Span
  /** Optional: the token that caused the error */
  readonly token?: Token
  /** Optional: additional hints about the error (eg, solutions)  */
  readonly hints?: readonly DiagnosticHint[]
  /** Optional: the filename used in error messages, if provided at parse time */
  readonly filename?: string
  /** Format the diagnostic as a string with source code citations rendered as code blocks */
  format(): string
  /** Alias of {@link format} */
  toString(): string
}

interface DiagnosticHint {
  /** Hint message */
  readonly message: string
  /** Optional: the source location the hint is referring to */
  readonly span?: Span
}

parse and parseStmt return an array of structured diagnostics when status === "error". Because diagnostics implement toString, you can use errors.join("\n") to get a ready-to-print block with source code and carets.

const result = parse("SELECT FROM users")
if (result.status === "error") {
  for (const err of result.errors) {
    // access fields:
    //   err.message    → "near \"FROM\": syntax error"
    //   err.span       → { offset, length, line, col }
    //   err.token      → the offending Token, when available
    //   err.hints      → readonly { message, span? }[]
  }

  // Same as result.errors.map(e => e.format()).join("\n")
  console.error(result.errors.join("\n")) // -v
  // near "FROM": syntax error
  //
  // At 1:7:
  //    1│ SELECT FROM users
  //     │        ^^^^
  //
  //   hint: expected a result expression before FROM
} else {
  throw "expected error"
}

parseOrThrow and parseStmtOrThrow throw a Sqlite3ParserDiagnosticError with the diagnostics formatted as the error message.

import { parseOrThrow, Sqlite3ParserDiagnosticError } from "sqlite3-parser"
try {
  parseOrThrow("SELECT FROM users")
} catch (e) {
  if (e instanceof Sqlite3ParserDiagnosticError) {
    console.error(e.errors.length) // -> 1
    console.error(e.errors[0].message) // -> "near \"FROM\": syntax error"
    console.error(e.message) // -v
    // near "FROM": syntax error
    //
    // At 1:7:
    //    1│ SELECT FROM users
    //     │        ^^^^
    //
    //   hint: expected a result expression before FROM
  } else {
    throw e
  }
}

AST Viewer

Use the included sqlite3-parser command-line tool to quicly view AST of queries as JSON (default) or as S-expressions with --pretty:

$ sqlite3-parser 'select 1' --pretty
(CmdList
  (SelectStmt
    (Select
      (SelectFrom
        (ExprResultColumn
          (NumericLiteral :value "1"))))))$ sqlite3-parser --pretty "SELECT 1"

You can chain with the sqllogictest-parser tool to extract queries from a sqllogictest file for parsing:

$ sqllogictest-parser test/examples/expr.sqllogictest --sql --idx 4:5 |
    sqlite3-parser --pretty
(CmdList
  (SelectStmt
    (Select
      (SelectFrom
        (ExprResultColumn
          (FunctionCallExpr :distinctness "Distinct"
            (Id :name "group_concat")
            (Id :name "d")
            (SortListFunctionCallOrder
              (SortedColumn
                (Id :name "a")))))
        (FromClause
          (TableSelectTable
            (QualifiedName
              (Name :text "t1")))))))
  (SelectStmt
    (Select
      (SelectFrom
        (ExprResultColumn
          (FunctionCallExpr
            (Id :name "percentile_cont")
            (NumericLiteral :value "0.5")
            (WithinGroupFunctionCallOrder
              (Id :name "a"))))
        (FromClause
          (TableSelectTable
            (QualifiedName
              (Name :text "t1"))))))))

There are many example queries in the sqlite3-parser test suite:

SQL: test/examples/expr.sqllogictest
Generated AST: test/examples/__snapshots__/expr.generated.test.ts.snap

AST Traversal

traverse exposes an ESTree-style walker. A depth-first walk fires enter on every node, recurses into child slots in their surface-syntax order, then fires leave:

import { parseOrThrow, traverse } from "sqlite3-parser"

const { root } = parseOrThrow("SELECT a, b FROM t")
const tables: string[] = []

traverse(root, {
  // All args optional.
  enter(node, parent) {
    // Runs before `nodes` handler on every node.
  },
  nodes: {
    // Runs before visiting children.
    TableSelectTable(node, _parent) {
      // TableSelectTable.tblName → QualifiedName.objName → Name.text
      tables.push(node.tblName.objName.text)
      // From any handler:
      //   return "skip" to stop descending into the current node
      //   return "break" to halt the whole walk
      //   return "continue" / nothing to proceed
      return "skip"
    },
  },
  leave(node, parent) {
    // Runs after visiting children.
  },
  keys: {
    // Override traversal order & keys for a specific node type.
    Select: ["with", "select", "compounds"],
  },
})

console.log(tables) // -> ["t"]

A few things worth knowing:

Per-type handlers. Instead of a single enter with a big switch, pass nodes: { SelectStmt(node) { … }, BinaryExpr(node, parent) { … }, … }. Each handler is strongly typed to that node type.
Visit control. Callbacks may return "skip" to stop descending into the current node (but still fire leave), "break" to halt the whole walk, or nothing / "continue" to proceed.
Visitor keys. VisitorKeys is the per-type list of child-bearing property names in traversal order. You can pass keys: { CaseExpr: ["base", "elseExpr"] } to override traversal for specific types per call.
Parent tracking. Callbacks receive (node, parent) where parent is the enclosing AST node, or undefined at the root.

The type AstNodeMap maps every type discriminator string to its interface. traverse uses that map to type per-type handlers; user code can too — e.g. type NodeOfType<T extends keyof AstNodeMap> = AstNodeMap[T].

Benchmarks

tl;dr:

    ~2.5x faster than liteparser (wasm)
    ~6x   faster than @guanmingchiu/sqlparser-ts (wasm)
   ~10x   faster than node-sql-parser
  ~100x   faster than pgsql-ast-parser
  ~200x   faster than sqlite-parser
  ~250x   faster than @appland/sql-parser

Results are averages from bun run bench:compare on one machine:
- CPU: Apple M4 Max
- Runtime: Bun 1.3.11 (arm64-darwin)
The compared parsers do not produce the same AST shape.
WebAssembly libraries pay some bridging overhead.

Cases

| Case | Description | | -------- | ----------------------------------------------------------------------------------------------- | | tiny | Single SELECT 1; statement. | | small | Single CREATE TABLE users (...) statement. | | medium | WITH query with joins, aggregate, window function, FILTER, grouping, ordering, and LIMIT. | | large | Wide CREATE TABLE analytics_events (...) statement with 96 metric columns. | | deep | Nested expressions with a subquery. | | broken | Invalid input with a trailing comma before FROM; used for the syntax-error path. |

Results

Avg per-iteration parse time across the five inputs, compared to ours.

| Parser | tiny | small | medium | large | deep | | ----------------------------------- | ------------------ | ------------------ | ------------------ | ------------------ | ----------------- | | Ours | 0.93 µs | 3.12 µs | 19.90 µs | 61.15 µs | 9.80 µs | | liteparser (wasm) | 2.05 µs (2.2×) | 4.92 µs (1.6×) | 49.88 µs (2.5×) | 86.14 µs (1.4×) | 31.71 µs (3.2×) | | @guanmingchiu/sqlparser-ts (wasm) | 5.97 µs (6.4×) | 14.42 µs (4.6×) | 143.50 µs (7.2×) | 186.30 µs (3.0×) | 71.35 µs (7.3×) | | node-sql-parser | 10.83 µs (12×) | 27.53 µs (8.8×) | 290.78 µs (15×) | 567.35 µs (9.3×) | 1.12 ms (114×) | | pgsql-ast-parser | 53.05 µs (57×) | 61.80 µs (20×) | 2.73 ms (137×) | 925.68 µs (15×) | 1.05 ms (108×) | | sqlite-parser | 472.37 µs (509×) | 614.04 µs (197×) | 5.59 ms (281×) | 7.09 ms (116×) | 3.30 ms (337×) | | @appland/sql-parser | 561.18 µs (605×) | 715.16 µs (229×) | 6.81 ms (342×) | 7.99 ms (131×) | 3.73 ms (380×) |

Libraries compared:

Ours - TypeScript parser derived from SQLite's parse.y grammar, uses Lemon LALR(1) parser generator to dump parser tables as JSON, which is used to drive a small TypeScript emitter.
@sqliteai/liteparser — C/WebAssembly parser extracted from the original SQLite parser (parse.y) with minimal modifications. It uses the same Lemon LALR(1) parser generator and a hand-written tokenizer derived from SQLite's own lexer.
sqlite-parser — PEG.js-generated pure-JS parser for SQLite (Code School, 2015-2017).
@appland/sql-parser — AppLand's maintained fork of sqlite-parser with additional grammar coverage.
node-sql-parser — PEG.js-derived multi-dialect parser; run here with database: "sqlite".
pgsql-ast-parser — nearley + moo Postgres grammar.
@guanmingchiu/sqlparser-ts — Rust/WebAssembly wrapper around datafusion-sqlparser-rs; run with its "sqlite" dialect.

Lemon JSON

Lemon is a LALR(1) parser generator similar to Yacc, written by SQLite's author, Dr. Richard Hipp. SQLite's own parser is generated with Lemon.

Our patched lemon.c can emit a JSON dump of the parser tables, making Lemon usable for any programming language.

The dumped parse tables are described by a JSON Schema at generated/json-schema/v1/parser.dev.schema.json.
Our parse.y grammar file: vendor/patched/3.53.0/src/parse.y.
Example dumped parse tables: generated/3.53.0/parser.dev.json.
Our TypeScript emitter script: scripts/emit-ts-parser.ts.
The output TypeScript parser: generated/3.53.0/parse.ts.

To implement a SQLite parser in another language you need roughly:

A lexer. Eg our src/tokenize.ts, SQLite's src/tokenize.c — ~1000 LoC.
An LALR(1) driver. Eg our src/lempar.ts, SQLite's tool/lempar.c — ~400 LoC. Consumes the tables in parser.dev.json, emits shift/reduce events.
parse.y, updated for your AST & language. Eg our parse.y,SQLite's parse.y. Start with one and replace action bodies with your own.

Contributing

├── src/                        runtime (TypeScript)
│   ├── lempar.ts               LALR(1) engine — 1:1 port of sqlite/tool/lempar.c
│   ├── parser.ts               AST-building parser layered on lempar
│   ├── tokenize.ts             lexer — port of sqlite/src/tokenize.c
│   ├── util.ts                 pure helpers from sqlite/src/util.c, string unquoting
│   ├── diagnostics.ts          Diagnostics, hints
│   ├── errors.ts               Error subclasses
│   ├── ast/                    AST definitions + helpers
│   │   ├── nodes.ts            every node interface, plus `AstNode` / `AstNodeMap`
│   │   ├── parseActions.ts     node constructors invoked from grammar actions
│   │   ├── parseState.ts       per-parse mutable state threaded to the reducer
│   │   └── traverse.ts         ESTree-style AST walker + helpers
│   ├── cli/                    shared runner used by the bin/ entry points
│   └── sqllogictest/           sqllogictest reader + test emitter (sqlite3-parser/sqllogictest)
├── generated/                  codegen + tables (checked into git)
│   ├── current.ts              re-export shim for the default version
│   ├── template/               per-version source templates (e.g. `index.ts`) consumed by codegen
│   ├── <version>/              one directory per tracked SQLite release
│   │   ├── index.ts            per-version TS wrapper (codegen)
│   │   ├── parse.ts            LALR tables + per-rule reducer (codegen)
│   │   ├── keywords.ts         keyword table (codegen)
│   │   ├── parser.dev.json     full grammar dump (debug/introspection)
│   │   └── keywords.{dev,prod}.json
│   └── json-schema/v1/         formal JSON Schemas for the dumps
├── vendor/                     vendored SQLite sources + patches
│   ├── manifest.json           which versions we track, their hashes & commits
│   ├── upstream/<version>/     pristine SQLite sources
│   ├── patched/<version>/      our patched lemon.c + mkkeywordhash.c + parse.y
│   └── README.md               details on the vendor tree
├── test/                       test suites (see Testing, below)
│   ├── examples/               curated SQL corpora (`.sqllogictest`) + generated AST snapshot tests
│   ├── sqlite/                 tokenizer, lexer, literal, and keyword tests ported from SQLite
│   └── sqllogictest/           large external sqllogictest corpora (evidence, index, random)
├── scripts/                    build, codegen, bench, and profiling tooling
├── bin/                        standalone CLI entry points
└── Makefile                    pattern rules for the per-version build graph

Development commands:

| Command | Purpose | ------------------------ | bun run test | bun run typecheck | bun run lint | bun run fmt | bun run coverage | bun run build | bun run build:gen | bun run build:dist | bun run bench | bun run bench:compare | bun run bench:dist | bun run prof | bun run prof:report | bun run dist-test | bun run vendor <ref> | make gen | make generated/<ver>/… | make help | | ------------------------------------------------------------------------------------------------ | | Run the full test suite. | | tsc --noEmit over source + tests + scripts. | | Run oxlint. | | Run oxfmt (formatter). | | Run the suite with lcov coverage, then bun run coverage:rules for the grammar rule-hit report. | | Full build: build:gen then build:dist. | | Codegen only — alias for make gen (regenerate everything under generated/). | | Dist only — tsc -p tsconfig.build.json emits a 1:1 JS mirror of the source tree into dist/. | | Run the internal benchmarks (tokenize, parse, error path). :node runs under Node. | | Compare performance against other libraries. :node runs under Node. | | Run bench against the built dist/ (rather than source). :compare / :node variants exist. | | CPU-profile the parser hot loop. :node runs under Node. | | Render the most recent bun run prof run into a text report. | | Run the external-consumer smoke tests against the built dist/ (use :build to rebuild first). | | Onboard a new SQLite version (see below). | | Regenerate all per-version outputs under generated/ (same as build:gen). | | Regenerate a specific output without re-running the whole vendor flow. | | List every pattern target Make knows about. |

Anything touching the vendored SQLite sources is explained in vendor/README.md.

Testing

bun run test runs the full suite. Tests are organized into three clusters:

test/examples/ — curated SQL corpora (.sqllogictest) paired with generated AST snapshot tests. Good place to add regression coverage for a new grammar case.
test/sqlite/ — tokenizer, lexer, literal, and keyword tests ported from SQLite's own test vectors.
test/sqllogictest/ — large external sqllogictest corpora for broad acceptance coverage.

Adding new SQLite versions

bun run vendor 3.55.0

This:

Adds a submodule at vendor/submodule/3.55.0/ pointing at SQLite's GitHub mirror, checked out at tag version-3.55.0.
Copies tool/{lemon.c, mkkeywordhash.c, lempar.c} and src/parse.y into vendor/upstream/3.55.0/.
3-way merges our patches against the most recent previous version's patched + upstream, writing the result to vendor/patched/3.55.0/. On conflict the script leaves <<<<<<< markers in the patched file and exits non-zero; resolve by hand and re-run.
Rebuilds the tools (build/lemon-3.55.0, build/mkkeywordhash-3.55.0) and runs them to produce generated/3.55.0/{parse.ts, parser.dev.json, keywords.{dev,prod}.json}, validated against the JSON Schemas.
Regenerates generated/3.55.0/index.ts and generated/current.ts. The new version becomes the current one in vendor/manifest.json; old versions are preserved forever.

After the script reports success, review the merged patches, run bun test, and commit. The script never auto-commits — it's easy to bury a silent regression otherwise.

Flags: --no-submodule --from <path> bring your own SQLite source tree (useful for building against an unreleased development snapshot), --commit <sha> overrides the commit hash recorded in the manifest.

Releases

Every push to main is automatically published to npm by CI via GitHub OIDC trusted publishing.

New package.json versions publish as "latest" dist-tag; default for npm install (stable)
Existing versions publish as ${version}-next.<sha> to the "next" dist-tag; use npm install <pkg>@next to install (preview)
See all published versions

Stability

AST shape may change on minor versions before v1.0.0.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

sqlite3-parser

Usage

Errors

AST Viewer

AST Traversal

Benchmarks

Cases

Results

Lemon JSON

Contributing

Testing

Adding new SQLite versions

Releases

Stability