parsehouse
v0.1.1
Published
Pure TypeScript ClickHouse-oriented SQL parser with AST and serializer
Maintainers
Readme
parsehouse
Pure TypeScript ClickHouse SQL parser with a typed AST, tree traversal helpers, and canonical SQL serialization.
parsehouse is built for tools that need to parse SQL, inspect it, validate it, modify it, and turn it back into normalized SQL.
What it does
- Parse ClickHouse SQL into a typed AST
- Traverse the AST with discriminated unions and
kindchecks - Modify statements and expressions in memory
- Serialize AST nodes back to canonical SQL with
toSql() - Drop comments and normalize whitespace during serialization
Install
npm install parsehouseQuick Start
import {
ClickHouseDialect,
parseStatement,
toSql,
} from "parsehouse";
const dialect = new ClickHouseDialect();
const statement = parseStatement(
"SELECT user_id FROM events FINAL PREWHERE team_id = 1 LIMIT 10",
{ dialect },
);
if (statement.kind === "select_statement") {
const source = statement.from?.[0];
if (source?.kind === "table_reference" && source.final) {
console.log("query uses FINAL");
}
statement.limit = {
kind: "limit_clause",
limit: { kind: "literal", literalType: "number", value: 100 },
};
}
console.log(toSql(statement));
// SELECT user_id FROM events FINAL PREWHERE team_id = 1 LIMIT 100Basic API
import {
ClickHouseDialect,
parseExpr,
parseSql,
parseStatement,
toSql,
tokenize,
visit,
} from "parsehouse";parseStatement(sql, options?)parses one statement and throws if the input contains more than oneparseSql(sql, options?)parses one or more semicolon-separated statementsparseExpr(sql, options?)parses a standalone expressiontokenize(sql)returns lexer tokenstoSql(node)serializes a statement or expression back to canonical SQLvisit(node, visitor)walks the tree for analysis or collectionClickHouseDialectenables ClickHouse-specific syntax such asFINAL,PREWHERE,LIMIT BY,SETTINGS, and projection operations
Typical Workflow
1. Parse
import { ClickHouseDialect, parseStatement } from "parsehouse";
const stmt = parseStatement(
"SELECT * FROM metrics FINAL PREWHERE team_id = 42 WHERE ts >= now() - INTERVAL 1 HOUR",
{ dialect: new ClickHouseDialect() },
);2. Inspect
if (stmt.kind === "select_statement") {
console.log(stmt.prewhere);
console.log(stmt.where);
console.log(stmt.settings);
}3. Validate
if (stmt.kind === "select_statement") {
const source = stmt.from?.[0];
if (source?.kind === "table_reference" && source.final) {
throw new Error("FINAL is not allowed for user-facing queries");
}
if (!stmt.limit) {
throw new Error("Queries must include LIMIT");
}
}4. Modify
if (stmt.kind === "select_statement") {
stmt.limit = {
kind: "limit_clause",
limit: { kind: "literal", literalType: "number", value: 500 },
};
}5. Serialize
import { toSql } from "parsehouse";
const sql = toSql(stmt);Subquery Example
Subqueries can appear both in FROM (...) and inside expressions.
Subquery in FROM
import { ClickHouseDialect, parseStatement } from "parsehouse";
const stmt = parseStatement(
`
SELECT user_id, cnt
FROM (
SELECT user_id, count() AS cnt
FROM events
GROUP BY user_id
) AS aggregated
WHERE cnt > 10
ORDER BY cnt DESC
`,
{ dialect: new ClickHouseDialect() },
);
if (stmt.kind === "select_statement") {
const source = stmt.from?.[0];
if (source?.kind === "subquery_source" && source.query.kind === "select_statement") {
console.log(source.query.projection);
console.log(source.query.groupBy);
}
}Scalar subquery in an expression
import { ClickHouseDialect, parseStatement } from "parsehouse";
const stmt = parseStatement(
`
SELECT
user_id,
(SELECT avg(score) FROM scores) AS average_score
FROM users
`,
{ dialect: new ClickHouseDialect() },
);
if (stmt.kind === "select_statement") {
const firstItem = stmt.projection[1];
if (firstItem?.kind === "expression_select_item" && firstItem.expression.kind === "subquery_expr") {
console.log(firstItem.expression.query);
}
}AST Overview
The AST is a set of discriminated unions keyed by kind.
Statements
Top-level nodes are Statement values.
select_statementcreate_table_statementcreate_view_statementinsert_statementalter_table_statementoptimize_table_statementuse_statementdescribe_statementexplain_statementkill_statementraw_statement
Expressions
Expressions are Expr values.
identifier_exprliteralbinary_exprunary_exprfunction_call_exprsubscript_exprfield_access_exprarray_exprtuple_exprdictionary_exprsubquery_exprinterval_exprwildcard_exprraw_expr
Naming Nodes
identifierrepresents a single identifier likeuser_idobject_namerepresents dotted names likedb.events
Query Nodes
select_statement is the most important node for validation and rewriting.
Important fields:
projection: selected expressions or wildcardsfrom: table references, subqueries, or table functionssample: ClickHouseSAMPLEclauseprewhere: ClickHousePREWHEREwhere: regular filtergroupBy: grouping expressionshaving: post-aggregation filterorderBy:ORDER BYitems, includingWITH FILLinterpolate: ClickHouseINTERPOLATElimit: regularLIMIT,LIMIT BY, andWITH TIESsettings: ClickHouse query settingsformat: output format identifier
Example:
if (stmt.kind === "select_statement") {
console.log(stmt.projection);
console.log(stmt.from);
console.log(stmt.prewhere);
console.log(stmt.limit);
}Table Sources
The from array contains FromSource nodes.
table_referencefor normal tablesfunction_table_sourcefor table functions likenumbers(10)ors3(...)subquery_sourceforFROM (...)
table_reference includes ClickHouse FINAL directly:
if (stmt.kind === "select_statement") {
const source = stmt.from?.[0];
if (source?.kind === "table_reference") {
console.log(source.name);
console.log(source.final);
}
}Function Calls
function_call_expr models both normal and ClickHouse parametric functions.
name: function name asobject_nameargs: normal call argumentsparameters: parametric function parameters, if presentsettings: function-localSETTINGS, if present
Example:
import { ClickHouseDialect, parseExpr } from "parsehouse";
const expr = parseExpr("HISTOGRAM(0.5, 0.9)(value)", {
dialect: new ClickHouseDialect(),
});
if (expr.kind === "function_call_expr") {
console.log(expr.parameters);
console.log(expr.args);
}DDL Nodes
create_table_statement includes the main ClickHouse table-shape pieces:
namecolumnsengineprimaryKeyorderByasSelect
Each column_definition contains:
namedataTypeoptions
data_type captures nested type structures such as Nullable(String), Map(String, UInt64), Tuple(...), and AggregateFunction(...).
ALTER TABLE Operations
alter_table_statement contains typed operations in operations.
Supported ClickHouse-oriented operation nodes include:
attach_partition_operationdetach_partition_operationadd_projection_operationdrop_projection_operationclear_projection_operationmaterialize_projection_operationfreeze_partition_operationunfreeze_partition_operationraw_alter_operationas a fallback for unsupported forms
Example:
import { ClickHouseDialect, parseStatement } from "parsehouse";
const stmt = parseStatement(
"ALTER TABLE events ADD PROJECTION p0 (SELECT user_id, count() GROUP BY user_id)",
{ dialect: new ClickHouseDialect() },
);
if (stmt.kind === "alter_table_statement") {
for (const op of stmt.operations) {
console.log(op.kind);
}
}Traversal
Use visit() to inspect the tree without writing your own walker.
import { visit } from "parsehouse";
const identifiers: string[] = [];
const functions: string[] = [];
visit(stmt, {
enter(node) {
if (node.kind === "identifier_expr") {
identifiers.push(node.name.kind === "identifier" ? node.name.name : node.name.parts.map((part) => part.name).join("."));
}
if (node.kind === "function_call_expr") {
functions.push(node.name.parts.map((part) => part.name).join("."));
}
},
});Validation Patterns
parsehouse is a syntax parser and AST toolkit. It does not try to enforce your business rules.
Typical validations you can build on top of the AST:
- require a tenant filter in
WHEREorPREWHERE - reject
FINALfor latency-sensitive queries - require
LIMITfor interactive queries - ban mutation statements like
ALTER TABLE ... UPDATE - restrict access to specific tables or databases
- validate allowed functions or settings
Serialization Notes
toSql()produces normalized SQL, not source-faithful SQL- comments are dropped
- whitespace is normalized
- output may differ textually from the original even when the AST meaning is preserved
Example: Validate, Rewrite, Serialize
import {
ClickHouseDialect,
parseStatement,
toSql,
visit,
} from "parsehouse";
const stmt = parseStatement(
"SELECT user_id, count() FROM events FINAL WHERE ts >= now() - INTERVAL 1 DAY GROUP BY user_id",
{ dialect: new ClickHouseDialect() },
);
if (stmt.kind !== "select_statement") {
throw new Error("expected SELECT");
}
const source = stmt.from?.[0];
if (source?.kind === "table_reference" && source.final) {
source.final = false;
}
stmt.limit = {
kind: "limit_clause",
limit: { kind: "literal", literalType: "number", value: 100 },
};
const usedFunctions: string[] = [];
visit(stmt, {
enter(node) {
if (node.kind === "function_call_expr") {
usedFunctions.push(node.name.parts.map((part) => part.name).join("."));
}
},
});
console.log(usedFunctions);
console.log(toSql(stmt));Current Coverage
- ClickHouse-focused parser surface
- typed AST for common statements, expressions, DDL, and major ClickHouse operations
- canonical serializer via
toSql() - explicit Vitest coverage for upstream ClickHouse cases and a generated 150-query real-world corpus
Development
npm run checkruns the TypeScript type-checknpm testruns the Vitest test suitenpm run buildbuilds the package withtsupnpm run benchruns parse and generate benchmarks across five representative query sizes sampled from the test corpus
Non-Goals
- preserving original comments
- preserving original whitespace
- full semantic validation against a running ClickHouse server
- exact source formatting round-trips
License
Apache-2.0
