@clickhouse/rowbinary
v0.1.1
Published
RowBinary building blocks for Node.js — read/decode ClickHouse RowBinary / RowBinaryWithNames(AndTypes) streams (a matching writer is planned). Ships with the clickhouse-js-node-rowbinary-parser agent skill.
Readme
ClickHouse Node.js RowBinary Parser Generator
If JS had a -O3 compiler flag, this skill would be it. (for RowBinary parsing)
A skill and a library that lets a coding agent generate bespoke RowBinary parsers on the first pass from the column type definitions of a ClickHouse response. The spirit behind the approach.
Reader only for now. Today this covers reading (decoding) RowBinary streams. A matching RowBinary writer (encoding) is planned.
Status
- ✅ Sonnet 4.6: 60% -> 94.0% pass rate
- ✅ Opus 4.8: 71% -> 94.7% pass rate
- ✅ Haiku 4.5: 52% -> 86.0% pass rate
- ✅ Composer 2.5 Fast: 3x parser performance
- ✅ 469/469 tests
- ✅ type-checked
- ✅ benchmarked
Example
Take a small orders result:
SELECT id, uid, price, status FROM orders
-- id UInt8
-- uid UUID
-- price Decimal64(2)
-- status Enum8('new' = 1, 'shipped' = 2, 'done' = 3)The API-only reader — what you write by composing the library's combinators. Correct, clear, and a fine default:
const readOrderRow: Reader<OrderRow> = (s) => ({
id: readUInt8(s),
uid: formatUUID(readUUID(s)),
price: readDecimal64(2)(s),
status: readEnum8(s),
});The optimized reader the skill generates — same row, monomorphized to
straight-line code. The whole row is fixed-width (1 + 16 + 8 + 1 = 26 bytes), so
the four separate bounds checks coalesce into one advance(s, 26) and every leaf
read happens at a constant offset; the per-field combinators are gone:
const readOrderRowFast: Reader<OrderRow> = (s) => {
const { buf, view } = s;
const o = advance(s, 26); // one bounds check for the whole 26-byte row
const id = buf[o]!;
const uid = formatUUIDTable(buf.subarray(o + 1, o + 17));
const price: DecimalValue = [view.getBigInt64(o + 17, true), 2];
const status = view.getInt8(o + 25);
return { id, uid, price, status };
};Same values, same streaming-safety — ~3.4x faster.
How to use
As a library (comes with the skill):
npm install @clickhouse/rowbinary
npx skills-npm setupAs a skill only:
npx skills add ClickHouse/clickhouse-js/skills/clickhouse-js-node-rowbinary-parser> Hey, Claude, tell me what the rowbinary parser skill can do for me.
> A lot! It generates custom, high-performance RowBinary parsers…
> Super, generate a parser for the queries in app/src/model.ts.
< Reading skill clickhouse-js-node-rowbinary-parser…Why it's worth it
Four pillars — speed, correctness, judgment, and lifting smaller models:
- ~2–3x faster code than the straightforward decoder. The skill emits
monomorphized, flattened, straight-line code — inlined reads, bounds checks
coalesced across adjacent fixed-width columns, the right array layout — measured
at ~1.3–3.4x over the same logic written with the plain combinator API
(
npm run bench). This is why- inlined JIT friendly code
- benchmarked hot paths
- minimal allocations
- v8 and Node.js specific optimizations
- Correct on the gotchas that otherwise quietly break. UUID byte
order,
Variant's sort-by-type-name discriminant,DateTime64sub-second precision, signed-high-word wide integers, faithful decimals,Dynamic/JSONself-description, transparent wrappers, opaqueAggregateFunction— each encoded with a live, server-verified test (details below). - Judgment, not just code. The skill carries the working knowledge to make
the right call before writing a line, so the agent neither over- nor
under-engineers:
- Is RowBinary even right? For string-heavy results read as text, a
JSON*format + V8's nativeJSON.parse(plusgzip/zstd) can beat a JS RowBinary decoder — reach for RowBinary when the data is numeric / wide-integer / binary-blob heavy. - Whole buffer or stream? Drop the
advance()bounds checks for a complete in-memory buffer (faster); keep them for a chunked HTTP response that must survive rows straddling chunk boundaries. - Drop the portability scaffolding. RowBinary is little-endian and the target is x86/ARM, so the skill steers away from big-endian / byte-swap "portability" code a cautious one-shot pass tends to add.
- Is RowBinary even right? For string-heavy results read as text, a
- Improves smaller models' performance. Because the skill hands over the hard-won answers up front, it lifts a weaker model the most. In a 24-eval with-skill vs no-skill benchmark, the skill raised Sonnet 4.6 from 60.4% to 94.0% (+34pp) — bringing it level with skill-equipped Opus 4.8 (94.7%), which itself gained +23pp (71.5% → 94.7%). Composer 2.5 Fast got a 3x parser performance boost, Haiku 4.5 raised from 52% to 86% — the skill closes most of the model-capability gap on this task.
What it does
Given the columns of a query result — their names and ClickHouse type
definitions (as returned by RowBinaryWithNamesAndTypes, or supplied by the
user) — the skill generates parser code tailored to exactly those types. Rather
than shipping a generic, runtime-driven decoder, it emits straight-line code
that reads each column in order, so the parser only contains the logic the
specific result shape needs.
Correctness on the gotcha-heavy types
For a plain UInt64, String, DateTime result a strong model already writes fast,
correct code on its own. The skill earns its keep on the long tail of RowBinary
traps — the encodings where a from-scratch decoder is quietly wrong — each one
captured here with a live, server-verified test:
- UUID — two little-endian
UInt64halves, each byte-reversed vs. the text form (not 16 bytes in order). Variant(...)— the 1-byte discriminant indexes the alternatives sorted by type name (ClickHouse globally sorts them), NOT declaration order;0xFFis NULL.DateTime64(P)— returned as[Date, nanoseconds]so the sub-second part isn't lost to aDate's millisecond resolution;Time/Time64are durations, not instants.- Wide integers —
Int128/Int256compose from 64-bit words with the high word read signed; 64-bit values staybigint, never a lossynumber. - Decimals — kept as the exact
[unscaled, scale]pair, not a lossy float. Dynamic/JSON— self-describing: a per-value binary type encoding, then the value; declared typedJSONpaths are written without a tag (need the schema). Wrappers are erased (Nullable/Variant→ concrete type).- Transparent wrappers —
LowCardinality(T)/SimpleAggregateFunction(f, T)decode as the innerT(no dictionary layer in RowBinary);Nested(...)isArray(Tuple(...))with no wire of its own. AggregateFunction(...)— opaque, unframed state: not decodable or even skippable; finalize server-side instead.FixedStringpreserves trailing NUL padding;Enumdecodes to the underlying int (the name map is metadata);BFloat16is the top 16 bits of aFloat32.
This is also where a raw model is most likely to go wrong. In a clean-room test
on a Variant / UUID / DateTime64 / LowCardinality schema, a no-skill
Sonnet produced a silently wrong UUID (treated the bytes as plain, missing
the two-reversed-halves layout), and a no-skill Opus got it right only after
three web searches. The skill hands over these answers up front — correct by
construction, no lookups. See baseline/README.md for the full control.
And the failure isn't a one-off — it's a coin-flip. Running the same no-skill
Sonnet on the orders schema (UInt8, UUID, Decimal64(2), Enum8) 5 times in
isolation, only 3 of 5 runs decoded correctly; both failures were the same
UUID byte-order scramble. Even the passing runs varied ~1.9x in generated-code
throughput. With the skill, every run is correct. So a single A/B undersells the
gap: from scratch the model is right roughly 60% of the time and silently wrong
the rest, while the skill makes correctness deterministic.
Examples
Six end-to-end examples live in EXAMPLES.md. Each ships both an API-combinator
reader and an optimized, monomorphized one, with a runnable round-trip test and
a benchmark — so the speedups below are measured, not claimed (Node 24 / V8;
npm run bench for your own numbers):
| Example | Columns | Optimized speedup |
| ----------------- | ---------------------------------------------------- | ------------------- |
| orders | UUID, Decimal64, Enum8 | ~3.4x |
| carts | nested Array(Tuple(...)), Array(Nullable(...)) | ~2.0x |
| telemetry | Map, Array(Float64), Nullable, named Tuple | ~1.4x |
| observability | Variant, DateTime64(3), LowCardinality, nested | ~1.4x |
| profiles | Array(String), Nullable(Int32) | ~1.3x |
| events | UInt64, String, DateTime scalars | ~1.05x — on par |
Two axes drive the win. Composite structure is one: monomorphization pays in
proportion to how many per-row combinator closures it removes (carts /
telemetry / observability). Per-row formatting is the other, independent
of composites: orders is all-scalar yet the biggest win (~3.4x), almost
entirely from swapping the BigInt UUID formatter for the lookup-table
formatUUIDTable. The genuinely flat case — a scalar row with no hot formatter
(events) — is on par, so the simpler API reader is the right call there.
Measure, don't assume.
Scope
- In scope:
RowBinary,RowBinaryWithNames, andRowBinaryWithNamesAndTypesdecoding for Node.js — full-buffer and streaming (chunked) viaadvance()/NeedMoreData,readRows(), and the asyncstreamRowBatches()(with a built-in small-chunk warning and the optionalcoalesceChunks()debounce filter). - Planned: RowBinary writing / encoding (the inverse of everything above)
- Out of scope (for now): browsers and Edge runtimes, non-RowBinary formats (JSON / CSV / TSV / Parquet), and big-endian hosts.
The spirit
A RowBinary parser generator is a narrow thing. But it's built as an instance of a broader bet about what libraries become once a capable LLM is part of the toolchain. Three shifts, each already visible in this repo:
Self-modifiable software. The library deliberately ships several equivalent decoders for the same type —
readUUID/readUUIDBigInt/readUUIDHiLo,formatUUID/formatUUIDTable,new Array(n)vs[]+push, streaming vs whole-buffer — because the fastest one depends on the workload, not the type. Today the agent picks at generation time from measured benchmarks. The next step is to pair the skill with a tracing layer that runs variant A against variant B on the live workload and keeps whichever wins for this data shape and access pattern — a parser that re-tunes itself as the traffic drifts, instead of freezing one author's guess into a release.Custom software. The value here isn't a fixed high-level API; it's the benchmarked building blocks plus the judgment to combine them. So the end user doesn't bend their code to the authors' generic surface — they have the agent assemble the high-level API they actually want, shaped to their queries, row shapes, and latency/memory budget. Two teams with different workloads grow two different libraries from the same primitives, and neither inherits a design decision that was only ever right for the original authors' use case.
Read-write libraries. For either of the above to be safe, the source has to be legible to an LLM, not merely runnable. So this repo is written read-write: every tradeoff is commented where it's made — the per-column ClickHouse type annotations, the
SAFE TO TOGGLEmarkers on the fast variants, each reader's doc comment carrying its exact monomorphized form. An LLM can read why a decision was made and change it in depth with confidence — not just call the public functions, but safely rework the internals.
The through-line: the last mile is glue the LLM writes over stable, benchmarked blocks, so the authors' job shrinks to exporting good primitives and documenting their tradeoffs honestly — rather than trying to bake the right performance constants for every possible workload into the library ahead of time.
