@zakkster/lite-bake

v1.0.1

Published

a month ago

Compile JSON arrays into flat interleaved binary for zero-GC, L1-cache-friendly reads. Stop parsing JSON in your game loop.

Stop parsing JSON in your game loop. Compile your massive JSON configs into flat, interleaved binary arrays for zero-GC, L1-cache-friendly memory access.

npm install @zakkster/lite-bake

The problem

You build a tilemap in Tiled, export 50,000 enemy spawn points from a level editor, or ship a config with 5,000 item definitions. You JSON.parse() the file, and now:

- You have 50,000 tiny objects on the heap. Each one has a hidden class, a map pointer, and 5–10 slots of V8 overhead.
- Every iteration of level.spawns[i].x chases pointers through scattered memory, which is bad for the CPU cache.
- The first few frames after load are janky as the GC decides what survives.
- Accessing a nested level.layers[0].data[i] in your physics loop? You've already lost.

The fix

bake() takes your array of records and produces a single ArrayBuffer with one fixed-width binary row per record. You read it back through raw typed-array indexing — no method calls, no property lookups, no allocations, no GC pressure.

graph LR
    A[JSON file] -->|JSON.parse| B[Array of objects]
    B -->|bake| C[ArrayBuffer]
    C -->|new Reader| D[Typed array views]
    D -->|f32 i * stride + offset| E[Hot loop<br/>zero GC]

    style A fill:#f9f5e7,stroke:#333,color:#000
    style B fill:#f4cccc,stroke:#333,color:#000
    style C fill:#d9ead3,stroke:#333,color:#000
    style D fill:#d9ead3,stroke:#333,color:#000
    style E fill:#b6d7a8,stroke:#333,color:#000

30-second example

import { bake, Reader, Types } from '@zakkster/lite-bake';

const spawnPoints = [
  { x: 100, y: 200, type: 0, hp: 50 },
  { x: 340, y: 180, type: 1, hp: 80 },
  // ... 49,998 more
];

// Once at load time:
const baked = bake(spawnPoints, {
  schema: { x: Types.F32, y: Types.F32 }      // force F32 for pixel-accurate coords
});
const r = new Reader(baked);

// Cache offsets once:
const f32 = r.f32, u8 = r.u8;
const s32 = r.strideF32, sB = r.stride;
const OFF_X    = r.offsetF32('x');
const OFF_Y    = r.offsetF32('y');
const OFF_TYPE = r.offsetU8('type');
const OFF_HP   = r.offsetU8('hp');

// Hot loop — ZERO allocations, ZERO GC pressure:
for (let i = 0; i < r.count; i++) {
  const base32 = i * s32, baseB = i * sB;
  const x    = f32[base32 + OFF_X];
  const y    = f32[base32 + OFF_Y];
  const type = u8 [baseB  + OFF_TYPE];
  const hp   = u8 [baseB  + OFF_HP];
  // ...spawn, update, render...
}

Memory layout — the whole point

Before: JS object graph

graph TD
    ARR[Array header]
    ARR --> O0[Record 0 header]
    ARR --> O1[Record 1 header]
    ARR --> O2[Record 2 header]
    O0 --> X0[x: Number]
    O0 --> Y0[y: Number]
    O0 --> T0[type: Number]
    O1 --> X1[x: Number]
    O1 --> Y1[y: Number]
    O1 --> T1[type: Number]
    O2 --> X2[x: Number]
    O2 --> Y2[y: Number]
    O2 --> T2[type: Number]

    style ARR fill:#f4cccc,stroke:#333,color:#000
    style O0 fill:#fce5cd,stroke:#333,color:#000
    style O1 fill:#fce5cd,stroke:#333,color:#000
    style O2 fill:#fce5cd,stroke:#333,color:#000

Each object is a separate heap allocation, and its fields can point to further heap values. Records end up scattered, so reading one does nothing to bring the next into cache.

After: one contiguous ArrayBuffer

graph LR
    subgraph "ArrayBuffer (single allocation)"
      R0["[x0][y0][t0]"]
      R1["[x1][y1][t1]"]
      R2["[x2][y2][t2]"]
      R3["[x3][y3][t3]"]
      R4["..."]
    end

    style R0 fill:#b6d7a8,stroke:#333,color:#000
    style R1 fill:#b6d7a8,stroke:#333,color:#000
    style R2 fill:#b6d7a8,stroke:#333,color:#000
    style R3 fill:#b6d7a8,stroke:#333,color:#000
    style R4 fill:#d9ead3,stroke:#333,color:#000

Records are laid out back-to-back at known byte offsets. Cache lines are typically 64 bytes, so with a small stride record i+1 often sits on the same line you just loaded for record i and is already in L1 cache.

How it compares

| Feature | JSON.parse | lite-bake | FlatBuffers | Protobuf | MessagePack |
|---|---|---|---|---|---|
| Schema required upfront | No | No (inferred) | Yes (.fbs) | Yes (.proto) | No |
| Zero-copy random access | No | Yes | Yes | No | No |
| Zero-GC hot loop | No | Yes | Yes | No | No |
| Code generation step | No | No | Yes | Yes | No |
| Install size | 0 | ~3 KB | ~40 KB | ~150 KB | ~10 KB |
| Best for | Small configs | Game data, per-frame loops | Cross-language binary | RPC / network | Wire format |
| Learning curve | Zero | ~5 min | High | High | Low |

lite-bake's niche: you already have JSON, you want binary-grade read performance, you don't want a build step.

Type inference

bake() picks the smallest typed array that fits every value in a column. Override with opts.schema.

| Value range in column | Inferred type | Bytes |
|---|---|---|
| All integers, 0..255 | U8 | 1 |
| All integers, 0..65535 | U16 | 2 |
| All integers, 0..4_294_967_295 | U32 | 4 |
| All integers, -128..127 | I8 | 1 |
| All integers, -32768..32767 | I16 | 2 |
| All integers, -2³¹..2³¹-1 | I32 | 4 |
| Any fractional value (1.5, -0.25, ...) | F32 | 4 |
| Non-number (string, null, mixed) | F32 (stored as 0) | 4 |
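The rules in the table can be sketched as a single pass over one column. This is an illustrative reconstruction, not the library's source: `inferColumnType` is a hypothetical name, types are represented as plain strings, and the final `F64` fallback for integers outside every listed range is an assumption, since the table does not cover that case.

```javascript
// Hypothetical helper: returns the name of the smallest type that fits a column.
function inferColumnType(values) {
  let min = Infinity;
  let max = -Infinity;
  for (const v of values) {
    // Non-numbers and fractional values both land on F32 in the table above.
    if (typeof v !== 'number' || !Number.isInteger(v)) return 'F32';
    if (v < min) min = v;
    if (v > max) max = v;
  }
  if (min >= 0) {
    if (max <= 0xff) return 'U8';
    if (max <= 0xffff) return 'U16';
    if (max <= 0xffffffff) return 'U32';
  } else {
    if (min >= -128 && max <= 127) return 'I8';
    if (min >= -32768 && max <= 32767) return 'I16';
    if (min >= -2147483648 && max <= 2147483647) return 'I32';
  }
  return 'F64'; // assumption: the table does not say what happens out of range
}

console.log(inferColumnType([0, 255]), inferColumnType([0, 256]), inferColumnType([-1, 1.5]));
```

Because one value at a boundary changes the chosen width, a schema override is the safer option when the binary layout has to stay stable across datasets.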

When to override:

- Pixel-accurate coordinates you don't want snapped to int → force F32.
- Scientific precision values (doubles) → force F64.
- You want the binary layout to be stable regardless of record values → override everything.

bake(records, {
  schema: {
    x: Types.F32,
    timestamp: Types.F64,
    level: Types.U8,
  }
});

The canonical hot-loop pattern

This is the pattern. Memorise it. Every deviation costs frames.

// ONE TIME, AT LOAD
const r   = new Reader(baked);
const f32 = r.f32;                // keep locals
const u8  = r.u8;
const s32 = r.strideF32;          // stride in 4-byte words
const sB  = r.stride;             // stride in bytes (for u8)
const OFF_X    = r.offsetF32('x');
const OFF_TYPE = r.offsetU8('type');

// PER FRAME
for (let i = 0; i < r.count; i++) {
  const x = f32[i * s32 + OFF_X];
  const t = u8 [i * sB  + OFF_TYPE];
  // ...
}

Do / Don't

| ❌ Don't do this | ✅ Do this |
|---|---|
| r.get(i, 'x') in a per-frame loop | f32[i * s32 + OFF_X] |
| r.row(i) for anything except console.log | Read individual fields |
| Recompute r.offsetF32('x') every iteration | Cache OFF_X once |
| Use DataView in the hot path | Use typed-array indexing |
| Mix up strideF32 and stride (bytes vs words) | Pick one per loop body; comment clearly |

API

`bake(records, opts?) → Baked`

Compiles an array of records into a flat binary.

| Option | Type | Default | Notes |
|---|---|---|---|
| opts.schema | { [field]: Types.X } | {} | Override inferred types. Partial allowed. |
| opts.validate | boolean | false | Dev only: throws if records don't all have the same keys. |

Returns { buffer, stride, count, schema }.

`new Reader(baked)`

| Property | Type | Purpose |
|---|---|---|
| r.count | number | Record count |
| r.stride | number | Bytes per record |
| r.strideF32 / strideU32 | number | Stride in 4-byte units |
| r.strideF64 | number | Stride in 8-byte units |
| r.strideU16 | number | Stride in 2-byte units |
| r.f32 / f64 / i32 / u32 / i16 / u16 / i8 / u8 | *Array | Views onto the same ArrayBuffer; pick the one matching your field type |
| r.dv | DataView | For irregular or init-only reads |

| Method | Returns | Hot-loop safe? |
|---|---|---|
| r.offsetBytes(name) | Byte offset within one record | ✅ (once, cache the result) |
| r.offsetF32(name) etc. | Offset in element units | ✅ (once, cache the result) |
| r.get(i, name) | Value | ❌ string lookup + branch |
| r.row(i) | Plain object | ❌ allocates |

All offsetXxx(name) helpers type-check the field. offsetF32('tag') on a U8 field throws — this catches schema-reads-as-wrong-type bugs at init, not in the hot loop.
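A type-checked offset lookup of this kind could look roughly like the sketch below. Everything here is hypothetical: `elementOffset`, the string type names, the error messages, and the hand-written schema are assumptions for illustration, not lite-bake's actual code or wording.

```javascript
const ELEMENT_BYTES = { F32: 4, F64: 8, U8: 1, U16: 2, U32: 4, I8: 1, I16: 2, I32: 4 };

// Hypothetical helper: element offset of a field, or throw if the type differs.
function elementOffset(schema, name, expectedType) {
  const field = schema.find((f) => f.name === name);
  if (!field) throw new Error(`unknown field '${name}'`);
  if (field.type !== expectedType) {
    throw new TypeError(`field '${name}' is ${field.type}, not ${expectedType}`);
  }
  // Byte offset divided by element size gives the index unit of that view.
  return field.offset / ELEMENT_BYTES[expectedType];
}

// Assumed schema shape: { name, type, offset } with offsets in bytes.
const schema = [
  { name: 'x', type: 'F32', offset: 0 },
  { name: 'y', type: 'F32', offset: 4 },
  { name: 'tag', type: 'U8', offset: 8 },
];
const OFF_Y = elementOffset(schema, 'y', 'F32');    // 1
const OFF_TAG = elementOffset(schema, 'tag', 'U8'); // 8
```

Failing at lookup time means a mismatched view never reaches the per-frame loop, where it would silently read garbage instead of throwing.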

Edge cases & gotchas

Stride is padded to the largest field's alignment

If your schema has an F64, stride is a multiple of 8. An F32-only schema gets stride padded to 4. An all-U8 schema gets stride padded to 4 (the minimum). This keeps i * strideF32 + off arithmetic exact for every field.
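The padding rule can be worked through with a short sketch. `layout` is a hypothetical helper, and placing the largest fields first is an assumption (consistent with the test suite's "sorted by size" check), not a confirmed description of the library's internals.

```javascript
const SIZE = { U8: 1, I8: 1, U16: 2, I16: 2, U32: 4, I32: 4, F32: 4, F64: 8 };

// Hypothetical layout: largest fields first so each offset is naturally aligned.
function layout(types) {
  const sorted = [...types].sort((a, b) => SIZE[b] - SIZE[a]);
  let offset = 0;
  const fields = sorted.map((type) => {
    const field = { type, offset };
    offset += SIZE[type];
    return field;
  });
  // Alignment is the largest field size, with a minimum of 4 bytes.
  const align = Math.max(4, ...types.map((t) => SIZE[t]));
  const stride = Math.ceil(offset / align) * align;
  return { fields, stride };
}

console.log(layout(['F32', 'F32', 'U8', 'U8']).stride); // 10 bytes of data, padded to 12
console.log(layout(['U32', 'F64']).stride);             // 12 bytes of data, padded to 16
console.log(layout(['U8', 'U8', 'U8']).stride);         // 3 bytes of data, padded to 4
```

With a 12-byte stride, 50,000 records occupy 600,000 bytes, which matches the ~586 KB heap figure quoted in the benchmarks.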

The buffer byte length is padded up to a multiple of 8

So that new Float64Array(baked.buffer) always works, even when no field is an F64. Costs at most 7 trailing unused bytes per baked dataset. Negligible.
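The reason for the padding is a standard typed-array constraint, shown below with plain JavaScript. The 12-byte stride and record count are example values; `roundUpTo8` is a hypothetical helper name.

```javascript
const roundUpTo8 = (n) => Math.ceil(n / 8) * 8;

const dataBytes = 3 * 12; // e.g. 3 records with a 12-byte stride = 36 bytes

// A Float64Array view needs a byte length that is a multiple of 8.
let threw = false;
try {
  new Float64Array(new ArrayBuffer(dataBytes)); // 36 is not a multiple of 8
} catch (err) {
  threw = err instanceof RangeError;
}

const padded = new ArrayBuffer(roundUpTo8(dataBytes)); // 40 bytes
const f64 = new Float64Array(padded);                  // 5 elements, no error
console.log(threw, padded.byteLength, f64.length);
```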

Inference reads every record

bake() walks all records once to determine the smallest fitting type. O(records × fields). For 100k records, this is single-digit milliseconds. If you already know the types and want to skip inference entirely, pass a full opts.schema.

Null / undefined / missing fields become `0`

No warning, no throw — unless you pass { validate: true }, in which case missing/extra keys throw at bake time. Use validate: true in development, drop it in production.
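As a rough picture of what a key-consistency check involves, consider the sketch below. `assertSameKeys` and `toNumber` are hypothetical names, and the logic is an assumption about the general approach rather than the code behind `validate: true`.

```javascript
// Hypothetical check: every record must have exactly the keys of the first one.
function assertSameKeys(records) {
  const expected = JSON.stringify(Object.keys(records[0]).sort());
  records.forEach((record, i) => {
    const actual = JSON.stringify(Object.keys(record).sort());
    if (actual !== expected) {
      throw new Error(`record ${i} has keys ${actual}, expected ${expected}`);
    }
  });
}

// Mirrors the documented default: anything that is not a number is stored as 0.
const toNumber = (v) => (typeof v === 'number' ? v : 0);

assertSameKeys([{ x: 1, y: 2 }, { y: 3, x: 4 }]); // key order does not matter here
console.log(toNumber(undefined), toNumber(null), toNumber(7));
```

The check costs an extra pass over the records, which is why it makes sense as a development-only option.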

Strings are silently ignored

A string-valued field is treated as non-numeric → stored as F32 zeros. If you need string tables, that's on the v1.2 roadmap.

Native endianness is used throughout

bake() writes with DataView.setFloat32(..., littleEndian) where littleEndian is detected at module load. Typed-array reads (f32[i]) always use native endianness. Round-trips work on both LE (99.99% of hardware) and BE.
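The detection-and-write pattern described here can be reproduced with standard APIs alone. The sketch below is a generic illustration of the technique, not the library's module code; the variable names are assumptions.

```javascript
// Detect native byte order once: on little-endian hardware the low byte comes first.
const littleEndian = new Uint8Array(new Uint16Array([1]).buffer)[0] === 1;

const buf = new ArrayBuffer(8);
const dv = new DataView(buf);
dv.setFloat32(0, 1.5, littleEndian); // bytes 0..3
dv.setUint16(4, 513, littleEndian);  // bytes 4..5

// Typed arrays always read in native order, so these match the writes above.
const asFloat = new Float32Array(buf)[0]; // 1.5
const asU16 = new Uint16Array(buf)[2];    // 513
console.log(littleEndian, asFloat, asU16);
```

One consequence worth keeping in mind: a buffer written this way is in the byte order of the machine that baked it, so it is only directly readable through typed arrays on hardware with the same endianness.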

`Reader` field views are lazy only by convention

All eight typed-array views are instantiated in the constructor. They share the same ArrayBuffer, so this costs 8 small view headers (~600 bytes total) regardless of record count. Don't worry about it.

Benchmarks — and some honest caveats

Measured on Node 22, 50,000 records (random x/y/type/hp), 100 loop passes per trial, 5 trials, 3 warmups. Run it yourself: node benchmark/bench.js.

What's reliable

| Metric | JS objects | lite-bake | Result |
|---|---|---|---|
| Heap footprint | ~2.3 MB (approx object graph) | 586 KB (one ArrayBuffer) | ~4× smaller, consistently |
| Init (from already-parsed records) | n/a | ~8 ms | One-time cost at load |
| Object-access run-to-run variance | 3–5% | n/a | V8 inline caches are stable |
| Baked-access run-to-run variance | n/a | occasionally 40–50% (single slow trial, rest stable) | Worth knowing |

What's not a dramatic speedup

Honest disclosure: on a synthetic monomorphic hot loop over a dataset that fits in L2 cache, V8's object JIT is exceptional. You should expect baked and object access to land within noise of each other (~0.9×–1.1× speedup). We measured:

- Object access: ~15–17 ms median (~300 Mop/s)
- Baked access: ~16–17 ms median (~300 Mop/s)

If a library tells you it's "5× faster than objects" on this kind of microbenchmark, be skeptical.

Where baked access does reliably win

- Large datasets that spill L2/L3 cache. Once your working set is bigger than ~1 MB per core, pointer chasing through object graphs hits main memory at scattered addresses; baked access reads memory sequentially, which hardware prefetchers handle well.
- Polymorphic shapes. If your records don't all have identical keys in identical order, V8 falls off the monomorphic fast path and object access slows significantly.
- GC-sensitive timing. Baked access allocates zero. In a frame where other code is allocating (particle spawns, string building, closures), baked reads won't contend for allocation or trigger young-gen collections.
- Binary serialization. Writing new Uint8Array(baked.buffer) to disk is one syscall. Serializing an object graph means JSON.stringify, which is orders of magnitude slower.
- GPU upload. baked.buffer goes straight to gl.bufferData or queue.writeBuffer. No intermediate copy.

TL;DR: the performance argument for lite-bake is predictability and memory, not raw throughput in a hot cache. The memory win is always real. The speed win depends on your workload.

Testing & QA guide

Running the test suite

npm test

Uses Node's built-in node:test runner. Zero dependencies. 36 tests covering 9 categories. Should complete in under a second.

What the tests cover

| Category | What it verifies | Why it matters |
|---|---|---|
| Input validation | bake([]), bake(null), bake({}) all throw | Never silently corrupt |
| Type inference | Boundary at 255, 256, 65535, 65536, -128, -129 | Correct smallest-fitting type |
| Round-trip | Values go in → come out bit-identical (ints) or float-precise | Core correctness claim |
| F64 alignment | F64 + U32 mix, stride padding, typed-array reads match DataView | The critical fix; untested, this regresses silently |
| Layout | Buffer size padded to 8, offsets aligned, sorted by size | Memory model matches the README |
| Schema overrides | Force F64, partial override still infers the rest | Public API contract |
| Validate mode | On/off behaviour, missing/extra fields throw when on | Dev-time safety net |
| Reader helpers | Type-checked offsetXxx, get, row, unknown field throws | Prevent schema-type-mismatch bugs |
| Integration | 1k and 50k records via hot-loop pattern match .get() | End-to-end sanity |

Adding your own tests

Drop a .test.js file in test/. Any file the node --test runner discovers will run. Example:

import { test } from 'node:test';
import assert from 'node:assert/strict';
import { bake, Reader } from '../src/index.js';

test('my game: enemy table round-trips', () => {
  const enemies = [ /* ... */ ];
  const r = new Reader(bake(enemies));
  assert.equal(r.get(0, 'hp'), enemies[0].hp);
});

Manual sanity checks (for reviewers / QA)

Schema shape matches expectation
```
const b = bake(myRecords);
console.log(b.schema);      // each field: { name, type, offset }
console.log('stride:', b.stride);
```
Confirm every field uses the type you expect. If a field you expected to be F32 came out as U8 — check your input; inference picks the smallest fitting type.
Round-trip a known value
```
const r = new Reader(bake([{ x: 42.5, tag: 7 }]));
console.log(r.row(0));       // { x: 42.5, tag: 7 }
```
If values differ, it's either (a) float precision with F32 (use F64 override) or (b) type override conflict with actual values.
Confirm zero GC in the hot loop (Chrome DevTools)
- Open DevTools → Performance tab.
- Record about 2 seconds of your game running.
- Filter for "Minor GC" and "Major GC" events in the timeline.
- During the baked read loop: you should see none originating from your code. (Other engine code may still trigger them.)
- Compare against the same loop using object access — minor GCs should be measurably more frequent.

Buffer size sanity

const b = bake(myRecords);
console.log({
  records: b.count,
  stride:  b.stride,
  data:    b.stride * b.count,
  buffer:  b.buffer.byteLength,        // should be data rounded up to mult of 8
});

Benchmark on your data
```
node benchmark/bench.js
```
Edit makeRecords() in the file to match your record shape. Run 3–5 times and take the median — the first run is JIT warmup.

Red flags that mean something is wrong

| Symptom | Likely cause | Check |
|---|---|---|
| RangeError: Float64Array byte length... | Running old lite-bake (pre-1.0.0 fix) | Upgrade |
| Values read back as 0 | Field was non-numeric, or forgot schema override for F32 on whole ints | console.log(b.schema) |
| Values read back as wrong integer | Inference picked U8, real range exceeded 255 | Add schema override |
| Coords drift slightly each frame | F32 precision on large values | Override to F64 |
| field 'x' has wrong type thrown from offsetF32 | You asked for F32 offset on a non-F32 field | Match field type to offset helper, or pass schema override |
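The F32 precision symptom in the table is easy to reproduce without the library. The sketch below uses only `Math.fround` and standard typed arrays to show where 32-bit floats stop being exact.

```javascript
// Math.fround rounds a double to the nearest 32-bit float, like a Float32Array store.
const exact = Math.fround(42.5) === 42.5; // true: 42.5 fits in F32
const inexact = Math.fround(0.1) === 0.1; // false: 0.1 is rounded
const bigInt = Math.fround(16777217);     // 16777216: above 2^24, odd integers are lost

const f32 = new Float32Array(1);
const f64 = new Float64Array(1);
f32[0] = 16777217; // stored as 16777216
f64[0] = 16777217; // stored exactly
console.log(exact, inexact, bigInt, f32[0], f64[0]);
```

For world coordinates or timestamps that can exceed about 16.7 million, an F64 override avoids this rounding at the cost of 8 bytes per field.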

FAQ

Why not just use Float32Array directly? If you have one homogeneous numeric column, you should. lite-bake is for heterogeneous records — mixing floats, ints, and byte-sized flags in one logical row.

Why interleaved (AoS) instead of columnar (SoA)? Because for game data (spawn points, tile entries, particle seeds) you usually read most fields of record i per iteration, not one field across all records. AoS gives you one cache line per record. Columnar (SoA) will come in v1.1 for workloads that scan one field at a time.
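The difference between the two layouts can be sketched with plain typed arrays. This is a generic illustration with made-up data, not a preview of the planned columnar mode.

```javascript
const COUNT = 4;

// AoS (interleaved): [x0, y0, x1, y1, ...] in one array, stride of 2 words.
const aos = new Float32Array(COUNT * 2);
// SoA (columnar): all x values in one array, all y values in another.
const soaX = new Float32Array(COUNT);
const soaY = new Float32Array(COUNT);

for (let i = 0; i < COUNT; i++) {
  aos[i * 2] = soaX[i] = i;          // x
  aos[i * 2 + 1] = soaY[i] = i * 10; // y
}

// Per-record work touches adjacent memory in AoS.
let aosSum = 0;
for (let i = 0; i < COUNT; i++) aosSum += aos[i * 2] + aos[i * 2 + 1];

// A single-field scan touches only one dense array in SoA.
let soaXSum = 0;
for (let i = 0; i < COUNT; i++) soaXSum += soaX[i];

console.log(aosSum, soaXSum);
```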

Does this work in the browser? Yes. Zero Node-specific APIs. Use any bundler, or load it directly as an ES module.

Does this work with WebGL vertex buffers? Yes. baked.buffer is a raw ArrayBuffer that you can pass to gl.bufferData directly. But if that's your specific use case, see also lite-batch-buffer (sibling library for per-frame interleaved vertex staging).

Can I serialize the baked buffer to disk? Yes — write new Uint8Array(baked.buffer) to a file. You'll need to separately record the schema (just JSON.stringify(baked.schema)) to reconstruct the Reader. A serialize() / deserialize() pair is on the roadmap.
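Until an official pair exists, one possible container format is a length-prefixed JSON header followed by the raw bytes. The sketch below is an assumption-heavy illustration: `packBaked` and `unpackBaked` are hypothetical helpers, the schema entry shape follows the `{ name, type, offset }` form shown in the sanity checks with string type names assumed, and whether `Reader` accepts the unpacked object as-is is not confirmed here.

```javascript
// Hypothetical helpers: [4-byte header length][JSON header][raw data bytes].
function packBaked(baked) {
  const header = new TextEncoder().encode(
    JSON.stringify({ stride: baked.stride, count: baked.count, schema: baked.schema })
  );
  const out = new Uint8Array(4 + header.length + baked.buffer.byteLength);
  new DataView(out.buffer).setUint32(0, header.length, true);
  out.set(header, 4);
  out.set(new Uint8Array(baked.buffer), 4 + header.length);
  return out;
}

function unpackBaked(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const headerLength = view.getUint32(0, true);
  const dataStart = 4 + headerLength;
  const meta = JSON.parse(new TextDecoder().decode(bytes.subarray(4, dataStart)));
  // slice() copies into a fresh ArrayBuffer so typed-array views start at offset 0.
  const buffer = bytes.slice(dataStart).buffer;
  return { ...meta, buffer };
}

// Stand-in for a bake() result, built by hand for this demo.
const fake = {
  buffer: new Float32Array([1.5, 2.5]).buffer,
  stride: 4,
  count: 2,
  schema: [{ name: 'x', type: 'F32', offset: 0 }],
};
const restored = unpackBaked(packBaked(fake));
console.log(restored.count, new Float32Array(restored.buffer)[1]);
```

The data bytes stay in the byte order of the machine that produced them, so this format is only safe to read back on hardware with the same endianness.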

Is this actually faster than V8's JIT? Not in raw per-access speed. V8's object JIT is excellent, and in the micro-benchmark above the hot loop lands within noise of object access (~0.9×–1.1×). The win is in cache behaviour on large datasets, a smaller heap, and the absence of GC pressure; frame-timing consistency is where real games notice the difference.

Roadmap

- v1.1: Optional columnar (SoA) mode for workloads that scan one field at a time.
- v1.2: String table support (per-column interned strings → U32 index).
- v1.3: serialize() / deserialize() for shipping baked data to disk or over the wire.
- v2.0: Matrix and normalized-int vertex attributes (for vertex-buffer authoring).

License

MIT
