@zakkster/lite-batch-buffer
v1.0.0

Pre-allocated, zero-GC interleaved vertex buffer for WebGL 1/2 and WebGPU sprite/tile/quad batchers.
One allocation for the lifetime of the renderer. No per-frame new Float32Array. No {x, y, u, v} object graphs. No garbage-collection pauses in your draw loop.
```js
import { BatchBuffer } from '@zakkster/lite-batch-buffer';

const vb = new BatchBuffer({
  maxVertices: 40_000,
  layout: [
    { name: 'pos',   type: 'f32', size: 2 },
    { name: 'uv',    type: 'f32', size: 2 },
    { name: 'color', type: 'u32', size: 1 }, // packed RGBA
  ],
});

// Hoist views + offsets ONCE, use them for the lifetime of the renderer.
const f32 = vb.f32, u32 = vb.u32;
const s = vb.strideF32;
const P = vb.offsetF32('pos'), U = vb.offsetF32('uv'), C = vb.offsetU32('color');

// Per frame:
vb.reset();
vb.ensureCapacity(quadCount * 6);
let n = vb.count; // hoist to a local for the hot loop
for (let i = 0; i < quadCount; i++) {
  const o = n * s;
  f32[o + P] = x; f32[o + P + 1] = y;
  f32[o + U] = u; f32[o + U + 1] = v;
  u32[o + C] = BatchBuffer.packRGBA(255, 255, 255, 255);
  n++;
  // ... 5 more verts for the quad ...
}
vb.count = n;
gl.bufferSubData(gl.ARRAY_BUFFER, 0, vb.u8, 0, vb.byteLength);
```

Contents
- Why · Install · Quick start
- How it works
- Case study: a Tiled tilemap renderer
- API reference
- Benchmarks
- Testing (for clients & QA)
- Running the demo
- Browser & engine compatibility
- Edge cases & guarantees
- FAQ · License
Why
JavaScript graphics code has a distinctive failure mode: per-frame allocation. It looks like this, and it's what you write first:
```js
// The code you write first, and regret later
function drawFrame() {
  const verts = [];
  for (const tile of visibleTiles) {
    verts.push({ x: tile.x, y: tile.y, u: tile.u, v: tile.v, color: tile.color });
    // ... 5 more per quad
  }
  const flat = new Float32Array(verts.length * 5);
  for (let i = 0; i < verts.length; i++) { /* flatten */ }
  gl.bufferData(gl.ARRAY_BUFFER, flat, gl.STREAM_DRAW);
}
```

Each frame this produces tens of thousands of short-lived objects and a fresh ArrayBuffer. The math is cheap — the allocation is what destroys your frame budget. Major GC pauses turn smooth 60 fps into periodic 30 ms stutters.
```mermaid
flowchart LR
  subgraph N["Naive path"]
    direction TB
    N1[per-frame allocation<br/>new Array / new TypedArray]
    N2[populate objects or slots]
    N3[flatten to TypedArray]
    N4[upload]
    N5[objects + buffer<br/>become garbage]
    N1 --> N2 --> N3 --> N4 --> N5 -.->|GC pressure<br/>frame stalls| N1
  end
  subgraph B["BatchBuffer path"]
    direction TB
    B0[one allocation<br/>at renderer init]
    B1[reset count = 0]
    B2[write into views<br/>indexed stores only]
    B3[upload vb.u8<br/>zero alloc]
    B0 -.->|reused forever| B1
    B1 --> B2 --> B3 -.->|no garbage| B1
  end
```

@zakkster/lite-batch-buffer owns the pre-allocated buffer and exposes every typed-array view over it (f32, u32, u16, u8, …) so the vertex-emit code stays a plain indexed-store loop. Nothing fancy. That's the point.
What this is not
- Not a renderer. It doesn't know about WebGL contexts, shaders, or draw calls.
- Not a scene graph. No sprites, no transforms, no culling.
- Not magic. A hand-rolled Float32Array you manage yourself is ~2× faster (see benchmarks). This library trades that for layout hygiene + endian-correct color packing + capacity management in ~120 lines of code.
Install
```sh
npm i @zakkster/lite-batch-buffer
```

ESM-only. No dependencies. Ships TypeScript definitions alongside the source.

```js
import { BatchBuffer } from '@zakkster/lite-batch-buffer';
// or: import BatchBuffer from '@zakkster/lite-batch-buffer';
```

You can also drop src/index.js into your project directly — it's one file.
Quick start
```js
const vb = new BatchBuffer({
  maxVertices: 4096,
  layout: [
    { name: 'pos',   type: 'f32', size: 2 }, // vec2 position
    { name: 'uv',    type: 'f32', size: 2 }, // vec2 UV
    { name: 'color', type: 'u32', size: 1 }, // packed RGBA, 1 u32
  ],
});

// WebGL 2 VAO setup (done once):
const vbo = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, vbo);
gl.bufferData(gl.ARRAY_BUFFER, vb.arrayBuffer.byteLength, gl.DYNAMIC_DRAW);
gl.enableVertexAttribArray(0);
gl.vertexAttribPointer(0, 2, gl.FLOAT, false, vb.stride, 0);
gl.enableVertexAttribArray(1);
gl.vertexAttribPointer(1, 2, gl.FLOAT, false, vb.stride, 8);
gl.enableVertexAttribArray(2);
gl.vertexAttribPointer(2, 4, gl.UNSIGNED_BYTE, true, vb.stride, 16); // normalized

// Per frame:
function renderFrame(sprites) {
  vb.reset();
  vb.ensureCapacity(sprites.length * 6);
  const f32 = vb.f32, u32 = vb.u32;
  const s = vb.strideF32;
  const P = vb.offsetF32('pos'), U = vb.offsetF32('uv'), C = vb.offsetU32('color');
  let n = vb.count;
  for (const spr of sprites) {
    /* write 6 verts, n += 6 */
  }
  vb.count = n;
  gl.bindBuffer(gl.ARRAY_BUFFER, vbo);
  gl.bufferSubData(gl.ARRAY_BUFFER, 0, vb.u8, 0, vb.byteLength);
  gl.drawArrays(gl.TRIANGLES, 0, vb.count);
}
```

How it works
Memory layout
BatchBuffer allocates a single ArrayBuffer sized to stride × maxVertices (rounded up to a multiple of 8 bytes so every typed-array view can alias it safely). Each attribute is aligned to its own element size, and the total stride is padded to the layout's maximum alignment, so strideF32 = stride / 4 is always an exact integer.
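The alignment arithmetic is small enough to sketch inline. This is an illustration of the rule just described, not the library's source — the BYTES table and variable names are assumptions:

```javascript
// Sketch: how offsets and stride fall out of the example layout.
const BYTES = { f32: 4, i32: 4, u32: 4, i16: 2, u16: 2, i8: 1, u8: 1 };
const layout = [
  { name: 'pos',   type: 'f32', size: 2 },
  { name: 'uv',    type: 'f32', size: 2 },
  { name: 'color', type: 'u32', size: 1 },
];

let offset = 0, maxAlign = 1;
const offsets = {};
for (const a of layout) {
  const elem = BYTES[a.type];
  offset = Math.ceil(offset / elem) * elem; // align attribute to its element size
  offsets[a.name] = offset;
  offset += elem * a.size;
  maxAlign = Math.max(maxAlign, elem);
}
const stride = Math.ceil(offset / maxAlign) * maxAlign; // pad to max alignment

console.log(offsets, stride); // { pos: 0, uv: 8, color: 16 } 20
```

With this layout every attribute is naturally aligned and the 20-byte stride divides evenly by 4, which is why `count * strideF32` indexing is always exact.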
```mermaid
flowchart TB
  subgraph BB["ArrayBuffer — one allocation, reused forever"]
    direction LR
    V0["Vertex 0<br/>pos.x | pos.y | uv.x | uv.y | color<br/>4 + 4 + 4 + 4 + 4 = 20 B"]
    V1["Vertex 1<br/>20 B"]
    V2["Vertex 2"]
    V3["Vertex N−1"]
    V0 --- V1 --- V2 --- V3
  end
  F32["vb.f32 : Float32Array<br/>reads pos + uv"] -.-> BB
  U32["vb.u32 : Uint32Array<br/>reads color"] -.-> BB
  U8["vb.u8 : Uint8Array<br/>used for upload"] -.-> BB
```

All typed-array views point at the same backing buffer. A write through vb.f32[0] is immediately visible through vb.u8[0..3]. This is why the hot loop can write pos as two floats and color as a packed u32 — they're interleaved in memory exactly as the GPU expects.
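That aliasing guarantee needs no library to verify — it's how plain typed arrays over one buffer behave:

```javascript
// One buffer, two views, one write — the bytes are shared.
const buf = new ArrayBuffer(8);
const f32 = new Float32Array(buf);
const u8 = new Uint8Array(buf);

f32[0] = 1.0; // IEEE-754 single precision: bit pattern 0x3f800000
// On a little-endian host the low byte is stored first:
console.log(u8[0], u8[1], u8[2], u8[3]); // 0 0 128 63
```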
The canonical hot loop
```mermaid
sequenceDiagram
  participant App
  participant VB as BatchBuffer
  participant GL as GL / GPU
  Note over App,GL: Renderer init (once)
  App->>VB: new BatchBuffer({ maxVertices, layout })
  VB-->>App: hoist f32, u32, stride, offsets as locals
  App->>GL: bufferData(null, DYNAMIC_DRAW)
  loop Every frame
    App->>VB: reset()
    App->>VB: ensureCapacity(n)
    Note over App: let c = vb.count
    loop Per vertex
      App->>VB: f32[c*s + P] = x, ...
      App->>VB: u32[c*s + C] = rgba
      Note over App: c++
    end
    App->>VB: vb.count = c
    App->>GL: bufferSubData(0, vb.u8, 0, vb.byteLength)
    App->>GL: drawArrays(..., vb.count)
  end
```

Why hoist vb.count to a local?
The inline pattern (vb.count++ inside the loop) is 20–30% slower than hoisting the counter into a let:
| Pattern | 40k verts/frame | Notes |
|---|---|---|
| vb.count++ inline | ~0.21 ms | simpler, perfectly fine for small loops |
| Hoisted let n = vb.count | ~0.16 ms | recommended for tight inner loops |
The JITs are good at optimising indexed typed-array access but less good at property access through this. Moving count to a local lets the register allocator do its job.
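Concretely, the "write 6 verts" elided in the snippets above expands to something like the sketch below. The stride and offsets are hard-coded stand-ins for the hoisted vb locals of the 20-byte example layout, and emitQuad and its corner order are illustrative, not library API:

```javascript
// Stand-ins for the hoisted locals: stride 20 B = 5 f32 slots;
// pos at f32 slot 0, uv at slot 2, color at u32 slot 4.
const buf = new ArrayBuffer(6 * 20);
const f32 = new Float32Array(buf), u32 = new Uint32Array(buf);
const s = 5, P = 0, U = 2, C = 4;

// Emit one quad as two triangles (6 vertices); returns the advanced counter.
function emitQuad(n, x, y, w, h, u0, v0, u1, v1, rgba) {
  const corners = [
    [x, y, u0, v0], [x + w, y, u1, v0], [x, y + h, u0, v1],         // tri 1
    [x + w, y, u1, v0], [x + w, y + h, u1, v1], [x, y + h, u0, v1], // tri 2
  ];
  for (const [cx, cy, cu, cv] of corners) {
    const o = n * s;
    f32[o + P] = cx; f32[o + P + 1] = cy;
    f32[o + U] = cu; f32[o + U + 1] = cv;
    u32[o + C] = rgba;
    n++;
  }
  return n; // caller stores this back into vb.count after the batch
}

const n = emitQuad(0, 10, 20, 32, 32, 0, 0, 1, 1, 0xffffffff);
// n === 6; vertex 0 is [10, 20, 0, 0] with color 0xffffffff
```

Threading the counter through as a parameter and return value keeps it in a register for the whole loop — the same hoisting trick the table above measures.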
Case study: a Tiled tilemap renderer
We rendered the same 64 × 64 Tiled tilemap (two layers: ground + decoration ≈ 8 000 visible tiles → ~49 000 vertices per frame) three ways, on the same WebGL 2 pipeline. The only change is the vertex-emit function — the GL state, shaders, texture, and draw call are identical.
You can run this live: open example/tilemap-demo.html and toggle between modes in the sidebar.
Map format
Stock Tiled JSON schema:
```jsonc
{
  "width": 64, "height": 64,
  "tilewidth": 32, "tileheight": 32,
  "layers": [
    { "type": "tilelayer", "name": "ground",     "data": [/* 4096 gids */] },
    { "type": "tilelayer", "name": "decoration", "data": [/* 4096 gids */] }
  ],
  "tilesets": [{ "firstgid": 1, "columns": 8, "tilewidth": 32, "tileheight": 32 }]
}
```

data is a flat array of GIDs (1-indexed, 0 = empty). The demo parses this exactly as Tiled exports it.
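The GID arithmetic can be sketched with a hypothetical gidToUV helper — not part of the library or the demo; the helper name and the 256 px tileset image size are assumptions:

```javascript
// Sketch: turn a Tiled GID into tileset UVs for the quad emit.
const tileset = { firstgid: 1, columns: 8, tilewidth: 32, tileheight: 32 };
const imageW = 256, imageH = 256; // assumed 8 × 8 tile image

function gidToUV(gid) {
  if (gid === 0) return null;          // 0 = empty cell, emit nothing
  const id = gid - tileset.firstgid;   // GIDs are 1-indexed
  const col = id % tileset.columns;
  const row = Math.floor(id / tileset.columns);
  return {
    u0: (col * tileset.tilewidth) / imageW,
    v0: (row * tileset.tileheight) / imageH,
    u1: ((col + 1) * tileset.tilewidth) / imageW,
    v1: ((row + 1) * tileset.tileheight) / imageH,
  };
}

console.log(gidToUV(1)); // { u0: 0, v0: 0, u1: 0.125, v1: 0.125 }
```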
The three renderers
1. Array-of-objects (the first draft):
```js
function emitAoO(tiles) {
  const verts = [];
  for (const t of tiles) {
    verts.push({ x: t.x, y: t.y, u: t.u0, v: t.v0, color: 0xffffffff });
    // ... 5 more per quad
  }
  const buf = new ArrayBuffer(verts.length * 20);
  const f32 = new Float32Array(buf), u32 = new Uint32Array(buf);
  for (let i = 0; i < verts.length; i++) { /* flatten */ }
  gl.bufferSubData(gl.ARRAY_BUFFER, 0, new Uint8Array(buf));
}
```

2. Naive typed array (fresh per frame):
```js
function emitNaive(tiles) {
  const buf = new ArrayBuffer(tiles.length * 6 * 20);
  const f32 = new Float32Array(buf), u32 = new Uint32Array(buf);
  /* fill in loop */
  gl.bufferSubData(gl.ARRAY_BUFFER, 0, new Uint8Array(buf));
}
```

3. BatchBuffer:
```js
const vb = new BatchBuffer({ maxVertices: MAX_TILES * 6, layout: LAYOUT });
const f32 = vb.f32, u32 = vb.u32;
const s = vb.strideF32, P = vb.offsetF32('pos'), U = vb.offsetF32('uv'), C = vb.offsetU32('color');

function emitBatched(tiles) {
  vb.reset();
  vb.ensureCapacity(tiles.length * 6);
  let n = vb.count;
  for (const t of tiles) { /* write 6 verts, n += 6 */ }
  vb.count = n;
  gl.bufferSubData(gl.ARRAY_BUFFER, 0, vb.u8, 0, vb.byteLength);
}
```

Results
Measured on Node 20 / M2 class, median of 5 runs × 120 frames @ 40 000 vertices/frame. Re-run npm run bench to get numbers on your own hardware — the ratios are stable across platforms.
| # | Strategy | ms/frame | MVerts/s | Peak heap Δ | vs best |
|---|---|---:|---:|---:|---:|
| A | BatchBuffer (inline vb.count++) | 0.205 | 195 | 3.4 KB | 2.28× |
| A′ | BatchBuffer (hoisted local count) | 0.157 | 254 | 432 B | 1.75× |
| B | Plain typed-array (hoisted, reused) | 0.090 | 445 | 432 B | — |
| C | Fresh typed-array per frame | 0.484 | 82.7 | 10 KB | 5.38× |
| D | Array-of-objects + flatten | 5.040 | 7.94 | 33.6 MB | 56.1× |
Allocation pressure — log scale
```mermaid
%%{init: {"theme":"dark"}}%%
xychart-beta
  title "Peak heap growth per run (KB, log scale) — lower is better"
  x-axis ["A inline", "A' hoisted", "B plain TA", "C fresh TA", "D AoO"]
  y-axis "KB (log)" 0.1 --> 100000
  bar [3.4, 0.4, 0.4, 10, 34000]
```

D allocates 33.6 MB per frame. At 60 fps that's ~2 GB/s of garbage — the GC can't keep up, and you see stutter.
Frame budget
```mermaid
%%{init: {"theme":"dark"}}%%
xychart-beta
  title "ms per frame at 40k verts — lower is better"
  x-axis ["A inline", "A' hoisted", "B plain TA", "C fresh TA", "D AoO"]
  y-axis "ms" 0 --> 6
  bar [0.21, 0.16, 0.09, 0.48, 5.04]
```

A 60 fps frame is 16.67 ms. Both A and A′ consume ~1%; D burns 30% before any game logic runs.
When it matters
| Scenario | Verts/frame | Without @zakkster/lite-batch-buffer | With A′ |
|---|---:|---|---|
| Menu UI (20 sprites) | ~120 | irrelevant | irrelevant |
| Platformer (300 sprites) | ~1 800 | usually fine | fine |
| Tiled scroller (8k tiles) | ~49k | 5 ms + GC stutter | ~0.2 ms |
| Particle system (50k) | ~150k | 15+ ms · GC storm | ~0.5 ms |
| Bullet hell (100k) | ~300k | off the budget | ~1.0 ms |
Rule of thumb: once your per-frame vertex count passes ~10 000, the allocation profile of your emit loop matters more than its ALU cost.
API reference
new BatchBuffer({ maxVertices, layout })
| Arg | Type | Description |
|---|---|---|
| maxVertices | number | Hard cap. Sets the backing ArrayBuffer size. |
| layout | LayoutAttribute[] | Ordered list of attributes. |
LayoutAttribute
| Field | Type | Description |
|---|---|---|
| name | string | Used by offset*() lookups. |
| type | 'f32' \| 'i32' \| 'u32' \| 'i16' \| 'u16' \| 'i8' \| 'u8' | Element type. |
| size | number | Element count (2 for vec2, 1 for a packed u32 color). Must be a positive integer. |
Instance members
| Member | Type | Description |
|---|---|---|
| stride | number | Bytes per vertex (padded to max alignment). |
| strideF32, strideU32, strideU16 | number | Stride in those element units. |
| maxVertices | number | As passed. |
| count | number | Writable vertex cursor. Advance inline from the hot loop. |
| attrs | ResolvedAttribute[] | Layout with filled-in byte offsets. |
| arrayBuffer | ArrayBuffer | Single backing allocation. Never reallocated. |
| f32 i32 u32 i16 u16 i8 u8 dv | Typed-array / DataView | All view the same arrayBuffer. |
| capacity | number | Alias for maxVertices. |
| remaining | number | maxVertices − count. |
| byteLength | number | count × stride. Pass this to bufferSubData. |
Methods
| Method | Returns | Description |
|---|---|---|
| offsetBytes(name) | number | Byte offset inside a vertex. Works for any type. |
| offsetF32(name) / offsetI32 / offsetU32 | number | Typed offset. Throws on type mismatch. |
| offsetU16(name) / offsetI16 / offsetU8 / offsetI8 | number | Same, for smaller element types. |
| ensureCapacity(n) | void | Throws if count + n > maxVertices. Call at batch boundaries. |
| reset() | void | Sets count = 0. Does not zero the backing buffer. |
| viewBytes() | Uint8Array | WebGL 1 helper: subarray view of the written prefix. Allocates ~80 B. |
| static packRGBA(r, g, b, a) | number | Pack four 0–255 bytes into a u32 with platform-correct byte order. |
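The endian handling behind packRGBA can be sketched like this — an illustration of the technique, not the library's actual source:

```javascript
// Detect host byte order once, with a 2-byte probe.
const probe = new Uint16Array(new Uint8Array([0x12, 0x34]).buffer);
const littleEndian = probe[0] === 0x3412;

function packRGBA(r, g, b, a) {
  // A u32 store writes its LOW byte to the LOWEST address on LE hosts,
  // so R must sit in the low byte there (and in the high byte on BE).
  return littleEndian
    ? ((a << 24) | (b << 16) | (g << 8) | r) >>> 0
    : ((r << 24) | (g << 16) | (b << 8) | a) >>> 0;
}

// Check the memory layout: bytes land as R, G, B, A on either host.
const u32 = new Uint32Array(1);
u32[0] = packRGBA(10, 20, 30, 40);
const bytes = new Uint8Array(u32.buffer);
console.log([...bytes]); // [10, 20, 30, 40]
```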
Error conditions
| Situation | Throws |
|---|---|
| Missing maxVertices or empty layout | { maxVertices, layout } required |
| Unknown attribute type | unknown type 'X' |
| Non-positive / non-integer attribute size | size must be positive integer |
| offsetF32('x') where x is declared u32 | 'x' is u32, not f32 |
| offsetBytes('nope') | unknown attribute 'nope' |
| ensureCapacity(n) overflow | capacity exceeded (C + N > MAX) |
Benchmarks
Node CLI
```sh
node --expose-gc bench/bench.js
# or: npm run bench
```

Runs all five strategies, prints a formatted table, and writes bench/bench-results.json. --expose-gc is required for accurate heap numbers.
Browser (interactive)
Open bench/bench.html in any modern browser. It runs the same strategies live and plots ms/frame and peak heap Δ as bar charts. You can change verts/frame and frame count from the controls at the top right.
Heap numbers are only meaningful on Chromium-based browsers (they expose performance.memory). The time measurements are reliable everywhere.
Testing (for clients & QA)
Three levels of verification, depending on how deep you want to go.
1. Unit tests — "does the library do what it says?"
```sh
npm test
# or: node --expose-gc test/edge-cases.test.js
```

Runs 34 deterministic assertions covering:
| Group | What's tested |
|---|---|
| Construction + validation | missing args, zero / negative / non-integer sizes, unknown types, empty layout |
| Stride + offset arithmetic | mixed alignment (f32 + u8 + f32), u16-after-u8 padding, u8-only layouts, type-checked offsets |
| Typed-array aliasing | f32 / u32 / u8 views all read/write the same bytes |
| Capacity + lifecycle | ensureCapacity boundary conditions, reset() preserves buffer identity |
| packRGBA | endian-correct memory layout, channel clamping, edge values |
| Zero-allocation guarantee | 600 000 vertex writes → heap grows < 1 MB (requires --expose-gc) |
| Realistic round-trip | emit a quad, read bytes back, reuse buffer across 100 frames |
A clean run ends with 34 passed, 0 failed and exit code 0. Any failure prints the assertion plus the expected/actual values. Suitable for CI.
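The zero-allocation group can be reproduced in miniature. This is a sketch of the measurement technique only — the real suite's code will differ; the 1 MB threshold mirrors the table above:

```javascript
// Run with: node --expose-gc this-file.js (falls back to a no-op without the flag)
const gc = globalThis.gc ?? (() => {});

const f32 = new Float32Array(600_000); // stand-in for the pre-allocated buffer
gc();
const before = process.memoryUsage().heapUsed;

for (let i = 0; i < 600_000; i++) f32[i] = i * 0.5; // indexed stores only

gc();
const grownKB = (process.memoryUsage().heapUsed - before) / 1024;
console.log(`heap grew ~${grownKB.toFixed(1)} KB`); // should be well under 1024 KB
```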
2. Benchmark — "does it perform as claimed?"
```sh
npm run bench
```

Reproduces the five-strategy comparison in the case study. Exit code is always 0; the failing signal is the numbers. On any 2021+ machine you should see:

- A′ (hoisted BatchBuffer) within 2× of B (hand-rolled typed array)
- D (array-of-objects) at least 20× slower than B and producing tens of MB of heap growth per run
- All three "managed" strategies (A, A′, B) producing < 20 KB of heap growth per run
3. Visual smoke test — "does it render correctly?"
```sh
# Just open the file; no build step.
open example/tilemap-demo.html
```

A 64 × 64 Tiled tilemap rendered in WebGL 2. In the sidebar, toggle between BatchBuffer, Naive, and Array-of-objects modes — the rendered output must be pixel-identical across all three. If it isn't, the vertex layout is wrong somewhere; that's the QA signal.
On a mid-range 2022 laptop, moving between modes you should see roughly:
| Mode | FPS | CPU ms | Heap delta / 60 frames |
|---|---:|---:|---:|
| BatchBuffer | 60 (capped) | < 1 ms | < 10 KB |
| Naive | 58–60 | 2–3 ms | 1–3 MB |
| Array-of-objects | 30–45 (stutter) | 8–15 ms | 20–60 MB |
Heap stats only appear in Chromium browsers (which expose performance.memory).
Quick npm run reference
| Command | What it does |
|---|---|
| npm test | Run the 34-test unit suite |
| npm run bench | Run the Node benchmark, write bench/bench-results.json |
| npm run verify | npm test && npm run bench — the full CI-style check |
| npm run bench:browser | Prints the path to open in your browser |
| npm run demo | Prints the path to the Tiled demo |
Running the demo
```sh
example/tilemap-demo.html
```

No build step. No server needed if you run the whole repo over file:// — the demo uses a relative ESM import from ../src/index.js.
Controls:
| Input | Action |
|---|---|
| Left-click + drag | Pan camera |
| W A S D / arrow keys | Pan camera |
| Zoom slider | 0.5× – 4× |
| Mode buttons | Switch vertex-emit strategy live |
The procedural tileset is generated on a <canvas> at startup, so the demo is fully self-contained — no external image assets.
Browser & engine compatibility
The library itself is plain ESM and uses only standard ArrayBuffer / typed-array APIs, so it works everywhere ES2015+ works. The example demo additionally needs WebGL 2 (for VAOs and #version 300 es shaders).
| Target | Library | Demo (WebGL 2) |
|---|---|---|
| Chrome / Edge 61+ | ✅ | ✅ |
| Firefox 60+ | ✅ | ✅ |
| Safari 15+ (iOS 15+) | ✅ | ✅ |
| Node.js 18+ | ✅ | — (browser only) |
| Bun / Deno | ✅ | — |
| WebGL 1 projects | ✅ (use viewBytes()) | — |
| WebGPU projects | ✅ (use vb.arrayBuffer / vb.u8) | — |
Integration snippets
WebGL 2 — upload with zero allocations:
```js
gl.bufferSubData(gl.ARRAY_BUFFER, 0, vb.u8, 0, vb.byteLength);
```

WebGL 1 — the 3-arg bufferSubData signature requires a Uint8Array view, which allocates ~80 B:

```js
gl.bufferSubData(gl.ARRAY_BUFFER, 0, vb.viewBytes());
```

WebGPU — copy the written prefix of vb.u8 with queue.writeBuffer:

```js
queue.writeBuffer(gpuBuf, 0, vb.u8, 0, vb.byteLength);
```

Edge cases & guarantees
Behaviours the test suite pins down:

- Stride is always a multiple of 4 for any layout that contains an f32, i32, or u32. That means strideF32 and strideU32 are always exact integers — you can index with count * strideF32 safely.
- u8-only layouts work too. A maxVertices: 10, layout: [{ type: 'u8', size: 3 }] buffer has stride = 3, but the backing ArrayBuffer is padded up to 8 bytes internally so every typed-array view can still be constructed. byteLength reflects the unpadded count × stride, so your GL upload size is correct.
- reset() does not zero the backing memory. Old vertex data is still there until overwritten. This is intentional — zeroing 320 KB every frame would defeat the point. Only bytes 0..byteLength are ever uploaded.
- count is public and writable. You can set it directly (e.g. vb.count += 6 after writing a quad in one go), or pre-seed it for layered writes. The only invariant the library checks is count ≤ maxVertices (via ensureCapacity).
- All typed-array views alias the same backing buffer. A write through vb.f32[0] is immediately visible through vb.u8[0..3], vb.u16[0..1], and vb.dv.getUint32(0, …).
- packRGBA is endian-safe. On both little-endian (x86, ARM default) and big-endian platforms, the bytes land in memory as R, G, B, A — so GPU vec4 color attributes receive channels in the expected order regardless of host.
- The library throws on misuse at construction time or batch boundary, never per vertex. The hot loop does no validation. If you write past maxVertices, you'll get a typed-array out-of-bounds write, which modern engines treat as a silent no-op — no error, no data. Always pair your emit loop with a preceding ensureCapacity.
FAQ
Why a hard maxVertices cap? Why not auto-grow?
Because ArrayBuffer resize would reallocate, re-create every typed-array view, and invalidate the GL buffer you bound to it. The whole point is that none of that ever happens. Pick a number large enough for your worst frame; you pay ~N bytes of RAM, once.
How big should maxVertices be?
Enough for the largest batch you'll flush at once. For a tilemap, that's visibleTiles × 6 (two tris per tile). For a sprite batcher, it's however many sprites you flush per draw call. A 64k-vertex buffer costs 1.3 MB at a 20-byte stride — cheap.
Can I have multiple buffers? Yes. Make one per layout / draw call if you want. They're independent objects.
Why not use DataView?
You can — it's exposed as vb.dv. But typed-array indexed access is faster than DataView.setFloat32(…) on every modern engine and produces far tighter JIT output. Use DataView when you need explicit endianness for non-color data.
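For example, an explicit-endian write (shown here with a bare DataView, the same API vb.dv exposes) produces identical bytes on every host:

```javascript
const buf = new ArrayBuffer(8);
const dv = new DataView(buf);
dv.setFloat32(0, 1.0, true);        // third arg true = little-endian, explicitly
dv.setUint32(4, 0xdeadbeef, true);
const u8 = new Uint8Array(buf);
console.log([...u8]); // [0, 0, 128, 63, 239, 190, 173, 222] on any host
```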
Why packRGBA instead of four u8 writes?
One u32 store is ~4× faster than four u8 stores (one instruction vs four, plus better cache behaviour). You can still bind the attribute as 4 × UNSIGNED_BYTE normalized in GL — the shader sees vec4 either way.
What about indexed (ELEMENT_ARRAY) rendering?
Use a second BatchBuffer with { type: 'u16' (or 'u32'), size: 1 }. The library makes no assumptions about whether the buffer is used as ARRAY_BUFFER or ELEMENT_ARRAY_BUFFER.
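For the common case — quads that always share one index pattern — the index buffer can be filled once at init and never touched again. A sketch with a plain Uint16Array (a u16 BatchBuffer would hold the same values; MAX_QUADS is an assumed constant):

```javascript
// Static index buffer: 4 unique verts and 6 indices (two triangles) per quad.
const MAX_QUADS = 4096; // 4096 × 4 = 16384 verts, safely within u16 range
const indices = new Uint16Array(MAX_QUADS * 6);
for (let q = 0; q < MAX_QUADS; q++) {
  const v = q * 4;
  const i = q * 6;
  indices[i] = v;     indices[i + 1] = v + 1; indices[i + 2] = v + 2;
  indices[i + 3] = v + 2; indices[i + 4] = v + 1; indices[i + 5] = v + 3;
}
console.log([...indices.slice(0, 6)]); // [0, 1, 2, 2, 1, 3]
```

Indexed quads also cut the vertex buffer by a third (4 verts per quad instead of 6), at the cost of one extra ELEMENT_ARRAY_BUFFER upload at init.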
Does it work in a Web Worker?
Yes. ArrayBuffer + typed arrays are the core of Transferable. You can build vertex data in a worker and postMessage(vb.u8, [vb.arrayBuffer]) to transfer ownership to the main thread — but note that transfers neuter the original, so you'll need a new buffer afterwards. For zero-copy two-way sharing use SharedArrayBuffer (you'll need cross-origin isolation headers).
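The transfer semantics are easy to demonstrate without a real Worker — structuredClone (Node 17+, all modern browsers) accepts the same transfer list as postMessage:

```javascript
const buf = new ArrayBuffer(16);
const u8 = new Uint8Array(buf);
u8[0] = 42;

// Models postMessage(vb.u8, [vb.arrayBuffer]):
const received = structuredClone(u8, { transfer: [buf] });

console.log(received[0]);    // 42 — the data moved with the buffer
console.log(buf.byteLength); // 0 — the original is detached ("neutered")
```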
License
MIT © the author
