lzo1x
v1.0.1
Pure-TypeScript, MIT-licensed, clean-room implementation of LZO1X-1 compression and decompression. Isomorphic (Node + modern browsers), zero runtime dependencies, ESM-only.
Why this over lzo / miniLZO?
The canonical option for LZO in Node is the lzo npm package, which wraps the C miniLZO library via node-gyp. It's fast and battle-tested — but:
- License. miniLZO (and the lzo binding) is GPLv2. That is a hard non-starter for a lot of commercial codebases. This package is MIT, clean-room from the public LZO1X-1 stream spec.
- Runtime reach. Native bindings only work on Node, only on platforms with a working compiler toolchain, and only after a successful node-gyp build. This package is pure TypeScript: it runs in browsers, Cloudflare Workers, Deno, Bun, Electron renderers, and any Node version without rebuilds.
- Install footprint. Zero runtime dependencies, no compile step, no postinstall script, no prebuilt-binary download dance. Just code.
- Types. Ships its own .d.ts. No @types/lzo shim required.
- Conformity. Every release is cross-validated against the actual miniLZO binding over ~2050 payloads in both directions (see Testing). If lzo accepts it, so do we; if we produce it, lzo decompresses it.
When you should pick lzo instead: you're Node-only, GPL is fine, you need LZO1X-999 / LZO1Y / LZO1Z, or you're moving multi-GB/s of data and the ~10× speed gap of native C over JS matters more than reach.
Install

    pnpm add lzo1x

API

    import { lzo1xCompress, lzo1xDecompress } from 'lzo1x';
    const compressed = lzo1xCompress(input); // Uint8Array → Uint8Array
    const restored = lzo1xDecompress(compressed); // dynamic-grow
    const restored2 = lzo1xDecompress(compressed, input.length); // pre-sized, throws RangeError on mismatch

That is the entire library. No streaming, no async, no LZO1Y/LZO1Z, no LZO1X-999.
lzo1xCompress always produces output ≤ input.length + ceil(input.length / 16) + 67 bytes (the published LZO1X worst case).
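The bound above is easy to precompute when an output buffer has to be sized ahead of time. A minimal sketch; `maxCompressedLength` is a hypothetical helper written for illustration and is not part of this package's API:

```typescript
// Hypothetical helper (not exported by lzo1x): upper bound on the
// compressed size, using the formula quoted above.
function maxCompressedLength(inputLength: number): number {
  if (!Number.isInteger(inputLength) || inputLength < 0) {
    throw new RangeError("inputLength must be a non-negative integer");
  }
  return inputLength + Math.ceil(inputLength / 16) + 67;
}
```

For example, a 64 KiB input (65536 bytes) can never need more than 65536 + 4096 + 67 = 69699 bytes of output.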
lzo1xDecompress throws RangeError on truncated/corrupt input or on expectedOutputLength mismatch.
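Callers handling untrusted frames may prefer a null result over an exception. The sketch below wraps any decompressor with the shape described above; `tryDecompress` and the `Decompress` type are illustrative names, not package exports, and the decompressor is passed in as a parameter so the snippet stays self-contained:

```typescript
// Assumed shape of lzo1xDecompress, taken from the usage shown above.
type Decompress = (input: Uint8Array, expectedOutputLength?: number) => Uint8Array;

// Hypothetical wrapper: maps RangeError (corrupt, truncated, or
// length-mismatched input) to null and rethrows anything else.
function tryDecompress(
  decompress: Decompress,
  data: Uint8Array,
  expectedOutputLength?: number,
): Uint8Array | null {
  try {
    return decompress(data, expectedOutputLength);
  } catch (err) {
    if (err instanceof RangeError) return null;
    throw err;
  }
}
```

With the real package this would be called as `tryDecompress(lzo1xDecompress, frame, expectedLength)`.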
LZO1X-1 stream format — one-pager
The stream is a sequence of (literal-run, match) pairs, driven by a single token byte per match. After the last match the stream is terminated by an M4 end marker (0x11 0x00 0x00).
Token byte layout
The token's high bits select the encoding family:
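| Token range                                           | Family      | Encoding                                                                                             |
| ----------------------------------------------------- | ----------- | ---------------------------------------------------------------------------------------------------- |
| 0..15 (first token, or after a match with state == 0) | Literal run | run length 4..18 from the low 4 bits; 0 triggers the extended length ladder                          |
| 0..15 (directly after literals)                       | M1          | 2-byte match, distance ≤ 1024 (3-byte match at distance 2049..3072 right after a long literal run)   |
| 16..31                                                | M4          | far match, distance above 16384 (up to 49151); len 3..9 from 3 bits, longer via the ladder           |
| 32..63                                                | M3          | distance ≤ 16384; len 3..33 from 5 bits, longer via the ladder                                       |
| 64..127                                               | M2          | short match, len = 3..4, distance ≤ 2048                                                             |
| 128..255                                              | M2          | short match, len = 5..8, distance ≤ 2048                                                             |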
(The "M1..M4" naming is the canonical LZO1X terminology.)
Length / distance encoding ladder
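When a length field's bits in the token are zero, the actual length is encoded by a run of 0x00 bytes (each contributes 255) followed by a non-zero terminator byte whose value is added to the total. The same ladder is used for a long literal run (a 0..15 token whose low 4 bits are zero) and for match length on M3/M4. Short literal runs of 1..3 bytes after a match need no ladder; they are counted by the state bits described below.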
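The ladder can be sketched in a few lines. This is an illustration under stated assumptions, not the package's code: `encodeLadder` and `decodeLadder` are hypothetical names, and they handle only the ladder value itself; the fixed per-family base that is added to get the final length is left out.

```typescript
// Encode a ladder value >= 1 as zero bytes (255 each) plus a non-zero terminator.
function encodeLadder(value: number): number[] {
  if (!Number.isInteger(value) || value < 1) {
    throw new RangeError("ladder value must be an integer >= 1");
  }
  const out: number[] = [];
  let rest = value;
  while (rest > 255) {
    out.push(0x00); // each zero byte stands for 255
    rest -= 255;
  }
  out.push(rest); // terminator in 1..255
  return out;
}

// Decode a ladder starting at `pos`; returns the value and the next read position.
function decodeLadder(buf: Uint8Array, pos: number): { value: number; next: number } {
  let value = 0;
  let i = pos;
  while (i < buf.length && buf[i] === 0) {
    value += 255;
    i++;
  }
  if (i >= buf.length) throw new RangeError("truncated length ladder");
  return { value: value + buf[i], next: i + 1 };
}
```

For instance, a ladder value of 300 is written as `0x00 0x2D` (255 + 45).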
"State" — the literal-run-after-match path
The low 2 bits of every match instruction (state) carry the number of literal bytes (0..3) that immediately follow the match without their own token. For M1 and M2 these bits sit in the token byte itself; for M3 and M4 they sit in the low 2 bits of the first distance byte. When state == 0 the next byte starts a fresh token; when state > 0 those literals are copied raw and the byte right after is the next match's token.
End marker
The decoder MUST see exactly 0x11 0x00 0x00 (token = M4 with len-bits = 1, then two zero distance bytes — interpreted by the decoder as "stop"). Anything after is rejected.
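A cheap pre-check for the trailing marker can reject obviously truncated buffers before any decoding starts. A minimal sketch; `endsWithEndMarker` is a hypothetical helper, and passing it is necessary but not sufficient, since a real decoder must reach the marker while walking the token stream:

```typescript
// Hypothetical pre-check: does the buffer end in the 0x11 0x00 0x00 marker?
function endsWithEndMarker(stream: Uint8Array): boolean {
  const n = stream.length;
  if (n < 3) return false;
  return stream[n - 3] === 0x11 && stream[n - 2] === 0x00 && stream[n - 1] === 0x00;
}
```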
First-frame quirk
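The very first byte has no preceding match. If it is > 17 it encodes a leading literal run of (t - 17) bytes directly, with no separate length token. If it is < 16 it is an ordinary literal-run token (with the length ladder for t == 0). The values 16 and 17 are decoded as normal instructions; in particular, an empty input compresses to just the end marker, which starts with 0x11.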
Implementation notes
- Compressor is the LZO1X-1 variant (canonical "fast" mode): a single-pass greedy matcher with a 13-bit (8192-slot) hash table keyed on 4 input bytes. This is what the spec calls a "64 KB working set" — the 64 KB is the match-distance window, not the table size.
- The hash function is ((b[i]*2654435761) >>> (32-13)) & 0x1FFF (Knuth multiplicative hash).
- Minimum match length is 3 bytes. Below that, we emit literals.
- The "trailing literals" rule: the last M2_MAX_LEN + 5 (≈ 20) bytes of input are always emitted as literals, never as the tail of a match. This keeps the decoder's wildcopy safe.
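The hash described in the notes above can be sketched as follows. Two details are assumptions rather than facts about this package: the four key bytes are packed little-endian, and `Math.imul` is used for the multiply so the product wraps at 32 bits exactly (a plain `*` on a full 32-bit value can exceed 2^53 and lose low bits). `hash13` is an illustrative name.

```typescript
const HASH_BITS = 13; // 8192 slots

// Knuth-style multiplicative hash over 4 input bytes starting at index i.
function hash13(buf: Uint8Array, i: number): number {
  if (i < 0 || i + 4 > buf.length) {
    throw new RangeError("need 4 bytes at index " + i);
  }
  // Assumed byte order: little-endian 32-bit load.
  const v = (buf[i] | (buf[i + 1] << 8) | (buf[i + 2] << 16) | (buf[i + 3] << 24)) >>> 0;
  // Keep the top 13 bits of the wrapped 32-bit product; the mask is then a no-op
  // and is kept only to mirror the formula in the notes.
  return (Math.imul(v, 2654435761) >>> (32 - HASH_BITS)) & 0x1fff;
}
```

Taking the high bits of the product, rather than the low ones, is what makes the multiplicative hash spread nearby keys across the table.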
Performance
On a typical developer laptop, both directions run at roughly 400-500 MB/s on cache-warm 64 KB buffers. There is no SIMD path; the inner loops are byte-at-a-time Uint8Array reads. Most callers will be I/O-bound or matcher-bound on cold inputs long before they hit the JS interpreter ceiling.
Browser support
Pure TypeScript, zero runtime dependencies, no DOM/Node-only APIs. Runs anywhere Uint8Array does.
Testing
Five test suites under src/__tests__/:
- format.test.ts: Hand-crafted inputs that exercise every M1/M2/M3/M4 path and the length ladders.
- roundtrip.test.ts: Deterministic-RNG inputs at 1, 16, 256, 4096, 65535, 131072 bytes; decompress(compress(x)) === x.
- oracle-minilzo.test.ts: Cross-validates against the native lzo npm binding (miniLZO) over ~2050 payloads in both directions: random-LCG sweeps across 9 size bands, whitened high-entropy inputs, single-byte runs for every byte value, short repeating patterns, small-alphabet text-like data, sparse-zeros mixes, and hand-picked inputs that force decoder-only token paths. Runs in CI on every push; locally self-skips if the binding fails to build.
- captured-frames.test.ts: Real on-the-wire LZO frames captured from a Niimbot B2 Pro printer over BLE. Self-skips if the research path is absent.
- api.test.ts: Error semantics, worst-case size bound.
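Deterministic payloads of the kind used in the round-trip and LCG sweeps can be produced with a small linear congruential generator. This is a sketch of the idea only: the constants below are the widely published Numerical Recipes LCG parameters, not necessarily the ones the test suite uses, and `lcgBytes` is a hypothetical name.

```typescript
// Reproducible pseudo-random bytes from a 32-bit LCG (assumed constants).
function lcgBytes(seed: number, length: number): Uint8Array {
  const out = new Uint8Array(length);
  let state = seed >>> 0;
  for (let i = 0; i < length; i++) {
    // Math.imul keeps the multiply exact modulo 2^32.
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    out[i] = state >>> 24; // the high byte has the best statistical quality
  }
  return out;
}
```

Because the same seed always yields the same bytes, a failing payload can be reproduced from its seed and size alone.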
Prior art
Other browser-capable peers: lzo-wasm (WASM, decompress-only, BSD-2 wrapping LGPL-3 FFmpeg code) and lzo-ts (TS port of minilzo-js, GPL-3.0, compress + decompress).
Licence
MIT — see LICENSE.
