@openai-hce/encode

v1.0.3, published 4 months ago

HCE (Hierarchical Columnar Encoding) encoder for efficient data compression

Vulnerabilities: 0 high, 0 medium, 0 low
Maintainer: saurabhkohli
Keywords: hce, encoding, compression, columnar, hierarchical

Hierarchical Columnar Encoding (HCE) encoder for compressing structured JSON payloads. It converts JSON arrays into a column-oriented string representation that is typically 40–60% smaller than raw JSON, while keeping the layout deterministic for fast decoding.

Installation

npm install @openai-hce/encode

pnpm add @openai-hce/encode
yarn add @openai-hce/encode

Quick start

import { HCEEncoder } from '@openai-hce/encode';

const data = [
	{ type: 'user', id: 1, name: 'Alice', role: 'admin' },
	{ type: 'user', id: 2, name: 'Bob', role: 'user' }
];

const encoder = new HCEEncoder();
const hce = encoder.encode(data, 'users');

console.log(hce);
// users(user)[2]:
//   user(id,name,role)[2]:
//     1,Alice,admin|2,Bob,user
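As a rough check on the size claim, the sketch below compares the compact JSON length of the Quick start records with the length of the HCE output shown above. The variable names are illustrative and the code does not call the package. A two-record sample carries proportionally more header overhead than a realistic payload, so treat the result as indicative only.

```typescript
// Records from the Quick start example.
const quickStartData = [
	{ type: 'user', id: 1, name: 'Alice', role: 'admin' },
	{ type: 'user', id: 2, name: 'Bob', role: 'user' }
];

// HCE output copied from the Quick start comment (not generated here).
const quickStartHce = [
	'users(user)[2]:',
	'  user(id,name,role)[2]:',
	'    1,Alice,admin|2,Bob,user'
].join('\n');

// Compact JSON (no whitespace) is the baseline for the comparison.
const jsonLength = JSON.stringify(quickStartData).length;
const hceLength = quickStartHce.length;
const savedPercent = Math.round((1 - hceLength / jsonLength) * 100);

console.log(`JSON: ${jsonLength} chars, HCE: ${hceLength} chars, saved: ${savedPercent}%`);
```

The saving should grow with the number of records per group, because field names are written once per group instead of once per record.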

Why HCE

Columnar compression for repeated schemas.
Automatic grouping of similar records.
Fully typed TypeScript API.
Deterministic output suitable for caching.
Zero runtime dependencies.

API overview

`new HCEEncoder(options?: HCEOptions)`

interface HCEOptions {
	fieldDelimiter?: string;      // default: ','
	recordDelimiter?: string;     // default: '|'
	nestedDelimiter?: string;     // default: ';'
	missingValue?: string;        // default: ' '
	flattenNested?: boolean;      // default: true
	typeField?: string;           // default: 'type'
	autoDetectGrouping?: boolean; // default: true
	preferTypeGrouping?: boolean; // default: true
	minGroupSizeForSecondaryGrouping?: number; // default: 5
	schemaUniformityThreshold?: number;        // default: 0.9
}
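To make the defaults concrete, here is a minimal sketch of how partial options could be merged with the documented defaults. `DEFAULT_OPTIONS` and `resolveOptions` are hypothetical names used for illustration; the package's internal handling may differ.

```typescript
// Option names as documented in the interface listing above.
interface HCEOptions {
	fieldDelimiter?: string;
	recordDelimiter?: string;
	nestedDelimiter?: string;
	missingValue?: string;
	flattenNested?: boolean;
	typeField?: string;
	autoDetectGrouping?: boolean;
	preferTypeGrouping?: boolean;
	minGroupSizeForSecondaryGrouping?: number;
	schemaUniformityThreshold?: number;
}

// Default values copied from the comments in the listing above.
const DEFAULT_OPTIONS: Required<HCEOptions> = {
	fieldDelimiter: ',',
	recordDelimiter: '|',
	nestedDelimiter: ';',
	missingValue: ' ',
	flattenNested: true,
	typeField: 'type',
	autoDetectGrouping: true,
	preferTypeGrouping: true,
	minGroupSizeForSecondaryGrouping: 5,
	schemaUniformityThreshold: 0.9
};

// Hypothetical helper (not part of the package API).
// The later spread wins, so caller-supplied options override the defaults.
function resolveOptions(options: HCEOptions = {}): Required<HCEOptions> {
	return { ...DEFAULT_OPTIONS, ...options };
}

console.log(resolveOptions({ fieldDelimiter: '\t' }));
```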

`encoder.encode(data, rootKey?)`

`data`: an array of objects, or an object of the form `{ key: [...] }`.
`rootKey`: optional explicit name for the root group.
Returns an HCE string.

Examples

Type-only grouping is implicit

The encoder never prints a `by type` suffix; grouping by type is the default.

const products = {
	products: [
		{ type: 'product', name: 'Laptop', price: 999 },
		{ type: 'product', name: 'Phone', price: 599 }
	]
};

console.log(new HCEEncoder().encode(products));

Output:

products(product)[2]:
	product(name,price)[2]:
		Laptop,999|Phone,599

Note that the header has no `by type` suffix: type grouping is implicit.

Secondary grouping adds `by {field}`

When a suitable secondary field exists (for example `category`, `role`, or `status`), the encoder adds a `by {field}` suffix to the header and omits that field from the columns of each group.

const groupedProducts = {
	products: [
		{ type: 'product', category: 'Electronics', name: 'Laptop', price: 999 },
		{ type: 'product', category: 'Electronics', name: 'Phone', price: 599 },
		{ type: 'product', category: 'Books', name: 'JS Guide', price: 39 },
		{ type: 'product', category: 'Books', name: 'TS Handbook', price: 45 }
	]
};

console.log(new HCEEncoder().encode(groupedProducts));

Output:

products(product by category)[4]:
	Electronics(name,price)[2]:
		Laptop,999|Phone,599
	Books(name,price)[2]:
		JS Guide,39|TS Handbook,45
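The layout above can be reproduced with a short sketch that buckets rows by the grouping field and leaves that field out of each group's columns. `encodeBySecondaryField` is a hypothetical helper, not the package's implementation. It assumes flat records, alphabetically sorted columns (as in the outputs in this README), and groups listed in first-seen order.

```typescript
type GroupedRow = Record<string, string | number>;

// Hypothetical helper: emits the "by {field}" layout for flat records.
function encodeBySecondaryField(
	rootKey: string,
	typeName: string,
	groupField: string,
	rows: GroupedRow[],
	typeField = 'type'
): string {
	// Bucket rows by grouping value; Map keeps first-seen order.
	const buckets = new Map<string, GroupedRow[]>();
	rows.forEach((row) => {
		const key = String(row[groupField]);
		const bucket = buckets.get(key);
		if (bucket) {
			bucket.push(row);
		} else {
			buckets.set(key, [row]);
		}
	});

	const lines = [`${rootKey}(${typeName} by ${groupField})[${rows.length}]:`];
	buckets.forEach((bucket, key) => {
		// Columns: keys of the first row minus the type and grouping fields, sorted.
		const columns = Object.keys(bucket[0])
			.filter((name) => name !== typeField && name !== groupField)
			.sort();
		const records = bucket.map((row) =>
			columns.map((column) => String(row[column])).join(',')
		);
		lines.push(`\t${key}(${columns.join(',')})[${bucket.length}]:`);
		lines.push(`\t\t${records.join('|')}`);
	});
	return lines.join('\n');
}

const groupedRows: GroupedRow[] = [
	{ type: 'product', category: 'Electronics', name: 'Laptop', price: 999 },
	{ type: 'product', category: 'Electronics', name: 'Phone', price: 599 },
	{ type: 'product', category: 'Books', name: 'JS Guide', price: 39 },
	{ type: 'product', category: 'Books', name: 'TS Handbook', price: 45 }
];

console.log(encodeBySecondaryField('products', 'product', 'category', groupedRows));
```

The sketch does no escaping or quoting, so values that contain a delimiter would need extra handling.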

Uniform schemas stay type-only

If every record shares the same shape, the encoder honours `preferTypeGrouping` and keeps a single type group, even when a candidate secondary field exists.

const uniformUsers = {
	users: [
		{ type: 'user', role: 'admin', name: 'Alice', age: 30 },
		{ type: 'user', role: 'admin', name: 'Bob', age: 25 },
		{ type: 'user', role: 'user', name: 'Charlie', age: 35 }
	]
};

console.log(new HCEEncoder().encode(uniformUsers));

Output:

users(user)[3]:
	user(age,name,role)[3]:
		30,Alice,admin|25,Bob,admin|35,Charlie,user
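One way to picture "sharing the same shape" is to measure the fraction of records whose key set matches the most common key set and compare it with `schemaUniformityThreshold`. The helper below is a hypothetical sketch of that idea; how the encoder actually measures uniformity is an assumption here.

```typescript
// Hypothetical helper: fraction of records that share the most common key set.
function schemaUniformity(records: Record<string, unknown>[]): number {
	if (records.length === 0) {
		return 1; // Treat an empty collection as trivially uniform.
	}
	const counts = new Map<string, number>();
	records.forEach((record) => {
		// The sorted key list acts as a schema signature.
		const signature = JSON.stringify(Object.keys(record).sort());
		counts.set(signature, (counts.get(signature) ?? 0) + 1);
	});
	let mostCommon = 0;
	counts.forEach((count) => {
		mostCommon = Math.max(mostCommon, count);
	});
	return mostCommon / records.length;
}

const uniformSample: Record<string, unknown>[] = [
	{ type: 'user', role: 'admin', name: 'Alice', age: 30 },
	{ type: 'user', role: 'admin', name: 'Bob', age: 25 },
	{ type: 'user', role: 'user', name: 'Charlie', age: 35 }
];

// All three records have the same keys, so this clears a 0.9 threshold.
console.log(schemaUniformity(uniformSample) >= 0.9);
```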

Multi-type collections list every type

const items = {
	items: [
		{ type: 'book', title: 'HCE Guide', pages: 200 },
		{ type: 'product', name: 'Laptop', price: 999 },
		{ type: 'service', name: 'Consulting', rate: 150 }
	]
};

console.log(new HCEEncoder().encode(items));

Output:

items(book,product,service)[3]:
	book(pages,title)[1]:
		200,HCE Guide
	product(name,price)[1]:
		Laptop,999
	service(name,rate)[1]:
		Consulting,150
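The multi-type layout amounts to partitioning records by the type field and writing one header per type. The sketch below is a hypothetical illustration rather than the package's code. It assumes flat records, types listed in first-seen order, and alphabetically sorted columns.

```typescript
type TypedItem = Record<string, string | number>;

// Hypothetical helper: one group per distinct value of the type field.
function encodeByType(rootKey: string, items: TypedItem[], typeField = 'type'): string {
	// Partition by type; Map keeps first-seen order.
	const groups = new Map<string, TypedItem[]>();
	items.forEach((item) => {
		const typeName = String(item[typeField]);
		const group = groups.get(typeName);
		if (group) {
			group.push(item);
		} else {
			groups.set(typeName, [item]);
		}
	});

	const typeNames: string[] = [];
	groups.forEach((_group, typeName) => {
		typeNames.push(typeName);
	});

	// The root header lists every type and the total record count.
	const lines = [`${rootKey}(${typeNames.join(',')})[${items.length}]:`];
	groups.forEach((group, typeName) => {
		// Each type gets its own column list, without the type field itself.
		const columns = Object.keys(group[0])
			.filter((name) => name !== typeField)
			.sort();
		const records = group.map((item) =>
			columns.map((column) => String(item[column])).join(',')
		);
		lines.push(`\t${typeName}(${columns.join(',')})[${group.length}]:`);
		lines.push(`\t\t${records.join('|')}`);
	});
	return lines.join('\n');
}

const mixedItems: TypedItem[] = [
	{ type: 'book', title: 'HCE Guide', pages: 200 },
	{ type: 'product', name: 'Laptop', price: 999 },
	{ type: 'service', name: 'Consulting', rate: 150 }
];

console.log(encodeByType('items', mixedItems));
```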

Edge case: single-valued secondary field

const adminsOnly = {
	users: [
		{ type: 'user', role: 'admin', name: 'Alice' },
		{ type: 'user', role: 'admin', name: 'Bob' }
	]
};

console.log(new HCEEncoder().encode(adminsOnly));

Output:

users(user)[2]:
	user(name,role)[2]:
		Alice,admin|Bob,admin

The encoder keeps type-only grouping because `role` has only one value, so splitting on it would produce a single group.
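The check behind this edge case can be sketched as a distinct-value count: a field is only worth splitting on if it takes more than one value. `canSplitOn` is a hypothetical helper, and passing this check is necessary rather than sufficient, since the other grouping options still apply.

```typescript
// Hypothetical helper: true when `field` takes more than one distinct value.
function canSplitOn(records: Record<string, unknown>[], field: string): boolean {
	const distinct = new Set(records.map((record) => record[field]));
	return distinct.size > 1;
}

const adminsSample: Record<string, unknown>[] = [
	{ type: 'user', role: 'admin', name: 'Alice' },
	{ type: 'user', role: 'admin', name: 'Bob' }
];

console.log(canSplitOn(adminsSample, 'role')); // false: every role is 'admin'
console.log(canSplitOn(adminsSample, 'name')); // true: two distinct names
```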

Edge case: missing type field

Objects without a `type` field still encode safely: the encoder collapses the output to a single header and preserves your original schema.

const unnamed = {
	products: [
		{ name: 'Laptop', price: 999 },
		{ name: 'Phone', price: 599 }
	]
};

console.log(new HCEEncoder().encode(unnamed));

Output:

products(name,price)[2]:
    Laptop,999|Phone,599
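Because the single-header form is so regular, it can be parsed by hand. The sketch below is a hypothetical parser for this form only. It is not the `@openai-hce/decode` API, it uses the default delimiters, and it returns every value as a string because the format shown here carries no type information.

```typescript
interface ParsedGroup {
	rootKey: string;
	rows: Record<string, string>[];
}

// Hypothetical parser for "name(col,col)[n]:" followed by one line of records.
function parseSingleHeader(hce: string): ParsedGroup {
	const [header, body = ''] = hce.split('\n');
	const match = /^([^()]+)\(([^)]*)\)\[(\d+)\]:$/.exec(header.trim());
	if (!match) {
		throw new Error(`Unrecognised header: ${header}`);
	}
	const [, rootKey, columnList, count] = match;
	const columns = columnList.split(',');

	// Records are separated by '|', fields by ','.
	const trimmedBody = body.trim();
	const rows = trimmedBody === '' ? [] : trimmedBody.split('|').map((record) => {
		const values = record.split(',');
		const row: Record<string, string> = {};
		columns.forEach((column, index) => {
			row[column] = values[index] ?? '';
		});
		return row;
	});

	// The count in the header doubles as a sanity check.
	if (rows.length !== Number(count)) {
		throw new Error(`Expected ${count} records, found ${rows.length}`);
	}
	return { rootKey, rows };
}

const parsedProducts = parseSingleHeader('products(name,price)[2]:\n\tLaptop,999|Phone,599');
console.log(parsedProducts.rows);
```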

Nested objects and arrays

const posts = [
	{
		type: 'post',
		id: 1,
		title: 'Hello World',
		author: { name: 'Alice', team: 'Platform' },
		tags: ['intro', 'hce']
	},
	{
		type: 'post',
		id: 2,
		title: 'Encoder Tips',
		author: { name: 'Bob', team: 'SDK' },
		tags: ['guide']
	}
];

console.log(new HCEEncoder().encode(posts, 'posts'));

Output:

posts(post)[2]:
	post(id,title,.author,.tags)[2]:
		1,'Hello World'|2,'Encoder Tips'
		.author(name,team)[2]:
			Alice,Platform|Bob,SDK
		.tags: intro;hce|guide
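The `.tags` line shows two delimiters at work: `nestedDelimiter` joins the items within one record and `recordDelimiter` separates records. A minimal sketch of that step, using a hypothetical helper name and the default delimiters:

```typescript
// Hypothetical helper: renders one array-valued field as a ".field:" line.
function encodeArrayField(
	field: string,
	valuesPerRecord: string[][],
	nestedDelimiter = ';',
	recordDelimiter = '|'
): string {
	// Join the items of each record first, then join the records.
	const records = valuesPerRecord.map((items) => items.join(nestedDelimiter));
	return `.${field}: ${records.join(recordDelimiter)}`;
}

console.log(encodeArrayField('tags', [['intro', 'hce'], ['guide']]));
// .tags: intro;hce|guide
```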

Custom delimiters

const encoder = new HCEEncoder({
	fieldDelimiter: '\t',
	recordDelimiter: '\n',
	nestedDelimiter: ',',
});

console.log(encoder.encode([{ type: 'row', id: 1, value: 'A' }], 'rows'));

Output:

rows(row)[1]:
	row(id\tvalue)[1]:
		1\tA

Tips

Pass an object such as `{ users: [...] }` to have the root key taken from the existing property name.
Set `autoDetectGrouping` to `false` when you always want type-only grouping.
Decode using `@openai-hce/decode` for round-trip conversions.

Grouping configuration cheat sheet

| Option | Default | Effect |
|--------|---------|--------|
| `autoDetectGrouping` | `true` | Finds the best grouping field automatically. Set `false` to force type-only grouping. |
| `preferTypeGrouping` | `true` | Keeps uniform data in a single type group. Set `false` to allow aggressive secondary grouping. |
| `schemaUniformityThreshold` | `0.9` | Minimum proportion of records sharing the same schema before the encoder prefers type-only grouping. |
| `minGroupSizeForSecondaryGrouping` | `5` | Average group size needed to justify a secondary field split. Lower it to accept smaller groups. |
| `typeField` | `'type'` | Name of the discriminator field. Change when the source data uses `kind`, `category`, etc. |
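As an illustration of combining these options, the fragment below builds a configuration for data whose discriminator is `kind` and where small secondary groups are acceptable. The option names come from the table; the values are illustrative assumptions, not recommendations.

```typescript
// Illustrative values only; tune them against your own data.
const aggressiveGrouping = {
	typeField: 'kind',                   // source records use `kind` instead of `type`
	preferTypeGrouping: false,           // allow a secondary split even for uniform data
	minGroupSizeForSecondaryGrouping: 2  // accept groups averaging two records
};

// Intended to be passed as `new HCEEncoder(aggressiveGrouping)`.
console.log(aggressiveGrouping);
```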
