@aiblox/xform

v0.1.0

Published

a month ago

Context optimization engine for LLM-ready structured data


metinsaylan

llm context optimization json toon token-efficiency

Context optimization engine for LLM-ready structured data. Converts JSON, XML, CSV, and TSV into compact, high-signal output formats.

This is not a general file-format library — it scans, reduces, transforms, and describes data specifically for AI context windows.

Install

npm install @aiblox/xform

Quick start

import { xform } from '@aiblox/xform';

const largeJsonData = [
  { id: 1, tenant: 'acme', name: 'Ada', status: 'active', region: 'us-east', score: 10, note: null },
  { id: 2, tenant: 'acme', name: 'Bob', status: 'active', region: 'us-east', score: 12, note: null },
  // ...
];

const context = await xform(largeJsonData);

Sample output:

Dataset with {X} records and {Y} columns.
Constant across all records: status="active"; tenant="acme".

Records:
  [X|]{id|name|region|score}:
    1|Ada|us-east|10
    2|Bob|us-east|12
    3|Cora|eu-west|99
    ...

xform is an alias for transform. See USAGE.md for examples of every output format with sample inputs and outputs.
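
The Records block in the sample output above has a simple shape: a header with what appears to be the row count, the delimiter, and the column names, followed by one delimited line per record. As a rough illustration only, that layout could be produced as follows. This is not the library's encoder; encodeRows is a hypothetical helper, and it assumes values never contain the delimiter.

```typescript
type Row = Record<string, string | number | boolean | null>;

// Hypothetical helper: mimics the layout of the sample Records block.
// It does not escape delimiters inside values, which a real encoder must do.
function encodeRows(rows: Row[], columns: string[], delimiter = '|'): string {
  const header = `[${rows.length}${delimiter}]{${columns.join(delimiter)}}:`;
  const lines = rows.map(
    (row) => '  ' + columns.map((c) => String(row[c] ?? '')).join(delimiter),
  );
  return [header, ...lines].join('\n');
}

const encoded = encodeRows(
  [
    { id: 1, name: 'Ada', region: 'us-east', score: 10 },
    { id: 2, name: 'Bob', region: 'us-east', score: 12 },
  ],
  ['id', 'name', 'region', 'score'],
);
console.log(encoded);
```

Column names are written once in the header instead of once per record, which is where most of the savings over plain JSON come from.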

API

| Function | Description |
|----------|-------------|
| xform(input, options) | Alias for transform: full pipeline → context, json_compact, or toon |
| transform(input, options) | Full pipeline → context, json_compact, or toon |
| scan(input, options) | Column profiles, types, constants, null ratios |
| reduce(input, options) | Remove null-only columns, collapse constants |
| describe(input, options) | Concise natural-language data summary |
| toJsonCompact(input, options) | Minified JSON with metadata |
| toToon(input, options) | TOON-encoded output (tabular-friendly) |
| toDSV(data, delimiter) | Delimiter-separated values (records or pipeline result) |
| toCSV / toTSV / toPSV | toDSV with comma, tab, or pipe delimiters |
| fromJson / fromXml / fromCsv / fromTsv | Parse inputs to record arrays |
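
To make "column profiles, types, null ratios" concrete, here is a minimal sketch of what profiling a single column might involve. It is an illustration under stated assumptions, not the return shape of scan: ColumnProfile and profileColumn are hypothetical names, and types are simply JavaScript typeof results.

```typescript
interface ColumnProfile {
  column: string;
  types: string[];   // distinct typeof results for non-null values
  nullRatio: number; // share of records where the value is null or missing
}

// Hypothetical helper, not the library's scan implementation.
function profileColumn(
  rows: Array<Record<string, unknown>>,
  column: string,
): ColumnProfile {
  const types: string[] = [];
  let nulls = 0;
  for (const row of rows) {
    const value = row[column];
    if (value === null || value === undefined) {
      nulls += 1;
    } else if (types.indexOf(typeof value) === -1) {
      types.push(typeof value);
    }
  }
  const nullRatio = rows.length === 0 ? 0 : nulls / rows.length;
  return { column, types, nullRatio };
}

const noteProfile = profileColumn(
  [{ note: 'vip' }, { note: null }, { note: 7 }, {}],
  'note',
);
console.log(noteProfile);
```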

Options

interface TransformOptions {
  output?: 'context' | 'json_compact' | 'toon';
  /** TOON field separator: `|`, `,`, tab, or `pipe` / `comma` / `tab`. Default `|`. */
  delimiter?: string;
  schema?: SchemaDefinition[];
  hints?: { groupby?: string[] };
  compact?: boolean;
  preserveOutliers?: boolean;
  includeStats?: boolean;
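  format?: 'json' | 'xml' | 'csv' | 'tsv';
  /** Fall back to the original input when the output would be longer. Set to `false` to disable. */
  fallbackToOriginal?: boolean;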
}
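
The delimiter option accepts either a literal character or a name. A small sketch of how those spellings could be normalized to one character is shown below. resolveDelimiter is a hypothetical helper, not part of the documented API, and throwing on unknown values is an assumption about the desired behavior.

```typescript
// Hypothetical helper: maps the accepted delimiter spellings
// (`|`, `,`, tab, or `pipe` / `comma` / `tab`) to a single character.
function resolveDelimiter(delimiter?: string): string {
  switch (delimiter) {
    case undefined: // documented default is `|`
    case '|':
    case 'pipe':
      return '|';
    case ',':
    case 'comma':
      return ',';
    case '\t':
    case 'tab':
      return '\t';
    default:
      // Assumption: unknown values are rejected rather than passed through.
      throw new Error(`Unsupported delimiter: ${delimiter}`);
  }
}

console.log(JSON.stringify(resolveDelimiter('tab')));
```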

Schema

Schemas are JSON arrays with name, optional _extends, and nested _type:

const schemas = [
  { name: 'Base', status: 'string' },
  { name: 'User', _extends: 'Base', email: 'string' },
];

await transform(records, { schema: schemas });
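
One plausible reading of _extends is plain field inheritance: User gets the fields of Base plus its own. The sketch below shows that reading. It is an assumption about the semantics, not the library's resolver; SchemaDef and resolveSchema are hypothetical names.

```typescript
interface SchemaDef {
  name: string;
  _extends?: string;
  [field: string]: unknown;
}

// Assumption: a child definition inherits every field of the definition
// named by _extends, and the child's own fields win on conflict.
function resolveSchema(
  defs: SchemaDef[],
  name: string,
  seen: string[] = [],
): Record<string, unknown> {
  const def = defs.find((d) => d.name === name);
  if (!def) throw new Error(`Unknown schema: ${name}`);
  if (seen.indexOf(name) !== -1) {
    throw new Error(`Circular _extends: ${[...seen, name].join(' -> ')}`);
  }
  // Start from the parent's resolved fields, if there is a parent.
  const fields: Record<string, unknown> = def._extends
    ? resolveSchema(defs, def._extends, [...seen, name])
    : {};
  for (const key of Object.keys(def)) {
    if (key !== 'name' && key !== '_extends') fields[key] = def[key];
  }
  return fields;
}

const exampleSchemas: SchemaDef[] = [
  { name: 'Base', status: 'string' },
  { name: 'User', _extends: 'Base', email: 'string' },
];
console.log(resolveSchema(exampleSchemas, 'User'));
```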

Grouping

Record grouping runs only when you pass explicit hints — no fuzzy clustering by default:

await transform(records, {
  hints: { groupby: ['department'] },
});
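
For intuition, grouping by explicit keys amounts to bucketing records on the values of the named fields. The sketch below shows that idea in isolation. groupByKeys is a hypothetical helper and says nothing about how the library renders groups in its output.

```typescript
type Rec = Record<string, unknown>;

// Hypothetical helper: buckets records by the values of explicitly named
// keys, the way a groupby hint names them. Nothing is inferred or clustered.
function groupByKeys(records: Rec[], keys: string[]): Map<string, Rec[]> {
  const groups = new Map<string, Rec[]>();
  for (const record of records) {
    // JSON-encode the key values so ['a', 'b'] and ['a|b'] cannot collide.
    const groupKey = JSON.stringify(keys.map((k) => record[k] ?? null));
    const bucket = groups.get(groupKey);
    if (bucket) bucket.push(record);
    else groups.set(groupKey, [record]);
  }
  return groups;
}

const byDepartment = groupByKeys(
  [
    { department: 'eng', n: 1 },
    { department: 'ops', n: 2 },
    { department: 'eng', n: 3 },
  ],
  ['department'],
);
console.log(byDepartment.size);
```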

Pipeline

Scan — column types, null ratios, cheap constant detection
Reduce — drop null-only columns, collapse constants, summarize repeats
Transform — schema-aware normalization
Describe — token-efficient natural language summary
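
The Scan and Reduce steps can be pictured with a small self-contained sketch: find columns that are null in every record, find columns that hold one shared value, and keep only the rest per row. This is an illustration of the idea, not the library's code; reduceRows and the Reduced shape are hypothetical.

```typescript
type Cell = string | number | boolean | null;
type DataRow = Record<string, Cell>;

interface Reduced {
  constants: Record<string, Cell>; // columns with one shared non-null value
  dropped: string[];               // columns that are null in every record
  rows: DataRow[];                 // records restricted to the other columns
}

// Hypothetical helper combining constant detection and null-only removal.
function reduceRows(rows: DataRow[]): Reduced {
  const columns: string[] = [];
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      if (columns.indexOf(key) === -1) columns.push(key);
    }
  }
  const constants: Record<string, Cell> = {};
  const dropped: string[] = [];
  const kept: string[] = [];
  for (const column of columns) {
    const values = rows.map((row) => row[column] ?? null);
    const first = values[0] ?? null;
    if (values.every((v) => v === null)) dropped.push(column);
    else if (rows.length > 1 && values.every((v) => v === first)) constants[column] = first;
    else kept.push(column);
  }
  const reducedRows = rows.map((row) => {
    const out: DataRow = {};
    for (const column of kept) out[column] = row[column] ?? null;
    return out;
  });
  return { constants, dropped, rows: reducedRows };
}

const reduced = reduceRows([
  { id: 1, tenant: 'acme', name: 'Ada', status: 'active', note: null },
  { id: 2, tenant: 'acme', name: 'Bob', status: 'active', note: null },
]);
console.log(reduced);
```

With the two Quick start records, tenant and status end up as constants, note is dropped, and each row keeps only id and name.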

If the final output is longer than the serialized input, results automatically fall back to the original (token safety). Disable with fallbackToOriginal: false.
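
The fallback rule above reduces to a length comparison. A minimal sketch, assuming character count as a cheap stand-in for tokens (the library may measure differently), with withFallback as a hypothetical helper:

```typescript
// Hypothetical helper illustrating the length check described above.
// Assumption: string length approximates token cost closely enough.
function withFallback(
  original: string,
  optimized: string,
  fallbackToOriginal = true,
): string {
  if (fallbackToOriginal && optimized.length > original.length) {
    return original; // the "optimized" form would cost more, so keep the input
  }
  return optimized;
}

const tiny = JSON.stringify([{ a: 1 }]);
const verbose = 'Dataset with 1 record and 1 column.\n' + tiny;
console.log(withFallback(tiny, verbose) === tiny);
```

Very small inputs are the typical case where this matters, since a summary line plus a header can outweigh the data itself.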

License

MIT
