@aiblox/xform
v0.1.0
Context optimization engine for LLM-ready structured data. Converts JSON, XML, CSV, and TSV into compact, high-signal output formats.
This is not a general file-format library — it scans, reduces, transforms, and describes data specifically for AI context windows.
Install
npm install @aiblox/xform
Quick start
import { xform } from '@aiblox/xform';
const largeJsonData = [
  { id: 1, tenant: 'acme', name: 'Ada', status: 'active', note: null },
  { id: 2, tenant: 'acme', name: 'Bob', status: 'active', note: null },
  ...
];
const context = await xform(largeJsonData);
Sample output:
Dataset with {X} records and {Y} columns.
Constant across all records: status="active"; tenant="acme".
Records:
[X|]{id|name|region|score}:
1|Ada|us-east|10
2|Bob|us-east|12
3|Cora|eu-west|99
...
xform is an alias for transform. See USAGE.md for examples of every output format with sample inputs and outputs.
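To make the tabular part of the sample output concrete, here is a minimal sketch of how records might be written as a single header followed by delimiter-separated rows. It is not the library's implementation: `encodeRowsSketch`, `Cell`, and `Row` are hypothetical names, the header shape is copied from the sample output above, and writing null as an empty cell with no escaping is an assumption.

```typescript
type Cell = string | number | boolean | null;
type Row = Record<string, Cell>;

// Hypothetical helper for illustration; not part of @aiblox/xform.
function encodeRowsSketch(rows: Row[], delimiter = '|'): string {
  if (rows.length === 0) return `[0${delimiter}]{}:`;
  // Assumption: every record has the same keys as the first one.
  const columns = Object.keys(rows[0]);
  // The header states the record count and the field names exactly once.
  const header = `[${rows.length}${delimiter}]{${columns.join(delimiter)}}:`;
  // Each record then contributes only its values (no escaping in this sketch).
  const body = rows.map((row) =>
    columns.map((c) => (row[c] === null ? '' : String(row[c]))).join(delimiter),
  );
  return [header, ...body].join('\n');
}

const sampleRows: Row[] = [
  { id: 1, name: 'Ada', region: 'us-east', score: 10 },
  { id: 2, name: 'Bob', region: 'us-east', score: 12 },
];
console.log(encodeRowsSketch(sampleRows));
// [2|]{id|name|region|score}:
// 1|Ada|us-east|10
// 2|Bob|us-east|12
```

Because field names appear once in the header instead of once per record, the saving grows with the number of records.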
API
| Function | Description |
|----------|-------------|
| xform(input, options) | Alias for transform — full pipeline → context, json_compact, or toon |
| transform(input, options) | Full pipeline → context, json_compact, or toon |
| scan(input, options) | Column profiles, types, constants, null ratios |
| reduce(input, options) | Remove null-only columns, collapse constants |
| describe(input, options) | Concise natural-language data summary |
| toJsonCompact(input, options) | Minified JSON with metadata |
| toToon(input, options) | TOON-encoded output (tabular-friendly) |
| toDSV(data, delimiter) | Delimiter-separated values (records or pipeline result) |
| toCSV / toTSV / toPSV | toDSV with comma, tab, or pipe delimiters |
| fromJson / fromXml / fromCsv / fromTsv | Parse inputs to record arrays |
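The table describes toCSV, toTSV, and toPSV as toDSV with a fixed delimiter. A minimal sketch of that relationship, under stated assumptions, might look like the following. All names ending in `Sketch` are hypothetical stand-ins, and the CSV-style quoting rule is an assumption rather than documented behavior of the real functions.

```typescript
type Cell = string | number | boolean | null;
type Row = Record<string, Cell>;

// Hypothetical stand-in; the real toDSV may differ in quoting and header handling.
function toDsvSketch(rows: Row[], delimiter: string): string {
  if (rows.length === 0) return '';
  // Assumption: every record has the same keys as the first one.
  const columns = Object.keys(rows[0]);
  // Assumption: CSV-style quoting for cells containing the delimiter, a quote, or a newline.
  const escapeCell = (value: Cell): string => {
    const text = value === null ? '' : String(value);
    const needsQuotes = text.includes(delimiter) || /["\r\n]/.test(text);
    return needsQuotes ? `"${text.replace(/"/g, '""')}"` : text;
  };
  const lines = [columns.map(escapeCell).join(delimiter)];
  for (const row of rows) {
    lines.push(columns.map((c) => escapeCell(row[c])).join(delimiter));
  }
  return lines.join('\n');
}

// The three convenience functions only fix the delimiter.
const toCsvSketch = (rows: Row[]): string => toDsvSketch(rows, ',');
const toTsvSketch = (rows: Row[]): string => toDsvSketch(rows, '\t');
const toPsvSketch = (rows: Row[]): string => toDsvSketch(rows, '|');
```

Keeping one delimiter-parameterized writer means quoting rules live in a single place, so the three variants cannot drift apart.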
Options
interface TransformOptions {
  output?: 'context' | 'json_compact' | 'toon';
  /** TOON field separator: `|`, `,`, tab, or `pipe` / `comma` / `tab`. Default `|`. */
  delimiter?: string;
  schema?: SchemaDefinition[];
  hints?: { groupby?: string[] };
  compact?: boolean;
  preserveOutliers?: boolean;
  includeStats?: boolean;
  format?: 'json' | 'xml' | 'csv' | 'tsv';
}
Schema
Schemas are JSON arrays with name, optional _extends, and nested _type:
const schemas = [
  { name: 'Base', status: 'string' },
  { name: 'User', _extends: 'Base', email: 'string' },
];
await transform(records, { schema: schemas });
Grouping
Record grouping runs only when you pass explicit hints — no fuzzy clustering by default:
await transform(records, {
  hints: { groupby: ['department'] },
});
Pipeline
- Scan — column types, null ratios, cheap constant detection
- Reduce — drop null-only columns, collapse constants, summarize repeats
- Transform — schema-aware normalization
- Describe — token-efficient natural language summary
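The Scan and Reduce steps above can be illustrated with a small sketch. This is an assumption-laden approximation, not the library's code: `scanSketch`, `reduceSketch`, and `ColumnProfile` are hypothetical names, a column counts as constant only when it has no nulls and at least two records agree, and the "summarize repeats" behavior is left out.

```typescript
type Cell = string | number | boolean | null;
type Row = Record<string, Cell>;

// Hypothetical profile shape; the real scan() output may look different.
interface ColumnProfile {
  name: string;
  type: string;       // typeof the non-null values, 'mixed', or 'null'
  nullRatio: number;  // share of records where the value is null
  constant: boolean;  // same non-null value in every record
}

function scanSketch(rows: Row[]): ColumnProfile[] {
  // Assumption: every record has the same keys as the first one.
  const columns = rows.length > 0 ? Object.keys(rows[0]) : [];
  return columns.map((name) => {
    const values = rows.map((row) => row[name]);
    const nulls = values.filter((v) => v === null).length;
    const kinds = Array.from(new Set(values.filter((v) => v !== null).map((v) => typeof v)));
    const type = kinds.length === 0 ? 'null' : kinds.length === 1 ? kinds[0] : 'mixed';
    const constant = rows.length > 1 && nulls === 0 && values.every((v) => v === values[0]);
    return { name, type, nullRatio: nulls / rows.length, constant };
  });
}

function reduceSketch(rows: Row[]): { constants: Row; records: Row[] } {
  const constants: Row = {};
  const keep: string[] = [];
  for (const profile of scanSketch(rows)) {
    if (profile.nullRatio === 1) continue; // drop null-only columns
    if (profile.constant) constants[profile.name] = rows[0][profile.name]; // collapse constants
    else keep.push(profile.name);
  }
  // Rebuild each record with only the columns that still carry information.
  const records = rows.map((row) => {
    const slim: Row = {};
    for (const name of keep) slim[name] = row[name];
    return slim;
  });
  return { constants, records };
}
```

Applied to the two records from the quick start, this sketch hoists tenant and status into `constants`, drops note, and leaves only id and name in each record.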
If the final output is longer than the serialized input, the result automatically falls back to the original input, so the transform never makes the context larger (token safety). Disable this safeguard with fallbackToOriginal: false.
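The fallback rule can be sketched as a thin guard around any rendering step. This is a hypothetical illustration, not the library's code: `withFallbackSketch` is an invented name, and comparing character length against the `JSON.stringify` output is an assumption about how "longer" is measured.

```typescript
// Hypothetical wrapper illustrating the fallback rule.
function withFallbackSketch<T>(
  input: T,
  render: (input: T) => string,
  fallbackToOriginal = true,
): string {
  const original = JSON.stringify(input);
  const optimized = render(input);
  // Assumption: character count is used as a cheap proxy for token count.
  if (fallbackToOriginal && optimized.length > original.length) {
    return original;
  }
  return optimized;
}

// A renderer that inflates tiny inputs triggers the fallback.
const verbose = (data: number[]): string => `Dataset with ${data.length} records: ${data.join(', ')}`;
console.log(withFallbackSketch([1, 2], verbose));        // [1,2]
console.log(withFallbackSketch([1, 2], verbose, false)); // Dataset with 2 records: 1, 2
```

The guard matters most for very small inputs, where a summary header alone can exceed the size of the raw data.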
License
MIT
