data-layout

v1.2.2

Published

3 months ago

Define fields in a layout to unpack buffers into structured objects

no_mad

Installation

npm install data-layout

Basic Usage

const DataLayout = require('data-layout');

const layout = new DataLayout()
    .field('version', { size: 1, type: 'dec-str' })
    // for fixed-size types (UIntN, IntN), you can omit the size:
    .field('size', { type: 'uint16le' })
    .field('payload', {
        size: ({ fields }) => fields['size'].value,
        type: 'utf8',
        then: ({ value }) => value.trim(),
    });

const buffer = Buffer.from([49, 5, 0, 72, 101, 108, 108, 111, 32, 32]);
// "1\x05\x00Hello  " (the two trailing spaces lie outside the 5-byte payload)

const struct = layout.unpack(buffer);

console.log(struct.fields.version.value);
// -> 1   (the byte 0x31 is cast to the string "1" and then to the number 1)
console.log(struct.fields.size.value);
// -> 5

// Update 'version' field
struct.updateField('version', 2);
// Fill 'payload' field with binary zeroes
struct.clearField('payload');

// Repack the buffer, now modified
const newbuffer = struct.repack();

// Export the DataLayout declared
const json = layout.toJSON();

// Import a JSON object as a DataLayout
const newlayout = DataLayout.fromJSON(json);
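To make the three fields above concrete, here is a minimal sketch that decodes the first eight bytes of the same buffer by hand with Node's built-in Buffer methods. It does not use data-layout and is not the library's implementation; it only shows which bytes each field covers and what value they produce.

```javascript
// Manual decoding of the Basic Usage buffer (plain Node.js, no data-layout).
const sample = Buffer.from([49, 5, 0, 72, 101, 108, 108, 111]);

// 'version': 1 byte, read as a UTF-8 string and then parsed as a decimal number
const version = Number.parseInt(sample.toString('utf8', 0, 1), 10);

// 'size': 2 bytes, unsigned 16-bit little-endian, starting right after 'version'
const size = sample.readUInt16LE(1);

// 'payload': 'size' bytes of UTF-8 text, trimmed like the 'then' hook does
const payload = sample.toString('utf8', 3, 3 + size).trim();

console.log(version, size, payload); // 1 5 Hello
```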

Field Definition

.field(name, options)

Options

| Option  | Type                         | Description                      |
| ------- | ---------------------------- | -------------------------------- |
| offset  | number or function -> number | Absolute offset                  |
| after   | string or function -> string | Offset relative to another field |
| size    | number or function -> number | Field size in bytes (required)*  |
| type    | string or function -> any    | Parsing strategy                 |
| repeat  | number or function -> number | Repeat field N times             |
| then    | function                     | Post-processing hook             |
| depends | string or function -> string | Conditional parsing dependency   |

*Note: For fixed-size types (UIntN, IntN) you can omit the size, and the default size for that type will be used. E.g. a uint16le field will have an inferred size of 2 bytes.
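The inferred size follows directly from the bit width in the type name. The helper below is a hypothetical sketch of that rule (inferredSize is not part of data-layout, and the library may implement the lookup differently); it returns undefined for types that have no fixed size.

```javascript
// Hypothetical helper: derive a byte size from a fixed-width integer type name.
function inferredSize(type) {
    // Matches names such as 'uint8', 'int16le', 'uint64be'
    const match = /^u?int(8|16|32|64)(le|be)?$/.exec(type);
    return match ? Number(match[1]) / 8 : undefined;
}

console.log(inferredSize('uint16le'));    // 2
console.log(inferredSize('int64be'));     // 8
console.log(inferredSize('uint8'));       // 1
console.log(inferredSize('utf8'));        // undefined (no fixed size)
console.log(inferredSize('uintle-auto')); // undefined (sized by the field)
```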

Parsing Result

{
  fields: {
    ...DataLayout_Field: {
        name, offset, size, buffer, value
    }
  },
  global: { (any) }
}

Callbacks for dynamic components of fields

When setting dynamic components of fields (e.g. 'size', 'offset', etc.), the callback function you provide receives the parameters defined in the DataLayout_CallbackParameters class.

These properties, however, only become available as the internal sizes/offsets/data of the field are resolved, in this order: depends -> type -> offset/after -> size -> then -> repeat.

| Property   | Type       | Available for                 | Description                                   |
| ---------- | ---------- | ----------------------------- | --------------------------------------------- |
| fields     | Array<any> | All callbacks                 | The already parsed DataLayout_Field objects   |
| global     | object     | All callbacks                 | The global store                              |
| name       | string     | All callbacks                 | The name of the field                         |
| src_buffer | Buffer     | All callbacks                 | The entire source buffer passed for unpacking |
| type       | string     | All callbacks                 | The type of the field (resolved)              |
| offset     | number     | All callbacks                 | The offset of the field (resolved)            |
| size       | number     | The then and repeat callbacks | The size of the field (resolved)              |
| buffer     | Buffer     | The then and repeat callbacks | The raw buffer extracted for the field        |
| value      | any        | The then and repeat callbacks | The processed data for the field              |

This means that some properties (e.g. size, buffer, value) are not yet available when, for example, an offset is computed dynamically.

This DOES NOT WORK:

// This fails, because the 'size' property is not yet available when the field type is resolved
.field('a', { size: 2, type: ({ size }) => { /* do something with 'size' */ } });

This WORKS:

// This works, because the 'size' property is already resolved when the value is handled
.field('a', { size: 2, then: ({ size }) => { /* do something with 'size' */ } });

Built-in Type Parsers

The library ships convenience parsers for the most common cases. To use one, pass its name in the 'type' property, or return its name from a 'type' callback.

const layout = new DataLayout()
    // Returns the 2 bytes as a UInt16-LE
    .field('a', { size: 2, type: 'uint16le' })
    // Returns the first byte as a UInt8 (the remaining bytes are ignored)
    .field('b', { size: 4, type: 'uint8' })
    // Uses all bytes of the field to return the largest possible UInt-LE
    .field('c', { size: 3, type: 'uintle-auto' });

const struct = layout.unpack(Buffer.from([1, 1, 2, 2, 2, 2, 3, 3, 3]));
// a: (UInt16-LE)     0x0101      -> 0x0101   -> 257
// b: (UInt8)         0x02020202  -> 0x02     -> 2 (last 3 bytes ignored)
// c: (UInt-LE Auto)  0x030303    -> 0x030303 -> 197379
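For orientation, the three results above correspond to the following plain Node.js Buffer reads. This is only a sketch of the equivalent arithmetic using Node's built-in methods, not the library's internal implementation.

```javascript
// Plain Node.js equivalent of the example above (no data-layout involved).
const raw = Buffer.from([1, 1, 2, 2, 2, 2, 3, 3, 3]);

const a = raw.readUInt16LE(0);  // bytes 0-1 as UInt16-LE
const b = raw.readUInt8(2);     // first byte of the 4-byte field at offset 2
const c = raw.readUIntLE(6, 3); // all 3 bytes at offset 6 as an unsigned LE integer

console.log(a, b, c); // 257 2 197379
```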

It is also possible to dynamically select a built-in type parser, by returning the type string:

// This is a base field, it contains the size for the 'payload' field
.field('len', { size: 2 , type: 'uint16le' })

// This field depends on the value of the 'len' field
.field('payload', {
    // Dynamic size, depending on 'len'
    size: ({fields}) => fields['len'].value,
    // Dynamic parsing, depending on current size
    // If size is 1 byte, return as number
    // else return as buffer
    type: ({size}) => size === 1 ? 'uint8' : 'buffer'
});

However, to simplify the syntax, you can use the dynamic built-in type parsers, which always extract the largest possible number from a field:

.field('payload', {
    size: ({fields}) => fields['len'].value,
    type: 'uintle-auto'
    // Regardless of the field size, the largest UInt-LE is
    // always extracted (this is the most common scenario)
})

Note: The buffer, ascii, and utf8 type parsers always return the longest buffer/string possible.

Parsers are plain functions, so you can write your own if the built-in ones don't suit your needs.

Custom auto-sized type parsers:

// In this example, the type parser reads the largest string possible,
// while ignoring the last byte
.field('payload', {
    size: ({fields}) => fields['len'].value,
    type: ({buffer, size}) =>
        buffer.subarray(0, size-1).toString('utf8')
})

Integer Types

| Type     | Description                 |
| -------- | --------------------------- |
| uint8    | Unsigned 8-bit              |
| uint16le | Unsigned 16-bit LE          |
| uint32le | Unsigned 32-bit LE          |
| uint64le | Unsigned 64-bit LE (BigInt) |
| uint16be | Unsigned 16-bit BE          |
| uint32be | Unsigned 32-bit BE          |
| uint64be | Unsigned 64-bit BE (BigInt) |
| int8     | Signed 8-bit                |
| int16le  | Signed 16-bit LE            |
| int32le  | Signed 32-bit LE            |
| int64le  | Signed 64-bit LE (BigInt)   |
| int16be  | Signed 16-bit BE            |
| int32be  | Signed 32-bit BE            |
| int64be  | Signed 64-bit BE (BigInt)   |

Dynamic Integer Types

| Type        | Description            |
| ----------- | ---------------------- |
| uintle-auto | Auto-sized unsigned LE |
| uintbe-auto | Auto-sized unsigned BE |
| intle-auto  | Auto-sized signed LE   |
| intbe-auto  | Auto-sized signed BE   |
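As a rough illustration of what "auto-sized" means, the sketch below reads an entire buffer as a single integer with Node's variable-width Buffer readers. The readAuto helper is hypothetical and not part of data-layout. Node's readUIntLE/readIntLE family is limited to 6 bytes, and how the library treats wider fields is not covered here.

```javascript
// Hypothetical helper, not part of data-layout: read a whole buffer as one integer.
function readAuto(buf, { signed = false, littleEndian = true } = {}) {
    if (buf.length < 1 || buf.length > 6) {
        throw new RangeError('this sketch only handles 1 to 6 bytes');
    }
    if (signed) {
        return littleEndian ? buf.readIntLE(0, buf.length) : buf.readIntBE(0, buf.length);
    }
    return littleEndian ? buf.readUIntLE(0, buf.length) : buf.readUIntBE(0, buf.length);
}

const three = Buffer.from([0x01, 0x02, 0xff]);
console.log(readAuto(three));                                        // 16712193 (0xff0201)
console.log(readAuto(three, { littleEndian: false }));               // 66303    (0x0102ff)
console.log(readAuto(three, { signed: true }));                      // -65023   (0xff0201 as signed 24-bit)
console.log(readAuto(three, { signed: true, littleEndian: false })); // 66303
```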

String & Buffer Types

| Type         | Description  |
| ------------ | ------------ |
| buffer       | Raw buffer   |
| ascii        | ASCII string |
| utf8 / utf-8 | UTF-8 string |
| hex          | Hex string   |

Macro Types

| Type    | Description                       |
| ------- | --------------------------------- |
| dec-str | UTF-8 string → decimal number     |
| hex-str | UTF-8 string → hexadecimal number |
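The two macro types chain a string decode with a number parse. A minimal sketch of that two-step conversion in plain Node.js follows; decStr and hexStr are illustrative names, and the library's handling of malformed or padded input may differ from Number.parseInt.

```javascript
// Illustrative equivalents of the macro types (not the library's own code).
const decStr = (buf) => Number.parseInt(buf.toString('utf8'), 10); // text -> base-10 number
const hexStr = (buf) => Number.parseInt(buf.toString('utf8'), 16); // text -> base-16 number

console.log(decStr(Buffer.from('42')));   // 42
console.log(hexStr(Buffer.from('ff')));   // 255
console.log(decStr(Buffer.from([0x31]))); // 1 (the byte 0x31 is the character "1")
```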

Dynamic Fields

Dynamic Size

.field('data', {
  size: ({ fields }) => fields.length.value,
  type: 'buffer'
});

Dynamic Offset

.field('data', {
  offset: ({ fields }) => fields.start.value,
  size: 4
});

Using `after`

.field('header', { size: 4 })
.field('body', { after: 'header', size: 8 });

Repeated Fields

.field('items', {
  size: 2,
  repeat: 3,
  type: 'uint16le'
});

Dynamic repetition:

.field('count', { size: 1, type: 'uint8' })
.field('items', {
  size: 2,
  repeat: ({ fields }) => fields.count.value,
  type: 'uint16le'
});
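In byte terms, a repeated field reads the same element back-to-back N times. The sketch below does this by hand with plain Buffer calls for a count byte followed by that many UInt16-LE items. It does not use data-layout, and the shape in which the library exposes repeated values is not assumed here.

```javascript
// Manual equivalent of "count, then count x uint16le" (plain Node.js).
const packet = Buffer.from([3, 0x01, 0x00, 0x02, 0x00, 0x03, 0x00]);

const count = packet.readUInt8(0); // first byte says how many items follow
const items = [];
for (let i = 0; i < count; i++) {
    // each item is 2 bytes wide and starts after the 1-byte count
    items.push(packet.readUInt16LE(1 + i * 2));
}

console.log(items); // [ 1, 2, 3 ]
```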

Post-Processing (`then`)

.field('value', {
  size: 2,
  type: 'utf8',
  then: ({ value }) => value.trimEnd() // remove trailing spaces from field value
});

- Runs after parsing.
- Used to override the final value or set global field properties.
- Return undefined or null to preserve the original value; anything else will override it.
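The override rule can be sketched as follows. applyThen is a hypothetical helper written for illustration only, not the library's code. Under the rule as stated, falsy results such as 0 or an empty string still replace the original value, because only undefined and null preserve it.

```javascript
// Hypothetical helper illustrating the override rule for 'then' hooks.
function applyThen(value, then) {
    const result = then({ value });
    // Only undefined/null keep the parsed value; every other result replaces it.
    return result === undefined || result === null ? value : result;
}

console.log(applyThen('ab  ', ({ value }) => value.trimEnd())); // ab
console.log(applyThen(7, () => undefined));                     // 7 (preserved)
console.log(applyThen(7, () => 0));                             // 0 (overridden)
```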

Dependencies (`depends`)

.field('optional', {
  depends: 'flag',
  size: 4
});

Field is skipped if dependency is not found

Custom Type Parsers

.field('custom', {
  size: 4,
  type: ({ buffer }) => buffer.readUInt32LE(0) * 2
});

Or return a predefined type:

type: () => 'uint16le';

Global Context

A shared object available across all fields:

.field('length', {
  size: 1,
  type: 'uint8',
  then: ({ value, global }) => {
    global.length = value;
  }
})
.field('data', {
  size: ({ global }) => global.length
});

JSON Serialization/Deserialization

<DataLayoutInstance>.toJSON([space]);
<DataLayoutWrapperInstance>.toJSON([space]);

DataLayout.fromJSON(string);

WARNING: DataLayout objects may include functions for declaration properties (such as 'size' or 'type'), and serialization of DataLayout objects WILL include those functions packed as strings.

WARNING: Due to the possible presence of functions in DataLayout objects, deserializing objects from unknown sources is NOT secure and MAY trigger arbitrary code execution, so it SHOULD NOT be done. Only deserialize pre-declared layouts that you trust!
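To see why both warnings apply, consider the generic sketch below. It shows one common way of carrying functions through JSON (storing their source text) and what reviving them involves. The exact format data-layout uses is not described here, so treat this as an illustration of the risk rather than the library's mechanism.

```javascript
// Illustration only: a generic way functions can be carried through JSON.
const declaration = { size: ({ fields }) => fields.len.value, type: 'utf8' };

// Plain JSON.stringify silently drops function-valued properties...
console.log(JSON.stringify(declaration)); // {"type":"utf8"}

// ...so a serializer has to store the function's source text instead.
const packed = JSON.stringify(declaration, (key, v) =>
    (typeof v === 'function' ? v.toString() : v));

// Turning that text back into a callable function means evaluating it,
// which is why untrusted input must never reach this step.
const source = JSON.parse(packed).size;
const revived = new Function(`return (${source});`)();

console.log(revived({ fields: { len: { value: 5 } } })); // 5
```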

Modifying unpacked buffers

After unpacking a buffer into a DataLayout structure, the modification of bytes in the original buffer will not reflect on the extracted structure.

To update the original buffer in a structured manner use:

Updating fields (data and type):

Call .updateField on an extracted object to update a field value and, optionally, its type.

It is not possible to change the field size, and setting incompatible values/sizes will trigger errors.

<DataLayoutWrapperInstance>.updateField(fieldName, fieldData, fieldType)

Clearing fields:

Call .clearField on an extracted object to fill a field with binary zeroes (null bytes).

<DataLayoutWrapperInstance>.clearField(fieldName)

Examples:

const layout = new DataLayout()
    .field('version', { size: 1, type: 'dec-str' })
    .field('size', { size: 2, type: 'uint16le' });

const buffer = Buffer.from([0x31, 0x05, 0x00]); // "1\x5\x0"
const struct = layout.unpack(buffer);

struct.updateField('version', 2, 'uint8'); // Update 'version' field and change its type to UINT8
struct.clearField('size'); // Fill 'size' with zeroes

// Repack the modified buffer
const newbuffer = struct.repack(); // -> [02, 00, 00]
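The byte-level effect of the example above can be reproduced with plain Buffer calls. This sketch only mirrors the expected result ([02, 00, 00]). Whether repack() returns a fresh buffer or reuses the source one is not stated here, so working on a copy is an assumption made for the illustration.

```javascript
// Plain Node.js equivalent of "update version to UInt8 2, then zero the size field".
const original = Buffer.from([0x31, 0x05, 0x00]);
const copy = Buffer.from(original); // assumption: leave the source bytes untouched

copy.writeUInt8(2, 0); // 'version' (offset 0, 1 byte) becomes the number 2
copy.fill(0, 1, 3);    // 'size' (offset 1, 2 bytes) becomes null bytes

console.log(copy);     // <Buffer 02 00 00>
console.log(original); // <Buffer 31 05 00>
```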

Notes

- Fields are parsed in definition order.
- If neither offset nor after is provided:
  - The first field defaults to offset 0.
  - Subsequent fields default to after the previous field.
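The default placement rule amounts to a running cursor over the field sizes. The sketch below shows that bookkeeping. defaultOffsets is a hypothetical helper, not part of data-layout, and it takes already-resolved sizes as input.

```javascript
// Hypothetical sketch of the default placement rule: each field starts
// where the previous one ended, and the first one starts at offset 0.
function defaultOffsets(sizes) {
    let cursor = 0;
    return sizes.map(({ name, size }) => {
        const placed = { name, offset: cursor, size };
        cursor += size; // the next field begins right after this one
        return placed;
    });
}

const placed = defaultOffsets([
    { name: 'version', size: 1 },
    { name: 'size', size: 2 },
    { name: 'payload', size: 5 },
]);

console.log(placed.map((f) => `${f.name}@${f.offset}`).join(' '));
// version@0 size@1 payload@3
```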

Advanced Usage: Conditional fields

Sometimes, when parsing more complex structures - e.g. PE files - there is the need to completely remove/ignore a field depending on previous fields.

In this case, a field can be ignored by returning null or zero from its "size" property.

If the condition for a field to be extracted is simply another field being defined, then you can achieve the same result through the "depends" property.

const layout = new DataLayout()
    .field('version', { size: 1, type: 'dec-str' })

    // Here, the "f1" field is only extracted if the "version" field value is '1'.
    //  by returning zero if the version is not 1, the field is ignored
    .field('f1', {
        size: ({ fields }) => (fields.version.value === 1 ? 2 : 0),
        type: 'utf8',
    })

    // Another way of doing this, is conditioning one field to another.
    // In this case, the 'f2' field only exists if the 'f1' field also
    // exists.
    .field('f2', {
        size: 2,
        depends: 'f1',
        type: 'utf8',
    });

// The 'version' field is "2", which means that F1 is not extracted.
// And, as F2 depends on F1, it is also ignored.
const skipped = layout.unpack(Buffer.from('2AABB'));
console.log(skipped.fields.version.value, skipped.fields.f1?.value, skipped.fields.f2?.value);
// ->  2, undefined, undefined

// The 'version' field is "1", which means that F1 will be extracted,
// and, therefore, F2 will be extracted as well.
const extracted = layout.unpack(Buffer.from('1AABB'));
console.log(extracted.fields.version.value, extracted.fields.f1.value, extracted.fields.f2.value);
// ->  1, "AA", "BB"

Advanced Usage: Dynamic offsets and "after" property

After dynamic fields, you can either use the 'offset' property to point the next field to a custom location (which is usually not practical), or use the "after" and "depends" properties to align fields in sequence.

In this example, the file structure has 2 version variants, and is extracted in one of 2 ways, according to the 'version' field:

const layout = new DataLayout()
    .field('version', { size: 2, type: 'dec-str' })

    // Variant 1 -> extracts field F1, with 2 bytes
    .field('f1', { size: ({ fields }) => (fields.version.value === 1 ? 2 : 0) })

    // Variant 2 -> extracts field F2, with 4 bytes
    .field('f2', {
        size: ({ fields }) => (fields.version.value === 2 ? 4 : 0),
    });

After this type of definition, if there are fields shared among the two variants, the extraction will depend on the sizes.

In the example, the 'footer' field is shared among both variants. However, its offset changes depending on the fields defined (in variant 1, the offset is 4; while in variant 2, the offset is 6).

To solve it, the options are:

// Option A: (the most volatile) let the library auto-calculate the field offset
layout.field('footer', { size: 16 }); // Omit 'after' and 'offset'
// *Use only for small schemas!*

// Option B: (the best) use the 'after' property, depending on the existing fields
layout.field('footer', {
    size: 16,
    after: ({ fields }) => fields.f1?.name || fields.f2?.name,
});

// Option C: (the second best) use dynamic offsets
layout.field('footer', {
    size: 16,
    offset: ({ fields }) =>
        (fields.f1?.offset || fields.f2?.offset) +
        (fields.f1?.size || fields.f2?.size),
});

// Option D: (usually not good) use 2 fields, with named dependencies
layout.field('footer-1', { size: 16, depends: 'f1' });
layout.field('footer-2', { size: 16, depends: 'f2' });

Note: For structures with such simple schemes, the library itself can calculate field offsets dynamically during the parsing phase. In the example above, you could simply omit both the 'offset' and 'after' properties, and the library would calculate them automatically.
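The footer offsets quoted above (4 for variant 1, 6 for variant 2) follow from summing the sizes of the fields that precede the footer. A minimal worked sketch of that arithmetic follows; footerOffset is a hypothetical helper that mirrors the size rules of the example layout, not a library function.

```javascript
// Worked arithmetic for the example layout: where does 'footer' start?
function footerOffset(version) {
    const versionSize = 2;                // 'version' is always 2 bytes
    const f1Size = version === 1 ? 2 : 0; // 'f1' exists only in variant 1
    const f2Size = version === 2 ? 4 : 0; // 'f2' exists only in variant 2
    return versionSize + f1Size + f2Size;
}

console.log(footerOffset(1)); // 4
console.log(footerOffset(2)); // 6
```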
