@buffela/serializer

v2.4.2

Published

22 days ago

Serializer for the buffela data format


banforfun

buffela protobuf wire data serialization format protocol schema binary serializer

Buffela binary format

Buffela (pronounced bah-FEH-lah) is a minimal, schema-based binary format.

You can write schemas in any JSON-compatible language, for example YAML:

Gender:
  - FEMALE
  - MALE

User:
  userId: String(36)
  gender: Gender
  hobbies: String[UByte]
  registeredWith: Type

  RegisteredWithPhone:
    countryCode: UByte
    phone: String

  RegisteredWithEmail:
    email: String

AuthToken:
  version: 1
  issuedAt: Double
  signature: Buffer(32)
  user: User

Buffela supports all the types you would expect (strings, booleans, numbers), along with enums, subtypes (similar to Kotlin's sealed types) and arrays (in both fixed-size and variable-size variants).

Then you can import it in your favorite programming language - as long as it is Javascript, Typescript or Kotlin ;) - and write type safe serialization and deserialization code.

Why not protobuf

Buffela drops support for forward and backward compatible schemas in favor of schema readability and size efficiency. For the above example and some typical test data, the output generated by protobuf is 11% larger. To achieve this size efficiency, buffela gives you more control over the format. You can manually specify the byte length of numbers (byte, short, int, long). The same goes for arrays: you can make them fixed size or specify the byte length of the item count. Additionally, you don't have to worry about field numbers.

Compatible languages

We currently support Javascript/Typescript and Kotlin Multiplatform, but there is no reason that we couldn't add support for more in the future.

There are two ways to implement a (de)serializer:

Reflection
Go through the schema and decide how to encode/decode each field at runtime. This can be slow, especially for typed languages, but it means that the schema needs no preprocessing.
Code generation
Decide and write down the steps to encode/decode each field in a preprocessing step. Basically, compile the schema into code. Encoding or decoding at runtime is as fast as it can be.

Ideally, we would support both approaches for all supported languages, but ain't nobody got time for that. So here is what we currently support:

| Language | Reflection based serialization | Reflection based deserialization | Serializer code generation | Deserializer code generation |
| --------------------- | ------------------------------ | -------------------------------- | -------------------------- | ---------------------------- |
| Javascript/Typescript | ✅ | ✅ | ❌ | ❌ |
| Kotlin Multiplatform | ❌ | ❌ | ✅ | ✅ |

Installation

Javascript/Typescript

Install the schema parser

npm i @buffela/parser

You want to serialize?

npm i @buffela/serializer

You want to deserialize?

npm i @buffela/deserializer

You're a front end developer?

Install the buffer browser polyfill.

Javascript with JSDoc

Install typescript as a dev dependency

npm i -D typescript

Set up a simple tsconfig.json inside your project folder (don't worry, it's only for your editor; I won't have you compile anything)

{
    "compilerOptions": {
        "target": "es2016",
        "module": "commonjs",
        "skipLibCheck": true
    }
}

Kotlin

You can run the Kotlin tools through npm:

npx @buffela/tools-kotlin --help

The first time around it will ask you to download the package; press Enter to proceed.

Don't know what npm is? Bless your innocent soul xD. I recommend installing node through nvm: https://github.com/nvm-sh/nvm
Install nvm and then run nvm install --lts. Now you should have node and npm with it.

You'll also want to install some dependencies required by the generated code in your project:

org.jetbrains.kotlinx:kotlinx-io-core (Latest version)
gr.elaevents.buffela.schema:utils (Latest version)

Usage

JSDoc/Typescript type generation

Install the buffela js tools as a dev dependency

npm i -D @buffela/tools

Run the type generator

buffela-to-types YOUR_BUFFELA_SCHEMA DIRECTORY_TO_PUT_THE_TYPES

This will generate a .d.ts file in the specified directory with the same name as your schema file

Javascript/Typescript

You can read the buffela schema however you need to, depending on the format. If it is yaml, you need either a library to parse it (like yaml), or to convert it to json first for easy importing.

To convert it to JSON, first install the JS tools (you could also use any online YAML to JSON converter)

npm i -D @buffela/tools

Then run the converter

buffela-to-json YOUR_BUFFELA_SCHEMA DIRECTORY_TO_PUT_THE_JSON

This will generate a .json file in the specified directory with the same name as your schema file
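
The examples below call a placeholder named howeverYouReadIt(). As a minimal sketch, assuming you converted the schema to JSON with the command above, reading it could look like this. The function name readSchemaJson is made up for illustration and is not part of any buffela package; it only shows how to load the file with Node's standard library.

```javascript
const fs = require('node:fs');

// Hypothetical stand-in for the howeverYouReadIt() placeholder.
// Assumes schemaPath points at the .json file written by buffela-to-json.
function readSchemaJson(schemaPath) {
    // Read the file as UTF-8 text and parse it into a plain object.
    return JSON.parse(fs.readFileSync(schemaPath, 'utf8'));
}
```

In a CommonJS project a plain require of the .json file gives you the same object.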

Javascript

const { parseBuffelaSchema } = require('@buffela/parser')
const { serializeCalf } = require('@buffela/serializer')
const { deserializeCalf } = require('@buffela/deserializer')

/**
 * Do this only if you have generated types
 * @type {import('./YOUR_BUFFELA_TYPES').default}
 */
const schema = parseBuffelaSchema(howeverYouReadIt())

const buffer = serializeCalf(schema.AuthToken, {
    issuedAt: Date.now(),
    signature: Buffer.alloc(32),
    user: {
        userId: "d6c47b4b-6983-48eb-a957-a954798f6e57",
        gender: schema.Gender.FEMALE,
        hobbies: ["coffee", "reading", "going out"],
        registeredWith: schema.User.RegisteredWithPhone,
        countryCode: 30,
        phone: "691 234 5678"
    }
})

const authToken = deserializeCalf(schema.AuthToken, buffer)
console.log(authToken)

Typescript

import Schema from './YOUR_BUFFELA_TYPES'

import { parseBuffelaSchema } from '@buffela/parser'
import { serializeCalf } from '@buffela/serializer'
import { deserializeCalf } from '@buffela/deserializer'

const schema = parseBuffelaSchema<Schema>(howeverYouReadIt())

const buffer = serializeCalf(schema.AuthToken, {
    issuedAt: Date.now(),
    signature: Buffer.alloc(32),
    user: {
        userId: "d6c47b4b-6983-48eb-a957-a954798f6e57",
        gender: schema.Gender.FEMALE,
        hobbies: ["coffee", "reading", "going out"],
        registeredWith: schema.User.RegisteredWithPhone,
        countryCode: 30,
        phone: "691 234 5678"
    }
})

const authToken = deserializeCalf(schema.AuthToken, buffer)
console.log(authToken)

Kotlin

Run the kotlin class generator

npx @buffela/tools-kotlin generate YOUR_BUFFELA_SCHEMA DIRECTORY_TO_PUT_THE_FILE --package=YOUR_PACKAGE

This will create a .kt file in the specified directory with the same name as your buffela schema

package YOUR_PACKAGE

import kotlinx.io.Buffer
import kotlinx.io.readByteArray

fun main() {
    val bytes = AuthToken(
        issuedAt = System.currentTimeMillis().toDouble(),
        signature = ByteArray(32),
        user = User.RegisteredWithPhone(
            userId = "d6c47b4b-6983-48eb-a957-a954798f6e57",
            gender = Gender.FEMALE,
            hobbies = arrayOf("coffee", "reading", "going out"),
            countryCode = 30u,
            phone = "This is my phone",
        )
    ).serialize().readByteArray()

    val authToken = AuthToken.deserialize(Buffer().apply { write(bytes) })
}

Schema syntax

Validation

We provide a JSON schema for in-editor validation of your buffela schemas.

Javascript/Typescript

Since we expect you'll be writing schemas in a development environment, the schema is bundled with the parser that you probably already need.

After installing the parser, you can find the schema at node_modules/@buffela/parser/schemas/buffela.json.

Kotlin

You can export the JSON schema through the command line tool:

npx @buffela/tools-kotlin schema buffela.json

This will create a buffela.json file in your current directory.

Please make sure to run the command again after every minor or major buffela update in order to get the latest features.

Calves

The top level types inside your buffela schema are called calves. Their name must (1) start with a capital letter and (2) only contain letters.

Going back to our example:

Gender:
  ...

User:
  ...

AuthToken:
  ...

Gender, User and AuthToken are all calves.

Calves can be either objects or arrays, containing fields or enumeration values respectively.

Enumeration calves

Enumeration calves represent single-choice types.

They are arrays that can only contain unique string values.

The values must (1) be in all uppercase and (2) only contain letters.

Here is an enumeration calf from our example:

Gender:
  - FEMALE
  - MALE

Data calves

Data calves represent a collection of fields with their respective types.

They are objects that can contain key-value pairs.

The keys are the names of the fields, and they must (1) start with a lowercase letter and (2) only contain letters.

The values are the types of the fields, of which there are many as we'll see below.

Here is a data calf from our example:

AuthToken:
  version: 1
  issuedAt: Double
  signature: Buffer(32)
  user: User
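
The naming rules for calves, enumeration values and fields can be written down as regular expressions. This is only a sketch: it assumes "letters" means ASCII letters (the rules above don't say), and the helper names are hypothetical, not part of the parser.

```javascript
// Hypothetical helpers, not part of the buffela packages.
// Assumption: "letters" means ASCII letters A-Z and a-z.
const CALF_NAME = /^[A-Z][A-Za-z]*$/;   // starts with a capital letter, letters only
const ENUM_VALUE = /^[A-Z]+$/;          // all uppercase, letters only
const FIELD_NAME = /^[a-z][A-Za-z]*$/;  // starts with a lowercase letter, letters only

function isValidCalfName(name) {
    return CALF_NAME.test(name);
}

function isValidEnumValue(value) {
    return ENUM_VALUE.test(value);
}

function isValidFieldName(name) {
    return FIELD_NAME.test(name);
}
```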

Variable types

You use a variable type for fields that can have multiple values, unknown at compile time. In the above data calf example, issuedAt, signature and user all have variable types, in contrast to the version field which has a constant type.

There are three kinds of variable types:

References

You can use other enumeration and data calves as types. For example:

User:
  gender: Gender
  ...

AuthToken:
  user: User
  ...

Primitives

These are the supported primitive types along with their mapping to the supported languages and their byte lengths.

| Buffela Type | Javascript Type | Kotlin Type | Byte Length | Description |
| ------------ | --------------- | ----------- | ----------- | ------------------------------------------------------------ |
| Byte | number | Byte | 1 | Integers from -128 to 127 |
| UByte | number | UByte | 1 | Integers from 0 to 255 |
| Short | number | Short | 2 | Integers from -32,768 to 32,767 |
| UShort | number | UShort | 2 | Integers from 0 to 65,535 |
| Int | number | Int | 4 | Integers from -2,147,483,648 to 2,147,483,647 |
| UInt | number | UInt | 4 | Integers from 0 to 4,294,967,295 |
| Long | BigInt | Long | 8 | Integers from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
| ULong | BigInt | ULong | 8 | Integers from 0 to 18,446,744,073,709,551,615 |
| Float | number | Float | 4 | Decimals with a magnitude up to about 3.4E+38 (32-bit floating point) |
| Double | number | Double | 8 | Decimals with a magnitude up to about 1.7E+308 (64-bit floating point) |
| String | string | String | Variable | Text |
| Boolean | boolean | Boolean | 1 | true or false |

Here is an example:

AuthToken:
  issuedAt: Double
  ...
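
If you want to check a value before handing it to the serializer, a range check per integer type is enough. The sketch below uses the standard two's-complement ranges for each byte length; fitsIn and INTEGER_RANGES are hypothetical names, and we make no claim about what the serializer itself does with out-of-range values.

```javascript
// Hypothetical helper, not part of the buffela packages.
// Long and ULong use BigInt, matching the Javascript types listed above.
const INTEGER_RANGES = {
    Byte: [-128, 127],
    UByte: [0, 255],
    Short: [-32768, 32767],
    UShort: [0, 65535],
    Int: [-2147483648, 2147483647],
    UInt: [0, 4294967295],
    Long: [-(2n ** 63n), 2n ** 63n - 1n],
    ULong: [0n, 2n ** 64n - 1n],
};

function fitsIn(type, value) {
    const range = INTEGER_RANGES[type];
    if (!range) throw new Error(`Not an integer type: ${type}`);
    const [min, max] = range;
    if (typeof min === 'bigint') {
        // 8-byte types only accept BigInt values
        return typeof value === 'bigint' && value >= min && value <= max;
    }
    // Smaller types only accept whole numbers
    return Number.isInteger(value) && value >= min && value <= max;
}
```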

Arrays

Any type can be used as the item type of an array, by adding the [SIZE] suffix. The suffix can be chained to create n-dimensional arrays.

Arrays can be either variable length or fixed length. If you expect an array to ALWAYS contain a known fixed number of items, you can use this number directly as the size. As the size is known from the schema, buffela doesn't have to write it in the packet, saving precious bytes.

Otherwise, if you don't know the exact number of items, you must specify the data type used for the size. For example, if you know that a user will not have more than 255 hobbies:

User:
  hobbies: String[UByte]
  ...
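
To see why a fixed size saves bytes, compare the two approaches in a toy encoder. This is an illustration of the idea only, NOT buffela's actual wire format: items are single unsigned bytes to keep it short, and both function names are made up.

```javascript
// Illustration only: this is not buffela's implementation or wire format.

// Fixed length: the reader knows the count from the schema, so nothing extra is written.
function encodeFixed(items, size) {
    if (items.length !== size) throw new Error(`Expected exactly ${size} items`);
    return Uint8Array.from(items);
}

// Variable length with a UByte size: one count byte first, then the items.
function encodeWithUByteCount(items) {
    if (items.length > 255) throw new Error('Too many items for a UByte count');
    return Uint8Array.from([items.length, ...items]);
}
```

For three items the fixed variant produces 3 bytes and the variable one produces 4.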

NOTE: The only valid data types for a size are UByte, UShort, and Int.

Why Int and not UInt, you may ask? Kotlin and many other languages do not support lengths larger than the signed 32-bit integer limit.

While what I said above is true: "Any type can be used as the item type of an array", for optimization reasons you should avoid using the array notation for numeric and boolean types. Instead you should use these optimized types:

| Buffela Type | Javascript Type | Kotlin Type |
| ---------------- | ----------------- | ------------ |
| UByteArray | Uint8Array | UByteArray |
| ByteArray | Int8Array | ByteArray |
| UShortArray | Uint16Array | UShortArray |
| ShortArray | Int16Array | ShortArray |
| UIntArray | Uint32Array | UIntArray |
| IntArray | Int32Array | IntArray |
| LongArray | BigInt64Array | LongArray |
| ULongArray | BigUint64Array | ULongArray |
| FloatArray | Float32Array | FloatArray |
| DoubleArray | Float64Array | DoubleArray |
| BooleanArray | Uint8ClampedArray | BooleanArray |

You should also avoid using a ByteArray for transmitting binary data. Instead, I present to you the Buffer type that corresponds to Buffer in Javascript and ByteArray in Kotlin.

"But now how do I specify the size!?", you exclaim. With parentheses. These types expect a (SIZE) suffix, and the rules about the size are the same as with normal arrays. You can even make a 2, 3 or n-dimensional array by adding as many [SIZE]s as your heart desires at the end.

String

The String type is a hybrid type, meaning that it can be used as both a primitive and an array type. If you expect a string to ALWAYS be of a certain UTF-8 byte length, then you can add a (SIZE) suffix. For example, if you want to represent a UUID string:

User:
  userId: String(36)
  ...
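
Keep in mind that the size counts UTF-8 bytes, not characters. A quick way to check a value, sketched here with the standard TextEncoder (the helper name is hypothetical):

```javascript
// Hypothetical helper: checks that a string has exactly `size` UTF-8 bytes.
function hasUtf8ByteLength(text, size) {
    return new TextEncoder().encode(text).length === size;
}
```

A UUID like "d6c47b4b-6983-48eb-a957-a954798f6e57" is 36 ASCII characters and therefore 36 bytes, while a single "é" already takes 2 bytes.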

Sub (data) calves

Sometimes you may need a data calf that can take multiple forms. For example, imagine a User structure for an application where users have the choice of signing up with either an email address or a phone number. Each user will either have a phone number or an email address, but never both. To represent this kind of either-or relationship between groups of fields, we have sub calves. For example:

User:
  userId: String(36)
  ...
  
  registeredWith: Type

  RegisteredWithPhone:
    countryCode: UByte
    phone: String

  RegisteredWithEmail:
    email: String

In this example, RegisteredWithPhone and RegisteredWithEmail are sub calves of User. All users will have an id, regardless of which method they used to sign up, but only one of the two groups of fields, RegisteredWithPhone or RegisteredWithEmail, will be present.

The fields inside the selected group are direct descendants of the User parent type, and thus sub calves cannot declare fields with the same name as a field declared in any of their ancestors up the tree.
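
The name-clash rule can be checked by walking the schema tree. Below is a rough sketch that works on the plain object you get from parsing the schema file; it treats keys starting with an uppercase letter as sub calves and everything else as fields. findShadowedFields is a hypothetical name, and the real parser may report this differently.

```javascript
// Hypothetical check, not part of the buffela packages.
// Returns the field names that a sub calf re-declares from one of its ancestors.
function findShadowedFields(calf, ancestorFields = []) {
    const keys = Object.keys(calf ?? {}); // an empty calf may be parsed as null
    const ownFields = keys.filter((key) => /^[a-z]/.test(key));
    const shadowed = ownFields.filter((name) => ancestorFields.includes(name));
    const visible = [...ancestorFields, ...ownFields];
    for (const key of keys) {
        if (/^[A-Z]/.test(key)) {
            // Recurse into the sub calf with all fields declared so far up the tree
            shadowed.push(...findShadowedFields(calf[key], visible));
        }
    }
    return shadowed;
}
```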

In order for the serializer to know which group of fields to serialize, we require you to declare a special field of type "Type". The name, though, is at your discretion. The value of that field inside the instance you wish to serialize must be set to one of the sub calves, but how you do this is language specific, so please refer to the Usage section for your language of choice.

You can only declare a single field of the special type "Type", and if you declare such a field in a calf with no sub calves, the field will be ignored. The position of the type field inside the data calf does not matter, and you can move it around without changing the resulting packet.

Sub calves are data calves themselves and can be nested infinitely.

Constant types

One disadvantage of non-self-describing protocols is the lack of backward and forward compatibility between schema revisions. Buffela will blindly trust any binary data you give it to deserialize. This is not always desired. For example, if you use buffela for server-client communication and one day decide to change the type of a field on the server, the outdated clients will have no idea about the change and will try to decode the field with its old type. This will lead to data corruption. For this reason, you should implement your own versioning mechanism to use the right schema revision based on some kind of header.

But as a last ditch measure, or if you couldn't give a damn about the service quality of outdated clients, you can use buffela constants. These, you define as regular fields in the schema, but instead of a type you write a number. For example:

AuthToken:
  version: 1
  ...

These constants are represented as unsigned bytes, and thus must be whole numbers between 0 and 255. If you ever run out of numbers, first you have my congratulations, and second you can add additional constants.

Constants are written to each packet during serialization and are checked during deserialization to match the schema. If they don't, buffela throws an error.
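
Conceptually, a constant is one byte that is written on the way out and compared on the way in. The toy version below shows the idea; it is not buffela's implementation, the function names are made up, and the exact position of the byte in a real packet is described in the 'Serialization order' section.

```javascript
// Illustration only: not buffela's implementation.

// Constants are unsigned bytes, so only whole numbers from 0 to 255 are allowed.
function isValidConstant(value) {
    return Number.isInteger(value) && value >= 0 && value <= 255;
}

function writeConstant(value) {
    if (!isValidConstant(value)) throw new RangeError(`Invalid constant: ${value}`);
    return Uint8Array.of(value);
}

// Throws when the byte found in the packet differs from the schema's constant.
function checkConstant(packet, offset, expected) {
    const found = packet[offset];
    if (found !== expected) {
        throw new Error(`Constant mismatch at byte ${offset}: expected ${expected}, found ${found}`);
    }
}
```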

You can choose to version only specific (sub) calves. For most cases, though, we recommend versioning all the root calves and none of the sub calves.

If you ever plan to use buffela constants for versioning, please do it from the beginning. Adding a constant to an existing calf WILL NOT reliably reject binary data serialized with previous schema revisions, as adding a constant field constitutes a revision by itself. More details in the 'Versioning' section.

Refactoring

NOTE: A refactoring is considered safe if it doesn't change the resulting packet. If you change a name in the schema, you of course will need to change the code that uses it.

You can safely change the name of everything: root calves, sub calves, fields, anything.

You can safely reorder the root calves.

Members inside a data calf are split into four categories:

The constant fields
The variable fields
The Type field
The sub calves

You can safely interleave members from all categories, but you cannot change the relative order of members belonging to the same category.

Example 1

AuthToken:
  version: 1
  issuedAt: Double
  signature: Buffer(32)
  user: User

is the same as:

AuthToken:
  issuedAt: Double
  version: 1
  signature: Buffer(32)
  user: User

but NOT the same as:

AuthToken:
  version: 1
  user: User
  issuedAt: Double
  signature: Buffer(32)

Example 2

User:
  userId: String(36)
  registeredWith: Type

  RegisteredWithEmail:
    ...
  RegisteredWithPhone:
    ...

is the same as:

User:
  RegisteredWithEmail:
    ...
  
  userId: String(36)
  
  RegisteredWithPhone:
    ...
      
  registeredWith: Type

but NOT the same as:

User:
  userId: String(36)
  registeredWith: Type

  RegisteredWithPhone:
    ...
      
  RegisteredWithEmail:
    ...
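
The reordering rule boils down to: split the members into their categories and compare each category's order. Here is a small sketch of that check, assuming the schema has been parsed into a plain object. It only looks at member names on one level (it does not descend into sub calves or compare types), and both function names are hypothetical.

```javascript
// Hypothetical check, not part of the buffela packages.
// Splits the members of one data calf into the four categories, keeping declaration order.
function categorize(calf) {
    const groups = { constants: [], variables: [], typeField: [], subCalves: [] };
    for (const [name, value] of Object.entries(calf)) {
        if (/^[A-Z]/.test(name)) groups.subCalves.push(name);           // sub calf
        else if (typeof value === 'number') groups.constants.push(name); // constant field
        else if (value === 'Type') groups.typeField.push(name);          // the Type field
        else groups.variables.push(name);                                // variable field
    }
    return groups;
}

// Two orderings are equivalent when every category lists its members in the same order.
function sameLayout(a, b) {
    return JSON.stringify(categorize(a)) === JSON.stringify(categorize(b));
}
```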

Convention

To avoid confusion we recommend ordering your members in the following way:

Constants
Variables
Type field
Sub calves

Versioning

This section is a guide for when to increment the version number of a data calf.

Glossary: Calves that have no sub calves are called leaf calves.

The ONLY type of change that wasn't covered in the 'Refactoring' section and that doesn't constitute a revision, is the addition of a variable type field to the end of a leaf calf (or the addition of a field in general if the leaf calf contains only constant fields or none at all). In any other case, the addition, removal, type modification or relocation of a field constitutes a revision on the containing (sub) calf.

Note: If you don't implement versioning on a sub calf and make a change that constitutes a revision, you must bubble the revision up to the nearest parent that does.

Finally, the addition, removal or relocation of a sub calf constitutes a revision on the ROOT calf (for reasons described in the 'Sub calf id compression' section).

Internals

Sub calf id compression

When you serialize a sub calf, buffela needs to write some kind of header containing the sub calf id, to know which sub calf to use for deserialization later. While you can have infinitely nested calves, we don't expect the leaf calves to be that many in number. So instead of writing a sub calf header for each nested sub calf, buffela assigns a unique id to each leaf calf and only writes a single header for the entire packet. We use DFS to assign the ids. Example:

RootCalf:
  ChildCalf1: (id 1)

  ChildCalf2:
    GrandchildCalf1: (id 2)
    GrandchildCalf2: (id 3)

Additionally, even if you have nested sub calves, in the weird scenario that there is only one leaf calf, buffela skips writing the header entirely. Example:

RootCalf:
  ChildCalf:
    GrandchildCalf:
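
The id assignment can be sketched as a depth-first walk over the parsed schema object. This is our reading of the description above, not the library's source: ids start at 1 only to match the first example, leaves are keyed by their full path, and assignLeafIds and needsLeafHeader are hypothetical names.

```javascript
// Sketch of DFS leaf id assignment, not buffela's actual code.
// Keys starting with an uppercase letter are treated as sub calves.
function assignLeafIds(calf, path, ids = new Map()) {
    const subCalves = Object.entries(calf ?? {}).filter(([key]) => /^[A-Z]/.test(key));
    if (subCalves.length === 0) {
        // No sub calves: this is a leaf calf, give it the next id
        ids.set(path, ids.size + 1);
    } else {
        // Visit sub calves in declaration order, depth first
        for (const [name, subCalf] of subCalves) {
            assignLeafIds(subCalf, `${path}.${name}`, ids);
        }
    }
    return ids;
}

// With a single leaf calf there is nothing to choose from, so no header is needed.
function needsLeafHeader(ids) {
    return ids.size > 1;
}
```

This also shows why adding, removing or moving a sub calf shifts the ids of the leaves that come after it.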

Serialization order

Serialized packets have the following structure:

Root calf constants
Leaf calf id
Root calf variables
    Child calf constants
    Child calf variables
        ...
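
As a rough sketch of that layout, the helper below lists the segments of a packet in order, given the path from the root calf down to the selected leaf calf. It only produces labels, not bytes, and packetSegments is a hypothetical name used for illustration.

```javascript
// Illustration of the segment order listed above; produces labels only.
// path: calf names from the root down to the selected leaf, e.g. ['User', 'RegisteredWithPhone']
// hasLeafHeader: false when the root has only one leaf calf (the header is skipped then)
function packetSegments(path, hasLeafHeader) {
    const [root, ...descendants] = path;
    const segments = [`${root} constants`];
    if (hasLeafHeader) segments.push('leaf calf id');
    segments.push(`${root} variables`);
    for (const name of descendants) {
        segments.push(`${name} constants`, `${name} variables`);
    }
    return segments;
}
```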
