@buffela/serializer
v2.4.2
Serializer for the buffela data format
Buffela binary format
Buffela (pronounced bah-FEH-lah) is a minimal, schema-based binary format.
You can write schemas in any JSON-compatible language, for example YAML:
Gender:
  - FEMALE
  - MALE
User:
  userId: String(36)
  gender: Gender
  hobbies: String[UByte]
  registeredWith: Type
  RegisteredWithPhone:
    countryCode: UByte
    phone: String
  RegisteredWithEmail:
    email: String
AuthToken:
  version: 1
  issuedAt: Double
  signature: Buffer(32)
  user: User
Buffela supports all the types you would expect (strings, booleans, numbers), along with enums, subtypes (similar to Kotlin's sealed types) and arrays (in both constant and variable sized variants).
Then you can import it in your favorite programming language - as long as it is Javascript, Typescript or Kotlin ;) - and write type safe serialization and deserialization code.
Why not protobuf
Buffela drops support for forward and backward compatible schemas in favor of schema readability and size efficiency. For the above example and some typical test data, the output generated by protobuf is 11% larger. To achieve this size efficiency, buffela gives you more control over the format. You can manually specify the byte length of numbers (byte, short, integer, long). The same goes for arrays: you can make them fixed size or specify the byte length of the item count. Additionally, you don't have to worry about field numbers.
Compatible languages
We currently support Javascript/Typescript and Kotlin Multiplatform, but there is no reason that we couldn't add support for more in the future.
There are two ways to implement a (de)serializer:
Reflection
Go through the schema and decide how to encode/decode each field at runtime. This can be slow, especially for typed languages but it means that the schema needs no preprocessing.
Code generation
Decide and write down the steps to encode/decode each field in a preprocessing step. Basically, compile the schema into code. Encoding or decoding at runtime is as fast as it can be.
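To make the difference concrete, here is a minimal TypeScript sketch. It is not buffela's implementation: the two-field mini-schema, the FieldType names and both functions are hypothetical, and big-endian byte order is an assumption. The first encoder inspects the schema on every call (reflection); the second is the kind of code a generator could emit ahead of time for that one schema.

```typescript
// Hypothetical mini-schema: field name -> primitive type (not buffela's real schema model).
type FieldType = "UByte" | "UShort";
type MiniSchema = Record<string, FieldType>;

const point: MiniSchema = { x: "UByte", y: "UShort" };

// Reflection style: walk the schema at runtime and pick an encoder per field.
function encodeByReflection(schema: MiniSchema, value: Record<string, number>): Uint8Array {
  const bytes: number[] = [];
  for (const name of Object.keys(schema)) {
    const v = value[name];
    if (schema[name] === "UByte") {
      bytes.push(v & 0xff);
    } else {
      // UShort, written big-endian here (an assumption of this sketch)
      bytes.push((v >> 8) & 0xff, v & 0xff);
    }
  }
  return Uint8Array.from(bytes);
}

// Code generation style: the same steps, decided once and written out as code.
function encodePointGenerated(value: { x: number; y: number }): Uint8Array {
  return Uint8Array.from([value.x & 0xff, (value.y >> 8) & 0xff, value.y & 0xff]);
}

const viaReflection = encodeByReflection(point, { x: 7, y: 513 });
const viaGenerated = encodePointGenerated({ x: 7, y: 513 });
// Both produce the bytes 7, 2, 1 (513 is 0x0201).
```

The generated variant has no loop and no per-field type check left at runtime, which is where its speed advantage comes from.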
Ideally, we would support both approaches for all supported languages, but ain't nobody got time for that. So here is what we currently support:
| Language              | Reflection based serialization | Reflection based deserialization | Serializer code generation | Deserializer code generation |
| --------------------- | ------------------------------ | -------------------------------- | -------------------------- | ---------------------------- |
| Javascript/Typescript | ✅                             | ✅                               | ❌                         | ❌                           |
| Kotlin Multiplatform  | ❌                             | ❌                               | ✅                         | ✅                           |
Installation
Javascript/Typescript
Install the schema parser
npm i @buffela/parser
You want to serialize?
npm i @buffela/serializer
You want to deserialize?
npm i @buffela/deserializer
You're a front end developer?
Install the buffer browser polyfill.
Javascript with JSDoc
Install typescript as a dev dependency
npm i -D typescript
Set up a simple tsconfig.json inside your project folder (don't worry, it's for your editor, I won't have you compile anything)
{
  "compilerOptions": {
    "target": "es2016",
    "module": "commonjs",
    "skipLibCheck": true
  }
}
Kotlin
You can run the Kotlin tools through npm:
npx @buffela/tools-kotlin --help
The first time around it will ask you to download the package, press Enter to proceed.
Don't know what npm is? Bless your innocent soul xD. I recommend installing node through nvm: https://github.com/nvm-sh/nvm
Install nvm and then run
nvm install --lts
Now you should have node and npm with it.
You'll also want to install some dependencies required by the generated code in your project:
- org.jetbrains.kotlinx:kotlinx-io-core (latest version)
- gr.elaevents.buffela.schema:utils (latest version)
Usage
JSDoc/Typescript type generation
Install the buffela js tools as a dev dependency
npm i -D @buffela/tools
Run the type generator
buffela-to-types YOUR_BUFFELA_SCHEMA DIRECTORY_TO_PUT_THE_TYPES
This will generate a .d.ts file in the specified directory with the same name as your schema file
Javascript/Typescript
You can read the buffela schema however you need to, depending on the format. If it is yaml, you need either a library to parse it (like yaml), or to convert it to json first for easy importing.
To convert it to json first install the js tools (you could also use any online yaml to json converter)
npm i -D @buffela/tools
Then run the converter
buffela-to-json YOUR_BUFFELA_SCHEMA DIRECTORY_TO_PUT_THE_JSON
This will generate a .json file in the specified directory with the same name as your schema file
Javascript
const { parseBuffelaSchema } = require('@buffela/parser')
const { serializeCalf } = require('@buffela/serializer')
const { deserializeCalf } = require('@buffela/deserializer')
/**
 * Do this only if you have generated types
 * @type {import('./YOUR_BUFFELA_TYPES').default}
 */
const schema = parseBuffelaSchema(howeverYouReadIt())
const buffer = serializeCalf(schema.AuthToken, {
    issuedAt: Date.now(),
    signature: Buffer.alloc(32),
    user: {
        userId: "d6c47b4b-6983-48eb-a957-a954798f6e57",
        gender: schema.Gender.FEMALE,
        hobbies: ["coffee", "reading", "going out"],
        registeredWith: schema.User.RegisteredWithPhone,
        countryCode: 30,
        phone: "691 234 5678"
    }
})
const authToken = deserializeCalf(schema.AuthToken, buffer)
console.log(authToken)
Typescript
import Schema from './YOUR_BUFFELA_TYPES'
import { parseBuffelaSchema } from '@buffela/parser'
import { serializeCalf } from '@buffela/serializer'
import { deserializeCalf } from '@buffela/deserializer'
const schema = parseBuffelaSchema<Schema>(howeverYouReadIt())
const buffer = serializeCalf(schema.AuthToken, {
    issuedAt: Date.now(),
    signature: Buffer.alloc(32),
    user: {
        userId: "d6c47b4b-6983-48eb-a957-a954798f6e57",
        gender: schema.Gender.FEMALE,
        hobbies: ["coffee", "reading", "going out"],
        registeredWith: schema.User.RegisteredWithPhone,
        countryCode: 30,
        phone: "691 234 5678"
    }
})
const authToken = deserializeCalf(schema.AuthToken, buffer)
console.log(authToken)
Kotlin
Run the kotlin class generator
npx @buffela/tools-kotlin generate YOUR_BUFFELA_SCHEMA DIRECTORY_TO_PUT_THE_FILE --package=YOUR_PACKAGE
This will create a .kt file in the specified directory with the same name as your buffela schema
package YOUR_PACKAGE
import kotlinx.io.Buffer
import kotlinx.io.readByteArray
fun main() {
    val bytes = AuthToken(
        issuedAt = System.currentTimeMillis().toDouble(),
        signature = ByteArray(32),
        user = User.RegisteredWithPhone(
            userId = "d6c47b4b-6983-48eb-a957-a954798f6e57",
            gender = Gender.FEMALE,
            hobbies = arrayOf("coffee", "reading", "going out"),
            countryCode = 30u,
            phone = "This is my phone",
        )
    ).serialize().readByteArray()
    val authToken = AuthToken.deserialize(Buffer().apply { write(bytes) })
}
Schema syntax
Validation
We provide a JSON schema for in-editor validation of your buffela schemas.
Javascript/Typescript
Since we expect you'll be writing schemas in a development environment, the schema is bundled with the parser that you probably already need.
After installing the parser, you can find the schema at node_modules/@buffela/parser/schemas/buffela.json.
Kotlin
You can export the JSON schema through the command line tool:
npx @buffela/tools-kotlin schema buffela.json
This will create a buffela.json file in your current directory.
Please make sure to run the command again after every minor or major buffela update in order to get the latest features.
Calves
The top level types inside your buffela schema are called calves. Their name must (1) start with a capital letter and (2) only contain letters.
Going back to our example:
Gender:
  ...
User:
  ...
AuthToken:
  ...
Gender, User and AuthToken are all calves.
Calves can be either objects or arrays, containing fields or enumeration values respectively.
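As a rough illustration of these two rules, the following TypeScript sketch checks a calf name and tells the two kinds of calves apart. The function name classifyCalf is hypothetical, and treating "letters" as ASCII letters only is an assumption; the bundled JSON schema remains the authoritative validator.

```typescript
// Rule from the text: calf names start with a capital letter and contain only letters.
// Restricting "letters" to ASCII A-Z / a-z is an assumption of this sketch.
const CALF_NAME = /^[A-Z][A-Za-z]*$/;

type CalfKind = "enumeration" | "data";

// Arrays are enumeration calves, objects are data calves.
function classifyCalf(name: string, body: unknown): CalfKind {
  if (!CALF_NAME.test(name)) {
    throw new Error(`Invalid calf name: ${name}`);
  }
  if (Array.isArray(body)) return "enumeration";
  if (typeof body === "object" && body !== null) return "data";
  throw new Error(`Calf ${name} must be an array or an object`);
}

const genderKind = classifyCalf("Gender", ["FEMALE", "MALE"]);       // "enumeration"
const tokenKind = classifyCalf("AuthToken", { issuedAt: "Double" }); // "data"
```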
Enumeration calves
Enumeration calves represent single-choice types.
They are arrays that can only contain unique string values.
The values must (1) be in all uppercase and (2) only contain letters.
Here is an enumeration calf from our example:
Gender:
  - FEMALE
  - MALE
Data calves
Data calves represent a collection of fields with their respective types.
They are objects that can contain key-value pairs.
The keys are the names of the fields, and they must (1) start with a lowercase letter and (2) only contain letters.
The values are the types of the fields, of which there are many as we'll see below.
Here is a data calf from our example:
AuthToken:
  version: 1
  issuedAt: Double
  signature: Buffer(32)
  user: User
Variable types
You use a variable type for fields that can have multiple values, unknown at compile time. In the above data calf example, issuedAt, signature and user all have variable types, in contrast to the version field which has a constant type.
There are three kinds of variable types:
References
You can use other enumeration and data calves as types. For example:
User:
  gender: Gender
  ...
or
AuthToken:
  user: User
  ...
Primitives
These are the supported primitive types along with their mapping to the supported languages and their byte lengths.
| Buffela Type | Javascript Type | Kotlin Type | Byte Length | Description                                                           |
| ------------ | --------------- | ----------- | ----------- | --------------------------------------------------------------------- |
| Byte         | number          | Byte        | 1           | Integers from -128 to 127                                             |
| UByte        | number          | UByte       | 1           | Integers from 0 to 255                                                |
| Short        | number          | Short       | 2           | Integers from -32,768 to 32,767                                       |
| UShort       | number          | UShort      | 2           | Integers from 0 to 65,535                                             |
| Int          | number          | Int         | 4           | Integers from -2,147,483,648 to 2,147,483,647                         |
| UInt         | number          | UInt        | 4           | Integers from 0 to 4,294,967,295                                      |
| Long         | BigInt          | Long        | 8           | Integers from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
| ULong        | BigInt          | ULong       | 8           | Integers from 0 to 18,446,744,073,709,551,615                         |
| Float        | number          | Float       | 4           | Single-precision decimals, magnitude up to about 3.4E+38              |
| Double       | number          | Double      | 8           | Double-precision decimals, magnitude up to about 1.7E+308             |
| String       | string          | String      | Variable    | Text                                                                  |
| Boolean      | boolean         | Boolean     | 1           | true or false                                                         |
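The integer ranges follow directly from the byte length. The TypeScript sketch below derives them for the 1, 2 and 4 byte types and checks whether a value fits; integerRange and fits are hypothetical helpers for illustration, not part of any buffela package.

```typescript
// Range of an n-byte two's-complement (signed) or unsigned integer.
// Limited to 1, 2 and 4 bytes so plain numbers stay exact; Long/ULong need BigInt.
function integerRange(byteLength: 1 | 2 | 4, signed: boolean): [number, number] {
  const bits = byteLength * 8;
  return signed
    ? [-Math.pow(2, bits - 1), Math.pow(2, bits - 1) - 1]
    : [0, Math.pow(2, bits) - 1];
}

// True when the value is a whole number inside the range of the given type.
function fits(value: number, byteLength: 1 | 2 | 4, signed: boolean): boolean {
  const [min, max] = integerRange(byteLength, signed);
  return Number.isInteger(value) && value >= min && value <= max;
}

const uByteRange = integerRange(1, false); // [0, 255]
const intRange = integerRange(4, true);    // [-2147483648, 2147483647]
const tooBig = fits(256, 1, false);        // false: 256 does not fit in a UByte
```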
Here is an example:
AuthToken:
  issuedAt: Double
  ...
Arrays
Any type can be used as the item type of an array, by adding the [SIZE] suffix. The suffix can be chained to create n-dimensional arrays.
Arrays can be either variable length or fixed length. If you expect an array to ALWAYS contain a known fixed number of items, you can use this number directly as the size. As the size is known from the schema, buffela doesn't have to write it in the packet, saving precious bytes.
Otherwise, if you don't know the exact number of items, you must specify the data type used for the size. For example, if you know that a user will not have more than 255 hobbies:
User:
  hobbies: String[UByte]
  ...
NOTE: The only valid data types for a size are UByte, UShort, and Int.
Why Int and not UInt, you may ask? Kotlin and many other languages do not support lengths larger than the signed 32-bit integer limit.
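To see where the saved bytes come from, here is a small TypeScript sketch that encodes a list of 2-byte values either with a UByte count prefix or with a fixed size known from the schema. The function encodeUShorts, the big-endian byte order and the exact layout are assumptions made for illustration; this is not buffela's documented wire format.

```typescript
// Encode UShort items, optionally prefixed by a UByte item count.
// Byte order (big-endian) and layout are assumptions of this sketch.
function encodeUShorts(items: number[], fixedSize?: number): Uint8Array {
  if (fixedSize !== undefined && items.length !== fixedSize) {
    throw new Error(`Expected exactly ${fixedSize} items, got ${items.length}`);
  }
  if (fixedSize === undefined && items.length > 255) {
    throw new Error("A UByte count cannot describe more than 255 items");
  }
  const header = fixedSize === undefined ? 1 : 0; // fixed-size arrays need no count
  const out = new Uint8Array(header + items.length * 2);
  const view = new DataView(out.buffer);
  if (header === 1) view.setUint8(0, items.length);
  items.forEach((item, i) => view.setUint16(header + i * 2, item)); // big-endian by default
  return out;
}

const variableSized = encodeUShorts([1, 2, 3]); // 7 bytes: count + 3 items
const fixedSized = encodeUShorts([1, 2, 3], 3); // 6 bytes: the size lives in the schema
```

Choosing a wider count type such as UShort or Int would raise the item limit at the cost of a 2 or 4 byte prefix.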
While what I said above is true: "Any type can be used as the item type of an array", for optimization reasons you should avoid using the array notation for numeric and boolean types. Instead you should use these optimized types:
| Buffela Type | Javascript Type   | Kotlin Type  |
| ------------ | ----------------- | ------------ |
| UByteArray   | Uint8Array        | UByteArray   |
| ByteArray    | Int8Array         | ByteArray    |
| UShortArray  | Uint16Array       | UShortArray  |
| ShortArray   | Int16Array        | ShortArray   |
| UIntArray    | Uint32Array       | UIntArray    |
| IntArray     | Int32Array        | IntArray     |
| LongArray    | BigInt64Array     | LongArray    |
| ULongArray   | BigUint64Array    | ULongArray   |
| FloatArray   | Float32Array      | FloatArray   |
| DoubleArray  | Float64Array      | DoubleArray  |
| BooleanArray | Uint8ClampedArray | BooleanArray |
You should also avoid using a ByteArray for transmitting binary data. Instead, I present to you the Buffer type that corresponds to Buffer in Javascript and ByteArray in Kotlin.
"But now how do I specify the size!?", you exclaim. With parentheses. These types expect a (SIZE) suffix, and the rules about the size are the same as with normal arrays. You can even make a 2-, 3- or n-dimensional array by adding as many [SIZE]s as your heart desires at the end.
String
The String type is a hybrid type, meaning that it can be used as both a primitive and an array type. If you expect a string to ALWAYS be of a certain UTF-8 byte length, then you can add a (SIZE) suffix. For example, if you want to represent a UUID string:
User:
  userId: String(36)
  ...
Sub (data) calves
Sometimes you may need a data calf that can take multiple forms. For example, imagine a User structure for an application where users have the choice of signing up with either an email address or a phone number. Each user will either have a phone number or an email address, but never both. To represent this kind of either-or relationship between groups of fields, we have sub calves. For example:
User:
  userId: String(36)
  ...
  registeredWith: Type
  RegisteredWithPhone:
    countryCode: UByte
    phone: String
  RegisteredWithEmail:
    email: String
In this example RegisteredWithPhone and RegisteredWithEmail are sub calves of User. All users will have an id, regardless of which method they used to sign up, but only one group of fields, RegisteredWithPhone or RegisteredWithEmail, will be present.
The fields inside the selected group are direct descendants of the User parent type, and thus sub calves cannot declare fields with the same name as a field declared in any of their ancestors up the tree.
In order for the serializer to know which group of fields to serialize, we require you to declare a special field of type "Type". The name, though, is at your discretion. The value of that field inside the instance you wish to serialize must be set to one of the sub calves, but how you do this is language specific, so please refer to the Usage section for your language of choice.
You can only declare a single field of the special type "Type", and if you declare such a field in a calf that has no sub calves, the field will be ignored. The position of the type field inside the data calf does not matter, and you can move it around without changing the resulting packet.
Sub calves are data calves themselves and can be nested infinitely.
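For readers who think in types, the either-or relationship can be pictured as a TypeScript discriminated union. This is only a mental model under stated assumptions: the interfaces below are written by hand, the string tags stand in for the sub calf references that the real Javascript API passes in the Type field, and the contact helper is hypothetical. The actual generated types may look different.

```typescript
// Fields shared by every User, regardless of sub calf.
interface UserBase {
  userId: string;
}

// Each sub calf contributes its own fields plus a tag in the Type field.
// String tags are a stand-in for the sub calf references used by the real libraries.
interface RegisteredWithPhone extends UserBase {
  registeredWith: "RegisteredWithPhone";
  countryCode: number;
  phone: string;
}

interface RegisteredWithEmail extends UserBase {
  registeredWith: "RegisteredWithEmail";
  email: string;
}

type User = RegisteredWithPhone | RegisteredWithEmail;

// The Type field tells us which group of fields is present.
function contact(user: User): string {
  switch (user.registeredWith) {
    case "RegisteredWithPhone":
      return `+${user.countryCode} ${user.phone}`;
    case "RegisteredWithEmail":
      return user.email;
  }
}

const viaPhone = contact({
  userId: "d6c47b4b-6983-48eb-a957-a954798f6e57",
  registeredWith: "RegisteredWithPhone",
  countryCode: 30,
  phone: "691 234 5678",
}); // "+30 691 234 5678"
```

Note how the sub calf fields sit next to userId rather than in a nested object, which is why they cannot reuse a name declared by an ancestor.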
Constant types
One disadvantage of non-self-describing protocols is the lack of backwards and forwards compatibility between schema revisions. Buffela will blindly trust any binary data you give it to deserialize. This is not always desired. For example, if you use buffela for server-client communication and one day decide to change the type of a field on the server, the outdated clients will have no idea about the change and will try to decode the field with its old type. This will lead to data corruption. For this reason, you should implement your own versioning mechanism to use the right schema revision based on some kind of header.
But as a last ditch measure, or if you couldn't give a damn about the service quality of outdated clients, you can use buffela constants. These, you define as regular fields in the schema, but instead of a type you write a number. For example:
AuthToken:
  version: 1
  ...
These constants are represented as unsigned bytes, and thus must be whole numbers between 0 and 255. If you ever run out of numbers, first you have my congratulations, and second you can add additional constants.
Constants are written to each packet during serialization and are checked during deserialization to match the schema. If they don't, buffela throws an error.
You can choose to version only specific (sub) calves. For most cases, though, we recommend versioning all the root calves and none of the sub calves.
If you ever plan to use buffela constants for versioning, please do it from the beginning. Adding a constant to an existing calf WILL NOT reliably reject binary data serialized with previous schema revisions, as adding a constant field constitutes a revision by itself. More details in the 'Versioning' section.
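The write-then-verify behaviour of constants can be sketched in a few lines of TypeScript. The packet layout here (one constant byte followed by an opaque payload) and the writePacket and readPacket helpers are assumptions made for illustration; they are not buffela's API.

```typescript
// Hypothetical packet layout for this sketch: one constant byte, then the payload.
const VERSION = 1; // the value written in the schema, e.g. "version: 1"

function writePacket(payload: Uint8Array): Uint8Array {
  const out = new Uint8Array(1 + payload.length);
  out[0] = VERSION; // constants are written on every serialization
  out.set(payload, 1);
  return out;
}

function readPacket(packet: Uint8Array): Uint8Array {
  // The constant is verified against the schema before anything else is decoded.
  if (packet[0] !== VERSION) {
    throw new Error(`Constant mismatch: expected ${VERSION}, got ${packet[0]}`);
  }
  return packet.subarray(1);
}

const packet = writePacket(Uint8Array.from([10, 20]));
const payload = readPacket(packet); // bytes 10, 20

const stale = Uint8Array.from([2, 10, 20]); // as if produced with "version: 2"
let rejected = false;
try {
  readPacket(stale);
} catch (e) {
  rejected = true; // the mismatch is detected instead of decoding garbage
}
```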
Refactoring
NOTE: A refactoring is considered safe if it doesn't change the resulting packet. If you change a name in the schema, you of course will need to change the code that uses it.
You can safely change the name of everything: root calves, sub calves, fields, anything.
You can safely reorder the root calves.
Members inside a data calf are split into four categories:
- The constant fields
- The variable fields
- The Type field
- The sub calves
You can safely interleave members from all categories, but you cannot change the relative order of members belonging to the same category.
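One way to reason about this rule is to reduce a data calf to a per-category "layout" and compare layouts instead of schemas. The TypeScript sketch below does that under several assumptions: the DataCalf shape is inferred from the examples in this document, layoutOf and samePacket are hypothetical helpers, and variable types are compared as plain strings, so a renamed calf reference would need to be resolved first.

```typescript
// A data calf as parsed from YAML/JSON: numbers are constants, strings are types,
// nested objects are sub calves. This shape is an assumption based on the examples.
type DataCalf = { [member: string]: number | string | DataCalf };

interface Layout {
  constants: number[];
  variables: string[];
  subCalves: Layout[];
}

// Collect each category in declaration order, ignoring member names and interleaving.
function layoutOf(calf: DataCalf): Layout {
  const layout: Layout = { constants: [], variables: [], subCalves: [] };
  for (const name of Object.keys(calf)) {
    const member = calf[name];
    if (typeof member === "number") {
      layout.constants.push(member);
    } else if (typeof member === "string") {
      // The position of the Type field does not affect the packet, so it is skipped.
      if (member !== "Type") layout.variables.push(member);
    } else {
      layout.subCalves.push(layoutOf(member));
    }
  }
  return layout;
}

function samePacket(a: DataCalf, b: DataCalf): boolean {
  return JSON.stringify(layoutOf(a)) === JSON.stringify(layoutOf(b));
}

const original: DataCalf = { version: 1, issuedAt: "Double", signature: "Buffer(32)", user: "User" };
const interleaved: DataCalf = { issuedAt: "Double", version: 1, signature: "Buffer(32)", user: "User" };
const reordered: DataCalf = { version: 1, user: "User", issuedAt: "Double", signature: "Buffer(32)" };

const safe = samePacket(original, interleaved); // true: only categories were interleaved
const unsafe = samePacket(original, reordered); // false: variable fields changed order
```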
Example 1
AuthToken:
  version: 1
  issuedAt: Double
  signature: Buffer(32)
  user: User
is the same as:
AuthToken:
  issuedAt: Double
  version: 1
  signature: Buffer(32)
  user: User
but NOT the same as:
AuthToken:
  version: 1
  user: User
  issuedAt: Double
  signature: Buffer(32)
Example 2
User:
  userId: String(36)
  registeredWith: Type
  RegisteredWithEmail:
    ...
  RegisteredWithPhone:
    ...
is the same as:
User:
  RegisteredWithEmail:
    ...
  userId: String(36)
  RegisteredWithPhone:
    ...
  registeredWith: Type
but NOT the same as:
User:
  userId: String(36)
  registeredWith: Type
  RegisteredWithPhone:
    ...
  RegisteredWithEmail:
    ...
Convention
To avoid confusion we recommend ordering your members in the following way:
- Constants
- Variables
- Type field
- Sub calves
Versioning
This section is a guide for when to increment the version number of a data calf.
Glossary: Calves that have no sub calves are called leaf calves.
The ONLY type of change that wasn't covered in the 'Refactoring' section and that doesn't constitute a revision, is the addition of a variable type field to the end of a leaf calf (or the addition of a field in general if the leaf calf contains only constant fields or none at all). In any other case, the addition, removal, type modification or relocation of a field constitutes a revision on the containing (sub) calf.
Note: If you don't implement versioning on a sub calf and do a change that constitutes a revision, you must bubble the revision up to the nearest parent that does.
Finally the addition, removal or relocation of a subtype, constitutes a revision on the ROOT calf (for reasons described in the 'Sub calf id compression' section).
Internals
Sub calf id compression
When you serialize a sub calf, buffela needs to write some kind of header containing the sub calf id, to know which sub calf to use for deserialization later. While you can have infinitely nested calves, we don't expect the leaf calves to be that many in number. So instead of writing a sub calf header for each nested sub calf, buffela assigns a unique id to each leaf calf and only writes a single header for the entire packet. We use DFS to assign the ids. Example:
RootCalf:
  ChildCalf1: (id 1)
  ChildCalf2:
    GrandchildCalf1: (id 2)
    GrandchildCalf2: (id 3)
Additionally, even if you have nested sub calves, in the weird scenario that there is only one leaf calf, buffela skips writing the header entirely. Example:
RootCalf:
  ChildCalf:
    GrandchildCalf:
Serialization order
Serialized packets have the following structure:
Root calf constants
Leaf calf id
Root calf variables
Child calf constants
Child calf variables
...
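The leaf id assignment described under 'Sub calf id compression' can be sketched as a depth-first walk over the sub calf tree. This TypeScript version is a simplified model, not buffela's code: the CalfTree type only tracks sub calves (no fields), assignLeafIds is a hypothetical helper, and starting the ids at 1 simply follows the example given in that section.

```typescript
// Sub calf tree: each key is a sub calf name, each value holds its own sub calves.
type CalfTree = { [name: string]: CalfTree };

// Depth-first walk that numbers leaf calves in declaration order.
// Starting at 1 follows the example in the text; treat it as an assumption.
function assignLeafIds(tree: CalfTree): Map<string, number> {
  const ids = new Map<string, number>();
  let next = 1;
  const visit = (node: CalfTree, path: string): void => {
    for (const name of Object.keys(node)) {
      const child = node[name];
      const childPath = path ? `${path}.${name}` : name;
      if (Object.keys(child).length === 0) {
        ids.set(childPath, next++); // leaf calf: gets the next id
      } else {
        visit(child, childPath); // inner sub calf: no id of its own
      }
    }
  };
  visit(tree, "");
  return ids;
}

const ids = assignLeafIds({
  ChildCalf1: {},
  ChildCalf2: { GrandchildCalf1: {}, GrandchildCalf2: {} },
});
// ChildCalf1 -> 1, ChildCalf2.GrandchildCalf1 -> 2, ChildCalf2.GrandchildCalf2 -> 3

// With a single leaf there is nothing to distinguish, so no header is needed.
const needsHeader = ids.size > 1; // true for this tree
```

Because the ids depend on the order of the walk, adding, removing or moving a sub calf shifts the ids of the leaves after it, which is why such changes count as a revision of the root calf.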