@perseveranza-pets/milo
v0.3.0
A fast and embeddable HTTP/1.1 parser.
Milo
Milo is a fast and embeddable HTTP/1.1 parser written in Rust.
It is usable in JavaScript via WebAssembly.
How to use it
Install it from npm:
npm install @perseveranza-pets/milo
Then create a sample source file:
import { setup } from '@perseveranza-pets/milo'
/*
Milo works using callbacks.
All callbacks have the same signature, which characterizes the payload:
* The current parser
* from: The payload offset.
* size: The payload length.
The payload parameters above are relative to the last data sent to the milo.parse method.
If the current callback has no payload, both values are set to 0.
The callbacks must be provided using setup and are named in snake case.
*/
const milo = setup({
  on_data(p, from, size) {
    console.log(`Pos=${milo.getPosition(p)} Body: ${message.slice(from, from + size).toString()}`)
  }
})
// Prepare a message to parse.
const message = Buffer.from('HTTP/1.1 200 OK\r\nContent-Length: 3\r\n\r\nabc')
// Allocate memory in the WebAssembly space. This speeds up data copying to the WebAssembly layer.
const ptr = milo.alloc(message.length)
// Create a buffer we can use normally.
const buffer = Buffer.from(milo.memory.buffer, ptr, message.length)
// Create the parser.
const parser = milo.create()
// Copy the message into the shared buffer.
buffer.set(message, 0)
// Now perform the main parsing using milo.parse. The method returns the number of consumed characters.
milo.parse(parser, ptr, message.length)
// Cleanup used resources.
milo.destroy(parser)
milo.dealloc(ptr, message.length)
Finally build and execute it using node:
node index.js
# Pos=38 Body: abc
API
The module exports several constants (* is used to denote a family prefix):
- FLAG_DEBUG: Whether debug information is enabled.
- MESSAGE_TYPE_*: The type of the parser: it can autodetect (default) or only parse requests or responses.
- ERROR_*: An error code.
- METHOD_*: An HTTP/RTSP request method.
- CONNECTION_*: A Connection header value.
- CALLBACK_*: A parser callback.
- STATE_*: A parser state.
Callbacks handling
All callbacks in Milo have the following signature (TypeScript syntax):
(parser: number, offset: number, length: number) => void
where the parameters have the following meaning:
- The current parser.
- The payload offset. Can be 0.
- The data length. Can be 0.
If both offset and length are 0, it means the callback has no payload associated.
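The offset and length rule can be illustrated with a small helper. This is a minimal sketch, not part of Milo: `payloadOf` is a hypothetical name, and it assumes `input` is the same buffer whose contents were last passed to `milo.parse`.

```javascript
// Hypothetical helper (not part of Milo): extracts a callback payload from
// the data that was last passed to milo.parse.
function payloadOf(input, offset, length) {
  // Both values set to 0 means the callback carries no payload.
  if (offset === 0 && length === 0) {
    return null
  }

  // subarray creates a view on the same memory, so nothing is copied here.
  return input.subarray(offset, offset + length)
}

const input = Buffer.from('HTTP/1.1 200 OK\r\nContent-Length: 3\r\n\r\nabc')

// In this message the body starts at byte 38 and is 3 bytes long.
const body = payloadOf(input, 38, 3)
console.log(body.toString()) // abc
console.log(payloadOf(input, 0, 0)) // null
```

Inside a real callback, the same helper would be called with the `offset` and `length` arguments the callback receives.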
MessageTypes
An enum listing all possible message types.
Access is supported from string constant or numeric value.
Errors
An enum listing all possible parser errors.
Access is supported from string constant or numeric value.
Methods
An enum listing all possible HTTP/RTSP methods.
Access is supported from string constant or numeric value.
Connections
An enum listing all possible connection (Connection header value) types.
Access is supported from string constant or numeric value.
Callbacks
An enum listing all possible parser callbacks.
Access is supported from string constant or numeric value.
States
An enum listing all possible parser states.
Access is supported from string constant or numeric value.
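One way to read "access from string constant or numeric value" is a lookup object that maps in both directions. The sketch below only illustrates that idea: `makeEnum`, `ExampleTypes` and the member names are hypothetical placeholders, and Milo's actual enum members and values may differ.

```javascript
// Hypothetical stand-in for an enum such as MessageTypes. The member names
// and numeric values here are placeholders, not Milo's actual definitions.
function makeEnum(names) {
  const result = {}

  names.forEach((name, value) => {
    result[name] = value // string constant -> numeric value
    result[value] = name // numeric value -> string constant
  })

  return Object.freeze(result)
}

const ExampleTypes = makeEnum(['FIRST', 'SECOND', 'THIRD'])

console.log(ExampleTypes.SECOND) // 1
console.log(ExampleTypes[1]) // SECOND
```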
setup
Create a new milo module instance. Note that this is not a parser yet.
The method accepts a single object containing one or more of the following callbacks:
- on_state_change
- on_error
- on_finish
- on_message_start
- on_message_complete
- on_request
- on_response
- on_reset
- on_method
- on_url
- on_protocol
- on_version
- on_status
- on_reason
- on_header_name
- on_header_value
- on_headers
- on_connect
- on_upgrade
- on_chunk_length
- on_chunk_extension_name
- on_chunk_extension_value
- on_chunk
- on_body
- on_data
- on_trailer_name
- on_trailer_value
- on_trailers
The returned object is a milo module instance which can be used to create and manage parsers.
The object supports the methods below.
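As a sketch of how several callbacks might be combined, the hypothetical helper below builds a callbacks object that records header names and values. It is not part of Milo, and it assumes that `on_header_name` fires before the matching `on_header_value`. The last lines simulate the callback invocations by hand so the snippet runs without the package installed.

```javascript
// Hypothetical helper (not part of Milo): builds a callbacks object suitable
// for setup() that records header names and values as strings.
function createHeaderCollector() {
  const headers = []
  let input = null

  function read(offset, length) {
    return input.subarray(offset, offset + length).toString()
  }

  return {
    headers,
    // Must be called with the data about to be passed to milo.parse,
    // since payload offsets are relative to that data.
    setInput(buffer) {
      input = buffer
    },
    callbacks: {
      on_header_name(parser, offset, length) {
        headers.push([read(offset, length), null])
      },
      on_header_value(parser, offset, length) {
        // Assumption: a name was reported before its value.
        if (headers.length > 0) {
          headers[headers.length - 1][1] = read(offset, length)
        }
      }
    }
  }
}

const collector = createHeaderCollector()
const message = Buffer.from('HTTP/1.1 200 OK\r\nContent-Length: 3\r\n\r\nabc')
collector.setInput(message)

// Simulated invocations; with Milo these calls would come from the parser
// after passing collector.callbacks to setup().
collector.callbacks.on_header_name(0, 17, 14)
collector.callbacks.on_header_value(0, 33, 1)

console.log(JSON.stringify(collector.headers)) // [["Content-Length","3"]]
```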
alloc
Allocates a shared memory area with the WebAssembly instance which can be used to pass data to the parser.
The returned value MUST be destroyed later using dealloc.
dealloc(ptr)
Deallocates a shared memory area created with alloc.
create
Creates a new parser.
The returned value MUST be destroyed later using destroy.
destroy(parser)
Destroys a parser.
parse(parser, data, limit)
Parses data up to limit characters.
It returns the number of consumed characters.
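The alloc, create, parse, destroy and dealloc steps can be wrapped so that resources are released even if parsing throws. This is a minimal sketch under stated assumptions: `parseOnce` is a hypothetical helper, `milo` is the object returned by `setup`, and `dealloc` is called with the pointer and the length, as in the sample source file.

```javascript
// Hypothetical helper (not part of Milo): runs one message through a fresh
// parser and always releases the parser and the shared memory afterwards.
function parseOnce(milo, message) {
  const ptr = milo.alloc(message.length)

  try {
    const parser = milo.create()

    try {
      // Create a view on the WebAssembly memory and copy the message into it.
      const shared = Buffer.from(milo.memory.buffer, ptr, message.length)
      shared.set(message, 0)

      // parse returns the number of consumed characters.
      return milo.parse(parser, ptr, message.length)
    } finally {
      milo.destroy(parser)
    }
  } finally {
    // Same call shape as in the sample source file.
    milo.dealloc(ptr, message.length)
  }
}
```

A caller can compare the returned value with `message.length` to see whether the whole input was consumed.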
reset(parser)
Resets a parser. The second parameter specifies whether to also reset the parsed counter.
The following fields are not modified:
- position
- context
- mode
- manage_unconsumed
- continue_without_data
clear(parser)
Clears all values about the message in the parser.
The connection and message type fields are not cleared.
pause(parser)
Pauses the parser. The parser will have to be resumed via resume.
resume(parser)
Resumes the parser.
finish(parser)
Marks the parser as finished. Any new invocation of milo::milo_parse will put the parser in the error state.
fail(parser, code, description)
Marks the parsing as failed, setting an error code and an error message.
getMode(parser)
Returns the parser mode.
isPaused(parser)
Returns true if the parser is paused.
manageUnconsumed(parser)
Returns true if the parser should automatically copy and prepend unconsumed data.
continueWithoutData(parser)
Returns true if the next execution of the parse loop should execute even if there is no more data.
isConnect(parser)
Returns true if the current request used CONNECT method.
skipBody(parser)
Returns true if the parser should skip the body.
getState(parser)
Returns the parser state.
getPosition(parser)
Returns the parser position.
getParsed(parser)
Returns the total bytes consumed from this parser.
getErrorCode(parser)
Returns the parser error.
getMessageType(parser)
Returns the parser's current message type.
getMethod(parser)
Returns the parser's current request method.
getStatus(parser)
Returns the parser's current response status.
getVersionMajor(parser)
Returns the major HTTP version of the parser's current message.
getVersionMinor(parser)
Returns the minor HTTP version of the parser's current message.
getConnection(parser)
Returns the parser's value for the Connection header.
getContentLength(parser)
Returns the parser's value of the Content-Length header.
getChunkSize(parser)
Returns the expected length of the next chunk.
getRemainingContentLength(parser)
Returns the length of the body data still missing, according to the content_length field.
getRemainingChunkSize(parser)
Returns the length of the data still missing from the next chunk, according to the chunk_size field.
hasContentLength(parser)
Returns true if the current message has a Content-Length header.
hasChunkedTransferEncoding(parser)
Returns true if the current message has a Transfer-Encoding: chunked header.
hasUpgrade(parser)
Returns true if the current message has a Connection: upgrade header.
hasTrailers(parser)
Returns true if the current message has a Trailer header.
getErrorDescription(parser)
Returns the parser error description or null.
getCallbackError(parser)
Returns the parser callback error or null.
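The error-related getters can be gathered into one object for logging. This is a sketch only: `errorReport` is a hypothetical helper, and `milo` is assumed to be the object returned by `setup`.

```javascript
// Hypothetical helper (not part of Milo): collects the error-related getters
// into one plain object.
function errorReport(milo, parser) {
  return {
    state: milo.getState(parser),
    position: milo.getPosition(parser),
    code: milo.getErrorCode(parser),
    description: milo.getErrorDescription(parser), // may be null
    callbackError: milo.getCallbackError(parser) // may be null
  }
}
```

It could be called after `milo.parse` returns, or from within an `on_error` callback.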
setMode(parser, value)
Sets the parser mode.
setManageUnconsumed(parser, value)
Sets if the parser should automatically copy and prepend unconsumed data.
setContinueWithoutData(parser, value)
Sets if the next execution of the parse loop should execute even if there is no more data.
setSkipBody(parser, value)
Sets if the parser should skip the body.
setIsConnect(parser, value)
Sets if the current request used the CONNECT method.
How it works?
Milo leverages Rust's procedural macros and the syn and quote crates to allow an easy definition of states and matchers for the parser.
See the macros internal crate for more information.
The data matching is possible thanks to the power of Rust's match statement applied to data slices.
The resulting parser is a simple state machine which copies the data in only one (optional) specific case: to automatically handle the unconsumed portion of the input data.
In all other cases, no data is copied and the memory footprint is very small, as only 30 bool, uintptr_t or uint64_t fields can represent the entire parser state.
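The general idea of a state machine that reports positions instead of copying data can be sketched in a few lines. This is an illustration only, written in JavaScript for consistency with the examples above. It is not Milo's implementation, which is written in Rust and generated by macros, and `scanStatusLine` is a hypothetical name.

```javascript
// Illustrative only: a tiny state machine that scans an HTTP status line and
// reports (name, offset, length) spans into the input instead of copying data.
function scanStatusLine(input) {
  const spans = []
  let state = 'version'
  let start = 0

  for (let i = 0; i < input.length; i++) {
    const byte = input[i]

    if (state === 'version' && byte === 0x20) {
      // A space ends the version token.
      spans.push(['version', start, i - start])
      state = 'status'
      start = i + 1
    } else if (state === 'status' && byte === 0x20) {
      // A space ends the status code.
      spans.push(['status', start, i - start])
      state = 'reason'
      start = i + 1
    } else if (state === 'reason' && byte === 0x0d) {
      // A carriage return ends the reason phrase.
      spans.push(['reason', start, i - start])
      state = 'done'
      break
    }
  }

  return { state, spans }
}

const line = Buffer.from('HTTP/1.1 200 OK\r\n')
console.log(JSON.stringify(scanStatusLine(line)))
// {"state":"done","spans":[["version",0,8],["status",9,3],["reason",13,2]]}
```

The whole scanner state is two small values (`state` and `start`), which mirrors the point that a handful of scalar fields can represent the parser state.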
Why?
The scope of Milo is to replace llhttp as Node.js main HTTP parser.
This project aims to:
- Make it maintainable and verifiable using easy to read Rust code.
- Be performant by avoiding any unnecessary data copy.
- Be self-contained and dependency-free.
To see the rationale behind the replacement of llhttp, check Paolo's talk at Vancouver's Node Collab Summit in January 2023 (slides).
To see the initial disclosure of milo, check Paolo's talk at NodeConf EU 2023 in November 2023 (slides).
Contributing to milo
- Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet.
- Check out the issue tracker to make sure someone hasn't already requested it and/or contributed it.
- Fork the project.
- Start a feature/bugfix branch.
- Commit and push until you are happy with your contribution.
- Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.
Copyright
Copyright (C) 2023 and above Paolo Insogna ([email protected]) and NearForm (https://nearform.com).
Licensed under the ISC license, which can be found at https://choosealicense.com/licenses/isc or in the LICENSE.md file.

