paxl
v0.1.0
A high-performance XML to JSON parser implemented in WebAssembly.
Paxl
Paxl parses XML documents and converts them into JSON objects with minimal overhead. It's designed for speed and efficiency, making it ideal for processing large XML files or high-throughput applications.
Features
- Fast: Optimized C code compiled to WebAssembly for maximum performance
- Lightweight: Zero runtime dependencies (yyjson is statically linked)
- Standards-compliant: Supports standard XML syntax including attributes, nested elements, and text content
- Node.js ready: Easy to use JavaScript API with ES modules support
Installation
npm install paxl
Usage
import { parse } from 'paxl';
const xml = `<root>
<item id="1">
<name>Example</name>
<value>42</value>
</item>
</root>`;
const json = parse(xml);
console.log(json);
// Output: {"children":[{"tagName":"item","attributes":{"id":"1"},"children":[{"tagName":"name","children":["Example"]},{"tagName":"value","children":["42"]}]}]}
JSON Output Format
Paxl converts XML to a JSON structure where:
- Elements become objects with a tagName property and optional attributes and children properties
- Text content is represented as strings in the children array
- Attributes are collected in an attributes object
- Nested elements are placed in the children array
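Given the shape described above, the output tree can be walked with a few lines of plain JavaScript. The following is a minimal sketch, not part of Paxl's API: findAll and textOf are hypothetical helper names, and doc is a hand-written stand-in for the value returned by parse. If parse returns a JSON string in your version, run it through JSON.parse first.

```javascript
// Hypothetical helpers for walking Paxl's documented output shape.
// `doc` is a hand-written stand-in for the result of parse(xml).
const doc = {
  children: [
    {
      tagName: 'book',
      attributes: { id: '123', category: 'fiction' },
      children: [
        { tagName: 'title', children: ['Sample Book'] },
        { tagName: 'author', children: ['John Doe'] },
      ],
    },
  ],
};

// Collect every element with the given tag name, depth-first.
function findAll(node, tagName, found = []) {
  for (const child of node.children || []) {
    if (typeof child === 'string') continue; // text content, not an element
    if (child.tagName === tagName) found.push(child);
    findAll(child, tagName, found);
  }
  return found;
}

// Concatenate all text content beneath an element.
function textOf(node) {
  return (node.children || [])
    .map((child) => (typeof child === 'string' ? child : textOf(child)))
    .join('');
}

const [book] = findAll(doc, 'book');
console.log(book.attributes.id);               // 123
console.log(textOf(findAll(doc, 'title')[0])); // Sample Book
```

Both helpers treat a missing children property as an empty list, since the format marks attributes and children as optional.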
Example:
<book id="123" category="fiction">
<title>Sample Book</title>
<author>John Doe</author>
</book>
Becomes:
{
"children": [
{
"tagName": "book",
"attributes": {
"id": "123",
"category": "fiction"
},
"children": [
{
"tagName": "title",
"children": ["Sample Book"]
},
{
"tagName": "author",
"children": ["John Doe"]
}
]
}
]
}
Performance
Paxl is designed for high performance and typically outperforms pure JavaScript XML parsers. The project's benchmarks show speedups over alternatives such as rapidx2j and txml; the exact margin depends on the input document and the machine.
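To check a claim like this against your own documents, a small timing loop is usually enough. The sketch below is an assumption-laden illustration rather than the project's benchmark: timeIt is a hypothetical helper, and standIn is a placeholder workload so the snippet runs without Paxl installed. Swap in parse from 'paxl' (or any other parser) to compare real numbers.

```javascript
// Hypothetical timing helper; not part of Paxl or its bundled benchmarks.
// Returns the mean time per call in milliseconds.
function timeIt(fn, input, iterations = 100) {
  // Warm-up calls so JIT compilation does not dominate the measurement.
  for (let i = 0; i < 10; i++) fn(input);

  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) fn(input);
  const elapsedNs = process.hrtime.bigint() - start;

  return Number(elapsedNs) / 1e6 / iterations;
}

// Placeholder workload (assumption): replace with `parse` from 'paxl'
// to time the real parser.
const standIn = (xml) => xml.split('<').length;

// Synthetic input: 1000 repeated <item> elements under a single root.
const sample =
  '<root>' + '<item id="1"><name>Example</name></item>'.repeat(1000) + '</root>';

console.log(`${timeIt(standIn, sample).toFixed(4)} ms per call`);
```

process.hrtime.bigint is used instead of the global performance object because it is available on the older Node.js versions listed under Requirements.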
Run the included benchmarks to see performance on your system:
node tests/perf_test.js
Building from Source
Paxl requires Emscripten to build. Make sure you have the Emscripten SDK installed and activated.
# Clone the repository
git clone https://github.com/eitanwass/paxl.git
cd paxl
# Build the WebAssembly module
make build
# Run tests
node tests/simple_test.js
Build Options
- DEBUG=y: Build with debug symbols and no optimizations
- W_ENTRY=y: Include a main function for CLI applications using WASM
- MAX_XML_DEPTH=256: Maximum XML nesting depth (default: 256)
Requirements
- Node.js >= 12.22.7
- Emscripten (for building from source)
Contributing
Contributions are welcome! Please feel free to submit issues and pull requests.
Roadmap
Features and support I want to implement, in no particular order. XSD validation is not planned.
- CI/CD
  - [ ] CI run tests (actually write them this time)
  - [ ] CD Distributed build
- Tests
  - [ ] Valgrind tests!
  - [ ] Reach 100% coverage!
- [ ] TypeScript support!
- [ ] Non-string attributes
- [ ] Online performance benchmark and/or demo
- [ ] Remove yyjson for smaller in-house implementation (unused features)
- Options:
  - [ ] Single root option - eliminate top "children" element
  - [ ] Parse comments
  - [ ] Custom keys (e.g. tagName -> my_tag_name)
- Data validations? (Do we want that? It would be a performance hit)
  - [ ] Parse boolean values to correct form (e.g. False -> false)
- Performance
  - [ ] Parallel pass parsing
  - [ ] SIMD? (Is it even applicable in this case?)
License
Apache 2.0 license - see LICENSE file for details.
Credits
- Uses yyjson for JSON handling
- Built with Emscripten
