schema_analysis

v0.7.0

Published

2 months ago

Infer schemas from JSON, YAML, XML, TOML, CBOR, and BSON

0High
0Medium
0Low

quartzlibrary

schema analysis json yaml serde

schema_analysis

Universal-ish Schema Analysis

Ever wished you could figure out what was in that json file? Or maybe it was xml... Ehr, yaml? It was definitely toml.

Alas, many great tools will only work with one of those formats, and the internet is not so nice a place as to finally understand that no, xml is not an acceptable data format.

Enter this neat little tool, a single interface to any self-describing format supported by our gymnast friend, serde.

Features

Works with any self-describing format with a Serde implementation.
Suitable for large files.
Keeps track of some useful info for each type (opt out with --minimal).
Keeps track of null/missing/duplicate values separately.
Integrates with Schemars and json_typegen to produce types and a json schema if needed.
There's a demo website here.

Installation

# Run without installing
npx schema_analysis data.json
# or
uvx schema_analysis data.json
# or
pipx run schema_analysis data.json

# Install
npm install -g schema_analysis
# or
pip install schema_analysis
# or
uv tool install schema_analysis
# or
cargo install schema_analysis --features cli --locked

CLI Usage

schema_analysis can infer schemas and generate types from data directly from the command line.

schema_analysis [OPTIONS] [FILES]...

It auto-detects the input format from file extensions (.json, .yaml/.yml, .xml, .toml, .cbor, .bson) and reads from stdin if no files are provided.

Options:

| Option | Description | Default | | --- | --- | --- | | --format <FORMAT> | Override input format (json, yaml, xml, toml, cbor, bson) | auto-detected | | --output <OUTPUT> | Output mode (schema, rust, typescript, typescript-alias, kotlin, kotlin-kotlinx, json-schema, shape) | schema | | --name <NAME> | Root type name for code generation | Root | | --compact | Compact JSON output (no pretty printing) | | | --minimal | Skip analysis info (counts, samples, min/max, etc.), outputting only the schema structure | |

Examples:

# Infer a schema from a JSON file
schema_analysis data.json

# Generate Rust types
schema_analysis data.json --output rust --name MyData

# Generate TypeScript interfaces
schema_analysis api.json --output typescript --name ApiResponse

# Generate JSON Schema
schema_analysis data.json --output json-schema

# Merge multiple files into a single schema
schema_analysis file1.json file2.json file3.json

# Read from stdin
cat data.json | schema_analysis --format json

Library Usage

For use as a library, see the Rust crate or the repo.

Performance

These are not proper benchmarks, but should give a vague idea of the performance on a i7-7700HQ laptop (2017) laptop with the raw data already loaded into memory.

| Size | wasm (MB/s) | native (MB/s) | Format | File # | | --------------------- | ------------ | ------------- | ------ | ------ | | ~180MB | ~20s (9) | ~5s (36) | json | 1 | | ~650MB | ~150s (4.3) | ~50s (13) | json | 1 | | ~1.7GB | ~470s (3.6) | ~145s (11.7) | json | 1 | | ~2.1GB | a | ~182s (11.5) | json | 1 | | ~13.3GBb | | ~810s (16.4) | xml | ~200k |

a This one seems to go over some kind of browser limit when fetching the data in the Web Worker, I believe I would have to split large files to handle it.

b ~2.7GB compressed. This one seems like it would be a worst-case scenario because it includes decompression overhead and the files had a section that was formatted text which resulted in crazy schemas. (The json pretty printed schema was almost 0.5GB!)

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

schema_analysis

Universal-ish Schema Analysis

Features

Installation

CLI Usage

Library Usage

Performance