schema_analysis
v0.7.0
Infer schemas from JSON, YAML, XML, TOML, CBOR, and BSON
Universal-ish Schema Analysis
Ever wished you could figure out what was in that json file? Or maybe it was xml... Ehr, yaml? It was definitely toml.
Alas, many great tools will only work with one of those formats, and the internet is not so nice a place as to finally understand that no, xml is not an acceptable data format.
Enter this neat little tool, a single interface to any self-describing format supported by our gymnast friend, serde.
Features
- Works with any self-describing format with a Serde implementation.
- Suitable for large files (inputs of several GB; see Performance below).
- Keeps track of some useful info for each type, such as counts, samples, and min/max (opt out with --minimal).
- Keeps track of null/missing/duplicate values separately.
- Integrates with Schemars and json_typegen to produce types and a JSON Schema if needed.
- There's a demo website here.
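To make "null/missing/duplicate values separately" concrete, here is a minimal, std-only sketch of one-pass counting. The `Value` enum, `Analysis`, `FieldStats`, `missing` and `obj` are hypothetical stand-ins made up for illustration; they are not schema_analysis's API (the README says the crate works through Serde rather than through an intermediate value tree like this one).

```rust
use std::collections::BTreeMap;

/// Hypothetical format-neutral value, standing in for whatever a
/// self-describing format (JSON, YAML, TOML, ...) was parsed into.
#[allow(dead_code)]
#[derive(Debug)]
enum Value {
    Null,
    Bool(bool),
    Int(i64),
    Str(String),
    /// A Vec (not a map) so that duplicate keys stay visible.
    Object(Vec<(String, Value)>),
}

/// Counters for one field, kept separately.
#[derive(Debug, Default)]
struct FieldStats {
    records_with_key: usize,  // records in which the key appeared at least once
    nulls: usize,             // appearances whose value was null
    duplicates: usize,        // repeated occurrences of the key inside one record
    types: Vec<&'static str>, // distinct non-null type names, in first-seen order
}

fn type_name(v: &Value) -> &'static str {
    match v {
        Value::Null => "null",
        Value::Bool(_) => "boolean",
        Value::Int(_) => "integer",
        Value::Str(_) => "string",
        Value::Object(_) => "object",
    }
}

#[derive(Debug, Default)]
struct Analysis {
    records: usize,
    fields: BTreeMap<String, FieldStats>,
}

impl Analysis {
    /// Processes one record at a time, so the whole file never has to be held at once.
    fn add_record(&mut self, record: &Value) {
        self.records += 1;
        // Non-object records still count toward `records`, so every field is "missing" there.
        let Value::Object(entries) = record else { return };
        let mut seen_here: Vec<&str> = Vec::new();
        for (key, value) in entries {
            let stats = self.fields.entry(key.clone()).or_default();
            if seen_here.contains(&key.as_str()) {
                stats.duplicates += 1; // duplicate values are counted, not re-analysed
                continue;
            }
            seen_here.push(key.as_str());
            stats.records_with_key += 1;
            match value {
                Value::Null => stats.nulls += 1,
                other => {
                    let t = type_name(other);
                    if !stats.types.contains(&t) {
                        stats.types.push(t);
                    }
                }
            }
        }
    }

    /// Missing = records seen so far that never contained the key.
    fn missing(&self, key: &str) -> usize {
        self.fields
            .get(key)
            .map_or(self.records, |s| self.records - s.records_with_key)
    }
}

/// Helper to build an object value from string keys.
fn obj(pairs: Vec<(&str, Value)>) -> Value {
    Value::Object(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

fn main() {
    let mut a = Analysis::default();
    a.add_record(&obj(vec![("id", Value::Int(1)), ("name", Value::Str("a".into()))]));
    a.add_record(&obj(vec![("id", Value::Int(2)), ("name", Value::Null)]));
    a.add_record(&obj(vec![("id", Value::Int(3)), ("id", Value::Int(4)), ("ok", Value::Bool(true))]));
    for (key, s) in &a.fields {
        println!(
            "{key}: types={:?} nulls={} missing={} duplicates={}",
            s.types, s.nulls, a.missing(key), s.duplicates
        );
    }
}
```

In this toy input, `name` ends up with one null and one missing record, `id` with one duplicate, and `ok` missing from two records. Because missing counts are derived from `records - records_with_key` at the end, a field first seen late in the file is still counted correctly without a second pass.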
Installation
# Run without installing
npx schema_analysis data.json
# or
uvx schema_analysis data.json
# or
pipx run schema_analysis data.json
# Install
npm install -g schema_analysis
# or
pip install schema_analysis
# or
uv tool install schema_analysis
# or
cargo install schema_analysis --features cli --locked
CLI Usage
schema_analysis can infer schemas from data, and generate types from them, directly on the command line.
schema_analysis [OPTIONS] [FILES]...
It auto-detects the input format from file extensions (.json, .yaml/.yml, .xml, .toml, .cbor, .bson) and reads from stdin if no files are provided.
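Extension-based detection with a `--format` override can be pictured roughly as follows. This is a sketch only: `Format`, `detect_format` and `resolve_format` are hypothetical names, and matching is case-sensitive here, which may differ from what the real CLI does.

```rust
use std::path::Path;

/// Input formats listed in the README.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Format {
    Json,
    Yaml,
    Xml,
    Toml,
    Cbor,
    Bson,
}

/// Hypothetical extension lookup; returns None for unknown or missing extensions.
fn detect_format(path: &str) -> Option<Format> {
    match Path::new(path).extension()?.to_str()? {
        "json" => Some(Format::Json),
        "yaml" | "yml" => Some(Format::Yaml),
        "xml" => Some(Format::Xml),
        "toml" => Some(Format::Toml),
        "cbor" => Some(Format::Cbor),
        "bson" => Some(Format::Bson),
        _ => None,
    }
}

/// In this sketch an explicit `--format` always wins; with stdin (no path)
/// the flag is the only source of the format.
fn resolve_format(path: Option<&str>, format_flag: Option<Format>) -> Option<Format> {
    format_flag.or_else(|| path.and_then(detect_format))
}

fn main() {
    for p in ["data.json", "config.yml", "dump.bson", "notes.txt"] {
        println!("{p} -> {:?}", detect_format(p));
    }
    // stdin: no path, so the format comes from the flag
    println!("stdin -> {:?}", resolve_format(None, Some(Format::Json)));
}
```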
Options:
| Option | Description | Default |
| --- | --- | --- |
| --format <FORMAT> | Override input format (json, yaml, xml, toml, cbor, bson) | auto-detected |
| --output <OUTPUT> | Output mode (schema, rust, typescript, typescript-alias, kotlin, kotlin-kotlinx, json-schema, shape) | schema |
| --name <NAME> | Root type name for code generation | Root |
| --compact | Compact JSON output (no pretty printing) | |
| --minimal | Skip analysis info (counts, samples, min/max, etc.), outputting only the schema structure | |
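As a rough illustration of how the null/missing information can feed an `--output typescript` style result, the sketch below renders a hypothetical list of inferred fields as a TypeScript interface, with `--name` playing the role of `root`. The README says type generation goes through json_typegen; this hand-written `to_typescript` function and its `Field` type are made up for illustration and will not match the tool's exact output.

```rust
/// Hypothetical summary of one inferred field.
struct Field {
    name: &'static str,
    ts_type: &'static str,
    nullable: bool, // some records had null
    optional: bool, // some records lacked the key
}

/// Render a TypeScript interface; `root` plays the role of `--name`.
fn to_typescript(root: &str, fields: &[Field]) -> String {
    let mut out = format!("export interface {root} {{\n");
    for f in fields {
        let opt = if f.optional { "?" } else { "" };
        let null = if f.nullable { " | null" } else { "" };
        out.push_str(&format!("  {}{}: {}{};\n", f.name, opt, f.ts_type, null));
    }
    out.push('}');
    out
}

fn main() {
    let fields = [
        Field { name: "id", ts_type: "number", nullable: false, optional: false },
        Field { name: "name", ts_type: "string", nullable: true, optional: true },
    ];
    println!("{}", to_typescript("ApiResponse", &fields));
}
```

Missing keys map to optional properties (`?`) and observed nulls to `| null`, which is why tracking the two separately matters for code generation.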
Examples:
# Infer a schema from a JSON file
schema_analysis data.json
# Generate Rust types
schema_analysis data.json --output rust --name MyData
# Generate TypeScript interfaces
schema_analysis api.json --output typescript --name ApiResponse
# Generate JSON Schema
schema_analysis data.json --output json-schema
# Merge multiple files into a single schema
schema_analysis file1.json file2.json file3.json
# Read from stdin
cat data.json | schema_analysis --format json
Library Usage
For use as a library, see the Rust crate or the repo.
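The "merge multiple files into a single schema" example from the CLI section can be pictured as a field-wise union. The sketch below assumes a heavily simplified, hypothetical `Schema` (type names plus a per-field record count) with made-up `with_field`, `merge` and `is_optional` methods; it is not how schema_analysis represents schemas internally.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, heavily simplified schema: per field, the type names seen
/// and the number of records that contained the field.
#[derive(Debug, Default)]
struct Schema {
    records: usize,
    fields: BTreeMap<String, (BTreeSet<String>, usize)>,
}

impl Schema {
    fn new(records: usize) -> Self {
        Schema { records, fields: BTreeMap::new() }
    }

    fn with_field(mut self, name: &str, types: &[&str], count: usize) -> Self {
        let types: BTreeSet<String> = types.iter().map(|t| t.to_string()).collect();
        self.fields.insert(name.to_string(), (types, count));
        self
    }

    /// Merge another file's schema: union the type sets, add up the counts.
    fn merge(mut self, other: Schema) -> Schema {
        self.records += other.records;
        for (name, (types, count)) in other.fields {
            let entry = self.fields.entry(name).or_default();
            entry.0.extend(types);
            entry.1 += count;
        }
        self
    }

    /// A field is optional if at least one record (in any file) lacked it.
    fn is_optional(&self, name: &str) -> bool {
        self.fields.get(name).map_or(true, |(_, count)| *count < self.records)
    }
}

fn main() {
    let file1 = Schema::new(2)
        .with_field("id", &["integer"], 2)
        .with_field("name", &["string"], 2);
    let file2 = Schema::new(3)
        .with_field("id", &["string"], 3)
        .with_field("email", &["string"], 1);
    let merged = file1.merge(file2);
    for (name, (types, _)) in &merged.fields {
        println!("{name}: {types:?} optional={}", merged.is_optional(name));
    }
}
```

Because merging only adds counts and unions type sets, it is associative, so files can be folded into the schema one at a time in any grouping.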
Performance
These are not proper benchmarks, but they should give a rough idea of performance on an i7-7700HQ laptop (2017), with the raw data already loaded into memory.
| Size | wasm: time (MB/s) | native: time (MB/s) | Format | File # |
| --- | --- | --- | --- | --- |
| ~180MB | ~20s (9) | ~5s (36) | json | 1 |
| ~650MB | ~150s (4.3) | ~50s (13) | json | 1 |
| ~1.7GB | ~470s (3.6) | ~145s (11.7) | json | 1 |
| ~2.1GB | (a) | ~182s (11.5) | json | 1 |
| ~13.3GB (b) | | ~810s (16.4) | xml | ~200k |
(a) This one seems to exceed some kind of browser limit when fetching the data in the Web Worker; I believe I would have to split large files to handle it.
(b) ~2.7GB compressed. This is probably a worst-case scenario: it includes decompression overhead, and the files contained a section of formatted text that produced crazy schemas (the pretty-printed JSON schema was almost 0.5GB!).
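The figures in parentheses in the table are throughput, i.e. input size divided by wall-clock time. Taking 1GB as 1000MB reproduces the listed values, for example:

```latex
\text{throughput} = \frac{\text{size}}{\text{time}}, \qquad
\frac{180\,\text{MB}}{5\,\text{s}} = 36\,\text{MB/s}, \qquad
\frac{1700\,\text{MB}}{145\,\text{s}} \approx 11.7\,\text{MB/s}, \qquad
\frac{13300\,\text{MB}}{810\,\text{s}} \approx 16.4\,\text{MB/s}
```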
