npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

schema-detector

v1.0.1

Published

Heuristic "column" type & size analysis w/ enumeration detection.

Downloads

8

Readme

Schema Analyzer

An Open Source joint by Dan Levy

Analyze column type & size summary from any input JSON array!

Schema Analyzer is the core library behind Dan's Schema Generator.

Features

The primary goal is to support any input JSON/CSV and infer as much as possible. More data will generally yield better results.

  • [x] Heuristic type analysis for arrays of objects.
  • [x] Browser-based (local, no server necessary)
  • [x] Automatic type detection:
    • [x] Object Id (MongoDB's 96 bit/12 Byte ID. 4B timestamp + 3B MachineID + 2B ProcessID + 3B counter)
    • [x] UUID/GUID (Common 128 bit/16 Byte ID. Stored as a hex string, dash delimited in parts: 8, 4, 4, 4, 12)
    • [x] Boolean (detects obvious strings true, false, Y, N)
    • [x] Date (Smart detection via comprehensive regex pattern)
    • [x] Timestamp (integer, number of milliseconds since unix epoch)
    • [x] Currency (62 currency symbols supported)
    • [x] Float (w/ scale & precision measurements)
    • [x] Number (Integers)
    • [x] Null (sparse column data helps w/ certain inferences)
    • [x] Email (falls back to string)
    • [x] String (big text and varchar awareness)
    • [x] Array (includes min/max/avg length)
    • [x] Object
  • [x] Detects column size minimum, maximum and average
  • [x] Includes data points at the 30th, 60th and 90th percentiles (for detecting outliers and enum types!)
  • [x] Handles some error/outliers
  • [x] Quantify # of unique values per column
  • [ ] Identify enum Fields w/ Values
  • [ ] Identify Not Null fields
  • [ ] Nested data structure & multi-table relational output.
  • [ ] Un-de-normalize JSON into flat typed objects.

Data Analysis Results

What does this library's analysis look like?

It consists of 3 key top-level properties:

  • totalRows - # of rows analyzed.
  • uniques - a 'map' of field names & the # of unique values found.
  • fields - field names with all detected types (includes metadata for each type detected, with any overlaps. e.g. an Email is also a String, "42" is a String and Number)

Review the raw results below

Details about nested types can be found below.

{
  "totalRows": 5,
  "uniques": {
    "id": 5,
    "name": 5,
    "role": 3,
    "email": 5,
    "createdAt": 5,
    "accountConfirmed": 2
  },
  "fields": {
    "id": {
      "Number": {
        "value": { "min": 1, "avg": 3, "max": 5, "percentiles": [ 2, 4, 5 ] },
        "count": 5,
        "rank": 8
      },
      "String": {
        "length": { "min": 1, "avg": 1, "max": 1, "percentiles": [ 1, 1, 1 ] },
        "count": 5,
        "rank": 12
      }
    },
    "name": {
      "String": {
        "length": { "min": 3, "avg": 7.2, "max": 15, "percentiles": [ 3, 10, 15 ] },
        "count": 5,
        "rank": 12
      }
    },
    "role": {
      "String": {
        "length": { "min": 4, "avg": 5.4, "max": 9, "percentiles": [ 4, 5, 9 ] },
        "count": 5,
        "rank": 12
      }
    },
    "email": {
      "Email": {
        "count": 5,
        "rank": 11
      },
      "String": {
        "length": { "min": 15, "avg": 19.4, "max": 26, "percentiles": [ 5, 3, 6 ] },
        "count": 5,
        "rank": 12
      }
    },
    "createdAt": {
      "String": {
        "length": { "min": 6, "avg": 9.2, "max": 10, "percentiles": [ 0, 0, 0 ] },
        "count": 5,
        "rank": 12
      }
    },
    "accountConfirmed": {
      "Boolean": {
        "count": 5,
        "rank": 3
      },
      "String": {
        "length": { "min": 4, "avg": 4.4, "max": 5, "percentiles": [ 4, 5, 5 ] },
        "count": 5,
        "rank": 12
      }
    }
  }
}

Sample input dataset for the example results above:

| id | name | role | email | createdAt | accountConfirmed | |----|-----------------|-----------|------------------------------|------------|------------------| | 1 | Eve | poweruser | [email protected] | 01/20/2020 | false | | 2 | Alice | user | [email protected] | 02/02/2020 | true | | 3 | Bob | user | [email protected] | 12/31/2019 | true | | 4 | Elliot Alderson | admin | [email protected] | 01/01/2001 | false | | 5 | Sam Sepiol | admin | [email protected] | 9/9/99 | true |

Number & String Range Object Details

Numeric and String types include a summary of the observed field sizes:

  • min the minimum number or string length
  • max the maximum number or string length
  • avg the average number or string length
  • percentiles[30th, 60th, 90th] values from the Nth percentile number or string length
{
  "min": 1, "avg": 1, "max": 1,
  "percentiles": [ 1, 1, 1 ]
}

Notes

We recommend you provide at least 100+ rows. Accuracy increases greatly with 1,000 rows.

The following features require a certain minimum # of records:

  • Enumeration detection.
    • 100+ Rows Required.
    • Number of unique values must not exceed 20 or 5% of the total number of records. (100 records will identify as Enum w/ 5 values. Up to 20 are possible given 400 or 1,000+.)
  • Not Null detection.
    • where rowCount === field count