npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2025 – Pkg Stats / Ryan Hefner

@andyball/schema-forge

v0.1.7

Published

A npm package which does two things: returns a schema for a dataset, and returns a formated version of a dataset based on its schema.

Downloads

29

Readme

schema-forge

Infer a dataset schema and format the dataset based on that schema.

  • Detects column types from sample data: Number, Boolean, Date, String
  • Recognizes common date formats and returns strftime-style formats (e.g. %Y-%m-%d)
  • Normalizes numeric fields: strips commas, currency symbols/codes, and %
  • Optional behaviors via options (e.g., convert empty numeric values to null)
  • Flags String columns with repeated values (repeating: true|false)
  • Flags Number columns that are strictly sequential (sequential: true|false)
  • Heuristic year detection: 4‑digit integers in the range 1800–2100 are classified as Date with format: "%Y"

Install

npm install @andyball/schema-forge

Quick start

const { getSchema, toJSONSchema, dataFormat } = require('@andyball/schema-forge');

const rows = [
  { name: 'Alice', age: '32', revenue: '$2,345.50', when: '2024-10-01' },
  { name: 'Bob',   age: '23', revenue: '$1,100',     when: '01/10/2024' }
];

console.log(getSchema(rows));
console.log(dataFormat(rows));

// With options
console.log(dataFormat(rows, { numberEmptyAsNull: true }));

Importing

CommonJS (require):

const { getSchema, toJSONSchema, dataFormat } = require('@andyball/schema-forge');
// or
const forge = require('@andyball/schema-forge');
forge.getSchema(...);

ESM (import):

// After v0.1.1+ you can use named ESM imports
import { getSchema, toJSONSchema, dataFormat } from '@andyball/schema-forge';
// or default
import forge from '@andyball/schema-forge';

API

getSchema(data, options?)

  • data: Array<object> or JSON string
  • options:
    • preferStringNumbers: boolean (default: false)
      • If true, numeric-looking strings remain as String unless all values are numeric
    • sanitizeKeys: boolean (default: false)
      • If true, column names are sanitized: Unicode formatting/invisible/control chars removed, whitespace collapsed, NFC normalized, and deduped; original preserved on each column as sourceName.
    • useNullMarkersForInference: boolean (default: false)
      • If true, values matched by null markers/sentinels are skipped during inference. They are excluded from completeness, distinct/topK, and type/format tallies so they don't skew detection. Full null normalization still occurs in dataFormat based on inferred types.

Returns an array of column descriptors (abbreviated example):

[
  {
    "name": "name",
    "type": "String",
    "format": "mixed",
    "repeating": true,
    "score": 0.72,
    "probably": "Text",
    "tally": { "Text": 180, "URL": 20 },
    "completeness": 1,
    "distinctCount": 160,
    "cardinality": 0.8889,
    "topK": [["Alice", 10], ["Bob", 8]],
    "uniquenessRatio": 0.8,
    "isUnique": false,
    "isPrimaryKey": false
  },
  {
    "name": "revenue",
    "type": "Number",
    "format": "Currency",
    "sequential": false,
    "numStats": { "min": 100, "max": 2345.5, "mean": 1234.2, "stdev": 345.1 },
    "score": 0.9,
    "probably": "Currency",
    "tally": { "Currency": 90, "Float": 10 },
    "completeness": 0.98,
    "distinctCount": 95,
    "cardinality": 0.9694,
    "topK": [["$100", 5], ["$200", 4]],
    "uniquenessRatio": 0.94,
    "isUnique": false,
    "isPrimaryKey": false
  },
  {
    "name": "year",
    "type": "Date",
    "format": "%Y",
    "sequential": true,
    "score": 1,
    "tally": { "%Y": 100 },
    "completeness": 1,
    "distinctCount": 10,
    "cardinality": 0.1,
    "topK": [["2020", 12], ["2021", 11]],
    "uniquenessRatio": 0.1,
    "isUnique": false,
    "isPrimaryKey": false
  }
]

Notes on detection:

  • Numbers: handles 1,234.56, $2,000, 12%, 1234 USD; outputs format Integer, Float, Currency, or Percentage.
  • Numbers also include sequential: true when successive numeric values increase by 1 (requires at least 2 numbers; non-numeric/empty values are ignored for the check).
  • Year heuristic: when a Number column consists only of 4‑digit integers all within 1800–2100 (and not currency/percentage/float), it is upgraded to type: "Date" with format: "%Y" and retains the sequential flag.
  • Booleans: true/false (case-insensitive) recognized.
  • Dates: recognizes ISO and common regional formats; returns strftime format (e.g., %d/%m/%Y, %H:%M:%S). Mixed formats → format: "mixed".
  • Strings: URLs, emails, IP addresses, phone numbers, colors, and data URIs get subformats; others default to null format. String descriptors also include repeating: true if any non-empty value repeats within the column.

Scoring, probability and tally:

  • tally: counts per detected format in the column (e.g., { Text: 120, Percentage: 30 }).
  • score: the dominance of the most frequent format, computed as max(tally.values) / sum(tally.values).
  • probably: the label of the most frequent format from tally.
    • For String columns where the top label is Text, extra hints apply:
      • If values repeat: becomes Categories (or Categories: a, b, c when the unique set is small), or Zero-variance column if only one unique value.
      • If comma/semicolon/pipe-delimited lists dominate: List.
      • If Percentage dominates overall, Percentage.

Column metrics:

  • completeness: fraction of non-empty values (nonEmptyCount / totalRows).
  • distinctCount: number of unique non-empty values (normalised as strings).
  • cardinality: distinctCount / nonEmptyCount (0 when nonEmptyCount is 0).
  • topK: up to 5 most frequent values as [ [value, count], ... ].
  • Number columns include numStats: { min, max, mean, stdev }.
  • String columns include textStats: { minLen, maxLen, avgLen }.
  • uniquenessRatio: distinctCount / totalRows.
  • isUnique: distinctCount === nonEmptyCount and nonEmptyCount > 0.
  • isPrimaryKey: isUnique and completeness === 1.

dataFormat(data, options?)

  • data: Array<object>
  • options:
    • preferStringNumbers: boolean (default: false) – passed through to getSchema
    • numberEmptyAsNull: boolean (default: true) – empty numeric values become null
    • bestGuess: boolean (default: true) – when true, uses each column's probably to coerce String columns that predominantly look numeric (e.g., Percentage, Currency, Integer, Float). In this mode, numeric-looking strings are converted to numbers (stripping %, currency symbols/codes, commas), and non-numeric strings in those columns become null (respecting numberEmptyAsNull).
    • reportIgnored: boolean (default: false) – when true, logs to console only the values that are converted to null in numeric and numeric-like columns (e.g., empty strings, non-numeric placeholders in percentage/currency columns).
    • report: (msg: string) => void – optional logger; used when reportIgnored is true (defaults to console.log).
    • convertDates: boolean (default: false) – when true, converts columns inferred as Date into JavaScript Date objects. Uses native parsing for ISO/date-time and a loose parser for common D/M/Y and M/D/Y forms. Unparseable values are left unchanged.
    • sanitizeKeys: boolean (default: false) – if enabled, output rows use sanitized keys and map from original keys using sourceName in schema.
    • dropEmptyColumns: boolean (default: false) – if enabled, columns with no non-empty values are excluded from schema and removed from formatted output rows.

Returns a new array of formatted rows. Current conversions:

  • Number columns: strings like "1,234.56", "$2,000", "12%" → numbers
  • Boolean columns: "true"/"false" → booleans
  • Date columns: left as-is unless convertDates is enabled, in which case values are converted to JavaScript Date instances when parsable.
  • String columns: left as-is unless bestGuess is enabled.

Example:

const rows = [
  { revenue: "$2,345,000.50", pct: "12%", active: "true", when: "31/12/2020" },
  { revenue: "$100", pct: "", active: "false", when: "2020-12-30" }
];

console.log(getSchema(rows));
console.log(dataFormat(rows, { reportIgnored: true }));

toJSONSchema(schemaOrRows, options?)

Generate a JSON Schema (draft 2020-12) from a schema array or directly from rows (in which case getSchema runs first).

  • If column type is Number, maps to number (or integer for format: "Integer"). Includes minimum/maximum when available.
  • Boolean maps to boolean.
  • Date maps to string with hints:
    • %Ypattern: ^(?:18|19|20|21)\d{2}$
    • %Y-%m-%d or %Y/%m/%dformat: date
    • date-time forms with %H/%Iformat: date-time
  • String columns include minLength/maxLength from text stats when available; Financial year adds a pattern.
  • Columns with completeness === 1 are marked as required.

Example:

import { getSchema, toJSONSchema, dataFormat } from '@andyball/schema-forge';

const rows = [ { id: 1, name: 'Alice' }, { id: 2, name: 'Bob' } ];
const schema = getSchema(rows);
const jsonSchema = toJSONSchema(schema);
const wrangleData = dataFormat(rows)

What is JSON Schema (draft 2020-12)?

JSON Schema is a standard, machine-readable vocabulary for describing the shape and constraints of JSON data. It lets you specify what properties exist, their types, value ranges, string patterns, required fields, and more. Tools can then validate data against a schema, generate forms and UI, scaffold types, and power API contracts.

  • Validation: Check that incoming/outgoing JSON matches the expected structure before processing.
  • Interoperability: Share schemas across services, clients, and pipelines as a single source of truth.
  • Tooling ecosystem: Works with validators (e.g., AJV), documentation generators, code generators, and form builders.
  • Draft 2020-12: A widely adopted version of the JSON Schema specification with improved features and semantics.

Learn more at the official site: json-schema.org.

CLI-like local test

Run the built-in example:

npm run example

Development

Install dev deps and build:

npm i -D typescript
npm run build

Run a quick test (after build):

npm test

Publish

  1. Update package.json fields (name, description, author, keywords).
  2. Login: npm login
  3. Publish: npm publish --access public