@didrod2539/datalint

v0.1.0

Lint CSV/TSV data files locally for quality issues: ragged rows, type drift, missing values, duplicates, mixed date formats, numeric outliers, and optional schema violations. Column profiling, JSON/Markdown reports, no dependencies on a data lib, no API k

📊 datalint

Lint your CSVs before they break your pipeline — locally, no Python, no API key.

A deterministic CLI that profiles every column of a CSV/TSV file and lints it for data-quality problems — ragged rows, type drift, missing values, duplicates, mixed date formats, numeric outliers, and optional schema violations — with a quality score, A–F grade and JSON/Markdown reports.


One-line summary

datalint reads your CSV/TSV files, infers each column's type, profiles the data, and reports every quality issue that would trip up an import or analysis — 100% locally, no API key, no server, and no dependency on a data library (the CSV parser is hand-rolled).

Why this project exists

CSV is the universal data format, and it's almost always messy. A file that "looks fine" in a spreadsheet hides:

  • Ragged rows — an unescaped comma silently shifts every column after it.
  • Type drift — a number column with a stray N/A, , or 1.2.3.
  • Mixed date formats — 2024-01-05 next to 01/06/2024 (which is which?).
  • Missing values, duplicates, stray whitespace, inconsistent casing (US vs us), and outliers that are really data-entry errors.

Eyeballing this doesn't scale, and feeding a 50k-row file to an LLM gets you a confident-but-wrong summary. You want a deterministic, repeatable audit you can run on every export and gate in CI. That's datalint.
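To make the ragged-row failure concrete, here is a minimal TypeScript sketch. It is not datalint's implementation: `findRaggedRows` is a hypothetical helper that splits naively on commas (no quote handling), purely to show how one unescaped comma changes a row's field count.

```typescript
// Hypothetical helper, not part of datalint: naive comma split, no quote handling.
export function findRaggedRows(csv: string): number[] {
  const lines = csv.split(/\r?\n/).filter((line) => line.length > 0);
  if (lines.length === 0) return [];
  const expected = lines[0].split(",").length; // header column count
  const ragged: number[] = [];
  for (let i = 1; i < lines.length; i++) {
    // Record the 1-based line number of every row whose field count differs.
    if (lines[i].split(",").length !== expected) ragged.push(i + 1);
  }
  return ragged;
}

// "Portland, OR" was exported without quotes, so line 3 has four fields.
const sample = ["id,city,amount", "1,Austin,10", "2,Portland, OR,20"].join("\n");
console.log(findRaggedRows(sample)); // line 3 is reported
```

A real check needs a quote-aware parser first, otherwise a correctly quoted "Portland, OR" would be flagged as well.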

Key features

  • 🧱 Dependency-free CSV/TSV parser — RFC 4180 quotes, embedded newlines, escaped quotes, CRLF/LF, plus automatic delimiter detection.
  • 🔎 Column profiling — inferred type, empty rate, distinct count, min/max/mean, and top values for every column.
  • 🚦 12 built-in checks — ragged rows, duplicate/empty headers, empty columns/rows, missing values, type drift, whitespace, mixed date formats, inconsistent casing, duplicate rows, and numeric outliers (Tukey/IQR).
  • 📐 Optional schema — required, type, enum, min/max, regex pattern, unique, not-null constraints per column.
  • 📊 Quality score + A–F grade, per file and overall.
  • 📄 JSON & Markdown export, colored console output, CI gate exit codes.
  • ⚙️ Config file, custom delimiter, headerless mode, per-rule severities.
  • 🔒 Runs entirely offline. Nothing is uploaded.
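As a rough illustration of the column profiling listed above, the following sketch computes an empty rate, distinct count, min/max/mean, and top values for one column. The `ColumnProfile` shape and the `profileColumn` function are assumptions made for this example; datalint's real profile type and numeric rules may differ.

```typescript
// Hypothetical shape; datalint's real profile type may differ.
interface ColumnProfile {
  emptyRate: number;
  distinct: number;
  min?: number;
  max?: number;
  mean?: number;
  topValues: Array<[string, number]>;
}

export function profileColumn(cells: string[], topN = 3): ColumnProfile {
  const counts = new Map<string, number>();
  const numbers: number[] = [];
  let empty = 0;
  for (const raw of cells) {
    const cell = raw.trim();
    if (cell === "") {
      empty++;
      continue;
    }
    counts.set(cell, (counts.get(cell) ?? 0) + 1);
    // Strict numeric pattern, so "1.2.3" or "N/A" is not counted as a number.
    if (/^-?\d+(\.\d+)?$/.test(cell)) numbers.push(Number(cell));
  }
  // Most frequent values first; ties keep first-seen order (Array.sort is stable).
  const topValues = [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, topN);
  const profile: ColumnProfile = {
    emptyRate: cells.length === 0 ? 0 : empty / cells.length,
    distinct: counts.size,
    topValues,
  };
  if (numbers.length > 0) {
    profile.min = Math.min(...numbers);
    profile.max = Math.max(...numbers);
    profile.mean = numbers.reduce((sum, n) => sum + n, 0) / numbers.length;
  }
  return profile;
}
```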

Install

# run without installing
npx @didrod2539/datalint scan data.csv

# or install
npm install -g @didrod2539/datalint    # global CLI (provides `datalint`)
npm install -D @didrod2539/datalint    # project dev-dependency (for CI)

Node ≥ 18. ESM + CJS + TypeScript types.

Quick start

datalint scan data.csv
data.csv  42/100 (F)  12 rows × 8 cols · comma
  • id integer · 11 distinct
  • email email · 12 distinct
  • country string · 5 distinct
  • signup_date date · 11 distinct
  • amount decimal · 10 distinct
  • note string · 3 distinct 75% empty
  ✗ 1 row(s) have a different column count than the header (8)
  ✗ Duplicate header "email" (columns 3 and 4)
  ⚠ Column "note" is 75.0% empty (9/12)
  ⚠ Column "amount" looks decimal but 1 value(s) don't match
  ⚠ Column "signup_date" mixes 2 date formats
  ⚠ 1 duplicate row(s)
  ℹ Column "country" has 1 value(s) that differ only by case

Overall  42/100 (F)  1 file(s), 12 row(s), 2 error(s), 4 warning(s), 1 info

CLI usage

datalint scan [...targets]    # analyze CSV/TSV files or directories
datalint report <input.json>  # re-render a saved JSON report as Markdown
datalint init                 # scaffold datalint.config.json (with a schema)
datalint --help
datalint --version

scan options:

| Option | Description |
| --- | --- |
| --config <file> | Path to a config file (otherwise auto-detected) |
| --delimiter <char> | , \t ; \| or auto (default) |
| --no-header | Treat the first row as data (synthesize column names) |
| --json <file> | Write a JSON report |
| --md <file> | Write a Markdown report |
| --min-score <n> | Exit non-zero if the overall score < n (CI gate) |
| --quiet | Hide info-level issues in the console |

Point scan at a directory and it finds every *.csv, *.tsv, *.txt recursively.

Example result

Full reports for the bundled sample files are in examples/sample-report.md and examples/sample-report.json.

Configuration

Create datalint.config.json (or run datalint init):

{
  "delimiter": "auto",
  "hasHeader": true,
  "maxEmptyRate": 0.1,
  "enumThreshold": 20,
  "outlierIqrFactor": 1.5,
  "minScore": 80,
  "disableRules": [],
  "ruleSeverity": { "inconsistent-case": "warning" },
  "schema": [
    { "name": "id", "type": "integer", "required": true, "unique": true },
    { "name": "email", "type": "email", "notNull": true },
    { "name": "amount", "type": "decimal", "min": 0, "max": 100000 },
    { "name": "country", "enum": ["US", "CA", "UK"] }
  ]
}

| Field | Meaning |
| --- | --- |
| delimiter | "auto" or a literal delimiter |
| hasHeader | Whether row 1 is a header |
| maxEmptyRate | Warn columns above this empty rate (0–1) |
| enumThreshold | Max distinct values for casing checks to apply |
| outlierIqrFactor | Tukey IQR multiplier (1.5 default; 0 disables outliers) |
| minScore | CI gate threshold (overridable with --min-score) |
| disableRules | Rule ids to turn off |
| ruleSeverity | Override severity per rule id |
| schema | Optional per-column constraints |
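To give a feel for what per-column schema constraints could check, here is a small sketch. The field names (`notNull`, `unique`, `enum`, `min`, `max`) mirror the config example, but `checkSchema` and the exact semantics (for instance, skipping the remaining checks on empty cells) are assumptions for illustration, not datalint's actual behavior.

```typescript
// Field names mirror the config example; the semantics here are assumptions.
interface ColumnRule {
  name: string;
  notNull?: boolean;
  unique?: boolean;
  enum?: string[];
  min?: number;
  max?: number;
}

type Row = Record<string, string>;

export function checkSchema(rows: Row[], schema: ColumnRule[]): string[] {
  const problems: string[] = [];
  for (const rule of schema) {
    const seen = new Set<string>();
    rows.forEach((row, index) => {
      const value = (row[rule.name] ?? "").trim();
      const where = `row ${index + 1}, column "${rule.name}"`;
      if (value === "") {
        if (rule.notNull) problems.push(`${where}: empty value`);
        return; // remaining checks only apply to non-empty cells
      }
      if (rule.enum && !rule.enum.includes(value)) {
        problems.push(`${where}: "${value}" is not in the enum`);
      }
      if (rule.min !== undefined || rule.max !== undefined) {
        const n = Number(value);
        if (Number.isNaN(n)) problems.push(`${where}: "${value}" is not a number`);
        else if (rule.min !== undefined && n < rule.min) problems.push(`${where}: below min ${rule.min}`);
        else if (rule.max !== undefined && n > rule.max) problems.push(`${where}: above max ${rule.max}`);
      }
      if (rule.unique) {
        if (seen.has(value)) problems.push(`${where}: duplicate value "${value}"`);
        seen.add(value);
      }
    });
  }
  return problems;
}
```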

Rule ids: ragged-rows, duplicate-headers, empty-column, empty-row, missing-values, type-drift, whitespace, mixed-date-formats, inconsistent-case, duplicate-rows, outliers, and schema-*.
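For readers unfamiliar with the Tukey/IQR method behind the outliers rule and `outlierIqrFactor`, a minimal sketch follows. The quantile convention (linear interpolation) and the minimum sample size of four are assumptions; datalint may compute quartiles differently. Treating a factor of 0 as "disabled" follows the config table.

```typescript
// Quantile by linear interpolation between closest ranks (one common convention).
function quantile(sorted: number[], q: number): number {
  const pos = (sorted.length - 1) * q;
  const lower = Math.floor(pos);
  const upper = Math.ceil(pos);
  return sorted[lower] + (sorted[upper] - sorted[lower]) * (pos - lower);
}

export function tukeyOutliers(values: number[], factor = 1.5): number[] {
  // A factor of 0 disables the check; tiny samples are skipped (an assumption).
  if (factor <= 0 || values.length < 4) return [];
  const sorted = [...values].sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;
  const low = q1 - factor * iqr; // lower fence
  const high = q3 + factor * iqr; // upper fence
  return values.filter((v) => v < low || v > high);
}

// Q1 = 11, Q3 = 12.5, IQR = 1.5, so the fences are 8.75 and 14.75.
console.log(tukeyOutliers([10, 12, 11, 13, 12, 11, 500])); // only 500 is outside the fences
```

Raising the factor widens the fences, which is the usual way to quiet the rule on legitimately skewed columns.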

Real-world use cases

  1. Gate a data pipeline in CI. Add datalint scan ./exports --min-score 85 to your workflow. A nightly export that arrives with shifted columns or a broken date format fails the build instead of corrupting downstream tables.
  2. Vet a file before import. Before loading a vendor/marketing CSV into your warehouse, run datalint scan leads.csv --md audit.md and fix what it finds.
  3. Profile an unfamiliar dataset. Run datalint scan dataset.csv to instantly see each column's type, null rate, distinct count and ranges — a fast EDA pass without spinning up a notebook.

Programmatic API

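import { readFile, writeFile } from "node:fs/promises";
import { analyze, buildReport, toMarkdown } from "@didrod2539/datalint";

// `content` is assumed to be the file's raw text.
const content = await readFile("data.csv", "utf8");

const ds = analyze({ source: "data.csv", content });
console.log(ds.score, ds.grade, ds.profiles, ds.issues);

const report = buildReport([ds], { version: "0.1.0" });
await writeFile("report.md", toMarkdown(report));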

Roadmap

  • Excel (.xlsx) and Parquet input.
  • Cross-file referential checks (foreign keys across CSVs).
  • A --fix mode to auto-trim whitespace and normalize obvious issues.
  • An HTML report with charts.
  • A GitHub Action that comments data-quality on PRs.
  • Streaming mode for very large files.

FAQ

Does it send my data anywhere? No. datalint runs entirely on your machine — no API key, no telemetry, no uploads, no network calls.

Do I need to define a schema? No. datalint is useful with zero config — it infers column types and catches drift, duplicates, missing values, etc. A schema is optional for stricter checks.

How does it parse CSV? With a small, hand-rolled RFC 4180 parser (no external CSV library) that handles quoted fields, embedded delimiters/newlines, escaped quotes and CRLF/LF — so behavior is fully predictable. Delimiter is auto-detected or set via config.
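For a sense of what a small hand-rolled RFC 4180 parser involves, here is an illustrative sketch. It is not datalint's parser: `parseCsv` is a hypothetical function written for this example, and it omits delimiter auto-detection and error reporting.

```typescript
// Illustrative state machine: quoted fields, embedded delimiters/newlines,
// doubled quotes ("") as an escaped quote, and both CRLF and LF line breaks.
export function parseCsv(text: string, delimiter = ","): string[][] {
  const rows: string[][] = [];
  let row: string[] = [];
  let field = "";
  let inQuotes = false;
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"') {
        if (text[i + 1] === '"') {
          field += '"'; // escaped quote inside a quoted field
          i += 2;
          continue;
        }
        inQuotes = false; // closing quote
      } else {
        field += ch; // delimiters and newlines are literal inside quotes
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === delimiter) {
      row.push(field);
      field = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text[i + 1] === "\n") i++; // treat CRLF as one line break
      row.push(field);
      field = "";
      rows.push(row);
      row = [];
    } else {
      field += ch;
    }
    i++;
  }
  // Flush the last record when the input does not end with a line break.
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}
```

Once rows are parsed this way, a ragged-row check reduces to comparing each row's length with the header's.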

How are dates / types detected? By deterministic pattern matching (src/infer.ts). Type inference is conservative; ambiguous cells fall back to string. The date check recognizes common ISO and slash/dot formats and flags a column that mixes more than one.
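The sketch below shows one way such deterministic pattern matching could look. The regular expressions, the `CellType` names, and the helper functions are all assumptions for illustration; the actual rules live in src/infer.ts and may recognize more formats.

```typescript
type CellType = "empty" | "integer" | "decimal" | "date" | "string";

// Hypothetical patterns: one ISO, one slash, and one dot date format.
const DATE_FORMATS: Array<[string, RegExp]> = [
  ["iso", /^\d{4}-\d{2}-\d{2}$/],
  ["slash", /^\d{1,2}\/\d{1,2}\/\d{4}$/],
  ["dot", /^\d{1,2}\.\d{1,2}\.\d{4}$/],
];

export function dateFormatOf(cell: string): string | undefined {
  return DATE_FORMATS.find(([, pattern]) => pattern.test(cell))?.[0];
}

export function inferCell(raw: string): CellType {
  const cell = raw.trim();
  if (cell === "") return "empty";
  if (/^-?\d+$/.test(cell)) return "integer";
  if (/^-?\d+\.\d+$/.test(cell)) return "decimal";
  if (dateFormatOf(cell)) return "date";
  return "string"; // conservative fallback for anything ambiguous
}

// Returns the distinct date formats seen in a column; more than one means "mixed".
export function dateFormatsUsed(cells: string[]): string[] {
  const formats = new Set<string>();
  for (const cell of cells) {
    const format = dateFormatOf(cell.trim());
    if (format) formats.add(format);
  }
  return [...formats];
}
```

With these patterns, a value like 1.2.3 or N/A falls back to string, which is what lets a type-drift check spot it inside an otherwise numeric column.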

Is the quality score official? No — it's a transparent metric: each issue costs a base penalty plus an amount scaled by how much of the data it affects, weighted by severity (src/score.ts). Use it to track and gate quality.
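As a rough picture of a score built from "a base penalty plus an amount scaled by how much of the data is affected, weighted by severity", consider the sketch below. Every constant, the grade bands, and the `Issue` shape are illustrative assumptions; the real values are in src/score.ts.

```typescript
type Severity = "error" | "warning" | "info";

interface Issue {
  severity: Severity;
  affectedRatio: number; // share of rows or cells affected, from 0 to 1
}

// All constants below are illustrative assumptions, not datalint's real values.
const SEVERITY_WEIGHT: Record<Severity, number> = { error: 1, warning: 0.5, info: 0.1 };
const BASE_PENALTY = 5;
const SCALED_PENALTY = 20;

export function qualityScore(issues: Issue[]): number {
  let penalty = 0;
  for (const issue of issues) {
    const ratio = Math.min(1, Math.max(0, issue.affectedRatio)); // clamp to [0, 1]
    penalty += SEVERITY_WEIGHT[issue.severity] * (BASE_PENALTY + SCALED_PENALTY * ratio);
  }
  return Math.max(0, Math.round(100 - penalty));
}

// Hypothetical grade bands; datalint's actual cut-offs may differ.
export function grade(score: number): "A" | "B" | "C" | "D" | "F" {
  if (score >= 90) return "A";
  if (score >= 80) return "B";
  if (score >= 70) return "C";
  if (score >= 60) return "D";
  return "F";
}
```

Under these assumed numbers, a single error affecting half the rows costs 1 × (5 + 20 × 0.5) = 15 points, giving a score of 85.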

My valid data is being flagged — how do I silence it? Use disableRules, ruleSeverity, maxEmptyRate, or outlierIqrFactor in the config. Every heuristic is tunable.

Contributing

Contributions welcome! Each check is a small, self-contained rule in src/rules/. See CONTRIBUTING.md and the Code of Conduct.

git clone https://github.com/didrod205/datalint.git
cd datalint
npm install
npm test
npm run build
node dist/cli.js scan examples/messy.csv

License

MIT © datalint contributors

💖 Sponsor

datalint is free, MIT-licensed, and built in spare time. If it caught a bad export before it hit production, please consider supporting it.

Where your support goes: Excel/Parquet input, cross-file referential checks, a --fix autoclean mode, an HTML report, a PR-commenting GitHub Action, and fast issue responses.