datasink

v0.1.0

Data hygiene for music PR — scrub, rinse, soak your contact lists

     ___ (_)__  / /__
    (_-</ / _ \/  '_/
   /___/_/_//_/_/\_\

Data hygiene for music PR. Scrub, rinse, and soak your contact lists.

Demo uses fictional contacts for illustration.


Quick Start

npx datasink scrub contacts.csv          # validate emails
npx datasink rinse contacts.csv          # deduplicate
npx datasink wash contacts.csv           # full pipeline

Or install globally:

npm install -g datasink
sink scrub contacts.csv

Commands

| Command | Description |
| --------------------- | ------------------------------------- |
| sink | Interactive menu (no args) |
| sink wash <file> | Full pipeline: scrub + rinse + soak |
| sink scrub <file> | Validate & clean emails |
| sink rinse <file> | Deduplicate contacts |
| sink soak <file> | Enrich contacts with AI |
| sink spot <email> | Spot-check a single email (with SMTP) |
| sink inspect <file> | Data quality score |
| sink drain <file> | Convert between formats |
| sink tui <file> | Full TUI dashboard |

Why sink?

  • Built for music PR. Knows BBC Radio 1 from Radio X, catches bbc.com → bbc.co.uk typos, flags role-based emails like press@. Not a generic email validator -- it understands your industry.
  • Zero config. Point it at a CSV and go. Flexible header matching means it works with whatever your spreadsheet exports. No mapping files, no setup wizard.
  • Three phases, one metaphor. Scrub cleans. Rinse deduplicates. Soak enriches. Run them individually or all at once with wash. Like doing the washing up, but for data.

Phases

Scrub

Validates and cleans email addresses:

  • RFC 5322 format validation
  • UK domain typo correction (bbc.com → bbc.co.uk, gmial.com → gmail.com)
  • Disposable domain detection
  • MX record verification
  • Role-based email flagging (press@, info@)
  • Catch-all domain detection
  • Optional SMTP verification (--smtp)
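
Two of the checks above, typo correction and role-based flagging, can be sketched in a few lines. This is an illustration only and not datasink's actual implementation: `scrubEmail`, `TYPO_MAP`, `ROLE_LOCAL_PARTS`, and the `ScrubResult` shape are hypothetical names, and the map entries are just the examples listed above.

```typescript
// Hypothetical typo map; the two entries are the examples from the list above.
const TYPO_MAP = new Map<string, string>([
  ['bbc.com', 'bbc.co.uk'],
  ['gmial.com', 'gmail.com'],
]);

// Hypothetical set of role-based local parts (press@, info@).
const ROLE_LOCAL_PARTS = new Set<string>(['press', 'info']);

interface ScrubResult {
  email: string;      // trimmed, lowercased, possibly corrected address
  corrected: boolean; // true when the domain was rewritten via TYPO_MAP
  roleBased: boolean; // true for addresses such as press@ or info@
}

// Returns null when the input has no usable local part or domain.
// This is a structural check only, not full RFC 5322 validation.
function scrubEmail(input: string): ScrubResult | null {
  const email = input.trim().toLowerCase();
  const at = email.lastIndexOf('@');
  if (at <= 0 || at === email.length - 1) return null;

  const local = email.slice(0, at);
  const domain = email.slice(at + 1);
  const fixedDomain = TYPO_MAP.get(domain) ?? domain;

  return {
    email: `${local}@${fixedDomain}`,
    corrected: fixedDomain !== domain,
    roleBased: ROLE_LOCAL_PARTS.has(local),
  };
}

console.log(scrubEmail(' Press@BBC.com '));
```

MX lookups, disposable-domain detection, catch-all detection, and SMTP verification all need network access or external data, so they are left out of the sketch.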

Rinse

Deduplicates and resolves identities:

  • Exact email -- case-insensitive dedup, keeps the richer record
  • Fuzzy name -- Jaro-Winkler similarity within same domain (threshold: 0.92)
  • Cross-field -- matches by phone or website across different emails
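
To make the fuzzy-name strategy concrete, here is a minimal sketch of textbook Jaro-Winkler similarity compared against the 0.92 threshold. It is not taken from datasink's source: `jaro`, `jaroWinkler`, and `isLikelyDuplicate` are hypothetical helpers, the prefix scale of 0.1 and four-character prefix cap are the conventional defaults, and lowercasing names before comparison is an assumption.

```typescript
// Jaro similarity: counts characters that match within a sliding window
// and penalises matched characters that appear in a different order.
function jaro(a: string, b: string): number {
  if (a === b) return 1;
  if (a.length === 0 || b.length === 0) return 0;

  const window = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  const aMatched: boolean[] = new Array(a.length).fill(false);
  const bMatched: boolean[] = new Array(b.length).fill(false);
  let matches = 0;

  for (let i = 0; i < a.length; i++) {
    const lo = Math.max(0, i - window);
    const hi = Math.min(b.length - 1, i + window);
    for (let j = lo; j <= hi; j++) {
      if (!bMatched[j] && a[i] === b[j]) {
        aMatched[i] = true;
        bMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;

  // Walk both strings' matched characters in order; each mismatch is half a transposition.
  let halfTranspositions = 0;
  let k = 0;
  for (let i = 0; i < a.length; i++) {
    if (!aMatched[i]) continue;
    while (!bMatched[k]) k++;
    if (a[i] !== b[k]) halfTranspositions++;
    k++;
  }
  const t = halfTranspositions / 2;
  return (matches / a.length + matches / b.length + (matches - t) / matches) / 3;
}

// Winkler adjustment: boosts the score for a shared prefix of up to four characters.
function jaroWinkler(a: string, b: string, prefixScale = 0.1): number {
  const j = jaro(a, b);
  const maxPrefix = Math.min(4, a.length, b.length);
  let prefix = 0;
  while (prefix < maxPrefix && a[prefix] === b[prefix]) prefix++;
  return j + prefix * prefixScale * (1 - j);
}

// Assumption: names are trimmed and lowercased before comparison.
function isLikelyDuplicate(nameA: string, nameB: string, threshold = 0.92): boolean {
  return jaroWinkler(nameA.trim().toLowerCase(), nameB.trim().toLowerCase()) >= threshold;
}

console.log(jaroWinkler('MARTHA', 'MARHTA').toFixed(3)); // 0.961
console.log(isLikelyDuplicate('Sarah Jones', 'Sara Jones')); // true
```

The sketch compares names only. Restricting comparisons to contacts on the same domain, as the strategy above describes, would be a grouping step before any pair is scored.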

Soak

Enriches contacts with AI:

  • Platform type detection (radio, press, playlist, blog, podcast)
  • Genre identification
  • Geographic scope
  • Submission guidelines
  • Pitch tips

Supports Anthropic (Claude Haiku) and OpenAI (GPT-4o-mini).

Global Flags

-o, --output <path>       Output file path
--format <csv|json|jsonl>  Output format (default: csv)
--config <path>            Config file path
--dry-run                  Preview without writing files
--verbose                  Detailed output
-q, --quiet                Suppress all output except errors
--json                     JSON stdout (for piping)
--no-colour                Disable colours
--smtp                     Enable SMTP verification (scrub phase)
--provider <name>          Enrichment provider (anthropic|openai)

Exit Codes

| Code | Meaning |
| ---- | --------------------------------------------------------- |
| 0 | Success |
| 1 | File error (not found, permission denied, is a directory) |
| 2 | Parse error (invalid CSV, no usable data) |
| 3 | Config error (invalid config file) |
| 4 | Pipeline error (enrichment failure, unexpected crash) |
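
When sink is called from a script, the exit codes above can be turned back into readable messages. The helper below is a hypothetical convenience, not something datasink exports. It mirrors the table as a lookup and falls back to a generic message for any other code.

```typescript
// Exit codes and meanings copied from the table above.
const EXIT_CODE_MEANINGS = new Map<number, string>([
  [0, 'Success'],
  [1, 'File error (not found, permission denied, is a directory)'],
  [2, 'Parse error (invalid CSV, no usable data)'],
  [3, 'Config error (invalid config file)'],
  [4, 'Pipeline error (enrichment failure, unexpected crash)'],
]);

// Hypothetical helper: translate a process exit status into a message.
function describeExitCode(code: number): string {
  return EXIT_CODE_MEANINGS.get(code) ?? `Unknown exit code ${code}`;
}

console.log(describeExitCode(2));
```

In a Node script, the `status` field returned by `child_process.spawnSync` could be passed to `describeExitCode` after a null check.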

Provider Setup

Anthropic

export ANTHROPIC_API_KEY=sk-ant-...
sink soak contacts.csv --provider anthropic

OpenAI

export OPENAI_API_KEY=sk-...
sink soak contacts.csv --provider openai

Input Format

Accepts CSV files with flexible column names:

| Field | Accepted Headers |
| ------- | -------------------------------------------- |
| Name | name, contact, full name, person |
| Email | email, e mail, email address |
| Outlet | outlet, publication, media, company, station |
| Role | role, title, position, job title |
| Phone | phone, telephone, mobile |
| Website | website, url, web |
| Notes | notes, comments, description |
| Tags | tags, categories, labels |

First/last name columns are automatically joined. Unmapped columns are preserved in extras.
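
One way to picture the flexible header matching is as a normalise-then-look-up step. The sketch below is an assumption about how such matching could work, not datasink's actual code. The alias lists come from the table above, while `normaliseHeader`, `mapHeaders`, and the rule that spaces, underscores, and hyphens are treated alike are hypothetical.

```typescript
type Field = 'name' | 'email' | 'outlet' | 'role' | 'phone' | 'website' | 'notes' | 'tags';

// Alias lists copied from the table above.
const ACCEPTED_HEADERS: Record<Field, string[]> = {
  name: ['name', 'contact', 'full name', 'person'],
  email: ['email', 'e mail', 'email address'],
  outlet: ['outlet', 'publication', 'media', 'company', 'station'],
  role: ['role', 'title', 'position', 'job title'],
  phone: ['phone', 'telephone', 'mobile'],
  website: ['website', 'url', 'web'],
  notes: ['notes', 'comments', 'description'],
  tags: ['tags', 'categories', 'labels'],
};

// Reverse index: normalised alias -> field.
const LOOKUP = new Map<string, Field>();
for (const [field, aliases] of Object.entries(ACCEPTED_HEADERS) as [Field, string[]][]) {
  for (const alias of aliases) LOOKUP.set(alias, field);
}

// Assumed normalisation: trim, lowercase, and collapse runs of
// whitespace, underscores, and hyphens into a single space.
function normaliseHeader(header: string): string {
  return header.trim().toLowerCase().replace(/[\s_-]+/g, ' ');
}

// Splits CSV headers into recognised fields and unmapped extras.
function mapHeaders(headers: string[]): { mapped: Map<string, Field>; extras: string[] } {
  const mapped = new Map<string, Field>();
  const extras: string[] = [];
  for (const header of headers) {
    const field = LOOKUP.get(normaliseHeader(header));
    if (field) mapped.set(header, field);
    else extras.push(header);
  }
  return { mapped, extras };
}

const { mapped, extras } = mapHeaders(['Full Name', 'E-mail', 'Station', 'Genre']);
console.log(Object.fromEntries(mapped), extras);
```

Joining separate first and last name columns is not covered by this sketch.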

Configuration

Create a sink.config.ts in your project root:

export default {
  scrub: {
    smtp: false,
    mxCacheTTL: 1800,
    smtpTimeout: 10,
    typoMap: './data/custom-typos.json',
  },
  rinse: {
    fuzzyThreshold: 0.92,
    strategies: ['exact-email', 'fuzzy-name', 'cross-field'],
  },
  soak: {
    provider: 'anthropic',
    anthropic: {
      model: 'claude-haiku-4-5-20251001',
      apiKey: process.env.ANTHROPIC_API_KEY,
    },
  },
  output: {
    format: 'csv',
    locale: 'en-GB',
  },
}

Programmatic API

import { runPipeline, loadConfig } from 'datasink'

const config = await loadConfig()
const records = [
  {
    id: '1',
    raw: { name: 'Sarah Jones', email: '[email protected]', outlet: 'BBC Radio 1' },
    phases: [],
    timestamp: new Date().toISOString(),
  },
]

const { records: processed, stats } = await runPipeline(records, {
  phases: ['scrub', 'rinse'],
  config,
})

console.log(stats)

Contributing

See CONTRIBUTING.md for dev setup, code style, and PR guidelines.

Changelog

See CHANGELOG.md for release history.

Licence

MIT