csv-import-sanitizer

v1.0.0

Published

20 days ago

Secure, production-ready utility to sanitize, normalize, validate, and transform CSV import data for SaaS applications.

Downloads

121

Vulnerabilities: 0 High, 0 Medium, 0 Low

dev_salamsheikh

csv sanitize sanitizer import validate normalize csv-parser csv-import data-validation csv-injection security saas

Features

🧹 Header normalization — trim, casing, special char replacement, alias mapping
🔒 Security — CSV formula injection defense, prototype pollution prevention, control character stripping
✅ Schema validation — required fields, type coercion, enum, min/max, custom validators
🔍 Duplicate detection — configurable composite key matching
📊 Row-level error reporting — never crashes; reports every issue with row number, column, error code
📦 Dual ESM/CJS — works everywhere with full TypeScript type definitions

Installation

npm install csv-import-sanitizer

Quick Start

import { sanitizeCsv } from 'csv-import-sanitizer';

// Row 2 has an empty email and a non-numeric age; row 3 repeats row 1's email.
const csv = `Email,First Name,Age,Status
alice@example.com, Alice , 25, active
, Bob, abc, pending
alice@example.com, Alice2, 30, active`;

const result = sanitizeCsv(csv, {
  headers: {
    casing: 'lowercase',
    aliases: { email: 'email' },
  },
  schema: {
    email: { type: 'string', required: true },
    first_name: { type: 'string', required: true },
    age: { type: 'number', min: 0, max: 150 },
    status: { type: 'string', enum: ['active', 'inactive', 'pending'] },
  },
  // Rows sharing the same email are reported as duplicates.
  duplicateFields: ['email'],
});

console.log(result.validRows);
// [{ email: 'alice@example.com', first_name: 'Alice', age: 25, status: 'active' }]

console.log(result.summary);
// { totalRows: 3, validRows: 1, invalidRows: 2, duplicateRows: 1, errorsCount: 3 }

API

`sanitizeCsv(input, options?)`

The main entry point. Parses, sanitizes, validates, and deduplicates CSV data.

Parameters:

| Param | Type | Description |
|-------|------|-------------|
| input | string | Raw CSV string |
| options | SanitizeCsvOptions | Configuration (see below) |

Returns: SanitizeResult

interface SanitizeResult {
  validRows: Record<string, unknown>[];
  invalidRows: Record<string, unknown>[];
  errors: RowError[];
  summary: SummaryStats;
}
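
The shape of `RowError` is not spelled out here. Assuming each entry carries a row number, a column name, and an error code (as the feature list suggests), the `errors` array could be grouped per row for display roughly as follows. The `RowErrorSketch` interface, its field names, and the error codes are hypothetical stand-ins, not the library's actual types.

```typescript
// Hypothetical error shape: the real RowError fields may be named differently.
interface RowErrorSketch {
  row: number;
  column: string;
  code: string;
}

// Group a flat error list by row number so each row can be reported once.
function groupErrorsByRow(errors: RowErrorSketch[]): Map<number, RowErrorSketch[]> {
  const byRow = new Map<number, RowErrorSketch[]>();
  for (const error of errors) {
    const bucket = byRow.get(error.row) ?? [];
    bucket.push(error);
    byRow.set(error.row, bucket);
  }
  return byRow;
}

// Illustrative data only; these codes are made up for the example.
const sampleErrors: RowErrorSketch[] = [
  { row: 2, column: 'email', code: 'REQUIRED' },
  { row: 2, column: 'age', code: 'INVALID_NUMBER' },
  { row: 3, column: 'email', code: 'DUPLICATE' },
];

const groupedErrors = groupErrorsByRow(sampleErrors);
groupedErrors.forEach((rowErrors, row) => {
  const details = rowErrors.map((e) => `${e.column} (${e.code})`).join(', ');
  console.log(`Row ${row}: ${details}`);
});
```

Grouping by row keeps the report readable when a single row fails several checks at once.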

`normalizeHeaders(headers, options?)`

Normalize an array of raw header strings.

import { normalizeHeaders } from 'csv-import-sanitizer';

const { headers } = normalizeHeaders(
  ['  Email Address ', 'First Name', 'phone #'],
  { casing: 'lowercase', aliases: { email_address: 'email' } }
);
// headers → ['email', 'first_name', 'phone']
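
To make the normalization steps concrete, here is a minimal standalone sketch of one way such a pipeline could work: trim, apply casing, collapse special characters to `_`, then look up an alias. This is not the library's implementation; `normalizeHeaderSketch` and `HeaderSketchOptions` are hypothetical names, and the exact replacement rules are assumptions chosen to reproduce the output shown above.

```typescript
type Casing = 'lowercase' | 'uppercase' | 'none';

interface HeaderSketchOptions {
  casing?: Casing;
  aliases?: Record<string, string>;
}

// Hypothetical helper: normalizes a single raw header string.
function normalizeHeaderSketch(raw: string, options: HeaderSketchOptions = {}): string {
  const { casing = 'lowercase', aliases = {} } = options;
  let name = raw.trim();
  if (casing === 'lowercase') name = name.toLowerCase();
  if (casing === 'uppercase') name = name.toUpperCase();
  // Collapse each run of non-alphanumeric characters into one underscore,
  // then drop underscores left dangling at either end.
  name = name.replace(/[^A-Za-z0-9]+/g, '_').replace(/^_+|_+$/g, '');
  // Only own properties count as aliases, so inherited keys are never matched.
  const alias = Object.prototype.hasOwnProperty.call(aliases, name) ? aliases[name] : undefined;
  return alias ?? name;
}

const normalizedHeaders = ['  Email Address ', 'First Name', 'phone #'].map((header) =>
  normalizeHeaderSketch(header, { casing: 'lowercase', aliases: { email_address: 'email' } })
);
console.log(normalizedHeaders);
// ['email', 'first_name', 'phone']
```

In this sketch the alias lookup runs after normalization, which is why the alias key is written as `email_address` rather than the raw `Email Address`.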

`validateRows(rows, schema, rowOffset?)`

Validate row records against a schema.

import { validateRows } from 'csv-import-sanitizer';

const result = validateRows(
  [{ email: 'alice@example.com', age: '25' }],
  {
    email: { type: 'string', required: true },
    age: { type: 'number', min: 0 },
  }
);
// result.validRows[0].age === 25 (coerced)
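
The coercion step can be pictured with a small standalone sketch. It handles only `required`, `number` coercion, and `min`/`max`, and it is an illustration under assumptions rather than the library's code: `validateRowSketch`, `FieldRule`, `FieldIssue`, and the error codes are all hypothetical.

```typescript
interface FieldRule {
  type: 'string' | 'number';
  required?: boolean;
  min?: number;
  max?: number;
}

// Hypothetical issue shape with made-up error codes.
interface FieldIssue {
  row: number;
  column: string;
  code: string;
}

function validateRowSketch(
  row: Record<string, string | null>,
  schema: Record<string, FieldRule>,
  rowNumber: number
): { value: Record<string, unknown>; issues: FieldIssue[] } {
  const value: Record<string, unknown> = {};
  const issues: FieldIssue[] = [];
  for (const [column, rule] of Object.entries(schema)) {
    const raw = row[column];
    const text = raw == null ? '' : raw.trim();
    if (text === '') {
      // Empty cells become null; they are an error only when required.
      if (rule.required) issues.push({ row: rowNumber, column, code: 'REQUIRED' });
      value[column] = null;
      continue;
    }
    if (rule.type === 'number') {
      const parsed = Number(text);
      if (!Number.isFinite(parsed)) {
        issues.push({ row: rowNumber, column, code: 'INVALID_NUMBER' });
        continue;
      }
      if (rule.min !== undefined && parsed < rule.min) issues.push({ row: rowNumber, column, code: 'BELOW_MIN' });
      if (rule.max !== undefined && parsed > rule.max) issues.push({ row: rowNumber, column, code: 'ABOVE_MAX' });
      value[column] = parsed;
    } else {
      value[column] = text;
    }
  }
  return { value, issues };
}

const sketchSchema: Record<string, FieldRule> = {
  email: { type: 'string', required: true },
  age: { type: 'number', min: 0 },
};
const goodRow = validateRowSketch({ email: 'alice@example.com', age: '25' }, sketchSchema, 1);
const badRow = validateRowSketch({ email: '', age: 'abc' }, sketchSchema, 2);
console.log(goodRow.value.age, badRow.issues.map((issue) => issue.code));
```

Collecting issues instead of throwing is what allows every problem in a row to be reported in a single pass.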

`detectDuplicates(rows, keyFields)`

Detect duplicate rows based on composite key fields.

import { detectDuplicates } from 'csv-import-sanitizer';

const result = detectDuplicates(
  [
    { email: 'alice@example.com', name: 'Alice' },
    { email: 'bob@example.com', name: 'Bob' },
    { email: 'alice@example.com', name: 'Alice Copy' },
  ],
  ['email']
);
// result.duplicateIndices → [2]
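
Composite-key matching can be sketched with a `Set` of serialized key tuples. The function below is a hypothetical standalone illustration, not the library's implementation. It assumes exact, case-sensitive comparison and treats the first occurrence as the original.

```typescript
// Hypothetical helper: returns the indices of rows whose key tuple was already seen.
function findDuplicateIndicesSketch(
  rows: Record<string, unknown>[],
  keyFields: string[]
): number[] {
  const seen = new Set<string>();
  const duplicates: number[] = [];
  rows.forEach((row, index) => {
    // Serializing the tuple as JSON keeps ['a,b', 'c'] distinct from ['a', 'b,c'],
    // which a plain join(',') would not.
    const key = JSON.stringify(keyFields.map((field) => row[field] ?? null));
    if (seen.has(key)) {
      duplicates.push(index);
    } else {
      seen.add(key);
    }
  });
  return duplicates;
}

const byEmail = findDuplicateIndicesSketch(
  [
    { email: 'alice@example.com', name: 'Alice' },
    { email: 'bob@example.com', name: 'Bob' },
    { email: 'alice@example.com', name: 'Alice Copy' },
  ],
  ['email']
);
console.log(byEmail);
// [2]

// With a two-field key, rows count as duplicates only when both fields match.
const byEmailAndName = findDuplicateIndicesSketch(
  [
    { email: 'alice@example.com', name: 'Alice' },
    { email: 'alice@example.com', name: 'Alice Copy' },
  ],
  ['email', 'name']
);
console.log(byEmailAndName);
// []
```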

Options Reference

`SanitizeCsvOptions`

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| headers.casing | 'lowercase' \| 'uppercase' \| 'none' | 'lowercase' | Header casing |
| headers.aliases | Record<string, string> | {} | Alias map (raw → canonical) |
| headers.replaceSpecialChars | boolean | true | Replace special chars with _ |
| values.normalizeEmptyToNull | boolean | true | Convert empty strings to null |
| values.preserveLineBreaks | boolean | false | Keep \n, \r, \t in cells |
| values.escapeFormulas | boolean | true | Escape =, +, -, @ prefixes |
| schema | Schema | — | Validation schema |
| limits.maxRows | number | — | Max data rows |
| limits.maxColumns | number | — | Max columns |
| limits.maxFileSizeBytes | number | — | Max input size in bytes |
| allowedColumns | string[] | — | Whitelist of columns to keep |
| removeUnknownColumns | boolean | false | Strip columns not in schema |
| duplicateFields | string[] | — | Fields for duplicate detection |
| delimiter | string | auto | CSV delimiter |
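
As a rough illustration of how the dotted option names might nest, here is a sketch of an options object. The `headers` nesting matches the Quick Start; the `values` and `limits` nesting is an assumption read off the dotted names, and the numeric limits are arbitrary example values rather than recommendations.

```typescript
// Sketch of an options object; the nesting of `values` and `limits` is assumed.
const importOptions = {
  headers: {
    casing: 'lowercase' as const,
    aliases: { email_address: 'email' },
    replaceSpecialChars: true,
  },
  values: {
    normalizeEmptyToNull: true,
    preserveLineBreaks: false,
    escapeFormulas: true,
  },
  limits: {
    maxRows: 10000, // arbitrary example value
    maxColumns: 50, // arbitrary example value
    maxFileSizeBytes: 5 * 1024 * 1024, // 5 MiB, arbitrary example value
  },
  removeUnknownColumns: true,
  duplicateFields: ['email'],
  delimiter: ',',
};

console.log(importOptions.limits.maxFileSizeBytes);
```

An object like this would be passed as the second argument to `sanitizeCsv`, together with a `schema` when validation is wanted.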

`SchemaField`

| Property | Type | Description |
|----------|------|-------------|
| type | 'string' \| 'number' \| 'boolean' \| 'date' | Type coercion target |
| required | boolean | Must be non-empty |
| enum | (string \| number \| boolean)[] | Allowed values |
| minLength / maxLength | number | String length bounds |
| min / max | number | Number range bounds |
| validate | (value) => true \| string | Custom validator |
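
The `validate` signature above returns either `true` or a string. A custom validator in that style might look like the sketch below, on the assumption that a returned string is treated as the error message and that the value arrives untyped. `isPlausibleEmail` and `emailField` are hypothetical names, and the regular expression is a deliberately loose check rather than full address validation.

```typescript
// Matches the documented signature: true on success, a string on failure.
type CustomValidator = (value: unknown) => true | string;

const isPlausibleEmail: CustomValidator = (value) => {
  if (typeof value !== 'string') return 'must be a string';
  // Loose shape check: something@something.something, with no whitespace.
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(value) ? true : 'must look like an email address';
};

// A schema field combining built-in rules with the custom validator.
const emailField = {
  type: 'string' as const,
  required: true,
  maxLength: 254,
  validate: isPlausibleEmail,
};

console.log(emailField.validate('alice@example.com'));
console.log(emailField.validate('not-an-email'));
```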

Security

This library defends against common CSV import attack vectors:

Formula injection: Cells starting with =, +, -, @ are prefixed with ' to prevent spreadsheet formula execution
Prototype pollution: Headers like __proto__, constructor, prototype are detected and excluded from output
Control characters: Null bytes and non-printable characters (U+0000–U+001F, U+007F) are stripped
Size limits: Configurable row, column, and byte limits to prevent resource exhaustion
No eval: The library never uses eval(), new Function(), or any dynamic code execution
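
The first three defenses can be illustrated with a short standalone sketch. It is not the library's code: the function names are hypothetical, and it strips every character in the listed ranges, without modeling the `preserveLineBreaks` option.

```typescript
const UNSAFE_KEYS = new Set(['__proto__', 'constructor', 'prototype']);

// Remove U+0000 to U+001F and U+007F from a cell value.
function stripControlCharsSketch(value: string): string {
  return value.replace(/[\u0000-\u001F\u007F]/g, '');
}

// Prefix cells that a spreadsheet could interpret as a formula with a single quote.
function escapeFormulaSketch(value: string): string {
  return /^[=+\-@]/.test(value) ? `'${value}` : value;
}

// Build a row object while skipping headers that could pollute the prototype chain.
function buildSafeRecordSketch(headers: string[], cells: string[]): Record<string, string> {
  // A null-prototype object has no inherited keys to overwrite.
  const record: Record<string, string> = Object.create(null);
  headers.forEach((header, index) => {
    if (UNSAFE_KEYS.has(header)) return;
    // Strip first, so a leading control character cannot hide a formula prefix.
    record[header] = escapeFormulaSketch(stripControlCharsSketch(cells[index] ?? ''));
  });
  return record;
}

const safeRecord = buildSafeRecordSketch(
  ['name', '__proto__', 'note'],
  ['=SUM(A1:A2)', '{"polluted":true}', 'ok\u0000']
);
console.log(Object.keys(safeRecord), safeRecord.name, safeRecord.note);
```

The order of operations matters in this sketch: a cell such as `\u0000=cmd` would slip past the formula check if escaping ran before stripping.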

Dependencies

This package has a single runtime dependency:

papaparse — A fast, lightweight CSV parser that correctly handles RFC 4180 edge cases (quoted fields, newlines in cells, BOM, etc.). It has zero dependencies of its own and is used by thousands of packages.

License

MIT
