@bernierllc/csv-validator
v0.3.0
Atomic CSV data validation and error correction utilities for comprehensive data quality management.
Features
- Comprehensive Field Validation: Support for string, number, integer, boolean, date, email, URL, phone, and custom field types
- Business Rule Validation: Custom validation logic for complex business requirements
- Error Correction: Automatic fixing of common data format issues
- Bulk Processing: Efficient validation of large datasets with parallel processing
- Detailed Error Reporting: Rich error messages with suggestions and confidence scores
- Schema Validation: Validate CSV schemas and field definitions
- Statistics and Analytics: Get validation statistics and error analysis
Installation
npm install @bernierllc/csv-validator
Quick Start
import { CSVValidator, ValidationSchema } from '@bernierllc/csv-validator';
// Define your validation schema
const schema: ValidationSchema = {
  fields: [
    { name: 'name', type: 'string', required: true, minLength: 2, maxLength: 50 },
    { name: 'email', type: 'email', required: true },
    { name: 'age', type: 'integer', required: false },
    { name: 'active', type: 'boolean', required: false }
  ],
  required: [0, 1] // name and email are required
};
// Create validator
const validator = new CSVValidator(schema);
// Validate a row
const row = ['John Doe', 'john@example.com', '30', 'true'];
const result = validator.validateRow(row);
console.log(result.isValid); // true
console.log(result.errors); // []
API Reference
CSVValidator
Main validator class for single row validation.
Constructor
new CSVValidator(schema: ValidationSchema, businessRules?: BusinessRule[])
Methods
validateRow(row: string[], lineNumber?: number): ValidationResult
Validates a single row of CSV data.
const result = validator.validateRow(['John Doe', 'john@example.com', '30']);
static validateRow(row: string[], schema: ValidationSchema, businessRules?: BusinessRule[], lineNumber?: number): ValidationResult
Static convenience method for one-off validation.
const result = CSVValidator.validateRow(row, schema, businessRules);
BulkCSVValidator
Bulk validator for processing multiple rows with statistics.
Constructor
new BulkCSVValidator(schema: ValidationSchema, businessRules?: BusinessRule[])
Methods
validateRows(rows: string[][]): BulkValidationResult
Validates multiple rows and returns comprehensive results.
const bulkValidator = new BulkCSVValidator(schema);
const result = bulkValidator.validateRows([
  ['John Doe', 'john@example.com', '30'],
  ['Jane Smith', 'jane@example.com', '25'],
  ['Bob Johnson', 'invalid-email', '150']
]);
console.log(result.totalRows); // 3
console.log(result.validRows); // 2
console.log(result.invalidRows); // 1
console.log(result.totalErrors); // 1
getValidationStats(rows: string[][]): ValidationStats
Get detailed validation statistics.
const stats = bulkValidator.getValidationStats(rows);
console.log(stats.errorRate); // 0.33
console.log(stats.averageErrorsPerRow); // 0.33
getMostCommonErrors(rows: string[][]): Array<{code: string, count: number, message: string}>
Get the most common validation errors.
const commonErrors = bulkValidator.getMostCommonErrors(rows);
// [{ code: 'INVALID_EMAIL', count: 5, message: 'Field must be a valid email address' }]
validateRowsParallel(rows: string[][], batchSize?: number): Promise<BulkValidationResult>
Validate rows in parallel for large datasets.
const result = await bulkValidator.validateRowsParallel(rows, 1000);
static validateRows(rows: string[][], schema: ValidationSchema, businessRules?: BusinessRule[]): BulkValidationResult
Static convenience method for bulk validation.
const result = BulkCSVValidator.validateRows(rows, schema, businessRules);
CSVErrorFixer
Automatic error correction for common data format issues.
Constructor
new CSVErrorFixer(schema: ValidationSchema, businessRules?: BusinessRule[])
Methods
fixRow(row: string[], errors: ValidationError[]): FixedRow
Attempts to fix validation errors in a row.
const fixer = new CSVErrorFixer(schema);
const fixedRow = fixer.fixRow(row, errors);
console.log(fixedRow.hasChanges); // true
console.log(fixedRow.confidence); // 0.8
console.log(fixedRow.fixes); // Array of applied fixes
static fixRow(row: string[], errors: ValidationError[], schema: ValidationSchema, businessRules?: BusinessRule[]): FixedRow
Static convenience method for error fixing.
const fixedRow = CSVErrorFixer.fixRow(row, errors, schema);
Field Types
String
{ name: 'name', type: 'string', required: true, minLength: 2, maxLength: 50 }
Number/Float
{ name: 'price', type: 'number', required: true }
{ name: 'amount', type: 'float', required: true }
Integer
{ name: 'age', type: 'integer', required: false }
Boolean
{ name: 'active', type: 'boolean', required: false }
// Accepts: 'true', 'false', '1', '0', 'yes', 'no'
Date
{ name: 'birthDate', type: 'date', required: false }
// Accepts ISO date format: '1990-01-01'
Email
{ name: 'email', type: 'email', required: true }
URL
{ name: 'website', type: 'url', required: false }
Phone
{ name: 'phone', type: 'phone', required: false }
Custom
{ name: 'custom', type: 'custom', custom: (value) => value.startsWith('ABC') }
Business Rules
Define custom validation logic that applies to entire rows.
const businessRules: BusinessRule[] = [
  {
    name: 'email_domain_check',
    condition: (row) => {
      const email = row[1]; // email field
      // endsWith avoids accepting addresses that merely contain the domain
      return email.endsWith('@company.com');
    },
    message: 'Email must be from company.com domain',
    severity: 'warning'
  },
  {
    name: 'age_salary_validation',
    condition: (row) => {
      const age = parseInt(row[2]); // age field
      const salary = parseInt(row[3]); // salary field
      // A salary of exactly 50000 is allowed; only values above it fail
      return age < 18 ? salary <= 50000 : true;
    },
    message: 'Minors cannot have salary above $50,000',
    severity: 'error'
  }
];
Row Constraints
Define constraints that apply to entire rows.
const schema: ValidationSchema = {
  fields: [...],
  constraints: [
    {
      name: 'age_range',
      condition: (row) => {
        const age = parseInt(row[2]);
        return age >= 0 && age <= 120;
      },
      message: 'Age must be between 0 and 120',
      severity: 'error'
    }
  ]
};
Error Correction
The error fixer can automatically correct common data format issues:
- Numbers: Remove non-numeric characters
- Integers: Remove decimal parts
- Booleans: Convert various formats to standard true/false
- Dates: Convert common formats to ISO format
- Emails: Clean and normalize email addresses
- URLs: Add missing protocols
- Phone Numbers: Remove formatting characters
- Enums: Fix case sensitivity and find partial matches
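To make a few of the fixes listed above concrete, here is a minimal sketch of what boolean, URL, and phone corrections could look like. The helpers `fixBoolean`, `fixUrl`, and `fixPhone` and the `FixAttempt` type are hypothetical names invented for this illustration; they are not part of the package's API and do not claim to reproduce its internal logic. Defaulting to `https://` for a missing protocol is likewise an assumption.

```typescript
// Illustrative only: hypothetical helpers, not taken from the package source.
interface FixAttempt {
  fixed: string;    // proposed replacement value
  changed: boolean; // whether the proposal differs from the input
}

// Spellings the README lists as accepted boolean input.
const TRUE_WORDS = ['true', '1', 'yes'];
const FALSE_WORDS = ['false', '0', 'no'];

// Booleans: normalize case and whitespace, then map to 'true' / 'false'.
function fixBoolean(value: string): FixAttempt {
  const v = value.trim().toLowerCase();
  if (TRUE_WORDS.indexOf(v) !== -1) return { fixed: 'true', changed: value !== 'true' };
  if (FALSE_WORDS.indexOf(v) !== -1) return { fixed: 'false', changed: value !== 'false' };
  return { fixed: value, changed: false }; // unknown spelling: leave untouched
}

// URLs: prepend a protocol when none is present (https is an assumption).
function fixUrl(value: string): FixAttempt {
  const v = value.trim();
  if (v === '' || /^[a-z][a-z0-9+.-]*:\/\//i.test(v)) {
    return { fixed: v, changed: v !== value };
  }
  return { fixed: 'https://' + v, changed: true };
}

// Phone numbers: drop formatting characters, keeping digits and a leading '+'.
function fixPhone(value: string): FixAttempt {
  const v = value.trim();
  const plus = v.charAt(0) === '+' ? '+' : '';
  const fixed = plus + v.replace(/\D/g, '');
  return { fixed, changed: fixed !== value };
}
```

Each helper returns the input unchanged when it cannot propose a fix, which keeps the caller free to decide whether to accept a change.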
const fixer = new CSVErrorFixer(schema);
const errors = validator.validateRow(row).errors;
const fixedRow = fixer.fixRow(row, errors);
if (fixedRow.hasChanges) {
  console.log('Applied fixes:', fixedRow.fixes);
  console.log('Confidence:', fixedRow.confidence);
}
Validation Results
ValidationResult
interface ValidationResult {
  isValid: boolean;
  errors: ValidationError[];
  suggestions: Suggestion[];
  warnings: ValidationWarning[];
  totalErrors: number;
  totalWarnings: number;
  totalSuggestions: number;
}
ValidationError
interface ValidationError {
  field: string;
  index: number;
  value: string;
  message: string;
  code: string;
  severity: 'error' | 'warning';
  suggestion?: string;
  lineNumber?: number;
  columnNumber?: number;
}
Suggestion
interface Suggestion {
  field: string;
  index: number;
  originalValue: string;
  suggestedValue: string;
  confidence: number;
  reason: string;
}
Examples
Basic Validation
import { CSVValidator } from '@bernierllc/csv-validator';
const schema = {
  fields: [
    { name: 'name', type: 'string', required: true },
    { name: 'email', type: 'email', required: true },
    { name: 'age', type: 'integer', required: false }
  ]
};
const validator = new CSVValidator(schema);
const result = validator.validateRow(['John Doe', 'john@example.com', '30']);
if (!result.isValid) {
  console.log('Validation errors:', result.errors);
  console.log('Suggestions:', result.suggestions);
}
Bulk Validation with Statistics
import { BulkCSVValidator } from '@bernierllc/csv-validator';
const bulkValidator = new BulkCSVValidator(schema);
const result = bulkValidator.validateRows(rows);
console.log(`Valid rows: ${result.validRows}/${result.totalRows}`);
console.log(`Error rate: ${result.summary.errorRate}`);
console.log('Most common errors:', result.summary.mostCommonErrors);
Error Correction
import { CSVErrorFixer } from '@bernierllc/csv-validator';
const fixer = new CSVErrorFixer(schema);
const validator = new CSVValidator(schema);
const row = ['John Doe', 'john at example.com', 'abc'];
const validationResult = validator.validateRow(row);
if (!validationResult.isValid) {
  const fixedRow = fixer.fixRow(row, validationResult.errors);
  if (fixedRow.hasChanges) {
    console.log('Fixed row:', fixedRow.row);
    console.log('Applied fixes:', fixedRow.fixes);
  }
}
Custom Business Rules
const businessRules = [
  {
    name: 'senior_discount',
    condition: (row) => {
      const age = parseInt(row[2]);
      const discount = parseFloat(row[4]);
      return age >= 65 ? discount <= 0.25 : true;
    },
    message: 'Senior discount cannot exceed 25%',
    severity: 'error'
  }
];
const validator = new CSVValidator(schema, businessRules);
Performance
- Single Row Validation: ~0.1ms per row
- Bulk Validation: ~1000 rows/second
- Parallel Processing: ~5000 rows/second (with batching)
- Memory Usage: ~1MB per 10,000 rows
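The "with batching" note above refers to processing rows in chunks. As a rough sketch of the idea, rows can be split into fixed-size batches and the per-batch counts merged afterwards. The names `toBatches`, `mergeTallies`, and `BatchTally` are hypothetical and exist only for this illustration; the field names in `BatchTally` are borrowed from the bulk result shown earlier, but nothing here reflects how the package schedules or merges its own batches.

```typescript
// Illustrative only: hypothetical helpers, not part of the package's API.
interface BatchTally {
  totalRows: number;
  validRows: number;
  invalidRows: number;
  totalErrors: number;
}

// Split rows into consecutive batches of at most batchSize rows each.
function toBatches<T>(rows: T[], batchSize: number): T[][] {
  if (batchSize < 1 || Math.floor(batchSize) !== batchSize) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < rows.length; i += batchSize) {
    batches.push(rows.slice(i, i + batchSize));
  }
  return batches;
}

// Sum per-batch tallies into one overall tally.
function mergeTallies(tallies: BatchTally[]): BatchTally {
  const total: BatchTally = { totalRows: 0, validRows: 0, invalidRows: 0, totalErrors: 0 };
  for (const t of tallies) {
    total.totalRows += t.totalRows;
    total.validRows += t.validRows;
    total.invalidRows += t.invalidRows;
    total.totalErrors += t.totalErrors;
  }
  return total;
}
```

Because the tallies are plain sums, batches can be validated in any order and merged at the end; the best batch size depends on the data and is worth measuring rather than assuming.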
Error Codes
| Code | Description |
|------|-------------|
| REQUIRED_FIELD | Required field is missing or empty |
| INVALID_NUMBER | Field is not a valid number |
| INVALID_INTEGER | Field is not a valid integer |
| INVALID_BOOLEAN | Field is not a valid boolean |
| INVALID_DATE | Field is not a valid date |
| INVALID_EMAIL | Field is not a valid email address |
| INVALID_URL | Field is not a valid URL |
| INVALID_PHONE | Field is not a valid phone number |
| MIN_LENGTH | Field is shorter than minimum length |
| MAX_LENGTH | Field is longer than maximum length |
| PATTERN_MISMATCH | Field does not match required pattern |
| INVALID_ENUM | Field value is not in allowed enum values |
| BUSINESS_RULE_VIOLATION | Business rule validation failed |
| ROW_CONSTRAINT_VIOLATION | Row constraint validation failed |
| CUSTOM_VALIDATION | Custom validation function failed |
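Since every error carries one of the codes above, a list of errors can be summarized by code. The sketch below shows one way to do that in application code, producing the same `{ code, count, message }` shape that `getMostCommonErrors` is documented to return. `countErrorsByCode`, `ErrorLike`, and `ErrorCount` are hypothetical names for this illustration, and the function is not the package's implementation.

```typescript
// Illustrative only: a hypothetical helper, not the package's implementation.
// ErrorLike holds just the two ValidationError fields this sketch needs.
interface ErrorLike {
  code: string;
  message: string;
}

interface ErrorCount {
  code: string;
  count: number;
  message: string; // message of the first error seen with this code
}

// Count errors per code and return the counts, most frequent first.
function countErrorsByCode(errors: ErrorLike[]): ErrorCount[] {
  const byCode: Record<string, ErrorCount> = {};
  for (const e of errors) {
    if (byCode[e.code]) {
      byCode[e.code].count += 1;
    } else {
      byCode[e.code] = { code: e.code, count: 1, message: e.message };
    }
  }
  return Object.keys(byCode)
    .map((code) => byCode[code])
    .sort((a, b) => b.count - a.count);
}
```

Passing the `errors` arrays collected from several rows, concatenated into one list, gives a quick view of which codes dominate a file.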
License
Bernier LLC - All rights reserved.
This package is licensed to the client under a limited-use license. The client may use and modify this code only within the scope of the project it was delivered for. Redistribution or use in other products or commercial offerings is not permitted without written consent from Bernier LLC.
