# smart-csv-delimiter
🎯 Intelligent CSV delimiter auto-detection for Node.js - lightweight, fast, and reliable.
## ✨ Features
- 🚀 Fast & Lightweight - Zero dependencies, pure Node.js
- 🎯 Smart Detection - Analyzes consistency across multiple lines
- 📊 Confidence Scores - Get detailed detection results with confidence metrics
- 🔧 Highly Configurable - Custom delimiters, sample size, and more
- 💪 TypeScript First - Full type definitions included
- ✅ Well Tested - Comprehensive test suite with 80%+ coverage
## 📦 Installation

```bash
npm install smart-csv-delimiter
```

## 🚀 Quick Start
```ts
import { detectDelimiter } from 'smart-csv-delimiter';

const delimiter = await detectDelimiter('./data.csv');
console.log(`Detected delimiter: ${delimiter}`); // ',' or ';' or '|' or '\t'
```

## 📖 Usage
### Basic Detection
```ts
import { detectDelimiter } from 'smart-csv-delimiter';

// Simple detection
const delimiter = await detectDelimiter('./file.csv');
// Returns: ',' | ';' | '|' | '\t' | null
```

### Detailed Detection with Confidence
```ts
import { detectDelimiterWithDetails } from 'smart-csv-delimiter';

const result = await detectDelimiterWithDetails('./file.csv');
console.log(result);
// {
//   delimiter: ',',
//   confidence: 0.95,
//   occurrences: 150,
//   allScores: {
//     ',': 15.2,
//     ';': 0,
//     '|': 0,
//     '\t': 0
//   }
// }
```

### Advanced Configuration
```ts
import { CsvDelimiterDetector } from 'smart-csv-delimiter';

const detector = new CsvDelimiterDetector({
  sampleSize: 20,               // Analyze first 20 lines (default: 10)
  encoding: 'latin1',           // File encoding (default: 'utf-8')
  customDelimiters: ['#', '@'], // Additional delimiters to test
  customOnly: false,            // If true, only test custom delimiters
});

const delimiter = await detector.detect('./file.csv');
```

## 🎓 API Reference
### `detectDelimiter(filePath, options?)`
Detects the CSV delimiter and returns the most likely one.
Parameters:

- `filePath` (string): Path to the CSV file
- `options` (`DetectionOptions`, optional): Configuration options

Returns: `Promise<string | null>` - The detected delimiter or `null`
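Because the result can be `null`, callers will usually want a fallback. A minimal sketch (the comma fallback here is just an example, not a package default):

```ts
import { detectDelimiter } from 'smart-csv-delimiter';

// Fall back to a comma when no delimiter can be determined (illustrative choice).
const delimiter = (await detectDelimiter('./unknown-format.csv')) ?? ',';
```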
### `detectDelimiterWithDetails(filePath, options?)`
Detects the delimiter and returns detailed analysis.
Parameters:

- `filePath` (string): Path to the CSV file
- `options` (`DetectionOptions`, optional): Configuration options

Returns: `Promise<DetectionResult>`
```ts
interface DetectionResult {
  delimiter: string | null;          // The detected delimiter
  confidence: number;                // Confidence score (0-1)
  occurrences: number;               // Total occurrences found
  allScores: Record<string, number>; // Scores for all tested delimiters
}
```

### CsvDelimiterDetector
Class-based API for reusable detection with custom configuration.
Constructor Options:
```ts
interface DetectionOptions {
  sampleSize?: number;         // Lines to analyze (default: 10)
  customDelimiters?: string[]; // Additional delimiters to test
  customOnly?: boolean;        // Only test custom delimiters (default: false)
  encoding?: BufferEncoding;   // File encoding (default: 'utf-8')
}
```

Methods:
- `detect(filePath: string): Promise<string | null>`
- `detectWithDetails(filePath: string): Promise<DetectionResult>`
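Because a detector instance carries its configuration, it can be reused across many files. A small sketch (file paths are illustrative):

```ts
import { CsvDelimiterDetector } from 'smart-csv-delimiter';

const detector = new CsvDelimiterDetector({ sampleSize: 20 });

// Reuse one configured instance for several files (paths are placeholders).
for (const file of ['./orders.csv', './customers.csv', './report.tsv']) {
  const result = await detector.detectWithDetails(file);
  console.log(`${file}: ${result.delimiter} (confidence ${result.confidence})`);
}
```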
## 🔍 Supported Delimiters
By default, smart-csv-delimiter detects these common delimiters:
| Delimiter | Name | Common use |
|-----------|------|------------|
| `,` | Comma | Standard CSV |
| `;` | Semicolon | European CSV |
| `\|` | Pipe | Log files |
| `\t` | Tab | TSV files |
You can add custom delimiters using the customDelimiters option.
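For example, the same `DetectionOptions` can be passed to `detectDelimiter` to test only a custom delimiter; a sketch with a hypothetical colon-separated file:

```ts
import { detectDelimiter } from 'smart-csv-delimiter';

// Test only ':' (e.g. passwd-style files); the path is illustrative.
const delimiter = await detectDelimiter('./users.txt', {
  customDelimiters: [':'],
  customOnly: true,
});
```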
## 💡 How It Works
The detector uses a smart algorithm that:
1. Samples Multiple Lines - Reads the first N lines (configurable)
2. Counts Occurrences - Counts each delimiter in every line
3. Checks Consistency - Ensures the delimiter count is consistent across lines
4. Calculates Score - Combines frequency and consistency for a confidence score
5. Returns Best Match - Selects the delimiter with the highest score
This approach is more reliable than single-line detection and handles edge cases like:
- Delimiters appearing in data values
- Inconsistent formatting
- Mixed content
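The package's internals aren't reproduced here, but a rough standalone sketch of this kind of frequency-plus-consistency scoring (not the library's actual implementation) could look like:

```ts
// Illustrative only: score each candidate by its average count per line,
// penalized by how much that count varies between lines.
function scoreDelimiters(lines: string[], candidates: string[]): Record<string, number> {
  const scores: Record<string, number> = {};
  for (const delim of candidates) {
    const counts = lines.map((line) => line.split(delim).length - 1);
    const mean = counts.reduce((a, b) => a + b, 0) / counts.length;
    const variance = counts.reduce((a, b) => a + (b - mean) ** 2, 0) / counts.length;
    // High average occurrence with low variance across lines => high score.
    scores[delim] = mean > 0 ? mean / (1 + variance) : 0;
  }
  return scores;
}
```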
## 🎯 Use Cases
- Data Import Tools - Automatically handle various CSV formats
- ETL Pipelines - Process CSV files without manual configuration
- Data Analysis - Prepare datasets from unknown sources
- File Validation - Verify CSV structure before parsing
- Multi-Format Support - Build tools that work with any CSV variant
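As a sketch of the data-import scenario, the detected delimiter can feed straight into parsing. The split-based reader below ignores quoting and is for illustration only; a real pipeline would hand the delimiter to a proper CSV parser:

```ts
import { readFile } from 'node:fs/promises';
import { detectDelimiter } from 'smart-csv-delimiter';

const path = './export.csv'; // illustrative path
const delimiter = await detectDelimiter(path);
if (delimiter === null) throw new Error(`Could not detect a delimiter in ${path}`);

// Naive row split: quoted fields containing the delimiter are not handled here.
const text = await readFile(path, 'utf-8');
const rows = text
  .split(/\r?\n/)
  .filter(Boolean)
  .map((line) => line.split(delimiter));
console.log(`Parsed ${rows.length} rows using "${delimiter}"`);
```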
## 🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## 📄 License
MIT © ReDodge
## 🔗 Links
## ⭐ Show Your Support
If this package helped you, please give it a ⭐ on GitHub!
