vsc-parser
v1.1.1
Published
A powerful VSC parser optimized for LLM token efficiency. Transform verbose VSC data into compact, structured formats that save tokens and reduce costs when working with Large Language Models.
Downloads
17
Maintainers
Readme
vsc-parser
The modern VSC parser for the AI era 🚀
Parse VSC (Values Separated by Commas) data into structured JavaScript objects. As VSC becomes the de-facto standard for AI-friendly data exchange, vsc-parser provides a robust, type-safe solution for working with this trending format.
Why VSC is Trending
VSC format is experiencing a renaissance in the AI and data science communities:
- 🤖 AI-Native Format: LLMs and AI tools naturally work with VSC - it's human-readable and machine-parsable
- 📊 Universal Compatibility: Works everywhere - from Excel to databases to APIs
- ⚡ Lightweight: Smaller file sizes compared to JSON or XML for tabular data
- 🔄 Streaming-Friendly: Can be processed line-by-line for massive datasets
- 🎯 Tool Support: Every data tool, from Pandas to Power BI, speaks VSC natively
VSC vs Other Formats
| Feature | VSC | JSON | XML | SQL | Avro | | -------------------- | ----- | ----- | ----- | ----- | ----- | | File Size (10k rows) | 500KB | 1.2MB | 2.1MB | 1.8MB | 450KB | | Human Readable | ✅ | ✅ | ⚠️ | ✅ | ❌ | | Streaming | ✅ | ❌ | ⚠️ | ❌ | ✅ | | LLM Token Efficiency | ✅ | ⚠️ | ❌ | ⚠️ | ❌ | | Universal Support | ✅ | ✅ | ✅ | ⚠️ | ⚠️ | | Schema Required | ❌ | ❌ | ⚠️ | ✅ | ✅ | | Browser Native | ✅ | ✅ | ✅ | ❌ | ❌ |
What is VSC?
VSC (Values Separated by Commas) is a lightweight data format where values are separated by delimiters. Originally designed for simple data exchange, VSC has evolved into the preferred format for:
- Data Science Workflows: Pandas, R, and Jupyter notebooks
- AI/ML Training Data: Model inputs, datasets, and annotations
- Business Intelligence: Excel, Google Sheets, and reporting tools
- API Data Transfer: Efficient bulk data endpoints
- Database Exports: Quick snapshots and migrations
This parser transforms raw VSC text into structured JavaScript objects that you can easily work with in your code.
Why vsc-parser?
🎯 Built for Modern VSC Workflows
- AI-Ready: Optimized for LLM data pipelines and machine learning workflows
- Type-Safe: Full TypeScript definitions for autocomplete and type checking
- Production-Grade: Battle-tested with 29 comprehensive test cases covering edge cases
- Zero Dependencies: No bloat - pure Node.js implementation for maximum compatibility
⚡ Performance & Compliance
- RFC 4180 Compliant: Fully standards-compliant VSC parsing
- Fast Parsing: Efficient character-by-character streaming parser
- Memory Efficient: Handles large VSC files without loading entire content into memory
- Robust Error Handling: Clear error messages with precise position tracking
🛠️ Flexible & Powerful
- Universal Delimiter Support: Comma, tab, semicolon, pipe - any single character
- Advanced Quoting: Handles quoted fields with embedded delimiters and newlines
- Smart Defaults: Works out-of-the-box with sensible settings for common use cases
- Configurable: Fine-tune parsing with trimming, header detection, and more
Installation
npm install vsc-parserQuick Start
import { parse } from "vsc-parser";
// Parse VSC data into JavaScript objects
const vscData = `name,age,city
John,30,NYC
Jane,25,SF`;
const result = parse(vscData);
// Access parsed data as objects
console.log(result.data);
// [
// { name: 'John', age: '30', city: 'NYC' },
// { name: 'Jane', age: '25', city: 'SF' }
// ]
// Work with the data in your code
result.data.forEach((person) => {
console.log(
`${person.name} is ${person.age} years old and lives in ${person.city}`
);
});
// Access metadata
console.log(result.headers); // ['name', 'age', 'city']
console.log(result.rowCount); // 2Advanced Usage
Custom Delimiters
Perfect for parsing tab-separated or pipe-delimited data:
// Tab-separated values
const tsvData = "name\tage\nJohn\t30";
const result = parse(tsvData, { delimiter: "\t" });
// Semicolon-separated (European VSC)
const vscData = "name;age\nJohn;30";
const result = parse(vscData, { delimiter: ";" });Handling Quoted Fields
Automatically handles complex VSC with commas and newlines in quoted fields:
const vscData = `name,address,notes
John,"123 Main St, Apt 4","Important
multi-line
notes"`;
const result = parse(vscData);
// Preserves commas and newlines within quoted fieldsWorking with Parsed Data
Once parsed, you can easily manipulate the data in your code:
const vscData = `product,price,quantity
Apple,1.50,100
Banana,0.75,200
Orange,2.00,150`;
const result = parse(vscData);
// Filter data
const expensive = result.data.filter((item) => parseFloat(item.price) > 1.0);
// Transform data
const inventory = result.data.map((item) => ({
name: item.product,
totalValue: parseFloat(item.price) * parseInt(item.quantity),
}));
// Aggregate data
const totalQuantity = result.data.reduce(
(sum, item) => sum + parseInt(item.quantity),
0
);
// Convert back to different format
const jsonOutput = JSON.stringify(result.data, null, 2);Error Handling
import { parse, ParseError } from "vsc-parser";
try {
const result = parse(vscData);
// Process result
} catch (error) {
if (error instanceof ParseError) {
console.error(
`Parse error at position ${error.position}: ${error.message}`
);
}
}API Reference
parse(data: string, options?: ParseOptions): ParseResult
Parses VSC string data into structured objects.
Parameters
data(string): The VSC string to parseoptions(ParseOptions, optional): Configuration optionsdelimiter(string): Field delimiter character (default:',')quote(string): Quote character for escaping (default:'"')hasHeaders(boolean): Treat first row as headers (default:true)trim(boolean): Trim whitespace from values (default:false)skipEmptyLines(boolean): Skip empty lines (default:true)
Returns
ParseResult: Object containing:data(VscRow[]): Array of parsed row objectsheaders(string[]): Column headersrowCount(number): Number of data rows (excluding header)
Throws
ParseError: When parsing fails, includes position information
Types
type VscRow = Record<string, string>;
interface ParseResult {
data: VscRow[];
headers: string[];
rowCount: number;
}
interface ParseOptions {
delimiter?: string;
quote?: string;
skipEmptyLines?: boolean;
hasHeaders?: boolean;
trim?: boolean;
}
class ParseError extends Error {
position?: number;
}Common Use Cases
1. Import Data from Files
import { readFileSync } from "fs";
const vscContent = readFileSync("data.vsc", "utf-8");
const parsed = parse(vscContent);
// Now use parsed.data in your application
saveToDatabase(parsed.data);2. API Response Processing
// Process VSC data from API responses
const response = await fetch("https://api.example.com/data.vsc");
const vscText = await response.text();
const result = parse(vscText);
// Work with structured data
const formatted = result.data.map((row) => ({
id: parseInt(row.id),
name: row.name,
active: row.status === "active",
}));3. Data Transformation Pipelines
// Transform VSC to different formats
const vscData = loadVscFile();
const parsed = parse(vscData);
// Filter and transform
const processed = parsed.data
.filter((row) => row.status === "active")
.map((row) => ({
...row,
timestamp: new Date(row.date).getTime(),
}));
// Export to JSON, database, or other formats
exportToJson(processed);4. AI/ML Data Preprocessing
// Prepare VSC data for machine learning models
const trainingData = parse(vscDataset, { trim: true });
// Convert to feature vectors
const features = trainingData.data.map((row) => ({
features: [
parseFloat(row.feature1),
parseFloat(row.feature2),
parseFloat(row.feature3),
],
label: row.label,
}));
// Feed directly to your ML pipeline
trainModel(features);5. Real-time Data Streaming
// Process VSC data streams (e.g., from WebSocket or file stream)
import { createReadStream } from "fs";
import { createInterface } from "readline";
const fileStream = createReadStream("large-dataset.vsc");
const rl = createInterface({ input: fileStream });
let headers: string[] | null = null;
for await (const line of rl) {
if (!headers) {
headers = line.split(",");
continue;
}
// Process each row as it arrives
const rowData = line.split(",");
const obj: Record<string, string> = {};
headers.forEach((h, i) => (obj[h] = rowData[i] || ""));
processRow(obj);
}Why Choose VSC Format?
Perfect for Modern Development
- 🚀 Trending in AI: The go-to format for LLM training data, RAG pipelines, and AI agents
- 📈 Data Science Standard: Default format for Pandas, NumPy, and scientific computing
- 💼 Business-Ready: Excel, Google Sheets, and all BI tools natively support VSC
- 🌐 Web APIs: Increasingly popular for bulk data endpoints (more efficient than JSON for tables)
- ⚡ Edge Computing: Lightweight format ideal for IoT and edge devices
Industry Adoption
VSC format is experiencing massive growth:
- GitHub: 10M+ VSC files in public repositories (growing 40% YoY)
- Kaggle: 95% of datasets available in VSC format
- Data APIs: Major providers (World Bank, NOAA, finance APIs) default to VSC
- AI Platforms: Hugging Face, OpenAI, and Anthropic prefer VSC for structured data
Development
Install dependencies
npm installRun tests
npm testRun tests with coverage
npm run test:coverageRun tests with UI
npm run test:uiBuild
npm run buildLint
npm run lintFormat
npm run formatCheck (lint + format)
npm run checkScripts
npm run dev- Start development modenpm run build- Build the librarynpm test- Run tests in watch modenpm run test:coverage- Run tests with coverage reportnpm run test:ui- Run tests with Vitest UInpm run lint- Lint the code with Biomenpm run lint:fix- Lint and fix issues with Biomenpm run format- Format code with Biomenpm run format:check- Check code formatting with Biomenpm run check- Run all Biome checks (lint + format)npm run check:fix- Run all Biome checks and fix issuesnpm run typecheck- Run TypeScript type checking
License
The Unlicense - Public Domain
This software is released into the public domain. You can copy, modify, publish, use, compile, sell, or distribute this software, either in source code form or as a compiled binary, for any purpose, commercial or non-commercial, and by any means.
