OCSV - Odin CSV Parser

A high-performance, RFC 4180 compliant CSV parser written in Odin with Bun FFI support.

Platform Support: macOS

Features

  • High Performance - Fast CSV parsing with SIMD optimizations
  • 🦺 Memory Safe - Zero memory leaks, comprehensive testing
  • RFC 4180 Compliant - Full CSV specification support
  • 🌍 UTF-8 Support - Correct handling of international characters
  • 🔧 Flexible Configuration - Custom delimiters, quotes, comments
  • 📦 Bun Native - Direct FFI integration with Bun runtime
  • 🛡️ Error Handling - Detailed error messages with line/column info
  • 🎯 Schema Validation - Type checking, constraints, type conversion
  • 🌊 Streaming API - Memory-efficient chunk-based processing
  • 🔄 Transform System - Built-in transforms and pipelines
  • 🔌 Plugin System - Extensible architecture for custom functionality

Why Odin + Bun?

Key Advantages:

  • ✅ Simple build system (no node-gyp, no Python)
  • ✅ Better memory safety (explicit memory management + defer)
  • ✅ Better error handling (enums + multiple returns)
  • ✅ No C++ wrapper needed (Bun FFI is direct)

Quick Start

npm Installation (Recommended)

Install OCSV as an npm package for easy integration with your Bun projects:

# Using Bun
bun add ocsv

# Using npm
npm install ocsv

Then use it in your project:

import { parseCSV } from 'ocsv';

// Parse CSV string
const result = parseCSV('name,age\nJohn,30\nJane,25', { hasHeader: true });
console.log(result.headers); // ['name', 'age']
console.log(result.rows);    // [['John', '30'], ['Jane', '25']]

// Parse CSV file
import { parseCSVFile } from 'ocsv';
const data = await parseCSVFile('./data.csv', { hasHeader: true });
console.log(`Parsed ${data.rowCount} rows`);

Manual Installation (Development)

For building from source or contributing:

git clone https://github.com/dvrd/ocsv.git
cd ocsv

Build

Current Support: macOS ARM64 (cross-platform support in progress)

# Using Task (recommended)
task build          # Build release library
task build-dev      # Build debug library
task test           # Run all tests
task info           # Show platform info

# Manual build
odin build src -build-mode:shared -out:libocsv.dylib -o:speed

Basic Usage (Odin)

package main

import "core:fmt"
import ocsv "src"

main :: proc() {
    // Create parser
    parser := ocsv.parser_create()
    defer ocsv.parser_destroy(parser)

    // Parse CSV data
    csv_data := "name,age,city\nAlice,30,NYC\nBob,25,SF\n"
    ok := ocsv.parse_csv(parser, csv_data)

    if ok {
        // Access parsed data
        fmt.printfln("Parsed %d rows", len(parser.all_rows))
        for row in parser.all_rows {
            for field in row {
                fmt.printf("%s ", field)
            }
            fmt.printf("\n")
        }
    }
}

Bun API Examples

Basic Parsing

import { parseCSV } from 'ocsv';

// Parse CSV with headers
const result = parseCSV('name,age,city\nAlice,30,NYC\nBob,25,SF', {
  hasHeader: true
});

console.log(result.headers); // ['name', 'age', 'city']
console.log(result.rows);    // [['Alice', '30', 'NYC'], ['Bob', '25', 'SF']]
console.log(result.rowCount); // 2

Parse from File

import { parseCSVFile } from 'ocsv';

// Parse CSV file with headers
const data = await parseCSVFile('./sales.csv', {
  hasHeader: true,
  delimiter: ',',
});

console.log(`Parsed ${data.rowCount} rows`);
console.log(`Columns: ${data.headers.join(', ')}`);

// Process rows
for (const row of data.rows) {
  console.log(row);
}

Custom Configuration

import { parseCSV } from 'ocsv';

// Parse TSV (tab-separated)
const tsvData = parseCSV('col1\tcol2\nrow1\tdata', {
  delimiter: '\t',
  hasHeader: true,
});

// Parse with semicolon delimiter (European CSV)
const europeanData = parseCSV('name;age;city\nJohn;30;Paris', {
  delimiter: ';',
  hasHeader: true,
});

// Relaxed mode (allows some RFC violations)
const relaxedData = parseCSV('messy,csv,"data', {
  relaxed: true,
});

Manual Parser Management

For more control, use the Parser class directly:

import { Parser } from 'ocsv';

const parser = new Parser();
try {
  const result = parser.parse('a,b,c\n1,2,3');
  console.log(result.rows);
} finally {
  parser.destroy(); // Important: free memory
}

Performance Modes

OCSV offers two access modes to optimize for different use cases:

Mode Comparison

| Feature | Eager Mode (default) | Lazy Mode |
|---------|----------------------|-----------|
| Performance | ~8 MB/s throughput | ≥180 MB/s (22x faster) |
| Memory Usage | High (all data in JS) | Low (<200 MB for 10M rows) |
| Parse Time (10M rows) | ~150s | <7s (21x faster) |
| Access Pattern | Random access, arrays | Random access, on-demand |
| Memory Management | Automatic (GC) | Manual (destroy() required) |
| Best For | Small files, full iteration | Large files, selective access |
| TypeScript Support | Full | Full (discriminated unions) |

Eager Mode (Default)

Best for: Small to medium files (<100k rows), full dataset iteration, simple workflows

All rows are materialized into JavaScript arrays immediately. Easy to use, no cleanup required.

import { parseCSV } from 'ocsv';

// Default: eager mode
const result = parseCSV(data, { hasHeader: true });

console.log(result.headers);   // ['name', 'age', 'city']
console.log(result.rows);      // [['Alice', '30', 'NYC'], ...]
console.log(result.rowCount);  // 2

// Arrays: standard JavaScript operations
result.rows.forEach(row => console.log(row));
result.rows.map(row => row[0]);
result.rows.filter(row => row[1] > '25');

Pros:

  • ✅ Simple API - standard JavaScript arrays
  • ✅ No manual cleanup required
  • ✅ Familiar array methods (map, filter, slice)
  • ✅ Safe for GC-managed memory

Cons:

  • ❌ Slower for large files (~22x slower than lazy mode on the 10M-row benchmark below)
  • ❌ High memory usage (all rows in JS heap)
  • ❌ Parse time proportional to data crossing FFI boundary

Lazy Mode (High Performance)

Best for: Large files (>1M rows), selective access, memory-constrained environments

Rows stay in native Odin memory and are accessed on-demand. Achieves near-FFI performance with minimal memory footprint.

import { parseCSV } from 'ocsv';

// Lazy mode: high performance
const result = parseCSV(data, {
  mode: 'lazy',
  hasHeader: true
});

try {
  console.log(result.headers);   // ['name', 'age', 'city']
  console.log(result.rowCount);  // 10000000

  // On-demand row access
  const row = result.getRow(5000000);
  console.log(row.get(0));       // 'Alice'
  console.log(row.get(1));       // '30'

  // Iterate fields
  for (const field of row) {
    console.log(field);
  }

  // Materialize row to array (when needed)
  const arr = row.toArray();     // ['Alice', '30', 'NYC']

  // Efficient slicing (generator)
  for (const row of result.slice(1000, 2000)) {
    console.log(row.get(0));
  }

  // Full iteration (if needed)
  for (const row of result) {
    console.log(row.get(0));
  }

} finally {
  // CRITICAL: Must cleanup native memory
  result.destroy();
}

Pros:

  • 22x faster parse time than eager mode
  • Low memory footprint (<200 MB for 10M rows)
  • ✅ LRU cache (1000 hot rows) for repeated access
  • ✅ Generator-based slicing (memory efficient)
  • ✅ Random access to any row (O(1) after cache)

Cons:

  • Manual cleanup required (destroy() must be called)
  • ❌ Not standard arrays (use .get(i) or .toArray())
  • ❌ Use-after-destroy throws errors

When to Use Each Mode

                    Start
                      |
           Is file size > 100MB or > 1M rows?
                 /         \
               Yes          No
                |            |
         Do you need to    Use Eager Mode
         access all rows?   (simple, safe)
              /    \
            No     Yes
             |      |
        Lazy Mode  Memory constrained?
     (fast, low     /              \
      memory)     Yes               No
                   |                 |
              Lazy Mode         Try Eager first
           (streaming)        (measure, switch if slow)

Use Lazy Mode when:

  • File size > 100 MB or > 1M rows
  • You need selective row access (not full iteration)
  • Memory is constrained (< 1 GB available)
  • You're building streaming/ETL pipelines
  • You need maximum parsing performance

Use Eager Mode when:

  • File size < 100 MB or < 1M rows
  • You need full dataset iteration
  • You prefer simpler API (standard arrays)
  • Memory cleanup must be automatic (GC)
  • You're prototyping or writing quick scripts
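
The same decision can be written in a few lines. This is an illustrative sketch, not part of the OCSV API: the 100 MB threshold mirrors the guidance above, and string length is used as a rough proxy for file size.

import { parseCSV } from 'ocsv';

const LAZY_THRESHOLD_BYTES = 100 * 1024 * 1024; // ~100 MB, per the guidance above

const data = await Bun.file('./input.csv').text(); // placeholder path

if (data.length >= LAZY_THRESHOLD_BYTES) {
  const result = parseCSV(data, { mode: 'lazy', hasHeader: true });
  try {
    console.log(result.getRow(0).toArray()); // selective, on-demand access
  } finally {
    result.destroy();
  }
} else {
  const result = parseCSV(data, { hasHeader: true });
  console.log(result.rows.length); // standard arrays, GC-managed
}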

Performance Benchmarks

Test Setup: 10M rows, 4 columns, 1.2 GB CSV file

Mode          Parse Time    Throughput    Memory Usage
────────────────────────────────────────────────────────
FFI Direct    6.2s          193 MB/s      50 MB (baseline)
Lazy Mode     6.8s          176 MB/s      <200 MB
Eager Mode    151.7s        7.9 MB/s      ~8 GB

Key Metrics:

  • Lazy mode is 22x faster than eager mode
  • Lazy mode uses 40x less memory than eager mode
  • Lazy mode is only 9% slower than raw FFI (acceptable overhead)
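
To get numbers in this shape for your own data, a minimal timing harness is enough. This is a sketch: './big.csv' is a placeholder path, and Bun.nanoseconds() is used for timing.

import { parseCSV } from 'ocsv';

const data = await Bun.file('./big.csv').text();

let t = Bun.nanoseconds();
const eager = parseCSV(data, { hasHeader: true });
console.log(`eager: ${((Bun.nanoseconds() - t) / 1e6).toFixed(0)} ms, ${eager.rowCount} rows`);

t = Bun.nanoseconds();
const lazy = parseCSV(data, { mode: 'lazy', hasHeader: true });
try {
  console.log(`lazy:  ${((Bun.nanoseconds() - t) / 1e6).toFixed(0)} ms, ${lazy.rowCount} rows`);
} finally {
  lazy.destroy();
}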

Advanced: High-Performance FFI Mode

For advanced users who need maximum FFI throughput, OCSV offers an optimized packed buffer mode that achieves 61.25 MB/s (56% of native Odin performance).

Performance Comparison (100K rows, 13.80 MB file):

Mode              Throughput    ns/row    vs Native
──────────────────────────────────────────────────────
Native Odin       109.28 MB/s   915       100%
Packed Buffer     61.25 MB/s    2,253     56%
Bulk JSON         40.68 MB/s    2,878     37%
Field-by-Field    29.58 MB/s    3,957     27%

Optimizations:

  • 61.25 MB/s average throughput
  • 🚀 Batched TextDecoder with reduced decoder overhead
  • 💾 Pre-allocated arrays to reduce GC pressure
  • 📊 SIMD-friendly memory access patterns
  • 🔄 Adaptive processing for different row sizes
  • 📦 Binary packed format with length-prefixed strings
  • Single FFI call instead of multiple round-trips

Usage:

import { parseCSVPacked } from 'ocsv/bindings/simple';

// Optimized packed buffer mode (highest FFI performance)
const rows = parseCSVPacked(csvData);
// Returns string[][] with minimal FFI overhead

When to use Packed Buffer:

  • Need maximum FFI throughput (>40 MB/s)
  • Willing to trade API simplicity for performance
  • Working with medium-large files through Bun FFI
  • Want to minimize cross-language boundary overhead

Note: The 44% overhead compared to native Odin is inherent to the FFI serialization boundary. This is the practical limit for JavaScript-based FFI approaches.
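
If you only want the packed path where it pays off, a thin router between the two APIs works. The function name and threshold below are illustrative, not part of OCSV:

import { parseCSV } from 'ocsv';
import { parseCSVPacked } from 'ocsv/bindings/simple';

// Hypothetical router: packed buffer for larger payloads, plain API otherwise.
function parseRows(data: string): string[][] {
  const PACKED_THRESHOLD = 1024 * 1024; // 1 MB, arbitrary for illustration
  return data.length >= PACKED_THRESHOLD
    ? parseCSVPacked(data)  // single FFI call, returns string[][]
    : parseCSV(data).rows;
}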

Memory Management

Eager Mode

// Automatic cleanup via garbage collector
const result = parseCSV(data);
// ... use result.rows ...
// Memory freed automatically when result goes out of scope

Lazy Mode

// Manual cleanup required
const result = parseCSV(data, { mode: 'lazy' });
try {
  // ... use result ...
} finally {
  // CRITICAL: Always call destroy()
  result.destroy();
}

Common Pitfalls:

Forgetting to destroy:

const result = parseCSV(data, { mode: 'lazy' });
console.log(result.getRow(0));
// Memory leak! Parser not cleaned up

Use after destroy:

const result = parseCSV(data, { mode: 'lazy' });
result.destroy();
result.getRow(0);  // Error: LazyResult has been destroyed

Correct pattern:

const result = parseCSV(data, { mode: 'lazy' });
try {
  const row = result.getRow(0);
  console.log(row.toArray());
} finally {
  result.destroy();
}
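
One way to make the correct pattern hard to forget is a small wrapper that owns the destroy() call. withLazyCSV below is a hypothetical convenience, not part of the library, and it assumes the LazyResult type is exported:

import { parseCSV, type LazyResult } from 'ocsv'; // LazyResult export assumed

// Guarantees destroy() runs even if the callback throws.
function withLazyCSV<T>(data: string, fn: (result: LazyResult) => T): T {
  const result = parseCSV(data, { mode: 'lazy' });
  try {
    return fn(result);
  } finally {
    result.destroy();
  }
}

// Usage: copy what you need out of native memory before it is freed.
const firstRow = withLazyCSV(csvData, (r) => r.getRow(0).toArray());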

TypeScript Support

OCSV provides discriminated union types for type-safe mode selection:

import { parseCSV } from 'ocsv';

// Type: ParseResult (array-based)
const eager = parseCSV(data);
console.log(eager.rows[0]);  // Type: string[]

// Type: LazyResult (on-demand)
const lazy = parseCSV(data, { mode: 'lazy' });
console.log(lazy.getRow(0)); // Type: LazyRow

// Compiler error: mode mismatch
const wrong = parseCSV(data, { mode: 'lazy' });
console.log(wrong.rows);  // Error: Property 'rows' does not exist
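
Under the hood, a discriminated set of overloads makes that compiler error possible. The sketch below shows the idea in simplified form; it is not OCSV's actual type declarations:

// Simplified sketch: the `mode: 'lazy'` literal selects the lazy return type.
interface LazyRow {
  get(index: number): string;
  toArray(): string[];
}

interface ParseResult {
  headers: string[];
  rows: string[][];
  rowCount: number;
}

interface LazyResult {
  headers: string[];
  rowCount: number;
  getRow(index: number): LazyRow;
  destroy(): void;
}

declare function parseCSV(data: string, options: { mode: 'lazy'; hasHeader?: boolean }): LazyResult;
declare function parseCSV(data: string, options?: { hasHeader?: boolean }): ParseResult;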

Configuration

// Create parser with custom configuration
parser := ocsv.parser_create()
defer ocsv.parser_destroy(parser)

// TSV (Tab-Separated Values)
parser.config.delimiter = '\t'

// European CSV (semicolon)
parser.config.delimiter = ';'

// Comments (skip lines starting with #)
parser.config.comment = '#'

// Relaxed mode (handle malformed CSV)
parser.config.relaxed = true

// Custom quote character
parser.config.quote = '\''
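
From Bun, the same knobs are passed as parse options rather than set on a parser. delimiter, relaxed, and hasHeader appear in the examples above; comment and quote are assumed to be exposed under the same names as the Odin config:

import { parseCSV } from 'ocsv';

const result = parseCSV(data, {
  delimiter: ';',  // European CSV
  comment: '#',    // skip lines starting with # (assumed option name)
  quote: "'",      // custom quote character (assumed option name)
  relaxed: true,   // tolerate malformed input
  hasHeader: true,
});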

RFC 4180 Compliance

OCSV fully implements RFC 4180 with support for:

  • ✅ Quoted fields with embedded delimiters ("field, with, commas")
  • ✅ Nested quotes ("field with ""quotes""" → field with "quotes")
  • ✅ Multiline fields (newlines inside quotes)
  • ✅ CRLF and LF line endings (Windows/Unix)
  • ✅ Empty fields (consecutive delimiters: a,,c)
  • ✅ Trailing delimiters (a,b, → 3 fields, last is empty)
  • ✅ Leading delimiters (,a,b → 3 fields, first is empty)
  • ✅ Comments (extension: lines starting with #)
  • ✅ Unicode/UTF-8 (CJK characters, emojis, etc.)

Example:

# Sales data for Q1 2024
product,price,description,quantity
"Widget A",19.99,"A great widget, now with more features!",100
"Gadget B",29.99,"Essential gadget
Multi-line description",50

Testing

~201 tests, 100% pass rate, 0 memory leaks

# Run all tests (standard)
odin test tests

# Run with memory tracking
odin test tests -debug

Test Suites

The project includes comprehensive test coverage across multiple suites:

  • Basic functionality and core parsing operations
  • RFC 4180 edge cases and compliance
  • Integration tests for end-to-end workflows
  • Schema validation and type checking
  • Transform system and pipelines
  • Plugin system functionality
  • Streaming API with chunk boundaries
  • Large file handling
  • Performance regression monitoring
  • Error handling and recovery strategies
  • Property-based fuzzing tests
  • Parallel processing capabilities
  • SIMD optimization verification

Project Structure

ocsv/
├── src/
│   ├── ocsv.odin         # Main module
│   ├── parser.odin       # RFC 4180 state machine parser
│   ├── parser_simd.odin  # SIMD-optimized parser
│   ├── parser_error.odin # Error-aware parser
│   ├── streaming.odin    # Streaming API
│   ├── parallel.odin     # Parallel processing
│   ├── transform.odin    # Transform system
│   ├── plugin.odin       # Plugin architecture
│   ├── simd.odin         # SIMD search functions
│   ├── error.odin        # Error handling system
│   ├── schema.odin       # Schema validation & type system
│   ├── config.odin       # Configuration types
│   └── ffi_bindings.odin # Bun FFI exports
├── tests/               # Comprehensive test suite
├── plugins/             # Example plugins
├── bindings/            # Bun/TypeScript bindings
├── benchmarks/          # Performance benchmarks
├── examples/            # Usage examples
└── README.md           # This file

Requirements

  • Odin: Latest version (tested with Odin dev-2025-01)
  • Bun: v1.0+ (for FFI integration, optional)
  • Platform: macOS ARM64 (cross-platform support in development)
  • Task: v3+ (optional, for automated builds)

Release Process

This project uses automated releases via semantic-release. Releases are triggered automatically when changes are pushed to the main branch.

Commit Message Format

All commits must follow Conventional Commits:

<type>(<scope>): <subject>

<body>

<footer>

Examples:

git commit -m "feat: add streaming parser API"
git commit -m "fix: handle empty fields correctly"
git commit -m "docs: update installation instructions"
git commit -m "feat!: remove deprecated parseFile method

BREAKING CHANGE: parseFile has been removed, use parseCSVFile instead"

Commit Types:

  • feat: New feature (triggers minor version bump)
  • fix: Bug fix (triggers patch version bump)
  • perf: Performance improvement (triggers patch version bump)
  • docs: Documentation changes (no release)
  • chore: Maintenance tasks (no release)
  • refactor: Code refactoring (no release)
  • test: Test changes (no release)
  • ci: CI/CD changes (no release)

Version Bumps

  • Patch (1.1.0 → 1.1.1): fix:, perf:
  • Minor (1.1.0 → 1.2.0): feat:
  • Major (1.1.0 → 2.0.0): Any commit with BREAKING CHANGE: in footer or ! after type

Release Workflow

  1. Developer pushes commits to main branch
  2. CI runs tests and builds
  3. semantic-release analyzes commits
  4. If releasable changes found:
    • Determines new version number
    • Updates CHANGELOG.md
    • Updates package.json
    • Creates git tag
    • Publishes to npm with provenance
    • Creates GitHub release with prebuilt binaries
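
For reference, a typical semantic-release configuration for this flow looks roughly like the sketch below; the repository's actual .releaserc may differ:

{
  "branches": ["main"],
  "plugins": [
    "@semantic-release/commit-analyzer",
    "@semantic-release/release-notes-generator",
    "@semantic-release/changelog",
    "@semantic-release/npm",
    "@semantic-release/github",
    "@semantic-release/git"
  ]
}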

Manual Release (Emergency Only):

npm run release:dry  # Test what would be released
git push origin main  # Trigger automated release

Contributing

Contributions are welcome! Please read CONTRIBUTING.md for detailed guidelines on commit messages and pull request process.

Development Workflow:

  1. Fork the repository
  2. Create a feature branch
  3. Make changes with tests (odin test tests)
  4. Ensure zero memory leaks
  5. Submit a pull request

License

MIT License - see LICENSE for details.

Acknowledgments

Related Projects

  • d3-dsv - Pure JavaScript CSV/DSV parser
  • papaparse - Popular JavaScript CSV parser
  • xsv - Rust CLI tool for CSV processing
  • csv-parser - Node.js streaming CSV parser

Built with ❤️ using Odin + Bun

Version: 1.3.0

Last Updated: 2025-11-09