@origints/xlsx

v0.2.0

Published

10 days ago

XLSX parsing for Origins with full traceability to sheets, ranges, and cells.

Downloads

188

0High
0Medium
0Low

fponticelli

origints data-extraction lineage provenance xlsx excel spreadsheet

@origints/xlsx

XLSX parsing for Origins with full traceability to sheets, ranges, and cells.

Features

Parse XLSX files from streams, buffers, or file paths
Navigate workbooks, sheets, ranges, and cells
Cursor-based iteration for sequential processing
Cell predicates for conditional extraction
Convert ranges to JSON, arrays, or objects
Export to CSV format
Full source location tracking for every cell

Installation

npm install @origints/xlsx @origints/core

Usage with Planner

Extract cell values from a spreadsheet

import { Planner, loadFile, run } from '@origints/core'
import { parseXlsx } from '@origints/xlsx'

const plan = new Planner()
  .in(loadFile('data.xlsx'))
  .mapIn(parseXlsx())
  .emit((out, $) =>
    out
      .add('title', $.firstSheet().cell('A1').string())
      .add('revenue', $.firstSheet().cell('B2').number())
  )
  .compile()

const result = await run(plan, { readFile, registry })
// result.value: { title: 'Q4 Report', revenue: 150000 }

Extract from specific sheets

const plan = new Planner()
  .in(loadFile('report.xlsx'))
  .mapIn(parseXlsx())
  .emit((out, $) =>
    out
      .add('totalSales', $.sheet('Sales').cell('B10').number())
      .add('totalExpenses', $.sheet('Expenses').cell('B10').number())
  )
  .compile()

Extract rows from a range

Use range().rows() to iterate over rows in a range and extract structured data:

// Spreadsheet has headers in row 1: Name | Age | Department
// Data rows in A2:C10

const plan = new Planner()
  .in(loadFile('employees.xlsx'))
  .mapIn(parseXlsx())
  .emit((out, $) =>
    out.add(
      'employees',
      $.firstSheet()
        .range('A2:C4')
        .rows(row => ({
          kind: 'object',
          properties: {
            name: row.col(1).string(),
            age: row.col(2).number(),
            dept: row.col(3).string(),
          },
        }))
    )
  )
  .compile()

const result = await run(plan, { readFile, registry })
// result.value: {
//   employees: [
//     { name: 'Alice', age: 30, dept: 'Engineering' },
//     { name: 'Bob', age: 25, dept: 'Marketing' },
//     ...
//   ]
// }

Range from two corners

range() accepts two addresses or two dynamic cell builders:

// Two string addresses — equivalent to range("A2:C10")
$.firstSheet().range("A2", "C10").rows(...)

// Dynamic corners — cells resolved at runtime
const topLeft = $.firstSheet().find(cell.equals("Name"));
const bottomRight = $.firstSheet().find(cell.equals("Total")).left();
$.firstSheet().range(topLeft, bottomRight).rows((row) => ({
  kind: "object",
  properties: {
    name: row.col(1).string(),
    value: row.col(2).number(),
  },
}));

Collect rows with predicates

Use eachSlice() to iterate rows from a starting cell while a predicate holds, with header-relative column access:

import { cell, rowCol } from '@origints/xlsx'

const hasData = rowCol(0, cell.isNotEmpty()).and(
  rowCol(0, cell.startsWith('Total').not())
)

const plan = new Planner()
  .in(loadFile('report.xlsx'))
  .mapIn(parseXlsx())
  .emit((out, $) => {
    const header = $.firstSheet().find(cell.equals('Name'))

    return out.add(
      'people',
      header.down().eachSlice('down', hasData, row => ({
        kind: 'object',
        properties: {
          name: row.colWhere(header, cell.equals('Name')).string(),
          role: row.colWhere(header, cell.equals('Role')).string(),
        },
      }))
    )
  })
  .compile()

Collect cell values in a direction

Use eachCell() to gather cells in a direction while a predicate matches:

import { cell } from '@origints/xlsx'

const plan = new Planner()
  .in(loadFile('data.xlsx'))
  .mapIn(parseXlsx())
  .emit((out, $) =>
    out.add(
      'values',
      $.firstSheet()
        .cell('B2')
        .eachCell('down', cell.isNotEmpty(), c => c.number())
    )
  )
  .compile()

const result = await run(plan, { readFile, registry })
// result.value: { values: [100, 200, 300, 400] }

Optional and fallback extraction

Handle missing or invalid cell values using optional(), tryExtract(), mapSpec(), and guard():

import { literal, optional, tryExtract, mapSpec, guard } from '@origints/core'
import { cell, rowCol } from '@origints/xlsx'

const plan = new Planner()
  .in(loadFile('portfolio.xlsx'))
  .mapIn(parseXlsx())
  .emit((out, $) => {
    const header = $.firstSheet().find(cell.equals('Company'))
    const hasData = rowCol(0, cell.isNotEmpty()).and(
      rowCol(0, cell.startsWith('Total').not())
    )

    return out.add(
      'companies',
      header.down().eachSlice('down', hasData, row => ({
        kind: 'object',
        properties: {
          // Required field
          name: row.colWhere(header, cell.equals('Company')).string(),
          // Optional: returns null when cell is empty or has wrong type
          ownership: optional(
            row.colWhere(header, cell.equals('Ownership %')).number(),
            null
          ),
          // Fallback: try number first, then parse string, then null
          revenue: tryExtract(
            row.colWhere(header, cell.equals('Revenue')).number(),
            mapSpec(
              row.colWhere(header, cell.equals('Revenue')).string(),
              v => parseFloat((v as string).replace(/[,$]/g, '')),
              'parseFloat'
            ),
            literal(null)
          ),
          // Guard: ensure investment amount is positive
          investment: guard(
            row.colWhere(header, cell.equals('Investment')).number(),
            v => (v as number) > 0,
            'Investment must be positive'
          ),
        },
      }))
    )
  })
  .compile()

Combine with other data sources

const plan = new Planner()
  .in(loadFile('budget.xlsx'))
  .mapIn(parseXlsx())
  .emit((out, $) => out.add('budget', $.firstSheet().cell('B2').number()))
  .in(loadFile('config.json'))
  .mapIn(parseJson())
  .emit((out, $) => out.add('department', $.get('department').string()))
  .compile()

Standalone usage (without Planner)

For direct workbook navigation:

import { parseXlsxAsyncImpl, XlsxWorkbook } from '@origints/xlsx'

const workbook = (await parseXlsxAsyncImpl.execute(buffer)) as XlsxWorkbook

// Navigate sheets
const sheetResult = workbook.sheet('Sheet1')
if (sheetResult.ok) {
  const sheet = sheetResult.value

  // Read a cell
  const cellResult = sheet.cell('A1')
  if (cellResult.ok) {
    console.log(cellResult.value.string().value)
  }

  // Work with ranges
  const rangeResult = sheet.range('A1:D10')
  if (rangeResult.ok) {
    const range = rangeResult.value
    // Convert to array of arrays
    const rows = range.toArray()
    // Convert to objects using first row as headers
    const records = range.toObjects()
  }
}

Cursor-based iteration

import { XlsxCursor } from '@origints/xlsx'

const sheetResult = workbook.sheet('Data')
if (sheetResult.ok) {
  const cursor = new XlsxCursor(sheetResult.value, 'A1')

  // Move and grab values
  cursor.move('right', 2)
  const value = cursor.grab()

  // Iterate rows
  while (cursor.hasMore('down')) {
    const row = cursor.grabRow(4)
    cursor.move('down')
  }
}

API

| Export | Description | | ---------------------------------- | ----------------------------------------------------- | | parseXlsx(options?) | Create a transform AST for use with Planner.mapIn() | | parseXlsxImpl | Sync transform implementation | | parseXlsxAsyncImpl | Async transform implementation (stream/buffer) | | registerXlsxTransforms(registry) | Register all XLSX transforms with a registry | | XlsxWorkbook | Workbook navigation | | XlsxSheet | Sheet navigation with cell/range access | | XlsxRange | Range operations and conversion | | XlsxCell | Cell value extraction | | XlsxCursor | Sequential cursor-based iteration |

License

MIT

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

@origints/xlsx

Features

Installation

Usage with Planner

Extract cell values from a spreadsheet

Extract from specific sheets

Extract rows from a range

Range from two corners

Collect rows with predicates

Collect cell values in a direction

Optional and fallback extraction

Combine with other data sources

Standalone usage (without Planner)

Cursor-based iteration

API

License