xlsx-for-ai

v1.4.4

Published

2 days ago

CLI that converts .xlsx files into rich text or JSON dumps that AI coding agents (Claude, Cursor, Copilot, ChatGPT, etc.) can read — preserving values, formulas, formatting, colors, column widths, frozen panes, named ranges, tables, and more.

Downloads

1,044


senoff

xlsx excel ai claude cursor copilot chatgpt llm agent cli spreadsheet text json converter ai-readable

xlsx-for-ai

👋 New here? Not a programmer? → Read WHY.md for the plain-English version. The README below is the technical reference.

The bidirectional bridge between spreadsheets and AI agents. Reads .xlsx (and .xls, .xlsb, .ods, .csv, .tsv) into the formats LLMs actually consume — markdown, JSON, text, SQL — and writes spreadsheets back out from AI-generated specs. Same tool, both directions.

AI tools — Claude, Cursor, Copilot, ChatGPT, and other LLM coding agents — can read text files but not .xlsx binaries. This CLI closes the loop:

📖 Read mode (default) — turn any spreadsheet into LLM-readable output. Every formula, every named range, every merged cell, every fill color, every cross-sheet reference. No more pasting numbers and losing context.

✍️ Write mode (xlsx-for-ai write) — turn an AI-generated JSON or markdown spec into a real .xlsx file. Closes the round-trip so an agent that reviews your spreadsheet can also deliver the corrected file. The output includes a _xlsx-for-ai review tab explaining every structural change the round-trip made (with risks, tradeoffs, and overrides) — the supervisor model: AI does the work, the human stays in control of every decision. Verified lossless on 29/30 real workbooks.

Input formats: .xlsx .xls .xlsb .ods .csv .tsv

Output modes: text dump, markdown tables (best LLM comprehension per token), JSON, SQL CREATE TABLE+INSERT, inferred schema, workbook diff, real .xlsx (write mode).

It extracts everything a human would see in Excel:

- Values: strings, numbers, dates
- Formulas: the actual formula expression, plus shared-formula references
- Formatting: bold, italic, font colors, background fills
- Number formats: percentages, currency, custom patterns
- Layout: column widths, frozen panes, merged cells, alignment
- Hyperlinks: URLs embedded in cells
- Comments / notes: cell annotations
- Named ranges: workbook-defined names and their references
- Hidden rows & columns: flagged so the AI knows data is suppressed
- Data validation: dropdown lists, numeric constraints
- Tables: Excel Table objects with their names and column headers
- Images & charts: existence and position noted (content not rendered)
- Auto-filters: active filter ranges
- Print areas: defined print regions

Previously published as cursor-reads-xlsx. The old name still works as an alias on the CLI, but please install the new package: npm install -g xlsx-for-ai.

Install

```bash
npm install -g xlsx-for-ai
```

Or run directly with npx (no install needed):

```bash
npx xlsx-for-ai budget.xlsx
```

Usage

```bash
# Dump all sheets
npx xlsx-for-ai data.xlsx

# Dump a specific sheet
npx xlsx-for-ai data.xlsx "Sheet1"

# List sheet names and dimensions without dumping
npx xlsx-for-ai data.xlsx --list-sheets

# Print to stdout instead of writing files
npx xlsx-for-ai data.xlsx --stdout

# Limit to first 200 rows per sheet (useful for huge files)
npx xlsx-for-ai data.xlsx --max-rows 200

# Limit to first 8 columns (useful for very wide sheets)
npx xlsx-for-ai data.xlsx --max-cols 8

# Suppress noisy default tags (default text colors, white fills, etc.)
npx xlsx-for-ai data.xlsx --stdout --compact

# Emit structured JSON (one entry per cell) instead of the text dump
npx xlsx-for-ai data.xlsx --json --stdout > out.json

# Combine flags
npx xlsx-for-ai data.xlsx "Sheet1" --stdout --max-rows 50 --compact
```
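If an agent harness assembles these invocations programmatically, a small helper keeps the flag handling in one place. The following is only a sketch: the function name `build_read_command` and its keyword arguments are hypothetical, and it uses only the flags shown in the examples above.

```python
from typing import List, Optional


def build_read_command(
    workbook: str,
    sheet: Optional[str] = None,
    *,
    stdout: bool = True,
    as_json: bool = False,
    compact: bool = False,
    max_rows: Optional[int] = None,
    max_cols: Optional[int] = None,
) -> List[str]:
    """Build an argv list for a read-mode call (hypothetical helper)."""
    cmd = ["npx", "xlsx-for-ai", workbook]
    if sheet is not None:
        cmd.append(sheet)  # positional sheet name
    if as_json:
        cmd.append("--json")
    if stdout:
        cmd.append("--stdout")
    if max_rows is not None:
        cmd += ["--max-rows", str(max_rows)]
    if max_cols is not None:
        cmd += ["--max-cols", str(max_cols)]
    if compact:
        cmd.append("--compact")
    return cmd


if __name__ == "__main__":
    print(build_read_command("data.xlsx", "Sheet1", max_rows=50, compact=True))
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`. Passing an argv list rather than a shell string avoids quoting problems with sheet names that contain spaces.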

Options

Output modes (mutually exclusive; default = text):

| Flag | Description |
|------|-------------|
| --md | Markdown tables (highest LLM comprehension per token) |
| --json | Structured JSON, one object per cell |
| --sql | CREATE TABLE + INSERT statements (uses inferred schema) |
| --schema | Per-column schema (name, type, nullable, samples) as JSON |

Selection:

| Flag | Description |
|------|-------------|
| [sheetName] | Positional: dump only this sheet |
| --range A1:D50 | Dump only this rectangular range |
| --named-range NAME | Dump only the cells covered by a workbook-defined name |
| --max-rows N | Cap at the first N rows per sheet |
| --max-cols N | Cap at the first N columns per sheet |

Output control:

| Flag | Description |
|------|-------------|
| --list-sheets | Print sheet names + dimensions and exit |
| --stdout | Print to stdout instead of writing files in .xlsx-read/ |
| --compact | Suppress noisy default tags (default colors, "General" format) |
| --max-tokens N | Truncate output to ~N tokens; appends a tail summary noting what was dropped |
| --evaluate | Promote cached formula results to primary value; re-evaluate simple formulas via formulajs |

Other modes:

| Flag | Description |
|------|-------------|
| --diff OTHER | Diff this workbook against OTHER; emits changed/added/removed cells and sheets |
| --stream | Streaming reader for huge .xlsx files (>100MB); emits row-by-row, drops some sheet metadata |
| -h, --help | Show help |

Write mode (`xlsx-for-ai write`)

The write sub-command produces a real .xlsx from a JSON or markdown spec.

```bash
xlsx-for-ai write spec.json                    # → spec.xlsx
xlsx-for-ai write spec.json -o report.xlsx     # explicit output
xlsx-for-ai write report.md                    # markdown table → xlsx
cat spec.json | xlsx-for-ai write -            # stdin
```

Minimum JSON spec:

```json
{
  "name": "Budget",
  "headers": ["Category", "Q1", "Q2"],
  "rows": [
    ["Marketing", 10000, 12000],
    ["R&D", 50000, 55000]
  ]
}
```

Multi-sheet, with formulas:

```json
{
  "sheets": [
    {
      "name": "Summary",
      "headers": ["Region", "Revenue", "Cost", "Profit"],
      "rows": [
        ["North", 100, 60, {"formula": "=B2-C2"}],
        ["South", 200, 110, {"formula": "=B3-C3"}]
      ],
      "frozen": {"rowSplit": 1, "colSplit": 0}
    },
    {
      "name": "Detail",
      "headers": ["SKU", "Qty"],
      "rows": [["A", 10], ["B", 20]]
    }
  ],
  "namedRanges": {"Profits": "Summary!D2:D3"}
}
```
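When an agent generates a spec like this from data, the per-row formulas and the named-range bounds have to track the row count. The sketch below shows one way to build the same shape in Python. The helper names `summary_sheet` and `build_spec` are hypothetical, and the only spec keys used are the ones shown in the JSON example above.

```python
import json
from typing import Any, Dict, List, Tuple


def summary_sheet(regions: List[Tuple[str, float, float]]) -> Dict[str, Any]:
    """Build the "Summary" sheet with one Profit formula per data row."""
    rows = []
    for i, (region, revenue, cost) in enumerate(regions):
        excel_row = i + 2  # row 1 holds the headers, so data starts at row 2
        rows.append([region, revenue, cost, {"formula": f"=B{excel_row}-C{excel_row}"}])
    return {
        "name": "Summary",
        "headers": ["Region", "Revenue", "Cost", "Profit"],
        "rows": rows,
        "frozen": {"rowSplit": 1, "colSplit": 0},
    }


def build_spec(regions: List[Tuple[str, float, float]]) -> Dict[str, Any]:
    """Wrap the sheet in a workbook spec with a named range over Profit."""
    if not regions:
        raise ValueError("need at least one data row for the named range")
    last_row = len(regions) + 1
    return {
        "sheets": [summary_sheet(regions)],
        "namedRanges": {"Profits": f"Summary!D2:D{last_row}"},
    }


spec = build_spec([("North", 100, 60), ("South", 200, 110)])

if __name__ == "__main__":
    print(json.dumps(spec, indent=2))
```

The printed JSON can be saved as `spec.json` or piped to `xlsx-for-ai write -` as shown earlier.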

Round-trip: the output of xlsx-for-ai data.xlsx --json is a valid input to xlsx-for-ai write, so reading then re-writing reproduces the file (verified on 29/30 real workbooks; the one exception is a minor CRLF→LF normalization in shared strings, which leaves visible content identical).

Markdown spec: one or more tables; ## Sheet Name headings split the tables into separate sheets. Cells wrapped in backticks become formulas (e.g., `=A1+B1`). Numbers, booleans, and ISO dates are auto-detected.
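A markdown spec following these rules can be generated mechanically. The sketch below assumes the parser accepts standard pipe tables with a `---` separator row, which the description above does not spell out. The helpers `md_table` and `md_spec` are hypothetical names, and cell values containing `|` are not escaped.

```python
from typing import Any, Dict, List, Sequence, Tuple


def md_table(headers: Sequence[str], rows: Sequence[Sequence[Any]]) -> str:
    """Render one pipe table; strings starting with '=' are wrapped in backticks."""

    def cell(value: Any) -> str:
        if isinstance(value, str) and value.startswith("="):
            return f"`{value}`"  # backticks mark the cell as a formula
        return str(value)

    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(cell(v) for v in row) + " |")
    return "\n".join(lines)


def md_spec(sheets: Dict[str, Tuple[List[str], List[List[Any]]]]) -> str:
    """Join tables under '## Sheet Name' headings, one heading per sheet."""
    parts = [
        f"## {name}\n\n{md_table(headers, rows)}"
        for name, (headers, rows) in sheets.items()
    ]
    return "\n\n".join(parts) + "\n"


markdown = md_spec({
    "Summary": (
        ["Region", "Revenue", "Cost", "Profit"],
        [["North", 100, 60, "=B2-C2"], ["South", 200, 110, "=B3-C3"]],
    ),
    "Detail": (["SKU", "Qty"], [["A", 10], ["B", 20]]),
})

if __name__ == "__main__":
    print(markdown)
```

Writing the result to `report.md` gives a file suitable for `xlsx-for-ai write report.md`.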

v1 limitations: edit-in-place (deferred to v1.5), charts, pivot tables, conditional formatting, images, macros — none of these are written. Shared formulas degrade to their cached values (formula link is lost; computed value is preserved).

The `_xlsx-for-ai` review tab

When the round-trip introduces any lossy structural changes (shared-formula degradation, line-ending normalization, etc.), xlsx-for-ai write adds a _xlsx-for-ai sheet to the output as the last tab. It's a review note, not just a warning list — for each issue type it explains:

- What happened: the source structure that couldn't be preserved
- What we did: the choice the tool made
- Risk: what could go wrong (e.g., "if you edit cells the formula depended on, they won't recalculate")
- Tradeoff: what's worse about this choice vs. alternatives
- Alternative: exactly what flag/source change to apply if you want different behavior
- Affected cells: the specific refs, plus a full detail table at the bottom

The point: the user (or an AI agent reading the file) can understand every decision the tool made and override any of them. Same shape as a code reviewer's PR comment — observation + reasoning + alternative.

--no-report suppresses the tab if you want byte-clean output (useful for CI / round-trip tests). The --diff mode also ignores the _xlsx-for-ai tab automatically so it doesn't pollute change reports.

In read mode, output files are written to .xlsx-read/ in the current working directory unless --stdout is given. The path(s) are printed to stdout so your agent knows where to read.

Output Format

Text dump (default)

```text
=== Sheet: Sales ===
Frozen: row 1, col 0
Columns: A(12) B(20) C(15) D(10)
Auto-filter: A1:D20
Named ranges:
  Totals: Sales!$D$2:$D$20
Table: "SalesTable" A1:D20 — columns: Region, Q1, Q2, Total

--- Row 1 [bold] ---
  A1: "Region"  [bold]
  B1: "Q1"  [bold] [align:center]
  C1: "Q2"  [bold] [align:center]
  D1: "Total"  [bold] [align:center]
--- Row 2 ---
  A2: "North"  [link: https://example.com/north]
  B2: 14500  [numFmt: #,##0]
  C2: 17200  [numFmt: #,##0]
  D2: 31700  [formula: =B2+C2] [numFmt: #,##0] [note: Includes returns]
--- Row 3 ---
  A3: "South"  [fill:FFFFFF00]
  B3: 9800  [numFmt: #,##0] [validation: list [North,South,East,West]]
  C3: 11050  [numFmt: #,##0]
  D3: 20850  [shared formula ref: D2] [numFmt: #,##0]
--- Row 4 (empty) [hidden] ---
```
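The text dump is meant to be read by a model, but a script can also pick cell lines apart if needed. The sketch below is inferred from the sample above only, not from a published grammar. It assumes the value and the tags are separated by two spaces, that brackets inside tags are balanced (as in the `validation` tag), and that cell values themselves do not contain the sequence of two spaces followed by `[`. The names `split_tags` and `parse_cell_line` are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# A cell line starts with a ref such as "D2:" after optional indentation.
CELL_LINE = re.compile(r"^\s*([A-Z]+[0-9]+): (.*)$")


def split_tags(text: str) -> List[str]:
    """Return the contents of each top-level [...] group, keeping nested brackets."""
    tags: List[str] = []
    depth = 0
    start = 0
    for i, ch in enumerate(text):
        if ch == "[":
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
            if depth == 0:
                tags.append(text[start:i])
    return tags


def parse_cell_line(line: str) -> Optional[Tuple[str, str, List[str]]]:
    """Parse one cell line into (ref, raw value text, tags); None for other lines."""
    match = CELL_LINE.match(line)
    if match is None:
        return None
    ref, rest = match.groups()
    value, sep, tag_text = rest.partition("  [")
    tags = split_tags("[" + tag_text) if sep else []
    return ref, value.strip(), tags


if __name__ == "__main__":
    sample = "  B3: 9800  [numFmt: #,##0] [validation: list [North,South,East,West]]"
    print(parse_cell_line(sample))
```

Row headers such as `--- Row 2 ---` and metadata lines such as `Frozen: row 1, col 0` do not match the cell pattern and come back as `None`. For anything beyond quick inspection, the `--json` output is the more robust input.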

JSON dump (`--json`)

```json
{
  "name": "Sales",
  "rowCount": 4,
  "columnCount": 4,
  "frozen": { "rowSplit": 1, "colSplit": 0 },
  "columns": [{ "letter": "A", "width": 12 }, ...],
  "namedRanges": [{ "name": "Totals", "ranges": ["Sales!$D$2:$D$20"] }],
  "tables": [{ "name": "SalesTable", "ref": "A1:D20", "columns": ["Region", "Q1", "Q2", "Total"] }],
  "cells": [
    { "ref": "D2", "row": 2, "col": 4, "value": { "formula": "B2+C2", "result": 31700 }, "numFmt": "#,##0" },
    { "ref": "D3", "row": 3, "col": 4, "value": { "sharedFormulaRef": "D2", "result": 20850 }, "numFmt": "#,##0" }
  ]
}
```
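A consumer of the JSON dump often wants to know which cells carry their own formula and which ones point at a shared-formula master. The sketch below works on a single sheet object shaped like the sample above. It relies only on the `cells`, `ref`, `value`, `formula`, and `sharedFormulaRef` keys shown there. Whether the top level of a real dump is one sheet object or a list of them is not stated here, so the function takes one sheet as input. The name `classify_formulas` is hypothetical.

```python
import json
from typing import Any, Dict, Tuple


def classify_formulas(sheet: Dict[str, Any]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (masters, shared): ref -> formula text, and ref -> master cell ref."""
    masters: Dict[str, str] = {}
    shared: Dict[str, str] = {}
    for cell in sheet.get("cells", []):
        value = cell.get("value")
        if not isinstance(value, dict):
            continue  # not a formula-style value in this sketch
        if "formula" in value:
            masters[cell["ref"]] = value["formula"]
        elif "sharedFormulaRef" in value:
            shared[cell["ref"]] = value["sharedFormulaRef"]
    return masters, shared


# A trimmed copy of the sample sheet, keeping only the two documented cells.
SAMPLE = """
{
  "name": "Sales",
  "cells": [
    {"ref": "D2", "row": 2, "col": 4,
     "value": {"formula": "B2+C2", "result": 31700}, "numFmt": "#,##0"},
    {"ref": "D3", "row": 3, "col": 4,
     "value": {"sharedFormulaRef": "D2", "result": 20850}, "numFmt": "#,##0"}
  ]
}
"""

masters, shared = classify_formulas(json.loads(SAMPLE))

if __name__ == "__main__":
    print("masters:", masters)
    print("shared:", shared)
```

Note that a shared cell does not repeat the master's formula text. Recovering the effective formula for D3 would mean shifting the relative references in D2's formula, which this sketch does not attempt.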

Sheet Metadata

| Line | Meaning |
|------|---------|
| Frozen: row 1, col 2 | Frozen panes position |
| Columns: A(12) B(20) | Column widths (Excel character units) |
| Hidden columns: E, F | Columns hidden in the spreadsheet |
| Merged: A1:B1 | Merged cell ranges |
| Auto-filter: A1:D20 | Active auto-filter range |
| Print area: A1:D50 | Defined print area |
| Named ranges: | Workbook-defined names referencing this sheet |
| Table: "Name" A1:D20 | Excel Table objects with column headers |
| Image: A1 to C5 | Embedded image position |

Cell Tags

| Tag | Meaning |
|-----|---------|
| [formula: =SUM(A1:A10)] | Cell contains this formula (master cell) |
| [shared formula ref: D2] | Cell shares D2's formula (an Excel "shared formula", common when you drag-fill) |
| [numFmt: 0.00%] | Number format (when not "General") |
| [bold] | Bold font |
| [italic] | Italic font |
| [color:FF8B0000] | Font color (ARGB hex) |
| [fill:FFFFFF00] | Cell background color (ARGB hex) |
| [align:center] | Horizontal alignment (when not default) |
| [link: https://...] | Hyperlink URL |
| [note: ...] | Cell comment or note text |
| [validation: list [...]] | Data validation (dropdown values or constraints) |
| [hidden] | Row is hidden in the spreadsheet |
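The `color` and `fill` tags carry eight hex digits in ARGB order: alpha first, then red, green, and blue. A minimal sketch of decoding them follows. The function names `parse_argb` and `to_css_hex` are hypothetical and not part of the tool.

```python
from typing import Tuple


def parse_argb(hex_argb: str) -> Tuple[int, int, int, int]:
    """Split an 8-digit ARGB hex string into (alpha, red, green, blue), each 0-255."""
    if len(hex_argb) != 8:
        raise ValueError(f"expected 8 hex digits, got {hex_argb!r}")
    alpha, red, green, blue = (int(hex_argb[i:i + 2], 16) for i in (0, 2, 4, 6))
    return alpha, red, green, blue


def to_css_hex(hex_argb: str) -> str:
    """Drop the alpha channel and format the color as a CSS-style #rrggbb string."""
    _, red, green, blue = parse_argb(hex_argb)
    return f"#{red:02x}{green:02x}{blue:02x}"


if __name__ == "__main__":
    print(parse_argb("FF8B0000"))  # font color from the table above
    print(to_css_hex("FFFFFF00"))  # fill color from the table above
```

In the examples from the table, `FF8B0000` is an opaque dark red and `FFFFFF00` is an opaque yellow.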

`--list-sheets` Output

```text
Sales  250 rows × 12 cols
Config  15 rows × 4 cols
Archive  1200 rows × 8 cols [hidden]
```
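An agent can use this listing to decide which sheets are worth dumping before spending tokens on them. The sketch below parses lines of the shape shown above. The layout is inferred from that sample only, and the names `SheetInfo` and `parse_list_sheets` are hypothetical.

```python
import re
from typing import List, NamedTuple


class SheetInfo(NamedTuple):
    name: str
    rows: int
    cols: int
    hidden: bool


# Name, then "<rows> rows × <cols> cols", then an optional "[hidden]" marker.
SHEET_LINE = re.compile(
    r"^(?P<name>.+?)\s+(?P<rows>\d+) rows × (?P<cols>\d+) cols(?P<hidden>\s+\[hidden\])?\s*$"
)


def parse_list_sheets(output: str) -> List[SheetInfo]:
    """Parse --list-sheets style output, skipping lines that do not match."""
    sheets: List[SheetInfo] = []
    for line in output.splitlines():
        match = SHEET_LINE.match(line)
        if match is None:
            continue
        sheets.append(
            SheetInfo(
                name=match["name"],
                rows=int(match["rows"]),
                cols=int(match["cols"]),
                hidden=match["hidden"] is not None,
            )
        )
    return sheets


LISTING = """Sales  250 rows × 12 cols
Config  15 rows × 4 cols
Archive  1200 rows × 8 cols [hidden]
"""

sheets = parse_list_sheets(LISTING)

if __name__ == "__main__":
    for info in sheets:
        print(info)
```

For example, filtering on `info.rows > 1000` would flag Archive as a candidate for `--max-rows` before dumping it.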

Cursor / Claude / Agent Rule Template

Copy the included rule template into your project so your AI agent automatically uses this tool when it encounters .xlsx files:

```bash
mkdir -p .cursor/rules
cp node_modules/xlsx-for-ai/cursor-rule-template/read-xlsx.mdc .cursor/rules/
```

Or fetch it directly:

```bash
mkdir -p .cursor/rules
curl -o .cursor/rules/read-xlsx.mdc https://raw.githubusercontent.com/senoff/xlsx-for-ai/main/cursor-rule-template/read-xlsx.mdc
```

The same rule works for Claude Code (.claude/rules/), Copilot (.github/copilot-instructions.md), or any other agent — just adjust the path.

Why This Exists

Spreadsheets are everywhere in real projects: financial models, data exports, config files, tax estimates. AI coding agents choke on binary formats. This tool makes spreadsheets legible to AI while keeping the structure a human would see in Excel, including the tricky bits like shared formulas, named ranges, and merged cells that other tools drop. Image and chart content is the exception: only its existence and position are noted.

License

MIT
