llm-xlsx-parser

v1.0.0

Convert XLSX files into LLM-friendly formats with visual, CSV, and structured record processing

LLM XLSX Parser

A Node.js module that converts Excel (XLSX) files into LLM-friendly formats. This package transforms spreadsheet data into multiple representations (visual images, CSV, and structured records) that Large Language Models can easily understand and process.

Why This Package Exists

LLMs often struggle with tabular data, and the record format this package produces addresses three key problems:

  1. Poor Spatial Awareness: These models have difficulty reading information side to side and are much better at processing data from top to bottom.

  2. Pattern Recognition: LLMs excel at recognizing patterns, and the repeating record structure this package creates plays to that strength.

  3. Distance Problem: In a traditional table, a value can sit far from its column header (especially from row 100 onward). Formatting data as records pairs every value directly with its column name.

Inspiration: This package was inspired by discussions in the OpenAI community about how best to format Excel files for API ingestion, where developers shared techniques for making tabular data more LLM-friendly.

Features

  • 📊 Multi-format Conversion: Transforms XLSX files into images, CSV, and structured records
  • 🧠 LLM-Optimized: Formats data specifically for optimal LLM comprehension
  • 🖼️ Visual Processing: Generates images to help LLMs understand spatial relationships
  • 🌐 Headless Server Ready: Accepts file paths OR base64 strings (auto-detected)
  • ⚙️ Configurable: Customizable image generation and processing options
  • 📦 NPM Module: Easy to integrate into existing projects

Installation

npm install llm-xlsx-parser

Setup

Gemini is optional. You only need an API key when using mode: "llm".

  1. Get a Google Gemini API key from Google AI Studio (optional)
  2. Set up your environment (optional):
# Create .env file
echo "GEMINI_API_KEY=your_api_key_here" > .env

Alternatively, pass the key directly through the geminiApiKey option.

Note: Any Gemini model can be used, but the author has only had success with gemini-2.5-pro for mode: "llm".
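The key lookup described above uses the explicit option first and then the GEMINI_API_KEY environment variable, and it is only required in "llm" mode. It can be sketched as follows. resolveGeminiApiKey is a hypothetical helper, not part of llm-xlsx-parser's API, and the precedence shown is inferred from the options table. If the key lives in .env, something such as dotenv (for example `import "dotenv/config"`) has to load it into process.env first.

```javascript
// Hypothetical helper (not part of llm-xlsx-parser's API): resolves the
// Gemini key from an explicit option first, then GEMINI_API_KEY.
function resolveGeminiApiKey(options = {}, env = process.env) {
  const mode = options.mode ?? "text";
  const key = options.geminiApiKey ?? env.GEMINI_API_KEY;

  // Only "llm" mode calls Gemini, so the other modes never require a key.
  if (mode !== "llm") return key ?? null;

  if (!key) {
    throw new Error(
      'Gemini API key missing: set GEMINI_API_KEY or pass geminiApiKey for mode: "llm"'
    );
  }
  return key;
}
```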

How It Works

The package converts traditional spreadsheet data like this:

| name  | age | favorite color |
| ----- | --- | -------------- |
| Steve | 56  | red            |
| Ava   | 1   | pink           |
| Donna | 50  | purple         |

Into this LLM-friendly format:

name: Steve
age: 56
favorite color: red

name: Ava
age: 1
favorite color: pink

name: Donna
age: 50
favorite color: purple

This transformation makes it much easier for LLMs to:

  • Understand the relationship between values and their column names
  • Process data in a top-to-bottom reading pattern
  • Recognize the repeating record structure
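The table-to-records transformation above can be sketched in a few lines. toRecords is a hypothetical name and does not reflect the package's internals. The sketch assumes the sheet has already been read into an array of row arrays. In practice, SheetJS's `XLSX.utils.sheet_to_json(worksheet, { header: 1 })` returns that shape, with the header row first.

```javascript
// Hypothetical sketch of the record layout shown above; not the package's code.
// headers: column names; rows: arrays of cell values in the same column order.
function toRecords(headers, rows) {
  return rows
    .map((row) =>
      headers
        .map((header, i) => `${header}: ${row[i] ?? ""}`) // pair each value with its column name
        .join("\n")
    )
    .join("\n\n"); // blank line between records
}

const text = toRecords(
  ["name", "age", "favorite color"],
  [
    ["Steve", 56, "red"],
    ["Ava", 1, "pink"],
    ["Donna", 50, "purple"],
  ]
);
console.log(text); // prints the three records shown above
```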

Usage

Basic Usage

import parseXlsx from "llm-xlsx-parser";

const result = await parseXlsx(
  "path/to/your/file.xlsx",
  "output/formatted-data.txt",
  {
    mode: "text", // default mode, no LLM required
  }
);

console.log(result); // deterministic text output (records/CSV)

Image Output Mode

import parseXlsx from "llm-xlsx-parser";

// Generate only an image (no LLM processing)
const imagePath = await parseXlsx(
  "data/spreadsheet.xlsx",
  "output/spreadsheet-image.png",
  {
    mode: "image", // image output mode
    maxRows: 100,
    maxCols: 50,
    fontSize: 12,
    cellPadding: 4,
  }
);

console.log(`Image saved to: ${imagePath}`);

Advanced Usage with Options

import parseXlsx from "llm-xlsx-parser";

const result = await parseXlsx(
  "data/spreadsheet.xlsx",
  "output/formatted-data.txt",
  {
    mode: "llm", // Gemini analysis mode
    maxRows: 100, // Maximum rows to process for image
    maxCols: 50, // Maximum columns to process for image
    viewportWidth: 1920, // Browser viewport width for image
    viewportHeight: 1080, // Browser viewport height for image
    fontSize: 10, // Font size for image generation
    cellPadding: 4, // Cell padding for image generation
    fullPage: true, // Capture full page screenshot
    geminiApiKey: "your-key", // API key (if not in environment)
    systemPrompt: "Custom formatting prompt...", // Custom system prompt
  }
);

Headless Server Usage (Base64 Mode)

For headless server environments, you can pass XLSX files as base64 strings instead of file paths. The function automatically detects whether the input is a file path or base64 data:

import parseXlsx from "llm-xlsx-parser";
import fs from "fs";

// Convert XLSX file to base64 (simulate receiving from client)
const xlsxBuffer = fs.readFileSync("data/spreadsheet.xlsx");
const base64String = xlsxBuffer.toString("base64");

// Process base64 data directly (no temp files needed on your end)
const result = await parseXlsx(
  base64String, // Function auto-detects this is base64
  "output/analysis.txt",
  {
    mode: "text", // no LLM required
    maxRows: 100,
    maxCols: 50,
  }
);

console.log(result); // deterministic text output (records/CSV)

Base64 Image Output

You can also return images as base64 strings instead of saving them to the filesystem, which suits headless servers:

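import parseXlsx from "llm-xlsx-parser";
import fs from "fs";

// Input XLSX as base64 (e.g. received from a client)
const base64XlsxString = fs.readFileSync("data/spreadsheet.xlsx").toString("base64");

// Return image as base64 string (no file saved)
const base64Image = await parseXlsx(
  base64XlsxString, // Input XLSX as base64
  "output.png", // Ignored when returnImageAsBase64 is true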
  {
    mode: "image", // Image output mode
    returnImageAsBase64: true, // Return as base64 string
    maxRows: 50,
    maxCols: 30,
  }
);

console.log(`data:image/png;base64,${base64Image}`); // Ready for web use
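Before embedding a returned string in a page, it can help to confirm that it really decodes to a PNG. Every PNG starts with the same 8-byte signature, which is why the sample values below begin with "iVBORw0KGgo". isPngBase64 and toPngDataUrl are hypothetical helpers that only use Node's built-in Buffer.

```javascript
// Hypothetical helpers for a base64 PNG string such as the one returned
// when returnImageAsBase64 is true.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// True if the base64 string decodes to bytes that start with the PNG signature.
function isPngBase64(base64) {
  const bytes = Buffer.from(base64, "base64");
  return (
    bytes.length >= PNG_SIGNATURE.length &&
    bytes.subarray(0, PNG_SIGNATURE.length).equals(PNG_SIGNATURE)
  );
}

// Wraps the raw base64 payload in a data URL usable as an <img> src.
function toPngDataUrl(base64) {
  if (!isPngBase64(base64)) throw new Error("Not a base64-encoded PNG");
  return `data:image/png;base64,${base64}`;
}
```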

Multiple Sheets with Base64 Images

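import parseXlsx from "llm-xlsx-parser";
import fs from "fs";

const xlsxBase64String = fs.readFileSync("data/spreadsheet.xlsx").toString("base64");

const results = await parseXlsx(xlsxBase64String, "output.png", {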
  mode: "image",
  returnImageAsBase64: true,
  sheets: [0, 1, 2], // Process first 3 sheets
});

// results = {
//   "Sheet1": "iVBORw0KGgoAAAANSUhEUgAA...", // base64 string
//   "Sheet2": "iVBORw0KGgoAAAANSUhEUgAA...", // base64 string
//   "Sheet3": "iVBORw0KGgoAAAANSUhEUgAA..."  // base64 string
// }
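The multi-sheet result above is a plain object keyed by sheet name, so turning it into files is a matter of iterating its entries. This is a hypothetical sketch: sheetImagesToFiles and the file-naming scheme are illustrative. Each returned entry can then be written with `fs.writeFileSync(entry.fileName, entry.bytes)`.

```javascript
// Hypothetical helper: turns a { sheetName: base64Png } object (the
// multi-sheet shape shown above) into file names and decoded PNG bytes.
function sheetImagesToFiles(results, prefix = "sheet") {
  return Object.entries(results).map(([sheetName, base64]) => {
    // Replace runs of characters that are unsafe in file names (e.g. "/" or spaces).
    const safeName = sheetName.replace(/[^A-Za-z0-9._-]+/g, "_");
    return {
      fileName: `${prefix}-${safeName}.png`,
      bytes: Buffer.from(base64, "base64"),
    };
  });
}
```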

API Endpoint Example

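// Express.js endpoint example
import express from "express";
import parseXlsx from "llm-xlsx-parser";

const app = express();
// express.json() defaults to a 100kb body limit, which is too small for most
// base64-encoded XLSX files; raise it to fit your expected uploads.
app.use(express.json({ limit: "25mb" }));

app.post("/analyze-xlsx", async (req, res) => {
  try {
    const { xlsxBase64, returnImage = false } = req.body;

    if (!xlsxBase64) {
      return res.status(400).json({ error: "xlsxBase64 is required" });
    }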

    if (returnImage) {
      // Return image as base64 (no file system usage)
      const base64Image = await parseXlsx(
        xlsxBase64,
        "temp.png", // Ignored when returnImageAsBase64 is true
        {
          mode: "image",
          returnImageAsBase64: true,
          maxRows: 100,
          maxCols: 50,
        }
      );
      res.json({ success: true, image: base64Image, type: "image" });
    } else {
      // Return LLM analysis
      const analysis = await parseXlsx(
        xlsxBase64,
        `temp/analysis-${Date.now()}.txt`,
        { mode: "llm" }
      );
      res.json({ success: true, analysis, type: "text" });
    }
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

app.listen(3000);
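Because the file travels as base64 inside JSON, the request body is larger than the XLSX itself. Base64 encodes every 3 bytes as 4 characters, so the encoded size is 4 × ceil(n / 3). The helpers below are a hypothetical sketch for choosing a body limit such as the one passed to express.json({ limit }). They assume a body containing only the xlsxBase64 field, so treat the result as a minimum.

```javascript
// Base64 encodes every 3 input bytes as 4 characters (with "=" padding).
function base64Length(byteLength) {
  return 4 * Math.ceil(byteLength / 3);
}

// Minimum JSON body size for { "xlsxBase64": "..." }; base64 characters
// need no JSON escaping, so only the key and punctuation are added.
function estimateJsonBodyBytes(xlsxByteLength) {
  return base64Length(xlsxByteLength) + '{"xlsxBase64":""}'.length;
}
```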

Serverless Function Example

// AWS Lambda handler example (API Gateway proxy event)
import parseXlsx from "llm-xlsx-parser";

export async function handler(event) {
  try {
    const { xlsxBase64, mode = "llm" } = JSON.parse(event.body);

    if (mode === "image") {
      // Return base64 image (no /tmp filesystem usage)
      const base64Image = await parseXlsx(
        xlsxBase64,
        "output.png", // Ignored when returnImageAsBase64 is true
        {
          mode: "image",
          returnImageAsBase64: true,
          maxRows: 100,
          maxCols: 50,
        }
      );

      return {
        statusCode: 200,
        body: JSON.stringify({
          success: true,
          image: base64Image,
          mimeType: "image/png",
        }),
      };
    } else {
      // Return LLM analysis
      const result = await parseXlsx(
        xlsxBase64,
        "/tmp/analysis.txt", // Temp directory in serverless
        { mode: "llm" }
      );

      return {
        statusCode: 200,
        body: JSON.stringify({ success: true, analysis: result }),
      };
    }
  } catch (error) {
    return {
      statusCode: 500,
      body: JSON.stringify({ error: error.message }),
    };
  }
}
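When the handler sits behind API Gateway or a Lambda function URL, event.body may arrive base64-encoded, with event.isBase64Encoded set to true. In that case a bare JSON.parse(event.body) fails. A small sketch that decodes first follows. parseEventBody is a hypothetical helper that could replace the JSON.parse call in the handler above: `const { xlsxBase64, mode = "llm" } = parseEventBody(event);`.

```javascript
// Sketch: decode a possibly base64-encoded Lambda proxy body before parsing JSON.
function parseEventBody(event) {
  if (!event || event.body == null) {
    throw new Error("Request body is missing");
  }
  const raw = event.isBase64Encoded
    ? Buffer.from(event.body, "base64").toString("utf8")
    : event.body;
  return JSON.parse(raw);
}
```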

Auto-Detection Logic: The function determines input type by checking:

  1. Contains / or \ (file path indicators)
  2. Ends with .xlsx or .xls
  3. File exists on filesystem
  4. Otherwise, treats as base64 string
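The checks above can be approximated as a standalone function. This is a hypothetical sketch (detectInputType is not the package's source), and fileExists is injected so that `fs.existsSync` can be passed in production and a stub in tests. One deliberate deviation is labeled in the code. "/" is also a character in the standard base64 alphabet, so the sketch only treats a path separator as a path hint when the string is not well-formed base64.

```javascript
// Hypothetical sketch of the auto-detection heuristic above; not the
// package's source. fileExists is injected (e.g. fs.existsSync).
const BASE64_RE = /^[A-Za-z0-9+\/]+={0,2}$/;

function looksLikeBase64(value) {
  return value.length % 4 === 0 && BASE64_RE.test(value);
}

function detectInputType(input, fileExists = () => false) {
  if (Buffer.isBuffer(input)) return "buffer";
  if (typeof input !== "string") {
    throw new TypeError("xlsxInput must be a string or Buffer");
  }

  const value = input.trim();
  if (/\.xlsx?$/i.test(value)) return "path"; // ends with .xlsx or .xls
  if (fileExists(value)) return "path"; // exists on the filesystem
  // Deviation (assumption): "/" is a base64 character, so a separator only
  // counts as a path hint when the string is not well-formed base64.
  if (/[\\\/]/.test(value) && !looksLikeBase64(value)) return "path";
  return "base64"; // fallback
}
```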

API Reference

parseXlsx(xlsxInput, outputPath, options)

Converts an XLSX file into text, image, or optional LLM analysis outputs.

Parameters

  • xlsxInput (string|Buffer): Path to XLSX file OR base64 string OR Buffer containing XLSX data
  • outputPath (string): Path where the output (formatted text or image) is saved; ignored when returnImageAsBase64 is true
  • options (object, optional): Configuration options

Options

| Option              | Type    | Default                    | Description                                         |
| ------------------- | ------- | -------------------------- | --------------------------------------------------- |
| maxRows             | number  | 50                         | Maximum rows to process for image generation        |
| maxCols             | number  | 40                         | Maximum columns to process for image generation     |
| viewportWidth       | number  | 1920                       | Browser viewport width for image generation         |
| viewportHeight      | number  | 1080                       | Browser viewport height for image generation        |
| fontSize            | number  | 8                          | Font size for image generation                      |
| cellPadding         | number  | 2                          | Cell padding for image generation                   |
| fullPage            | boolean | true                       | Whether to capture full page screenshot             |
| mode                | string  | "text"                     | Processing mode: "text", "image", or "llm"          |
| outputImage         | boolean | false                      | Legacy alias: when true, behaves like mode: "image" |
| returnImageAsBase64 | boolean | false                      | Return images as base64 strings instead of files    |
| geminiApiKey        | string  | process.env.GEMINI_API_KEY | Gemini API key (required only for "llm" mode)       |
| systemPrompt        | string  | Built-in prompt            | Custom system prompt for formatting                 |
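The defaults in the table above can be expressed as a plain object and merged with the caller's options. This is a hypothetical sketch: DEFAULT_OPTIONS and resolveOptions are illustrative names, not exports of llm-xlsx-parser. It assumes that a truthy outputImage simply switches the mode to "image", as the table describes. geminiApiKey and systemPrompt are left out because their defaults come from the environment and a built-in prompt.

```javascript
// Hypothetical sketch: documented defaults merged with caller options.
const DEFAULT_OPTIONS = {
  maxRows: 50,
  maxCols: 40,
  viewportWidth: 1920,
  viewportHeight: 1080,
  fontSize: 8,
  cellPadding: 2,
  fullPage: true,
  mode: "text",
  outputImage: false,
  returnImageAsBase64: false,
};

const MODES = ["text", "image", "llm"];

function resolveOptions(options = {}) {
  const merged = { ...DEFAULT_OPTIONS, ...options };
  // Legacy alias from the table: outputImage: true behaves like mode: "image".
  if (merged.outputImage) merged.mode = "image";
  if (!MODES.includes(merged.mode)) {
    throw new Error(`Unknown mode "${merged.mode}"; expected one of ${MODES.join(", ")}`);
  }
  return merged;
}
```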

Returns

  • Promise<string|Object>:
    • Text mode: Deterministic text output from records/CSV
    • LLM mode: Gemini-formatted analysis text
    • Image mode (file): Image file path(s)
    • Image mode (base64): Base64 image string(s)
    • Multiple sheets: Object with sheet names as keys
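Because the result is a string for a single sheet and an object keyed by sheet name for multiple sheets, callers may want to handle both shapes uniformly. resultEntries is a hypothetical helper, and the "default" label for a single result is an arbitrary choice made for this sketch.

```javascript
// Hypothetical helper: normalizes a parseXlsx result into [sheetName, value]
// pairs so single-sheet (string) and multi-sheet (object) results share one path.
function resultEntries(result, singleLabel = "default") {
  if (typeof result === "string") return [[singleLabel, result]];
  if (result && typeof result === "object") return Object.entries(result);
  throw new TypeError("Unexpected parseXlsx result type");
}

// Usage: for (const [sheet, value] of resultEntries(result)) { ... }
```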

Throws

  • Error: If Gemini API key is missing or invalid in "llm" mode
  • Error: If XLSX file cannot be read
  • Error: If image generation fails

Example

Run the included example:

# Clone this repository
git clone https://github.com/your-username/llm-xlsx-parser.git
cd llm-xlsx-parser

# Install dependencies
npm install

# Optional: set up your API key for mode: "llm"
echo "GEMINI_API_KEY=your_api_key_here" > .env

# Run the example
npm run example

Output Modes

The package supports three primary output modes:

1. Text Mode (Default)

Processes spreadsheets locally and returns deterministic text output.

  • 📋 Structured Records: Key-value pairs for each row (primary format)
  • 📊 CSV Data: Traditional comma-separated values
  • ⚡ No API calls: No external LLM dependency
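The README does not specify the package's exact CSV dialect. The sketch below assumes RFC 4180 conventions: a field is quoted only when it contains a comma, double quote, or line break, embedded quotes are doubled, and rows end with CRLF. When working from a parsed workbook, SheetJS also offers `XLSX.utils.sheet_to_csv`. toCsvField and toCsv are hypothetical names.

```javascript
// Sketch of RFC 4180-style CSV output (assumed dialect, not the package's code).
function toCsvField(value) {
  const text = value == null ? "" : String(value);
  // Quote only when needed, doubling any embedded double quotes.
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function toCsv(rows) {
  return rows.map((row) => row.map(toCsvField).join(",")).join("\r\n");
}
```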

2. Image Output Mode

Renders the spreadsheet as a PNG image, which becomes the primary output: it is saved to outputPath, or returned as a base64 string when returnImageAsBase64 is true. This mode:

  • 🖼️ Creates a PNG image of the spreadsheet data
  • ⚡ Skips LLM processing for faster execution
  • 💰 No API costs - doesn't require Gemini API key
  • 🎨 Highly customizable image generation options

3. LLM Analysis Mode (Optional)

Sends spreadsheet data to Google Gemini and returns model-generated analysis.

  • 🤖 Gemini analysis: AI-generated summary/insights
  • 🖼️ Optional visual context: Include rendered sheet image
  • 📋 Optional structured inputs: Include records and CSV
  • 🔑 Requires API key: GEMINI_API_KEY or geminiApiKey

All modes can be used independently based on your needs.

Processing Steps

Text Mode

  1. 📖 File Reading: Reads the XLSX file and extracts data
  2. 🔄 Format Conversion: Converts data to CSV and structured records
  3. 📝 Output: Returns deterministic text format

Image Output Mode

  1. 📖 File Reading: Reads the XLSX file and extracts data
  2. 🖼️ Image Generation: Creates a visual representation using Playwright
  3. 💾 Save Image: Saves the image to outputPath (skipped when returnImageAsBase64 is true)
  4. 📝 Output: Returns the image file path, or the base64 image string when returnImageAsBase64 is true
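One way to implement the image generation step is to build an HTML table and let a headless browser screenshot it. The sketch below makes that assumption. The package's real markup and styling are not documented, and escapeHtml and rowsToHtml are hypothetical names. It honors the maxRows, maxCols, fontSize, and cellPadding options.

```javascript
// Hypothetical sketch (the package's real markup may differ): builds an HTML
// page a headless browser could screenshot, honoring the table options.
function escapeHtml(value) {
  return String(value ?? "")
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function rowsToHtml(rows, { maxRows = 50, maxCols = 40, fontSize = 8, cellPadding = 2 } = {}) {
  const body = rows
    .slice(0, maxRows)
    .map((row) => {
      // Array.from visits holes in sparse rows, so empty cells still render.
      const cells = Array.from(row.slice(0, maxCols), (cell) => `<td>${escapeHtml(cell)}</td>`);
      return `<tr>${cells.join("")}</tr>`;
    })
    .join("\n");

  return `<!DOCTYPE html>
<html><head><style>
  table { border-collapse: collapse; font-size: ${fontSize}px; }
  td { border: 1px solid #999; padding: ${cellPadding}px; }
</style></head>
<body><table>
${body}
</table></body></html>`;
}
```

With Playwright, such a page could be loaded with `page.setContent(html)` and captured with `page.screenshot({ path, fullPage: true })`. When no path is given, `page.screenshot()` resolves to a Buffer, and `.toString("base64")` turns that into a base64 string.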

LLM Analysis Mode

  1. 📖 File Reading: Reads the XLSX file and extracts data
  2. 🔄 Format Conversion: Converts data to CSV and structured records
  3. 🖼️ Image Generation (optional): Creates visual representation when enabled
  4. 📤 LLM Processing: Sends selected formats to Gemini
  5. 📝 Output: Returns model-generated analysis
  6. 🧹 Cleanup: Removes temporary files
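Steps 3 and 4 of the LLM mode combine the selected formats into a single request. The sketch below assembles them as Gemini content parts, using text parts for the records and CSV and an inlineData part for the optional PNG. The part shapes follow the Gemini API. The labels, ordering, and the buildGeminiParts name are assumptions, not the package's actual prompt.

```javascript
// Hypothetical sketch: selected formats as Gemini content parts.
function buildGeminiParts({ records, csv, imageBase64 } = {}) {
  const parts = [];
  if (imageBase64) {
    // Optional visual context: the rendered sheet as an inline PNG.
    parts.push({ inlineData: { mimeType: "image/png", data: imageBase64 } });
  }
  if (records) parts.push({ text: `Records:\n${records}` });
  if (csv) parts.push({ text: `CSV:\n${csv}` });
  if (parts.length === 0) {
    throw new Error("Nothing to send: provide records, csv, or imageBase64");
  }
  return parts;
}
```

With the @google/genai SDK, such parts might be sent roughly as `ai.models.generateContent({ model: "gemini-2.5-pro", contents: [{ role: "user", parts }], config: { systemInstruction: systemPrompt } })`, where ai is a GoogleGenAI client. Check the SDK documentation for the current signature.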

Why Use Multiple Formats?

  • Records: Optimal for LLM processing and understanding
  • CSV: Familiar format for data validation and backup
  • Image: Helps LLMs understand complex layouts and spatial relationships

This combination addresses the limitations of traditional tabular data presentation to AI models.

Requirements

  • Node.js 18+ (ES modules support)
  • Gemini API key and internet connection only for mode: "llm"

Dependencies

  • @google/genai - Google Gemini AI integration
  • xlsx - Excel file parsing
  • canvas - Image rendering
  • playwright - Browser automation for screenshots
  • dotenv - Environment variable management

Error Handling

parseXlsx throws an Error in the cases listed under Throws, so you can catch failures and branch on the message:

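import parseXlsx from "llm-xlsx-parser";

try {
  // mode: "llm" so that a missing Gemini API key is reported as an error
  const result = await parseXlsx("file.xlsx", "output.txt", { mode: "llm" });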
  console.log("Success:", result);
} catch (error) {
  if (error.message.includes("Gemini API key")) {
    console.error("API key issue:", error.message);
  } else if (error.message.includes("XLSX")) {
    console.error("File reading issue:", error.message);
  } else {
    console.error("General error:", error.message);
  }
}

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

License

ISC

Support

For issues and questions:

  • Create an issue on GitHub
  • Check the documentation
  • Review the example code

Note: Gemini is optional and only used in mode: "llm". The default mode: "text" path performs local conversion without LLM dependencies.