node-pptx-parser

v1.0.1

Published

a year ago

A PowerPoint (PPTX) parser that extracts text content with preserved formatting

Vulnerabilities: 0 High, 0 Medium, 0 Low

mirza-glitch

pptx powerpoint parser text extract

node-pptx-parser

A Node.js library for parsing PowerPoint (PPTX) files and extracting text content. This library maintains text formatting, line breaks, and paragraph structures from the original presentation.

Features

Extract text content from PPTX files with preserved formatting
Parse PPTX structure into manageable JavaScript objects
Access raw XML content of presentation components
Written in TypeScript for type safety
Promise-based API
Preserves line breaks and paragraph formatting
Minimal dependencies

Installation


npm install node-pptx-parser

Usage

Once the package is installed, you can load it with either an import or a require statement:

// ESM import:
import PptxParser from "node-pptx-parser";

// CommonJS require:
const PptxParser = require("node-pptx-parser").default;

Basic Text Extraction

import PptxParser from "node-pptx-parser";

async function main() {
  const parser = new PptxParser("presentation.pptx");

  try {
    // Extract text from all slides
    const textContent = await parser.extractText();

    // Print text from each slide
    textContent.forEach((slide) => {
      console.log(`\nSlide ${slide.id}:`);

      console.log(slide.text.join("\n"));
    });
  } catch (error) {
    console.error("Error:", error.message);
  }
}

main();
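
Since `extractText()` resolves to an array of objects with an `id` and a `text` array, ordinary array methods are enough for post-processing. The sketch below is illustrative and not part of the library: `findSlidesContaining` is a hypothetical helper, and the sample data (including the `slide1`-style ids) is hand-written rather than taken from a real file.

```javascript
// Hypothetical helper (not part of node-pptx-parser): returns the ids of
// slides whose extracted text contains `term`, ignoring case.
// `slides` is assumed to have the shape returned by extractText():
// an array of objects with `id` (string) and `text` (string[]).
function findSlidesContaining(slides, term) {
  const needle = term.toLowerCase();
  return slides
    .filter((slide) =>
      slide.text.some((line) => line.toLowerCase().includes(needle))
    )
    .map((slide) => slide.id);
}

// Hand-written sample data standing in for a real extractText() result.
// The id format is an assumption made for illustration.
const sampleSlides = [
  { id: "slide1", text: ["Quarterly Report", "Revenue grew this quarter"] },
  { id: "slide2", text: ["Roadmap", "Next steps"] },
  { id: "slide3", text: ["Appendix", "Revenue by region"] },
];

// Resulting array: ["slide1", "slide3"]
console.log(findSlidesContaining(sampleSlides, "revenue"));
```

In a real script, the array passed in would be the value resolved by `await parser.extractText()`.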

Advanced Usage - Full Presentation Parsing

import PptxParser from "node-pptx-parser";

async function main() {
  const parser = new PptxParser("presentation.pptx");

  try {
    // Get complete parsed presentation content
    const parsedContent = await parser.parse();

    // Access presentation structure
    console.log(parsedContent.presentation.parsed);

    // Access individual slides
    parsedContent.slides.forEach((slide) => {
      console.log(`Slide ${slide.id}:`, slide.parsed);
    });

    // Access raw XML if needed
    console.log(parsedContent.presentation.xml);
  } catch (error) {
    console.error("Error:", error.message);
  }
}

main();
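
The object returned by `parse()` can be reduced to a quick overview before digging into the parsed XML. The following is a minimal sketch, assuming only the `presentation`, `relationships`, and `slides` fields documented under Types; `summarizePresentation` is a hypothetical helper, and the sample paths and XML strings are placeholders rather than real library output.

```javascript
// Hypothetical helper (not part of node-pptx-parser): builds a small
// summary from an object shaped like ParsedPresentation.
function summarizePresentation(parsedContent) {
  return {
    slideCount: parsedContent.slides.length,
    slideIds: parsedContent.slides.map((slide) => slide.id),
    // Total characters of raw XML in the presentation part plus all slides.
    xmlLength:
      parsedContent.presentation.xml.length +
      parsedContent.slides.reduce((sum, slide) => sum + slide.xml.length, 0),
  };
}

// Placeholder data for illustration only; real paths, ids, and XML
// come from the PPTX file being parsed.
const sampleParsed = {
  presentation: { path: "ppt/presentation.xml", xml: "<p:presentation/>", parsed: {} },
  relationships: { path: "ppt/_rels/presentation.xml.rels", xml: "<Relationships/>", parsed: {} },
  slides: [
    { id: "slide1", path: "ppt/slides/slide1.xml", xml: "<p:sld/>", parsed: {} },
    { id: "slide2", path: "ppt/slides/slide2.xml", xml: "<p:sld/>", parsed: {} },
  ],
};

console.log(summarizePresentation(sampleParsed));
```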

API Reference

`PptxParser`

The main class for parsing PPTX files.

Constructor


constructor(filePath: string)

Creates a new instance of PptxParser.

filePath: Path to the PPTX file to be parsed
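
A PPTX file is a ZIP archive, so a cheap sanity check on the file's first bytes can catch an obviously wrong input before the path is handed to the constructor. This is a sketch of such a pre-check and not something the library is known to do itself; `hasZipSignature` and `ZIP_SIGNATURE` are hypothetical names.

```javascript
// Hypothetical pre-check, not part of node-pptx-parser.
// ZIP archives normally begin with the local file header signature
// 0x50 0x4B 0x03 0x04 ("PK\x03\x04").
const ZIP_SIGNATURE = [0x50, 0x4b, 0x03, 0x04];

// `bytes` is any array-like of byte values, such as a Buffer or
// Uint8Array holding the start of the file.
function hasZipSignature(bytes) {
  return (
    bytes.length >= ZIP_SIGNATURE.length &&
    ZIP_SIGNATURE.every((value, index) => bytes[index] === value)
  );
}

// Sample inputs: a ZIP-style header, and the first bytes of an OLE
// compound file (the container used by the legacy binary .ppt format).
const zipLikeBytes = Uint8Array.of(0x50, 0x4b, 0x03, 0x04, 0x14, 0x00);
const oleLikeBytes = Uint8Array.of(0xd0, 0xcf, 0x11, 0xe0);

console.log(hasZipSignature(zipLikeBytes));
console.log(hasZipSignature(oleLikeBytes));
```

In a real script the bytes could come from `fs.readFileSync(filePath).subarray(0, 4)`. Passing this check only means the file looks like a ZIP archive, not that it is a valid presentation.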

Methods

`parse()`


async parse(): Promise<ParsedPresentation>

Parses the entire PPTX file and returns its content.

Returns: Promise resolving to a ParsedPresentation object containing the complete presentation structure

`extractText()`


async extractText(): Promise<SlideTextContent[]>

Extracts formatted text content from all slides.

Returns: Promise resolving to an array of SlideTextContent objects

Types

`ParsedPresentation`

interface ParsedPresentation {
  presentation: {
    path: string;
    xml: string;
    parsed: any;
  };
  relationships: {
    path: string;
    xml: string;
    parsed: any;
  };
  slides: ParsedSlide[];
}

`ParsedSlide`

interface ParsedSlide {
  id: string;
  path: string;
  xml: string;
  parsed: any;
}

`SlideTextContent`

interface SlideTextContent extends ParsedSlide {
  text: string[];
}
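
To make the `SlideTextContent` shape concrete, the sketch below counts words per slide from the `text` array. It relies only on the `id` and `text` fields declared above; `wordCountsBySlide` is a hypothetical helper and the sample objects are hand-written, with the other `ParsedSlide` fields left out for brevity.

```javascript
// Hypothetical helper (not part of node-pptx-parser): maps each slide id
// to the number of whitespace-separated words in its `text` lines.
function wordCountsBySlide(slides) {
  const counts = {};
  for (const slide of slides) {
    counts[slide.id] = slide.text
      .join(" ")
      .split(/\s+/)
      .filter((word) => word.length > 0).length;
  }
  return counts;
}

// Hand-written sample objects; only `id` and `text` are filled in.
const sampleTextSlides = [
  { id: "slide1", text: ["Hello world", "Second line here"] },
  { id: "slide2", text: [] },
];

// slide1 has 5 words, slide2 has 0.
console.log(wordCountsBySlide(sampleTextSlides));
```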

Error Handling

The library throws errors in the following cases:

Invalid PPTX file structure
File reading errors
XML parsing errors

Example error handling:

try {
  const parser = new PptxParser("presentation.pptx");
  const content = await parser.extractText();
} catch (error) {
  if (error.message.includes("Invalid PPTX file structure")) {
    console.error("The PPTX file is corrupted or invalid");
  } else {
    console.error("An error occurred:", error.message);
  }
}
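
The same message check can be pulled into a small function so that calling code branches on a category rather than on string matching. This is a sketch under stated assumptions: the `"Invalid PPTX file structure"` text comes from the example above, while the `ENOENT` branch assumes that a missing file surfaces as a standard Node.js file-system error, which this README does not confirm. `classifyPptxError` and its category names are hypothetical.

```javascript
// Hypothetical helper (not part of node-pptx-parser): maps a caught
// error to a coarse category string.
function classifyPptxError(error) {
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes("Invalid PPTX file structure")) {
    return "invalid-structure";
  }
  // ENOENT is the standard Node.js code for a missing file. Whether the
  // library passes that error through unchanged is an assumption.
  if (error && error.code === "ENOENT") {
    return "file-not-found";
  }
  return "unknown";
}

// Sample errors constructed by hand for illustration.
const structureError = new Error("Invalid PPTX file structure");
const missingFileError = Object.assign(new Error("no such file"), { code: "ENOENT" });

console.log(classifyPptxError(structureError));
console.log(classifyPptxError(missingFileError));
console.log(classifyPptxError("something else"));
```

Inside a `catch (error)` block, the returned category can drive a `switch` statement that picks the message shown to the user.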

Dependencies

unzipper: For extracting PPTX files
xml2js: For parsing XML content

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
