@xcvzmoon/document-metadata-extractor

v1.1.0

Metadata extractor for document files

Document Metadata Extractor

A TypeScript library that provides a unified interface for extracting metadata from PDFs, images, Excel files, Word documents, and PowerPoint presentations.

Overview

The library wraps a specialized parser for each document format and exposes them all through a single function:

  • PDF: unpdf, for PDF metadata and page counts
  • Images: exiftool-vendored, for EXIF and other image metadata
  • Excel: xlsx, for spreadsheet metadata, sheet information, and document properties
  • DOCX/PPTX: jszip and @xmldom/xmldom, for parsing Office Open XML documents and reading their core and application properties

Installation

npm install @xcvzmoon/document-metadata-extractor
# or
pnpm add @xcvzmoon/document-metadata-extractor
# or
yarn add @xcvzmoon/document-metadata-extractor
# or
bun add @xcvzmoon/document-metadata-extractor

Usage

import { getMetadata } from '@xcvzmoon/document-metadata-extractor';
import { readFile } from 'fs/promises';

// Read the file into a Buffer (top-level await requires an ES module)
const fileBuffer = await readFile('document.pdf');

// Extract metadata
const metadata = await getMetadata(fileBuffer, { target: 'pdf' });
console.log(metadata);
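
When the document type is not known up front, the target option still has to be chosen before calling getMetadata. Below is a minimal sketch, assuming the file extension can be trusted. The helper targetFromPath and its extension table are hypothetical and not part of this library, and the table covers only a handful of common extensions.

```typescript
// The five target strings documented in the API section of this README.
type Target = 'image' | 'pdf' | 'docx' | 'excel' | 'pptx';

// Hypothetical extension table (an assumption, not something the library ships).
const TARGET_BY_EXTENSION: Record<string, Target> = {
  '.pdf': 'pdf',
  '.jpg': 'image',
  '.jpeg': 'image',
  '.png': 'image',
  '.xlsx': 'excel',
  '.docx': 'docx',
  '.pptx': 'pptx',
};

// Hypothetical helper: maps a file path to a target, or undefined when
// the path has no extension or the extension is not in the table.
function targetFromPath(filePath: string): Target | undefined {
  const dot = filePath.lastIndexOf('.');
  if (dot < 0) return undefined;
  // Compare case-insensitively so "REPORT.PDF" and "report.pdf" agree.
  return TARGET_BY_EXTENSION[filePath.slice(dot).toLowerCase()];
}
```

A caller would check the result for undefined and only then pass it as the target option, since getMetadata is documented to accept just the five target strings.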

Supported Document Types

PDF

Extracts PDF metadata including title, author, subject, creator, producer, creation date, modification date, and page count.

const metadata = await getMetadata(pdfBuffer, { target: 'pdf' });
// Returns: PdfMetadata with pages, title, author, subject, creator, producer, creationDate, modificationDate

Images

Extracts EXIF and image metadata using ExifTool. Returns all available tags from the image file.

const metadata = await getMetadata(imageBuffer, { target: 'image' });
// Returns: All ExifTool tags for the image

Excel

Extracts spreadsheet metadata including sheet names, sheet count, row/column counts, author, last modified by, creation/modification dates, company, and file size.

const metadata = await getMetadata(excelBuffer, { target: 'excel' });
// Returns: ExcelMetadata with sheets, sheetCount, rows, columns, author, lastModifiedBy, created, modified, company, fileSize

DOCX

Extracts Word document metadata including title, subject, creator, keywords, description, last modified by, revision, creation/modification dates, category, company, page count, word count, character count, and file size.

const metadata = await getMetadata(docxBuffer, { target: 'docx' });
// Returns: DocxMetadata with title, subject, creator, keywords, description, lastModifiedBy, revision, created, modified, category, company, pageCount, wordCount, characterCount, fileSize

PPTX

Extracts PowerPoint presentation metadata through the same extraction method used for DOCX files, so the result has the same DocxMetadata structure.

const metadata = await getMetadata(pptxBuffer, { target: 'pptx' });
// Returns: DocxMetadata (same structure as DOCX)
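
DOCX and PPTX files are ZIP containers, which is why the Overview lists jszip for them. A cheap sanity check before extraction, sketched here under that assumption, is to look for the ZIP local file header signature at the start of the data. The helper looksLikeZip is hypothetical and not part of this library. It cannot tell DOCX from PPTX, nor confirm the archive is a valid Office document; it only catches obviously wrong input.

```typescript
// Local file header signature that opens a non-empty ZIP archive: "PK\x03\x04".
const ZIP_SIGNATURE = [0x50, 0x4b, 0x03, 0x04];

// Hypothetical helper: true when the data starts with the ZIP signature.
// Node's Buffer is a Uint8Array subclass, so a Buffer from readFile
// can be passed in directly.
function looksLikeZip(data: Uint8Array): boolean {
  if (data.length < ZIP_SIGNATURE.length) return false;
  return ZIP_SIGNATURE.every((byte, index) => data[index] === byte);
}
```

Running this check first lets a caller reject, say, a PDF that was mislabeled as a Word document before handing it to the docx or pptx target.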

API

getMetadata(data: Buffer, options: { target: 'image' | 'pdf' | 'docx' | 'excel' | 'pptx' })

Extracts metadata from a document buffer based on the specified target type.

Parameters:

  • data: A Buffer containing the document file data
  • options.target: The document type to extract metadata from

Returns:

  • A Promise that resolves to the metadata type matching the target:
    • PdfMetadata for PDF files
    • ExifTool tags object for images
    • ExcelMetadata for Excel files
    • DocxMetadata for DOCX and PPTX files
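
This README does not say how getMetadata reports failures. Assuming it rejects its Promise on unreadable or mismatched input, a caller may prefer a result value over a try/catch at every call site. The sketch below takes the extractor as a parameter so it stays self-contained; tryExtract, ExtractionResult, and Extractor are hypothetical names, not exports of this library.

```typescript
// The five target strings documented in the API section of this README.
type Target = 'image' | 'pdf' | 'docx' | 'excel' | 'pptx';

// Hypothetical result shape: either the metadata or the error that occurred.
type ExtractionResult<T> =
  | { ok: true; metadata: T }
  | { ok: false; error: Error };

// Any function with the documented call shape: (data, { target }) => Promise.
type Extractor<D, T> = (data: D, options: { target: Target }) => Promise<T>;

// Hypothetical wrapper: runs the extractor and converts a rejection
// into an { ok: false } result instead of letting it propagate.
async function tryExtract<D, T>(
  extract: Extractor<D, T>,
  data: D,
  target: Target,
): Promise<ExtractionResult<T>> {
  try {
    return { ok: true, metadata: await extract(data, { target }) };
  } catch (cause) {
    // Normalize non-Error throwables so callers always get an Error.
    const error = cause instanceof Error ? cause : new Error(String(cause));
    return { ok: false, error };
  }
}
```

Passing getMetadata as the extract argument would be the intended use. The static type that comes back depends on how the library types getMetadata, which this README does not show.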

Type Definitions

The library exports TypeScript type definitions for all metadata types:

  • PdfMetadata
  • ExcelMetadata
  • DocxMetadata

License

ISC

Author

Mon Albert Gamil