@shilendra-dev/pdf-to-json

v1.0.4

Convert PDF documents to structured JSON with text content, formatting, and hyperlinks

PDF to JSON Converter

A TypeScript utility that converts PDF documents into structured JSON data while preserving text content, formatting, and hyperlinks. Perfect for resume parsing, document analysis, and content extraction workflows.

✨ Features

  • Text Extraction: Extract text content with precise positioning and styling
  • Hyperlink Detection: Capture clickable links with their coordinates and target URLs
  • Font Preservation: Preserve font information for each text element
  • Multi-page Support: Process documents of any length
  • Type Safety: Built with TypeScript for a better development experience
  • Lightweight: Minimal dependencies

📦 Installation

Prerequisites

Make sure you have the following installed on your system:

  • Node.js (v16 or higher)
  • npm (v7 or higher) or yarn

Install the package

Using npm:

npm install @shilendra-dev/pdf-to-json

Or using yarn:

yarn add @shilendra-dev/pdf-to-json

Peer Dependencies

This package declares the following peer dependencies. npm v7 and later installs peer dependencies automatically; with yarn, you may need to add them yourself:

  • pdfjs-dist: ^3.4.120 (PDF.js library for PDF parsing)
  • @types/node: ^18.0.0 (TypeScript types for Node.js)

🚀 Usage

import { pdfToJson } from '@shilendra-dev/pdf-to-json';
import fs from 'fs/promises';

async function convertPdfToJson() {
  try {
    // Read PDF file
    const pdfBuffer = await fs.readFile('path/to/your/document.pdf');

    // Convert to JSON
    const result = await pdfToJson(pdfBuffer, {
      outputPath: 'output.json'  // Optional: Path to save the JSON output
    });

    console.log('Conversion complete!');
    console.log(`Processed ${result.numPages} pages`);
  } catch (error) {
    console.error('Error converting PDF:', error);
  }
}

convertPdfToJson();

📝 API

pdfToJson(pdfSource: Buffer | string, options?: PdfToJsonOptions): Promise<PdfJsonResult>

Converts a PDF document to JSON.

Parameters:

  • pdfSource: PDF file as Buffer or file path
  • options: (Optional) Configuration options
    • outputPath: (string) Path to save the JSON output file
    • includeTextContent: (boolean) Whether to include raw text content (default: true)
    • includeStyles: (boolean) Whether to include font and style information (default: true)
    • includeLinks: (boolean) Whether to include hyperlinks (default: true)

Returns: Promise that resolves to the parsed PDF data
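
The option defaults listed above can be written down as a small sketch. The `PdfToJsonOptions` interface here is a local mirror of the documented fields, and `resolveOptions` is a hypothetical helper used only for illustration; neither is confirmed to match what the package exports or how it merges options internally.

```typescript
// Local mirror of the documented options; the package's own exported type may differ.
interface PdfToJsonOptions {
  outputPath?: string;
  includeTextContent?: boolean;
  includeStyles?: boolean;
  includeLinks?: boolean;
}

// Hypothetical helper (not part of the package): fills in the documented defaults,
// then lets any caller-supplied value override them.
function resolveOptions(options: PdfToJsonOptions = {}): PdfToJsonOptions {
  return {
    includeTextContent: true,
    includeStyles: true,
    includeLinks: true,
    ...options,
  };
}

// Ask for hyperlinks only; includeLinks keeps its documented default of true.
const linksOnly = resolveOptions({ includeTextContent: false, includeStyles: false });
console.log(linksOnly);
```

An options object of the same shape is what would be passed as the second argument to `pdfToJson`.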

📂 Output Format

The converter generates a JSON object with the following structure:

{
  numPages: number;
  pages: Array<{
    pageNumber: number;
    width: number;
    height: number;
    items: Array<{
      type: 'text' | 'link';
      content: string;
      x: number;
      y: number;
      width: number;
      height: number;
      fontFamily?: string;
      fontSize?: number;
      color?: string;
      url?: string;  // For links
    }>;
  }>;
}
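
As a rough illustration of working with this structure, the sketch below declares local interfaces that mirror the shape above and collects every hyperlink in a result. The names `PdfItem`, `PdfPage`, `FoundLink`, and `collectLinks` are assumptions made for this example, and the sample data is invented; the package may export its own types under different names.

```typescript
// Local mirrors of the documented output shape (names are illustrative).
interface PdfItem {
  type: 'text' | 'link';
  content: string;
  x: number;
  y: number;
  width: number;
  height: number;
  fontFamily?: string;
  fontSize?: number;
  color?: string;
  url?: string; // For links
}

interface PdfPage {
  pageNumber: number;
  width: number;
  height: number;
  items: PdfItem[];
}

interface PdfJsonResult {
  numPages: number;
  pages: PdfPage[];
}

interface FoundLink {
  pageNumber: number;
  text: string;
  url: string;
}

// Hypothetical helper: walks every page and keeps items that are links with a URL.
function collectLinks(result: PdfJsonResult): FoundLink[] {
  const links: FoundLink[] = [];
  for (const page of result.pages) {
    for (const item of page.items) {
      if (item.type === 'link' && item.url !== undefined) {
        links.push({ pageNumber: page.pageNumber, text: item.content, url: item.url });
      }
    }
  }
  return links;
}

// Invented sample data in the documented shape, standing in for a real result.
const sample: PdfJsonResult = {
  numPages: 1,
  pages: [
    {
      pageNumber: 1,
      width: 612,
      height: 792,
      items: [
        { type: 'text', content: 'Jane Doe', x: 72, y: 72, width: 80, height: 14, fontSize: 14 },
        { type: 'link', content: 'Portfolio', x: 72, y: 96, width: 60, height: 12, url: 'https://example.com' },
      ],
    },
  ],
};

console.log(collectLinks(sample));
```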

🔍 Example

import { pdfToJson } from '@shilendra-dev/pdf-to-json';

// Convert PDF from URL
// Note: the global fetch API requires Node.js 18+, and top-level await requires an ES module.
const response = await fetch('https://example.com/document.pdf');
const pdfBuffer = await response.arrayBuffer();
const result = await pdfToJson(Buffer.from(pdfBuffer));

// Process the extracted data
result.pages.forEach(page => {
  console.log(`Page ${page.pageNumber} (${page.width}x${page.height}):`);
  console.log(`- Contains ${page.items.length} items`);
});
});
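
For workflows such as resume parsing, a common next step is to merge positioned items back into lines of text. The following is a minimal sketch that assumes items on the same line share nearly the same `y` value. `groupIntoLines`, `TextItem`, and the `tolerance` parameter are hypothetical and not part of the package. The README does not state where the coordinate origin is, so the ascending-`y` order used here may need reversing if `y` is measured from the bottom of the page.

```typescript
// Minimal subset of the documented item fields needed for line grouping.
interface TextItem {
  content: string;
  x: number;
  y: number;
}

interface Line {
  y: number; // y value of the first item placed on this line
  items: TextItem[];
}

// Hypothetical helper: groups items whose y lies within `tolerance` of the
// line's first item, then orders each line left to right by x.
function groupIntoLines(items: TextItem[], tolerance = 2): string[] {
  const sorted = [...items].sort((a, b) => a.y - b.y || a.x - b.x);
  const lines: Line[] = [];

  for (const item of sorted) {
    const current = lines[lines.length - 1];
    if (current && Math.abs(current.y - item.y) <= tolerance) {
      current.items.push(item);
    } else {
      lines.push({ y: item.y, items: [item] });
    }
  }

  return lines.map(line =>
    line.items
      .sort((a, b) => a.x - b.x)
      .map(item => item.content)
      .join(' ')
  );
}

// Invented sample items; in practice these would come from page.items.
const lines = groupIntoLines([
  { content: 'World', x: 60, y: 100.5 },
  { content: 'Hello', x: 10, y: 100 },
  { content: 'Second', x: 10, y: 120 },
]);
console.log(lines);
```

Anchoring each line to its first item keeps the grouping simple; a document with slanted or multi-column text would need a more careful layout pass.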

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


Made with ❤️ by Shilendra Singh