pdf-texty

v2.0.0

Published

10 months ago

A React/Next.js library for uploading, parsing, and extracting text content from PDF files


aniket_ap

pdf parser text extract react nextjs component upload viewer

pdf-texty

A lightweight React/Next.js library for uploading, parsing, and extracting text content from PDF files.

Features

📄 Easy PDF uploading with drag & drop support
🔍 Extract text content from PDFs
📝 Parse PDF metadata
📑 Extract page-by-page content
📱 Responsive design
⚛️ React components for displaying extracted data
🔧 Highly customizable with styling options
🚀 Compatible with Next.js and React

Installation

npm install pdf-texty
# or
yarn add pdf-texty

Usage

Basic Example

import React, { useState } from 'react';
import { PDFUploader, PDFViewer } from 'pdf-texty';

const PDFExtractor = () => {
  const [pdfData, setPdfData] = useState(null);
  
  const handlePDFProcessed = (result) => {
    console.log('PDF processed:', result);
    setPdfData(result.data);
  };
  
  return (
    <div>
      <h1>PDF Extractor</h1>
      <PDFUploader onPDFProcessed={handlePDFProcessed} />
      
      {pdfData && (
        <PDFViewer pdfData={pdfData} />
      )}
    </div>
  );
};

export default PDFExtractor;

Using with Next.js

// pages/pdf-extractor.js
import React, { useState } from 'react';
import dynamic from 'next/dynamic';

// Import the components with dynamic import to avoid SSR issues
const PDFUploader = dynamic(() => import('pdf-texty').then(mod => mod.PDFUploader), { ssr: false });
const PDFViewer = dynamic(() => import('pdf-texty').then(mod => mod.PDFViewer), { ssr: false });

export default function PDFExtractorPage() {
  const [pdfData, setPdfData] = useState(null);
  
  const handlePDFProcessed = (result) => {
    console.log('PDF processed:', result);
    setPdfData(result.data);
  };
  
  return (
    <div>
      <h1>PDF Extractor</h1>
      <PDFUploader onPDFProcessed={handlePDFProcessed} />
      
      {pdfData && (
        <PDFViewer pdfData={pdfData} />
      )}
    </div>
  );
}

Direct API Usage (Without Components)

import { parsePDF } from 'pdf-texty';

// In an async function
async function processPDF(file) {
  try {
    const result = await parsePDF(file, {
      extractMetadata: true,
      extractText: true,
      extractPages: true
    });
    
    console.log('PDF Text:', result.text);
    console.log('PDF Metadata:', result.metadata);
    console.log('PDF Pages:', result.pages);
    
    return result;
  } catch (error) {
    console.error('Error processing PDF:', error);
    return null; // Signal failure explicitly so callers can check for it
  }
}

// Usage with file input
document.getElementById('fileInput').addEventListener('change', async (e) => {
  const file = e.target.files[0];
  if (file) {
    const pdfData = await processPDF(file);
    if (pdfData) {
      // Do something with the data
    }
  }
});
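
Before handing a file to the parser, it can help to confirm that it really is a PDF, since a file input does not guarantee that. The sketch below checks for the `%PDF-` signature that PDF files begin with. `hasPdfSignature` and `looksLikePdf` are hypothetical helpers written for this example, not part of pdf-texty, and the check is a quick sanity test rather than full validation.

```javascript
// Hypothetical helpers, not part of pdf-texty.
// PDF files begin with the ASCII signature "%PDF-".
const PDF_SIGNATURE = [0x25, 0x50, 0x44, 0x46, 0x2d];

// Returns true when the byte array starts with the PDF signature.
function hasPdfSignature(bytes) {
  if (bytes.length < PDF_SIGNATURE.length) return false;
  return PDF_SIGNATURE.every((byte, i) => bytes[i] === byte);
}

// Reads only the first five bytes of a File or Blob, so large files
// are not loaded into memory just to be rejected.
async function looksLikePdf(file) {
  const head = file.slice(0, PDF_SIGNATURE.length);
  const bytes = new Uint8Array(await head.arrayBuffer());
  return hasPdfSignature(bytes);
}
```

In the file input handler above, `await looksLikePdf(file)` could guard the call to `processPDF(file)`.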

Component API

PDFUploader

A component for uploading and processing PDF files.

Props

| Prop | Type | Default | Description |
|------|------|---------|-------------|
| onPDFProcessed | Function | - | Callback function that receives the processed PDF data |
| options | Object | {} | Options for the PDF parser |
| style | Object | {} | Custom styles for the uploader |
| acceptMultiple | Boolean | false | Whether to accept multiple files |
| className | String | '' | Additional CSS class names |

Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| extractMetadata | Boolean | true | Whether to extract metadata |
| extractText | Boolean | true | Whether to extract text |
| extractPages | Boolean | true | Whether to extract individual pages |
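
Because every option defaults to true, only the options being switched off need to be passed. The sketch below shows one way the documented defaults could combine with a partial options object. `resolveOptions` is a hypothetical helper for illustration; how pdf-texty merges options internally is an assumption here.

```javascript
// Defaults copied from the Options table.
const DEFAULT_OPTIONS = Object.freeze({
  extractMetadata: true,
  extractText: true,
  extractPages: true,
});

// Hypothetical helper: fills in any option the caller left out.
function resolveOptions(options = {}) {
  return { ...DEFAULT_OPTIONS, ...options };
}

console.log(resolveOptions({ extractPages: false }));
// { extractMetadata: true, extractText: true, extractPages: false }
```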

PDFViewer

A component for displaying the extracted PDF data.

Props

| Prop | Type | Default | Description |
|------|------|---------|-------------|
| pdfData | Object | - | The processed PDF data |
| style | Object | {} | Custom styles for the viewer |
| showMetadata | Boolean | true | Whether to show metadata section |
| showText | Boolean | true | Whether to show the full text section |
| showPages | Boolean | true | Whether to show individual pages section |
| className | String | '' | Additional CSS class names |

PDFParser

A class for parsing PDF files.

import { PDFParser } from 'pdf-texty';

const parser = new PDFParser({
  extractMetadata: true,
  extractText: true,
  extractPages: true
});

// Run inside an async function, or in an ES module that supports top-level await
const result = await parser.parsePDF(file);
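
When several files need parsing at once, one unreadable file should not discard the results of the others. The sketch below uses `Promise.allSettled` to separate successes from failures. `parseAll` is a hypothetical helper, not part of pdf-texty, and it takes the parse function as an argument, so it makes no assumption about the library beyond a promise-returning parse call.

```javascript
// Hypothetical helper, not part of pdf-texty.
// `parse` is any function that takes a file and returns a promise,
// for example (file) => parser.parsePDF(file).
async function parseAll(files, parse) {
  // Wrapping in an async arrow turns synchronous throws into rejections too.
  const settled = await Promise.allSettled(
    Array.from(files, async (file) => parse(file))
  );

  const parsed = [];
  const failed = [];
  settled.forEach((outcome, i) => {
    if (outcome.status === 'fulfilled') {
      parsed.push({ file: files[i], result: outcome.value });
    } else {
      failed.push({ file: files[i], error: outcome.reason });
    }
  });
  return { parsed, failed };
}
```

With the parser above, a call could look like `parseAll(fileInput.files, (file) => parser.parsePDF(file))`, where `fileInput` is an assumed reference to a file input element.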

Styling

Both PDFUploader and PDFViewer accept a style prop for customization. The style object is keyed by the part of the component being styled (container, button, and header in the example below), and each entry overrides the corresponding default styles.

Example:

<PDFUploader
  onPDFProcessed={handlePDFProcessed}
  style={{
    container: {
      border: '2px dashed #3498db',
      backgroundColor: '#ecf0f1'
    },
    button: {
      backgroundColor: '#3498db'
    }
  }}
/>

<PDFViewer
  pdfData={pdfData}
  style={{
    container: {
      maxWidth: '800px',
      border: '1px solid #3498db'
    },
    header: {
      backgroundColor: '#3498db',
      color: 'white'
    }
  }}
/>
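
To make the override behavior concrete, the sketch below merges a style object into a set of defaults one key at a time. This README does not state whether pdf-texty merges per key or replaces a whole entry, so the per-key merge is an assumption. `defaultStyles` and `mergeStyles` are hypothetical names, and the default values are invented for illustration.

```javascript
// Hypothetical defaults, for illustration only; the real defaults live in the library.
const defaultStyles = {
  container: { border: '1px solid #ccc', padding: '16px' },
  button: { backgroundColor: '#333', color: 'white' },
};

// Merges overrides into defaults one key at a time, so overriding
// button.backgroundColor keeps the default button.color.
function mergeStyles(defaults, overrides = {}) {
  const merged = { ...defaults };
  for (const [key, value] of Object.entries(overrides)) {
    merged[key] = { ...defaults[key], ...value };
  }
  return merged;
}

const styles = mergeStyles(defaultStyles, {
  button: { backgroundColor: '#3498db' },
});
console.log(styles.button);
// { backgroundColor: '#3498db', color: 'white' }
```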

Browser Support

pdf-texty supports all modern browsers, including:

Chrome (latest)
Firefox (latest)
Safari (latest)
Edge (latest)

Dependencies

This package depends on:

pdfjs-dist - For PDF parsing
React >= 16.8.0

Development

To contribute to this project:

Clone the repository: git clone https://github.com/yourusername/pdf-texty.git
Install dependencies: npm install or yarn
Build the package: npm run build or yarn build
Run tests: npm test or yarn test

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
