pdf-texty
v2.0.0
A React/Next.js library for uploading, parsing, and extracting text content from PDF files
A lightweight React/Next.js library for uploading, parsing, and extracting text content from PDF files.
Features
- 📄 Easy PDF uploading with drag & drop support
- 🔍 Extract text content from PDFs
- 📝 Parse PDF metadata
- 📑 Extract page-by-page content
- 📱 Responsive design
- ⚛️ React components for displaying extracted data
- 🔧 Highly customizable with styling options
- 🚀 Compatible with Next.js and React
Installation
npm install pdf-texty
# or
yarn add pdf-texty
Usage
Basic Example
import React, { useState } from 'react';
import { PDFUploader, PDFViewer } from 'pdf-texty';
const PDFExtractor = () => {
  const [pdfData, setPdfData] = useState(null);
  const handlePDFProcessed = (result) => {
    console.log('PDF processed:', result);
    setPdfData(result.data);
  };
  return (
    <div>
      <h1>PDF Extractor</h1>
      <PDFUploader onPDFProcessed={handlePDFProcessed} />
      {pdfData && <PDFViewer pdfData={pdfData} />}
    </div>
  );
};
export default PDFExtractor;
Using with Next.js
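Next.js pre-renders pages on the server, where browser globals such as `window` and `document` do not exist; that is why the example that follows loads the components with `ssr: false`. The same concern applies to any code you write around the library. Below is a minimal sketch of a client-only guard; `isBrowser` and `loadOnClient` are hypothetical helpers, not pdf-texty exports.

```javascript
// Hypothetical helpers (not part of pdf-texty): run browser-only code on the
// client and skip it during server-side rendering.
function isBrowser() {
  return typeof window !== 'undefined' && typeof document !== 'undefined';
}

async function loadOnClient(loader) {
  if (!isBrowser()) {
    // On the server (e.g. during Next.js pre-rendering) there is no DOM,
    // so resolve to null instead of loading browser-only code.
    return null;
  }
  return loader();
}

// Usage sketch inside a client-side event handler:
// const mod = await loadOnClient(() => import('pdf-texty'));
// if (mod) { const result = await mod.parsePDF(file); }
```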
// pages/pdf-extractor.js
import React, { useState } from 'react';
import dynamic from 'next/dynamic';
// Import the components with dynamic import to avoid SSR issues
const PDFUploader = dynamic(() => import('pdf-texty').then((mod) => mod.PDFUploader), { ssr: false });
const PDFViewer = dynamic(() => import('pdf-texty').then((mod) => mod.PDFViewer), { ssr: false });
export default function PDFExtractorPage() {
  const [pdfData, setPdfData] = useState(null);
  const handlePDFProcessed = (result) => {
    console.log('PDF processed:', result);
    setPdfData(result.data);
  };
  return (
    <div>
      <h1>PDF Extractor</h1>
      <PDFUploader onPDFProcessed={handlePDFProcessed} />
      {pdfData && <PDFViewer pdfData={pdfData} />}
    </div>
  );
}
Direct API Usage (Without Components)
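A well-formed PDF file begins with the header bytes `%PDF-`, so checking the first few bytes is a cheap way to reject obviously wrong uploads before handing them to `parsePDF`. The sketch below is illustrative only: `looksLikePdf` and `fileLooksLikePdf` are hypothetical names, not pdf-texty exports, and passing the check does not guarantee the file will parse.

```javascript
// Hypothetical helpers (not part of pdf-texty): check for the "%PDF-" header
// that a well-formed PDF file begins with.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-" in ASCII

function looksLikePdf(bytes) {
  if (!bytes || bytes.length < PDF_MAGIC.length) return false;
  return PDF_MAGIC.every((value, i) => bytes[i] === value);
}

// Read only the first bytes of a File (or Blob) instead of the whole upload.
async function fileLooksLikePdf(file) {
  const head = await file.slice(0, PDF_MAGIC.length).arrayBuffer();
  return looksLikePdf(new Uint8Array(head));
}
```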
import { parsePDF } from 'pdf-texty';
// Parse a PDF File object and log the extracted data
async function processPDF(file) {
  try {
    const result = await parsePDF(file, {
      extractMetadata: true,
      extractText: true,
      extractPages: true
    });
    console.log('PDF Text:', result.text);
    console.log('PDF Metadata:', result.metadata);
    console.log('PDF Pages:', result.pages);
    return result;
  } catch (error) {
    console.error('Error processing PDF:', error);
    return null; // signal failure to the caller
  }
}
// Usage with a file input, e.g. <input type="file" id="fileInput" accept="application/pdf" />
document.getElementById('fileInput').addEventListener('change', async (e) => {
  const file = e.target.files[0];
  if (file) {
    const pdfData = await processPDF(file);
    if (pdfData) {
      // Do something with the data
    }
  }
});
Component API
PDFUploader
A component for uploading and processing PDF files.
Props
| Prop | Type | Default | Description |
|------|------|---------|-------------|
| onPDFProcessed | Function | - | Callback function that receives the processed PDF data |
| options | Object | {} | Options for the PDF parser |
| style | Object | {} | Custom styles for the uploader |
| acceptMultiple | Boolean | false | Whether to accept multiple files |
| className | String | '' | Additional CSS class names |
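With `acceptMultiple` enabled, several files may need parsing at once. How the component handles this internally is not documented here, so the following is an assumed approach using the direct API: start every parse, then collect successes and failures separately with `Promise.allSettled`, so one corrupt file does not discard the rest. `parseAll` is a hypothetical helper; in the browser you would pass `parsePDF` as `parse` and a `FileList` from `<input type="file" multiple>` as `files`.

```javascript
// Hypothetical helper (not part of pdf-texty): parse several files and keep
// per-file outcomes. `parse` is injected so the helper stays testable.
async function parseAll(files, parse, options = {}) {
  const list = Array.from(files); // works for arrays and FileList
  // Start every parse at once; wrapping in .then() turns synchronous throws into rejections.
  const settled = await Promise.allSettled(
    list.map((file) => Promise.resolve().then(() => parse(file, options)))
  );
  const results = [];
  const errors = [];
  settled.forEach((outcome, i) => {
    if (outcome.status === 'fulfilled') {
      results.push({ file: list[i], data: outcome.value });
    } else {
      errors.push({ file: list[i], error: outcome.reason });
    }
  });
  return { results, errors };
}
```

Starting all parses together is the simplest option; for many large PDFs, a sequential `for...of` loop with `await` would keep memory use lower.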
Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| extractMetadata | Boolean | true | Whether to extract metadata |
| extractText | Boolean | true | Whether to extract text |
| extractPages | Boolean | true | Whether to extract individual pages |
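Per the table, all three options default to `true`, so the empty `options` object extracts everything and you only need to list the parts you want to turn off. The sketch below mirrors those defaults to show how a partial options object resolves; `resolveOptions` is an illustration, not the library's own code.

```javascript
// Defaults copied from the options table above.
const DEFAULT_OPTIONS = Object.freeze({
  extractMetadata: true,
  extractText: true,
  extractPages: true
});

// Illustrative only: fill in any option the caller left out.
function resolveOptions(options = {}) {
  return { ...DEFAULT_OPTIONS, ...options };
}

// Example: skip per-page extraction, keep everything else.
const textOnly = resolveOptions({ extractPages: false });
// textOnly → { extractMetadata: true, extractText: true, extractPages: false }
```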
PDFViewer
A component for displaying the extracted PDF data.
Props
| Prop | Type | Default | Description |
|------|------|---------|-------------|
| pdfData | Object | - | The processed PDF data |
| style | Object | {} | Custom styles for the viewer |
| showMetadata | Boolean | true | Whether to show metadata section |
| showText | Boolean | true | Whether to show the full text section |
| showPages | Boolean | true | Whether to show individual pages section |
| className | String | '' | Additional CSS class names |
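Each `show*` prop defaults to `true`, so the viewer renders all three sections unless told otherwise. As a rough sketch of the decision these props drive (the section names and the `visibleSections` helper are hypothetical, and the order shown is arbitrary):

```javascript
// Hypothetical sketch: which PDFViewer sections the show* props enable.
// Defaults match the props table (all true); section order here is arbitrary.
function visibleSections({ showMetadata = true, showText = true, showPages = true } = {}) {
  const sections = [];
  if (showMetadata) sections.push('metadata');
  if (showText) sections.push('text');
  if (showPages) sections.push('pages');
  return sections;
}

visibleSections({});                                        // ['metadata', 'text', 'pages']
visibleSections({ showMetadata: false, showPages: false }); // ['text']
```

In JSX, the second case corresponds to `<PDFViewer pdfData={pdfData} showMetadata={false} showPages={false} />`, which shows only the full-text section.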
PDFParser
A class for parsing PDF files.
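Because a `PDFParser` is configured once in its constructor, one instance can be reused for many files. If the same file may be submitted more than once, a small cache in front of the instance's `parsePDF(file)` method avoids repeating the work. This is a hypothetical sketch: `createCachedParse` is not part of pdf-texty, and the cache key (name, size and `lastModified`) is an assumption that could collide for different files with identical values.

```javascript
// Hypothetical wrapper (not part of pdf-texty): memoizes parse results per file.
// `parse` is any async (file) => result function, e.g. (f) => parser.parsePDF(f).
function createCachedParse(parse) {
  const cache = new Map();
  return function cachedParse(file) {
    // Assumed identity: name + size + lastModified (File objects expose all three).
    const key = `${file.name}:${file.size}:${file.lastModified}`;
    if (!cache.has(key)) {
      // Store the promise itself so concurrent calls share one parse.
      const pending = Promise.resolve(parse(file)).catch((error) => {
        cache.delete(key); // do not cache failures; allow a retry
        throw error;
      });
      cache.set(key, pending);
    }
    return cache.get(key);
  };
}
```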
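import { PDFParser } from 'pdf-texty';
const parser = new PDFParser({
  extractMetadata: true,
  extractText: true,
  extractPages: true
});
// parsePDF returns a promise: call it inside an async function (or an ES module with top-level await)
const result = await parser.parsePDF(file);
Styling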
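Both the PDFUploader and PDFViewer components accept a style prop for customization. Its keys name the component's elements (for example, container and button on the uploader, container and header on the viewer), and the styles under each key override that element's default styles.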
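One reasonable mental model, assumed here rather than taken from the library source, is a per-element shallow merge: for each key such as `container`, the properties you pass replace the matching default properties and the rest are kept. The hypothetical `mergeStyles` below sketches that behavior with made-up default values.

```javascript
// Hypothetical sketch of per-element style overrides (not pdf-texty's code).
// Each top-level key names an element; its properties are merged over the defaults.
function mergeStyles(defaults, overrides = {}) {
  const merged = {};
  const keys = new Set([...Object.keys(defaults), ...Object.keys(overrides)]);
  for (const key of keys) {
    merged[key] = { ...defaults[key], ...overrides[key] };
  }
  return merged;
}

// Made-up defaults for illustration only.
const defaultUploaderStyles = {
  container: { border: '2px dashed #ccc', padding: '20px' },
  button: { backgroundColor: '#555', color: 'white' }
};

const styles = mergeStyles(defaultUploaderStyles, {
  container: { border: '2px dashed #3498db' }
});
// styles.container → { border: '2px dashed #3498db', padding: '20px' }
```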
Example:
<PDFUploader
  onPDFProcessed={handlePDFProcessed}
  style={{
    container: {
      border: '2px dashed #3498db',
      backgroundColor: '#ecf0f1'
    },
    button: {
      backgroundColor: '#3498db'
    }
  }}
/>
<PDFViewer
  pdfData={pdfData}
  style={{
    container: {
      maxWidth: '800px',
      border: '1px solid #3498db'
    },
    header: {
      backgroundColor: '#3498db',
      color: 'white'
    }
  }}
/>
Browser Support
pdf-texty supports all modern browsers, including:
- Chrome (latest)
- Firefox (latest)
- Safari (latest)
- Edge (latest)
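pdf-texty builds on pdfjs-dist, which typically runs its parsing in a Web Worker and reads uploads through browser file APIs. If you need to handle environments outside the list above, a feature check before mounting the uploader can produce a friendlier message. The list of globals below is an assumption for illustration, not an official requirements list, and `missingBrowserFeatures` is a hypothetical helper.

```javascript
// Hypothetical check: report browser features that in-browser PDF parsing is
// likely to need. The list is an assumption, not pdf-texty's requirements.
const REQUIRED_GLOBALS = ['Promise', 'Worker', 'Blob', 'FileReader', 'Uint8Array'];

function missingBrowserFeatures(scope = globalThis) {
  return REQUIRED_GLOBALS.filter((name) => typeof scope[name] === 'undefined');
}

// Usage sketch:
// const missing = missingBrowserFeatures();
// if (missing.length > 0) { /* show a "please update your browser" notice */ }
```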
Dependencies
This package depends on:
- pdfjs-dist - for PDF parsing
- React >= 16.8.0 (the first release with Hooks)
Development
To contribute to this project:
- Clone the repository:
  git clone https://github.com/yourusername/pdf-texty.git
- Install dependencies:
  npm install or yarn
- Build the package:
  npm run build or yarn build
- Run tests:
  npm test or yarn test
License
MIT
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
