pdf-worker-package
v1.0.1
A simple and robust PDF text extraction utility using pdfjs-dist
PDF Worker Package
A simple and robust PDF text extraction utility using pdfjs-dist. This package provides an easy-to-use interface for extracting text content from PDF files.
Installation
npm install pdf-worker-package
Usage
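import { processPDF } from 'pdf-worker-package';
import * as fs from 'fs';

// Example with Node.js
async function extractPDFText() {
  try {
    // Read the PDF file (fs returns a Node.js Buffer)
    const pdfBuffer = fs.readFileSync('path/to/your/file.pdf');
    // processPDF expects an ArrayBuffer, so copy the Buffer's bytes into one.
    // A Buffer can be a view into a larger ArrayBuffer, so its .buffer
    // property is not guaranteed to hold only this file's bytes.
    const arrayBuffer = new ArrayBuffer(pdfBuffer.byteLength);
    new Uint8Array(arrayBuffer).set(pdfBuffer);
    // Process the PDF and get the text
    const text = await processPDF(arrayBuffer);
    console.log('Extracted text:', text);
  } catch (error) {
    console.error('Error processing PDF:', error);
  }
}

// Example with browser (the File typically comes from an <input type="file"> element)
async function extractPDFTextFromFile(file: File) {
  try {
    // File.arrayBuffer() resolves to the file's contents as an ArrayBuffer
    const arrayBuffer = await file.arrayBuffer();
    const text = await processPDF(arrayBuffer);
    console.log('Extracted text:', text);
  } catch (error) {
    console.error('Error processing PDF:', error);
  }
}
API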
processPDF(pdfData: ArrayBuffer): Promise<string>
Processes a PDF file and extracts its text content.
- pdfData: The PDF file as an ArrayBuffer
- Returns: A promise that resolves to the extracted text
- Throws: Error if PDF processing fails
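The README does not say how the text that pdfjs-dist reports for each page is combined into the single string processPDF resolves to. As an illustration only, the sketch below shows one plausible way to flatten the items returned by pdfjs-dist's page.getTextContent(). The interfaces are minimal local stand-ins that declare just the fields used here (str and hasEOL on text items, type on marked-content markers), and pageItemsToText / pagesToText are hypothetical names; the package's real output (spacing, page separators) may differ.

```typescript
// Minimal local stand-ins for the item shapes pdfjs-dist returns from
// page.getTextContent(); only the fields this sketch reads are declared.
interface TextItemLike {
  str: string;     // the text run
  hasEOL: boolean; // true when pdf.js detected a line break after this run
}

interface MarkedContentLike {
  type: string;    // e.g. "beginMarkedContent"; carries no text
  id?: string;
}

type ContentItem = TextItemLike | MarkedContentLike;

// Hypothetical helper: concatenate one page's text runs, adding a newline
// wherever pdf.js flagged an end of line.
function pageItemsToText(items: ContentItem[]): string {
  let out = "";
  for (const item of items) {
    // Marked-content markers have no `str`, so skip them.
    if (!("str" in item)) continue;
    out += item.str;
    if (item.hasEOL) out += "\n";
  }
  return out;
}

// Hypothetical helper: join all pages, separating pages with a blank line.
function pagesToText(pages: ContentItem[][]): string {
  return pages.map(pageItemsToText).join("\n\n");
}
```

Concatenating runs with no separator assumes inter-word spaces already appear in the items; a common alternative is to join items with " ", which is simpler but can insert stray spaces inside words.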
Error Handling
The package includes robust error handling for common PDF processing issues:
- Worker initialization failures
- Invalid PDF files
- Processing errors
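The README does not document which error messages or types correspond to these cases, so the documentation alone does not let a caller tell them apart. One cheap, package-independent safeguard against the invalid-file case is to check the header before calling processPDF: PDF files are expected to begin with the bytes %PDF- followed by a version number. This is a sketch under the assumption that a strict check of the first five bytes is acceptable (some readers tolerate leading bytes, so it may reject files they would open). looksLikePdf and extractIfPdf are hypothetical helpers, not exports of pdf-worker-package; the processor is passed in so the sketch does not depend on the package.

```typescript
// Hypothetical pre-flight check; not part of pdf-worker-package.
// Returns true when the buffer starts with the PDF header bytes "%PDF-".
function looksLikePdf(data: ArrayBuffer): boolean {
  const header = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%", "P", "D", "F", "-"
  if (data.byteLength < header.length) return false;
  const bytes = new Uint8Array(data, 0, header.length);
  return header.every((b, i) => bytes[i] === b);
}

// Hypothetical wrapper: reject obviously non-PDF input with a clear message,
// otherwise forward to the supplied processor (e.g. processPDF).
async function extractIfPdf(
  data: ArrayBuffer,
  process: (d: ArrayBuffer) => Promise<string>,
): Promise<string> {
  if (!looksLikePdf(data)) {
    throw new Error("Input does not start with a %PDF- header");
  }
  return process(data);
}
```

A caller would use it as extractIfPdf(arrayBuffer, processPDF), keeping its own error message for the header case and leaving everything else to processPDF.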
🔗 Related
👉 Need to clean and normalize text before embedding it?
Check out text-prep-lite
👉 Need word embeddings for semantic analysis?
Check out wink-embeddings-small-en-50d
License
MIT thegreatbey
