document-page-counter
v1.0.1
A utility library for counting pages in PDF and DOCX documents
Document Page Counter
A TypeScript library for counting pages in PDF and DOCX documents with multiple fallback strategies.
Features
- PDF Support: Uses `pdf-lib` to parse the PDF and read its exact page count
- DOCX Support: Extracts the page count from document properties, or estimates it from the word count
- Fallback Strategies: Falls back to estimating from word count or file size when an exact count is unavailable, so a page count is still returned
- Next.js Integration: Pre-built API route handlers
- TypeScript: Full type safety and IntelliSense support
- Configurable: Customize heuristics for different document types
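The README does not say which document property the DOCX path reads. In Office Open XML files, Word records the page count from its last layout in the `<Pages>` element of `docProps/app.xml` (the extended-properties part), so that is the likely candidate. Assuming so, a minimal sketch of pulling the value out of that XML, once it has been read from the zip archive (presumably the job of `jszip` here), might look like the following. `readPagesFromAppXml` is a hypothetical name, and a regular expression stands in for a proper XML parser such as `fast-xml-parser` to keep the example dependency-free.

```typescript
// Hypothetical helper (not part of document-page-counter): reads the page
// count that Word stores in docProps/app.xml. Assumes the XML text has
// already been extracted from the DOCX zip archive.
function readPagesFromAppXml(appXml: string): number | null {
  // Match <Pages>N</Pages>, tolerating an optional namespace prefix.
  const match = /<(?:\w+:)?Pages>\s*(\d+)\s*<\/(?:\w+:)?Pages>/.exec(appXml);
  if (!match) return null; // element missing: caller should fall back

  const pages = Number.parseInt(match[1], 10);
  // Treat 0 as "unknown" so the caller can fall back to an estimate.
  return pages > 0 ? pages : null;
}
```

Because the stored value reflects Word's last pagination, it can be stale or missing in files produced by other tools, which is where the word-count and file-size fallbacks come in.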
Installation
```bash
npm install document-page-counter
```

Quick Start
Basic Usage
```typescript
import { countDocumentPages } from 'document-page-counter';

// From a File object selected in <input type="file" id="fileInput">
const input = document.getElementById('fileInput') as HTMLInputElement;
const file = input.files?.[0];

if (file) {
  const buffer = await file.arrayBuffer();
  const result = await countDocumentPages(buffer, file.type);
  console.log(`Document has ${result.pages} pages (method: ${result.method})`);
}
```

Class-based Usage
```typescript
import { DocumentPageCounter } from 'document-page-counter';

const counter = new DocumentPageCounter({
  wordsPerPage: 300,       // Custom words per page estimate
  pdfBytesPerPage: 50000,  // Custom PDF size heuristic
  docxBytesPerPage: 20000  // Custom DOCX size heuristic
});

// `file` is a File object, e.g. the one selected in the Basic Usage example
const result = await counter.countPagesFromFile(file);
```

Next.js API Routes
Option 1: Use the pre-built handler
```typescript
// app/api/count-pages/route.ts
import { createPageCountHandler } from 'document-page-counter';

export const POST = createPageCountHandler({
  wordsPerPage: 250
});
```

Option 2: Manual integration
```typescript
// app/api/count-pages/route.ts
import { NextResponse } from 'next/server';
import { countDocumentPages } from 'document-page-counter';

export async function POST(request: Request) {
  const formData = await request.formData();
  const file = formData.get('file');

  // get() returns null if the field is missing, or a string for non-file fields
  if (!file || typeof file === 'string') {
    return NextResponse.json({ error: 'No file provided' }, { status: 400 });
  }

  try {
    const buffer = await file.arrayBuffer();
    const result = await countDocumentPages(buffer, file.type);
    return NextResponse.json({
      pages: result.pages,
      method: result.method
    });
  } catch (error) {
    console.error('Page counting failed:', error);
    return NextResponse.json({ error: 'Failed to count pages' }, { status: 500 });
  }
}
```

API Reference
countDocumentPages(buffer, mimeType, options?)
Count pages in a document buffer.
Parameters:
- `buffer: ArrayBuffer` - The document buffer
- `mimeType: string` - MIME type (`'application/pdf'` or `'application/vnd.openxmlformats-officedocument.wordprocessingml.document'`)
- `options?: PageCountOptions` - Configuration options
Returns: Promise<PageCountResult>
DocumentPageCounter
Class for counting pages with configurable options.
Constructor Options:
- `wordsPerPage?: number` - Words per page for DOCX estimation (default: 250)
- `pdfBytesPerPage?: number` - Bytes per page for PDF size estimation (default: 40,000)
- `docxBytesPerPage?: number` - Bytes per page for DOCX size estimation (default: 15,000)
Methods:
- `countPages(buffer, mimeType)` - Count pages from a buffer
- `countPagesFromFile(file)` - Count pages from a File object
- `countPdfPages(buffer)` - Count PDF pages specifically
- `countDocxPages(buffer)` - Count DOCX pages specifically
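The constructor options give per-page figures but not the exact formula. A plausible reading, assumed here, is that each estimate divides a total (word count or byte size) by its per-page figure and rounds up, so a partial page counts as a page. The sketch below is illustrative only and may not match the library's actual rounding; `estimatePages` and `DEFAULT_HEURISTICS` are hypothetical names.

```typescript
// Defaults as documented for the DocumentPageCounter constructor.
const DEFAULT_HEURISTICS = {
  wordsPerPage: 250,
  pdfBytesPerPage: 40_000,
  docxBytesPerPage: 15_000,
};

// Assumed formula (hypothetical): round up so a partial page counts as a page.
function estimatePages(total: number, perPage: number): number {
  // Guard against NaN/Infinity; an empty document still reports one page.
  if (!Number.isFinite(total) || total <= 0) return 1;
  return Math.ceil(total / perPage);
}

// A 1,200-word DOCX: ceil(1200 / 250) = 5 pages by word count.
const byWords = estimatePages(1_200, DEFAULT_HEURISTICS.wordsPerPage);
// A 100,000-byte PDF: ceil(100000 / 40000) = 3 pages by size.
const byPdfSize = estimatePages(100_000, DEFAULT_HEURISTICS.pdfBytesPerPage);
// A 100,000-byte DOCX: ceil(100000 / 15000) = 7 pages by size.
const byDocxSize = estimatePages(100_000, DEFAULT_HEURISTICS.docxBytesPerPage);
```

The size heuristics are coarse: embedded images or fonts can inflate a file far beyond its text content, which is why they sit at the end of the fallback chain.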
PageCountResult
```typescript
interface PageCountResult {
  pages: number;
  method: 'metadata' | 'heuristic' | 'word-count';
}
```

- `metadata`: Extracted from document properties (most accurate)
- `word-count`: Estimated from word count (DOCX fallback)
- `heuristic`: Estimated from file size (last resort)
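Since only `metadata` results are exact, an application may want to flag estimates when displaying them. A minimal sketch, with `formatPageCount` as a hypothetical helper and `PageCountResult` redeclared locally so the snippet stands alone:

```typescript
interface PageCountResult {
  pages: number;
  method: 'metadata' | 'heuristic' | 'word-count';
}

// Hypothetical labels: null means the count is exact and needs no qualifier.
const ESTIMATE_LABELS: Record<PageCountResult['method'], string | null> = {
  metadata: null,
  'word-count': 'estimated from word count',
  heuristic: 'estimated from file size',
};

// Marks anything not read from the document itself as approximate.
function formatPageCount(result: PageCountResult): string {
  const unit = result.pages === 1 ? 'page' : 'pages';
  const label = ESTIMATE_LABELS[result.method];
  return label === null
    ? `${result.pages} ${unit}`
    : `~${result.pages} ${unit} (${label})`;
}
```

Using a `Record` keyed by the `method` union means the compiler reports a missing label if a new method value is ever added to the type.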
Supported File Types
- PDF: `application/pdf`
- DOCX: `application/vnd.openxmlformats-officedocument.wordprocessingml.document`
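The examples above pass `file.type` straight through, but browsers set `File.type` to an empty string when they cannot determine a file's type. One option, assumed here rather than taken from the library, is to fall back to the file extension before calling `countDocumentPages`. `resolveMimeType` is a hypothetical helper, not part of the package.

```typescript
const PDF_MIME = 'application/pdf';
const DOCX_MIME =
  'application/vnd.openxmlformats-officedocument.wordprocessingml.document';

type SupportedMime = typeof PDF_MIME | typeof DOCX_MIME;

// Hypothetical helper: prefer the reported MIME type, otherwise infer it
// from the file extension. Returns null for anything unsupported.
function resolveMimeType(fileName: string, reportedType: string): SupportedMime | null {
  if (reportedType === PDF_MIME) return PDF_MIME;
  if (reportedType === DOCX_MIME) return DOCX_MIME;

  // Empty or generic types (e.g. application/octet-stream): check the extension.
  const extension = fileName.toLowerCase().split('.').pop();
  if (extension === 'pdf') return PDF_MIME;
  if (extension === 'docx') return DOCX_MIME;
  return null;
}
```

A caller could then use `resolveMimeType(file.name, file.type)` and reject the upload early when it returns `null`, instead of relying on the library to handle an unsupported type.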
Error Handling
The library uses fallback strategies to ensure you always get a page count:
- PDF: Try to parse with pdf-lib → fallback to size estimation
- DOCX: Try document properties → word count estimation → size estimation
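Both chains amount to trying each strategy in order and keeping the first one that produces a usable count. The sketch below models that pattern with stubbed strategies; the `Strategy` type and `countWithFallbacks` are hypothetical, and the library's internals may be organized differently.

```typescript
type Method = 'metadata' | 'heuristic' | 'word-count';

interface Strategy {
  method: Method;
  // Resolves to a page count, or null when this strategy cannot decide.
  run: () => Promise<number | null>;
}

// Try each strategy in order; one that throws or yields null/0 is skipped.
// The final strategy should be one that cannot fail, such as the size heuristic.
async function countWithFallbacks(
  strategies: Strategy[]
): Promise<{ pages: number; method: Method }> {
  for (const { method, run } of strategies) {
    try {
      const pages = await run();
      if (pages !== null && pages > 0) return { pages, method };
    } catch {
      // Parsing failed (e.g. a corrupt file); move on to the next strategy.
    }
  }
  throw new Error('No strategy produced a page count');
}

// DOCX order from the list above: properties -> word count -> file size.
// The stubs simulate a missing property and a word-count parse failure.
const docxStrategies: Strategy[] = [
  { method: 'metadata', run: async () => null },
  { method: 'word-count', run: async () => { throw new Error('bad XML'); } },
  { method: 'heuristic', run: async () => Math.ceil(48_000 / 15_000) },
];
```

With these stubs, `countWithFallbacks(docxStrategies)` resolves to `{ pages: 4, method: 'heuristic' }`, since the first two strategies are skipped and 48,000 bytes at 15,000 bytes per page rounds up to 4.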
Contributing
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
License
MIT License - see the LICENSE file for details.
Dependencies
- `jszip` - For reading DOCX files
- `fast-xml-parser` - For parsing XML in DOCX files
- `mammoth` - For extracting text from DOCX files
- `pdf-lib` - For reading PDF files
