page-to-pdf-converter

v1.0.0

Published

a month ago

A simple web app to convert any web page to a PDF file with a single click

Vulnerabilities: 0 High, 0 Medium, 0 Low

arun_acharya

pdf page converter html-to-pdf puppeteer web-to-pdf pdf-generator

Page to PDF Converter

A simple web application that converts any web page to a PDF file with a single click.

Features

🌐 Convert any web page to PDF
📄 Generates PDFs using headless browser for accurate rendering
⚡ Real-time progress updates
🎨 Clean, modern UI
⏱️ Timeout and error handling
🗑️ Automatic cleanup of generated files
🚀 Can be installed globally via npm

Prerequisites

Node.js (v16 or higher)
npm or yarn

Installation

Option 1: Install Globally via NPM (Recommended)

npm install -g page-to-pdf-converter

Then run:

page-to-pdf

Option 2: Use with npx (No Installation)

npx page-to-pdf-converter

Option 3: Local Development

Clone the repository:

git clone https://github.com/yourusername/page-to-pdf-converter.git
cd page-to-pdf-converter

Install dependencies:
```
npm install
```
This will install:
- Express (web server)
- Puppeteer (headless browser)
- pdf-lib (PDF processing)
- Cheerio (HTML parsing)
- CORS (cross-origin support)

Usage

If Installed Globally or via npx

Simply run:

page-to-pdf

This will:

- Start the server on port 3000
- Automatically open your browser to http://localhost:3000

For Local Development

Start the server:
```
npm start
```
For development with auto-reload:
```
npm run dev
```
Open your browser: Navigate to http://localhost:3000

Converting a Page

1. Enter a web page URL (e.g., https://example.com/page)
2. Click "Generate PDF"
3. Wait for the process to complete
4. Download your PDF

How It Works

1. Input: The user provides a web page URL
2. Rendering: The page is rendered in headless Chrome via Puppeteer
3. PDF Generation: A PDF is created from the rendered page
4. Download: The PDF is made available for download
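The steps above map fairly directly onto Puppeteer's API. The following is a minimal sketch, not the code in server.js: the function names `buildPdfOptions` and `renderPageToPdf`, the A4 format, the 1 cm margins, and the `networkidle2` wait condition are all assumptions chosen for illustration. The 30-second default mirrors the per-page timeout mentioned in this README.

```javascript
// Sketch only: names and option values are illustrative assumptions,
// not the package's actual implementation.

// Build the options object passed to Puppeteer's page.pdf().
function buildPdfOptions(outputPath) {
  return {
    path: outputPath,        // where the PDF is written
    format: 'A4',            // assumed paper size
    printBackground: true,   // include CSS backgrounds in the output
    margin: { top: '1cm', right: '1cm', bottom: '1cm', left: '1cm' },
  };
}

// Render a URL in headless Chrome and save it as a PDF.
async function renderPageToPdf(url, outputPath, timeoutMs = 30000) {
  // Required lazily so buildPdfOptions can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait until network activity has mostly settled before printing.
    await page.goto(url, { waitUntil: 'networkidle2', timeout: timeoutMs });
    await page.pdf(buildPdfOptions(outputPath));
    return outputPath;
  } finally {
    await browser.close(); // always release the browser, even on errors
  }
}
```

With Puppeteer installed, a call such as `renderPageToPdf('https://example.com', 'example.pdf')` would write the PDF to the current directory. Closing the browser in `finally` keeps a failed navigation from leaving a Chrome process behind.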

Configuration

You can modify these settings in server.js:

- PORT: Server port (default: 3000, or set via the PORT environment variable)
- Timeout values for page loading and requests
- PDF format and margins

Example with custom port:

PORT=8080 page-to-pdf
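For readers wondering how an environment override with a default of 3000 is typically resolved in Node.js, here is a small sketch. `resolvePort` and `DEFAULT_PORT` are hypothetical names, and the range check is an added assumption; server.js may handle this differently.

```javascript
// Sketch only: resolvePort is a hypothetical helper, not taken from server.js.
const DEFAULT_PORT = 3000;

function resolvePort(env = process.env) {
  const parsed = Number.parseInt(env.PORT, 10);
  // Fall back to the default when PORT is unset, not a number, or out of range.
  if (Number.isInteger(parsed) && parsed > 0 && parsed < 65536) {
    return parsed;
  }
  return DEFAULT_PORT;
}
```

Under this sketch, `PORT=8080` yields 8080, while an unset or invalid value yields 3000.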

## Limitations

- Converts single pages only (doesn't crawl links)
- 30-second timeout per page
- Generated PDFs are automatically deleted after 1 hour
- Some dynamic content may not render perfectly
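
The one-hour automatic deletion could be implemented in several ways. The sketch below shows one plausible approach using only Node's built-in `fs` and `path` modules. The function name `removeExpiredPdfs` and the choice to compare file modification times are assumptions; this README does not describe how the package actually performs cleanup.

```javascript
// Sketch only: one possible cleanup routine, not the package's actual code.
const fs = require('node:fs');
const path = require('node:path');

const ONE_HOUR_MS = 60 * 60 * 1000;

// Delete .pdf files in `dir` whose modification time is older than maxAgeMs.
// Returns the names of the files that were removed.
function removeExpiredPdfs(dir, maxAgeMs = ONE_HOUR_MS, now = Date.now()) {
  const removed = [];
  for (const name of fs.readdirSync(dir)) {
    if (path.extname(name).toLowerCase() !== '.pdf') continue; // leave other files alone
    const filePath = path.join(dir, name);
    const { mtimeMs } = fs.statSync(filePath);
    if (now - mtimeMs > maxAgeMs) {
      fs.unlinkSync(filePath);
      removed.push(name);
    }
  }
  return removed;
}
```

A server could run this periodically, for example `setInterval(() => removeExpiredPdfs('public/downloads'), 10 * 60 * 1000)`, where the ten-minute interval is an arbitrary choice.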

## Troubleshooting

**Puppeteer installation issues:**
If Puppeteer fails to install, try:
```bash
npm install puppeteer --unsafe-perm=true
```

**"Browser not found" error:**
Puppeteer downloads Chrome automatically. If the download fails, reinstall:
```bash
npm uninstall puppeteer
npm install puppeteer
```

**Port already in use:**
Change the PORT value in server.js or kill the process using port 3000:
```bash
lsof -ti:3000 | xargs kill
```

API Endpoints

POST `/api/generate`

Start PDF generation for a web page.

Request:

{
  "url": "https://example.com"
}

Response:

{
  "jobId": "unique-job-id"
}
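
As an illustration of calling this endpoint from a script, the sketch below builds the request for `fetch`. `buildGenerateRequest` is a hypothetical helper, the `http://localhost:3000` base URL is the default from this README, and the http/https check is an added assumption rather than documented server behavior.

```javascript
// Sketch only: buildGenerateRequest is a hypothetical client-side helper.
function buildGenerateRequest(pageUrl, baseUrl = 'http://localhost:3000') {
  const parsed = new URL(pageUrl); // throws TypeError on malformed input
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  return {
    endpoint: new URL('/api/generate', baseUrl).href,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ url: parsed.href }),
    },
  };
}

// Possible usage inside an async function (global fetch needs Node 18+):
//   const { endpoint, options } = buildGenerateRequest('https://example.com');
//   const { jobId } = await (await fetch(endpoint, options)).json();
```

Validating the URL on the client side avoids a round trip for obviously malformed input.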

GET `/api/status/:jobId`

Get the status of a PDF generation job.

Response:

{
  "status": "generating",
  "progress": 45,
  "message": "Processing page 5 of 10...",
  "downloadUrl": "/downloads/website-xyz.pdf",
  "pageCount": 10
}
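
A client would normally poll this endpoint until the job finishes. The sketch below shows one way to do that. Note the assumptions: this README only shows the `"generating"` status, so the terminal values `"completed"` and `"error"` used here are guesses, and `waitForPdf` with its options is a hypothetical helper.

```javascript
// Sketch only. Assumptions not confirmed by this README: a finished job
// reports status "completed" and a failed job reports status "error".
async function waitForPdf(jobId, {
  baseUrl = 'http://localhost:3000',
  intervalMs = 1000,
  maxAttempts = 60,
  fetchImpl = fetch, // injectable for testing; global fetch needs Node 18+
} = {}) {
  const statusUrl = new URL(`/api/status/${encodeURIComponent(jobId)}`, baseUrl).href;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await fetchImpl(statusUrl);
    if (!response.ok) {
      throw new Error(`Status request failed: HTTP ${response.status}`);
    }
    const job = await response.json();
    if (job.status === 'completed') return job.downloadUrl;
    if (job.status === 'error') {
      throw new Error(job.message || 'PDF generation failed');
    }
    // Still in progress: wait before asking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('Timed out waiting for PDF generation');
}
```

Capping the number of attempts keeps a stuck job from blocking the caller indefinitely.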

Project Structure

.
├── server.js           # Express server and PDF generation logic
├── package.json        # Dependencies and scripts
├── public/
│   ├── index.html     # Frontend interface
│   └── downloads/     # Generated PDFs (auto-created)
└── README.md          # This file

Technologies Used

Backend: Node.js, Express
Browser Automation: Puppeteer
PDF Processing: pdf-lib
HTML Parsing: Cheerio
Frontend: Vanilla JavaScript, HTML5, CSS3

Publishing to NPM

See NPM_PUBLISH_GUIDE.md for detailed instructions on how to publish this package to npm.

License

MIT

Notes

The app respects the crawl limit to prevent excessive resource usage
PDF files are temporary and cleaned up automatically
Large websites may take several minutes to process
Some dynamic websites may not render perfectly in the headless browser
