page-to-pdf-converter
v1.0.0
Page to PDF Converter
A simple web application that converts any web page to a PDF file with a single click.
Features
- 🌐 Convert any web page to PDF
- 📄 Generates PDFs using a headless browser for accurate rendering
- ⚡ Real-time progress updates
- 🎨 Clean, modern UI
- ⏱️ Timeout and error handling
- 🗑️ Automatic cleanup of generated files
- 🚀 Can be installed globally via npm
Prerequisites
- Node.js (v16 or higher)
- npm or yarn
Installation
Option 1: Install Globally via NPM (Recommended)
npm install -g page-to-pdf-converter
Then run:
page-to-pdf
Option 2: Use with npx (No Installation)
npx page-to-pdf-converter
Option 3: Local Development
Clone the repository:
git clone https://github.com/yourusername/page-to-pdf-converter.git
cd page-to-pdf-converter
Install dependencies:
npm install
This will install:
- Express (web server)
- Puppeteer (headless browser)
- pdf-lib (PDF processing)
- Cheerio (HTML parsing)
- CORS (cross-origin support)
Usage
If Installed Globally or via npx
Simply run:
page-to-pdf
This will:
- Start the server on port 3000
- Automatically open your browser to http://localhost:3000
For Local Development
Start the server:
npm start
For development with auto-reload:
npm run dev
Open your browser: Navigate to http://localhost:3000
Converting a Page
- Enter a web page URL (e.g., https://example.com/page)
- Click "Generate PDF"
- Wait for the process to complete
- Download your PDF
How It Works
- Input: User provides a web page URL
- Rendering: The page is rendered using Puppeteer's headless Chrome
- PDF Generation: A PDF is created from the rendered page
- Download: The PDF is available for download
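The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in server.js: `normalizePageUrl` and `renderPageToPdf` are hypothetical names, the A4 format and 20px margins are assumed values, and `browser` is expected to behave like a Puppeteer `Browser` (for example, the result of `puppeteer.launch()`).

```javascript
// Hypothetical helper for the "Input" step: accept only absolute http(s) URLs.
function normalizePageUrl(input) {
  let parsed;
  try {
    parsed = new URL(String(input).trim());
  } catch {
    throw new Error(`Invalid URL: ${input}`);
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  return parsed.href;
}

// Hypothetical helper for the "Rendering" and "PDF Generation" steps.
// Resolves with the PDF bytes produced by the browser.
async function renderPageToPdf(browser, input, timeoutMs = 30000) {
  const url = normalizePageUrl(input);
  const page = await browser.newPage();
  try {
    // Wait until network activity settles so late-loading content is included.
    await page.goto(url, { waitUntil: 'networkidle2', timeout: timeoutMs });
    return await page.pdf({
      format: 'A4', // assumed paper size
      printBackground: true, // keep background colors and images
      margin: { top: '20px', right: '20px', bottom: '20px', left: '20px' }, // assumed margins
    });
  } finally {
    // Always release the tab, even if navigation or printing failed.
    await page.close();
  }
}
```

Taking the browser as a parameter keeps the sketch independent of how Chrome is launched, and lets one browser instance be reused across requests.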
Configuration
You can modify these settings in server.js:
- PORT: Server port (default: 3000, or set via the PORT environment variable)
- Timeout values for page loading and requests
- PDF format and margins
Example with custom port:
PORT=8080 page-to-pdf
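For illustration, settings like these might be resolved along the following lines. This is a sketch under assumptions: `DEFAULTS` and `loadSettings` are hypothetical names, and the PDF format and margin values are placeholders rather than the values used in server.js.

```javascript
// Illustrative defaults; the names and PDF values are assumptions.
const DEFAULTS = {
  port: 3000, // documented default port
  pageTimeoutMs: 30000, // 30-second page timeout
  pdf: {
    format: 'A4',
    margin: { top: '20px', right: '20px', bottom: '20px', left: '20px' },
  },
};

// Hypothetical helper: apply the PORT environment variable on top of the defaults.
function loadSettings(env = process.env) {
  const raw = env.PORT;
  if (raw === undefined || raw === '') return { ...DEFAULTS };
  const port = Number(raw);
  // Reject values such as "abc" or "70000" instead of failing later at listen time.
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT value: ${raw}`);
  }
  return { ...DEFAULTS, port };
}
```

Validating the port up front gives a clear error message for a mistyped `PORT` value.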
## Limitations
- Converts single pages only (doesn't crawl links)
- 30-second timeout per page
- Generated PDFs are automatically deleted after 1 hour
- Some dynamic content may not render perfectly
- Maximum 50 pages per website (configurable)
- Skips common file types (images, PDFs, executables, etc.)
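The one-hour cleanup of generated PDFs could be approximated as shown below. This is a sketch, not the project's actual implementation: `removeExpiredFiles` is a hypothetical helper, and it assumes file age is judged by last-modified time.

```javascript
const fs = require('fs/promises');
const path = require('path');

const ONE_HOUR_MS = 60 * 60 * 1000;

// Hypothetical helper: delete regular files in `dir` that were last modified
// more than `maxAgeMs` ago. Resolves with the names of the removed files.
async function removeExpiredFiles(dir, maxAgeMs = ONE_HOUR_MS, now = Date.now()) {
  const removed = [];
  for (const name of await fs.readdir(dir)) {
    const filePath = path.join(dir, name);
    const stats = await fs.stat(filePath);
    if (stats.isFile() && now - stats.mtimeMs > maxAgeMs) {
      await fs.unlink(filePath);
      removed.push(name);
    }
  }
  return removed;
}
```

A server could call this on a timer (for example with `setInterval`) against the `public/downloads/` directory.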
## Troubleshooting
**Puppeteer installation issues:**
If Puppeteer fails to install, try:
```bash
npm install puppeteer --unsafe-perm=true
```
**"Browser not found" error:**
Puppeteer downloads Chrome automatically. If it fails, reinstall:
```bash
npm uninstall puppeteer
npm install puppeteer
```
**Port already in use:**
Change the PORT value in server.js or kill the process using port 3000:
```bash
lsof -ti:3000 | xargs kill
```
API Endpoints
POST /api/generate
Start PDF generation for a website.
Request:
{
  "url": "https://example.com"
}
Response:
{
  "jobId": "unique-job-id"
}
GET /api/status/:jobId
Get the status of a PDF generation job.
Response:
{
  "status": "generating",
  "progress": 45,
  "message": "Processing page 5 of 10...",
  "downloadUrl": "/downloads/website-xyz.pdf",
  "pageCount": 10
}
Project Structure
.
├── server.js          # Express server and PDF generation logic
├── package.json       # Dependencies and scripts
├── public/
│   ├── index.html     # Frontend interface
│   └── downloads/     # Generated PDFs (auto-created)
└── README.md          # This file
Technologies Used
- Backend: Node.js, Express
- Browser Automation: Puppeteer
- PDF Processing: pdf-lib
- HTML Parsing: Cheerio
- Frontend: Vanilla JavaScript, HTML5, CSS3
Publishing to NPM
See NPM_PUBLISH_GUIDE.md for detailed instructions on how to publish this package to npm.
License
MIT
Notes
- The app respects the crawl limit to prevent excessive resource usage
- PDF files are temporary and cleaned up automatically
- Large websites may take several minutes to process
- Some dynamic websites may not render perfectly in the headless browser
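Because processing can take a while, a script that uses the endpoints listed under API Endpoints would typically start a job and then poll its status. The sketch below is one possible client, with several assumptions: `generatePdf` is a hypothetical name, the terminal status values `"completed"` and `"error"` are guesses (only `"generating"` appears in this README), and `fetchImpl` must follow the standard Fetch API (a global `fetch` is available in Node 18 and later).

```javascript
// Hypothetical client: start a job, then poll until it finishes or times out.
async function generatePdf(baseUrl, pageUrl, options = {}) {
  const {
    fetchImpl = globalThis.fetch,
    intervalMs = 1000,
    maxWaitMs = 5 * 60 * 1000,
  } = options;

  const startRes = await fetchImpl(`${baseUrl}/api/generate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url: pageUrl }),
  });
  if (!startRes.ok) throw new Error(`Generate request failed: HTTP ${startRes.status}`);
  const { jobId } = await startRes.json();

  const deadline = Date.now() + maxWaitMs;
  while (Date.now() < deadline) {
    const statusRes = await fetchImpl(`${baseUrl}/api/status/${encodeURIComponent(jobId)}`);
    if (!statusRes.ok) throw new Error(`Status request failed: HTTP ${statusRes.status}`);
    const job = await statusRes.json();
    // Assumed terminal states; adjust to whatever the server actually reports.
    if (job.status === 'completed') return job.downloadUrl;
    if (job.status === 'error') throw new Error(job.message || 'PDF generation failed');
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('Timed out waiting for the PDF');
}
```

On success the function resolves with the `downloadUrl` reported by the status endpoint, which is relative to the server's base URL.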
