websnapper

v0.0.2

Published

6 months ago

High-performance Node.js library for converting URLs or HTML to PDF, PNG, JPG with progress events and batch support.

Downloads

WebSnapper

A powerful Node.js library for converting web pages to PDF and images using Puppeteer. WebSnapper provides a simple yet comprehensive API for web-to-file conversion with advanced features and progress tracking.

Features

🚀 Convert URLs, HTML files, or HTML strings to PDF, PNG, JPG
📊 Real-time progress tracking with events
🎯 Advanced page manipulation (hide/remove elements, inject CSS/JS)
📱 Responsive viewport configuration
🔄 Batch processing support
🛡️ Robust error handling
💾 Memory-efficient browser management
⚡ High-performance headless Chrome automation

Installation

npm install websnapper

Quick Start

const { WebSnapper } = require('websnapper');

async function example() {
  const snapper = new WebSnapper();
  
  // Convert URL to PDF
  await snapper.pdf('https://example.com', {
    path: 'output.pdf'
  });
  
  // Take screenshot
  await snapper.png('https://example.com', {
    path: 'screenshot.png'
  });
  
  await snapper.close();
}

example();

API Reference

WebSnapper Class

Constructor

const snapper = new WebSnapper();

Methods

`convert(input, options)`

Main conversion method that handles all output formats.

Parameters:

input (string): URL, file path, or HTML content
options (object): Configuration options

Returns: Promise resolving to conversion result

Format-specific methods

pdf(input, options) - Convert to PDF
png(input, options) - Convert to PNG
jpg(input, options) - Convert to JPG
jpeg(input, options) - Convert to JPEG

Utility methods

init() - Initialize browser instance
close() - Close browser and cleanup
batch(inputs, options) - Process multiple inputs
getPageInfo(input) - Extract page metadata

Configuration Options

| Option | Type | Default | Description | |--------|------|---------|-------------| | output | string | 'pdf' | Output format: 'pdf', 'png', 'jpg', 'jpeg' | | path | string | undefined | Output file path | | quality | number | 90 | Image quality (1-100, not for PNG) | | viewport | object | {width: 1920, height: 1080} | Browser viewport size | | waitFor | number/string/function | 2000 | Wait condition before capture | | margin | object | {top: '1cm', right: '1cm', bottom: '1cm', left: '1cm'} | PDF margins | | landscape | boolean | false | PDF orientation | | pageSize | string | 'A4' | PDF page size | | fullPage | boolean | true | Capture full page or viewport only | | printBackground | boolean | true | Include background graphics | | hideElements | array | [] | CSS selectors to hide | | removeElements | array | [] | CSS selectors to remove | | injectCSS | string | undefined | Custom CSS to inject | | injectJS | string | undefined | Custom JavaScript to inject | | scrollToBottom | boolean | false | Auto-scroll to load lazy content | | timeout | number | 30000 | Navigation timeout in ms | | userAgent | string | undefined | Custom user agent | | cookies | array | undefined | Cookies to set | | extraHTTPHeaders | object | undefined | Additional HTTP headers |

Examples

Basic PDF Generation

const { WebSnapper } = require('websnapper');

async function createPDF() {
  const snapper = new WebSnapper();
  
  const result = await snapper.pdf('https://github.com', {
    path: 'github.pdf',
    format: 'A4',
    margin: {
      top: '2cm',
      right: '2cm',
      bottom: '2cm',
      left: '2cm'
    }
  });
  
  console.log('PDF created:', result.path);
  await snapper.close();
}

createPDF();

Screenshot with Custom Viewport

const { WebSnapper } = require('websnapper');

async function takeScreenshot() {
  const snapper = new WebSnapper();
  
  const result = await snapper.png('https://example.com', {
    path: 'mobile-view.png',
    viewport: { width: 375, height: 667 }, // iPhone viewport
    quality: 95,
    fullPage: true
  });
  
  console.log('Screenshot saved:', result.filename);
  await snapper.close();
}

takeScreenshot();

Advanced Page Manipulation

const { WebSnapper } = require('websnapper');

async function cleanCapture() {
  const snapper = new WebSnapper();
  
  await snapper.pdf('https://news.ycombinator.com', {
    path: 'clean-hn.pdf',
    hideElements: [
      '.ad',
      '.sidebar', 
      'footer'
    ],
    removeElements: [
      '.popup',
      '.cookie-banner'
    ],
    injectCSS: `
      body { font-size: 14px !important; }
      .main { max-width: 800px !important; }
    `,
    waitFor: 3000 // Wait for dynamic content
  });
  
  await snapper.close();
}

cleanCapture();

Batch Processing

const { WebSnapper } = require('websnapper');

async function batchConvert() {
  const snapper = new WebSnapper();
  
  const urls = [
    'https://github.com',
    'https://stackoverflow.com',
    'https://developer.mozilla.org'
  ];
  
  const options = [
    { path: 'github.pdf', format: 'A4' },
    { path: 'stackoverflow.pdf', format: 'A4' },
    { path: 'mdn.pdf', format: 'A4' }
  ];
  
  // Track overall progress
  snapper.on('progress', (data) => {
    console.log(`Progress: ${data.message} - ${data.step}`);
  });
  
  const results = await snapper.batch(urls, options);
  
  results.forEach((result, index) => {
    if (result.success) {
      console.log(`✅ ${urls[index]} -> ${result.filename}`);
    } else {
      console.log(`❌ ${urls[index]} failed: ${result.error}`);
    }
  });
  
  await snapper.close();
}

batchConvert();

Converting HTML Content

const { WebSnapper } = require('websnapper');

async function htmlToPDF() {
  const snapper = new WebSnapper();
  
  const htmlContent = `
    <!DOCTYPE html>
    <html>
    <head>
      <title>My Document</title>
      <style>
        body { font-family: Arial, sans-serif; padding: 2rem; }
        h1 { color: #333; }
      </style>
    </head>
    <body>
      <h1>Hello World</h1>
      <p>This is a dynamically generated PDF from HTML content.</p>
    </body>
    </html>
  `;
  
  await snapper.pdf(htmlContent, {
    path: 'dynamic.pdf',
    format: 'A4'
  });
  
  await snapper.close();
}

htmlToPDF();

Progress Tracking

const { WebSnapper } = require('websnapper');

async function trackProgress() {
  const snapper = new WebSnapper();
  
  // Listen to progress events
  snapper.on('progress', (data) => {
    console.log(`📈 ${data.step}: ${data.progress}% - ${data.message}`);
  });
  
  await snapper.pdf('https://example.com', {
    path: 'tracked.pdf',
    onProgress: (data) => {
      // Custom progress handler
      if (data.progress % 10 === 0) {
        console.log(`🔄 Custom: ${data.progress}% complete`);
      }
    }
  });
  
  await snapper.close();
}

trackProgress();

Quick Convert Functions

For simple use cases, use the quick convert functions:

const { quickConvert } = require('websnapper');

// Quick PDF
await quickConvert.pdf('https://example.com', 'quick.pdf');

// Quick screenshot
await quickConvert.png('https://example.com', 'quick.png');

// With progress callback
await quickConvert.pdf('https://example.com', 'progress.pdf', (data) => {
  console.log(`Progress: ${data.progress}%`);
});

Page Information Extraction

const { WebSnapper } = require('websnapper');

async function getPageInfo() {
  const snapper = new WebSnapper();
  
  const info = await snapper.getPageInfo('https://github.com');
  
  console.log('Page Info:', {
    title: info.title,
    dimensions: info.dimensions,
    description: info.meta.description
  });
  
  await snapper.close();
}

getPageInfo();

Working with Cookies and Headers

const { WebSnapper } = require('websnapper');

async function authenticatedCapture() {
  const snapper = new WebSnapper();
  
  await snapper.pdf('https://example.com/dashboard', {
    path: 'dashboard.pdf',
    cookies: [
      {
        name: 'session',
        value: 'your-session-token',
        domain: 'example.com'
      }
    ],
    extraHTTPHeaders: {
      'Authorization': 'Bearer your-token',
      'Custom-Header': 'value'
    },
    userAgent: 'Mozilla/5.0 (compatible; WebSnapper/1.0)'
  });
  
  await snapper.close();
}

authenticatedCapture();

Error Handling

const { WebSnapper } = require('websnapper');

async function handleErrors() {
  const snapper = new WebSnapper();
  
  try {
    const result = await snapper.pdf('https://invalid-url', {
      path: 'test.pdf',
      timeout: 10000
    });
    
    if (result.success) {
      console.log('✅ PDF created successfully');
    }
  } catch (error) {
    console.error('❌ Conversion failed:', error.message);
    
    // Handle specific error types
    if (error.message.includes('timeout')) {
      console.log('💡 Try increasing the timeout option');
    }
  } finally {
    await snapper.close(); // Always cleanup
  }
}

handleErrors();

Best Practices

1. Always Close the Browser

const snapper = new WebSnapper();
try {
  // Your operations
} finally {
  await snapper.close(); // Prevent memory leaks
}

2. Reuse Browser Instance

// Good: Reuse for multiple operations
const snapper = new WebSnapper();
await snapper.pdf('url1', options1);
await snapper.png('url2', options2);
await snapper.close();

// Avoid: Creating new instances
// const snapper1 = new WebSnapper();
// const snapper2 = new WebSnapper();

3. Handle Large Pages

await snapper.pdf('https://long-page.com', {
  timeout: 60000,        // Increase timeout
  waitFor: 5000,         // Wait for content to load
  scrollToBottom: true   // Load lazy content
});

4. Optimize for Performance

const snapper = new WebSnapper();

// Disable unnecessary features for faster processing
await snapper.pdf('https://example.com', {
  printBackground: false,  // Skip backgrounds if not needed
  preferCSSPageSize: true, // Use CSS page size
  waitFor: 1000           // Reduce wait time if possible
});

TypeScript Support

WebSnapper includes TypeScript definitions:

import { WebSnapper, WebSnapperOptions, WebSnapperResult } from 'websnapper';

const snapper = new WebSnapper();

const options: WebSnapperOptions = {
  output: 'pdf',
  path: 'output.pdf',
  quality: 90
};

const result: WebSnapperResult = await snapper.convert('https://example.com', options);

Requirements

Node.js 12.0.0 or higher
Chrome/Chromium browser (automatically managed by Puppeteer)

Creator

Made by BLUEZLY

WebSnapper - Convert the web, one page at a time. 🌐➡️📄