@profullstack/summary-forge-module

v1.10.2

An intelligent tool that uses AI to create comprehensive summaries of technical books

Downloads

2,537

Summary Forge Module

An intelligent tool that uses OpenAI's GPT-5 to forge comprehensive summaries of ebooks in multiple formats.

Repository: [email protected]:profullstack/summary-forge-module.git

Features

  • 📚 Multiple Input Formats: Supports PDF, EPUB files, and web page URLs
  • 🌐 Web Page Summarization: Fetch and summarize any web page with automatic content extraction
  • 🤖 AI-Powered Summaries: Uses GPT-5 with direct PDF upload for better quality
  • 📊 Vision API: Preserves formatting, tables, diagrams, and images from PDFs
  • 🧩 Intelligent Chunking: Automatically processes large PDFs (500+ pages) without truncation
  • 🛡️ Directory Protection: Prompts before overwriting existing summaries (use --force to skip)
  • 📦 Multiple Output Formats: Creates Markdown, PDF, EPUB, plain text, and MP3 audio summaries
  • 🃏 Printable Flashcards: Generates double-sided flashcard PDFs for studying
  • 🖼️ Flashcard Images: Individual PNG images for web app integration (q-001.png, a-001.png, etc.)
  • 🎙️ Natural Audio Narration: AI-generated conversational audio script for better listening
  • 🗜️ Bundled Output: Packages everything into a convenient .tgz archive
  • 🔄 Auto-Conversion: Automatically converts EPUB to PDF using Calibre
  • 🔍 Book Search: Search Amazon by title using Rainforest API
  • 📖 Auto-Download: Downloads books from Anna's Archive with CAPTCHA solving
  • 💻 CLI & Module: Use as a command-line tool or import as an ESM module
  • 🎨 Interactive Mode: Guided workflow with inquirer prompts
  • 📥 EPUB Priority: Automatically prefers EPUB format (open standard, more flexible)

Installation

Global Installation (CLI)

pnpm install -g @profullstack/summary-forge-module

Local Installation (Module)

pnpm add @profullstack/summary-forge-module

Prerequisites

  1. Node.js v20 or newer

  2. Calibre (for EPUB conversion - provides ebook-convert command)

    # macOS
    brew install calibre
       
    # Ubuntu/Debian
    sudo apt-get install calibre
       
    # Arch Linux
    sudo pacman -S calibre
  3. Pandoc (for document conversion)

    # macOS
    brew install pandoc
       
    # Ubuntu/Debian
    sudo apt-get install pandoc
       
    # Arch Linux
    sudo pacman -S pandoc
  4. XeLaTeX (for PDF generation)

    # macOS
    brew install --cask mactex
       
    # Ubuntu/Debian
    sudo apt-get install texlive-xetex
       
    # Arch Linux
    sudo pacman -S texlive-core texlive-xetex

CLI Usage

First-Time Setup

Before using the CLI, configure your API keys:

summary setup

This interactive command will prompt you for:

  • OpenAI API Key (required)
  • Rainforest API Key (optional - for Amazon book search)
  • ElevenLabs API Key (optional - for audio generation, get a key at https://try.elevenlabs.io/oh7kgotrpjnv)
  • 2Captcha API Key (optional - for CAPTCHA solving, sign up at https://2captcha.com/?from=9630996)
  • Browserless API Key (optional)
  • Browser and proxy settings

Configuration is saved to ~/.config/summary-forge/settings.json and used automatically by all CLI commands.

Managing Configuration

# View current configuration
summary config

# Update configuration
summary setup

# Delete configuration
summary config --delete

Note: The CLI reads configuration in this priority order (highest priority first):

  1. Environment variables (.env file)
  2. Configuration file (~/.config/summary-forge/settings.json)

Interactive Mode (Recommended)

summary interactive
# or
summary i

This launches an interactive menu where you can:

  • Process local files (PDF/EPUB)
  • Process web page URLs
  • Search for books by title
  • Look up books by ISBN/ASIN

Process a File

summary file /path/to/book.pdf
summary file /path/to/book.epub

# Force overwrite if directory already exists
summary file /path/to/book.pdf --force
summary file /path/to/book.pdf -f

Process a Web Page URL

summary url https://example.com/article
summary url https://blog.example.com/post/123

# Force overwrite if directory already exists
summary url https://example.com/article --force
summary url https://example.com/article -f

Features:

  • Automatically fetches web page content using Puppeteer
  • Sanitizes HTML to remove navigation, ads, footers, and other non-content elements
  • Saves web page as PDF for processing
  • Generates clean title from page title or uses OpenAI to create one
  • Prompts specifically optimized for web page content (ignores nav/ads/footers)
  • Creates same output formats as book processing (MD, TXT, PDF, EPUB, MP3, flashcards)

Search by Title

# Search for books (defaults to 1lib.sk - faster, no DDoS protection)
summary search "LLM Fine Tuning"
summary search "JavaScript" --max-results 5 --extensions pdf,epub
summary search "Python" --year-from 2020 --year-to 2024
summary search "Machine Learning" --languages english --order date

# Use Anna's Archive instead (has DDoS protection, slower)
summary search "Clean Code" --source anna
summary search "Rare Book" --source anna --sources zlib,lgli

# Title search (shortcut for search command)
summary title "A Philosophy of Software Design"
summary title "Clean Code" --force  # Auto-select first result
summary title "Python" --source anna  # Use Anna's Archive

# ISBN lookup (defaults to 1lib.sk)
summary isbn 9780134685991
summary isbn B075HYVHWK --force  # Auto-select and process
summary isbn 9780134685991 --source anna  # Use Anna's Archive

# Common Options:
#   --source <source>              Search source: zlib (1lib.sk, default) or anna (Anna's Archive)
#   -n, --max-results <number>     Maximum results to display (default: 10)
#   -f, --force                    Auto-select first result and process immediately
#
# 1lib.sk Options (--source zlib, default):
#   --year-from <year>             Filter by publication year from (e.g., 2020)
#   --year-to <year>               Filter by publication year to (e.g., 2024)
#   -l, --languages <languages>    Language filter, comma-separated (default: english)
#   -e, --extensions <extensions>  File extensions, comma-separated (case-insensitive, default: PDF)
#   --content-types <types>        Content types, comma-separated (default: book)
#   -s, --order <order>            Sort order: date (newest) or empty for relevance
#   --view <view>                  View type: list or grid (default: list)
#
# Anna's Archive Options (--source anna):
#   -f, --format <format>          Filter by format: pdf, epub, pdf,epub, or all (default: pdf)
#   -s, --sort <sort>              Sort by: date (newest) or empty for relevance (default: '')
#   -l, --language <language>      Language code(s), comma-separated (e.g., en, es, fr) (default: en)
#   --sources <sources>            Data sources, comma-separated (default: all sources)
#                                  Options: zlib, lgli, lgrs, and others

Look up by ISBN/ASIN

summary isbn B075HYVHWK

# Force overwrite if directory already exists
summary isbn B075HYVHWK --force
summary isbn B075HYVHWK -f

Help

summary --help
summary file --help

Programmatic Usage

JSON API Format

All methods return JSON objects with the following consistent structure:

{
  success: true | false,  // Indicates if operation succeeded
  ...data,                // Method-specific data fields
  error?: string,         // Error message (only when success is false)
  message?: string        // Success message (optional)
}

This enables:

  • Consistent error handling - Check success field instead of try-catch
  • REST API ready - Direct JSON responses for HTTP endpoints
  • Better debugging - Rich metadata in all responses
  • Type-safe - Predictable structure for TypeScript users
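
As a minimal sketch of the success-field pattern, the helper below converts a result object into either its data fields or a thrown error, which can be useful at the boundary of code that prefers exceptions. Note that `unwrap` is a hypothetical helper written for illustration and is not exported by the module.

```javascript
// Hypothetical helper, not part of summary-forge-module.
// Converts a { success, ...data, error?, message? } result into plain data or throws.
function unwrap(result) {
  if (!result || typeof result.success !== 'boolean') {
    throw new TypeError('Expected an object with a boolean "success" field');
  }
  if (!result.success) {
    throw new Error(result.error ?? 'Operation failed without an error message');
  }
  // Strip the envelope fields and return only the method-specific data.
  const { success, error, message, ...data } = result;
  return data;
}

// Example with a hand-written object shaped like the structure above.
const ok = unwrap({ success: true, archive: './book_bundle.tgz', message: 'done' });
console.log(ok); // { archive: './book_bundle.tgz' }
```

Keeping the check in one place means callers can choose between the `if (result.success)` style used throughout this README and a conventional try-catch.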

Basic Example

import { SummaryForge } from '@profullstack/summary-forge-module';
import { loadConfig } from '@profullstack/summary-forge-module/config';

// Load config from ~/.config/summary-forge/settings.json
const configResult = await loadConfig();
if (!configResult.success) {
  console.error('Failed to load config:', configResult.error);
  process.exit(1);
}

const forge = new SummaryForge(configResult.config);

const result = await forge.processFile('./my-book.pdf');
if (result.success) {
  console.log('Summary created:', result.archive);
  console.log('Files:', result.files);
  console.log('Costs:', result.costs);
} else {
  console.error('Processing failed:', result.error);
}

Configuration Options

import { SummaryForge } from '@profullstack/summary-forge-module';

const forge = new SummaryForge({
  // Required
  openaiApiKey: 'sk-...',
  
  // Optional API keys
  rainforestApiKey: 'your-key',      // For Amazon search
  elevenlabsApiKey: 'sk-...',        // For audio generation (get key: https://try.elevenlabs.io/oh7kgotrpjnv)
  twocaptchaApiKey: 'your-key',      // For CAPTCHA solving (sign up: https://2captcha.com/?from=9630996)
  browserlessApiKey: 'your-key',     // For browserless.io
  
  // Processing options
  maxChars: 500000,                  // Max chars to process
  maxTokens: 20000,                  // Max tokens in output summary
  maxInputTokens: 250000,            // Max input tokens per API call (default: 250000 for GPT-5)
  
  // Audio options
  voiceId: '21m00Tcm4TlvDq8ikWAM',  // ElevenLabs voice
  voiceSettings: {
    stability: 0.5,
    similarity_boost: 0.75
  },
  
  // Browser options
  headless: true,                    // Run browser in headless mode
  enableProxy: false,                // Enable proxy
  proxyUrl: 'http://proxy.com',     // Proxy URL
  proxyUsername: 'user',             // Proxy username
  proxyPassword: 'pass',             // Proxy password
  proxyPoolSize: 36                  // Number of proxies in pool (default: 36)
});

const result = await forge.processFile('./book.epub');
console.log('Archive:', result.archive);

Search for Books

Using Amazon/Rainforest API

const forge = new SummaryForge({
  openaiApiKey: process.env.OPENAI_API_KEY,
  rainforestApiKey: process.env.RAINFOREST_API_KEY
});

const searchResult = await forge.searchBookByTitle('Clean Code');
if (!searchResult.success) {
  console.error('Search failed:', searchResult.error);
  process.exit(1);
}

console.log(`Found ${searchResult.count} results:`);
console.log(searchResult.results.map(b => ({
  title: b.title,
  author: b.author,
  asin: b.asin
})));

// Get download URL
const url = forge.getAnnasArchiveUrl(searchResult.results[0].asin);
console.log('Download from:', url);

Using Anna's Archive Direct Search (No Rainforest API Required)

const forge = new SummaryForge({
  openaiApiKey: process.env.OPENAI_API_KEY,
  enableProxy: true,
  proxyUrl: process.env.PROXY_URL,
  proxyUsername: process.env.PROXY_USERNAME,
  proxyPassword: process.env.PROXY_PASSWORD
});

// Basic search
const searchResult = await forge.searchAnnasArchive('JavaScript', {
  maxResults: 10,
  format: 'pdf',
  sortBy: 'date'  // Sort by newest
});

if (!searchResult.success) {
  console.error('Search failed:', searchResult.error);
  process.exit(1);
}

console.log(`Found ${searchResult.count} results`);
console.log(searchResult.results.map(r => ({
  title: r.title,
  author: r.author,
  format: r.format,
  size: `${r.sizeInMB.toFixed(1)} MB`,
  url: r.url
})));

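// Download the first result
if (searchResult.results.length > 0) {
  const first = searchResult.results[0];
  // match() returns null when the link has no /md5/<hash> segment, so guard before indexing
  const md5Match = first.href.match(/\/md5\/([a-f0-9]+)/);
  if (!md5Match) {
    console.error('No MD5 hash found in result link:', first.href);
    process.exit(1);
  }
  const downloadResult = await forge.downloadFromAnnasArchive(md5Match[1], '.', first.title);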
  
  if (downloadResult.success) {
    console.log('Downloaded:', downloadResult.filepath);
    console.log('Directory:', downloadResult.directory);
  } else {
    console.error('Download failed:', downloadResult.error);
  }
}

Using 1lib.sk Search (Faster, No DDoS Protection)

const forge = new SummaryForge({
  openaiApiKey: process.env.OPENAI_API_KEY,
  enableProxy: true,
  proxyUrl: process.env.PROXY_URL,
  proxyUsername: process.env.PROXY_USERNAME,
  proxyPassword: process.env.PROXY_PASSWORD
});

// Basic search
const searchResult = await forge.search1lib('LLM Fine Tuning', {
  maxResults: 10,
  yearFrom: 2020,
  languages: ['english'],
  extensions: ['PDF']
});

if (!searchResult.success) {
  console.error('Search failed:', searchResult.error);
  process.exit(1);
}

console.log(`Found ${searchResult.count} results`);
console.log(searchResult.results.map(r => ({
  title: r.title,
  author: r.author,
  year: r.year,
  extension: r.extension,
  size: r.size,
  language: r.language,
  isbn: r.isbn,
  url: r.url
})));

// Download the first result
if (searchResult.results.length > 0) {
  const downloadResult = await forge.downloadFrom1lib(
    searchResult.results[0].url,
    '.',
    searchResult.results[0].title
  );
  
  if (downloadResult.success) {
    console.log('Downloaded:', downloadResult.filepath);
    
    // Process the downloaded book
    const processResult = await forge.processFile(downloadResult.filepath, downloadResult.identifier);
    if (processResult.success) {
      console.log('Summary created:', processResult.archive);
      console.log('Costs:', processResult.costs);
    } else {
      console.error('Processing failed:', processResult.error);
    }
  } else {
    console.error('Download failed:', downloadResult.error);
  }
}

Enhanced Error Handling:

The 1lib.sk download functionality includes robust error handling with automatic debugging:

  • Multiple Selector Fallbacks: Tries 6 different selectors to find download buttons
  • Debug HTML Capture: Saves page HTML when download button isn't found
  • Link Analysis: Lists all links on the page for troubleshooting
  • Detailed Error Messages: Provides actionable information for debugging

If a download fails, check the debug-book-page.html file in the book's directory for detailed page structure information.
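
The selector-fallback idea can be sketched as a simple loop that tries each candidate in turn and records what was tried. This is an illustration only: the selectors below are hypothetical placeholders rather than the six the tool uses, and `findFirstMatch` is not a module export. The `page` argument is assumed to expose an async `$(selector)` method that resolves to an element or `null`, as Puppeteer's `page.$` does.

```javascript
// Hypothetical placeholder selectors; the tool's real selector list is not shown in this README.
const CANDIDATE_SELECTORS = ['a.download-button', 'a[href*="/dl/"]', 'button.download'];

// Tries each selector in order and returns the first element found,
// along with the list of selectors that were attempted.
async function findFirstMatch(page, selectors = CANDIDATE_SELECTORS) {
  const tried = [];
  for (const selector of selectors) {
    tried.push(selector);
    const element = await page.$(selector);
    if (element) {
      return { success: true, selector, element, tried };
    }
  }
  // Nothing matched: report every selector that was attempted.
  return { success: false, error: 'Download button not found', tried };
}
```

Returning the `tried` list alongside the error mirrors the kind of detail described above, where the error output names every selector that was attempted.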

API Reference

Constructor Options

new SummaryForge({
  // API Keys
  openaiApiKey: string,      // Required: OpenAI API key
  rainforestApiKey: string,  // Optional: For title search
  elevenlabsApiKey: string,  // Optional: For audio generation
  twocaptchaApiKey: string,  // Optional: For CAPTCHA solving
  browserlessApiKey: string, // Optional: For browserless.io
  
  // Processing Options
  maxChars: number,          // Optional: Max chars to process (default: 400000)
  maxTokens: number,         // Optional: Max tokens in output summary (default: 16000)
  maxInputTokens: number,    // Optional: Max input tokens per API call (default: 250000 for GPT-5)
  
  // Audio Options
  voiceId: string,           // Optional: ElevenLabs voice ID (default: Brian)
  voiceSettings: object,     // Optional: Voice customization settings
  
  // Browser Options
  headless: boolean,         // Optional: Run browser in headless mode (default: true)
  enableProxy: boolean,      // Optional: Enable proxy (default: false)
  proxyUrl: string,          // Optional: Proxy URL
  proxyUsername: string,     // Optional: Proxy username
  proxyPassword: string,     // Optional: Proxy password
  proxyPoolSize: number      // Optional: Number of proxies in pool (default: 36)
})

Methods

All methods return JSON objects with { success, ...data, error?, message? } format.

Processing Methods

  • processFile(filePath, asin?) - Process a PDF or EPUB file

    • Returns: { success, basename, markdown, files, archive, hasAudio, asin, costs, message, error? }
    • Example:
      const result = await forge.processFile('./book.pdf');
      if (result.success) {
        console.log('Archive:', result.archive);
        console.log('Costs:', result.costs);
      }
  • processWebPage(url, outputDir?) - Process a web page URL

    • Returns: { success, basename, dirName, markdown, files, directory, archive, hasAudio, url, title, costs, message, error? }
    • Example:
      const result = await forge.processWebPage('https://example.com/article');
      if (result.success) {
        console.log('Summary:', result.markdown.substring(0, 100));
      }

Search Methods

  • searchBookByTitle(title) - Search Amazon using Rainforest API

    • Returns: { success, results, count, query, message, error? }
    • Example:
      const result = await forge.searchBookByTitle('Clean Code');
      if (result.success) {
        console.log(`Found ${result.count} books`);
      }
  • searchAnnasArchive(query, options?) - Search Anna's Archive directly

    • Returns: { success, results, count, query, options, message, error? }
    • Example:
      const result = await forge.searchAnnasArchive('JavaScript', {
        maxResults: 10,
        format: 'pdf',
        sortBy: 'date'
      });
      if (result.success) {
        console.log(`Found ${result.count} results`);
      }
  • search1lib(query, options?) - Search 1lib.sk

    • Returns: { success, results, count, query, options, message, error? }

Download Methods

  • downloadFromAnnasArchive(asin, outputDir?, bookTitle?) - Download from Anna's Archive

    • Returns: { success, filepath, directory, asin, format, message, error? }
    • Example:
      const result = await forge.downloadFromAnnasArchive('B075HYVHWK', '.');
      if (result.success) {
        console.log('Downloaded to:', result.filepath);
      }
  • downloadFrom1lib(bookUrl, outputDir?, bookTitle?, downloadUrl?) - Download from 1lib.sk

    • Returns: { success, filepath, directory, title, format, message, error? }
  • search1libAndDownload(query, searchOptions?, outputDir?, selectCallback?) - Search and download in one session

    • Returns: { success, results, download, message, error? }

Generation Methods

  • generateSummary(pdfPath) - Generate AI summary from PDF

    • Returns: { success, markdown, length, method, chunks?, message, error? }
    • Methods: gpt5_pdf_upload, text_extraction_single, text_extraction_chunked
    • Example:
      const result = await forge.generateSummary('./book.pdf');
      if (result.success) {
        console.log(`Generated ${result.length} char summary using ${result.method}`);
      }
  • generateAudioScript(markdown) - Generate audio-friendly narration script

    • Returns: { success, script, length, message }
  • generateAudio(text, outputPath) - Generate audio using ElevenLabs TTS

    • Returns: { success, path, size, duration, message, error? }
  • generateOutputFiles(markdown, basename, outputDir) - Generate all output formats

    • Returns: { success, files: {...}, message }

Utility Methods

  • convertEpubToPdf(epubPath) - Convert EPUB to PDF

    • Returns: { success, pdfPath, originalPath, message, error? }
  • createBundle(files, archiveName) - Create tar.gz archive

    • Returns: { success, path, files, message, error? }
  • getCostSummary() - Get cost tracking information

    • Returns: { success, openai, elevenlabs, rainforest, total, breakdown }

Configuration

CLI Configuration (Recommended)

For CLI usage, run the setup command to configure your API keys:

summary setup

This saves your configuration to ~/.config/summary-forge/settings.json so you don't need to manage environment variables.

Environment Variables (Alternative)

For programmatic usage or if you prefer environment variables, create a .env file:

OPENAI_API_KEY=sk-your-key-here
RAINFOREST_API_KEY=your-key-here
ELEVENLABS_API_KEY=sk-your-key-here  # Optional: for audio generation
TWOCAPTCHA_API_KEY=your-key-here      # Optional: for CAPTCHA solving
BROWSERLESS_API_KEY=your-key-here     # Optional

# Browser Configuration
HEADLESS=true                          # Run browser in headless mode
ENABLE_PROXY=false                     # Enable proxy for browser requests
PROXY_URL=http://proxy.example.com    # Proxy URL (if enabled)
PROXY_USERNAME=username                # Proxy username (if enabled)
PROXY_PASSWORD=password                # Proxy password (if enabled)
PROXY_POOL_SIZE=36                     # Number of proxies in your pool (default: 36)

Or set them in your shell:

export OPENAI_API_KEY=sk-your-key-here
export RAINFOREST_API_KEY=your-key-here
export ELEVENLABS_API_KEY=sk-your-key-here  # Optional

Configuration Priority

When using the module programmatically, configuration is loaded in this order (highest priority first):

  1. Constructor options - Passed directly to new SummaryForge(options)
  2. Environment variables - From .env file or shell
  3. Config file - From ~/.config/summary-forge/settings.json (CLI only)
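
The priority order above can be illustrated with a small merge function. This is a sketch under assumptions: `resolveConfig` is a hypothetical helper rather than the module's own loader, and the environment-variable mapping covers only a few of the names listed earlier.

```javascript
// Hypothetical sketch of the priority order; not the module's actual loader.
// Maps a few of the documented environment variables to constructor option names.
const ENV_TO_OPTION = {
  OPENAI_API_KEY: 'openaiApiKey',
  RAINFOREST_API_KEY: 'rainforestApiKey',
  ELEVENLABS_API_KEY: 'elevenlabsApiKey',
  PROXY_URL: 'proxyUrl',
};

// Removes keys whose value is undefined so they cannot mask lower-priority values.
function dropUndefined(obj) {
  return Object.fromEntries(Object.entries(obj).filter(([, value]) => value !== undefined));
}

function resolveConfig(constructorOptions = {}, env = {}, fileConfig = {}) {
  const fromEnv = {};
  for (const [envName, optionName] of Object.entries(ENV_TO_OPTION)) {
    if (env[envName] !== undefined && env[envName] !== '') {
      fromEnv[optionName] = env[envName];
    }
  }
  // Later spreads win: config file < environment variables < constructor options.
  return { ...fileConfig, ...fromEnv, ...dropUndefined(constructorOptions) };
}

const config = resolveConfig(
  { openaiApiKey: 'sk-from-constructor' },
  { OPENAI_API_KEY: 'sk-from-env', PROXY_URL: 'http://proxy.example.com' },
  { openaiApiKey: 'sk-from-file', rainforestApiKey: 'file-key' }
);
console.log(config.openaiApiKey);     // sk-from-constructor
console.log(config.proxyUrl);         // http://proxy.example.com
console.log(config.rainforestApiKey); // file-key
```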

Proxy Configuration (Recommended for Anna's Archive)

To avoid IP bans when downloading from Anna's Archive, configure a proxy during setup:

summary setup

When prompted:

  1. Enable proxy: Yes
  2. Enter proxy URL: http://your-proxy.com:8080
  3. Enter proxy username and password

Why use a proxy?

  • ✅ Avoids IP bans from Anna's Archive
  • ✅ USA-based proxies prevent geo-location issues
  • ✅ Works with both browser navigation and file downloads
  • ✅ Automatically applied to all download operations

Recommended Proxy Service:

We recommend Webshare.io for reliable, USA-based proxies:

  • 🌎 USA-based IPs (no geo-location issues)
  • ⚡ Fast and reliable
  • 💰 Affordable pricing with free tier
  • 🔒 HTTP/HTTPS/SOCKS5 support

Important: Use Static Proxies for Sticky Sessions

For Anna's Archive downloads, you need a static/direct proxy (not rotating) to maintain the same IP:

  1. In your Webshare dashboard, go to ProxyList
  2. Copy a Static Proxy endpoint (not the rotating endpoint)
  3. Use the format: http://host:port (e.g., http://45.95.96.132:8080)
  4. Username format: dmdgluqz-US-{session_id} (session ID added automatically)

The tool automatically picks a session ID (from 1 to PROXY_POOL_SIZE) for each download to get a fresh IP, and keeps that IP for the whole 5-10 minute download.

Proxy Pool Size Configuration:

Set PROXY_POOL_SIZE to match your Webshare plan (default: 36):

  • Free tier: 10 proxies → PROXY_POOL_SIZE=10
  • Starter plan: 25 proxies → PROXY_POOL_SIZE=25
  • Professional plan: 100 proxies → PROXY_POOL_SIZE=100
  • Enterprise plan: 250+ proxies → PROXY_POOL_SIZE=250

The tool will randomly select a session ID from 1 to your pool size, distributing load across all available proxies.
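
A minimal sketch of the session selection described above, assuming the session ID is simply appended to the base username after a hyphen (as the `dmdgluqz-US-{session_id}` format suggests). The function names are hypothetical and this is not the module's internal code.

```javascript
// Sketch only: picks a session ID in the inclusive range [1, poolSize].
// The random source is injectable so the behaviour can be checked deterministically.
function pickSessionId(poolSize, random = Math.random) {
  if (!Number.isInteger(poolSize) || poolSize < 1) {
    throw new RangeError('poolSize must be a positive integer');
  }
  return Math.floor(random() * poolSize) + 1;
}

// Appends the session ID to the base username, e.g. 'dmdgluqz-US' + 7 -> 'dmdgluqz-US-7'.
function buildProxyUsername(baseUsername, sessionId) {
  return `${baseUsername}-${sessionId}`;
}

const poolSize = 36; // the documented default for PROXY_POOL_SIZE
const sessionId = pickSessionId(poolSize);
console.log(buildProxyUsername('dmdgluqz-US', sessionId));
```

Because the ID is drawn at random, two downloads can occasionally land on the same session; a larger pool makes that less likely.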

Smart ISBN Detection:

When searching Anna's Archive, the tool automatically detects whether an identifier is a real ISBN or an Amazon ASIN:

  • Real ISBNs (10 or 13 numeric digits): Searches by ISBN for precise results
  • Amazon ASINs (alphanumeric): Searches by book title instead for better results
  • This ensures you get relevant search results even when Amazon returns proprietary ASINs instead of standard ISBNs
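
The detection rule can be sketched as a small classifier. This follows the description above literally (10 or 13 numeric digits means ISBN, other alphanumeric strings mean ASIN) and makes two assumptions of its own: hyphens and spaces are stripped first, and ISBN-10 values ending in "X" are not treated specially. `classifyIdentifier` is a hypothetical name, not a module export.

```javascript
// Sketch of the rule as described; not the module's actual detection code.
function classifyIdentifier(identifier) {
  // Assumption: separators such as hyphens and spaces are ignored.
  const cleaned = String(identifier).replace(/[\s-]/g, '');
  if (/^(\d{10}|\d{13})$/.test(cleaned)) return 'isbn';
  if (/^[A-Za-z0-9]+$/.test(cleaned)) return 'asin';
  return 'unknown';
}

console.log(classifyIdentifier('9780134685991')); // isbn
console.log(classifyIdentifier('B075HYVHWK'));    // asin
```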

Note: Rotating proxies (p.webshare.io) don't support sticky sessions. Use individual static proxy IPs from your proxy list instead.

Testing your proxy:

node test-proxy.js <ASIN>

This will verify your proxy configuration by attempting to download a book.

Audio Generation

Audio generation is optional and requires an ElevenLabs API key. If the key is not provided, the tool will skip audio generation and only create text-based outputs.

Get ElevenLabs API Key: Sign up at https://try.elevenlabs.io/oh7kgotrpjnv for high-quality text-to-speech.

Features:

  • Uses ElevenLabs Turbo v2.5 model (optimized for audiobooks)
  • Default voice: Brian (best for technical content, customizable)
  • Automatically truncates long texts to fit API limits
  • Generates high-quality MP3 audio files
  • Natural, conversational narration style

Output

The tool generates:

  • <book_name>_summary.md - Markdown summary
  • <book_name>_summary.txt - Plain text summary
  • <book_name>_summary.pdf - PDF summary with table of contents
  • <book_name>_summary.epub - EPUB summary with clickable TOC
  • <book_name>_summary.mp3 - Audio summary (if ElevenLabs key provided)
  • <book_name>.pdf - Original or converted PDF
  • <book_name>.epub - Original EPUB (if input was EPUB)
  • <book_name>_bundle.tgz - Compressed archive containing all files

Example Workflow

# 1. Search for a book
summary search
# Enter: "A Philosophy of Software Design"
# Select from results, get ASIN

# 2. Download and process automatically
summary isbn B075HYVHWK
# Downloads, asks if you want to process
# Creates summary bundle automatically!

# Alternative: Process a local file
summary file ~/Downloads/book.epub

How It Works

  1. Input Processing: Accepts PDF or EPUB files (EPUB is converted to PDF)
  2. Smart Processing Strategy:
    • Small PDFs (<400k chars): Direct upload to OpenAI's vision API
    • Large PDFs (>400k chars): Intelligent chunking with synthesis
  3. AI Summarization: GPT-5 analyzes content with full formatting, tables, and diagrams
  4. Format Conversion: Uses Pandoc to convert the Markdown summary to PDF and EPUB
  5. Audio Generation: Optional TTS conversion using ElevenLabs
  6. Bundling: Creates a compressed archive with all generated files

Intelligent Chunking for Large PDFs

For PDFs exceeding 400,000 characters (typically 500+ pages), the tool automatically uses an intelligent chunking strategy:

How it works:

  1. Analysis: Calculates optimal chunk size based on PDF statistics and GPT-5's token limits
  2. Smart Token Management: Respects GPT-5's 272k input token limit with safety margins
  3. Page-Based Chunking: Splits PDF into logical chunks that fit within token limits
  4. Parallel Processing: Each chunk is summarized independently by GPT-5
  5. Intelligent Synthesis: All chunk summaries are combined into a cohesive final summary
  6. Quality Preservation: Maintains narrative flow and eliminates redundancy
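
To make the page-based chunking step concrete, here is a minimal sketch that groups consecutive pages until a character budget would be exceeded. It assumes `pages` is an array of per-page text strings and that the budget is chosen by the caller; the tool's real sizing and splitting logic is not documented here, and `chunkPages` is a hypothetical name.

```javascript
// Sketch of page-based chunking; not the module's implementation.
function chunkPages(pages, maxCharsPerChunk) {
  const chunks = [];
  let current = null;

  pages.forEach((text, index) => {
    const pageNumber = index + 1;
    // Close the current chunk when adding this page would exceed the budget.
    if (current && current.chars + text.length > maxCharsPerChunk) {
      chunks.push(current);
      current = null;
    }
    if (!current) {
      current = { startPage: pageNumber, endPage: pageNumber, chars: 0, text: '' };
    }
    current.endPage = pageNumber;
    current.chars += text.length;
    current.text += text;
  });

  if (current) chunks.push(current);
  return chunks;
}

const demoPages = ['a'.repeat(60), 'b'.repeat(50), 'c'.repeat(30), 'd'.repeat(90)];
for (const chunk of chunkPages(demoPages, 100)) {
  console.log(`Pages ${chunk.startPage}-${chunk.endPage} (${chunk.chars} chars)`);
}
// Pages 1-1 (60 chars)
// Pages 2-3 (80 chars)
// Pages 4-4 (90 chars)
```

In this sketch a single page longer than the budget still becomes its own (oversized) chunk, since pages are never split.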

Token Limit Handling:

  • GPT-5 Input Limit: 272,000 tokens
  • System Overhead: 20,000 tokens reserved for prompts and instructions
  • Available Tokens: 250,000 tokens for content
  • Safety Margin: 70% utilization to account for token estimation variance
  • Chunk Size: ~565,000 characters per chunk (based on 3.5 chars/token estimate)
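
Multiplying the listed figures gives a rough per-chunk character budget. The short calculation below uses only the numbers above; its result (612,500) is somewhat higher than the ~565,000 quoted, so the tool presumably applies further adjustments that this README does not spell out.

```javascript
// Worked arithmetic from the figures listed above (no module code involved).
const inputLimitTokens = 272_000; // GPT-5 input limit quoted above
const overheadTokens = 20_000;    // reserved for prompts and instructions
const availableTokens = 250_000;  // figure quoted above (the exact difference is 252,000)
const utilizationPercent = 70;    // safety margin
const charsPerToken = 3.5;        // estimate quoted above

const usableTokens = (availableTokens * utilizationPercent) / 100;
const charsPerChunk = usableTokens * charsPerToken;

console.log(inputLimitTokens - overheadTokens); // 252000
console.log(usableTokens);                      // 175000
console.log(charsPerChunk);                     // 612500
```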

Benefits:

  • Complete Coverage: Processes entire books without truncation
  • High Quality: Each section gets full AI attention
  • Seamless Output: Final summary reads as a unified document
  • Cost Efficient: Optimizes token usage across multiple API calls
  • Automatic: No configuration needed - works transparently
  • Token-Aware: Respects API limits to prevent errors

Example Output:

📊 PDF Stats: 523 pages, 1,245,678 chars, ~311,420 tokens
📚 PDF is large - using intelligent chunking strategy
   This will process the ENTIRE 523-page PDF without truncation
📐 Using chunk size: 120,000 chars
📦 Created 11 chunks for processing
   Chunk 1: Pages 1-48 (119,234 chars)
   Chunk 2: Pages 49-95 (118,901 chars)
   ...
✅ All 11 chunks processed successfully
🔄 Synthesizing chunk summaries into final comprehensive summary...
✅ Final summary synthesized: 45,678 characters

Why Direct PDF Upload?

The tool prioritizes OpenAI's vision API for direct PDF upload when possible:

  • Better Quality: Preserves document formatting, tables, and diagrams
  • More Accurate: AI can see the actual PDF layout and structure
  • Better for Technical Books: Code examples and diagrams are preserved
  • Fallback Strategy: Automatically switches to intelligent chunking for large files

Testing

Summary Forge includes a comprehensive test suite using Vitest.

Run Tests

# Run all tests
pnpm test

# Run tests in watch mode
pnpm test:watch

# Run tests with coverage report
pnpm test:coverage

Test Coverage

The test suite includes:

  • ✅ 30+ passing tests
  • Constructor validation
  • Helper method tests
  • PDF upload functionality tests
  • API integration tests
  • Error handling tests
  • Edge case coverage
  • File operation tests

See test/summary-forge.test.js for the complete test suite.

Flashcard Generation

Summary Forge includes powerful flashcard generation capabilities for study and review.

Printable PDF Flashcards

Generate double-sided flashcard PDFs optimized for printing:

import { extractFlashcards, generateFlashcardsPDF } from '@profullstack/summary-forge-module/flashcards';
import fs from 'node:fs/promises';

// Read your markdown summary
const markdown = await fs.readFile('./book_summary.md', 'utf-8');

// Extract Q&A pairs
const extractResult = extractFlashcards(markdown, { maxCards: 50 });
console.log(`Extracted ${extractResult.count} flashcards`);

// Generate printable PDF
const pdfResult = await generateFlashcardsPDF(
  extractResult.flashcards,
  './flashcards.pdf',
  {
    title: 'JavaScript Fundamentals',
    branding: 'SummaryForge.com',
    cardWidth: 3.5,   // inches
    cardHeight: 2.5,  // inches
    fontSize: 11
  }
);

console.log(`PDF created: ${pdfResult.path}`);
console.log(`Total pages: ${pdfResult.pages}`);

Individual Flashcard Images

Generate individual PNG images for each flashcard, perfect for web applications:

import { extractFlashcards, generateFlashcardImages } from '@profullstack/summary-forge-module/flashcards';
import fs from 'node:fs/promises';

// Read your markdown summary
const markdown = await fs.readFile('./book_summary.md', 'utf-8');

// Extract Q&A pairs
const extractResult = extractFlashcards(markdown);

// Generate individual PNG images
const imageResult = await generateFlashcardImages(
  extractResult.flashcards,
  './flashcards',  // Output directory
  {
    title: 'JavaScript Fundamentals',
    branding: 'SummaryForge.com',
    width: 800,   // pixels
    height: 600,  // pixels
    fontSize: 24
  }
);

if (imageResult.success) {
  console.log(`Generated ${imageResult.images.length} images`);
  console.log('Files:', imageResult.images);
  // Output: ['./flashcards/q-001.png', './flashcards/a-001.png', ...]
}

Image Naming Convention:

  • q-001.png, q-002.png, etc. - Question cards
  • a-001.png, a-002.png, etc. - Answer cards
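
For web apps that need to predict the file names in advance, the convention can be reproduced with a zero-padded, 1-based index. `flashcardImageNames` is a hypothetical helper written for this example, and the three-digit padding is inferred from the sample names above.

```javascript
// Sketch of the naming convention: q-/a- prefix plus a zero-padded, 1-based index.
function flashcardImageNames(count, outputDir = './flashcards') {
  const names = [];
  for (let i = 1; i <= count; i++) {
    const index = String(i).padStart(3, '0');
    // Question image first, then the matching answer image.
    names.push(`${outputDir}/q-${index}.png`, `${outputDir}/a-${index}.png`);
  }
  return names;
}

console.log(flashcardImageNames(2).join('\n'));
```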

Use Cases:

  • 🌐 Web-based flashcard applications
  • 📱 Mobile learning apps
  • 🎮 Interactive quiz games
  • 📊 Study progress tracking systems
  • 🔄 Spaced repetition software

Features:

  • ✅ Clean, professional design with book title
  • ✅ Automatic text wrapping for long content
  • ✅ Customizable dimensions and styling
  • ✅ SVG-based rendering for crisp quality
  • ✅ Works in Docker (no native dependencies)

Flashcard Extraction Formats

The extractFlashcards function supports multiple markdown formats:

1. Explicit Q&A Format:

**Q: What is a closure?**
A: A closure is a function that has access to variables in its outer scope.

2. Definition Lists:

**Closure**
: A function that has access to variables in its outer scope.

3. Question Headers:

### What is a closure?

A closure is a function that has access to variables in its outer scope.
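
To make the three formats concrete, here is a rough regex-based sketch of how such Q&A pairs could be pulled out of markdown. It is not the package's implementation: `sketchExtractFlashcards` and its patterns are assumptions for illustration only, and the real `extractFlashcards` may handle more edge cases and returns a result object rather than a bare array.

```javascript
// Illustrative only: a hypothetical, simplified extractor for the three formats.
function sketchExtractFlashcards(markdown) {
  const patterns = [
    // 1. Explicit Q&A: "**Q: question**" followed by "A: answer" on the next line
    /^\*\*Q:[ \t]*(.+?)\*\*[ \t]*\r?\nA:[ \t]*(.+)$/gm,
    // 2. Definition list: "**Term**" followed by ": definition" on the next line
    /^\*\*(?!Q:)(.+?)\*\*[ \t]*\r?\n:[ \t]*(.+)$/gm,
    // 3. Question header: "### question?", blank line(s), then the first answer line
    /^###[ \t]+(.+\?)[ \t]*\r?\n(?:[ \t]*\r?\n)+(.+)$/gm,
  ];

  const cards = [];
  for (const pattern of patterns) {
    for (const match of markdown.matchAll(pattern)) {
      cards.push({ question: match[1].trim(), answer: match[2].trim() });
    }
  }
  return cards; // grouped by format, not by position in the document
}
```

Note that this sketch only captures the first line of an answer; multi-line answers would need a more careful parser.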

Examples

See the examples/ directory for more usage examples.

Troubleshooting

Rate Limiting (1lib.sk)

If you encounter "Too many requests" errors from 1lib.sk:

Error Message:

Too many requests from your IP xxx.xxx.xxx.xxx
Please wait 10 seconds. [email protected]. Err #ipd1

Automatic Handling: The tool automatically detects rate limiting and:

  • ✅ Waits the requested time (usually 10 seconds)
  • ✅ Retries up to 3 times with exponential backoff
  • ✅ Adds a 2-second buffer to ensure the rate limit has cleared
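
If you wrap your own requests in similar logic, the behavior described above could look roughly like the sketch below. All names here (`parseWaitSeconds`, `retryDelayMs`, `withRateLimitRetry`) are hypothetical, and the exact way the requested wait, the exponential backoff, and the 2-second buffer are combined is an assumption; the tool's internal formula may differ.

```javascript
// Hypothetical sketch of rate-limit handling; not the tool's actual code.
const MAX_RETRIES = 3;           // "retries up to 3 times"
const BUFFER_MS = 2000;          // "2-second buffer"
const DEFAULT_WAIT_SECONDS = 10; // "usually 10 seconds"

// Pull the requested wait out of a message such as "Please wait 10 seconds."
function parseWaitSeconds(message) {
  const match = /Please wait (\d+) seconds?/i.exec(message);
  return match ? Number(match[1]) : null;
}

// Assumed formula: requested wait, doubled on each attempt, plus the buffer.
function retryDelayMs(message, attempt) {
  const seconds = parseWaitSeconds(message) ?? DEFAULT_WAIT_SECONDS;
  return seconds * 1000 * 2 ** attempt + BUFFER_MS;
}

// Runs an async task, retrying only when the error looks like a rate limit.
async function withRateLimitRetry(
  task,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))
) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (error) {
      const rateLimited = /Too many requests/i.test(String(error?.message));
      if (!rateLimited || attempt >= MAX_RETRIES) throw error;
      await sleep(retryDelayMs(error.message, attempt));
    }
  }
}
```

Under these assumptions, a message asking for 10 seconds leads to waits of 12, 22, and 42 seconds before the error is finally rethrown.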

Manual Solutions:

  1. Wait a few minutes before trying again
  2. Use a different proxy session (the tool rotates through your proxy pool automatically)
  3. Switch to Anna's Archive: summary search "book title" --source anna
  4. Reduce concurrent requests if running multiple downloads

Note: The proxy pool helps distribute requests across different IPs, reducing rate limiting issues.

Download Button Not Found (1lib.sk)

If you encounter "Download button not found" errors when downloading from 1lib.sk:

  1. Check Debug Files: The tool automatically saves debug-book-page.html in the book's directory

    • Open this file to inspect the actual page structure
    • Look for download links or buttons that might have different selectors
  2. Review Error Output: The error message includes:

    • All selectors that were tried
    • List of links found on the page
    • Location of the debug HTML file
  3. Common Causes:

    • Z-Access/Library Access Page: Book page redirects to authentication page (most common)
    • Page structure changed (1lib.sk updates their site)
    • Book is deleted or unavailable
    • Session expired or cookies not maintained
    • Proxy issues preventing proper page load
  4. Solutions:

    • Recommended: Use Anna's Archive instead: summary search "book title" --source anna
    • Try the search1lib command separately to verify the book exists
    • Check if the book page loads correctly in a regular browser with the same proxy
    • Verify proxy configuration is working correctly
    • Try a different book from search results
  5. Known Issue - Z-Access Page: If you see links to library-access.sk or a Z-Access page in the debug output, this means:

    • The book page requires authentication or special access
    • 1lib.sk's session management is blocking automated access
    • Workaround: Use Anna's Archive which has better automation support
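
When inspecting debug-book-page.html by hand is tedious, a small script can list the links it contains and flag the Z-Access case. This is a rough sketch using a regular expression rather than a real HTML parser; `listLinks` and `looksLikeZAccessPage` are hypothetical helpers, not commands or functions provided by the tool.

```javascript
// Hypothetical helpers for inspecting a saved debug HTML page.

// Collect the href value of every <a> tag (quoted attributes only).
function listLinks(html) {
  const links = [];
  const anchorPattern = /<a\b[^>]*\bhref\s*=\s*(["'])(.*?)\1/gi;
  for (const match of html.matchAll(anchorPattern)) {
    links.push(match[2]);
  }
  return links;
}

// The known issue above: the page links to library-access.sk instead of a download.
function looksLikeZAccessPage(html) {
  return listLinks(html).some((href) => href.includes('library-access.sk'));
}
```

To run it against a real file, read the HTML first (for example with `fs.readFile('./uploads/book_name/debug-book-page.html', 'utf-8')` from `node:fs/promises`) and pass the string to `listLinks`.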

Example Debug Output (Z-Access Issue):

❌ Download button not found on book page
   Debug HTML saved to: ./uploads/book_name/debug-book-page.html
   Found 6 links on page
   First 5 links:
   - https://library-access.sk (Z-Access page)
   - mailto:[email protected] ([email protected])
   - https://www.reddit.com/r/zlibrary (https://www.reddit.com/r/zlibrary)

Recommended Alternative:

# Use Anna's Archive instead (more reliable for automation)
summary search "prompt engineering" --source anna

IP Bans from Anna's Archive

If you're getting blocked by Anna's Archive:

  1. Enable proxy in your configuration:

    summary setup
  2. Use a USA-based proxy to avoid geo-location issues

  3. Test your proxy before downloading:

    node test-proxy.js B0BCTMXNVN
  4. Run browser in visible mode to debug:

    summary config --headless false

Proxy Configuration

The proxy is used for:

  • ✅ Browser navigation (Puppeteer)
  • ✅ File downloads (fetch with https-proxy-agent)
  • ✅ All HTTP requests to Anna's Archive

Supported proxy formats:

  • http://proxy.example.com:8080
  • https://proxy.example.com:8080
  • socks5://proxy.example.com:1080
  • http://proxy.example.com:8080-session-<SESSION_ID> (sticky session)

Recommended Service: Webshare.io - Reliable USA-based proxies with free tier available.

Webshare Sticky Sessions: Add -session-<YOUR_SESSION_ID> to your proxy URL to maintain the same IP:

http://p.webshare.io:80-session-myapp123
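
If you generate proxy URLs in code, the sticky-session suffix can be appended with a small helper. This is a sketch that follows the URL shape shown above; `withStickySession` is a hypothetical name, and the scheme check simply mirrors the supported formats listed earlier.

```javascript
// Hypothetical helper: appends the sticky-session suffix to a proxy URL.
const SUPPORTED_SCHEMES = ['http://', 'https://', 'socks5://'];

function withStickySession(proxyUrl, sessionId) {
  if (!SUPPORTED_SCHEMES.some((scheme) => proxyUrl.startsWith(scheme))) {
    throw new Error(`Unsupported proxy scheme: ${proxyUrl}`);
  }
  // Plain string concatenation: the suffixed form is not a standard URL,
  // so it should not be passed through the URL constructor.
  return `${proxyUrl}-session-${sessionId}`;
}
```

For example, `withStickySession('http://p.webshare.io:80', 'myapp123')` produces the URL shown above.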

CAPTCHA Solving

When downloading from Anna's Archive, you may encounter CAPTCHAs. To automatically solve them:

  1. Sign up for 2Captcha at 2captcha.com and get an API key
  2. Add to configuration:
    summary setup
  3. Enter your 2Captcha API key when prompted

The tool will automatically detect and solve CAPTCHAs during downloads, making the process fully automated.

Limitations

  • Maximum PDF file size: No practical limit (intelligent chunking handles any size)
  • GPT-5 runs at its default temperature of 1, which is not configurable
  • Requires external tools: Calibre, Pandoc, XeLaTeX
  • CAPTCHA solving requires 2captcha.com API key (optional)
  • Very large PDFs (1000+ pages) may incur higher API costs due to multiple chunk processing
  • Anna's Archive may block IPs without proxy configuration
  • Chunked processing uses text extraction (images/diagrams described in text only)

Roadmap

  • [x] ISBN/ASIN lookup via Anna's Archive
  • [x] Automatic download from Anna's Archive with CAPTCHA solving
  • [x] Book title search via Rainforest API
  • [x] CLI with interactive mode
  • [x] ESM module for programmatic use
  • [x] Audio generation with ElevenLabs TTS
  • [x] Direct PDF upload to OpenAI vision API
  • [x] EPUB format prioritization (open standard)
  • [ ] Support for more input formats (MOBI, AZW3)
  • [ ] Chunked processing for very large books (>100MB)
  • [ ] Custom summary templates
  • [ ] Web interface
  • [ ] Multiple voice options for audio
  • [ ] Audio chapter markers
  • [ ] Batch processing multiple books

License

ISC

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.