@profullstack/summary-forge-module

v1.10.2

Published

3 months ago

An intelligent tool that uses AI to create comprehensive summaries of technical books


chovy

devpreshy

ai summary books pdf epub gpt openai

Summary Forge Module

An intelligent tool that uses OpenAI's GPT-5 to forge comprehensive summaries of ebooks in multiple formats.

Repository: [email protected]:profullstack/summary-forge-module.git

Features

📚 Multiple Input Formats: Supports PDF, EPUB files, and web page URLs
🌐 Web Page Summarization: Fetch and summarize any web page with automatic content extraction
🤖 AI-Powered Summaries: Uses GPT-5 with direct PDF upload for better quality
📊 Vision API: Preserves formatting, tables, diagrams, and images from PDFs
🧩 Intelligent Chunking: Automatically processes large PDFs (500+ pages) without truncation
🛡️ Directory Protection: Prompts before overwriting existing summaries (use --force to skip)
📦 Multiple Output Formats: Creates Markdown, PDF, EPUB, plain text, and MP3 audio summaries
🃏 Printable Flashcards: Generates double-sided flashcard PDFs for studying
🖼️ Flashcard Images: Individual PNG images for web app integration (q-001.png, a-001.png, etc.)
🎙️ Natural Audio Narration: AI-generated conversational audio script for better listening
🗜️ Bundled Output: Packages everything into a convenient .tgz archive
🔄 Auto-Conversion: Automatically converts EPUB to PDF using Calibre
🔍 Book Search: Search Amazon by title using Rainforest API
📖 Auto-Download: Downloads books from Anna's Archive with CAPTCHA solving
💻 CLI & Module: Use as a command-line tool or import as an ESM module
🎨 Interactive Mode: Guided workflow with inquirer prompts
📥 EPUB Priority: Automatically prefers EPUB format (open standard, more flexible)

Installation

Global Installation (CLI)

pnpm install -g @profullstack/summary-forge-module

Local Installation (Module)

pnpm add @profullstack/summary-forge-module

Prerequisites

Node.js v20 or newer

Calibre (for EPUB conversion - provides ebook-convert command)

# macOS
brew install calibre
   
# Ubuntu/Debian
sudo apt-get install calibre
   
# Arch Linux
sudo pacman -S calibre

Pandoc (for document conversion)

# macOS
brew install pandoc
   
# Ubuntu/Debian
sudo apt-get install pandoc
   
# Arch Linux
sudo pacman -S pandoc

XeLaTeX (for PDF generation)

# macOS
brew install --cask mactex
   
# Ubuntu/Debian
sudo apt-get install texlive-xetex
   
# Arch Linux
sudo pacman -S texlive-core texlive-xetex

CLI Usage

First-Time Setup

Before using the CLI, configure your API keys:

summary setup

This interactive command will prompt you for:

OpenAI API Key (required)
Rainforest API Key (optional - for Amazon book search)
ElevenLabs API Key (optional - for audio generation; get a key at https://try.elevenlabs.io/oh7kgotrpjnv)
2Captcha API Key (optional - for CAPTCHA solving; sign up at https://2captcha.com/?from=9630996)
Browserless API Key (optional)
Browser and proxy settings

Configuration is saved to ~/.config/summary-forge/settings.json and used automatically by all CLI commands.

Managing Configuration

# View current configuration
summary config

# Update configuration
summary setup

# Delete configuration
summary config --delete

Note: The CLI will use configuration in this priority order:

Environment variables (.env file)
Configuration file (~/.config/summary-forge/settings.json)

Interactive Mode (Recommended)

summary interactive
# or
summary i

This launches an interactive menu where you can:

Process local files (PDF/EPUB)
Process web page URLs
Search for books by title
Look up books by ISBN/ASIN

Process a File

summary file /path/to/book.pdf
summary file /path/to/book.epub

# Force overwrite if directory already exists
summary file /path/to/book.pdf --force
summary file /path/to/book.pdf -f

Process a Web Page URL

summary url https://example.com/article
summary url https://blog.example.com/post/123

# Force overwrite if directory already exists
summary url https://example.com/article --force
summary url https://example.com/article -f

Features:

Automatically fetches web page content using Puppeteer
Sanitizes HTML to remove navigation, ads, footers, and other non-content elements
Saves web page as PDF for processing
Generates clean title from page title or uses OpenAI to create one
Prompts specifically optimized for web page content (ignores nav/ads/footers)
Creates same output formats as book processing (MD, TXT, PDF, EPUB, MP3, flashcards)

Search by Title

# Search for books (defaults to 1lib.sk - faster, no DDoS protection)
summary search "LLM Fine Tuning"
summary search "JavaScript" --max-results 5 --extensions pdf,epub
summary search "Python" --year-from 2020 --year-to 2024
summary search "Machine Learning" --languages english --order date

# Use Anna's Archive instead (has DDoS protection, slower)
summary search "Clean Code" --source anna
summary search "Rare Book" --source anna --sources zlib,lgli

# Title search (shortcut for search command)
summary title "A Philosophy of Software Design"
summary title "Clean Code" --force  # Auto-select first result
summary title "Python" --source anna  # Use Anna's Archive

# ISBN lookup (defaults to 1lib.sk)
summary isbn 9780134685991
summary isbn B075HYVHWK --force  # Auto-select and process
summary isbn 9780134685991 --source anna  # Use Anna's Archive

# Common Options:
#   --source <source>              Search source: zlib (1lib.sk, default) or anna (Anna's Archive)
#   -n, --max-results <number>     Maximum results to display (default: 10)
#   -f, --force                    Auto-select first result and process immediately
#
# 1lib.sk Options (--source zlib, default):
#   --year-from <year>             Filter by publication year from (e.g., 2020)
#   --year-to <year>               Filter by publication year to (e.g., 2024)
#   -l, --languages <languages>    Language filter, comma-separated (default: english)
#   -e, --extensions <extensions>  File extensions, comma-separated (case-insensitive, default: PDF)
#   --content-types <types>        Content types, comma-separated (default: book)
#   -s, --order <order>            Sort order: date (newest) or empty for relevance
#   --view <view>                  View type: list or grid (default: list)
#
# Anna's Archive Options (--source anna):
#   -f, --format <format>          Filter by format: pdf, epub, pdf,epub, or all (default: pdf)
#   -s, --sort <sort>              Sort by: date (newest) or empty for relevance (default: '')
#   -l, --language <language>      Language code(s), comma-separated (e.g., en, es, fr) (default: en)
#   --sources <sources>            Data sources, comma-separated (default: all sources)
#                                  Options: zlib, lgli, lgrs, and others

Look up by ISBN/ASIN

summary isbn B075HYVHWK

# Force overwrite if directory already exists
summary isbn B075HYVHWK --force
summary isbn B075HYVHWK -f

Help

summary --help
summary file --help

Programmatic Usage

JSON API Format

All methods return a consistent result object with the following structure:

{
  success: true | false,  // Indicates if operation succeeded
  ...data,                // Method-specific data fields
  error?: string,         // Error message (only when success is false)
  message?: string        // Success message (optional)
}

This enables:

✅ Consistent error handling - Check success field instead of try-catch
✅ REST API ready - Direct JSON responses for HTTP endpoints
✅ Better debugging - Rich metadata in all responses
✅ Type-safe - Predictable structure for TypeScript users
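
Because failures are reported through the `success` field rather than thrown, callers who prefer exceptions can wrap results in a small helper. The following is a minimal sketch; `unwrap` is a hypothetical helper and is not part of the module:

```javascript
// Hypothetical helper (not part of the module): turn a { success, error }
// result object into either the result itself or a thrown Error.
function unwrap(result, label = 'operation') {
  if (!result || result.success !== true) {
    const reason = result && result.error ? result.error : 'unknown error';
    throw new Error(`${label} failed: ${reason}`);
  }
  return result;
}

// Usage with a plain object shaped like the module's responses
const ok = unwrap({ success: true, archive: './book_bundle.tgz' }, 'processFile');
console.log(ok.archive);
```

This keeps the `success` check in one place while leaving the original result object untouched.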

Basic Example

import { SummaryForge } from '@profullstack/summary-forge-module';
import { loadConfig } from '@profullstack/summary-forge-module/config';

// Load config from ~/.config/summary-forge/settings.json
const configResult = await loadConfig();
if (!configResult.success) {
  console.error('Failed to load config:', configResult.error);
  process.exit(1);
}

const forge = new SummaryForge(configResult.config);

const result = await forge.processFile('./my-book.pdf');
if (result.success) {
  console.log('Summary created:', result.archive);
  console.log('Files:', result.files);
  console.log('Costs:', result.costs);
} else {
  console.error('Processing failed:', result.error);
}

Configuration Options

import { SummaryForge } from '@profullstack/summary-forge-module';

const forge = new SummaryForge({
  // Required
  openaiApiKey: 'sk-...',
  
  // Optional API keys
  rainforestApiKey: 'your-key',      // For Amazon search
  elevenlabsApiKey: 'sk-...',        // For audio generation (get key: https://try.elevenlabs.io/oh7kgotrpjnv)
  twocaptchaApiKey: 'your-key',      // For CAPTCHA solving (sign up: https://2captcha.com/?from=9630996)
  browserlessApiKey: 'your-key',     // For browserless.io
  
  // Processing options
  maxChars: 500000,                  // Max chars to process
  maxTokens: 20000,                  // Max tokens in output summary
  maxInputTokens: 250000,            // Max input tokens per API call (default: 250000 for GPT-5)
  
  // Audio options
  voiceId: '21m00Tcm4TlvDq8ikWAM',  // ElevenLabs voice
  voiceSettings: {
    stability: 0.5,
    similarity_boost: 0.75
  },
  
  // Browser options
  headless: true,                    // Run browser in headless mode
  enableProxy: false,                // Enable proxy
  proxyUrl: 'http://proxy.com',     // Proxy URL
  proxyUsername: 'user',             // Proxy username
  proxyPassword: 'pass',             // Proxy password
  proxyPoolSize: 36                  // Number of proxies in pool (default: 36)
});

const result = await forge.processFile('./book.epub');
if (result.success) {
  console.log('Archive:', result.archive);
} else {
  console.error('Processing failed:', result.error);
}

Search for Books

Using Amazon/Rainforest API

const forge = new SummaryForge({
  openaiApiKey: process.env.OPENAI_API_KEY,
  rainforestApiKey: process.env.RAINFOREST_API_KEY
});

const searchResult = await forge.searchBookByTitle('Clean Code');
if (!searchResult.success) {
  console.error('Search failed:', searchResult.error);
  process.exit(1);
}

console.log(`Found ${searchResult.count} results:`);
console.log(searchResult.results.map(b => ({
  title: b.title,
  author: b.author,
  asin: b.asin
})));

// Get download URL for the first result, if there is one
if (searchResult.results.length > 0) {
  const url = forge.getAnnasArchiveUrl(searchResult.results[0].asin);
  console.log('Download from:', url);
}

Using Anna's Archive Direct Search (No Rainforest API Required)

const forge = new SummaryForge({
  openaiApiKey: process.env.OPENAI_API_KEY,
  enableProxy: true,
  proxyUrl: process.env.PROXY_URL,
  proxyUsername: process.env.PROXY_USERNAME,
  proxyPassword: process.env.PROXY_PASSWORD
});

// Basic search
const searchResult = await forge.searchAnnasArchive('JavaScript', {
  maxResults: 10,
  format: 'pdf',
  sortBy: 'date'  // Sort by newest
});

if (!searchResult.success) {
  console.error('Search failed:', searchResult.error);
  process.exit(1);
}

console.log(`Found ${searchResult.count} results`);
console.log(searchResult.results.map(r => ({
  title: r.title,
  author: r.author,
  format: r.format,
  size: `${r.sizeInMB.toFixed(1)} MB`,
  url: r.url
})));

// Download the first result
if (searchResult.results.length > 0) {
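  // Extract the MD5 hash from the result link; skip the download if it is missing
  const md5 = searchResult.results[0].href?.match(/\/md5\/([a-f0-9]+)/)?.[1];
  const downloadResult = md5
    ? await forge.downloadFromAnnasArchive(md5, '.', searchResult.results[0].title)
    : { success: false, error: 'No MD5 hash found in result link' };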
  
  if (downloadResult.success) {
    console.log('Downloaded:', downloadResult.filepath);
    console.log('Directory:', downloadResult.directory);
  } else {
    console.error('Download failed:', downloadResult.error);
  }
}

Using 1lib.sk Search (Faster, No DDoS Protection)

const forge = new SummaryForge({
  openaiApiKey: process.env.OPENAI_API_KEY,
  enableProxy: true,
  proxyUrl: process.env.PROXY_URL,
  proxyUsername: process.env.PROXY_USERNAME,
  proxyPassword: process.env.PROXY_PASSWORD
});

// Basic search
const searchResult = await forge.search1lib('LLM Fine Tuning', {
  maxResults: 10,
  yearFrom: 2020,
  languages: ['english'],
  extensions: ['PDF']
});

if (!searchResult.success) {
  console.error('Search failed:', searchResult.error);
  process.exit(1);
}

console.log(`Found ${searchResult.count} results`);
console.log(searchResult.results.map(r => ({
  title: r.title,
  author: r.author,
  year: r.year,
  extension: r.extension,
  size: r.size,
  language: r.language,
  isbn: r.isbn,
  url: r.url
})));

// Download the first result
if (searchResult.results.length > 0) {
  const downloadResult = await forge.downloadFrom1lib(
    searchResult.results[0].url,
    '.',
    searchResult.results[0].title
  );
  
  if (downloadResult.success) {
    console.log('Downloaded:', downloadResult.filepath);
    
    // Process the downloaded book
    const processResult = await forge.processFile(downloadResult.filepath, downloadResult.identifier);
    if (processResult.success) {
      console.log('Summary created:', processResult.archive);
      console.log('Costs:', processResult.costs);
    } else {
      console.error('Processing failed:', processResult.error);
    }
  } else {
    console.error('Download failed:', downloadResult.error);
  }
}

Enhanced Error Handling:

The 1lib.sk download functionality includes robust error handling with automatic debugging:

Multiple Selector Fallbacks: Tries 6 different selectors to find download buttons
Debug HTML Capture: Saves page HTML when download button isn't found
Link Analysis: Lists all links on the page for troubleshooting
Detailed Error Messages: Provides actionable information for debugging

If a download fails, check the debug-book-page.html file in the book's directory for detailed page structure information.
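
The selector fallback idea can be sketched in a few lines. This is an illustration only: `findFirst`, the injected `query` function, and the selectors are hypothetical and do not reflect the selectors the tool actually tries.

```javascript
// Hypothetical sketch: try each selector in order and return the first hit.
// `query` stands in for a page lookup; the selectors below are made up.
async function findFirst(selectors, query) {
  const tried = [];
  for (const selector of selectors) {
    tried.push(selector);
    const element = await query(selector);
    if (element) return { success: true, selector, element, tried };
  }
  // Report every selector that was attempted to aid debugging
  return { success: false, tried, error: 'Download button not found' };
}

// Stubbed lookup so the sketch runs without a browser
const fakePage = { 'a.download': { href: '/dl/123' } };
findFirst(['button.dl', 'a.download'], async (s) => fakePage[s] ?? null)
  .then((result) => console.log(result.selector));
```

Returning the list of attempted selectors mirrors the kind of detail the error output is described as including.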

API Reference

Constructor Options

new SummaryForge({
  // API Keys
  openaiApiKey: string,      // Required: OpenAI API key
  rainforestApiKey: string,  // Optional: For title search
  elevenlabsApiKey: string,  // Optional: For audio generation
  twocaptchaApiKey: string,  // Optional: For CAPTCHA solving
  browserlessApiKey: string, // Optional: For browserless.io
  
  // Processing Options
  maxChars: number,          // Optional: Max chars to process (default: 400000)
  maxTokens: number,         // Optional: Max tokens in output summary (default: 16000)
  maxInputTokens: number,    // Optional: Max input tokens per API call (default: 250000 for GPT-5)
  
  // Audio Options
  voiceId: string,           // Optional: ElevenLabs voice ID (default: Brian)
  voiceSettings: object,     // Optional: Voice customization settings
  
  // Browser Options
  headless: boolean,         // Optional: Run browser in headless mode (default: true)
  enableProxy: boolean,      // Optional: Enable proxy (default: false)
  proxyUrl: string,          // Optional: Proxy URL
  proxyUsername: string,     // Optional: Proxy username
  proxyPassword: string,     // Optional: Proxy password
  proxyPoolSize: number      // Optional: Number of proxies in pool (default: 36)
})

Methods

All methods return JSON objects with { success, ...data, error?, message? } format.

Processing Methods

processFile(filePath, asin?) - Process a PDF or EPUB file

Returns: { success, basename, markdown, files, archive, hasAudio, asin, costs, message, error? }

Example:

const result = await forge.processFile('./book.pdf');
if (result.success) {
  console.log('Archive:', result.archive);
  console.log('Costs:', result.costs);
}

processWebPage(url, outputDir?) - Process a web page URL

Returns: { success, basename, dirName, markdown, files, directory, archive, hasAudio, url, title, costs, message, error? }

Example:

```
const result = await forge.processWebPage('https://example.com/article');
if (result.success) {
  console.log('Summary:', result.markdown.substring(0, 100));
}
```

Search Methods

searchBookByTitle(title) - Search Amazon using Rainforest API

Returns: { success, results, count, query, message, error? }

Example:

const result = await forge.searchBookByTitle('Clean Code');
if (result.success) {
  console.log(`Found ${result.count} books`);
}

searchAnnasArchive(query, options?) - Search Anna's Archive directly

Returns: { success, results, count, query, options, message, error? }

Example:

const result = await forge.searchAnnasArchive('JavaScript', {
  maxResults: 10,
  format: 'pdf',
  sortBy: 'date'
});
if (result.success) {
  console.log(`Found ${result.count} results`);
}

search1lib(query, options?) - Search 1lib.sk

Returns: { success, results, count, query, options, message, error? }

Download Methods

downloadFromAnnasArchive(asin, outputDir?, bookTitle?) - Download from Anna's Archive

Returns: { success, filepath, directory, asin, format, message, error? }

Example:

```
const result = await forge.downloadFromAnnasArchive('B075HYVHWK', '.');
if (result.success) {
  console.log('Downloaded to:', result.filepath);
}
```

downloadFrom1lib(bookUrl, outputDir?, bookTitle?, downloadUrl?) - Download from 1lib.sk

Returns: { success, filepath, directory, title, format, message, error? }

search1libAndDownload(query, searchOptions?, outputDir?, selectCallback?) - Search and download in one session

Returns: { success, results, download, message, error? }

Generation Methods

generateSummary(pdfPath) - Generate AI summary from PDF

Returns: { success, markdown, length, method, chunks?, message, error? }

Methods: gpt5_pdf_upload, text_extraction_single, text_extraction_chunked

Example:

```
const result = await forge.generateSummary('./book.pdf');
if (result.success) {
  console.log(`Generated ${result.length} char summary using ${result.method}`);
}
```

generateAudioScript(markdown) - Generate audio-friendly narration script

Returns: { success, script, length, message }

generateAudio(text, outputPath) - Generate audio using ElevenLabs TTS

Returns: { success, path, size, duration, message, error? }

generateOutputFiles(markdown, basename, outputDir) - Generate all output formats

Returns: { success, files: {...}, message }

Utility Methods

convertEpubToPdf(epubPath) - Convert EPUB to PDF

Returns: { success, pdfPath, originalPath, message, error? }

createBundle(files, archiveName) - Create tar.gz archive

Returns: { success, path, files, message, error? }

getCostSummary() - Get cost tracking information

Returns: { success, openai, elevenlabs, rainforest, total, breakdown }

Configuration

CLI Configuration (Recommended)

For CLI usage, run the setup command to configure your API keys:

summary setup

This saves your configuration to ~/.config/summary-forge/settings.json so you don't need to manage environment variables.

Environment Variables (Alternative)

For programmatic usage or if you prefer environment variables, create a .env file:

OPENAI_API_KEY=sk-your-key-here
RAINFOREST_API_KEY=your-key-here
ELEVENLABS_API_KEY=sk-your-key-here  # Optional: for audio generation
TWOCAPTCHA_API_KEY=your-key-here      # Optional: for CAPTCHA solving
BROWSERLESS_API_KEY=your-key-here     # Optional

# Browser Configuration
HEADLESS=true                          # Run browser in headless mode
ENABLE_PROXY=false                     # Enable proxy for browser requests
PROXY_URL=http://proxy.example.com    # Proxy URL (if enabled)
PROXY_USERNAME=username                # Proxy username (if enabled)
PROXY_PASSWORD=password                # Proxy password (if enabled)
PROXY_POOL_SIZE=36                     # Number of proxies in your pool (default: 36)

Or set them in your shell:

export OPENAI_API_KEY=sk-your-key-here
export RAINFOREST_API_KEY=your-key-here
export ELEVENLABS_API_KEY=sk-your-key-here  # Optional

Configuration Priority

When using the module programmatically, configuration is loaded in this order (highest priority first):

Constructor options - Passed directly to new SummaryForge(options)
Environment variables - From .env file or shell
Config file - From ~/.config/summary-forge/settings.json (CLI only)
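
The priority order above amounts to "first defined value wins". A minimal sketch of that rule follows; `resolveConfig` and the key mapping are illustrative assumptions, not the module's actual loader, and only three keys are shown:

```javascript
// Sketch of "highest priority first" resolution. The function name and the
// option-to-env mapping are illustrative assumptions.
const ENV_KEYS = {
  openaiApiKey: 'OPENAI_API_KEY',
  rainforestApiKey: 'RAINFOREST_API_KEY',
  elevenlabsApiKey: 'ELEVENLABS_API_KEY',
};

function resolveConfig(options = {}, env = {}, fileConfig = {}) {
  const resolved = {};
  for (const [key, envName] of Object.entries(ENV_KEYS)) {
    // First defined value wins: constructor option, then env, then file.
    resolved[key] = options[key] ?? env[envName] ?? fileConfig[key];
  }
  return resolved;
}

const config = resolveConfig(
  { openaiApiKey: 'sk-from-constructor' },
  { OPENAI_API_KEY: 'sk-from-env', RAINFOREST_API_KEY: 'rf-from-env' },
  { rainforestApiKey: 'rf-from-file', elevenlabsApiKey: 'el-from-file' }
);
console.log(config);
```

In this example the OpenAI key comes from the constructor, the Rainforest key from the environment, and the ElevenLabs key from the file-level fallback.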

Proxy Configuration (Recommended for Anna's Archive)

To avoid IP bans when downloading from Anna's Archive, configure a proxy during setup:

summary setup

When prompted:

Enable proxy: Yes
Enter proxy URL: http://your-proxy.com:8080
Enter proxy username and password

Why use a proxy?

✅ Avoids IP bans from Anna's Archive
✅ USA-based proxies prevent geo-location issues
✅ Works with both browser navigation and file downloads
✅ Automatically applied to all download operations

Recommended Proxy Service:

We recommend Webshare.io for reliable, USA-based proxies:

🌎 USA-based IPs (no geo-location issues)
⚡ Fast and reliable
💰 Affordable pricing with free tier
🔒 HTTP/HTTPS/SOCKS5 support

Important: Use Static Proxies for Sticky Sessions

For Anna's Archive downloads, you need a static/direct proxy (not rotating) to maintain the same IP:

In your Webshare dashboard, go to Proxy → List
Copy a Static Proxy endpoint (not the rotating endpoint)
Use the format: http://host:port (e.g., http://45.95.96.132:8080)
Username format: dmdgluqz-US-{session_id} (session ID added automatically)

For each download, the tool automatically picks a session ID (1 to PROXY_POOL_SIZE) to get a fresh IP, then keeps that IP for the entire 5-10 minute download.

Proxy Pool Size Configuration:

Set PROXY_POOL_SIZE to match your Webshare plan (default: 36):

Free tier: 10 proxies → PROXY_POOL_SIZE=10
Starter plan: 25 proxies → PROXY_POOL_SIZE=25
Professional plan: 100 proxies → PROXY_POOL_SIZE=100
Enterprise plan: 250+ proxies → PROXY_POOL_SIZE=250

The tool will randomly select a session ID from 1 to your pool size, distributing load across all available proxies.
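
As a rough illustration of the session selection described above, the sketch below picks an ID between 1 and the pool size and appends it to the username. Both function names are hypothetical, and the base username is the example value from this section:

```javascript
// Illustrative sketch: pick a session ID in [1, poolSize] and append it to
// the proxy username. Function names are hypothetical.
function pickSessionId(poolSize = 36, random = Math.random) {
  return Math.floor(random() * poolSize) + 1;
}

function buildProxyUsername(baseUsername, sessionId) {
  return `${baseUsername}-${sessionId}`;
}

const sessionId = pickSessionId(10);
console.log(buildProxyUsername('dmdgluqz-US', sessionId)); // e.g. dmdgluqz-US-7
```

Passing the random source as a parameter makes the selection easy to test deterministically.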

Smart ISBN Detection:

When searching Anna's Archive, the tool automatically detects whether an identifier is a real ISBN or an Amazon ASIN:

Real ISBNs (10 or 13 numeric digits): Searches by ISBN for precise results
Amazon ASINs (alphanumeric): Searches by book title instead for better results
This ensures you get relevant search results even when Amazon returns proprietary ASINs instead of standard ISBNs
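
The detection rule can be approximated as follows. This is a sketch of the rule as stated here, not the tool's code; `classifyIdentifier` is a hypothetical name, and stripping hyphens and spaces before testing is an added assumption:

```javascript
// Hypothetical classifier following the rule above: 10 or 13 numeric digits
// is treated as an ISBN; any other alphanumeric value is treated as an ASIN.
// Stripping hyphens and spaces first is an added assumption.
function classifyIdentifier(identifier) {
  const cleaned = String(identifier).replace(/[\s-]/g, '');
  if (/^(\d{10}|\d{13})$/.test(cleaned)) return 'isbn';
  if (/^[A-Za-z0-9]+$/.test(cleaned)) return 'asin';
  return 'unknown';
}

console.log(classifyIdentifier('9780134685991')); // isbn
console.log(classifyIdentifier('B075HYVHWK'));    // asin
```

Under this purely numeric rule, an ISBN-10 ending in the check character X would be classified as an ASIN.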

Note: Rotating proxies (p.webshare.io) don't support sticky sessions. Use individual static proxy IPs from your proxy list instead.

Testing your proxy:

node test-proxy.js <ASIN>

This will verify your proxy configuration by attempting to download a book.

Audio Generation

Audio generation is optional and requires an ElevenLabs API key. If the key is not provided, the tool will skip audio generation and only create text-based outputs.

Get ElevenLabs API Key: Sign up at https://try.elevenlabs.io/oh7kgotrpjnv for high-quality text-to-speech.

Features:

Uses ElevenLabs Turbo v2.5 model (optimized for audiobooks)
Default voice: Brian (best for technical content, customizable)
Automatically truncates long texts to fit API limits
Generates high-quality MP3 audio files
Natural, conversational narration style

Output

The tool generates:

<book_name>_summary.md - Markdown summary
<book_name>_summary.txt - Plain text summary
<book_name>_summary.pdf - PDF summary with table of contents
<book_name>_summary.epub - EPUB summary with clickable TOC
<book_name>_summary.mp3 - Audio summary (if ElevenLabs key provided)
<book_name>.pdf - Original or converted PDF
<book_name>.epub - Original EPUB (if input was EPUB)
<book_name>_bundle.tgz - Compressed archive containing all files

Example Workflow

# 1. Search for a book
summary search
# Enter: "A Philosophy of Software Design"
# Select from results, get ASIN

# 2. Download and process automatically
summary isbn B075HYVHWK
# Downloads, asks if you want to process
# Creates summary bundle automatically!

# Alternative: Process a local file
summary file ~/Downloads/book.epub

How It Works

Input Processing: Accepts PDF or EPUB files (EPUB is converted to PDF)
Smart Processing Strategy:
- Small PDFs (<400k chars): Direct upload to OpenAI's vision API
- Large PDFs (>400k chars): Intelligent chunking with synthesis
AI Summarization: GPT-5 analyzes content with full formatting, tables, and diagrams
Format Conversion: Uses Pandoc to convert the Markdown summary to PDF and EPUB
Audio Generation: Optional TTS conversion using ElevenLabs
Bundling: Creates a compressed archive with all generated files
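
The size-based decision in step 2 can be pictured as a single comparison. The sketch below uses the 400,000-character threshold and the method names from this README; the function is hypothetical, and treating a document exactly at the threshold as "small" is an assumption:

```javascript
// Sketch of the size-based decision. The threshold and method names come
// from this README; the function itself is hypothetical.
function chooseSummaryMethod(charCount, maxChars = 400000) {
  return charCount > maxChars ? 'text_extraction_chunked' : 'gpt5_pdf_upload';
}

console.log(chooseSummaryMethod(120000));  // gpt5_pdf_upload
console.log(chooseSummaryMethod(1245678)); // text_extraction_chunked
```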

Intelligent Chunking for Large PDFs

For PDFs exceeding 400,000 characters (typically 500+ pages), the tool automatically uses an intelligent chunking strategy:

How it works:

Analysis: Calculates optimal chunk size based on PDF statistics and GPT-5's token limits
Smart Token Management: Respects GPT-5's 272k input token limit with safety margins
Page-Based Chunking: Splits PDF into logical chunks that fit within token limits
Parallel Processing: Each chunk is summarized independently by GPT-5
Intelligent Synthesis: All chunk summaries are combined into a cohesive final summary
Quality Preservation: Maintains narrative flow and eliminates redundancy
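
To make the page-based chunking step concrete, here is a minimal sketch that groups consecutive pages until the next page would exceed a character budget. `chunkPages` and its return shape are hypothetical; the tool's own chunker may weigh pages differently:

```javascript
// Illustrative page-based chunker: consecutive pages are grouped until adding
// the next page would exceed the character budget. Names are hypothetical.
function chunkPages(pages, maxCharsPerChunk) {
  const chunks = [];
  let current = { startPage: 1, endPage: 0, text: '' };

  pages.forEach((pageText, index) => {
    const pageNumber = index + 1;
    const wouldOverflow = current.text.length + pageText.length > maxCharsPerChunk;
    if (wouldOverflow && current.text.length > 0) {
      chunks.push(current);
      current = { startPage: pageNumber, endPage: pageNumber - 1, text: '' };
    }
    current.text += pageText;
    current.endPage = pageNumber;
  });

  if (current.text.length > 0) chunks.push(current);
  return chunks;
}

const pages = ['a'.repeat(60), 'b'.repeat(60), 'c'.repeat(60)];
const chunks = chunkPages(pages, 100);
console.log(chunks.map((c) => `Pages ${c.startPage}-${c.endPage} (${c.text.length} chars)`));
```

A single page longer than the budget still becomes its own chunk in this sketch, so pages are never split mid-way.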

Token Limit Handling:

GPT-5 Input Limit: 272,000 tokens
System Overhead: 20,000 tokens reserved for prompts and instructions
Available Tokens: 250,000 tokens for content
Safety Margin: 70% utilization to account for token estimation variance
Chunk Size: ~565,000 characters per chunk (based on 3.5 chars/token estimate)
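
The budget can be approximated with simple arithmetic. The sketch below uses only the figures listed above (250,000 available tokens, 70% utilization, 3.5 characters per token); `estimateChunkChars` is a hypothetical name, and the tool's own calculation may include adjustments not listed here. With these inputs the result is 612,500 characters, somewhat above the ~565,000 quoted, so both numbers are best read as approximate.

```javascript
// Back-of-the-envelope budget using only the figures listed above.
// The function name is hypothetical; the tool's real calculation may differ.
function estimateChunkChars({ availableTokens, utilization, charsPerToken }) {
  return Math.round(availableTokens * utilization * charsPerToken);
}

const chunkChars = estimateChunkChars({
  availableTokens: 250000, // content budget after system overhead
  utilization: 0.7,        // safety margin
  charsPerToken: 3.5,      // rough characters-per-token estimate
});
console.log(chunkChars);
```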

Benefits:

✅ Complete Coverage: Processes entire books without truncation
✅ High Quality: Each section gets full AI attention
✅ Seamless Output: Final summary reads as a unified document
✅ Cost Efficient: Optimizes token usage across multiple API calls
✅ Automatic: No configuration needed - works transparently
✅ Token-Aware: Respects API limits to prevent errors

Example Output:

📊 PDF Stats: 523 pages, 1,245,678 chars, ~311,420 tokens
📚 PDF is large - using intelligent chunking strategy
   This will process the ENTIRE 523-page PDF without truncation
📐 Using chunk size: 120,000 chars
📦 Created 11 chunks for processing
   Chunk 1: Pages 1-48 (119,234 chars)
   Chunk 2: Pages 49-95 (118,901 chars)
   ...
✅ All 11 chunks processed successfully
🔄 Synthesizing chunk summaries into final comprehensive summary...
✅ Final summary synthesized: 45,678 characters

Why Direct PDF Upload?

The tool prioritizes OpenAI's vision API for direct PDF upload when possible:

✅ Better Quality: Preserves document formatting, tables, and diagrams
✅ More Accurate: AI can see the actual PDF layout and structure
✅ Better for Technical Books: Code examples and diagrams are preserved
✅ Fallback Strategy: Automatically switches to intelligent chunking for large files

Testing

Summary Forge includes a comprehensive test suite using Vitest.

Run Tests

# Run all tests
pnpm test

# Run tests in watch mode
pnpm test:watch

# Run tests with coverage report
pnpm test:coverage

Test Coverage

The test suite includes:

✅ 30+ passing tests
Constructor validation
Helper method tests
PDF upload functionality tests
API integration tests
Error handling tests
Edge case coverage
File operation tests

See test/summary-forge.test.js for the complete test suite.

Flashcard Generation

Summary Forge includes powerful flashcard generation capabilities for study and review.

Printable PDF Flashcards

Generate double-sided flashcard PDFs optimized for printing:

import { extractFlashcards, generateFlashcardsPDF } from '@profullstack/summary-forge-module/flashcards';
import fs from 'node:fs/promises';

// Read your markdown summary
const markdown = await fs.readFile('./book_summary.md', 'utf-8');

// Extract Q&A pairs
const extractResult = extractFlashcards(markdown, { maxCards: 50 });
console.log(`Extracted ${extractResult.count} flashcards`);

// Generate printable PDF
const pdfResult = await generateFlashcardsPDF(
  extractResult.flashcards,
  './flashcards.pdf',
  {
    title: 'JavaScript Fundamentals',
    branding: 'SummaryForge.com',
    cardWidth: 3.5,   // inches
    cardHeight: 2.5,  // inches
    fontSize: 11
  }
);

console.log(`PDF created: ${pdfResult.path}`);
console.log(`Total pages: ${pdfResult.pages}`);

Individual Flashcard Images

Generate individual PNG images for each flashcard, perfect for web applications:

import { extractFlashcards, generateFlashcardImages } from '@profullstack/summary-forge-module/flashcards';
import fs from 'node:fs/promises';

// Read your markdown summary
const markdown = await fs.readFile('./book_summary.md', 'utf-8');

// Extract Q&A pairs
const extractResult = extractFlashcards(markdown);

// Generate individual PNG images
const imageResult = await generateFlashcardImages(
  extractResult.flashcards,
  './flashcards',  // Output directory
  {
    title: 'JavaScript Fundamentals',
    branding: 'SummaryForge.com',
    width: 800,   // pixels
    height: 600,  // pixels
    fontSize: 24
  }
);

if (imageResult.success) {
  console.log(`Generated ${imageResult.images.length} images`);
  console.log('Files:', imageResult.images);
  // Output: ['./flashcards/q-001.png', './flashcards/a-001.png', ...]
}

Image Naming Convention:

q-001.png, q-002.png, etc. - Question cards
a-001.png, a-002.png, etc. - Answer cards
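
For web apps that need to predict file names before the images exist, the naming convention is easy to reproduce. This is a sketch under the convention shown above; `flashcardImageNames` is a hypothetical helper:

```javascript
// Hypothetical helper: list the expected image names for `count` flashcards,
// using three-digit zero padding as in q-001.png / a-001.png.
function flashcardImageNames(count) {
  const names = [];
  for (let i = 1; i <= count; i++) {
    const n = String(i).padStart(3, '0');
    names.push(`q-${n}.png`, `a-${n}.png`);
  }
  return names;
}

console.log(flashcardImageNames(2));
```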

Use Cases:

🌐 Web-based flashcard applications
📱 Mobile learning apps
🎮 Interactive quiz games
📊 Study progress tracking systems
🔄 Spaced repetition software

Features:

✅ Clean, professional design with book title
✅ Automatic text wrapping for long content
✅ Customizable dimensions and styling
✅ SVG-based rendering for crisp quality
✅ Works in Docker (no native dependencies)

Flashcard Extraction Formats

The extractFlashcards function supports multiple markdown formats:

1. Explicit Q&A Format:

**Q: What is a closure?**
A: A closure is a function that has access to variables in its outer scope.

2. Definition Lists:

**Closure**
: A function that has access to variables in its outer scope.

3. Question Headers:

### What is a closure?

A closure is a function that has access to variables in its outer scope.
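
To show how the first format maps to question/answer pairs, here is a small regex-based sketch. It handles only the explicit Q&A format with single-line answers and is not the module's `extractFlashcards` implementation; `parseExplicitQA` is a hypothetical name:

```javascript
// Illustrative parser for the explicit "**Q: ...**" / "A: ..." format only.
// This is not the module's extractFlashcards implementation.
function parseExplicitQA(markdown) {
  const pattern = /\*\*Q:\s*(.+?)\*\*\s*\r?\nA:\s*(.+)/g;
  const cards = [];
  for (const match of markdown.matchAll(pattern)) {
    cards.push({ question: match[1].trim(), answer: match[2].trim() });
  }
  return cards;
}

const sample = [
  '**Q: What is a closure?**',
  'A: A closure is a function that has access to variables in its outer scope.',
].join('\n');
console.log(parseExplicitQA(sample));
```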

Examples

See the examples/ directory for more usage examples:

programmatic-usage.js - Using as a module
flashcard-images-demo.js - Generating flashcard images

Troubleshooting

Rate Limiting (1lib.sk)

If you encounter "Too many requests" errors from 1lib.sk:

Error Message:

Too many requests from your IP xxx.xxx.xxx.xxx
Please wait 10 seconds. [email protected]. Err #ipd1

Automatic Handling: The tool automatically detects rate limiting and:

✅ Waits the requested time (usually 10 seconds)
✅ Retries up to 3 times with exponential backoff
✅ Adds a 2-second buffer to ensure rate limit has cleared
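
The wait-and-retry behavior can be sketched as below. This is an illustration under stated assumptions, not the tool's code: the function names, the `retryAfterSeconds` error property, and the exact backoff formula (requested wait doubled per attempt, plus the 2-second buffer) are all hypothetical.

```javascript
// Sketch of wait-and-retry on rate limiting. Function names, the error shape
// (err.retryAfterSeconds), and the backoff formula are assumptions.
function computeDelayMs(requestedSeconds, attempt, bufferSeconds = 2) {
  // Requested wait doubles on each retry, plus a fixed buffer.
  return (requestedSeconds * 2 ** attempt + bufferSeconds) * 1000;
}

async function withRateLimitRetry(task, { maxRetries = 3, sleep } = {}) {
  const wait = sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      const rateLimited = typeof err?.retryAfterSeconds === 'number';
      // Give up on other errors or once the retry budget is spent
      if (!rateLimited || attempt >= maxRetries) throw err;
      await wait(computeDelayMs(err.retryAfterSeconds, attempt));
    }
  }
}
```

With a requested wait of 10 seconds, this formula yields delays of 12, 22, and 42 seconds for the three retries.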

Manual Solutions:

Wait a few minutes before trying again
Use a different proxy session (the tool rotates through your proxy pool automatically)
Switch to Anna's Archive: summary search "book title" --source anna
Reduce concurrent requests if running multiple downloads

Note: The proxy pool helps distribute requests across different IPs, reducing rate limiting issues.

Download Button Not Found (1lib.sk)

If you encounter "Download button not found" errors when downloading from 1lib.sk:

Check Debug Files: The tool automatically saves debug-book-page.html in the book's directory
- Open this file to inspect the actual page structure
- Look for download links or buttons that might have different selectors
Review Error Output: The error message includes:
- All selectors that were tried
- List of links found on the page
- Location of the debug HTML file
Common Causes:
- Z-Access/Library Access Page: Book page redirects to authentication page (most common)
- Page structure changed (1lib.sk updates their site)
- Book is deleted or unavailable
- Session expired or cookies not maintained
- Proxy issues preventing proper page load
Solutions:
- Recommended: Use Anna's Archive instead: summary search "book title" --source anna
- Try the search1lib command separately to verify the book exists
- Check if the book page loads correctly in a regular browser with the same proxy
- Verify proxy configuration is working correctly
- Try a different book from search results
Known Issue - Z-Access Page: If you see links to library-access.sk or Z-Access page in the debug output, this means:
- The book page requires authentication or special access
- 1lib.sk's session management is blocking automated access
- Workaround: Use Anna's Archive which has better automation support

Example Debug Output (Z-Access Issue):

❌ Download button not found on book page
   Debug HTML saved to: ./uploads/book_name/debug-book-page.html
   Found 6 links on page
   First 5 links:
   - https://library-access.sk (Z-Access page)
   - mailto:[email protected] ([email protected])
   - https://www.reddit.com/r/zlibrary (https://www.reddit.com/r/zlibrary)
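
To check a saved debug file for this condition without reading the HTML by hand, something like the following could be used. It is a rough sketch only: `extractLinks` and `isZAccessPage` are hypothetical helpers, not part of the module, and the regular expression is a simple approximation rather than a full HTML parser.

```javascript
// Hypothetical helpers for inspecting debug-book-page.html; not part of the module.

// Collects every href value found in the HTML (simple regex, not a full parser).
function extractLinks(html) {
  const links = [];
  const hrefPattern = /href\s*=\s*["']([^"']+)["']/gi;
  let match;
  while ((match = hrefPattern.exec(html)) !== null) {
    links.push(match[1]);
  }
  return links;
}

// Treats any link to library-access.sk as a sign of the Z-Access redirect.
function isZAccessPage(html) {
  return extractLinks(html).some((href) => href.includes("library-access.sk"));
}

// Example usage (path taken from the debug output above):
//   import { readFileSync } from "node:fs";
//   const html = readFileSync("./uploads/book_name/debug-book-page.html", "utf8");
//   if (isZAccessPage(html)) console.log("Z-Access page detected; try --source anna");
```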

Recommended Alternative:

# Use Anna's Archive instead (more reliable for automation)
summary search "prompt engineering" --source anna

IP Bans from Anna's Archive

If you're getting blocked by Anna's Archive:

Enable proxy in your configuration:
```
summary setup
```
Use a USA-based proxy to avoid geo-location issues
Test your proxy before downloading:
```
node test-proxy.js B0BCTMXNVN
```
Run browser in visible mode to debug:
```
summary config --headless false
```

Proxy Configuration

The proxy is used for:

✅ Browser navigation (Puppeteer)
✅ File downloads (fetch with https-proxy-agent)
✅ All HTTP requests to Anna's Archive

Supported proxy formats:

http://proxy.example.com:8080
https://proxy.example.com:8080
socks5://proxy.example.com:1080
http://proxy.example.com:8080-session-<SESSION_ID> (sticky session)

Recommended Service: Webshare.io - Reliable USA-based proxies with a free tier available.

Webshare Sticky Sessions: Add -session-<YOUR_SESSION_ID> to your proxy URL to maintain the same IP:

http://p.webshare.io:80-session-myapp123
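
If you build proxy URLs in your own scripts, the sticky-session suffix can be appended with a small helper such as the one below. This is an illustrative sketch: `withStickySession` is a hypothetical function, not an API of this module, and it handles the value as a plain string because the suffix follows the port.

```javascript
// Hypothetical helper; not part of the module's API.
const SUPPORTED_SCHEMES = ["http://", "https://", "socks5://"];

// Appends "-session-<sessionId>" to a proxy URL in one of the supported formats.
function withStickySession(proxyUrl, sessionId) {
  if (!SUPPORTED_SCHEMES.some((scheme) => proxyUrl.startsWith(scheme))) {
    throw new Error(`Unsupported proxy scheme: ${proxyUrl}`);
  }
  // Leave the URL untouched if a session suffix is already present.
  if (proxyUrl.includes("-session-")) return proxyUrl;
  return `${proxyUrl}-session-${sessionId}`;
}

// Example usage:
//   withStickySession("http://p.webshare.io:80", "myapp123");
```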

CAPTCHA Solving

When downloading from Anna's Archive, you may encounter CAPTCHAs. To automatically solve them:

Sign up for 2Captcha at 2captcha.com and get an API key
Add to configuration:
```
summary setup
```
Enter your 2Captcha API key when prompted

The tool will automatically detect and solve CAPTCHAs during downloads, making the process fully automated.

Limitations

Maximum PDF file size: No practical limit (intelligent chunking handles any size)
GPT-5 uses its default temperature of 1 (not configurable)
Requires external tools: Calibre, Pandoc, XeLaTeX
CAPTCHA solving requires 2captcha.com API key (optional)
Very large PDFs (1000+ pages) may incur higher API costs due to multiple chunk processing
Anna's Archive may block IPs without proxy configuration
Chunked processing uses text extraction (images/diagrams described in text only)

Roadmap

[x] ISBN/ASIN lookup via Anna's Archive
[x] Automatic download from Anna's Archive with CAPTCHA solving
[x] Book title search via Rainforest API
[x] CLI with interactive mode
[x] ESM module for programmatic use
[x] Audio generation with ElevenLabs TTS
[x] Direct PDF upload to OpenAI vision API
[x] EPUB format prioritization (open standard)
[ ] Support for more input formats (MOBI, AZW3)
[ ] Chunked processing for very large books (>100MB)
[ ] Custom summary templates
[ ] Web interface
[ ] Multiple voice options for audio
[ ] Audio chapter markers
[ ] Batch processing multiple books

License

ISC

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
