anna-archieve

v1.0.0

Published 7 months ago

A powerful Node.js tool for searching and downloading books from Anna's Archive with Cloudflare bypass

📚 Anna's Archive Scraper

A powerful Node.js tool for searching and downloading books from Anna's Archive with built-in Cloudflare bypass capabilities.

✨ Features

🔍 Smart Search: Search by book title, author, ISBN, or any keyword
📥 Direct Downloads: Get direct download links for books in various formats (PDF, EPUB, DJVU, FB2, MOBI)
🛡️ Cloudflare Bypass: Automatically handles Cloudflare protection
🔄 Retry Mechanism: Built-in retry logic for reliable downloads
🎯 MD5 Hash Support: Direct access using MD5 hashes
📱 CLI Interface: Easy-to-use command line interface
🔧 Programmatic API: Use as a module in your own projects
⚡ Optimized Performance: Efficient scraping with minimal resource usage

🚀 Quick Start

Installation

# Install globally for CLI usage
npm install -g anna-archieve

# Or install locally for your project
npm install anna-archieve

Basic Usage

# Search and download the first result
anna-archieve "The Great Gatsby"

# Search by author and title
anna-archieve "George Orwell 1984"

# Direct download using MD5 hash
anna-archieve --hash a1b2c3d4e5f6789...

# Get help
anna-archieve --help

📖 Detailed Usage

Command Line Interface

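From a clone of the repository (see Setup below), you can run scraper.js directly with Node.js instead of the globally installed anna-archieve command:

# Basic search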
node scraper.js "book title or author name"

# Examples
node scraper.js "The Catcher in the Rye"
node scraper.js "Stephen King"
node scraper.js "978-0134685991"  # ISBN search

# Direct hash lookup
node scraper.js --hash MD5_HASH_HERE

# Display help
node scraper.js --help
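The CLI accepts three forms: a free-text query, --hash followed by an MD5 hash, or --help. The following is a minimal sketch of how those forms could be told apart. It is not the actual contents of scraper.js; parseCliArgs and its return shape are hypothetical, and treating an empty argument list as a request for help is an assumption.

```javascript
// Hypothetical helper (not part of anna-archieve): classifies the arguments
// that would come from process.argv.slice(2).
function parseCliArgs(argv) {
  // No arguments, or an explicit --help, shows usage information
  if (argv.length === 0 || argv.includes('--help')) {
    return { mode: 'help' };
  }

  // --hash <MD5> looks a book up directly by its MD5 hash
  const hashIndex = argv.indexOf('--hash');
  if (hashIndex !== -1) {
    const hash = argv[hashIndex + 1];
    if (!hash) {
      throw new Error('--hash requires an MD5 hash argument');
    }
    return { mode: 'hash', value: hash };
  }

  // Everything else is joined into one search query (title, author, ISBN, ...)
  return { mode: 'search', value: argv.join(' ') };
}
```

Joining the remaining arguments means an unquoted query such as node scraper.js Stephen King is handled the same way as the quoted form.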

Programmatic Usage

const AnnasArchiveScraper = require('anna-archieve');

// Initialize scraper
const scraper = new AnnasArchiveScraper({
  headless: true,        // Run in headless mode
  timeout: 30000,        // Request timeout in ms
  retryAttempts: 3,      // Number of retry attempts
  waitTime: 8000         // Wait time for Cloudflare
});

// Search for books
async function searchBooks() {
  try {
    const books = await scraper.searchBooks('The Great Gatsby');
    console.log('Found books:', books);
    
    // books array contains:
    // [
    //   {
    //     title: "The Great Gatsby",
    //     md5: "a1b2c3d4e5f6...",
    //     url: "https://annas-archive.org/md5/a1b2c3d4e5f6..."
    //   }
    // ]
  } catch (error) {
    console.error('Search failed:', error);
  }
}

// Get download link
async function getDownloadLink() {
  try {
    const downloadUrl = await scraper.getDownloadLink('MD5_HASH_HERE');
    if (downloadUrl) {
      console.log('Download URL:', downloadUrl);
    }
  } catch (error) {
    console.error('Download failed:', error);
  }
}

// Search and download in one step
async function downloadBook() {
  try {
    const downloadUrl = await scraper.downloadBook('The Great Gatsby');
    if (downloadUrl) {
      console.log('Ready to download:', downloadUrl);
    }
  } catch (error) {
    console.error('Download failed:', error);
  }
}

// Run one of the examples above
downloadBook();

🔧 Configuration Options

When initializing the scraper, you can pass any of the following options; each one you omit falls back to the default shown in its comment:

const scraper = new AnnasArchiveScraper({
  headless: true,          // Run browser in headless mode (default: true)
  timeout: 30000,          // Page load timeout in milliseconds (default: 30000)
  retryAttempts: 3,        // Number of retry attempts (default: 3)
  waitTime: 8000,          // Wait time for Cloudflare bypass (default: 8000)
});
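The defaults above can be expressed as a plain object merged with whatever the caller passes. This is a sketch under stated assumptions: resolveOptions and DEFAULT_OPTIONS are hypothetical names, and the non-negative integer check is an illustrative addition, not documented behavior of the package.

```javascript
// Hypothetical sketch: the documented defaults, merged with user-supplied options.
const DEFAULT_OPTIONS = Object.freeze({
  headless: true,    // run the browser in headless mode
  timeout: 30000,    // page load timeout in milliseconds
  retryAttempts: 3,  // number of retry attempts
  waitTime: 8000,    // wait time for Cloudflare in milliseconds
});

function resolveOptions(options = {}) {
  const resolved = { ...DEFAULT_OPTIONS };
  // Copy only options that were actually provided, so an explicit
  // `undefined` does not wipe out a default
  for (const [key, value] of Object.entries(options)) {
    if (value !== undefined) {
      resolved[key] = value;
    }
  }
  // Illustrative validation (assumption): numeric options must be non-negative integers
  for (const key of ['timeout', 'retryAttempts', 'waitTime']) {
    if (!Number.isInteger(resolved[key]) || resolved[key] < 0) {
      throw new TypeError(`${key} must be a non-negative integer`);
    }
  }
  return resolved;
}
```

For example, resolveOptions({ waitTime: 15000 }) keeps the other three defaults and only raises the Cloudflare wait time.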

📚 API Reference

Class: AnnasArchiveScraper

Constructor

new AnnasArchiveScraper(options)

Parameters:

options (Object, optional): Configuration options

Methods

searchBooks(query)

Search for books on Anna's Archive.

Parameters:

query (string): Search query (title, author, ISBN, etc.)

Returns: Promise<Array<Object>> - Array of book objects, each with title, md5, and url fields

Example:

const books = await scraper.searchBooks('Machine Learning');
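A search usually returns several candidates, so you may want to choose one before requesting a link. The sketch below is a hypothetical helper (selectBook is not part of the package); it assumes each result has the title, md5, and url fields shown in the Programmatic Usage example, and its matching rule (exact case-insensitive title, else the first result) is just one reasonable choice.

```javascript
// Hypothetical helper: picks one entry from the array searchBooks resolves to.
// Assumes entries shaped like { title, md5, url }.
function selectBook(books, wantedTitle) {
  if (!Array.isArray(books) || books.length === 0) {
    return null;
  }
  const wanted = wantedTitle.trim().toLowerCase();
  // Prefer an exact, case-insensitive title match...
  const exact = books.find(
    (book) => (book.title || '').trim().toLowerCase() === wanted
  );
  // ...otherwise fall back to the first result, as the CLI's "first result" behavior suggests
  return exact || books[0];
}
```

The md5 field of the selected book can then be passed to getDownloadLink.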

getDownloadLink(md5Hash)

Get download link for a specific book using its MD5 hash.

Parameters:

md5Hash (string): MD5 hash of the book

Returns: Promise<string|null> - Download URL or null if not found

Example:

const url = await scraper.getDownloadLink('a1b2c3d4e5f6...');
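Because getDownloadLink expects an MD5 hash, checking the input before calling it can catch typos such as a truncated hash without launching a browser. A minimal sketch follows; normalizeMd5 is a hypothetical helper, and accepting a full md5 URL relies on the https://annas-archive.org/md5/<hash> shape shown in the search results above.

```javascript
// Hypothetical helper: accepts a bare MD5 hash or an md5 URL and returns the
// lowercase 32-character hex hash, or null if the input is not a valid MD5.
const MD5_PATTERN = /^[a-f0-9]{32}$/;

function normalizeMd5(input) {
  if (typeof input !== 'string') {
    return null;
  }
  let candidate = input.trim();
  // Pull the hash out of a URL such as https://annas-archive.org/md5/<hash>
  const match = candidate.match(/\/md5\/([^/?#]+)/i);
  if (match) {
    candidate = match[1];
  }
  candidate = candidate.toLowerCase();
  return MD5_PATTERN.test(candidate) ? candidate : null;
}
```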

downloadBook(query, isHash)

Resolve a download URL for a book, either by searching for it or by looking it up directly by MD5 hash, retrying on failure.

Parameters:

query (string): Search query or MD5 hash
isHash (boolean, optional): Whether query is an MD5 hash (default: false)

Returns: Promise<string|null> - Download URL or null if failed

Example:

// Search and download
const url = await scraper.downloadBook('The Art of War');

// Direct hash download
const hashUrl = await scraper.downloadBook('a1b2c3d4e5f6...', true);
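The retry behavior described for downloadBook follows a common pattern: run an async task, and if it throws or yields null, wait and try again. The sketch below shows that pattern generically; it is not the package's implementation. withRetry is a hypothetical name, and treating retryAttempts as the total number of attempts, as well as the linear backoff, are assumptions.

```javascript
// Generic retry sketch (not anna-archieve's actual code).
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry(task, retryAttempts = 3, baseDelayMs = 1000) {
  for (let attempt = 1; attempt <= retryAttempts; attempt += 1) {
    try {
      const result = await task(attempt);
      // A null result counts as a failed attempt, mirroring "null if failed"
      if (result !== null) {
        return result;
      }
    } catch (error) {
      // Swallow the error and fall through to the next attempt
    }
    if (attempt < retryAttempts) {
      // Linear backoff: wait a little longer before each new attempt
      await sleep(baseDelayMs * attempt);
    }
  }
  // Every attempt failed
  return null;
}
```

Returning null instead of rethrowing keeps the same contract as downloadBook's documented return value; a caller that needs the underlying error would have to log it inside the task.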

🛠️ Development

Prerequisites

Node.js >= 14.0.0
npm >= 6.0.0

Setup

# Clone the repository
git clone https://github.com/vaibhav1405/anna-archieve.git
cd anna-archieve

# Install dependencies
npm install

# Run the scraper
npm start "your search query"

Available Scripts

npm run start         # Run the scraper
npm run dev           # Run with nodemon for development
npm run lint          # Fix linting issues
npm run lint:check    # Check for linting issues
npm run example       # Run example search
npm run help          # Show help message
npm run clean         # Clean and reinstall dependencies

Project Structure

anna-archieve/
├── scraper.js          # Main scraper class and CLI
├── package.json        # Package configuration
├── README.md           # This file
├── LICENSE             # License file
└── node_modules/       # Dependencies

🚨 Important Notes

Legal Disclaimer

This tool is for educational purposes only. Users are responsible for:

Complying with their local laws and regulations
Respecting copyright and intellectual property rights
Using the tool ethically and responsibly

Rate Limiting

The scraper includes built-in delays to avoid overwhelming servers
Cloudflare protection may cause additional delays
Be respectful of the service and avoid excessive requests
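If you need to run several searches, one way to keep request volume low is to run them one at a time with a pause in between rather than in parallel. A minimal sketch follows; runSequentially is a hypothetical helper, and the 5000 ms default pause is an arbitrary assumption, not a value recommended by the package.

```javascript
// Hypothetical helper: runs an async task for each item in order,
// pausing between calls instead of firing them all at once.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function runSequentially(items, task, delayMs = 5000) {
  const results = [];
  for (let i = 0; i < items.length; i += 1) {
    if (i > 0) {
      await sleep(delayMs); // pause before every call except the first
    }
    results.push(await task(items[i]));
  }
  return results;
}
```

For example, runSequentially(['The Art of War', 'Stephen King'], (query) => scraper.searchBooks(query)) would perform the two searches one after the other.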

Troubleshooting

Common Issues

Cloudflare blocks requests
- The scraper handles this automatically
- If issues persist, try increasing the waitTime option
Timeout errors
- Increase the timeout option
- Check your internet connection
No download links found
- Try different search terms
- Some books may not have available downloads
- Verify the book exists on Anna's Archive
Installation issues
- Ensure Node.js >= 14.0.0 is installed
- Try clearing the npm cache: npm cache clean --force

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Development Guidelines

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Ensure linting passes: npm run lint:check
Submit a pull request

📄 License

This project is licensed under the ISC License - see the LICENSE file for details.

🙏 Acknowledgments

Puppeteer for web automation
puppeteer-real-browser for Cloudflare bypass
Anna's Archive for providing free access to books

📞 Support

If you encounter any issues or have questions:

Check the Issues page
Create a new issue if your problem isn't already reported
Provide detailed information about the error and your environment

🔗 Links

⭐ If you find this tool useful, please consider giving it a star on GitHub!
