panini-scraper v1.2.0
🧩 Panini Scraper
A powerful and elegant TypeScript library for scraping product information from the Panini Brasil website. Built with Clean Architecture principles, comprehensive testing, and full TypeScript support.
🚀 Features
- Batch Scraping: Scrape multiple products in a single call with automatic error handling
- Clean Architecture: Organized codebase following SOLID principles and separation of concerns
- TypeScript First: Full type safety with comprehensive TypeScript definitions
- Well Tested: 95%+ test coverage with unit and integration tests
- High Performance: Efficient HTML parsing with Cheerio and HTTP requests with Axios
- Flexible Configuration: Configurable HTTP client with proxy support and custom headers
- Detailed Error Handling: Specific error types for different failure scenarios with partial results support
- Production Ready: Battle-tested with proper CI/CD pipeline
- Lean Runtime: Only two runtime dependencies, Axios and Cheerio
📦 Installation
```sh
# NPM
npm install panini-scraper

# Yarn
yarn add panini-scraper

# pnpm
pnpm add panini-scraper
```
🔧 Quick Start
Basic Usage
```typescript
import { scrapePaniniProduct } from 'panini-scraper';

async function main() {
  try {
    const product = await scrapePaniniProduct('https://panini.com.br/wolverine-2025-05');
    console.log(`Title: ${product.title}`);
    console.log(`Price: R$ ${product.currentPrice}`);
    console.log(`In Stock: ${product.inStock}`);
    console.log(`Pre-order: ${product.isPreOrder}`);
  } catch (error) {
    console.error('Scraping failed:', error.message);
  }
}

main();
```
Batch Scraping (Multiple Products)
```typescript
import { scrapePaniniProducts } from 'panini-scraper';

// Scrape multiple products with automatic error handling
const result = await scrapePaniniProducts([
  'https://panini.com.br/wolverine-2025-05',
  'https://panini.com.br/a-fabulosa-x-force',
  'https://panini.com.br/batman-dark-knight'
]);

// Access results with success/failure separation
console.log(`✅ Successfully scraped: ${result.successCount}/${result.totalProcessed}`);

// Process successful results
result.successes.forEach(({ url, product }) => {
  console.log(`${product.title}: R$ ${product.currentPrice}`);
});

// Handle failures gracefully
result.failures.forEach(({ url, message }) => {
  console.error(`❌ Failed to scrape ${url}: ${message}`);
});
```
Scraping Multiple Products (Advanced)
```typescript
import { createPaniniScraper } from 'panini-scraper';

const scraper = createPaniniScraper({ timeout: 5000 });

// Efficiently scrape multiple products with the same configuration using Promise.all
const products = await Promise.all([
  scraper('https://panini.com.br/wolverine-2025-05'),
  scraper('https://panini.com.br/a-fabulosa-x-force'),
  scraper('https://panini.com.br/batman-dark-knight')
]);

products.forEach(product => {
  console.log(`${product.title}: R$ ${product.currentPrice}`);
});
```
With Custom Configuration
```typescript
import { scrapePaniniProduct } from 'panini-scraper';

const config = {
  timeout: 15000,
  headers: {
    'User-Agent': 'MyApp/1.0'
  },
  proxy: {
    host: 'proxy.example.com',
    port: 8080,
    auth: {
      username: 'user',
      password: 'pass'
    }
  }
};

const product = await scrapePaniniProduct(
  'https://panini.com.br/spider-man',
  config
);
```
📊 Response Formats
Single Product Response
The library returns a `Product` object with the following structure:
```typescript
interface Product {
  title: string;          // Product title
  fullPrice: number;      // Original price in BRL
  currentPrice: number;   // Current/discounted price in BRL
  isPreOrder: boolean;    // Whether product is in pre-order
  inStock: boolean;       // Stock availability
  imageUrl: string;       // Main product image URL
  url: string;            // Product page URL
  format: string;         // Product format (e.g., "Capa dura", "Brochura")
  contributors: string[]; // List of authors, artists, translators, etc.
  id: string;             // Product identifier/SKU
}
```
Example Single Product Response
```json
{
  "title": "Crise Final (Grandes Eventos DC)",
  "fullPrice": 89.90,
  "currentPrice": 89.90,
  "isPreOrder": false,
  "inStock": true,
  "imageUrl": "https://d3ugyf2ht6aenh.cloudfront.net/stores/916/977/products/crise-final.jpg",
  "url": "https://panini.com.br/crise-final-grandes-eventos-dc",
  "format": "Capa dura",
  "contributors": [
    "Carlos Pacheco",
    "Doug Mahnke",
    "Grant Morrison",
    "J.G. Jones",
    "Matthew Clark"
  ],
  "id": "AGECF001"
}
```
Batch Scraping Response
The `scrapePaniniProducts` function returns a `BatchScrapeResult` with the following structure:
```typescript
interface BatchScrapeResult {
  successes: ScrapedProduct[]; // Successfully scraped products
  failures: FailedProduct[];   // Failed scraping attempts
  totalProcessed: number;      // Total URLs processed
  successCount: number;        // Number of successes
  failureCount: number;        // Number of failures
}

interface ScrapedProduct {
  url: string;      // The URL that was scraped
  product: Product; // The scraped product data
}

interface FailedProduct {
  url: string;                 // The URL that failed
  error: ProductScrapingError; // The error object
  message: string;             // Error message
}
```
Example Batch Response
```json
{
  "successes": [
    {
      "url": "https://panini.com.br/wolverine-2025-05",
      "product": {
        "title": "Wolverine #05",
        "fullPrice": 8.90,
        "currentPrice": 8.90,
        "isPreOrder": false,
        "inStock": true,
        "imageUrl": "https://...",
        "url": "https://panini.com.br/wolverine-2025-05",
        "format": "Brochura",
        "contributors": ["Benjamin Percy", "Adam Kubert"],
        "id": "WOL05"
      }
    }
  ],
  "failures": [
    {
      "url": "https://panini.com.br/invalid-product",
      "message": "Product not found or page structure has changed",
      "error": { /* ProductNotFoundError object */ }
    }
  ],
  "totalProcessed": 2,
  "successCount": 1,
  "failureCount": 1
}
```
🏗️ Advanced Usage
Batch Scraping with Configuration
```typescript
import { scrapePaniniProducts } from 'panini-scraper';

const config = {
  timeout: 15000,
  headers: {
    'User-Agent': 'MyApp/1.0'
  }
};

const urls = [
  'https://panini.com.br/wolverine-2025-05',
  'https://panini.com.br/spider-man',
  'https://panini.com.br/batman'
];

const result = await scrapePaniniProducts(urls, config);

// Process results based on success/failure
if (result.successCount > 0) {
  console.log(`✅ Successfully scraped ${result.successCount} products`);
  result.successes.forEach(({ product }) => {
    console.log(`- ${product.title}: R$ ${product.currentPrice}`);
  });
}

if (result.failureCount > 0) {
  console.log(`\n❌ Failed to scrape ${result.failureCount} products`);
  result.failures.forEach(({ url, message, error }) => {
    console.log(`- ${url}`);
    console.log(`  Reason: ${message}`);
    console.log(`  Error type: ${error.name}`);
  });
}
```
Using Clean Architecture Components
For advanced use cases, you can use the underlying Clean Architecture components directly:
```typescript
import {
  ScrapeProductUseCase,
  PaniniScraperService,
  HttpConfig
} from 'panini-scraper';

// Create your own configured instance
const config: HttpConfig = {
  timeout: 15000,
  userAgent: 'MyApp/1.0'
};

const scraperService = new PaniniScraperService(config);
const useCase = new ScrapeProductUseCase(scraperService);

// Single product scraping
const product = await useCase.execute('https://panini.com.br/spider-man');

// Batch scraping
const result = await useCase.executeMany([
  'https://panini.com.br/wolverine',
  'https://panini.com.br/x-men'
]);
```
Error Handling
Single Product Error Handling
The library provides specific error types for different scenarios:
```typescript
import {
  scrapePaniniProduct,
  InvalidUrlError,
  ProductNotFoundError,
  ProductScrapingError
} from 'panini-scraper';

try {
  const product = await scrapePaniniProduct(url);
  console.log(product);
} catch (error) {
  if (error instanceof InvalidUrlError) {
    // URL is invalid or not from Panini Brasil
    console.error('Invalid URL:', error.url);
  } else if (error instanceof ProductNotFoundError) {
    // Product not found or page structure changed
    console.error('Product not found at:', error.url);
  } else if (error instanceof ProductScrapingError) {
    // General scraping error (network, parsing, etc.)
    console.error('Scraping failed:', error.message);
    console.error('Status code:', error.statusCode);
  }
}
```
Batch Scraping Error Handling
Batch scraping never throws errors. Instead, it returns a result with successes and failures:
```typescript
import { scrapePaniniProducts, InvalidUrlError, ProductNotFoundError } from 'panini-scraper';

const result = await scrapePaniniProducts([
  'https://panini.com.br/valid-product',
  'https://panini.com.br/invalid-product',
  'not-a-valid-url'
]);

// Batch scraping automatically categorizes errors
result.failures.forEach(({ url, error, message }) => {
  if (error instanceof InvalidUrlError) {
    console.error(`Invalid URL format: ${url}`);
  } else if (error instanceof ProductNotFoundError) {
    console.error(`Product not found: ${url}`);
  } else {
    console.error(`Scraping failed for ${url}: ${message}`);
  }
});

// You can also check the overall success rate
const successRate = (result.successCount / result.totalProcessed) * 100;
console.log(`Success rate: ${successRate.toFixed(2)}%`);
```
Working with Product Entity
The library also exports the `ProductEntity` class with additional utility methods:
```typescript
import { ProductEntity } from 'panini-scraper';

// ProductEntity provides computed properties
const entity = new ProductEntity(
  'Wolverine #05',
  8.90,  // fullPrice
  5.90,  // currentPrice
  false, // isPreOrder
  true,  // inStock
  'https://example.com/image.jpg',
  'https://panini.com.br/wolverine-05',
  'WOL05'
);

console.log(entity.hasDiscount);        // true
console.log(entity.discountPercentage); // 34 (rounded)
console.log(entity.savingsAmount);      // 3.00
console.log(entity.toJSON());           // Plain object
```
⚙️ Configuration Options
```typescript
interface HttpConfig {
  /** Request timeout in milliseconds (default: 10000) */
  timeout?: number;

  /** Custom HTTP headers */
  headers?: Record<string, string>;

  /** Custom user agent string */
  userAgent?: string;

  /** Proxy configuration */
  proxy?: {
    host: string;
    port: number;
    auth?: {
      username: string;
      password: string;
    };
  };
}
```
🧪 Testing
The library has comprehensive test coverage:
```sh
# Run all tests
npm test

# Run tests with coverage report
npm run test:coverage

# Run tests in watch mode
npm run test:watch
```
Test Coverage
The library maintains high test coverage standards:
- Statements: 95%+
- Branches: 85%+
- Functions: 100%
- Lines: 95%+
🔒 Error Types
| Error Type | Description | When Thrown |
|------------|-------------|-------------|
| `InvalidUrlError` | Invalid or malformed URL | URL is not from the panini.com.br domain |
| `ProductNotFoundError` | Product not found | Page structure changed or product doesn't exist |
| `ProductScrapingError` | General scraping error | Network issues, parsing errors, etc. |
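Because only `ProductScrapingError` covers conditions that may be transient (network hiccups, timeouts), a retry wrapper can safely retry those while rethrowing the other two immediately. A minimal sketch; the error classes below are local stand-ins for the ones exported by `panini-scraper`, and `scrapeWithRetry` is our own helper, not part of the library:

```typescript
// Local stand-ins for the error classes exported by panini-scraper,
// declared here so the sketch is self-contained.
class InvalidUrlError extends Error {}
class ProductNotFoundError extends Error {}
class ProductScrapingError extends Error {}

// Retry a scrape, but only for errors that might be transient.
async function scrapeWithRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Invalid URLs and missing products will not fix themselves; rethrow.
      if (error instanceof InvalidUrlError || error instanceof ProductNotFoundError) {
        throw error;
      }
      // Network or parsing failures may be transient; try again.
      lastError = error;
    }
  }
  throw lastError;
}
```

Usage: `scrapeWithRetry(() => scrapePaniniProduct(url))` retries a flaky request up to three times but fails fast on a bad URL.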
📈 Performance
- Average Response Time: 1-3 seconds per product
- Batch Processing: Sequential processing (prevents overwhelming the server)
- Concurrent Requests: Supports parallel scraping with Promise.all for advanced use cases
- Memory Usage: ~50MB for typical usage, scales with batch size
- Rate Limiting: Implement your own rate limiting as needed
Performance Tips
- Batch Scraping: Use `scrapePaniniProducts` for multiple URLs; it processes sequentially and handles errors gracefully
- Parallel Scraping: For maximum speed (at your own risk), use `Promise.all` with `createPaniniScraper`
- Error Recovery: Batch scraping returns partial results, so you don't lose successful scrapes if some URLs fail
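Since the library leaves rate limiting to you, one simple approach is to wrap any scraper function so that sequential requests are separated by a fixed pause. A sketch; `sleep` and `scrapeWithDelay` are our own helpers, not library exports, and `scrape` stands in for any `(url) => Promise<T>` such as one created with `createPaniniScraper`:

```typescript
// Resolve after the given number of milliseconds.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Scrape URLs one by one with a pause between requests.
async function scrapeWithDelay<T>(
  urls: string[],
  scrape: (url: string) => Promise<T>,
  delayMs = 1000,
): Promise<T[]> {
  const results: T[] = [];
  for (const [index, url] of urls.entries()) {
    if (index > 0) await sleep(delayMs); // pause between requests, not before the first
    results.push(await scrape(url));
  }
  return results;
}
```

Raising `delayMs` trades throughput for a lighter load on the target server.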
🏛️ Architecture
This library follows Clean Architecture principles:
```
src/
├── domain/           # Business entities and interfaces
│   ├── product.entity.ts
│   └── product.repository.ts
├── usecases/         # Application business rules
│   └── scrapeProduct.usecase.ts
└── infrastructure/   # External implementations
    ├── httpClient.ts
    └── paniniScraper.service.ts
```
Layer Responsibilities
- Domain: Core business logic and entities (no external dependencies)
- Use Cases: Application-specific business rules
- Infrastructure: External concerns (HTTP, parsing, etc.)
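One payoff of this layering is that code written against the repository interface works with any implementation, for example a stub in tests. A sketch under assumed types; the `Product` and `ProductRepository` shapes below are simplified local declarations inspired by the interfaces documented above, not the library's exact ones:

```typescript
// Simplified local stand-ins for the library's domain types.
interface Product { title: string; currentPrice: number; inStock: boolean; }

interface ProductRepository {
  scrapeProduct(url: string): Promise<Product>;
}

// Infrastructure can be swapped: here, a stub that never touches the network.
class StubProductRepository implements ProductRepository {
  async scrapeProduct(url: string): Promise<Product> {
    return { title: `Stub for ${url}`, currentPrice: 9.9, inStock: true };
  }
}

// A consumer written against the interface never notices the swap.
async function cheapestInStock(repo: ProductRepository, urls: string[]): Promise<Product | undefined> {
  const products = await Promise.all(urls.map((u) => repo.scrapeProduct(u)));
  return products.filter((p) => p.inStock).sort((a, b) => a.currentPrice - b.currentPrice)[0];
}
```

The same `cheapestInStock` would run unchanged against the real `PaniniScraperService`.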
🤝 Contributing
We welcome contributions! Please see our contributing guidelines:
How to Contribute
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes following our coding standards
- Add tests for your changes
- Run the test suite (`npm test`)
- Ensure test coverage remains above 80%
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Code Quality Standards
- Follow TypeScript best practices
- Write comprehensive JSDoc comments
- Maintain test coverage above 80%
- Use meaningful variable and function names
- Keep functions small and focused
- Run `npm run lint:fix` before committing
Running Tests Locally
```sh
# Install dependencies
npm install

# Run tests
npm test

# Run tests with coverage
npm run test:coverage

# Run linting
npm run lint

# Build the project
npm run build
```
📄 API Reference
Main Functions
scrapePaniniProduct(url, config?)
Scrapes a single product from Panini Brasil.
Parameters:
- `url` (string): Product URL
- `config` (`HttpConfig`, optional): HTTP configuration

Returns: `Promise<Product>`

Throws:
- `InvalidUrlError`: Invalid URL
- `ProductNotFoundError`: Product not found
- `ProductScrapingError`: Scraping failed
scrapePaniniProducts(urls, config?) ✨ NEW
Scrapes multiple products in a single call with automatic error handling.
Parameters:
- `urls` (string[]): Array of product URLs
- `config` (`HttpConfig`, optional): HTTP configuration

Returns: `Promise<BatchScrapeResult>`

Features:
- Sequential Processing: URLs are processed one by one to avoid overwhelming the server
- Partial Results: Returns both successful and failed scrapes
- Never Throws: All errors are captured and returned in the `failures` array
- Detailed Errors: Each failure includes the URL, error object, and message
Example:
```typescript
const result = await scrapePaniniProducts([
  'https://panini.com.br/product-1',
  'https://panini.com.br/product-2'
]);

console.log(`Success: ${result.successCount}, Failed: ${result.failureCount}`);
```
createPaniniScraper(config?)
Creates a reusable scraper function.
Parameters:
- `config` (`HttpConfig`, optional): HTTP configuration

Returns: `(url: string) => Promise<Product>`
Classes
ScrapeProductUseCase
Use case for scraping products with support for both single and batch operations.
```typescript
const useCase = new ScrapeProductUseCase(repository);

// Single product
const product = await useCase.execute(url);

// Batch products
const result = await useCase.executeMany([url1, url2, url3]);
```
PaniniScraperService
Infrastructure service implementing the repository interface.
```typescript
const service = new PaniniScraperService(config);
const product = await service.scrapeProduct(url);
```
ProductEntity
Domain entity with validation and computed properties.
```typescript
const entity = new ProductEntity(...);
console.log(entity.hasDiscount);
console.log(entity.discountPercentage);
```
🐛 Known Issues
- Some product pages may have different HTML structures
- Image URLs might be placeholders for new products
- Pre-order detection depends on Portuguese text patterns
- Stock status is inferred from page content
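As a hypothetical illustration of why pre-order detection is fragile (this is not the library's actual implementation), matching Portuguese phrases on the page might look like the following; the specific patterns are our own guesses:

```typescript
// Hypothetical Portuguese phrases that could signal a pre-order,
// e.g. "pré-venda" ("pre-sale") or "lançamento previsto" ("expected release").
const PRE_ORDER_PATTERNS = [/pr[eé][-\s]?venda/i, /lan[cç]amento previsto/i];

// True when any pattern appears in the page text.
function looksLikePreOrder(pageText: string): boolean {
  return PRE_ORDER_PATTERNS.some((pattern) => pattern.test(pageText));
}
```

Any rewording on the site would break this kind of matching, which is why the detection is listed as a known limitation.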
📝 Changelog
See CHANGELOG.md for version history and changes.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
👨‍💻 Author
Emanuel Ozorio Dias
- GitHub: @itsManeka
- Email: [email protected]
🙏 Acknowledgments
- Cheerio - Fast, flexible HTML parsing
- Axios - Promise-based HTTP client
- Jest - Delightful JavaScript testing
- TypeScript - JavaScript with syntax for types
📞 Support
If you need help or have questions:
- Check the documentation
- Browse existing issues
- Create a new issue
- For urgent matters: [email protected]
⚖️ Legal Disclaimer
This tool is for educational and personal use only. Please respect Panini Brasil's robots.txt and terms of service. The authors are not responsible for misuse of this library. Always implement appropriate rate limiting and consider the impact on target servers.
Made with ❤️ and TypeScript
