n8n-nodes-scraping-dog
v0.4.2
Published
A custom n8n node for integrating with ScrapingDog to perform web scraping tasks.
Readme
ScrapingDog Node for n8n
This is a custom n8n node that integrates with the ScrapingDog API, providing web scraping and search capabilities. The node has been converted to use n8n's declarative style for better maintainability and reliability.
Key Highlights
- ✅ 6 Different APIs: URL Scraping, Google Search, Google Images, Google News, Bing Search, and Amazon Search
- ✅ Declarative Style: Modern n8n node implementation for better performance
- ✅ Comprehensive Filtering: Advanced search options for all supported APIs
- ✅ Multi-language Support: Support for multiple languages and regions
- ✅ Rate Limit Handling: Built-in error handling for API limitations
- ✅ TypeScript: Fully typed for better development experience
Features
- 🌐 URL Scraping: Extract content from any webpage
- 🔍 Google Search: Perform Google searches with advanced options
- 🖼️ Google Images: Search Google Images with advanced filtering options
- 📰 Google News: Search Google News with time-based and geographic filtering
- 🔎 Bing Search: Search Bing with customizable filters
- 🛍️ Amazon Search: Search products on Amazon
Directory Structure
nodes/ScrapingDog/
├── ScrapingDog.node.ts # Main node implementation (Declarative Style)
├── ScrapingDog.node.json # Node metadata
├── scrappingDog.svg # Node icon
├── types/
│ └── index.ts # TypeScript interfaces and types
├── resources/
│ ├── index.ts # Resource exports
│ ├── scrapeUrl.ts # Scrape URL resource
│ ├── googleSearch.ts # Google Search resource
│ ├── googleImages.ts # Google Images resource
│ ├── googleNews.ts # Google News resource
│ ├── bingSearch.ts # Bing Search resource
│ ├── amazonSearch.ts # Amazon Search resource
│ └── staticResource.ts # Static data (countries, etc.)Recent Updates
Version 0.4.2 - LinkedIn API Removal & Documentation Updates
- 🗑️ REMOVED: LinkedIn Profile and Jobs APIs - Removed LinkedIn Profile and LinkedIn Jobs functionality
- Simplified Interface: Streamlined resource options for better focus on core web scraping features
- Reduced Dependencies: Cleaner codebase with fewer external integrations
- Enhanced Documentation: Added comprehensive parameter documentation for all remaining APIs
- Updated Examples: Improved usage examples and API response documentation
Version 0.4.1 - Google News API Integration
- 📰 NEW: Google News Search - Added comprehensive Google News API support
- Time-Based Filtering: Filter news by hour, day, week, month, or year
- Geographic Targeting: Country-specific news results with custom domain support
- Language Support: 18 different languages for news results
- Advanced Filtering: Geographic location (UULE), language restrictions, safe search
- Auto-Correct Control: Option to exclude auto-corrected query results
Version 0.4.0 - Google Images API Integration
- 🖼️ NEW: Google Images Search - Added comprehensive Google Images API support
- Advanced Image Filtering: Size, type, color, usage rights, time period filters
- Multi-language Support: 13 different languages for search results
- Safe Search Controls: Strict, moderate, or off filtering options
- Geographic Targeting: Country-specific image search results
Previous Updates
Converted to Declarative Style
- Removed manual HTTP request handling
- Added routing configuration for each resource
- Improved error handling
- Better parameter validation
Resource Improvements
- Added proper parameter descriptions
- Fixed boolean parameter descriptions to start with "Whether"
- Sorted option lists alphabetically
- Improved type safety
API Integration
- Fixed API key handling in requests
- Proper HTML content handling for scraping
- Improved response processing
Usage Examples
1. Scrape URL
{
"resource": "scrapeUrl",
"operation": "get",
"parameters": {
"url": "https://example.com",
"dynamic": false,
"markdown": false,
"premium": false,
"superProxy": false,
"additionalFields": {
"aiQuery": "Extract all product names",
"aiExtractRules": "Find prices and descriptions"
}
}
}2. Google Search
{
"resource": "googleSearch",
"operation": "search",
"parameters": {
"query": "n8n automation",
"advance": true,
"page": "1",
"location": "US",
"results": "10"
}
}3. Google Images
{
"resource": "googleImages",
"operation": "search",
"parameters": {
"query": "sunset landscape",
"results": "20",
"country": "US",
"language": "en",
"imgsz": "l",
"imageType": "photo",
"imageColor": "bw",
"licenses": "f",
"tbs": "qdr:d",
"safe": "1"
}
}4. Google News
{
"resource": "googleNews",
"operation": "search",
"parameters": {
"query": "artificial intelligence technology",
"results": "50",
"additionalFields": {
"country": "us",
"page": 0,
"domain": "google.com",
"language": "en",
"tbs": "qdr:d",
"safe": "active",
"nfpr": false
}
}
}5. Bing Search
{
"resource": "bingSearch",
"operation": "search",
"parameters": {
"query": "web scraping tools",
"page": 1,
"country": "US",
"results": 20,
"filters": "ex1:'answerType:CalculatorPageType'"
}
}6. Amazon Search
{
"resource": "amazonSearch",
"operation": "search",
"parameters": {
"query": "wireless headphones",
"country": "US",
"page": 1,
"domain": "amazon.com",
"postal_code": "10001"
}
}Resource Parameters
Scrape URL
url(Required): Target webpage URLdynamic: Whether to enable JavaScript renderingpremium: Whether to use premium proxysuperProxy: Whether to enable super proxymarkdown: Whether to get response in markdown formatwait: Wait time for dynamic rendering (ms)country: Geolocation for the requestaiQuery: AI-powered data extraction queryaiExtractRules: Custom extraction rules
Google Search
query(Required): Search keywordadvance: Whether to enable advanced featurespage: Result page number (1-10)location: Search location/countryresults: Number of results (10-100)
Google Images
query(Required): Search keyword for imagesresults: Number of results (10-100)country: Country for geotargeting (US, UK, CA, etc.)language: Search language (en, es, fr, de, etc.)imgsz: Filter by size (any, large, medium, icon)imageType: Filter by type (any, face, photo, clipart, lineart, animated)imageColor: Filter by color (any, color, gray, trans, red, orange, yellow, green, teal, blue, purple, pink, white, black, brown)licenses: Filter by usage rights (any, labeled-for-reuse, labeled-for-reuse-with-modification, etc.)tbs: Filter by time (any, hour, day, week, month, year)safe: Safe search level (strict, moderate, off)
Google News
query(Required): Search keyword for news articlesresults: Number of results (1-100)country: Country for geotargeting (us, uk, in, ca, au, de, fr, jp, br, etc.)page: Page number (0 for first page, 1 for second page, etc.)domain: Google domain for local results (e.g., google.co.uk for UK)language: Search language (en, es, fr, de, it, pt, ru, ja, zh-cn, ko, ar, hi, etc.)lr: Language restriction in format lang_{language_code} (e.g., lang_en)uule: Geographic location parameter for tailored resultstbs: Time-based search filter (qdr:h for past hour, qdr:d for past day, qdr:w for past week, qdr:m for past month, qdr:y for past year)safe: Safe search filter (active/off)nfpr: Exclude auto-corrected queries (true to exclude, false to include)
Bing Search
query(Required): Search keywordpage: Page number for paginationcountry: Country for geotargetingfilters: Custom search filtersresults: Number of results to return
Amazon Search
query(Required): Product search keywordcountry: Country for Amazon marketplace (e.g., US, UK, CA, DE, FR, etc.)page: Page number for paginationdomain: Amazon domain (e.g., amazon.com, amazon.co.uk)postal_code: Postal code for location-based results
API Response Examples
Google Images Response
{
"time_taken": 685.692823,
"ads": [],
"images_results": [
{
"title": "Beautiful Sunset Landscape",
"image": "https://encrypted-tbn0.gstatic.com/images?q=tbn:...",
"source": "example.com",
"original": "https://example.com/images/sunset.jpg",
"link": "https://example.com/sunset-photography",
"original_height": 1080,
"original_width": 1920,
"original_size": "245KB",
"know_more_link": "https://www.google.com/search/about-this-image?...",
"is_product": false,
"rank": 1
}
]
}Google News Response
{
"time_taken": 542.123456,
"ads": [],
"news_results": [
{
"rank": 1,
"title": "Breakthrough in Artificial Intelligence Technology",
"link": "https://example-news.com/ai-breakthrough-2024",
"source": "TechNews",
"published_date": "3 hours ago",
"snippet": "Researchers have developed a new AI system that can understand complex human emotions and respond appropriately, marking a significant milestone in artificial intelligence development.",
"thumbnail": "https://encrypted-tbn0.gstatic.com/images?q=tbn:...",
"rank_page": 1
},
{
"rank": 2,
"title": "AI Companies Announce Major Partnership",
"link": "https://business-news.com/ai-partnership",
"source": "Business Daily",
"published_date": "5 hours ago",
"snippet": "Leading technology companies have formed a strategic alliance to accelerate AI research and development, focusing on ethical AI implementation.",
"thumbnail": "https://encrypted-tbn0.gstatic.com/images?q=tbn:...",
"rank_page": 1
}
]
}Installation
For n8n Users
- Install the package in your n8n instance:
npm install n8n-nodes-scraping-dogRestart your n8n instance to load the new node.
The ScrapingDog node will appear in the node palette under the "Transform" category.
Credentials Setup
- Create a ScrapingDog API account at ScrapingDog.com
- Get your API key from the dashboard
- In n8n, create new credentials of type "ScrapingDog API"
- Enter your API key in the credentials configuration
Development
- Install dependencies:
npm install- Build the node:
npm run build- Run with Docker:
make rebuild-and-logsTesting
The node includes proper error handling for:
- Invalid API key
- Rate limiting
- Invalid parameters
- Network errors
- Blocked requests
License
MIT License - see LICENSE file for details
