# n8n-nodes-anakin-scraper
This is an n8n community node that lets you use the Anakin API in your n8n workflows.
Anakin provides powerful web scraping, AI-powered search, and intelligent data extraction capabilities. This node handles all the complexity of job submission and polling automatically.
## Features
- 🔐 Simple authentication with API key
- 🌐 Web Scraping: Scrape any website and extract structured data
- 🔍 AI Search: Perform intelligent searches powered by Perplexity AI
- 🤖 Agentic Search: Advanced multi-stage pipeline that searches, scrapes, and extracts structured data automatically
- ⏳ Automatic polling for async operations
- 🎯 Configurable polling intervals and timeouts
- 🌍 Support for country-specific proxy routing
- ♻️ Cache control with force fresh option
## Installation

### Community Nodes (Recommended)

- Go to Settings > Community Nodes in your n8n instance
- Select Install
- Enter `n8n-nodes-anakin-scraper` in the Package name field
- Click Install
### Manual Installation

Navigate to your n8n installation folder and run:

```bash
npm install n8n-nodes-anakin-scraper
```

Then restart n8n.
## Setup

### 1. Configure Credentials

Before using the Anakin node, you need to set up your API credentials:

- In n8n, go to Credentials > New
- Search for Anakin API
- Fill in:
  - API Key: Your Anakin Scraper API authentication token
  - Base URL: The API endpoint (default: `https://api.anakin.io`)
- Click Save
### 2. Use in Workflow
- Add the Anakin node to your workflow
- Connect it to your trigger or previous node
- Select your credentials
- Enter the URL you want to scrape
- (Optional) Configure additional options
## Usage
The Anakin node supports three powerful operations:
### 1. Scrape URL

Extract content and structured data from any website.

```
Trigger → Anakin (Scrape URL) → Process Data
```

Configuration:

- URL: `https://example.com/product-page`
- Country Code: `us` (optional)
- Force Fresh: `false` (optional)
- Max Wait Time: `300` seconds (optional)
- Poll Interval: `3` seconds (optional)
Output:

```json
{
  "success": true,
  "operation": "scrapeUrl",
  "request_id": "req_123456",
  "url": "https://example.com/product-page",
  "status": "completed",
  "html": "...",
  "markdown": "...",
  "generatedJson": {
    // Structured data extracted from the page
  }
}
```

### 2. Search
Perform AI-powered searches using Perplexity AI. Get instant answers with citations.
```
Trigger → Anakin (Search) → Process Results
```

Configuration:

- Search Query: `What are the latest trends in AI?`
- Max Results: `5` (optional, default: 5)
Output:

```json
{
  "success": true,
  "operation": "search",
  "query": "What are the latest trends in AI?",
  "answer": "Based on recent developments...",
  "results": [
    {
      "title": "AI Trends 2026",
      "url": "https://example.com/ai-trends",
      "content": "Summary of the article...",
      "score": 0.95
    }
  ],
  "count": 5
}
```

Use Cases:
- Research and fact-checking
- Competitive intelligence
- Content research
- Real-time information gathering
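As an illustration of processing the Search output in a downstream Code node, the sketch below ranks and filters results by relevance. The `SearchResult` shape mirrors the sample output above; the 0.8 score threshold is an arbitrary choice for the example:

```typescript
// Shape of one entry in the `results` array of the Search output above.
interface SearchResult {
  title: string;
  url: string;
  content: string;
  score: number;
}

// Keep only results whose relevance score clears the threshold,
// best matches first. The 0.8 default is an illustrative cutoff.
function topResults(results: SearchResult[], minScore = 0.8): SearchResult[] {
  return results
    .filter((r) => r.score >= minScore)
    .sort((a, b) => b.score - a.score);
}
```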
### 3. Agentic Search
Advanced multi-stage AI pipeline that automatically:
- Searches for relevant information
- Identifies and scrapes citation sources
- Extracts structured data using AI
- Generates a comprehensive summary
```
Trigger → Anakin (Agentic Search) → Process Structured Data
```

Configuration:

- Search Prompt: `Find the pricing plans for top 5 CRM software`
- Use Browser: `true` (optional, more reliable)
- Max Wait Time: `600` seconds (optional)
- Poll Interval: `5` seconds (optional)
Output:

```json
{
  "success": true,
  "operation": "agenticSearch",
  "job_id": "job_789",
  "status": "completed",
  "query": "Find the pricing plans for top 5 CRM software",
  "perplexity_answer": "Here are the top CRM solutions...",
  "citations": [
    {"url": "https://salesforce.com/pricing", "title": "Salesforce Pricing", "source_index": 0}
  ],
  "scraped_data": [
    {
      "source_url": "https://salesforce.com/pricing",
      "source_index": 0,
      "data": {
        "plans": [...],
        "features": [...]
      }
    }
  ],
  "chatgpt_schema": {
    "type": "object",
    "properties": {...}
  },
  "chatgpt_structured_data": {
    "crm_platforms": [...]
  },
  "chatgpt_summary": "Comprehensive analysis of CRM pricing..."
}
```

Use Cases:
- Market research with structured data
- Competitive analysis
- Lead generation with enriched data
- Automated data collection for reports
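For report-style use cases, the `scraped_data` array from the sample output above can be flattened into simple rows in a downstream node. This is an illustrative sketch only; the `ScrapedEntry` shape is taken from the sample output, and what you extract from `data` depends on your schema:

```typescript
// Shape of one entry in the Agentic Search `scraped_data` array above.
interface ScrapedEntry {
  source_url: string;
  source_index: number;
  data: Record<string, unknown>;
}

// Flatten each scraped source into a row listing its URL and the
// top-level fields the AI extracted (e.g. "plans", "features").
function toRows(scraped: ScrapedEntry[]): Array<{ url: string; fields: string[] }> {
  return scraped.map((entry) => ({
    url: entry.source_url,
    fields: Object.keys(entry.data),
  }));
}
```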
## Configuration Options by Operation

### Scrape URL Options

| Option | Description | Default |
|--------|-------------|---------|
| URL | The website URL to scrape | Required |
| Max Wait Time | Maximum seconds to wait for completion | 300 |
| Poll Interval | Seconds between status checks | 3 |
| Country Code | Proxy country code (e.g., us, uk, de) | us |
| Force Fresh | Bypass cache and force fresh scrape | false |

### Search Options

| Option | Description | Default |
|--------|-------------|---------|
| Search Query | The question or query to search | Required |
| Max Results | Maximum number of results to return | 5 |

### Agentic Search Options

| Option | Description | Default |
|--------|-------------|---------|
| Search Prompt | The search prompt for analysis | Required |
| Use Browser | Use browser for scraping (more reliable) | true |
| Max Wait Time | Maximum seconds to wait for completion | 600 |
| Poll Interval | Seconds between status checks | 5 |
## How It Works

### Scrape URL & Agentic Search (Async)
- Submit: The node submits your request to the Anakin API
- Poll: Automatically checks the job status every few seconds
- Return: Once complete, returns the data to your workflow
### Search (Synchronous)
- Submit: The node sends your query to the Anakin API
- Return: Immediately returns search results with answers and citations
The node handles all the complexity of:
- Job submission and request management
- Automatic polling for async operations
- Intelligent error handling
- Timeout management
- Response parsing and formatting
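The submit-and-poll pattern described above can be sketched as a small helper. This is an illustrative TypeScript sketch of the pattern, not the node's actual implementation; the `checkStatus` callback stands in for a status-endpoint call, and the status values mirror the sample outputs:

```typescript
// Minimal status shape, matching the sample outputs above.
type JobStatus = { status: string; [key: string]: unknown };

// Poll until the job completes, fails, or the deadline passes.
// Defaults mirror the Scrape URL defaults (300 s wait, 3 s interval).
async function pollUntilComplete(
  checkStatus: () => Promise<JobStatus>,
  maxWaitSeconds = 300,
  pollIntervalSeconds = 3,
): Promise<JobStatus> {
  const deadline = Date.now() + maxWaitSeconds * 1000;
  while (Date.now() < deadline) {
    const job = await checkStatus();
    if (job.status === "completed") return job;
    if (job.status === "failed") throw new Error("Scraping job failed");
    // Wait one poll interval before checking again.
    await new Promise((resolve) => setTimeout(resolve, pollIntervalSeconds * 1000));
  }
  throw new Error(`Job did not complete within ${maxWaitSeconds}s`);
}
```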
## Error Handling
The node will throw an error if:
- The scraping job fails
- The job doesn't complete within the max wait time
- The API returns an error
You can enable Continue on Fail in the node settings to handle errors gracefully.
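With Continue on Fail enabled, failed items can be routed separately downstream. The sketch below assumes a simplified item shape in which failures carry an `error` field (n8n's usual convention) or `success: false` (as in the sample outputs); the exact shape may vary:

```typescript
// Simplified item shape: an assumption for illustration, combining the
// `success` flag from the sample outputs with n8n's usual `error` field.
type Item = { success?: boolean; error?: string; [key: string]: unknown };

// Split node output into successes and failures so each can be
// routed to a different branch of the workflow.
function partitionResults(items: Item[]): { ok: Item[]; failed: Item[] } {
  const ok: Item[] = [];
  const failed: Item[] = [];
  for (const item of items) {
    (item.error || item.success === false ? failed : ok).push(item);
  }
  return { ok, failed };
}
```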
## API Endpoints Used

### Scrape URL

- `POST /v1/request` - Submit scraping job
- `GET /v1/request/{id}` - Check job status

### Search

- `POST /v1/search` - Perform AI search (synchronous)

### Agentic Search

- `POST /v1/agentic-search` - Submit agentic search job
- `GET /v1/agentic-search/{jobId}` - Check agentic search status
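To clarify the Scrape URL flow, the sketch below builds the two requests listed above without sending them. The endpoint paths come from this section; the `X-API-Key` header name and the request-body field names are assumptions for illustration, not confirmed API documentation:

```typescript
// Build the submit request for POST /v1/request. The header name and
// body shape are assumptions; consult the Anakin API docs for the
// authoritative contract.
function submitRequest(baseUrl: string, apiKey: string, target: string) {
  return {
    url: `${baseUrl}/v1/request`,
    options: {
      method: "POST",
      headers: { "X-API-Key": apiKey, "Content-Type": "application/json" },
      body: JSON.stringify({ url: target }),
    },
  };
}

// Build the status-check request for GET /v1/request/{id}.
function statusRequest(baseUrl: string, apiKey: string, requestId: string) {
  return {
    url: `${baseUrl}/v1/request/${requestId}`,
    options: { method: "GET", headers: { "X-API-Key": apiKey } },
  };
}
```

Either request object can then be passed to `fetch(req.url, req.options)`.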
## Development

### Prerequisites

- Node.js >= 16
- n8n installed locally

### Setup
```bash
# Clone the repository
git clone https://github.com/Anakin-Inc/blueprint-scribe-35.git
cd blueprint-scribe-35/external-integrations/n8n

# Install dependencies
npm install

# Build the node
npm run build

# Link for local development
npm link
cd ~/.n8n/custom
npm link n8n-nodes-anakin-scraper
```

### Project Structure
```
n8n-nodes-anakin-scraper/
├── credentials/
│   └── AnakinScraperApi.credentials.ts
├── nodes/
│   └── AnakinScraper/
│       └── AnakinScraper.node.ts
├── package.json
└── README.md
```

## Support
For issues, questions, or contributions, please open an issue on the GitHub repository.
## License
MIT
## Changelog

### 1.1.0
- Added AI-powered Search operation (Perplexity integration)
- Added Agentic Search operation (multi-stage AI pipeline)
- Improved error handling and logging
- Updated node display name to "Anakin"
### 1.0.0
- Initial release
- URL scraping with automatic polling
- Configurable polling intervals and timeouts
- Country-specific proxy support
- Cache control options
