n8n-nodes-crawl4ai_naf
v1.4.0
Advanced web crawling, data extraction, and interaction nodes for n8n with LLM capabilities
Crawl4AI n8n Nodes
Installation
npm install n8n-nodes-crawl4ai_naf
Features
Main Crawl4ai Node
- Basic Crawling: Simple web page crawling with markdown/HTML extraction
- CSS Extraction: Extract structured data using CSS selectors
- LLM Extraction: Use an LLM for complex data extraction
- Batch Processing: Process multiple URLs concurrently
- Anti-Detection: Undetected browser mode, stealth mode, CAPTCHA bypass
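Batch Processing runs several URLs at once, with the limit set by Concurrent Requests (see Advanced Configuration). The Python sketch below illustrates the general bounded-concurrency pattern using `asyncio.Semaphore`. It is an illustration under assumptions, not this package's implementation; `crawl_batch` and `fake_fetch` are hypothetical names, and the fake fetch stands in for a real page load.

```python
import asyncio


async def crawl_batch(urls, fetch, concurrent_requests=5):
    """Run fetch(url) for every URL with at most `concurrent_requests`
    calls in flight at once. Results keep the order of `urls`."""
    semaphore = asyncio.Semaphore(concurrent_requests)

    async def limited(url):
        async with semaphore:  # waits here while the limit is reached
            return await fetch(url)

    # gather() returns results in input order, whatever order they finish in
    return await asyncio.gather(*(limited(url) for url in urls))


async def fake_fetch(url):
    """Hypothetical stand-in for a real page load."""
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


if __name__ == "__main__":
    targets = [f"https://example.com/page/{i}" for i in range(10)]
    pages = asyncio.run(crawl_batch(targets, fake_fetch, concurrent_requests=2))
    print(len(pages), "pages fetched")
```

A semaphore keeps the concurrency cap in one place, so raising or lowering the limit does not change how individual fetches are written.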
Crawl4ai Interaction Node
- Element Interaction: Click buttons, fill forms, handle dropdowns
- Authentication: Login form handling and session management
- LLM Prompts: Automate interactions using natural language prompts
- Multi-Step Workflows: Complex interaction sequences
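Multi-step workflows chain nodes through the `connections` object of the workflow JSON, as the Complete Workflow Example under Usage Examples does. The following is a minimal sketch that assumes a simple linear chain in which each node's first output feeds the next node's first input. `linear_connections` is a hypothetical helper and is not part of this package.

```python
import json


def linear_connections(node_names):
    """Build an n8n-style "connections" map that wires each node's main
    output (index 0) to the next node's main input."""
    connections = {}
    # Pair each node with its successor: A->B, B->C, ...
    for source, target in zip(node_names, node_names[1:]):
        connections[source] = {
            "main": [
                [{"node": target, "type": "main", "index": 0}]
            ]
        }
    return connections


if __name__ == "__main__":
    chain = ["Crawl4ai", "Crawl4aiInteraction", "Crawl4ai2"]
    print(json.dumps(linear_connections(chain), indent=2))
```

Passing `["Crawl4ai", "Crawl4aiInteraction", "Crawl4ai2"]` reproduces the `connections` block of the Complete Workflow Example.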
Usage Examples
Basic Crawling
{
"nodes": [
{
"parameters": {
"operation": "basic_crawl",
"urlConfig": {
"urls": [
{
"url": "https://example.com"
}
]
},
"browserConfig": {
"settings": {
"headless": true,
"viewportWidth": 1920,
"viewportHeight": 1080
}
}
},
"name": "Crawl4ai",
"type": "n8n-nodes-crawl4ai_naf.crawl4ai",
"typeVersion": 1,
"position": [250, 300]
}
]
}
Advanced Crawling with Authentication
{
"nodes": [
{
"parameters": {
"operation": "css_extraction",
"urlConfig": {
"urls": [
{
"url": "https://protected.example.com/dashboard"
},
{
"url": "https://protected.example.com/reports"
}
]
},
"browserConfig": {
"settings": {
"headless": true,
"viewportWidth": 1920,
"viewportHeight": 1080
}
},
"antiDetection": {
"settings": {
"undetected": true,
"stealth": true,
"captchaBypass": "2captcha"
}
},
"authConfig": {
"authSettings": {
"enableAuth": true,
"authType": "form",
"username": "your_username",
"password": "your_password",
"loginUrl": "https://protected.example.com/login"
}
},
"advancedConfig": {
"advancedSettings": {
"maxRetries": 3,
"timeout": 30000,
"concurrentRequests": 2,
"debugMode": true
}
}
},
"name": "Crawl4ai",
"type": "n8n-nodes-crawl4ai_naf.crawl4ai",
"typeVersion": 1,
"position": [250, 300]
}
]
}
LLM Extraction Example
{
"nodes": [
{
"parameters": {
"operation": "llm_extraction",
"urlConfig": {
"urls": [
{
"url": "https://complex-data.example.com"
}
]
},
"browserConfig": {
"settings": {
"headless": true
}
}
},
"name": "Crawl4ai",
"type": "n8n-nodes-crawl4ai_naf.crawl4ai",
"typeVersion": 1,
"position": [250, 300]
}
]
}
LLM Prompt Interaction
{
"nodes": [
{
"parameters": {
"interactionType": "llm_prompt",
"llmPromptConfig": {
"promptSettings": {
"promptText": "Find the login form, fill username with 'testuser' and password with 'testpass', then click the submit button",
"provider": "openai/gpt-4",
"maxTokens": 1000
}
}
},
"name": "Crawl4aiInteraction",
"type": "n8n-nodes-crawl4ai_naf.crawl4aiInteraction",
"typeVersion": 1,
"position": [250, 300]
}
]
}
Element Click Interaction
{
"nodes": [
{
"parameters": {
"interactionType": "element_click",
"elementConfig": {
"clickSettings": {
"selector": "#submit-button",
"waitAfterClick": 2000
}
}
},
"name": "Crawl4aiInteraction",
"type": "n8n-nodes-crawl4ai_naf.crawl4aiInteraction",
"typeVersion": 1,
"position": [450, 300]
}
]
}
Complete Workflow Example
{
"nodes": [
{
"parameters": {
"operation": "basic_crawl",
"urlConfig": {
"urls": [
{
"url": "https://example.com/login"
}
]
}
},
"name": "Crawl4ai",
"type": "n8n-nodes-crawl4ai_naf.crawl4ai",
"typeVersion": 1,
"position": [250, 300]
},
{
"parameters": {
"interactionType": "authentication",
"authConfig": {
"authSettings": {
"username": "[email protected]",
"password": "password123",
"loginUrl": "https://example.com/login"
}
}
},
"name": "Crawl4aiInteraction",
"type": "n8n-nodes-crawl4ai_naf.crawl4aiInteraction",
"typeVersion": 1,
"position": [450, 300]
},
{
"parameters": {
"operation": "css_extraction",
"urlConfig": {
"urls": [
{
"url": "https://example.com/dashboard"
}
]
}
},
"name": "Crawl4ai2",
"type": "n8n-nodes-crawl4ai_naf.crawl4ai",
"typeVersion": 1,
"position": [650, 300]
}
],
"connections": {
"Crawl4ai": {
"main": [
[
{
"node": "Crawl4aiInteraction",
"type": "main",
"index": 0
}
]
]
},
"Crawl4aiInteraction": {
"main": [
[
{
"node": "Crawl4ai2",
"type": "main",
"index": 0
}
]
]
}
}
}
Configuration
Browser Configuration
- Headless Mode: Run the browser in headless mode (default: true)
- Viewport: Set the browser viewport dimensions (default: 1920x1080)
- User Agent: Set a custom user agent string
- Proxy Support: Configure proxy settings
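The Headless Mode and Viewport defaults above are the same values used in the Basic Crawling example. The sketch below generates that node definition for an arbitrary list of URLs. `basic_crawl_node` is a hypothetical helper. User Agent and Proxy settings are left out because this README does not show their parameter keys.

```python
import json


def basic_crawl_node(urls, headless=True, width=1920, height=1080,
                     name="Crawl4ai", position=(250, 300)):
    """Return a Crawl4ai node definition shaped like the Basic Crawling example."""
    return {
        "parameters": {
            "operation": "basic_crawl",
            # One {"url": ...} entry per target page
            "urlConfig": {"urls": [{"url": u} for u in urls]},
            "browserConfig": {
                "settings": {
                    "headless": headless,
                    "viewportWidth": width,
                    "viewportHeight": height,
                }
            },
        },
        "name": name,
        "type": "n8n-nodes-crawl4ai_naf.crawl4ai",
        "typeVersion": 1,
        "position": list(position),
    }


if __name__ == "__main__":
    node = basic_crawl_node(["https://example.com", "https://example.org"])
    print(json.dumps({"nodes": [node]}, indent=2))
```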
Anti-Detection Settings
- Undetected Mode: Enable undetected browser mode
- Stealth Mode: Enable stealth mode with fingerprint masking
- CAPTCHA Bypass: Configure CAPTCHA bypass strategies (2Captcha, Anti-Captcha, Custom)
- Behavioral Simulation: Simulate human-like interactions
Authentication Options
- Basic Auth: Username/password authentication
- Form Auth: Form-based authentication with login URL
- OAuth2: OAuth2 token-based authentication
- API Key: API key authentication
- Session Cookie: Session cookie authentication
Advanced Configuration
- Max Retries: Maximum number of retry attempts (default: 3)
- Timeout: Request timeout in milliseconds (default: 30000)
- Concurrent Requests: Number of concurrent requests (default: 5)
- Debug Mode: Enable debug logging (default: false)
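Max Retries and Timeout can be pictured as the retry loop below. This is a sketch under stated assumptions, not the node's actual logic. It assumes that Max Retries counts retries after the first attempt and that timeouts and network (`OSError`) failures are retryable. It also adds an exponential backoff delay (`backoff_s`) that this README does not mention. `fetch_with_retries` is a hypothetical name.

```python
import asyncio


async def fetch_with_retries(fetch, url, max_retries=3, timeout_ms=30000, backoff_s=0.5):
    """Await fetch(url), retrying after timeouts or network errors.

    Assumption: max_retries counts retries after the first attempt,
    so fetch is called at most 1 + max_retries times.
    """
    attempts = 1 + max_retries
    last_error = None
    for attempt in range(attempts):
        try:
            # wait_for cancels the fetch if it runs longer than the timeout
            return await asyncio.wait_for(fetch(url), timeout=timeout_ms / 1000)
        except (asyncio.TimeoutError, OSError) as exc:  # other errors propagate at once
            last_error = exc
            if attempt < attempts - 1:
                # Hypothetical exponential backoff: backoff_s, 2*backoff_s, 4*backoff_s, ...
                await asyncio.sleep(backoff_s * 2 ** attempt)
    raise RuntimeError(f"{url}: gave up after {attempts} attempts") from last_error


async def demo():
    failures_left = [2]  # fail twice, then succeed

    async def flaky_fetch(url):
        if failures_left[0] > 0:
            failures_left[0] -= 1
            raise ConnectionError("connection reset")
        return f"<html>{url}</html>"

    return await fetch_with_retries(flaky_fetch, "https://example.com", backoff_s=0.01)


if __name__ == "__main__":
    print(asyncio.run(demo()))  # succeeds on the third attempt
```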
Development
Prerequisites
- Node.js 18+
- npm 9+
- TypeScript 5+
Building
npm install
npm run build
Testing
npm run test
Publishing
npm publish
Error Handling
Both nodes perform the following validation and error handling:
- Input data validation
- URL format validation
- Configuration parameter validation
- Authentication credential validation
- Descriptive error messages with timestamps
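The URL and credential checks listed above might look roughly like the following. Both functions are hypothetical illustrations. `validate_urls` accepts only absolute `http`/`https` URLs. `validate_form_auth` checks the `username`, `password`, and `loginUrl` fields from the form-auth example. The rules the nodes actually apply may differ.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse


def validate_urls(urls):
    """Return one error record per URL that is not an absolute http(s) URL.
    An empty list means every URL passed."""
    errors = []
    for index, url in enumerate(urls):
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append({
                "index": index,
                "url": url,
                "message": "URL must be absolute and use http or https",
                # UTC timestamp such as 2024-05-01T09:30:00+00:00
                "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
            })
    return errors


def validate_form_auth(auth_settings):
    """Return a list of problems with a form-auth settings dict shaped like
    the authSettings object in the Advanced Crawling example."""
    problems = [f"missing {key}" for key in ("username", "password", "loginUrl")
                if not auth_settings.get(key)]
    login_url = auth_settings.get("loginUrl")
    if login_url and validate_urls([login_url]):
        problems.append(f"loginUrl is not an absolute http(s) URL: {login_url!r}")
    return problems


if __name__ == "__main__":
    print(validate_urls(["https://example.com", "example.com"]))
    print(validate_form_auth({"username": "your_username", "loginUrl": "/login"}))
```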
Support
For issues, questions, or contributions, please contact: [email protected]
License
MIT
