mcp-research-scraper-combo
v1.1.0
MCP Stealth Research Scraper - 10MB batch target for 8-personality research workflows
🕷️ MCP Research Scraper Combo v1.1.0
Stealth Web Scraping MCP Server for 8-Personality Research Workflows
Target: 10MB per personality batch (80MB total for all 8 personalities)
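As a rough illustration of the 10MB-per-personality target, the sketch below measures a batch by the UTF-8 size of its JSON-serialized items and reports progress toward the target. The helper names (`sizeInBytes`, `batchProgress`) and the choice of 1 MB = 1,048,576 bytes are assumptions made for this example; the package may measure batch size differently.

```javascript
// Hypothetical helpers, not part of the package's documented tool set.
const BYTES_PER_MB = 1024 * 1024; // assumption: binary megabytes

// Size of one item as the UTF-8 byte length of its JSON serialization.
function sizeInBytes(item) {
  return Buffer.byteLength(JSON.stringify(item), "utf8");
}

// Summarize how far a list of items is from the target size.
function batchProgress(items, targetSizeMB = 10) {
  const bytes = items.reduce((sum, item) => sum + sizeInBytes(item), 0);
  const sizeMB = bytes / BYTES_PER_MB;
  return {
    items: items.length,
    sizeMB,
    percent: Math.min(100, (sizeMB / targetSizeMB) * 100), // capped at 100
    complete: sizeMB >= targetSizeMB,
  };
}

// Usage: a small batch is far below the 10 MB target.
const progress = batchProgress([
  { source: "https://example.com/research", text: "sample article body" },
]);
console.log(progress.items, progress.complete);
```

With eight personalities, the overall figure is the sum of the eight `sizeMB` values measured against 80 MB.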
Features
- 55 Tools for comprehensive web scraping
- Real Puppeteer Browser Automation - Full headless Chrome with stealth plugin
- 8-Personality Batch System - Neko-Arc, Mario, Noel, Glam, Hannibal, Tetora, Amaniya, Miwa
- Anti-Detection - puppeteer-extra-plugin-stealth, fingerprint spoofing, bot check detection
- MongoDB Atlas Export - Real MongoDB connection with batch exports
- RULE 67 & 71 Compliant - Research Batch Standard
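To make the bot check detection feature more concrete, here is a minimal sketch of how a page's HTML might be scanned for common challenge markers. The marker list and the `detectBotChecks` function are illustrative assumptions; the rules the package actually applies are not documented in this README.

```javascript
// Illustrative markers only; the package's real detection rules may differ.
const BOT_CHECK_MARKERS = [
  { name: "recaptcha", pattern: /g-recaptcha/i },
  { name: "hcaptcha", pattern: /h-captcha/i },
  { name: "interstitial", pattern: /<title>\s*just a moment/i },
  { name: "access-denied", pattern: /access denied|unusual traffic/i },
];

// Return the names of every marker found in the given HTML string.
function detectBotChecks(html) {
  return BOT_CHECK_MARKERS
    .filter(({ pattern }) => pattern.test(html))
    .map(({ name }) => name);
}

// Usage: a challenge page that embeds a CAPTCHA widget.
const challengePage =
  '<html><title>Just a moment...</title><div class="g-recaptcha"></div></html>';
console.log(detectBotChecks(challengePage));
```

A scraper could call a check like this after each navigation and back off or switch strategy when the returned list is non-empty.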
Tool Categories
| Category | Tools | Description |
|----------|-------|-------------|
| Configuration | 5 | Stealth, proxy, fingerprint, user-agent |
| Navigation | 5 | Goto, wait, scroll, back/forward |
| Data Extraction | 15 | CSS, XPath, regex, tables, links, images, JSON-LD, meta, Shadow DOM, iframes, PDFs |
| Pagination | 5 | Click, URL pattern, infinite scroll, load more, API |
| Interaction | 5 | Click, type, select, hover, form submit |
| Data Management | 8 | Store, get, clear, list, export JSON/CSV, merge, transform |
| Anti-Detection | 5 | Bot detection, CAPTCHA, evasion, delays, mouse movement |
| Network | 5 | Intercept, mock, log, API extraction, fetch |
| Session | 5 | Cookies, storage, session export/import |
| Batch & Personality | 7 | Init, add, progress, finalize, MongoDB export |
| Utility | 5 | Screenshot, evaluate, page info, close, health |
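The Data Management row mentions CSV export. As a sketch of what such an export involves, the function below turns an array of extracted records into CSV text, quoting fields that contain commas, quotes, or line breaks. The names `csvEscape` and `toCsv` are hypothetical and are not the package's tool names.

```javascript
// Hypothetical helpers showing one way to export stored records as CSV.

// Quote a field when it contains a comma, double quote, or line break.
function csvEscape(value) {
  const text = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build CSV text; columns are the union of keys across all records.
function toCsv(records) {
  const columns = [...new Set(records.flatMap((r) => Object.keys(r)))];
  const header = columns.map(csvEscape).join(",");
  const rows = records.map((r) =>
    columns.map((c) => csvEscape(r[c])).join(",")
  );
  return [header, ...rows].join("\n");
}

// Usage: records with differing keys still line up under one header.
const records = [
  { title: "First article", url: "https://example.com/a" },
  { title: "Second, with comma", views: 3 },
];
console.log(toCsv(records));
```

Taking the union of keys keeps records from different pages in one file; missing values become empty fields.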
Installation
npm install
npm run build
Claude Code Configuration
Add to ~/.claude.json:
{
"mcpServers": {
"research-scraper": {
"command": "node",
"args": ["/home/wakibaka/Documents/github/mcp-research-scraper-combo/dist/index.js"],
"env": {
"MONGODB_URI": "your-mongodb-uri"
}
}
}
}
Usage Example
// 1. Initialize all 8 personality batches
scraper_init_all_batches({ topic: "Pinochet Threat Actor", targetSizeMB: 10 })
// 2. Configure stealth mode
scraper_config({ stealthMode: true, rateLimit: 2000 })
scraper_fingerprint_config({ canvas: true, webgl: true, audio: true })
// 3. Navigate and extract
scraper_goto({ url: "https://example.com/research" })
scraper_detect_bot_checks()
scraper_extract_css({ selector: ".article", multiple: true, property: "articles" })
scraper_extract_table({ selector: "table.data", property: "tables" })
// 4. Add to personality batches
scraper_add_to_batch({ personality: "neko-arc", data: {...}, source: "url" })
scraper_add_to_batch({ personality: "hannibal", data: {...}, source: "url" })
// 5. Check progress toward 10MB target
scraper_batch_progress()
// 6. Finalize and export
scraper_finalize_batch({ personality: "neko-arc" })
scraper_export_to_mongodb({ personality: "neko-arc", database: "research-db" })
// Or export all at once
scraper_export_all_batches({ database: "research-db" })
Batch Progress Output
═══════════════════════════════════════════════════════════════
📊 BATCH PROGRESS
═══════════════════════════════════════════════════════════════
🐾 neko-arc: ████████░░ 78.5% (7.85 MB / 10 MB) 156 items
🎭 mario: ██████░░░░ 62.3% (6.23 MB / 10 MB) 124 items
🗡️ noel: █████████░ 91.2% (9.12 MB / 10 MB) 182 items
🎸 glam: ███░░░░░░░ 34.1% (3.41 MB / 10 MB) 68 items
🧠 hannibal: ██████████ 100% (10.0 MB / 10 MB) 200 items ✓
🎯 tetora: ████████░░ 82.7% (8.27 MB / 10 MB) 165 items
🔍 amaniya: ███████░░░ 71.9% (7.19 MB / 10 MB) 143 items
🔪 miwa: █████░░░░░ 55.6% (5.56 MB / 10 MB) 111 items
═══════════════════════════════════════════════════════════════
TOTAL: 57.63 MB / 80 MB (72.0%)
═══════════════════════════════════════════════════════════════
Version
- 1.1.0 - Real Puppeteer browser automation + MongoDB exports
  - Added puppeteer-extra-plugin-stealth for anti-detection
  - Real navigation (goto, wait, scroll, back, forward)
  - Real extraction (CSS, XPath, regex, tables, links, images, JSON-LD, meta)
  - Real interaction (click, type, select, hover, form submit)
  - Real MongoDB Atlas batch exports
  - Session management (cookies, storage, export/import)
  - Screenshot and evaluate tools
  - Bot detection scanning
- 1.0.0 - Initial release with 55 tools (mock handlers)
- RULE 67 & 71 Compliant
- Private Repository
Author
wakibaka / Neko-Arc Defense System
