deepcrawler-mcp v1.0.2
Hyperbrowser MCP Server with DeepCrawler
Production-ready Model Context Protocol (MCP) server for Hyperbrowser integration with AI-powered API discovery, advanced web scraping, and browser automation.
Features
🔍 API Discovery (DeepCrawler)
- Discover APIs: Find hidden APIs on any website using AI agents
- Network Analysis: Analyze network traffic for API endpoints
- JavaScript Analysis: Extract APIs from JavaScript code
- CAPTCHA Solving: Detect and solve common CAPTCHA types (reCAPTCHA v2/v3, hCaptcha, image, audio)
- OpenAPI Generation: Generate OpenAPI specifications automatically
- WebSocket Analysis: Analyze WebSocket connections and messages
🌐 Web Automation (Hyperbrowser)
- Link Extraction: Extract all hyperlinks from webpages with context
- Web Crawling: Crawl multiple pages with intelligent navigation
- Data Extraction: Extract structured data from webpages
- Browser Automation: Full browser control with Playwright
- Web Search: Search the web using Bing integration
⚙️ Production Features
- Retry Logic: Automatic exponential backoff retry on rate limits and server errors (see the sketch after this list)
- Credential Sanitization: API keys masked in all logs
- Zero-Config Execution: Works via npx without prior installation
- Multi-location Config: Supports .env files in multiple locations
- Full Type Safety: Complete TypeScript support with exported types
- Comprehensive Testing: 100% test coverage with unit and integration tests
- AI Assistant Support: Works with 20+ AI coding assistants
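The retry behavior can be pictured as a thin wrapper around each API call. The sketch below is illustrative only: it assumes the delay doubles on each attempt and that only HTTP 429 and 5xx responses are retried, and the helper name is hypothetical rather than part of this package's API.

```typescript
// Hypothetical retry helper illustrating exponential backoff on 429/5xx responses.
// HYPERBROWSER_RETRY_ATTEMPTS and HYPERBROWSER_RETRY_DELAY correspond to the
// attempts and initialDelayMs parameters here; the doubling factor is assumed.
async function withRetry(
  call: () => Promise<Response>,
  attempts = 3,
  initialDelayMs = 1000
): Promise<Response> {
  let delay = initialDelayMs;
  for (let attempt = 1; ; attempt++) {
    const res = await call();
    const retryable = res.status === 429 || res.status >= 500;
    if (!retryable || attempt >= attempts) return res;
    await new Promise((resolve) => setTimeout(resolve, delay));
    delay *= 2; // wait 1s, 2s, 4s, ... before the next attempt
  }
}
```

With the defaults (3 attempts, 1000 ms initial delay), a rate-limited request would be retried after roughly 1 s and again after 2 s before the error is surfaced.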
Installation
NPM (Recommended)
npm install -g deepcrawler-mcp
NPX (Zero-Config)
npx deepcrawler-mcp
PyPI
pip install deepcrawler-mcp
Quick Start
1. Get Your API Keys
- OpenRouter API Key: Get one at https://openrouter.ai (for DeepCrawler AI agents)
- Hyperbrowser API Key: Get one at https://hyperbrowser.ai (for browser automation)
2. Set Up Environment
Create a .env file in your project root:
OPENROUTER_API_KEY=sk_live_your_key_here
HYPERBROWSER_API_KEY=your_hyperbrowser_key_here
LOG_LEVEL=info
Or set environment variables:
export OPENROUTER_API_KEY=sk_live_your_key_here
export HYPERBROWSER_API_KEY=your_hyperbrowser_key_here
3. Start the Server
# Using npx
npx deepcrawler-mcp
# Using npm
npm run start
# Using Python
python -m jaegis_hyperbrowser_mcp
4. List Available Tools
npx deepcrawler-mcp --list-tools
Configuration
Environment Variables
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| HYPERBROWSER_API_KEY | Yes | - | Your Hyperbrowser API key |
| HYPERBROWSER_BASE_URL | No | https://api.hyperbrowser.ai | API base URL |
| HYPERBROWSER_TIMEOUT | No | 30000 | Request timeout in ms |
| HYPERBROWSER_RETRY_ATTEMPTS | No | 3 | Number of retry attempts |
| HYPERBROWSER_RETRY_DELAY | No | 1000 | Initial retry delay in ms |
| LOG_LEVEL | No | info | Log level: debug, info, warn, error |
Configuration File Locations
The server checks for .env files in this order (a minimal lookup sketch follows the list):
1. ./.env (project root)
2. ~/.mcp/.env (user home)
3. ~/.env (user home)
4. /etc/mcp/.env (system-wide)
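The sketch below shows how such a lookup could be done with the dotenv package. It is illustrative, not the server's actual code: it stops at the first file found, and variables already present in the environment are left untouched.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";
import * as dotenv from "dotenv";

// Candidate .env locations in priority order (mirrors the list above).
const candidates = [
  path.resolve(process.cwd(), ".env"),     // ./.env (project root)
  path.join(os.homedir(), ".mcp", ".env"), // ~/.mcp/.env
  path.join(os.homedir(), ".env"),         // ~/.env
  "/etc/mcp/.env",                         // system-wide
];

// Load the first file that exists; dotenv does not override values
// that are already set in process.env.
for (const file of candidates) {
  if (fs.existsSync(file)) {
    dotenv.config({ path: file });
    break;
  }
}
```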
AI Assistant Configuration
This MCP server works with 20+ AI coding assistants. See SETUP_GUIDE.md for detailed configuration instructions for:
- Augment Code - config-examples/augment-config.json
- Claude Desktop - config-examples/claude_desktop_config.json
- Cursor - config-examples/cursor-config.json
- Cline - config-examples/cline-config.json
- GitHub Copilot - config-examples/github-copilot-config.json
- Tabnine - config-examples/tabnine-config.json
- Cody - config-examples/cody-config.json
- And 13+ more...
Quick Configuration Example
For most assistants, add this to your MCP configuration:
{
"mcp_server": {
"command": "npx",
"args": ["deepcrawler-mcp"],
"env": {
"OPENROUTER_API_KEY": "sk_live_your_key_here",
"HYPERBROWSER_API_KEY": "your_hyperbrowser_key_here"
}
}
}
Then restart your AI assistant and verify tools are available.
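You can also exercise the server outside an assistant. The sketch below uses the official @modelcontextprotocol/sdk client (a separate package, not bundled here) to launch the server over stdio, list its tools, and call scrape_links; treat it as an illustration rather than a supported entry point.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Launch the server via npx over stdio, passing the required API keys.
const transport = new StdioClientTransport({
  command: "npx",
  args: ["deepcrawler-mcp"],
  env: {
    OPENROUTER_API_KEY: process.env.OPENROUTER_API_KEY ?? "",
    HYPERBROWSER_API_KEY: process.env.HYPERBROWSER_API_KEY ?? "",
  },
});

const client = new Client({ name: "readme-example", version: "0.1.0" }, { capabilities: {} });
await client.connect(transport);

// Confirm the tools documented below are exposed, then call one of them.
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name));

const result = await client.callTool({
  name: "scrape_links",
  arguments: { url: "https://example.com", only_main_content: true },
});
console.log(result);

await client.close();
```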
Tools
DeepCrawler Tools (AI-Powered API Discovery)
discover_apis
Discover hidden APIs on a website using AI agents.
Parameters:
- url (string, required): Target website URL
- depth (number, optional): Crawl depth (1-5, default: 2)
- mode (string, optional): 'direct' or 'crew' (default: 'direct')
- include_websockets (boolean, optional): Include WebSocket analysis (default: true)
- include_static_analysis (boolean, optional): Include JavaScript analysis (default: true)
- timeout (number, optional): Timeout in ms (default: 300000)
Example:
{
"url": "https://example.com",
"depth": 2,
"mode": "direct"
}
analyze_network_traffic
Analyze network traffic for API endpoints.
Parameters:
- url (string, required): Target website URL
- duration (number, required): Analysis duration in seconds (1-300)
- filter_by_type (string, optional): Filter by request type
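For illustration, a call might pass arguments like these (the filter value is an assumption; the accepted values are not listed here):

```typescript
// Illustrative arguments for analyze_network_traffic.
const networkTrafficArgs = {
  url: "https://example.com",
  duration: 60,           // capture traffic for 60 seconds (allowed range 1-300)
  filter_by_type: "xhr",  // assumed example value for the optional request-type filter
};
```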
analyze_javascript_code
Extract APIs from JavaScript code.
Parameters:
- url (string, required): Target website URL
- include_comments (boolean, optional): Include code comments
- include_strings (boolean, optional): Include string literals
solve_captcha
Detect and solve CAPTCHAs.
Parameters:
- url (string, required): Target website URL
- captcha_type (string, required): Type of CAPTCHA (recaptcha_v2, recaptcha_v3, hcaptcha, image, audio)
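For illustration, using one of the documented CAPTCHA types:

```typescript
// Illustrative arguments for solve_captcha (URL and type are example values).
const captchaArgs = {
  url: "https://example.com/login",
  captcha_type: "recaptcha_v2",
};
```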
generate_openapi_schema
Generate OpenAPI specifications from discovered APIs.
Parameters:
- endpoints (array, required): List of API endpoints
- base_url (string, required): API base URL
- title (string, optional): API title
- version (string, optional): API version
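For illustration, the endpoints array might carry objects like the following. The exact fields expected on each endpoint (path, method) are an assumption; adjust them to match what discover_apis returns.

```typescript
// Illustrative arguments for generate_openapi_schema.
const openapiArgs = {
  base_url: "https://api.example.com",
  title: "Example API",
  version: "1.0.0",
  endpoints: [
    { path: "/users", method: "GET" },      // assumed endpoint shape
    { path: "/users/{id}", method: "GET" },
    { path: "/users", method: "POST" },
  ],
};
```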
analyze_websockets
Analyze WebSocket connections.
Parameters:
- url (string, required): Target website URL
- duration (number, required): Analysis duration in seconds (1-300)
- filter_by_type (string, optional): Filter by message type
Hyperbrowser Tools (Web Automation)
scrape_links
Extract all hyperlinks from a webpage.
Parameters:
- url (string, required): Target webpage URL
- include_markdown (boolean, optional): Also return page content in markdown format
- include_tags (array, optional): CSS selectors to include
- exclude_tags (array, optional): CSS selectors to exclude
- only_main_content (boolean, optional): Extract only main content links
Example:
{
"url": "https://example.com",
"include_markdown": true,
"only_main_content": true
}
Response:
{
"links": [
{
"url": "https://example.com/page1",
"text": "Page 1",
"context": "Navigation link"
}
],
"markdown": "# Example\n\nContent here...",
"metadata": {
"total_links": 42,
"unique_links": 38,
"scraped_at": "2025-01-15T10:30:00Z"
}
}
crawl_webpages
Crawl multiple linked pages from a starting URL.
Parameters:
- url (string, required): Starting webpage URL to crawl
- followLinks (boolean, optional): Whether to follow links to other pages (default: false)
- maxPages (number, optional): Maximum number of pages to crawl, 1-100 (default: 10)
- outputFormat (array, optional): Desired output formats: markdown, html, links, screenshot
Example:
{
"url": "https://example.com",
"followLinks": true,
"maxPages": 5,
"outputFormat": ["markdown", "links"]
}
Response:
{
"pages": [
{
"url": "https://example.com",
"title": "Example",
"content": "Page content...",
"links": [
{
"url": "https://example.com/page1",
"text": "Page 1"
}
]
}
],
"metadata": {
"total_pages": 5,
"crawled_at": "2025-01-15T10:30:00Z",
"duration_ms": 2500
}
}
extract_structured_data
Extract structured data from webpages using JSON schemas.
Parameters:
- urls (array, required): List of URLs to extract data from
- schema (object, required): JSON schema defining the structure of data to extract
- prompt (string, optional): Custom prompt for extraction guidance
Example:
{
"urls": ["https://example.com/product1", "https://example.com/product2"],
"schema": {
"title": { "type": "string" },
"price": { "type": "string" },
"description": { "type": "string" }
},
"prompt": "Extract product information"
}
Response:
{
"results": [
{
"url": "https://example.com/product1",
"data": {
"title": "Product 1",
"price": "$99.99",
"description": "Great product"
},
"success": true
}
],
"metadata": {
"total_urls": 2,
"successful": 2,
"failed": 0,
"extracted_at": "2025-01-15T10:30:00Z"
}
}
browser_use_agent
Execute advanced browser automation tasks with step-by-step execution.
Parameters:
- task (string, required): Description of the browser task to execute
- url (string, optional): Starting URL for the task
- maxSteps (number, optional): Maximum number of steps to execute, 1-100 (default: 10)
- returnStepInfo (boolean, optional): Whether to return detailed step information (default: false)
Example:
{
"task": "Click the submit button and wait for confirmation",
"url": "https://example.com/form",
"maxSteps": 5,
"returnStepInfo": true
}
Response:
{
"result": "Task completed successfully",
"steps": [
{
"action": "navigate",
"result": "Navigated to page",
"timestamp": "2025-01-15T10:30:00Z"
},
{
"action": "click",
"result": "Clicked submit button",
"timestamp": "2025-01-15T10:30:01Z"
}
],
"metadata": {
"total_steps": 2,
"completed_at": "2025-01-15T10:30:02Z",
"success": true
}
}
search_with_bing
Perform web searches using Bing search engine.
Parameters:
- query (string, required): Search query string
- numResults (number, optional): Number of results to return, 1-50 (default: 10)
Example:
{
"query": "TypeScript best practices",
"numResults": 5
}
Response:
{
"results": [
{
"title": "TypeScript Best Practices",
"url": "https://example.com/typescript-best-practices",
"snippet": "Learn the best practices for writing TypeScript code..."
}
],
"metadata": {
"query": "TypeScript best practices",
"total_results": 5,
"searched_at": "2025-01-15T10:30:00Z"
}
}
AI Assistant Configuration
Augment Code
{
"mcpServers": {
"hyperbrowser": {
"command": "npx",
"args": ["deepcrawler-mcp"],
"env": {
"HYPERBROWSER_API_KEY": "${HYPERBROWSER_API_KEY}"
}
}
}
}
Claude Desktop
{
"mcpServers": {
"hyperbrowser": {
"command": "npx",
"args": ["deepcrawler-mcp"],
"env": {
"HYPERBROWSER_API_KEY": "${HYPERBROWSER_API_KEY}"
}
}
}
}
Cursor
{
"mcpServers": {
"hyperbrowser": {
"command": "npx",
"args": ["deepcrawler-mcp"],
"env": {
"HYPERBROWSER_API_KEY": "${HYPERBROWSER_API_KEY}"
}
}
}
}
Development
Build
npm run build
Test
npm test
npm run test:coverage
Lint
npm run lint
npm run format
Troubleshooting
"Invalid API Key" Error
Ensure HYPERBROWSER_API_KEY is set correctly:
echo $HYPERBROWSER_API_KEY
Rate Limiting (429 Errors)
The server automatically retries with exponential backoff. Adjust retry settings:
HYPERBROWSER_RETRY_ATTEMPTS=5
HYPERBROWSER_RETRY_DELAY=2000
Connection Timeouts
Increase the timeout value:
HYPERBROWSER_TIMEOUT=60000
Debug Logging
Enable debug logging:
LOG_LEVEL=debug
API Reference
See Hyperbrowser API Documentation
License
MIT - See LICENSE file for details
Support
- NPM Package: https://www.npmjs.com/package/deepcrawler-mcp
- GitHub: TBD - To be configured
- Issues: TBD - To be configured
- Hyperbrowser Docs: https://docs.hyperbrowser.ai
