@modular-intelligence/osint

v1.0.2

Published

2 months ago

MCP server for OSINT reconnaissance (theHarvester, GitHub, HIBP, DNS)


OSINT MCP Server

An open-source intelligence (OSINT) reconnaissance server that automates data gathering across several sources. This MCP (Model Context Protocol) server enables Claude to perform domain reconnaissance, breach searches, email domain verification, GitHub analysis, and passive DNS lookups using multiple data sources and APIs.

Overview

This server provides access to multiple OSINT data sources and techniques through a unified interface:

theHarvester - Domain reconnaissance via CLI tool (subdomains, emails, IPs)
GitHub API - Organization/user/repository reconnaissance and secret discovery
Google Dork - Query generation for advanced Google search techniques
MX Records - Email domain validation and MX record lookup
Social Media - Username lookup across 25+ social platforms
Certificate Transparency - Passive DNS via crt.sh and SecurityTrails
Have I Been Pwned - Email breach detection and data class reporting

The server is suited to security research, incident response, threat intelligence, and other OSINT investigations.

Tools

| Tool | Method | Description |
|------|--------|-------------|
| theharvester_search | theHarvester CLI | Domain reconnaissance (subdomains, emails, IPs) from 25+ sources |
| github_recon | GitHub API | Organization/user/repository analysis with secret pattern detection |
| google_dork | Query Generation | Generate Google dork queries for reconnaissance (manual execution) |
| email_verify | DNS MX Lookup | Verify email domains via MX record resolution |
| social_lookup | URL Generation | Generate social media profile URLs across 25 platforms |
| passivedns_lookup | crt.sh + SecurityTrails | Passive DNS history and subdomain enumeration via CT logs |
| breach_search | Have I Been Pwned API | Email breach detection with data class reporting |

theHarvester Domain Reconnaissance

Perform comprehensive domain OSINT reconnaissance using theHarvester CLI tool. Discovers emails, subdomains, and IP addresses from multiple data sources including DNS, certificate authorities, search engines, and threat intelligence feeds.

Input Parameters:

{
  domain: string                    // Target domain name
  sources: string[]                 // Data sources (default: crtsh, dnsdumpster). Supported values:
                                    //   anubis, baidu, bing, bufferoverun, censys, certspotter, crtsh,
                                    //   dnsdumpster, github-code, google, hunter, intelx, linkedin,
                                    //   netcraft, otx, rapiddns, securitytrail, shodan, sublist3r,
                                    //   threatcrowd, threatminer, twitter, urlscan, virustotal, yahoo
  limit: number                     // Maximum results (1-500, default: 100)
}

Example Request:

{
  "domain": "example.com",
  "sources": ["crtsh", "dnsdumpster", "bufferoverun"],
  "limit": 100
}

Example Output:

{
  "domain": "example.com",
  "sources": ["crtsh", "dnsdumpster", "bufferoverun"],
  "emails": [
    "[email protected]",
    "[email protected]",
    "[email protected]",
    "[email protected]"
  ],
  "hosts": [
    "mail.example.com",
    "www.example.com",
    "api.example.com",
    "staging.example.com",
    "cdn.example.com",
    "vpn.example.com"
  ],
  "ips": [
    "192.0.2.1",
    "192.0.2.2",
    "192.0.2.3"
  ],
  "total_emails": 4,
  "total_hosts": 6,
  "total_ips": 3,
  "raw_output": "..."
}
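
One way to hand these parameters to the CLI is to build an argument array instead of a shell string. The sketch below is an assumption about how such a wrapper might look, not the server's actual code: `HarvesterInput` and `buildHarvesterArgs` are hypothetical names, while `-d`, `-b`, and `-l` are theHarvester's domain, source, and limit flags.

```typescript
// Hypothetical input shape mirroring the parameters documented above.
interface HarvesterInput {
  domain: string;
  sources?: string[];
  limit?: number;
}

// Build an argv array for theHarvester. Passing arguments as an array
// (rather than one concatenated shell string) avoids shell interpretation.
function buildHarvesterArgs(input: HarvesterInput): string[] {
  const sources =
    input.sources && input.sources.length > 0
      ? input.sources
      : ["crtsh", "dnsdumpster"]; // documented defaults
  // Clamp the limit to the documented 1-500 range, defaulting to 100.
  const limit = Math.min(500, Math.max(1, Math.trunc(input.limit ?? 100)));
  return ["-d", input.domain, "-b", sources.join(","), "-l", String(limit)];
}
```

The resulting array could then be passed to a process-spawning API such as `Bun.spawn`, with the executable name depending on how theHarvester was installed.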

GitHub Organization/User/Repository Reconnaissance

Analyze GitHub targets for public information and potential exposed secrets. Searches for configuration files, credentials, API keys, and sensitive patterns across repositories.

Input Parameters:

{
  target: string          // GitHub username, organization name, or repository (owner/repo)
  target_type: string     // Type: "org", "user", or "repo"
  max_results: number     // Maximum results to return (1-100, default: 20)
}

Example Request:

{
  "target": "microsoft",
  "target_type": "org",
  "max_results": 20
}

Example Output:

{
  "target": "microsoft",
  "target_type": "org",
  "info": {
    "login": "microsoft",
    "name": "Microsoft",
    "description": "The home of Microsoft Open Source Software",
    "created_at": "2010-02-25T12:53:47Z",
    "updated_at": "2024-01-15T10:30:00Z",
    "public_repos": 3847,
    "followers": 234567,
    "following": 0,
    "blog": "https://opensource.microsoft.com",
    "email": null,
    "location": "Redmond, WA",
    "html_url": "https://github.com/microsoft"
  },
  "potential_secrets": [
    {
      "pattern": "filename:.env",
      "count": 12,
      "samples": [
        {
          "name": ".env",
          "path": "src/.env",
          "repository": "microsoft/repo1",
          "url": "https://github.com/microsoft/repo1/blob/main/src/.env"
        },
        {
          "name": ".env.example",
          "path": "config/.env.example",
          "repository": "microsoft/repo2",
          "url": "https://github.com/microsoft/repo2/blob/main/config/.env.example"
        }
      ]
    },
    {
      "pattern": "filename:credentials",
      "count": 5,
      "samples": [
        {
          "name": "credentials",
          "path": "docs/credentials",
          "repository": "microsoft/repo3",
          "url": "https://github.com/microsoft/repo3/blob/main/docs/credentials"
        }
      ]
    }
  ],
  "warning": "Note: GitHub API has rate limits. Unauthenticated: 60 req/hour, Authenticated: 5000 req/hour"
}
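
The `pattern` values in the output resemble GitHub code-search qualifiers. As a rough sketch, assuming the tool scopes each pattern to the target with an `org:`, `user:`, or `repo:` qualifier, a search URL for the REST endpoint `/search/code` could be built as follows. `buildCodeSearchUrl` is a hypothetical helper, not part of this package.

```typescript
type TargetType = "org" | "user" | "repo";

// Build a GitHub code-search URL scoped to the target.
// `pattern` is a search qualifier such as "filename:.env".
function buildCodeSearchUrl(
  target: string,
  targetType: TargetType,
  pattern: string,
  perPage = 20,
): string {
  const query = `${targetType}:${target} ${pattern}`;
  return `https://api.github.com/search/code?q=${encodeURIComponent(query)}&per_page=${perPage}`;
}
```

Note that GitHub's code search endpoint requires an authenticated request, so a token would be needed in practice.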

Google Dork Query Generation

Generate Google dork queries for manual reconnaissance of a target domain. Returns search query strings that can be executed manually in Google Search to find files, admin panels, login pages, configuration, databases, and error messages.

Input Parameters:

{
  domain: string              // Target domain name
  dork_type: string           // Query type: "files", "admin", "login", "config", "database", "errors", or "all"
  custom_dork: string         // Optional custom dork pattern to append
}

Example Request:

{
  "domain": "example.com",
  "dork_type": "all",
  "custom_dork": "password"
}

Example Output:

{
  "domain": "example.com",
  "dork_type": "all",
  "total_queries": 38,
  "queries": [
    {
      "category": "files",
      "query": "site:example.com filetype:pdf",
      "description": "Find PDF documents"
    },
    {
      "category": "files",
      "query": "site:example.com filetype:doc OR filetype:docx",
      "description": "Find Word documents"
    },
    {
      "category": "files",
      "query": "site:example.com filetype:sql",
      "description": "Find SQL dump files"
    },
    {
      "category": "admin",
      "query": "site:example.com inurl:admin",
      "description": "Find admin panels"
    },
    {
      "category": "admin",
      "query": "site:example.com intitle:\"admin panel\"",
      "description": "Find admin panels in title"
    },
    {
      "category": "login",
      "query": "site:example.com inurl:login",
      "description": "Find login pages"
    },
    {
      "category": "config",
      "query": "site:example.com filetype:env",
      "description": "Find .env configuration files"
    },
    {
      "category": "database",
      "query": "site:example.com inurl:phpmyadmin",
      "description": "Find phpMyAdmin installations"
    },
    {
      "category": "errors",
      "query": "site:example.com intitle:\"Error\" OR intitle:\"Warning\"",
      "description": "Find error and warning pages"
    },
    {
      "category": "custom",
      "query": "site:example.com password",
      "description": "Custom dork query"
    }
  ],
  "note": "These are Google dork query strings. Execute them manually in Google Search. Do not automate queries as it may violate Google's Terms of Service."
}
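
Query generation of this kind amounts to filling templates with the target domain. The following is a minimal sketch under that assumption. The `TEMPLATES` table holds only a handful of entries copied from the example output, and `generateDorks` is a hypothetical name.

```typescript
type DorkCategory = "files" | "admin" | "login" | "config" | "database" | "errors";

interface DorkQuery {
  category: string;
  query: string;
  description: string;
}

// Illustrative subset of templates: [search suffix, description].
const TEMPLATES: Record<DorkCategory, Array<[string, string]>> = {
  files: [["filetype:pdf", "Find PDF documents"], ["filetype:sql", "Find SQL dump files"]],
  admin: [["inurl:admin", "Find admin panels"]],
  login: [["inurl:login", "Find login pages"]],
  config: [["filetype:env", "Find .env configuration files"]],
  database: [["inurl:phpmyadmin", "Find phpMyAdmin installations"]],
  errors: [['intitle:"Error" OR intitle:"Warning"', "Find error and warning pages"]],
};

function generateDorks(
  domain: string,
  dorkType: DorkCategory | "all",
  customDork?: string,
): DorkQuery[] {
  const categories =
    dorkType === "all" ? (Object.keys(TEMPLATES) as DorkCategory[]) : [dorkType];
  const queries: DorkQuery[] = categories.flatMap((category) =>
    TEMPLATES[category].map(([suffix, description]) => ({
      category,
      query: `site:${domain} ${suffix}`,
      description,
    })),
  );
  // A custom dork is appended as its own query in the "custom" category.
  if (customDork) {
    queries.push({
      category: "custom",
      query: `site:${domain} ${customDork}`,
      description: "Custom dork query",
    });
  }
  return queries;
}
```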

Email Domain Verification

Verify email address domains by performing MX (Mail Exchange) record lookups. Determines if a domain can receive email and returns mail server information.

Input Parameters:

{
  email: string  // Email address to verify
}

Example Request:

{
  "email": "[email protected]"
}

Example Output:

{
  "email": "[email protected]",
  "domain": "example.com",
  "has_mx_records": true,
  "mx_records": [
    {
      "exchange": "mail1.example.com",
      "priority": 10
    },
    {
      "exchange": "mail2.example.com",
      "priority": 20
    }
  ],
  "is_valid_domain": true,
  "primary_mx": "mail1.example.com",
  "total_mx_records": 2,
  "note": "MX records indicate the domain can receive email, but this does not verify if the specific email address exists."
}
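
To illustrate how resolved MX records map onto the fields above, here is a small sketch. It assumes the records have already been resolved and only shapes the summary. `summarizeMx` is a hypothetical helper, and the address in the usage note is illustrative.

```typescript
interface MxRecord {
  exchange: string;
  priority: number;
}

// Summarize already-resolved MX records into a subset of the output fields.
function summarizeMx(email: string, records: MxRecord[]) {
  const at = email.lastIndexOf("@");
  if (at < 1 || at === email.length - 1) {
    throw new Error("Invalid email address");
  }
  const domain = email.slice(at + 1).toLowerCase();
  // A lower priority value marks a more preferred mail server.
  const sorted = [...records].sort((a, b) => a.priority - b.priority);
  return {
    email,
    domain,
    has_mx_records: sorted.length > 0,
    mx_records: sorted,
    primary_mx: sorted[0]?.exchange ?? null,
    total_mx_records: sorted.length,
  };
}
```

In Node.js or Bun, the records could come from `resolveMx` in `node:dns/promises`, which returns objects with `exchange` and `priority` fields, for example `summarizeMx("user@example.com", await resolveMx("example.com"))`.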

Social Media Username Lookup

Generate probable profile URLs for a username across 25+ social media and online platforms. The tool does not check whether each profile exists, so the URLs need manual verification.

Input Parameters:

{
  username: string            // Username to search
  platforms: string[]         // Optional list of specific platforms to check
}

Example Request:

{
  "username": "john_doe",
  "platforms": ["GitHub", "Twitter/X", "LinkedIn"]
}

Example Output:

{
  "username": "john_doe",
  "total_platforms": 3,
  "profiles": [
    {
      "platform": "GitHub",
      "url": "https://github.com/john_doe",
      "profile_url_pattern": "https://github.com/{username}"
    },
    {
      "platform": "Twitter/X",
      "url": "https://twitter.com/john_doe",
      "profile_url_pattern": "https://twitter.com/{username}"
    },
    {
      "platform": "LinkedIn",
      "url": "https://www.linkedin.com/in/john_doe",
      "profile_url_pattern": "https://www.linkedin.com/in/{username}"
    }
  ],
  "note": "These are probable profile URLs. They may not all exist. Manual verification or automated checking (respecting rate limits) is required to confirm existence."
}
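
The `profile_url_pattern` field suggests simple placeholder substitution. A minimal sketch under that assumption follows. `PROFILE_PATTERNS` contains only the three patterns shown in the example output, and `buildProfileUrls` is a hypothetical name.

```typescript
// Illustrative subset of URL patterns, copied from the example output above.
const PROFILE_PATTERNS: Record<string, string> = {
  "GitHub": "https://github.com/{username}",
  "Twitter/X": "https://twitter.com/{username}",
  "LinkedIn": "https://www.linkedin.com/in/{username}",
};

interface Profile {
  platform: string;
  url: string;
  profile_url_pattern: string;
}

function buildProfileUrls(username: string, platforms?: string[]): Profile[] {
  const selected = platforms ?? Object.keys(PROFILE_PATTERNS);
  const profiles: Profile[] = [];
  for (const platform of selected) {
    const pattern = PROFILE_PATTERNS[platform];
    if (pattern === undefined) continue; // skip platforms without a known pattern
    profiles.push({
      platform,
      url: pattern.replace("{username}", encodeURIComponent(username)),
      profile_url_pattern: pattern,
    });
  }
  return profiles;
}
```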

Passive DNS History Lookup

Retrieve passive DNS history and enumerate subdomains via Certificate Transparency logs (crt.sh) or the SecurityTrails API. Discovers subdomains without active scanning.

Input Parameters:

{
  domain: string  // Target domain name
  ip: string      // Optional IP address to look up
}

Example Request:

{
  "domain": "example.com"
}

Example Output:

{
  "domain": "example.com",
  "source": "crt.sh",
  "subdomains": [
    "api.example.com",
    "app.example.com",
    "cdn.example.com",
    "dev.example.com",
    "mail.example.com",
    "staging.example.com",
    "www.example.com"
  ],
  "total_subdomains": 7,
  "total_certificates": 45,
  "note": "Data from crt.sh Certificate Transparency logs. This is free but may be less comprehensive than SecurityTrails."
}
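
crt.sh can return JSON when queried with `output=json` (for example `https://crt.sh/?q=%25.example.com&output=json`), and each entry carries a `name_value` field that may hold several newline-separated names. The sketch below shows one way to reduce such entries to a sorted, de-duplicated subdomain list. `extractSubdomains` is a hypothetical helper and may differ from what the server does.

```typescript
// Subset of the fields crt.sh returns per certificate entry.
interface CrtShEntry {
  name_value: string;
}

function extractSubdomains(entries: CrtShEntry[], domain: string): string[] {
  const root = domain.toLowerCase();
  const found = new Set<string>();
  for (const entry of entries) {
    for (const raw of entry.name_value.split("\n")) {
      // Normalize case and drop a leading wildcard label ("*.").
      const name = raw.trim().toLowerCase().replace(/^\*\./, "");
      // Keep only names strictly below the target domain.
      if (name.endsWith(`.${root}`)) found.add(name);
    }
  }
  return Array.from(found).sort();
}
```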

Email Breach Detection

Search Have I Been Pwned for email addresses in known data breaches. Returns breach information including data classes exposed and verification status.

Input Parameters:

{
  email: string  // Email address to search
}

Example Request:

{
  "email": "[email protected]"
}

Example Output (Breached):

{
  "email": "[email protected]",
  "breached": true,
  "total_breaches": 2,
  "total_exposed_accounts": 1250000,
  "unique_data_classes": [
    "Email addresses",
    "Passwords",
    "Physical addresses",
    "Phone numbers"
  ],
  "breaches": [
    {
      "name": "ExampleBreak",
      "title": "Example Data Breach",
      "domain": "example.com",
      "breach_date": "2023-06-15",
      "added_date": "2023-07-01T00:00:00Z",
      "pwn_count": 500000,
      "description": "A large breach affecting customer data",
      "data_classes": ["Email addresses", "Passwords", "Physical addresses"],
      "is_verified": true,
      "is_sensitive": false,
      "is_fabricated": false,
      "is_retired": false
    },
    {
      "name": "SampleLeaks",
      "title": "Sample Database Leaks",
      "domain": "sample.com",
      "breach_date": "2023-05-20",
      "added_date": "2023-05-25T00:00:00Z",
      "pwn_count": 750000,
      "description": "Credential database exposure",
      "data_classes": ["Email addresses", "Passwords", "Phone numbers"],
      "is_verified": true,
      "is_sensitive": true,
      "is_fabricated": false,
      "is_retired": false
    }
  ],
  "message": "Warning! This email address has been found in 2 data breach(es).",
  "recommendation": "Consider changing passwords for affected services and enabling two-factor authentication."
}

Example Output (Not Breached):

{
  "email": "[email protected]",
  "breached": false,
  "total_breaches": 0,
  "breaches": [],
  "message": "Good news! This email address has not been found in any known data breaches."
}
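
The aggregate fields (`total_breaches`, `total_exposed_accounts`, `unique_data_classes`) can be derived from the breach list itself. As a sketch, assuming the input is the array returned by the HIBP v3 `breachedaccount` endpoint with `truncateResponse=false`, the summary might be computed like this. `summarizeBreaches` is a hypothetical helper.

```typescript
// Subset of the HIBP v3 breach model used for the summary.
interface HibpBreach {
  Name: string;
  PwnCount: number;
  DataClasses: string[];
}

function summarizeBreaches(email: string, breaches: HibpBreach[]) {
  const classes = new Set<string>();
  let exposed = 0;
  for (const breach of breaches) {
    exposed += breach.PwnCount;
    breach.DataClasses.forEach((dataClass) => classes.add(dataClass));
  }
  return {
    email,
    breached: breaches.length > 0,
    total_breaches: breaches.length,
    total_exposed_accounts: exposed,
    unique_data_classes: Array.from(classes).sort(),
  };
}
```

HIBP answers with HTTP 404 when an account appears in no breach, so a caller would map that status to an empty array before summarizing.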

Configuration

Environment Variables

Breach search requires a Have I Been Pwned API key. The SecurityTrails key is optional: without it, passive DNS lookups fall back to crt.sh. Set the keys as environment variables:

# Required
export HIBP_API_KEY="your-have-i-been-pwned-api-key"

# Optional (enhances passive DNS with more comprehensive data)
export SECURITYTRAILS_API_KEY="your-securitytrails-api-key"
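
A startup check along these lines could read both variables and fail early when the required one is missing. This is a sketch under assumptions: `OsintConfig` and `loadConfig` are hypothetical names, and the real server may instead report a missing key only when `breach_search` is called.

```typescript
interface OsintConfig {
  hibpApiKey: string;
  securityTrailsApiKey: string | undefined;
}

// Read API keys from an environment-like object (for example process.env).
function loadConfig(env: Record<string, string | undefined>): OsintConfig {
  const hibpApiKey = env["HIBP_API_KEY"]?.trim();
  if (!hibpApiKey) {
    throw new Error("HIBP_API_KEY is not set; breach search needs it.");
  }
  // The SecurityTrails key is optional; empty strings are treated as unset.
  const securityTrailsApiKey = env["SECURITYTRAILS_API_KEY"]?.trim() || undefined;
  return { hibpApiKey, securityTrailsApiKey };
}
```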

Getting API Keys

Have I Been Pwned (HIBP)

Register at https://haveibeenpwned.com/API/v3
Request API access and receive key via email
Free tier provides breach search access
Rate limit: 1 request per 1500ms
Documentation: https://haveibeenpwned.com/API/v3

SecurityTrails (Optional)

Sign up at https://securitytrails.com
Navigate to Account -> API Key
Free tier provides limited subdomain queries
Rate limit: 100 requests per month (free tier)
Documentation: https://docs.securitytrails.com

Rate Limits Summary

| Service | Tier | Rate Limit |
|---------|------|-----------|
| HIBP | Free | 1 req/1500ms |
| SecurityTrails | Free | 100 req/month |
| GitHub API | Unauthenticated | 60 req/hour |
| GitHub API | Authenticated | 5000 req/hour |
| crt.sh | Unlimited | No documented limit |
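
A fixed-interval limit such as the 1500 ms figure listed for HIBP can be respected by computing how long to wait before the next request. The helper below is a hypothetical sketch of that arithmetic, not the server's implementation.

```typescript
// Returns how many milliseconds to wait before the next request may be sent.
// `lastRequestAt` and `now` are timestamps in milliseconds (e.g. Date.now()).
function delayBeforeNextRequest(
  lastRequestAt: number | null,
  now: number,
  minIntervalMs = 1500,
): number {
  if (lastRequestAt === null) return 0; // no previous request, send immediately
  return Math.max(0, lastRequestAt + minIntervalMs - now);
}
```

A caller would sleep for the returned duration, send the request, and then record the new timestamp.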

Prerequisites

Required

Bun runtime (version 1.x or later) OR Node.js 18+
HIBP_API_KEY environment variable (for breach search functionality)

Optional

theHarvester - Install via pip for domain reconnaissance:
```
pip install theHarvester
```
- Required only if using theharvester_search tool
- Learn more: https://github.com/laramies/theHarvester
SecurityTrails API key - Enables enhanced passive DNS lookups (crt.sh is the free fallback)

Installation

Steps

Clone or download this repository:

git clone <repo-url>
cd osint

Install dependencies:

bun install

Build the project:

bun run build

Set environment variables:

export HIBP_API_KEY="your-api-key"
export SECURITYTRAILS_API_KEY="your-api-key"  # Optional

Run the server:

bun run start

The server then listens on the stdio transport.

Usage

Running the Server

Start the server with Bun:

bun run src/index.ts

The server implements the Model Context Protocol (MCP) and communicates via stdio transport. It can be integrated with Claude or other MCP clients.

Claude Desktop Configuration

Add the server to your Claude Desktop configuration. On macOS the file is ~/Library/Application Support/Claude/claude_desktop_config.json (on Windows, %APPDATA%\Claude\claude_desktop_config.json):

{
  "mcpServers": {
    "osint": {
      "command": "bun",
      "args": [
        "run",
        "/path/to/osint/src/index.ts"
      ],
      "env": {
        "HIBP_API_KEY": "your-api-key",
        "SECURITYTRAILS_API_KEY": "your-api-key"
      }
    }
  }
}

Claude Code MCP Settings

Configure the server in Claude Code's MCP settings (typically in .mcp.json or via settings UI):

{
  "mcpServers": {
    "osint": {
      "type": "stdio",
      "command": "bun",
      "args": ["run", "/path/to/osint/src/index.ts"],
      "env": {
        "HIBP_API_KEY": "your-api-key",
        "SECURITYTRAILS_API_KEY": "your-api-key"
      }
    }
  }
}

Example Usage in Claude

Once configured, you can use the tools directly in conversations with Claude:

Request: "Perform OSINT reconnaissance on example.com. Get emails, subdomains, and check for breaches."

Claude will call:

{
  "tool": "theharvester_search",
  "input": {
    "domain": "example.com",
    "sources": ["crtsh", "dnsdumpster"],
    "limit": 100
  }
}

Request: "Search for potential secrets in the Microsoft GitHub organization."

Claude will call:

{
  "tool": "github_recon",
  "input": {
    "target": "microsoft",
    "target_type": "org",
    "max_results": 20
  }
}

Request: "Check if [email protected] has been in any data breaches."

Claude will call:

{
  "tool": "breach_search",
  "input": {
    "email": "[email protected]"
  }
}

Request: "Generate Google dork queries to find exposed configuration files on example.com"

Claude will call:

{
  "tool": "google_dork",
  "input": {
    "domain": "example.com",
    "dork_type": "config",
    "custom_dork": "credentials"
  }
}
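
The `tool`/`input` objects above are a simplified view of each call. On the wire, MCP clients send a JSON-RPC 2.0 request with the method `tools/call`, carrying the tool name and its arguments. The sketch below shows that mapping. `toToolsCallRequest` is a hypothetical helper for illustration.

```typescript
interface ToolsCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Wrap a tool name and its input in an MCP tools/call request.
function toToolsCallRequest(
  id: number,
  tool: string,
  input: Record<string, unknown>,
): ToolsCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: tool, arguments: input },
  };
}
```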

Security

This server implements comprehensive input validation and security measures to prevent injection attacks and ensure responsible use:

Input Validation

Domain Validation

Requires valid domain name format (RFC-compliant)
Maximum length: 253 characters
Validates character set (alphanumeric, dots, hyphens)
Rejects invalid TLDs

Email Validation

Requires valid email format
Validates domain portion independently
Supports standard email formats

Username Validation

Alphanumeric with dots, hyphens, underscores only
Maximum length: 50 characters
Prevents injection via special characters

Query Validation

Maximum query length: 500 characters
Blocks shell injection characters: ;, &, |, backticks, $, (), {}
Prevents command injection via query parameters
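
The rules above can be approximated with a few small checks. The following is a sketch of the stated limits only. The function names are hypothetical, and the TLD test (two or more letters) is an assumption that is looser than a real TLD list and rejects punycode TLDs.

```typescript
// One DNS label: alphanumeric, hyphens allowed inside, at most 63 characters.
const LABEL = /^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$/i;

function isValidDomain(domain: string): boolean {
  if (domain.length === 0 || domain.length > 253) return false;
  const labels = domain.split(".");
  if (labels.length < 2) return false;
  const tld = labels[labels.length - 1] ?? "";
  // Assumption: any TLD made of 2+ letters is accepted.
  return labels.every((label) => LABEL.test(label)) && /^[a-z]{2,}$/i.test(tld);
}

function isValidUsername(username: string): boolean {
  // Alphanumeric plus dots, hyphens, and underscores; 1-50 characters.
  return /^[A-Za-z0-9._-]{1,50}$/.test(username);
}

function isSafeQuery(query: string): boolean {
  // Reject oversized queries and the shell metacharacters listed above.
  return query.length <= 500 && !/[;&|`$(){}]/.test(query);
}
```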

API Security

API keys are never logged or exposed
Secure environment variable usage
HTTPS-only API communications
Proper error handling without credential leakage
Rate limit awareness and handling

What Gets Blocked

The server rejects:

Malformed domains and non-domain strings
Invalid email formats
Usernames with special characters
Queries containing shell metacharacters
Missing or invalid API keys
Oversized inputs

Error Handling

Invalid inputs return descriptive error messages
API errors are caught and reported with status codes
Missing API keys trigger helpful configuration messages
Network timeouts are handled gracefully
theHarvester errors provide installation guidance

License

ISC License - see LICENSE file for details
