@ignidor/web-search-mcp

v1.3.0

Published

2 months ago

Local, unlimited web-search MCP server with BM25 ranking, Playwright crawling, and smart YouTube transcript extraction. DISCOVERY MODE: Get chapter outlines first, then extract specific sections. Perfect for long videos & bug fix workflows. No Docker, no

@ignidor/web-search-mcp

Local, unlimited web-search MCP server with BM25 ranking, Playwright crawling, and YouTube transcripts.

🔍 No API keys - Uses free DuckDuckGo HTML search
🚀 No rate limits - Unlimited searches, 24/7
🐳 No Docker - Direct Playwright integration (optional)
📊 Smart ranking - BM25 + hybrid scoring with freshness
📄 Full extraction - 1000+ words per page (not 200-word snippets)
🎬 YouTube transcripts - Fast, robust extraction with yt-dlp
💰 100% Free - Outperforms Brave Search, Tavily, commercial alternatives

Features

| Tool | Description |
|------|-------------|
| search | Fast web search with BM25 ranking (DuckDuckGo) |
| crawl_and_extract | Extract full content from URLs using Playwright |
| search_and_crawl | Search + extract top results (one-stop research) |
| get_youtube_transcript | Get YouTube video transcript (yt-dlp, 1-5 sec) ⭐ NEW |
| capture_screenshot | Screenshot any webpage (base64 PNG) |
| generate_pdf | Convert webpage to PDF (base64) |
| extract_structured | CSS selector-based data extraction |
| execute_js | Run custom JavaScript on webpages |
| extract_regex | Extract emails, phones, URLs, dates (21 patterns) |

Quick Start

Installation (via npx)

npx @ignidor/web-search-mcp

Claude Desktop / Cursor / Windsurf Config

For npx usage (recommended):

{
  "mcpServers": {
    "web-search": {
      "command": "npx",
      "args": ["-y", "@ignidor/web-search-mcp"]
    }
  }
}

For local/SSH usage:

{
  "mcpServers": {
    "web-search": {
      "command": "node",
      "args": ["/path/to/dist/index.js"]
    }
  }
}

Tool Examples

1. Search with BM25 Ranking

// Search for anything - unlimited queries, no API key
{
  "name": "search",
  "arguments": {
    "query": "Rust programming language tutorial",
    "limit": 10,
    "rankingMode": "hybrid"  // 'bm25' or 'hybrid'
  }
}
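The payloads in this section are the `name` and `arguments` of an MCP tool call. As a rough sketch of how a client wraps them, the snippet below builds the JSON-RPC 2.0 `tools/call` envelope that MCP uses over stdio. The helper `buildToolCall` is hypothetical and not part of this package; MCP clients such as Claude Desktop construct this envelope for you.

```typescript
// JSON-RPC 2.0 envelope an MCP client sends to invoke a tool.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper (not provided by the package): wraps a tool name and its arguments.
function buildToolCall(
  id: number,
  toolName: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// Same arguments as the search example above.
const searchRequest = buildToolCall(1, "search", {
  query: "Rust programming language tutorial",
  limit: 10,
  rankingMode: "hybrid",
});

console.log(JSON.stringify(searchRequest, null, 2));
```

The same wrapper applies to every other tool in this README; only `toolName` and `args` change.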

2. Search + Extract Full Content

// Best for deep research - gets full articles, not snippets
{
  "name": "search_and_crawl",
  "arguments": {
    "query": "AWS DynamoDB batchWrite bug fix",
    "extractTopN": 5,
    "rerankAfterExtract": true
  }
}

Result: 8,000+ words of detailed content including:

Root cause analysis
Step-by-step fixes
Complete code examples
Common pitfalls

3. Get YouTube Transcript ⭐ NEW

// Fast, reliable transcript extraction (1-5 seconds)
{
  "name": "get_youtube_transcript",
  "arguments": {
    "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "language": "en",
    "includeTimestamps": false,
    "includeMetadata": true
  }
}

Features:

Works with any video length (1 min or 10 hours - same speed!)
Fetches existing captions (no audio processing)
Multiple language support (en, es, fr, de, ja, ko, etc.)
Optional timestamps: [00:15] Text here
Metadata: title, duration, word count
Uses yt-dlp (gold standard, 85k+ GitHub stars)
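To illustrate the optional timestamp format, here is a minimal sketch that renders caption segments as `[MM:SS] Text` lines. The `Segment` shape and both function names are assumptions made for this example, and how the package formats videos longer than an hour is not documented here; this sketch simply lets the minute count grow past 59.

```typescript
// Assumed shape of one caption segment; the package's internal type may differ.
interface Segment {
  startSeconds: number;
  text: string;
}

// Zero-pads a number to at least two digits.
function pad2(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Formats an offset in seconds as [MM:SS].
function formatTimestamp(totalSeconds: number): string {
  const whole = Math.max(0, Math.floor(totalSeconds));
  const minutes = Math.floor(whole / 60);
  const seconds = whole % 60;
  return "[" + pad2(minutes) + ":" + pad2(seconds) + "]";
}

// Joins segments into a transcript, with or without timestamps.
function renderTranscript(segments: Segment[], includeTimestamps: boolean): string {
  return segments
    .map((s) => (includeTimestamps ? formatTimestamp(s.startSeconds) + " " + s.text : s.text))
    .join("\n");
}

const demoTranscript = renderTranscript(
  [
    { startSeconds: 15, text: "Text here" },
    { startSeconds: 75.4, text: "More text" },
  ],
  true
);
console.log(demoTranscript);
```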

Requirements:

Install yt-dlp: brew install yt-dlp (macOS) or pip install yt-dlp

Supported URL formats:

Full URL: https://www.youtube.com/watch?v=VIDEO_ID
Short URL: https://youtu.be/VIDEO_ID
Shorts: https://www.youtube.com/shorts/VIDEO_ID
Video ID only: VIDEO_ID
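The four accepted input forms can be normalized to a bare video ID before calling the tool. The sketch below shows one way to do that; `extractVideoId` is a hypothetical helper, the 11-character ID pattern is an assumption based on how YouTube IDs commonly look, and the package's own parsing may differ.

```typescript
// Assumption: video IDs are 11 characters drawn from letters, digits, "_" and "-".
const BARE_ID = /^[A-Za-z0-9_-]{11}$/;

// One pattern per URL form listed above.
const URL_FORMS: RegExp[] = [
  /[?&]v=([A-Za-z0-9_-]{11})/, // https://www.youtube.com/watch?v=VIDEO_ID
  /youtu\.be\/([A-Za-z0-9_-]{11})/, // https://youtu.be/VIDEO_ID
  /youtube\.com\/shorts\/([A-Za-z0-9_-]{11})/, // https://www.youtube.com/shorts/VIDEO_ID
];

// Returns the video ID, or null when the input matches none of the forms.
function extractVideoId(input: string): string | null {
  const trimmed = input.trim();
  if (BARE_ID.test(trimmed)) return trimmed;
  for (let i = 0; i < URL_FORMS.length; i++) {
    const match = URL_FORMS[i].exec(trimmed);
    if (match) return match[1];
  }
  return null;
}

console.log(extractVideoId("https://youtu.be/dQw4w9WgXcQ"));
```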

4. Extract Structured Data

// Scrape product listings, articles, etc.
{
  "name": "extract_structured",
  "arguments": {
    "url": "https://example.com/products",
    "baseSelector": ".product",
    "fields": [
      { "name": "title", "selector": "h2", "type": "text" },
      { "name": "price", "selector": ".price", "type": "text" },
      { "name": "link", "selector": "a", "type": "attribute", "attribute": "href" }
    ]
  }
}
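Because each entry in `fields` pairs a selector with an extraction type, it can help to sanity-check the list before sending it. The sketch below types the entries as they appear in the example above and flags common mistakes. `FieldSpec` and `validateFields` are hypothetical names, and only the `text` and `attribute` types shown in this README are considered.

```typescript
// Shape of one "fields" entry, inferred from the example above.
interface FieldSpec {
  name: string;
  selector: string;
  type: string; // "text" and "attribute" are the only types shown in this README
  attribute?: string; // expected when type is "attribute"
}

// Hypothetical client-side check; returns a list of problems, empty when none are found.
function validateFields(fields: FieldSpec[]): string[] {
  const problems: string[] = [];
  const seenNames: string[] = [];
  fields.forEach((field, i) => {
    if (!field.name) problems.push("fields[" + i + "]: name is empty");
    if (!field.selector) problems.push("fields[" + i + "]: selector is empty");
    if (seenNames.indexOf(field.name) !== -1) {
      problems.push("fields[" + i + "]: duplicate name '" + field.name + "'");
    }
    seenNames.push(field.name);
    if (field.type === "attribute" && !field.attribute) {
      problems.push("fields[" + i + "]: type 'attribute' needs an 'attribute' key");
    }
  });
  return problems;
}

const productFields: FieldSpec[] = [
  { name: "title", selector: "h2", type: "text" },
  { name: "price", selector: ".price", type: "text" },
  { name: "link", selector: "a", type: "attribute", attribute: "href" },
];

console.log(validateFields(productFields));
```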

5. Execute JavaScript

// Great for dynamic content, debugging
{
  "name": "execute_js",
  "arguments": {
    "url": "https://example.com",
    "scripts": [
      "return document.title",
      "return document.links.length",
      "return document.URL"
    ]
  }
}

6. Screenshot

{
  "name": "capture_screenshot",
  "arguments": {
    "url": "https://example.com",
    "waitFor": 2  // seconds
  }
}

7. Regex Extraction

// Extract emails, phones, URLs, etc.
{
  "name": "extract_regex",
  "arguments": {
    "url": "https://example.com/contact",
    "patterns": ["email", "phone_intl", "url"]
  }
}

21 built-in patterns: email, phone_intl, phone_us, url, ipv4, ipv6, uuid, currency, percentage, number, date_iso, date_us, time_24h, postal_us, postal_uk, hex_color, twitter_handle, hashtag, mac_addr, iban, credit_card. The keyword all is also accepted, presumably to apply every pattern at once.
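For a feel of what named-pattern extraction does, the sketch below applies a few regular expressions to a block of text and groups the matches by pattern name. The regexes are illustrative stand-ins written for this example; the package's built-in patterns are not shown in this README and may be stricter or looser.

```typescript
// Illustrative regexes only; not the package's actual built-in patterns.
const SAMPLE_PATTERNS: Record<string, RegExp> = {
  email: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
  ipv4: /\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b/g,
  hex_color: /#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b/g,
  date_iso: /\b\d{4}-\d{2}-\d{2}\b/g,
};

// Runs each requested pattern over the text; unknown pattern names are skipped.
function extractPatterns(text: string, patternNames: string[]): Record<string, string[]> {
  const results: Record<string, string[]> = {};
  for (let i = 0; i < patternNames.length; i++) {
    const regex = SAMPLE_PATTERNS[patternNames[i]];
    if (!regex) continue;
    results[patternNames[i]] = text.match(regex) || [];
  }
  return results;
}

const sampleText = "Mail ops@example.com from 192.168.0.1 on 2024-05-01, theme #ff8800.";
const found = extractPatterns(sampleText, ["email", "ipv4", "hex_color", "date_iso"]);
console.log(found);
```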

Playwright Setup (Optional but Recommended)

For full functionality (crawling, screenshots, PDFs, JS execution), install Playwright browsers:

npx playwright install chromium

Without Playwright: only the search tool works (DuckDuckGo results only).

With Playwright: All 11 tools work with full content extraction.

Why This Over Brave Search?

| Feature | Brave Free | This MCP |
|---------|-----------|----------|
| Cost | Free tier only | 100% Free |
| Rate Limits | 2,000 requests/month | Unlimited |
| Content Depth | ~200 words snippet | 1,000+ words |
| Ranking | Black-box | Transparent BM25 |
| Infrastructure | Cloud API | Local control |
| API Key | Required | Not needed |

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Claude Desktop / Cursor                      │
└───────────────────────────────┬─────────────────────────────────┘
                                │ MCP (stdio)
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                   @ignidor/web-search-mcp                       │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Tool Router                                              │  │
│  │  • search              → DuckDuckGo + BM25 ranking         │  │
│  │  • crawl_and_extract   → Playwright → Markdown            │  │
│  │  • search_and_crawl     → Combined (search + extract)     │  │
│  │  • capture_screenshot  → Playwright → base64 PNG          │  │
│  │  • generate_pdf        → Playwright → base64 PDF          │  │
│  │  • extract_structured  → Playwright → CSS extraction      │  │
│  │  • execute_js          → Playwright → JS results          │  │
│  │  • extract_regex       → Playwright → 21 patterns         │  │
│  └───────────────────────────┬───────────────────────────────┘  │
│                              │                                  │
│  ┌───────────────────────────▼───────────────────────────────┐  │
│  │              Ranking Engine (BM25 + Hybrid)                │  │
│  │  • fast-bm25 package for scoring                           │  │
│  │  • Freshness scoring (exponential decay)                   │  │
│  │  • Domain authority heuristics                             │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                  Playwright (optional)                         │
│  • Chromium browser for dynamic content                         │
│  • Screenshot, PDF generation                                    │
│  • JavaScript execution                                          │
└─────────────────────────────────────────────────────────────────┘
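
The ranking engine in the diagram combines BM25 relevance with an exponentially decaying freshness score. The sketch below shows the general idea with a hand-rolled BM25 rather than the fast-bm25 package the server uses. The constants `K1` and `B` are common BM25 defaults, while the 180-day half-life and the 80/20 blend are assumptions chosen for illustration; domain authority heuristics are left out.

```typescript
const K1 = 1.2; // term-frequency saturation (common BM25 default)
const B = 0.75; // document-length normalization (common BM25 default)

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Okapi BM25 score of each document for the query.
function bm25Scores(query: string, docs: string[]): number[] {
  const tokenized = docs.map(tokenize);
  const avgLen = tokenized.reduce((sum, d) => sum + d.length, 0) / tokenized.length;
  const terms = tokenize(query);
  return tokenized.map((doc) => {
    let score = 0;
    for (let i = 0; i < terms.length; i++) {
      const term = terms[i];
      const tf = doc.filter((t) => t === term).length;
      if (tf === 0) continue;
      const df = tokenized.filter((d) => d.indexOf(term) !== -1).length;
      const idf = Math.log(1 + (tokenized.length - df + 0.5) / (df + 0.5));
      score += (idf * tf * (K1 + 1)) / (tf + K1 * (1 - B + (B * doc.length) / avgLen));
    }
    return score;
  });
}

// Exponential decay: freshness halves every halfLifeDays (assumed value).
function freshness(ageDays: number, halfLifeDays: number = 180): number {
  return Math.exp((-Math.LN2 * ageDays) / halfLifeDays);
}

interface RankedDoc {
  text: string;
  ageDays: number;
}

// Assumed blend: bm25Weight of max-normalized BM25, the remainder from freshness.
function hybridScores(query: string, docs: RankedDoc[], bm25Weight: number = 0.8): number[] {
  const raw = bm25Scores(query, docs.map((d) => d.text));
  const max = Math.max.apply(null, raw) || 1; // avoid dividing by zero when nothing matches
  return raw.map((s, i) => bm25Weight * (s / max) + (1 - bm25Weight) * freshness(docs[i].ageDays));
}

const hybridDemo = hybridScores("rust tutorial", [
  { text: "rust tutorial for beginners", ageDays: 0 },
  { text: "rust tutorial for beginners", ageDays: 365 },
  { text: "cooking pasta at home", ageDays: 0 },
]);
console.log(hybridDemo);
```

With identical text, the newer page ranks above the year-old one, and a fresh but irrelevant page still ranks last because freshness carries only a small share of the score.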

Development

# Clone repo
git clone https://github.com/JayaBigDataIsCool/ignidor-web-search-mcp.git
cd ignidor-web-search-mcp

# Install dependencies
npm install

# Install Playwright (optional but recommended)
npx playwright install chromium

# Build
npm run build

# Run locally
npm start
