@ignidor/web-search-mcp
v1.3.0
Local, unlimited web-search MCP server with BM25 ranking, Playwright crawling, and smart YouTube transcript extraction. DISCOVERY MODE: Get chapter outlines first, then extract specific sections. Perfect for long videos & bug fix workflows. No Docker, no
Local, unlimited web-search MCP server with BM25 ranking, Playwright crawling, and YouTube transcripts.
- 🔍 No API keys - Uses free DuckDuckGo HTML search
- 🚀 No rate limits - Unlimited searches, 24/7
- 🐳 No Docker - Direct Playwright integration (optional)
- 📊 Smart ranking - BM25 + hybrid scoring with freshness
- 📄 Full extraction - 1000+ words per page (not 200-word snippets)
- 🎬 YouTube transcripts - Fast, robust extraction with yt-dlp
- 💰 100% Free - Outperforms Brave Search, Tavily, and other commercial alternatives
Features
| Tool | Description |
|------|-------------|
| search | Fast web search with BM25 ranking (DuckDuckGo) |
| crawl_and_extract | Extract full content from URLs using Playwright |
| search_and_crawl | Search + extract top results (one-stop research) |
| get_youtube_transcript | Get YouTube video transcript (yt-dlp, 1-5sec) ⭐ NEW |
| capture_screenshot | Screenshot any webpage (base64 PNG) |
| generate_pdf | Convert webpage to PDF (base64) |
| extract_structured | CSS selector-based data extraction |
| execute_js | Run custom JavaScript on webpages |
| extract_regex | Extract emails, phones, URLs, dates (21 patterns) |
Quick Start
Installation (via npx)
npx @ignidor/web-search-mcp
Claude Desktop / Cursor / Windsurf Config
For npx usage (recommended):
{
"mcpServers": {
"web-search": {
"command": "npx",
"args": ["-y", "@ignidor/web-search-mcp"]
}
}
}
For local/SSH usage:
{
"mcpServers": {
"web-search": {
"command": "node",
"args": ["/path/to/dist/index.js"]
}
}
}
Tool Examples
1. Search with BM25 Ranking
// Search for anything - unlimited queries, no API key
{
"name": "search",
"arguments": {
"query": "Rust programming language tutorial",
"limit": 10,
"rankingMode": "hybrid" // 'bm25' or 'hybrid'
}
}
2. Search + Extract Full Content
// Best for deep research - gets full articles, not snippets
{
"name": "search_and_crawl",
"arguments": {
"query": "AWS DynamoDB batchWrite bug fix",
"extractTopN": 5,
"rerankAfterExtract": true
}
}
Result: 8,000+ words of detailed content including:
- Root cause analysis
- Step-by-step fixes
- Complete code examples
- Common pitfalls
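The tool examples in this section show only the name and arguments payload. Over the stdio transport, an MCP client wraps that payload in a JSON-RPC 2.0 `tools/call` request and sends it as a single line of JSON. A minimal sketch of that envelope follows; `buildToolCall` and `ToolCallRequest` are hypothetical names used for illustration, not part of this package.

```typescript
// Shape of a JSON-RPC 2.0 request that invokes an MCP tool.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wrap a tool name and its arguments in the envelope.
function buildToolCall(
  id: number,
  toolName: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// The search_and_crawl example, serialized as one message.
const toolRequest = buildToolCall(1, "search_and_crawl", {
  query: "AWS DynamoDB batchWrite bug fix",
  extractTopN: 5,
  rerankAfterExtract: true,
});
const wireMessage: string = JSON.stringify(toolRequest);
```

Clients such as Claude Desktop or Cursor build this message themselves; the sketch is only useful when driving the server from a custom script.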
3. Get YouTube Transcript ⭐ NEW
// Fast, reliable transcript extraction (1-5 seconds)
{
"name": "get_youtube_transcript",
"arguments": {
"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
"language": "en",
"includeTimestamps": false,
"includeMetadata": true
}
}
Features:
- Works with any video length (1 min or 10 hours - same speed!)
- Fetches existing captions (no audio processing)
- Multiple language support (en, es, fr, de, ja, ko, etc.)
- Optional timestamps: [00:15] Text here
- Metadata: title, duration, word count
- Uses yt-dlp (gold standard, 85k+ GitHub stars)
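To illustrate the optional timestamp format from the list above, here is a small sketch that renders caption segments as `[MM:SS] text` lines. The `TranscriptSegment` shape and both function names are assumptions made for this example; the server's internal types, and its format for videos longer than an hour, may differ.

```typescript
// Assumed segment shape; the server's real internal type is not documented here.
interface TranscriptSegment {
  startSeconds: number;
  text: string;
}

// Left-pad a number to two digits.
function pad2(value: number): string {
  return (value < 10 ? "0" : "") + String(value);
}

// Format seconds as MM:SS. Minutes keep growing past 59 in this sketch.
function formatTimestamp(totalSeconds: number): string {
  const whole = Math.max(0, Math.floor(totalSeconds));
  return pad2(Math.floor(whole / 60)) + ":" + pad2(whole % 60);
}

// With timestamps: one "[MM:SS] text" line per segment.
// Without timestamps: plain text joined with spaces.
function renderTranscript(
  segments: TranscriptSegment[],
  includeTimestamps: boolean
): string {
  return segments
    .map((s) =>
      includeTimestamps ? "[" + formatTimestamp(s.startSeconds) + "] " + s.text : s.text
    )
    .join(includeTimestamps ? "\n" : " ");
}
```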
Requirements:
- Install yt-dlp: brew install yt-dlp (macOS) or pip install yt-dlp
Supported URL formats:
- Full URL: https://www.youtube.com/watch?v=VIDEO_ID
- Short URL: https://youtu.be/VIDEO_ID
- Shorts: https://www.youtube.com/shorts/VIDEO_ID
- Video ID only: VIDEO_ID
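The four input formats above can all be reduced to a bare video ID before fetching captions. The sketch below shows one way to do that with regular expressions. It assumes video IDs are 11 characters drawn from letters, digits, `_` and `-`, which matches current YouTube URLs but is not a documented guarantee; `extractVideoId` is a hypothetical helper, not the package's own parser.

```typescript
// Assumption: video IDs are exactly 11 characters from [A-Za-z0-9_-].
const VIDEO_ID_PATTERN = /^[A-Za-z0-9_-]{11}$/;

// One pattern per supported URL format; each captures the ID in group 1.
const URL_PATTERNS: RegExp[] = [
  /^https?:\/\/(?:www\.|m\.)?youtube\.com\/watch\?(?:.*&)?v=([A-Za-z0-9_-]{11})(?:[&#]|$)/,
  /^https?:\/\/youtu\.be\/([A-Za-z0-9_-]{11})(?:[?#\/]|$)/,
  /^https?:\/\/(?:www\.|m\.)?youtube\.com\/shorts\/([A-Za-z0-9_-]{11})(?:[?#\/]|$)/,
];

// Returns the video ID, or null when the input matches no supported format.
function extractVideoId(input: string): string | null {
  const trimmed = input.trim();
  if (VIDEO_ID_PATTERN.test(trimmed)) return trimmed; // bare video ID

  for (const pattern of URL_PATTERNS) {
    const match = pattern.exec(trimmed);
    if (match) return match[1] ?? null;
  }
  return null;
}
```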
4. Extract Structured Data
// Scrape product listings, articles, etc.
{
"name": "extract_structured",
"arguments": {
"url": "https://example.com/products",
"baseSelector": ".product",
"fields": [
{ "name": "title", "selector": "h2", "type": "text" },
{ "name": "price", "selector": ".price", "type": "text" },
{ "name": "link", "selector": "a", "type": "attribute", "attribute": "href" }
]
}
}
5. Execute JavaScript
// Great for dynamic content, debugging
{
"name": "execute_js",
"arguments": {
"url": "https://example.com",
"scripts": [
"return document.title",
"return document.links.length",
"return document.URL"
]
}
}
6. Screenshot
{
"name": "capture_screenshot",
"arguments": {
"url": "https://example.com",
"waitFor": 2 // seconds
}
}
7. Regex Extraction
// Extract emails, phones, URLs, etc.
{
"name": "extract_regex",
"arguments": {
"url": "https://example.com/contact",
"patterns": ["email", "phone_intl", "url"]
}
}
21 built-in patterns: email, phone_intl, phone_us, url, ipv4, ipv6, uuid, currency, percentage, number, date_iso, date_us, time_24h, postal_us, postal_uk, hex_color, twitter_handle, hashtag, mac_addr, iban, credit_card, plus all
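As a rough illustration of how named patterns can map to regular expressions, the sketch below implements five of the names from the list. The regexes are simplified stand-ins written for this example and are not the ones shipped in the package. Treating `all` as "run every pattern" is also an assumption.

```typescript
// Hypothetical regexes for a few of the named patterns; the package's own may differ.
const PATTERNS: Record<string, RegExp> = {
  email: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
  ipv4: /\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b/g,
  uuid: /\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b/g,
  date_iso: /\b\d{4}-\d{2}-\d{2}\b/g,
  hex_color: /#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b/g,
};

// Run the selected patterns over the text and return unique matches per pattern.
function extractPatterns(text: string, names: string[]): Record<string, string[]> {
  // Assumption: "all" selects every known pattern.
  const selected = names.indexOf("all") >= 0 ? Object.keys(PATTERNS) : names;
  const result: Record<string, string[]> = {};
  for (const patternName of selected) {
    const regex = PATTERNS[patternName];
    if (!regex) continue; // unknown names are skipped in this sketch
    const matches: string[] = text.match(regex) ?? [];
    // De-duplicate while keeping first-seen order.
    result[patternName] = matches.filter((m, i) => matches.indexOf(m) === i);
  }
  return result;
}
```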
Playwright Setup (Optional but Recommended)
For full functionality (crawling, screenshots, PDFs, JS execution), install Playwright browsers:
npx playwright install chromium
Without Playwright: Only the search tool works (DuckDuckGo results only).
With Playwright: All tools work with full content extraction.
Why This Over Brave Search?
| Feature | Brave Free | This MCP |
|---------|-----------|----------|
| Cost | Free tier only | 100% Free |
| Rate Limits | 2,000 requests/month | Unlimited |
| Content Depth | ~200 words snippet | 1,000+ words |
| Ranking | Black-box | Transparent BM25 |
| Infrastructure | Cloud API | Local control |
| API Key | Required | Not needed |
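To make "transparent BM25" concrete, here is a self-contained sketch of Okapi BM25 blended with an exponential-decay freshness term, in the spirit of the hybrid ranking mode. It is not the package's code: the server uses the fast-bm25 package, while this example writes the formula out by hand. The constants `K1`, `B`, `HALF_LIFE_DAYS` and the 0.8/0.2 blend are assumptions, and domain authority heuristics are left out.

```typescript
// Assumed constants; the package's actual values are not documented here.
const K1 = 1.2;
const B = 0.75;
const HALF_LIFE_DAYS = 30;

interface Doc {
  text: string;
  ageDays: number; // days since publication
}

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Okapi BM25 score of every document for the query.
function bm25Scores(query: string, docs: Doc[]): number[] {
  const tokenized = docs.map((d) => tokenize(d.text));
  const avgLen =
    tokenized.reduce((sum, t) => sum + t.length, 0) / Math.max(1, docs.length);
  const queryTerms = tokenize(query);

  return tokenized.map((terms) => {
    let score = 0;
    for (const q of queryTerms) {
      const tf = terms.filter((t) => t === q).length;
      if (tf === 0) continue;
      const df = tokenized.filter((t) => t.indexOf(q) >= 0).length;
      // Smoothed IDF, always non-negative.
      const idf = Math.log(1 + (docs.length - df + 0.5) / (df + 0.5));
      const norm = tf + K1 * (1 - B + (B * terms.length) / avgLen);
      score += (idf * tf * (K1 + 1)) / norm;
    }
    return score;
  });
}

// Exponential decay: 1 for a brand-new page, 0.5 after one half-life.
function freshness(ageDays: number): number {
  return Math.pow(0.5, Math.max(0, ageDays) / HALF_LIFE_DAYS);
}

// Hybrid score: BM25 normalized to [0, 1], blended with freshness.
function hybridScores(query: string, docs: Doc[]): number[] {
  const bm25 = bm25Scores(query, docs);
  const max = bm25.reduce((m, s) => Math.max(m, s), 0);
  return docs.map((d, i) => {
    const relevance = max > 0 ? (bm25[i] ?? 0) / max : 0;
    return 0.8 * relevance + 0.2 * freshness(d.ageDays);
  });
}
```

Because every term in the score is visible, two equally relevant pages can be told apart by age alone, which is the behavior the freshness component is meant to provide.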
Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Claude Desktop / Cursor │
└───────────────────────────────┬─────────────────────────────────┘
│ MCP (stdio)
▼
┌─────────────────────────────────────────────────────────────────┐
│ @ignidor/web-search-mcp │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Tool Router │ │
│ │ • search → DuckDuckGo + BM25 ranking │ │
│ │ • crawl_and_extract → Playwright → Markdown │ │
│ │ • search_and_crawl → Combined (search + extract) │ │
│ │ • capture_screenshot → Playwright → base64 PNG │ │
│ │ • generate_pdf → Playwright → base64 PDF │ │
│ │ • extract_structured → Playwright → CSS extraction │ │
│ │ • execute_js → Playwright → JS results │ │
│ │ • extract_regex → Playwright → 21 patterns │ │
│ └───────────────────────────┬───────────────────────────────┘ │
│ │ │
│ ┌───────────────────────────▼───────────────────────────────┐ │
│ │ Ranking Engine (BM25 + Hybrid) │ │
│ │ • fast-bm25 package for scoring │ │
│ │ • Freshness scoring (exponential decay) │ │
│ │ • Domain authority heuristics │ │
│ └───────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Playwright (optional) │
│ • Chromium browser for dynamic content │
│ • Screenshot, PDF generation │
│ • JavaScript execution │
└─────────────────────────────────────────────────────────────────┘
Development
# Clone repo
git clone https://github.com/JayaBigDataIsCool/ignidor-web-search-mcp.git
cd ignidor-web-search-mcp
# Install dependencies
npm install
# Install Playwright (optional but recommended)
npx playwright install chromium
# Build
npm run build
# Run locally
npm start
License
MIT © Ignidor Team
