miyami-websearch-mcp

v1.6.0, published 3 months ago

MCP server providing web search and content extraction for LLMs


mcp model-context-protocol llm ai search web-search claude anthropic searxng internet-access web-scraping content-extraction

Miyami WebSearch MCP

Connect your LLM to the internet! Search the web and extract content from any webpage using the Model Context Protocol.

🌟 Features

🔍 Web Search - Search across Google, DuckDuckGo, Bing, Brave, Wikipedia
🧠 Deep Research - Multi-query parallel research with compiled reports
🌐 Site Crawl - Depth-limited crawling with Trafilatura extraction
🎬 YouTube Transcripts - Fetch captions/subtitles from any YouTube video that has them - NEW!
🛡️ FREE Stealth Mode - Anti-bot bypass (Cloudflare, DataDome, etc.)
⏰ Time-Range Filters - Filter results by recency (day, week, month, year)
📄 Enhanced Content Extraction - Trafilatura-powered (Firecrawl-quality) extraction
📝 Markdown Output - Get structured markdown from webpages
🎯 Rich Metadata - Automatically extract authors, dates, site names
⚡ Fast & Easy - One-line installation, zero configuration
🤖 LLM Optimized - Formatted responses perfect for AI consumption
🆓 100% Free - No API keys, no signup, no configuration needed
🔒 Privacy-First - No tracking, no data collection

📦 Installation

Option 1: Use with npx (Recommended - No Installation)

Add the following to your Claude Desktop config file (macOS: ~/Library/Application Support/Claude/claude_desktop_config.json; Windows: %APPDATA%\Claude\claude_desktop_config.json):

{
  "mcpServers": {
    "miyami-websearch": {
      "command": "npx",
      "args": ["-y", "miyami-websearch-mcp"]
    }
  }
}

Option 2: Global Installation

npm install -g miyami-websearch-mcp

Then configure Claude Desktop:

{
  "mcpServers": {
    "miyami-websearch": {
      "command": "miyami-websearch-mcp"
    }
  }
}

That's it! Restart Claude Desktop and you're ready to search the web! 🎉
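If claude_desktop_config.json already lists other MCP servers, the miyami-websearch entry goes alongside them inside the same mcpServers object rather than replacing the file. The TypeScript sketch below expresses that merge as a pure function; `addMiyamiServer` and the `ClaudeConfig` type are hypothetical names, and the sketch only knows the two entry shapes shown in Option 1 and Option 2.

```typescript
// Shape of the part of claude_desktop_config.json this sketch touches.
interface ServerEntry {
  command: string;
  args?: string[];
}

interface ClaudeConfig {
  mcpServers?: Record<string, ServerEntry>;
  [key: string]: unknown; // any other top-level settings are passed through
}

// Hypothetical helper: returns a new config with the miyami-websearch entry
// added, keeping every server and top-level key that was already present.
function addMiyamiServer(config: ClaudeConfig, useGlobalInstall = false): ClaudeConfig {
  const entry: ServerEntry = useGlobalInstall
    ? { command: "miyami-websearch-mcp" } // Option 2: global install
    : { command: "npx", args: ["-y", "miyami-websearch-mcp"] }; // Option 1: npx
  return {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), "miyami-websearch": entry },
  };
}

// Example: an existing config that already registers another server.
const existing: ClaudeConfig = JSON.parse('{"mcpServers": {"other": {"command": "other-server"}}}');
const updated = addMiyamiServer(existing);
console.log(JSON.stringify(updated, null, 2));
```

To apply it, read and parse the file, pass the object in, and write `JSON.stringify(updated, null, 2)` back; keeping a copy of the original file first makes it easy to undo.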

🚀 Quick Start

After adding the server to your Claude Desktop config and restarting, try these prompts:

Search for the latest news about AI

Search for Python tutorials and summarize the top result

Fetch the content from https://example.com and summarize it

🛠️ Available Tools

1. `web_search`

Search the web using multiple search engines with optional time-range filtering.

Parameters:

query (required) - Your search query
categories (optional) - general, news, images, videos, science
language (optional) - Language code (default: en)
page (optional) - Page number (default: 1)
time_range (optional) - NEW! Filter by recency: day, week, month, year

Examples:

Search for "quantum computing breakthroughs" in the news category

Search for AI news from the past 24 hours with time_range: day

Find recent Python tutorials from the past week with time_range: week
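Claude Desktop turns prompts like these into MCP `tools/call` requests for you. If you drive the server from your own MCP client instead, a web_search call is a JSON-RPC 2.0 request whose params carry the tool name and its arguments. The TypeScript sketch below builds one; `buildWebSearchCall` is a hypothetical helper, the allowed values are copied from the parameter list above, and the server may apply stricter or different validation of its own.

```typescript
// Values documented for web_search in this README.
const TIME_RANGES = ["day", "week", "month", "year"] as const;
const CATEGORIES = ["general", "news", "images", "videos", "science"] as const;

type TimeRange = (typeof TIME_RANGES)[number];
type Category = (typeof CATEGORIES)[number];

interface WebSearchArgs {
  query: string;
  categories?: Category;
  language?: string;
  page?: number;
  time_range?: TimeRange;
}

// Builds an MCP "tools/call" JSON-RPC request for web_search.
// Optional fields are simply omitted when not set.
function buildWebSearchCall(id: number, args: WebSearchArgs) {
  if (args.query.trim() === "") throw new Error("query is required");
  if (args.page !== undefined && (!Number.isInteger(args.page) || args.page < 1)) {
    throw new Error("page must be a positive integer");
  }
  // Runtime checks for callers that bypass the static types (e.g. plain JS).
  if (args.time_range !== undefined && !(TIME_RANGES as readonly string[]).includes(args.time_range)) {
    throw new Error(`unsupported time_range: ${args.time_range}`);
  }
  if (args.categories !== undefined && !(CATEGORIES as readonly string[]).includes(args.categories)) {
    throw new Error(`unsupported category: ${args.categories}`);
  }
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: "web_search", arguments: args },
  };
}

const exampleCall = buildWebSearchCall(1, { query: "AI news", categories: "news", time_range: "day" });
console.log(JSON.stringify(exampleCall));
```

Leaving optional fields out, rather than sending them as null, lets the server fall back to the defaults listed above (language en, page 1).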

2. `fetch_webpage`

Extract clean content from any webpage using Trafilatura (Firecrawl-quality extraction).

Parameters:

url (required) - The webpage URL
include_links (optional) - Include links (default: true)
include_images (optional) - Include images (default: true)
max_content_length (optional) - Max length in characters (default: 50000)
format (optional) - Output format: text, markdown (default), html
extraction_mode (optional) - Engine: trafilatura (default, best quality), readability (faster)
stealth_mode (optional) - NEW! Anti-bot bypass: off, low, medium, high (default: off)
auto_bypass (optional) - NEW! Auto-escalate stealth if bot protection detected (default: false)

Enhanced Features:

📝 Markdown output - Get structured markdown like Firecrawl
🎯 Rich metadata - Authors, dates, site names automatically extracted
📊 Extraction stats - Word count, content length, format info
🛡️ Stealth mode - Bypass Cloudflare, DataDome, Akamai, etc.

Example:

Fetch and summarize https://en.wikipedia.org/wiki/Artificial_intelligence in markdown format
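The defaults listed above can be made explicit before a call, which makes it obvious what the server will actually do. A TypeScript sketch, assuming you build fetch_webpage arguments yourself (for example from your own MCP client); `withFetchDefaults` is a hypothetical helper, the default values are copied from the parameter list, and the http(s)-only check is an assumption of this sketch rather than a documented server rule.

```typescript
type Format = "text" | "markdown" | "html";
type ExtractionMode = "trafilatura" | "readability";
type StealthMode = "off" | "low" | "medium" | "high";

interface FetchWebpageArgs {
  url: string;
  include_links?: boolean;
  include_images?: boolean;
  max_content_length?: number;
  format?: Format;
  extraction_mode?: ExtractionMode;
  stealth_mode?: StealthMode;
  auto_bypass?: boolean;
}

// Fills in the defaults documented for fetch_webpage so the effective
// request is explicit. Also rejects URLs that are not http(s) (sketch-only rule).
function withFetchDefaults(args: FetchWebpageArgs): Required<FetchWebpageArgs> {
  const parsed = new URL(args.url); // throws on malformed URLs
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`unsupported protocol: ${parsed.protocol}`);
  }
  return {
    url: args.url,
    include_links: args.include_links ?? true,
    include_images: args.include_images ?? true,
    max_content_length: args.max_content_length ?? 50000,
    format: args.format ?? "markdown",
    extraction_mode: args.extraction_mode ?? "trafilatura",
    stealth_mode: args.stealth_mode ?? "off",
    auto_bypass: args.auto_bypass ?? false,
  };
}

// Example: let the server escalate stealth only if it detects bot protection.
console.log(withFetchDefaults({ url: "https://example.com/article", auto_bypass: true }));
```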

3. `search_and_fetch` ⭐ RECOMMENDED

Search and automatically fetch full content from top results with Trafilatura-quality extraction.

Parameters:

query (required) - Your search query
num_results (optional) - How many results to fetch (1-5, default: 3)
categories (optional) - Search categories
time_range (optional) - Filter by recency: day, week, month, year
format (optional) - Output format: text, markdown (default), html
stealth_mode (optional) - NEW! Anti-bot bypass: off, low, medium, high (default: off)
auto_bypass (optional) - NEW! Auto-escalate stealth if bot protection detected (default: false)

What it does:

✅ Searches for your query (with optional time filter)
✅ Gets top N results
✅ Automatically fetches full content (parallel)
✅ Uses Trafilatura for Firecrawl-quality extraction
✅ Returns both search snippets AND full webpage content
✅ FREE stealth mode for protected sites

Examples:

Research "climate change solutions" and give me detailed info from the top 3 sources

Get recent AI breakthroughs from the past 24 hours with full articles (time_range: day, num_results: 5)

Research recent web development tutorials from the past week (time_range: week, format: markdown)
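The steps listed under "What it does" amount to one search followed by concurrent page fetches. The TypeScript sketch below models that flow locally; `SearchFn` and `FetchFn` are hypothetical stand-ins for the backend's search and extraction, not functions this package exports, and keeping the snippet when a single fetch fails is a design choice of the sketch rather than documented server behavior.

```typescript
interface SearchHit {
  title: string;
  url: string;
  snippet: string;
}

interface FetchedResult extends SearchHit {
  content: string | null; // null when the page could not be fetched
}

// Hypothetical stand-ins for the backend's search and extraction calls.
type SearchFn = (query: string) => Promise<SearchHit[]>;
type FetchFn = (url: string) => Promise<string>;

// Search once, keep the top N hits (clamped to the documented 1-5 range),
// then fetch all pages concurrently.
async function searchAndFetch(
  query: string,
  numResults: number,
  search: SearchFn,
  fetchPage: FetchFn,
): Promise<FetchedResult[]> {
  const n = Math.min(5, Math.max(1, Math.floor(numResults)));
  const hits = (await search(query)).slice(0, n);
  // A failed fetch resolves to null instead of rejecting the whole batch.
  const contents = await Promise.all(hits.map((hit) => fetchPage(hit.url).catch((): null => null)));
  return hits.map((hit, i) => ({ ...hit, content: contents[i] }));
}

// Example with stub functions in place of real network calls.
const demoSearch: SearchFn = async (q) => [
  { title: "Result 1", url: "https://example.com/1", snippet: `first hit for ${q}` },
  { title: "Result 2", url: "https://example.com/2", snippet: `second hit for ${q}` },
];
const demoFetch: FetchFn = async (url) => `full text of ${url}`;
searchAndFetch("climate change solutions", 3, demoSearch, demoFetch).then((results) =>
  console.log(results.map((r) => `${r.title}: ${r.content}`)),
);
```

Running the fetches concurrently means the total wait is roughly that of the slowest page rather than the sum of all of them.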

4. `deep_research` 🧠

Perform comprehensive parallel research across multiple topics at once with AI-powered reranking and compiled markdown reports.

Parameters:

queries (required) - Comma-separated list of research queries (max 10)
breadth (optional) - Results to fetch per query (1-5, default: 3)
time_range (optional) - Filter by recency: day, week, month, year
max_content_length (optional) - Max content per result (default: 30000)
stealth_mode (optional) - Anti-bot bypass: off, low, medium, high (default: off)
auto_bypass (optional) - Auto-escalate stealth if bot protection detected (default: false)

What it does:

✅ Process up to 10 queries in parallel for speed
✅ AI reranking for better relevance (always enabled)
✅ Auto-generates compiled markdown report
✅ Rich metadata extraction (author, date, source)
✅ Server-side caching (30 minutes)
✅ Aggregated statistics across all queries
✅ FREE stealth mode for protected sites

Examples:

Research "AI trends 2024,machine learning basics,ChatGPT use cases" with deep_research

Deep research on "React vs Vue,Next.js features,frontend trends" from the past month (time_range: month)

Comprehensive research: "climate solutions,renewable energy,carbon capture" with breadth: 5
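Because queries is a single comma-separated string, a comma inside one question would split it into two queries. The TypeScript sketch below builds deep_research arguments from a list of questions, trims and de-duplicates them, and enforces the documented limits (at most 10 queries, breadth 1-5); `buildDeepResearchArgs` is a hypothetical helper, and rejecting queries that contain commas is this sketch's own precaution.

```typescript
interface DeepResearchArgs {
  queries: string; // comma-separated, as the tool expects
  breadth?: number;
  time_range?: "day" | "week" | "month" | "year";
}

// Turns a list of research questions into deep_research arguments.
function buildDeepResearchArgs(
  queries: string[],
  breadth = 3,
  timeRange?: DeepResearchArgs["time_range"],
): DeepResearchArgs {
  const cleaned = queries.map((q) => q.trim()).filter((q) => q.length > 0);
  // A comma would silently split one question into two queries.
  const withComma = cleaned.find((q) => q.includes(","));
  if (withComma !== undefined) throw new Error(`query contains a comma: "${withComma}"`);
  const unique = Array.from(new Set(cleaned)); // drop exact duplicates, keep order
  if (unique.length === 0) throw new Error("at least one query is required");
  if (unique.length > 10) throw new Error(`at most 10 queries allowed, got ${unique.length}`);
  if (!Number.isInteger(breadth) || breadth < 1 || breadth > 5) {
    throw new Error("breadth must be an integer from 1 to 5");
  }
  const args: DeepResearchArgs = { queries: unique.join(","), breadth };
  if (timeRange !== undefined) args.time_range = timeRange;
  return args;
}

console.log(buildDeepResearchArgs(["React vs Vue", "Next.js features", "frontend trends"], 3, "month"));
```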

5. `crawl_site` 🌐 NEW!

Depth-limited site crawler powered by Scrapy + Trafilatura. Returns structured pages with content, metadata, links, and word counts. Supports FREE stealth mode.

Parameters:

start_url (required) - Starting URL to crawl
max_pages (optional) - Max pages to crawl (1-200, default: 50)
max_depth (optional) - Link depth (0-5, default: 2)
format (optional) - Output format: text, markdown (default), html
include_links (optional) - Include extracted links (default: true)
include_images (optional) - Include image URLs (default: true)
url_patterns (optional) - Comma-separated regex patterns for URLs to include (e.g. /blog/,/docs/)
exclude_patterns (optional) - Comma-separated regex patterns for URLs to exclude
stealth_mode (optional) - Anti-bot bypass: off, low, medium, high (default: off)
obey_robots (optional) - Respect robots.txt (default: true; set false to bypass)

What it does:

✅ Depth-limited recursive crawling (Scrapy subprocess)
✅ Trafilatura extraction with metadata + word counts
✅ Include/exclude URL filtering
✅ FREE stealth mode, with optional auto-escalation on the server
✅ 15-minute crawl timeout and 30-minute cache

Examples:

Crawl docs site: start_url=https://docs.example.com max_depth=3 url_patterns=/api/,/guides/

Ignore robots.txt on a small crawl: start_url=https://site.com max_pages=5 obey_robots=false stealth_mode=high

Filter sections: start_url=https://blog.example.com url_patterns=/2024/,/tech/ exclude_patterns=/archive/
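How max_depth, max_pages, url_patterns and exclude_patterns interact can be illustrated with a breadth-first traversal over an in-memory link graph. This TypeScript sketch is not the server's Scrapy-based crawler; it assumes (the README does not say) that the start URL is always fetched, that a link is followed when it matches any include pattern (or no include patterns are given) and no exclude pattern, and that depth counts link hops from start_url. `crawl`, `LinkGraph` and `CrawlOptions` are hypothetical names.

```typescript
// In-memory stand-in for the web: page URL -> links found on that page.
type LinkGraph = Record<string, string[]>;

interface CrawlOptions {
  maxPages: number;
  maxDepth: number;
  urlPatterns?: string; // comma-separated regexes, e.g. "/api/,/guides/"
  excludePatterns?: string;
}

const toRegexes = (csv?: string): RegExp[] =>
  (csv ?? "")
    .split(",")
    .map((p) => p.trim())
    .filter((p) => p !== "")
    .map((p) => new RegExp(p));

// Breadth-first, depth-limited crawl. Returns URLs in visit order.
function crawl(graph: LinkGraph, startUrl: string, opts: CrawlOptions): string[] {
  const include = toRegexes(opts.urlPatterns);
  const exclude = toRegexes(opts.excludePatterns);
  const allowed = (url: string): boolean =>
    (include.length === 0 || include.some((r) => r.test(url))) && !exclude.some((r) => r.test(url));

  const visited: string[] = [];
  const seen = new Set<string>([startUrl]);
  const queue: Array<[string, number]> = [[startUrl, 0]];
  while (queue.length > 0 && visited.length < opts.maxPages) {
    const [url, depth] = queue.shift()!;
    visited.push(url);
    if (depth >= opts.maxDepth) continue; // do not follow links past max_depth
    for (const link of graph[url] ?? []) {
      if (!seen.has(link) && allowed(link)) {
        seen.add(link);
        queue.push([link, depth + 1]);
      }
    }
  }
  return visited;
}

const siteLinks: LinkGraph = {
  "https://docs.example.com/": ["https://docs.example.com/api/a", "https://docs.example.com/blog/x"],
  "https://docs.example.com/api/a": ["https://docs.example.com/api/b"],
};
console.log(crawl(siteLinks, "https://docs.example.com/", { maxPages: 50, maxDepth: 2, urlPatterns: "/api/,/guides/" }));
```

Under these assumptions, max_depth=1 stops after the start page and its direct links, and max_depth=0 returns only the start page.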

6. `yt_transcript` 🎬 NEW!

Fetch transcripts/captions from YouTube videos for LLM consumption. Supports multiple formats, language selection, translation, and time-range slicing.

Parameters:

video (required) - YouTube video URL or 11-character video ID (accepts full watch URLs, youtu.be links, embed URLs, and Shorts URLs)
format (optional) - Output format: text (default), json (with timestamps), srt (subtitles)
lang (optional) - Preferred language code (e.g., en, es, hi, fr). Default: auto
translate (optional) - Translate transcript to target language code
start (optional) - Start time in seconds for trimming
end (optional) - End time in seconds for trimming
list_langs (optional) - List available transcript languages instead of fetching (default: false)

What it does:

✅ Extract transcripts from any YouTube video with captions
✅ Multiple output formats (plain text, JSON with timestamps, SRT subtitles)
✅ Language selection for multilingual videos
✅ Translation to any supported language (via YouTube)
✅ Time-range slicing for specific segments
✅ List available transcript languages
✅ Stats: word count, segment count, duration
✅ 1-hour server-side caching

Examples:

Get transcript from YouTube video: video=dQw4w9WgXcQ format=text

Get transcript with timestamps: video=https://www.youtube.com/watch?v=dQw4w9WgXcQ format=json

Get Spanish transcript: video=dQw4w9WgXcQ lang=es

Translate to French: video=dQw4w9WgXcQ translate=fr

Get specific time range (60-120 seconds): video=dQw4w9WgXcQ start=60 end=120

List available languages: video=dQw4w9WgXcQ list_langs=true
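The video parameter accepts several URL shapes. Normalizing them to the bare 11-character ID on your side can help when caching or de-duplicating requests. A TypeScript sketch covering the forms listed above (watch URLs, youtu.be, embed and Shorts); `extractVideoId` is a hypothetical helper, the server does its own parsing, and there may be URL variants this sketch does not recognize.

```typescript
// YouTube video IDs are 11 characters from letters, digits, "_" and "-".
const ID_RE = /^[A-Za-z0-9_-]{11}$/;

function tryParseUrl(value: string): URL | null {
  try {
    return new URL(value);
  } catch {
    return null; // not an absolute URL
  }
}

// Extracts the video ID from a bare ID, a watch URL, a youtu.be link,
// an embed URL or a Shorts URL. Returns null when no ID can be found.
function extractVideoId(input: string): string | null {
  const value = input.trim();
  if (ID_RE.test(value)) return value;
  const url = tryParseUrl(value);
  if (url === null) return null;
  const host = url.hostname.replace(/^(www\.|m\.)/, "");
  let candidate: string | null = null;
  if (host === "youtu.be") {
    candidate = url.pathname.slice(1).split("/")[0];
  } else if (host === "youtube.com" || host === "youtube-nocookie.com") {
    const parts = url.pathname.split("/").filter((p) => p !== "");
    if (parts[0] === "watch") candidate = url.searchParams.get("v");
    else if (parts[0] === "embed" || parts[0] === "shorts") candidate = parts[1] ?? null;
  }
  return candidate !== null && ID_RE.test(candidate) ? candidate : null;
}

console.log(extractVideoId("https://youtu.be/dQw4w9WgXcQ"));
```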

💡 Usage Examples

Research a Topic

Use search_and_fetch to research "artificial general intelligence latest developments" 
from the top 3 results and give me a comprehensive summary

Get Recent News (Time-Range Filter)

Search for AI breakthroughs from the past 24 hours using time_range: day

Recent Tutorials (Time-Range Filter)

Find Python tutorials from the past week using web_search with time_range: week

Fetch with Markdown Output

Fetch this article in markdown format: https://example.com/article

Research Recent Developments

Use search_and_fetch to research "quantum computing" from the past week 
with time_range: week and get full article content in markdown

Find Specific Information

Search for "best restaurants in Tokyo" and show me the top 5 results

Multi-step Research

1. Search for "Python web scraping libraries"
2. Fetch the documentation page in markdown format
3. Explain how to use it with examples

🔧 Configuration

No configuration needed! 🎉

This MCP server connects to a free public API automatically. Just add it to your Claude Desktop config and it works immediately.

If you're looking for advanced configuration options, there aren't any - we've kept it simple on purpose!

🐛 Troubleshooting

"MCP server not appearing in Claude Desktop"

Check that your claude_desktop_config.json is valid JSON
Restart Claude Desktop completely (quit and reopen it)
Check Console.app (macOS) for error messages
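A single trailing comma or missing brace makes the config file invalid JSON. The TypeScript sketch below checks the file's text and reports what is wrong with the miyami-websearch entry; `checkClaudeConfig` is a hypothetical helper that only inspects the structure shown in the Installation section, not everything Claude Desktop accepts.

```typescript
// Returns a list of problems; an empty list means the entry looks usable.
function checkClaudeConfig(text: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch (err) {
    return [`not valid JSON: ${(err as Error).message}`];
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return ["top level must be a JSON object"];
  }
  const servers = (parsed as { mcpServers?: unknown }).mcpServers;
  if (typeof servers !== "object" || servers === null) {
    return ['missing "mcpServers" object'];
  }
  const entry = (servers as Record<string, unknown>)["miyami-websearch"];
  if (typeof entry !== "object" || entry === null) {
    return ['no "miyami-websearch" entry under "mcpServers"'];
  }
  const { command, args } = entry as { command?: unknown; args?: unknown };
  const problems: string[] = [];
  if (typeof command !== "string" || command === "") {
    problems.push('"command" must be a non-empty string');
  }
  if (args !== undefined && !(Array.isArray(args) && args.every((a) => typeof a === "string"))) {
    problems.push('"args" must be an array of strings');
  }
  return problems;
}

console.log(checkClaudeConfig('{"mcpServers": {}}'));
```

In Node.js you could pass it the string returned by `fs.readFileSync(path, "utf8")` for your config file's path.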

"First search is slow (30-60 seconds)"

This is expected. The backend runs on a free hosting tier that goes to sleep after a period of inactivity, and the first request has to wake it up. Subsequent requests are fast.

"Connection timeout"

The backend API runs on Render's free tier and may still be waking up. Wait 60 seconds and retry.
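When you call the server from your own script or MCP client rather than through Claude Desktop, the "wait and retry" advice can be automated. The TypeScript sketch below retries a failing async call with a fixed wait between attempts; `withColdStartRetry` is hypothetical, and its defaults (3 attempts, 60 seconds apart) come from the advice above, not from any timeout the backend publishes.

```typescript
// Retries an async call that may fail while the free-tier backend wakes up.
// `sleep` is injectable so the behavior can be tested without real delays.
async function withColdStartRetry<T>(
  call: () => Promise<T>,
  attempts = 3,
  waitMs = 60_000,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await call();
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final one.
      if (attempt < attempts) await sleep(waitMs);
    }
  }
  throw lastError; // every attempt failed: surface the last error
}
```

A fixed wait matches the "wait 60 seconds" advice; exponential backoff would be the usual alternative when many clients hit the same cold backend.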

"Tools not working"

Ensure Node.js 18 or later is installed (check with node --version)
Try the global installation instead of npx
Check the GitHub Issues page for known problems
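If you automate the Node.js version check, compare the major version numerically: as strings, "9" sorts after "18". A small TypeScript sketch; `meetsNodeRequirement` is a hypothetical helper that accepts either the `node --version` form (`v18.19.0`) or the `process.versions.node` form (`18.19.0`).

```typescript
// Returns true when the version string meets the Node.js 18+ requirement.
function meetsNodeRequirement(version: string, minimumMajor = 18): boolean {
  const match = /^v?(\d+)\./.exec(version.trim());
  if (match === null) return false; // unrecognized format
  return Number(match[1]) >= minimumMajor; // numeric, not string, comparison
}

console.log(meetsNodeRequirement("v18.19.0"));
```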

📡 API Backend

This MCP server connects to a free public API:

URL: https://websearch.miyami.tech (hardcoded, no config needed)
Cost: 100% Free - no API keys or signup required
Privacy: No logging, no tracking, no data collection
Engines: Google, DuckDuckGo, Bing, Brave, Wikipedia, Startpage
Stealth Mode: FREE anti-bot bypass (Cloudflare, DataDome, Akamai, etc.)

🤝 Contributing

Contributions welcome! Please:

Fork the repository
Create a feature branch
Make your changes
Submit a pull request

Report Issues

Found a bug? Open an issue

📄 License

MIT License - see LICENSE file for details

🌟 Star History

If this tool helps you, please star the repo! ⭐

Made with ❤️ for the LLM community

Connect your AI to the internet in seconds, not hours.
