@theosunxyzw/web-search-mcp
v0.1.3
MCP web search server using HTTP + cheerio, Bing, and SQLite caching with technical site prioritization
MCP Web Search Server
A Model Context Protocol (MCP) server for web search using HTTP + Cheerio with Bing international search and DuckDuckGo fallback, featuring intelligent technical site prioritization and local SQLite caching.
Features
- HTTP-based Search: Uses native fetch + Cheerio for fast, reliable web scraping (no browser needed)
- Dual Search Engines: Bing with `ensearch=1` for English international results, DuckDuckGo HTML API as fallback
- Technical Site Prioritization: Automatically prioritizes GitHub, Stack Overflow, MDN, npm, PyPI, and other developer resources
- Smart Caching: SQLite with FTS5 fuzzy search; entries expire after 30 days, and pages already in the cache are not fetched again
- Smart Content Extraction: Uses Mozilla Readability to extract main article content, converting to clean Markdown
- Parallel Fetching: Concurrent page loading (5 pages at a time) for faster results
- Content Cleaning: Removes scripts, styles, ads, nav, footer, base64 images from content
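The "5 pages at a time" behavior in the Parallel Fetching item can be sketched as fixed-size batches. This is a minimal illustration, not the package's actual code: `chunk` and `mapInBatches` are hypothetical names, and the real fetcher may use a sliding pool rather than batches.

```typescript
// Hypothetical helper: split a list into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical helper: run `worker` over `items`, one batch at a time.
// A failed item becomes null so one bad page does not discard its batch.
async function mapInBatches<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<(R | null)[]> {
  const results: (R | null)[] = [];
  for (const batch of chunk(items, limit)) {
    const settled = await Promise.all(
      batch.map((item) => worker(item).catch(() => null)),
    );
    results.push(...settled);
  }
  return results;
}
```

With a limit of 5, twelve URLs would be processed as batches of 5, 5, and 2. A call such as `mapInBatches(urls, 5, (url) => fetch(url).then((r) => r.text()))` would keep at most five requests in flight.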
Quick Start
# Install dependencies
bunx @theosunxyzw/web-search-mcp
MCP Tools
web_search
Search the web and get summarized results.
Parameters:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| query | string | required | Search query |
| items | number | 10 | Max results to return (1-50) |
| timeout | number | 10000 | Page load timeout in ms (5000-60000) |
Features:
- Fuzzy searches local cache first
- Fetches and prioritizes technical sites
- Returns content truncated to 500 words, with a reference for retrieving the full content
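The 500-word truncation could look roughly like the sketch below. It is an assumption about the approach, not the contents of `truncate.ts`: `truncateWords` is a hypothetical name, and "word" here simply means a run of non-whitespace characters.

```typescript
interface TruncateResult {
  text: string;
  truncated: boolean;
}

// Hypothetical helper: keep at most `maxWords` whitespace-separated words.
// The text is cut in place so Markdown line breaks before the cut survive.
function truncateWords(content: string, maxWords = 500): TruncateResult {
  const wordPattern = /\S+/g;
  let count = 0;
  let match: RegExpExecArray | null;
  while ((match = wordPattern.exec(content)) !== null) {
    count += 1;
    if (count > maxWords) {
      // Cut just before the first word over the limit and drop trailing whitespace.
      const text = content.slice(0, match.index).replace(/\s+$/, "");
      return { text, truncated: true };
    }
  }
  return { text: content, truncated: false };
}
```

The `truncated` flag is what would tell a caller that web_search_get_full_content is worth calling for the rest of the page.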
web_search_get_full_content
Get full content for a cached search result.
Parameters:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| index | number | required | Result index from previous search |
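As an illustration of how a client might call the two tools, the sketch below builds MCP `tools/call` requests and checks arguments against the ranges documented above. `buildWebSearchArgs` and `toolCall` are hypothetical helpers, the query string is made up, and the README does not say whether the server rejects or clamps out-of-range values or whether result indexes start at 0 or 1.

```typescript
interface WebSearchArgs {
  query: string;
  items: number;
  timeout: number;
}

// Hypothetical client-side check that mirrors the documented ranges.
function buildWebSearchArgs(query: string, items = 10, timeout = 10000): WebSearchArgs {
  if (query.trim() === "") throw new Error("query is required");
  if (!Number.isInteger(items) || items < 1 || items > 50) {
    throw new RangeError("items must be an integer from 1 to 50");
  }
  if (!Number.isInteger(timeout) || timeout < 5000 || timeout > 60000) {
    throw new RangeError("timeout must be an integer from 5000 to 60000 ms");
  }
  return { query, items, timeout };
}

// JSON-RPC 2.0 envelope for an MCP tools/call request.
function toolCall<A extends object>(id: number, name: string, args: A) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name, arguments: args },
  };
}

const searchRequest = toolCall(1, "web_search", buildWebSearchArgs("bun sqlite fts5", 5));

// Assumed value: a result index taken from the previous web_search response.
const resultIndex = 1;
const fullContentRequest = toolCall(2, "web_search_get_full_content", { index: resultIndex });
```

In practice an MCP client library would build these envelopes; the sketch only shows which fields each tool expects.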
Configuration
Environment variables:
| Variable | Default | Description |
|----------|---------|-------------|
| WEB_SEARCH_CACHE_TTL_DAYS | 30 | Cache time-to-live in days |
| WEB_SEARCH_TIMEOUT | 10000 | Default page timeout (ms) |
| WEB_SEARCH_MAX_CONCURRENT | 5 | Max parallel page fetches |
| WEB_SEARCH_DEFAULT_ITEMS | 10 | Default result count |
| WEB_SEARCH_DEBUG | false | Enable debug logging |
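A sketch of how these variables might be read with their documented defaults follows. The `Config` shape, `loadConfig`, and `readPositiveInt` are hypothetical, and treating only the string "true" as enabling debug output is an assumption.

```typescript
interface Config {
  cacheTtlDays: number;
  timeoutMs: number;
  maxConcurrent: number;
  defaultItems: number;
  debug: boolean;
}

type Env = Record<string, string | undefined>;

// Hypothetical helper: parse a positive integer, falling back to the default
// when the variable is unset, empty, or not a positive integer.
function readPositiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  return Number.isInteger(value) && value > 0 ? value : fallback;
}

function loadConfig(env: Env): Config {
  return {
    cacheTtlDays: readPositiveInt(env, "WEB_SEARCH_CACHE_TTL_DAYS", 30),
    timeoutMs: readPositiveInt(env, "WEB_SEARCH_TIMEOUT", 10000),
    maxConcurrent: readPositiveInt(env, "WEB_SEARCH_MAX_CONCURRENT", 5),
    defaultItems: readPositiveInt(env, "WEB_SEARCH_DEFAULT_ITEMS", 10),
    // Assumption: only the exact string "true" turns debug logging on.
    debug: env["WEB_SEARCH_DEBUG"] === "true",
  };
}
```

At startup this would be called as `loadConfig(process.env)`; taking the environment as a parameter keeps the function easy to test.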
Cache
Search results are cached in ~/.mcp/web_search/cache.db with:
- Full markdown content
- 30-day TTL with lazy cleanup
- FTS5-powered fuzzy search for cache hits
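One reading of "lazy cleanup" is that expired entries are removed when they are next read rather than by a background job. The sketch below shows that idea with an in-memory `Map` standing in for the SQLite table; `CacheEntry`, `isExpired`, and `readFresh` are hypothetical names, and the real cache may clean up differently.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

interface CacheEntry {
  url: string;
  markdown: string;
  cachedAtMs: number; // Unix time in milliseconds when the page was stored
}

// An entry is expired once it is older than the TTL.
function isExpired(cachedAtMs: number, ttlDays: number, nowMs: number): boolean {
  return nowMs - cachedAtMs > ttlDays * MS_PER_DAY;
}

// Lazy cleanup: an expired entry is deleted at the moment a read finds it.
function readFresh(
  entries: Map<string, CacheEntry>,
  url: string,
  ttlDays: number,
  nowMs: number,
): CacheEntry | undefined {
  const entry = entries.get(url);
  if (entry === undefined) return undefined;
  if (isExpired(entry.cachedAtMs, ttlDays, nowMs)) {
    entries.delete(url);
    return undefined;
  }
  return entry;
}
```

A cache miss from `readFresh` is the point at which the server would fetch the page again and store a new entry.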
Priority Sites
Results are automatically prioritized from these technical sites:
- github.com
- stackoverflow.com
- learn.microsoft.com
- *.stackexchange.com
- devdocs.io
- docs.*.io
- readthedocs.io
- npmjs.com
- pypi.org
- crates.io
- typescriptlang.org
- developer.mozilla.org
- nodejs.org
- bun.sh
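The list mixes plain hosts with wildcard patterns such as `*.stackexchange.com` and `docs.*.io`. A minimal sketch of matching and reordering is shown below. It assumes that `*` stands for exactly one hostname label and that subdomains of a listed host also count; `patternToRegExp`, `isPrioritySite`, and `prioritize` are hypothetical names, and the package's actual ranking may weigh sites differently.

```typescript
const PRIORITY_PATTERNS = [
  "github.com", "stackoverflow.com", "learn.microsoft.com", "*.stackexchange.com",
  "devdocs.io", "docs.*.io", "readthedocs.io", "npmjs.com", "pypi.org",
  "crates.io", "typescriptlang.org", "developer.mozilla.org", "nodejs.org", "bun.sh",
];

// Assumption: "*" matches one hostname label; any subdomain prefix is allowed.
function patternToRegExp(pattern: string): RegExp {
  const body = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join("[^.]+");
  return new RegExp("^(?:.+\\.)?" + body + "$", "i");
}

const PRIORITY_REGEXPS = PRIORITY_PATTERNS.map(patternToRegExp);

function isPrioritySite(url: string): boolean {
  let host: string;
  try {
    host = new URL(url).hostname;
  } catch {
    return false; // not a parseable absolute URL
  }
  return PRIORITY_REGEXPS.some((re) => re.test(host));
}

// Move priority results to the front, keeping the engine's order within each group.
function prioritize<T extends { url: string }>(results: readonly T[]): T[] {
  const priority = results.filter((r) => isPrioritySite(r.url));
  const others = results.filter((r) => !isPrioritySite(r.url));
  return [...priority, ...others];
}
```

Splitting into two groups instead of sorting keeps the original search ranking intact inside each group.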
Development
bun dev # Watch mode
bun test # Run tests
bun typecheck # Type check
bun check # Lint and format
Project Structure
src/
├── index.ts # Entry point
├── server.ts # MCP server configuration
├── config.ts # Environment configuration
├── browser/
│ └── search.ts # Web search (Bing + DuckDuckGo fallback)
├── cache/
│ └── index.ts # SQLite cache with FTS5
├── content/
│ ├── index.ts # Content fetcher (parallel)
│ ├── markdown.ts # HTML to Markdown (turndown)
│ └── truncate.ts # 500-word truncation
├── prioritizer/
│ └── index.ts # Technical site ranking
├── tools/
│ ├── index.ts # Tool exports
│ ├── web_search.ts # Main search tool
│ └── web_search_get_full_content.ts
└── types/
  └── index.ts # TypeScript types
Using with OpenCode
Add to your OpenCode config:
{
  "mcp": {
    "servers": {
      "web-search": {
        "type": "local",
        "command": ["bunx", "@theosunxyzw/web-search-mcp"]
      }
    }
  }
}
Dependencies
- @modelcontextprotocol/sdk - MCP SDK
- cheerio - HTML parsing
- @mozilla/readability - Content extraction
- jsdom - DOM parsing
- turndown - HTML to Markdown
- zod - Schema validation
- bun:sqlite - SQLite (built into Bun)
License
MIT
