@theosunxyzw/web-search-mcp

v0.1.3

Published 4 months ago

MCP web search server using HTTP + cheerio, Bing, and SQLite caching with technical site prioritization

Vulnerabilities: 0 High, 0 Medium, 0 Low

theosunzzz

mcp model-context-protocol ai llm web-search bing http cheerio sqlite

MCP Web Search Server

A Model Context Protocol (MCP) server for web search using HTTP + Cheerio with Bing international search and DuckDuckGo fallback, featuring intelligent technical site prioritization and local SQLite caching.

Features

HTTP-based Search: Uses native fetch + Cheerio for fast, reliable web scraping (no browser needed)
Dual Search Engines: Bing with ensearch=1 for English international results, DuckDuckGo HTML API as fallback
Technical Site Prioritization: Automatically prioritizes GitHub, Stack Overflow, MDN, npm, PyPI, and other developer resources
Smart Caching: SQLite with FTS5 fuzzy search; entries expire after 30 days, and pages already in the cache are not fetched again
Smart Content Extraction: Uses Mozilla Readability to extract main article content, converting to clean Markdown
Parallel Fetching: Concurrent page loading (5 pages at a time) for faster results
Content Cleaning: Removes scripts, styles, ads, nav, footer, base64 images from content
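
As a rough illustration of the "5 pages at a time" behaviour, the sketch below processes URLs in fixed-size batches. The names `chunk` and `fetchInBatches` are hypothetical and not taken from the package, and the real server may use a different scheduling strategy (for example a sliding pool rather than batches). The fetcher is passed in as a parameter so the sketch makes no assumption about how pages are actually downloaded.

```typescript
// Hypothetical helper names; the package's real implementation may differ.

/** Split a list into consecutive groups of at most `size` items. */
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const groups: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

/**
 * Run `fetchOne` over `urls`, at most `batchSize` at a time.
 * A failed fetch yields `null` instead of rejecting the whole run.
 */
async function fetchInBatches<T>(
  urls: string[],
  batchSize: number,
  fetchOne: (url: string) => Promise<T>,
): Promise<(T | null)[]> {
  const results: (T | null)[] = [];
  for (const batch of chunk(urls, batchSize)) {
    // Everything inside one batch runs concurrently.
    const settled = await Promise.all(
      batch.map((url) => fetchOne(url).catch(() => null)),
    );
    results.push(...settled);
  }
  return results;
}
```

With a batch size of 5, seven URLs would be fetched as one group of five followed by one group of two.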

Quick Start

# Run the server with bunx (no separate install step needed)
bunx @theosunxyzw/web-search-mcp

MCP Tools

`web_search`

Search the web and get summarized results.

Parameters:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| query | string | required | Search query |
| items | number | 10 | Max results to return (1-50) |
| timeout | number | 10000 | Page load timeout in ms (5000-60000) |
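
The parameter constraints above could be checked along the following lines. This is a dependency-free sketch only: the package lists zod for schema validation, and `parseWebSearchArgs` is a hypothetical name. Whether the real server rejects or clamps out-of-range values is not documented; this sketch assumes rejection.

```typescript
interface WebSearchArgs {
  query: string;
  items: number;
  timeout: number;
}

// Hypothetical validator mirroring the documented defaults and ranges.
function parseWebSearchArgs(input: {
  query?: unknown;
  items?: unknown;
  timeout?: unknown;
}): WebSearchArgs {
  if (typeof input.query !== "string" || input.query.trim() === "") {
    throw new TypeError("query is required");
  }
  const items = input.items ?? 10; // documented default
  const timeout = input.timeout ?? 10000; // documented default (ms)
  if (typeof items !== "number" || !Number.isInteger(items) || items < 1 || items > 50) {
    throw new RangeError("items must be an integer between 1 and 50");
  }
  if (typeof timeout !== "number" || timeout < 5000 || timeout > 60000) {
    throw new RangeError("timeout must be between 5000 and 60000 ms");
  }
  return { query: input.query, items, timeout };
}
```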

Features:

Fuzzy searches local cache first
Fetches and prioritizes technical sites
Returns truncated content (500 words) with full content reference
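
A minimal sketch of word-based truncation, assuming "words" means whitespace-separated tokens. `truncateWords` is a hypothetical name, and the package's `truncate.ts` may count or cut differently. The original whitespace is preserved so that Markdown line breaks before the cut-off survive.

```typescript
// Hypothetical helper: keep at most `maxWords` whitespace-separated words.
function truncateWords(
  text: string,
  maxWords = 500,
): { content: string; truncated: boolean } {
  const wordPattern = /\S+/g;
  let count = 0;
  let match: RegExpExecArray | null;
  while ((match = wordPattern.exec(text)) !== null) {
    count += 1;
    if (count > maxWords) {
      // `match.index` is where the first word over the limit starts.
      return { content: text.slice(0, match.index).trimEnd(), truncated: true };
    }
  }
  return { content: text, truncated: false };
}
```

The `truncated` flag is one way a caller could decide whether to point the user at `web_search_get_full_content`.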

`web_search_get_full_content`

Get full content for a cached search result.

Parameters:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| index | number | required | Result index from previous search |
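
For orientation, MCP clients invoke tools through a JSON-RPC `tools/call` request. The sketch below builds such requests for both tools; `buildToolCall` is a hypothetical helper, the query string is made up, and whether `index` starts at 0 or 1 is an assumption not confirmed by this README.

```typescript
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that shapes an MCP tools/call request.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// First search, then ask for the full text of one result.
const searchCall = buildToolCall(1, "web_search", { query: "bun sqlite fts5", items: 5 });
// Assumption: index 0 refers to the first result of the previous search.
const fullContentCall = buildToolCall(2, "web_search_get_full_content", { index: 0 });
```

In practice an MCP client library or host application sends these messages for you; the sketch only shows their shape.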

Configuration

Environment variables:

| Variable | Default | Description |
|----------|---------|-------------|
| WEB_SEARCH_CACHE_TTL_DAYS | 30 | Cache time-to-live in days |
| WEB_SEARCH_TIMEOUT | 10000 | Default page timeout (ms) |
| WEB_SEARCH_MAX_CONCURRENT | 5 | Max parallel page fetches |
| WEB_SEARCH_DEFAULT_ITEMS | 10 | Default result count |
| WEB_SEARCH_DEBUG | false | Enable debug logging |
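
The variables above might be read roughly as follows. This is a sketch, not the package's `config.ts`: `loadConfig`, `readPositiveInt`, and the `Config` field names are hypothetical, and the rule that only the literal string `true` enables debug logging is an assumption.

```typescript
interface Config {
  cacheTtlDays: number;
  timeoutMs: number;
  maxConcurrent: number;
  defaultItems: number;
  debug: boolean;
}

// Fall back to the documented default when a value is missing or not a positive integer.
function readPositiveInt(raw: string | undefined, fallback: number): number {
  if (raw === undefined) return fallback;
  const parsed = Number.parseInt(raw, 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

function loadConfig(env: Record<string, string | undefined>): Config {
  return {
    cacheTtlDays: readPositiveInt(env["WEB_SEARCH_CACHE_TTL_DAYS"], 30),
    timeoutMs: readPositiveInt(env["WEB_SEARCH_TIMEOUT"], 10000),
    maxConcurrent: readPositiveInt(env["WEB_SEARCH_MAX_CONCURRENT"], 5),
    defaultItems: readPositiveInt(env["WEB_SEARCH_DEFAULT_ITEMS"], 10),
    // Assumption: only the exact string "true" turns debug logging on.
    debug: env["WEB_SEARCH_DEBUG"] === "true",
  };
}
```

Under Bun or Node, `loadConfig(process.env)` would pick up values from the environment.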

Cache

Search results are cached in ~/.mcp/web_search/cache.db with:

Full markdown content
30-day TTL with lazy cleanup
FTS5-powered fuzzy search for cache hits
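
"Lazy cleanup" presumably means that stale entries are detected when they are read rather than by a background job. The sketch below shows that idea with plain timestamps; `isExpired`, `readFresh`, and the `CacheEntry` shape are hypothetical and do not reflect the actual SQLite schema.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical entry shape; the real cache stores rows in SQLite.
interface CacheEntry {
  url: string;
  cachedAtMs: number;
  markdown: string;
}

/** True when the entry is older than the TTL (30 days by default). */
function isExpired(cachedAtMs: number, nowMs: number, ttlDays = 30): boolean {
  return nowMs - cachedAtMs > ttlDays * MS_PER_DAY;
}

/** Treat a missing or stale entry as a cache miss at read time. */
function readFresh(
  entry: CacheEntry | undefined,
  nowMs: number,
  ttlDays = 30,
): CacheEntry | null {
  if (!entry || isExpired(entry.cachedAtMs, nowMs, ttlDays)) {
    return null; // a real cache would also delete the stale row here
  }
  return entry;
}
```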

Priority Sites

Results are automatically prioritized from these technical sites:

github.com
stackoverflow.com
learn.microsoft.com
*.stackexchange.com
devdocs.io
docs.*.io
readthedocs.io
npmjs.com
pypi.org
crates.io
typescriptlang.org
developer.mozilla.org
nodejs.org
bun.sh
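
One way to match hostnames against the patterns above and move matching results to the front is sketched below. The function names are hypothetical, and two behaviours are assumptions: `*` stands for exactly one hostname label, and plain entries such as github.com also match their subdomains. The package's prioritizer may instead use scores or a different matching rule.

```typescript
const PRIORITY_PATTERNS = [
  "github.com", "stackoverflow.com", "learn.microsoft.com", "*.stackexchange.com",
  "devdocs.io", "docs.*.io", "readthedocs.io", "npmjs.com", "pypi.org", "crates.io",
  "typescriptlang.org", "developer.mozilla.org", "nodejs.org", "bun.sh",
];

// Assumption: "*" matches one hostname label; any subdomain prefix is allowed.
function patternToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join("[^.]+");
  return new RegExp(`^(?:[^.]+\\.)*${escaped}$`, "i");
}

const PRIORITY_REGEXPS = PRIORITY_PATTERNS.map(patternToRegExp);

function isPriorityHost(hostname: string): boolean {
  return PRIORITY_REGEXPS.some((re) => re.test(hostname));
}

/** Stable partition: priority results first, original order kept within each group. */
function prioritize<T extends { url: string }>(results: T[]): T[] {
  const preferred: T[] = [];
  const others: T[] = [];
  for (const result of results) {
    let hostname = "";
    try {
      hostname = new URL(result.url).hostname;
    } catch {
      // Unparseable URLs are treated as non-priority.
    }
    (isPriorityHost(hostname) ? preferred : others).push(result);
  }
  return [...preferred, ...others];
}
```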

Development

bun dev              # Watch mode
bun test             # Run tests
bun typecheck        # Type check
bun check            # Lint and format

Project Structure

src/
├── index.ts              # Entry point
├── server.ts             # MCP server configuration
├── config.ts             # Environment configuration
├── browser/
│   └── search.ts         # Web search (Bing + DuckDuckGo fallback)
├── cache/
│   └── index.ts          # SQLite cache with FTS5
├── content/
│   ├── index.ts          # Content fetcher (parallel)
│   ├── markdown.ts       # HTML to Markdown (turndown)
│   └── truncate.ts       # 500-word truncation
├── prioritizer/
│   └── index.ts          # Technical site ranking
├── tools/
│   ├── index.ts          # Tool exports
│   ├── web_search.ts     # Main search tool
│   └── web_search_get_full_content.ts
└── types/
    └── index.ts          # TypeScript types

Using with OpenCode

Add to your OpenCode config:

{
  "mcp": {
    "servers": {
      "web-search": {
        "type": "local",
        "command": ["bunx", "@theosunxyzw/web-search-mcp"]
      }
    }
  }
}

Dependencies

@modelcontextprotocol/sdk - MCP SDK
cheerio - HTML parsing
@mozilla/readability - Content extraction
jsdom - DOM parsing
turndown - HTML to Markdown
zod - Schema validation
bun:sqlite - SQLite (built into Bun)

License

MIT
