@zhafron/mcp-web-search

v1.3.0

MCP server: Multi-provider web search (DuckDuckGo, Bing, SearXNG) with automatic fallback, and URL content extraction — no API keys required.

MCP Web Search

MCP server for web search and URL/resource loading. It works without API keys by default and stays local-first: search uses free providers, fetch_url extracts useful content from URLs, and binary/media downloads only happen when explicitly requested.

Features

  • search_web - multi-provider web search with automatic fallback across DuckDuckGo, Bing, and SearXNG.
  • fetch_url - universal URL/resource loader for HTML, PDF, text, Markdown, JSON, XML, CSV, media metadata, and supported site-specific URLs.
  • Clean normalized output with one content field plus metadata, pagination, links, media, attachments, and warnings.
  • Reddit thread extraction through Reddit JSON endpoints instead of brittle Reddit HTML scraping.
  • Long-resource pagination with max_length, start_index, and next_start_index.
  • Optional HTML link/media summaries.
  • Optional local download artifacts with download: true.
  • SSRF protection for localhost, private IPs, link-local ranges, IPv6 private ranges, and unsafe redirects.
  • No paid API required.

Requirements

  • Node.js 18+
  • Chrome/Chromium only if you use the Bing provider

MCP Configuration

Claude Code

{
  "mcpServers": {
    "web-search": {
      "command": "npx",
      "args": ["-y", "@zhafron/mcp-web-search"]
    }
  }
}

OpenCode

{
  "mcp": {
    "web-search": {
      "type": "local",
      "command": ["npx", "@zhafron/mcp-web-search"]
    }
  }
}

Custom Configuration

{
  "mcpServers": {
    "web-search": {
      "command": "npx",
      "args": ["-y", "@zhafron/mcp-web-search"],
      "env": {
        "DEFAULT_SEARCH_PROVIDER": "duckduckgo",
        "SEARXNG_URL": "http://localhost:8099"
      }
    }
  }
}

Tools

search_web

Search the web through one provider or through the fallback chain.

Input:

{
  "q": "openai codex reddit review",
  "limit": 10,
  "lang": "en",
  "provider": "duckduckgo"
}

Options:

| Option   | Description                                     |
| -------- | ----------------------------------------------- |
| q        | Search query                                    |
| limit    | Number of results, 1-50                         |
| lang     | Search language, default en                     |
| provider | Optional provider: duckduckgo, bing, or searxng |
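
The option constraints above can be checked on the client before a call is sent. The following is a minimal sketch, assuming a hypothetical helper named normalizeSearchInput that is not part of the package; it only encodes the documented rules (limit from 1 to 50, lang defaulting to en, provider restricted to the three listed values).

```typescript
// Hypothetical client-side helper; not part of @zhafron/mcp-web-search.
type Provider = "duckduckgo" | "bing" | "searxng";

interface SearchInput {
  q: string;
  limit?: number;
  lang?: string;
  provider?: Provider;
}

const PROVIDERS: readonly Provider[] = ["duckduckgo", "bing", "searxng"];

function normalizeSearchInput(raw: SearchInput): SearchInput {
  const q = raw.q.trim();
  if (q.length === 0) {
    throw new Error("q must be a non-empty search query");
  }

  // lang falls back to "en", the documented default.
  const out: SearchInput = { q, lang: raw.lang ?? "en" };

  if (raw.limit !== undefined) {
    // The documented range for limit is 1-50.
    if (!Number.isInteger(raw.limit) || raw.limit < 1 || raw.limit > 50) {
      throw new Error("limit must be an integer from 1 to 50");
    }
    out.limit = raw.limit;
  }

  if (raw.provider !== undefined) {
    if (PROVIDERS.indexOf(raw.provider) === -1) {
      throw new Error("provider must be duckduckgo, bing, or searxng");
    }
    out.provider = raw.provider;
  }

  return out;
}
```

Rejecting an out-of-range limit instead of silently clamping it is a design choice of this sketch; the server may behave differently.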

Output:

{
  "items": [
    {
      "title": "Example Result",
      "url": "https://example.com",
      "snippet": "Result summary...",
      "source": "duckduckgo"
    }
  ],
  "providerUsed": "duckduckgo",
  "fallbackUsed": false,
  "triedProviders": ["duckduckgo"]
}

Fallback order, listed by the provider that is tried first:

  • DuckDuckGo → SearXNG → Bing
  • SearXNG → DuckDuckGo → Bing
  • Bing → DuckDuckGo → SearXNG
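
The chains above map naturally onto a lookup table plus a loop. This is a minimal sketch, not the package's implementation: the SearchFn signature and the searchWithFallback name are hypothetical, and treating an empty result list as a reason to fall through is an assumption.

```typescript
type Provider = "duckduckgo" | "searxng" | "bing";

interface SearchItem {
  title: string;
  url: string;
  snippet: string;
  source: Provider;
}

interface SearchResult {
  items: SearchItem[];
  providerUsed: Provider;
  fallbackUsed: boolean;
  triedProviders: Provider[];
}

// Hypothetical backend signature, for illustration only.
type SearchFn = (q: string) => Promise<SearchItem[]>;

// One chain per starting provider, copied from the list above.
const FALLBACK_ORDER: Record<Provider, Provider[]> = {
  duckduckgo: ["duckduckgo", "searxng", "bing"],
  searxng: ["searxng", "duckduckgo", "bing"],
  bing: ["bing", "duckduckgo", "searxng"],
};

async function searchWithFallback(
  q: string,
  start: Provider,
  backends: Record<Provider, SearchFn>
): Promise<SearchResult> {
  const tried: Provider[] = [];
  for (const provider of FALLBACK_ORDER[start]) {
    tried.push(provider);
    try {
      const items = await backends[provider](q);
      // Assumption: an empty result list also triggers the next provider.
      if (items.length > 0) {
        return {
          items,
          providerUsed: provider,
          fallbackUsed: provider !== start,
          triedProviders: tried,
        };
      }
    } catch (err) {
      // Swallow the error and move on to the next provider in the chain.
    }
  }
  throw new Error("All providers failed: " + tried.join(", "));
}
```

The returned object mirrors the providerUsed, fallbackUsed, and triedProviders fields shown in the output example.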

fetch_url

Fetch a URL and return extracted content plus metadata in a normalized envelope.

Input:

{
  "url": "https://example.com/article",
  "format": "markdown",
  "max_length": 8000,
  "start_index": 0,
  "include_links": true,
  "include_media": true
}

Options:

| Option               | Description                                                                                     |
| -------------------- | ----------------------------------------------------------------------------------------------- |
| url                  | URL to fetch                                                                                    |
| format               | markdown, text, html, json, raw, or metadata                                                    |
| max_length           | Maximum returned content characters, default 25000                                              |
| start_index          | Start content from this character index                                                         |
| engine               | auto, http, or browser; browser fallback is reserved for future optional support                |
| include_links        | Include extracted links for HTML pages                                                          |
| include_media        | Include extracted image/video/audio references for HTML pages                                   |
| include_comments     | Include comments for site adapters that support comments, default true for Reddit               |
| comment_limit        | Maximum comments for comment-capable adapters, max 100                                          |
| comment_sort         | top, best, new, or controversial                                                                |
| max_depth            | Maximum comment nesting depth                                                                   |
| timeout_ms           | Request timeout override                                                                        |
| fresh                | Bypass in-memory cache                                                                          |
| download             | Save original fetched bytes to a managed local file and return it in attachments; default false |
| download_dir         | Optional output directory for downloads; defaults to the system temp directory                  |
| download_ttl_seconds | Cleanup TTL for managed downloads, default 86400 seconds                                        |
| max_download_bytes   | Response/download byte cap override, additionally capped by MAX_BYTES                           |

Output:

{
  "url": "https://example.com/article",
  "final_url": "https://example.com/article",
  "title": "Example Article",
  "content_type": "text/html",
  "resource_type": "html",
  "format": "markdown",
  "content": "# Example Article\n\n...",
  "metadata": {
    "status": 200,
    "content_type": "text/html",
    "byte_length": 12345,
    "extractor": "html",
    "fetched_at": "2026-05-03T00:00:00.000Z"
  },
  "links": [],
  "media": {
    "images": [],
    "videos": [],
    "audio": []
  },
  "truncated": false,
  "original_length": 1200,
  "start_index": 0,
  "next_start_index": null,
  "warnings": []
}
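
The pagination fields in this envelope (truncated, original_length, start_index, next_start_index) can be read as a simple windowing scheme. The sketch below models that scheme locally. It assumes next_start_index equals start_index plus the returned length while content remains, and null otherwise; slicePage and readAll are hypothetical names, not package functions.

```typescript
interface Page {
  content: string;
  truncated: boolean;
  original_length: number;
  start_index: number;
  next_start_index: number | null;
}

// Models one fetch_url response window over the full extracted content.
function slicePage(full: string, startIndex: number, maxLength: number): Page {
  if (maxLength < 1) {
    throw new Error("maxLength must be at least 1");
  }
  const content = full.slice(startIndex, startIndex + maxLength);
  const end = startIndex + content.length;
  const truncated = end < full.length;
  return {
    content,
    truncated,
    original_length: full.length,
    start_index: startIndex,
    // Assumption: the next window starts where this one ended.
    next_start_index: truncated ? end : null,
  };
}

// Client loop: keep requesting windows until next_start_index is null.
function readAll(full: string, maxLength: number): string {
  let out = "";
  let index: number | null = 0;
  while (index !== null) {
    const page = slicePage(full, index, maxLength);
    out += page.content;
    index = page.next_start_index;
  }
  return out;
}
```

Against the real tool, the same loop would pass next_start_index back as start_index on each following fetch_url call.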

Supported Resources

| Resource                  | Behavior                                                                                               |
| ------------------------- | ------------------------------------------------------------------------------------------------------ |
| HTML pages                | Extracts readable article content, title, metadata, optional links, and optional media references      |
| Text and Markdown         | Returns text directly with pagination support                                                          |
| JSON                      | Pretty-prints JSON when format is json or text-like when requested                                     |
| XML and CSV-like text     | Returns as text/data content                                                                           |
| PDF                       | Extracts text and PDF metadata                                                                         |
| Images                    | Returns metadata by default; saves the file only with download: true                                   |
| Audio and video           | Returns metadata by default; saves the file only with download: true                                   |
| Archives and binary files | Returns metadata by default; downloads only when explicitly requested; archives are not auto-extracted |
| Reddit threads            | Uses Reddit JSON endpoints and can include comments with limits                                        |

Local Downloads

fetch_url does not download binary/media files to disk by default. This avoids surprise disk usage and persistent local copies of arbitrary web content.

Use download: true when you need the original file available to another tool:

{
  "url": "https://httpbin.org/image/png",
  "format": "metadata",
  "download": true,
  "download_ttl_seconds": 86400
}

Download attachments look like this:

{
  "kind": "download",
  "path": "/tmp/mcp-web-search/downloads/mcp-fetch-id-image.png",
  "filename": "mcp-fetch-id-image.png",
  "original_filename": "image.png",
  "content_type": "image/png",
  "resource_type": "image",
  "byte_length": 8090,
  "sha256": "...",
  "expires_at": "2026-05-04T00:00:00.000Z"
}

Download safety behavior:

  • Downloads are opt-in only.
  • Files are written with 0600 permissions.
  • Filenames are sanitized and prefixed with a managed artifact ID.
  • SHA-256 is returned for verification.
  • Expired managed artifacts are cleaned up through sidecar metadata.
  • Cleanup only touches managed artifacts inside the configured download directory.
  • Archives are never auto-extracted.
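
Because managed artifacts are cleaned up after their TTL, a consumer of an attachment may want to check expires_at before reading the file. The following is a minimal sketch of that check. It assumes expires_at is the write time plus download_ttl_seconds; computeExpiresAt and isExpired are hypothetical helpers, not package functions.

```typescript
interface DownloadAttachment {
  kind: "download";
  path: string;
  expires_at: string; // ISO 8601 timestamp, as in the attachment example
}

// Assumption: expiry is the write time plus the TTL in seconds.
function computeExpiresAt(writtenAt: string, ttlSeconds: number): string {
  return new Date(Date.parse(writtenAt) + ttlSeconds * 1000).toISOString();
}

// True once the managed artifact may already have been cleaned up.
function isExpired(attachment: Pick<DownloadAttachment, "expires_at">, now: Date): boolean {
  return now.getTime() >= Date.parse(attachment.expires_at);
}

// With the default TTL of 86400 seconds, a file written at midnight UTC
// on 2026-05-03 expires at midnight UTC on 2026-05-04.
const expiry = computeExpiresAt("2026-05-03T00:00:00.000Z", 86400);
```

A tool that receives an attachment can call isExpired with the current time and request a fresh download when the result is true.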

Reddit Thread Extraction

Reddit thread URLs are handled by a site adapter and fetched through Reddit JSON endpoints.

Input example:

{
  "url": "https://www.reddit.com/r/codex/comments/abc123/gpt55_is_so_good/",
  "include_comments": true,
  "comment_limit": 30,
  "comment_sort": "top",
  "max_depth": 2
}

The output uses resource_type: "site" and metadata.extractor: "reddit-thread".

Reddit's public JSON endpoints can still rate-limit requests or return 403/429, depending on Reddit's policies, subreddit rules, and request frequency. When that happens, retry later or reduce request frequency.
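
For orientation, Reddit serves a JSON view of a thread when .json is appended to the thread path. The sketch below shows how the fetch_url comment options could map onto that endpoint's limit, sort, and depth query parameters. The mapping is an assumption about the adapter, not documented behavior, and toRedditJsonUrl is a hypothetical name.

```typescript
interface RedditOptions {
  comment_limit?: number;
  comment_sort?: "top" | "best" | "new" | "controversial";
  max_depth?: number;
}

// Hypothetical helper: builds the public JSON URL for a thread URL.
function toRedditJsonUrl(threadUrl: string, opts: RedditOptions = {}): string {
  const url = new URL(threadUrl);
  // Drop trailing slashes, then append the .json suffix.
  url.pathname = url.pathname.replace(/\/+$/, "") + ".json";

  if (opts.comment_limit !== undefined) {
    // comment_limit is documented with a maximum of 100.
    url.searchParams.set("limit", String(Math.min(opts.comment_limit, 100)));
  }
  if (opts.comment_sort !== undefined) {
    // Assumption: Reddit's API names the "best" ordering "confidence".
    const sort = opts.comment_sort === "best" ? "confidence" : opts.comment_sort;
    url.searchParams.set("sort", sort);
  }
  if (opts.max_depth !== undefined) {
    url.searchParams.set("depth", String(opts.max_depth));
  }
  return url.toString();
}
```

Called with the input example above (comment_limit 30, comment_sort top, max_depth 2), this yields a URL ending in gpt55_is_so_good.json?limit=30&sort=top&depth=2.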

Providers

| Provider   | API Key Required | Notes                                         |
| ---------- | ---------------- | --------------------------------------------- |
| DuckDuckGo | No               | Default, simple, no browser required          |
| Bing       | No               | Uses Chrome/Chromium through Puppeteer        |
| SearXNG    | No               | Best option for self-hosted high-volume usage |

Environment Variables

| Variable                | Default               | Description                                                                              |
| ----------------------- | --------------------- | ---------------------------------------------------------------------------------------- |
| DEFAULT_SEARCH_PROVIDER | duckduckgo            | Default search provider: duckduckgo, bing, or searxng                                    |
| SEARXNG_URL             | http://localhost:8099 | SearXNG instance URL                                                                     |
| HTTP_TIMEOUT            | 15000                 | Request timeout in milliseconds                                                          |
| MAX_BYTES               | 20971520              | Maximum fetched response/download size                                                   |
| MCP_COMPAT_MODE         | unset                 | Set to legacy to simplify tools/list schemas for MCP clients with weak discovery parsers |
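
The defaults in this table, together with the rule that max_download_bytes is additionally capped by MAX_BYTES, can be expressed as a small configuration loader. This is a sketch under assumptions: loadConfig, effectiveDownloadCap, and the ServerConfig shape are hypothetical, and falling back to the default on an unparsable value is a choice made here, not documented behavior.

```typescript
type Provider = "duckduckgo" | "bing" | "searxng";
type Env = Record<string, string | undefined>;

interface ServerConfig {
  defaultProvider: Provider;
  searxngUrl: string;
  httpTimeoutMs: number;
  maxBytes: number;
  compatMode: "legacy" | undefined;
}

// Returns the fallback when the variable is unset or not a positive integer.
function parsePositiveInt(value: string | undefined, fallback: number): number {
  if (value === undefined) return fallback;
  const n = Number(value);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

function loadConfig(env: Env): ServerConfig {
  const provider = env["DEFAULT_SEARCH_PROVIDER"];
  return {
    defaultProvider: provider === "bing" || provider === "searxng" ? provider : "duckduckgo",
    searxngUrl: env["SEARXNG_URL"] ?? "http://localhost:8099",
    httpTimeoutMs: parsePositiveInt(env["HTTP_TIMEOUT"], 15000),
    maxBytes: parsePositiveInt(env["MAX_BYTES"], 20971520), // 20 MiB
    compatMode: env["MCP_COMPAT_MODE"] === "legacy" ? "legacy" : undefined,
  };
}

// A per-request max_download_bytes can lower the cap but never exceed MAX_BYTES.
function effectiveDownloadCap(maxBytes: number, requested?: number): number {
  return requested === undefined ? maxBytes : Math.min(requested, maxBytes);
}
```

In a Node.js process, loadConfig(process.env) would read the real environment.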

SearXNG Setup

SearXNG is a free self-hosted meta-search engine.

Quick setup with Docker:

mkdir -p ~/docker/searxng

Create ~/docker/searxng/settings.yml with JSON enabled, then run the SearXNG container. The important setting is search.formats containing both html and json.

Example relevant setting:

search:
  formats:
    - html
    - json

Then set:

export SEARXNG_URL="http://localhost:8099"

Chrome Setup for Bing Provider

| OS            | Command                           |
| ------------- | --------------------------------- |
| Ubuntu/Debian | sudo apt install chromium-browser |
| Fedora        | sudo dnf install chromium         |
| Arch          | sudo pacman -S chromium           |
| macOS         | brew install --cask google-chrome |

Custom path:

export CHROME_PATH="/path/to/chrome"

MCP Discovery Compatibility

Some MCP clients have weak schema parsers and fail during discovery on array-valued JSON Schema nodes such as enum or required.

If discovery fails, set:

export MCP_COMPAT_MODE="legacy"

This only simplifies advertised tools/list schemas. Tool execution behavior stays the same.
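
To illustrate what simplifying an advertised schema can mean, the sketch below removes array-valued enum and required nodes from a JSON Schema while leaving everything else intact. The package's actual legacy transformation is not documented here, so this is only an assumption-driven example; simplifySchema is a hypothetical name.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Keys that weak discovery parsers reportedly fail on when array-valued.
const ARRAY_KEYWORDS = ["enum", "required"];

function simplifySchema(node: Json): Json {
  if (node === null || typeof node !== "object") return node;
  if (Array.isArray(node)) return node.map((item) => simplifySchema(item));

  const out: { [key: string]: Json } = {};
  for (const key of Object.keys(node)) {
    const value = node[key];
    // Drop enum/required only when they hold arrays, so a tool property
    // that happens to be named "required" or "enum" is preserved.
    if (ARRAY_KEYWORDS.indexOf(key) !== -1 && Array.isArray(value)) continue;
    out[key] = simplifySchema(value);
  }
  return out;
}

const simplified = simplifySchema({
  type: "object",
  required: ["q"],
  properties: {
    q: { type: "string" },
    provider: { type: "string", enum: ["duckduckgo", "bing", "searxng"] },
  },
});
```

The result still describes q and provider as strings, but no longer advertises which values or fields are mandatory, which matches the statement that execution behavior is unchanged.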

URL Safety

fetch_url blocks unsafe targets before fetching and before following redirects.

Blocked targets include:

  • localhost hostnames
  • .localhost and .local hostnames
  • private IPv4 ranges
  • IPv4 loopback, link-local, carrier-grade NAT, benchmark, multicast, and selected special-use ranges
  • IPv4-mapped IPv6 addresses that resolve to blocked IPv4 ranges
  • IPv6 loopback, unspecified, unique-local, multicast, and link-local ranges
  • redirects that resolve to blocked addresses

The HTTP transport resolves and validates addresses before connecting, then connects to the vetted address while preserving the original host/SNI for normal HTTPS behavior.
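
As an illustration of the IPv4 part of this check, the sketch below tests a resolved address against well-known reserved blocks: private ranges, loopback, link-local, carrier-grade NAT (100.64.0.0/10), benchmark (198.18.0.0/15), and multicast. The list is an assumption and may be narrower than the package's own; isBlockedIPv4 is a hypothetical name, and treating unparsable input as blocked is a fail-closed choice made here.

```typescript
// Parses dotted-quad IPv4 into an unsigned 32-bit value, or null if invalid.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// [network base, prefix length]; an illustrative list, not the package's own.
const BLOCKED_CIDRS: Array<[string, number]> = [
  ["0.0.0.0", 8],      // "this network"
  ["10.0.0.0", 8],     // private
  ["100.64.0.0", 10],  // carrier-grade NAT
  ["127.0.0.0", 8],    // loopback
  ["169.254.0.0", 16], // link-local
  ["172.16.0.0", 12],  // private
  ["192.168.0.0", 16], // private
  ["198.18.0.0", 15],  // benchmark
  ["224.0.0.0", 4],    // multicast
];

function inCidr(addr: number, base: string, prefix: number): boolean {
  const baseInt = ipv4ToInt(base);
  if (baseInt === null) return false;
  // Plain arithmetic avoids signed 32-bit pitfalls of bitwise operators.
  const blockSize = 2 ** (32 - prefix);
  return Math.floor(addr / blockSize) === Math.floor(baseInt / blockSize);
}

function isBlockedIPv4(ip: string): boolean {
  const addr = ipv4ToInt(ip);
  if (addr === null) return true; // fail closed on unparsable input
  return BLOCKED_CIDRS.some(([base, prefix]) => inCidr(addr, base, prefix));
}
```

A check like this has to run on the resolved address, and again for every redirect target, which is why the transport validates before connecting and then connects to the vetted address.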

Repository Structure

  • src/server.ts - MCP server and tool schemas
  • src/providers/ - search providers
  • src/fetch/ - URL/resource loading pipeline
  • src/fetch/content/ - shared content helpers such as Markdown conversion and readability fallback
  • src/fetch/extractors/ - resource extractors for HTML, text/data, PDF, and media metadata
  • src/fetch/site-adapters/ - domain-specific extractors such as Reddit threads
  • src/utils/ - shared utilities
  • test/ - Node test runner tests

Troubleshooting

| Issue                                                     | Solution                                                                                  |
| --------------------------------------------------------- | ----------------------------------------------------------------------------------------- |
| Chrome not found                                          | Install Chrome/Chromium or set CHROME_PATH                                                |
| SearXNG 403                                               | Enable JSON API in settings.yml                                                           |
| Timeout                                                   | Increase HTTP_TIMEOUT or pass timeout_ms                                                  |
| MCP discovery error: 'list' object has no attribute 'get' | Set MCP_COMPAT_MODE=legacy                                                                |
| Reddit 429 or 403                                         | Reddit rate limited or blocked the JSON endpoint; retry later or reduce request frequency |
| Download missing from output                              | Set download: true; downloads are disabled by default                                     |
| Download rejected as too large                            | Increase max_download_bytes within the server cap or raise MAX_BYTES                      |

License

MIT