npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

smart-web-mcp

v0.15.2

Published

Portable MCP server for web search, direct-URL fetch, site crawling, and docs export.

Readme

smart-web

Local MCP for agentic web retrieval.

smart-web bundles three tools in one stdio server:

  • smartfetch for direct URLs
  • smartsearch for discovery
  • smartcrawl for same-site multi-page traversal

It is designed for agents and MCP hosts, not for manual browser automation. The job is to return structured retrieval output first, then signal when a browser-task runtime would be a better next step.

If a page is login-gated, keep the login UI, session persistence, and capture flow in a companion runtime instead of moving that stateful behavior into the smart-web core.

If smart-web returns reproducible incorrect or misleading output, open a GitHub issue with the exact input, tool arguments, observed output, expected behavior, and version. Skip obvious transient network, auth, or rate-limit failures unless the smart-web classification itself looks wrong.

Highlights

  • one local MCP instead of separate search, fetch, and crawl servers
  • explicit routing: URL → smartfetch, query → smartsearch, known site → smartcrawl
  • up-front smartfetch acquisition lanes instead of broad direct → browser retry chains
  • optional Jina Reader relay for weak generic/article fetches when you explicitly enable it, with JSON mode and alternate-link preservation
  • shared document extraction for article-like pages keeps browser primary content separate from raw shell HTML, preserving headings, links, blocks, and Markdown when requested
  • budgeted smartfetch projection records explicit truncation metadata instead of scattering silent hardcoded caps
  • weak generic pages can recover through JSON-LD or Next.js payload extraction and same-origin RSS/Atom feed discovery before falling back to a handoff
  • paywalled article pages prefer lawful archive recovery and search-indexed reference previews before telling the host to escalate elsewhere
  • LinkedIn authwall handling prefers compliant public fallbacks such as archives and search-indexed reference metadata before giving up
  • legal paper fallback for academic URLs: DOI-aware OpenAlex, Unpaywall, Semantic Scholar, CORE discovery, and Europe PMC enrichment plus bioRxiv/medRxiv API fallback when a direct paper page is thin or blocked
  • site-native public search fallbacks cover Reddit, GitHub repository discovery, npm, Hacker News, Stack Exchange, and Velog when a matching site: query is available
  • structured output with assessment and pipeline fields for host-side decision making
  • local-first defaults with privacy-aware search/fetch behavior
  • useful normalization for common public surfaces instead of raw HTML dumps

Tool routing

  • smartfetch first when the user already gave a URL or shortlink
  • smartsearch when the user gave a topic, keywords, or a site: query
  • smartcrawl when you already know the site and need multiple relevant pages

| Situation | Tool | | ------------------------------------------------- | ------------- | | User already gave a URL or shortlink | smartfetch | | User gave a topic, keywords, or site: query | smartsearch | | You already know the site and need multiple pages | smartcrawl |

smart-web is a retrieval layer. It is not a general-purpose click/type/form-fill browser agent.[^1]

What the tools return

All three tools are shaped for agent consumption. Common high-signal fields include:

  • assessment: confidence, block/auth hints, recommended handoff
  • pipeline: resolve/acquire/normalize/assess stages, plus policy metadata for relay-style usage and authwall likelihood
  • budget: output projection metadata showing which fields were truncated and by how much
  • errors: structured failure reasons
  • post, thread, comments: normalized content surfaces
  • assets and download: file-like outputs when present

This lets hosts decide whether deterministic retrieval was good enough or whether they should escalate to an auth-aware companion flow or a broader browser-task runtime.[^2]

For smartfetch, MCP structuredContent always carries the projected JSON result. The human-readable content[0].text defaults to compact text instead of duplicating that JSON payload; set format: "json" only when a host needs JSON repeated in the text channel. format: "markdown" keeps the compact response shape and requests Markdown extraction when available.

Supported high-signal surfaces

  • communities: Reddit, DCInside, LinkedIn public posts with public fallback recovery, Algumon
  • search-native surfaces: Reddit, GitHub repository discovery, npm, Hacker News, Stack Exchange, Velog
  • competitive-programming surfaces: solved.ac problem/search/profile routes, BOJ problem/workbook/user pages, Codeforces problem/profile pages, AtCoder task/contest pages, QOJ problem pages, Jungol problem pages
  • local map place pages: Naver Map, Kakao Map, and redirected naver.me place-share links
  • reference/article pages: NamuWiki, Wikipedia, Naver Blog, Tistory, Velog
  • media/social: YouTube, X, Threads, Instagram, Telegram
  • commerce pages: Amazon, Coupang, Danawa, Aladin, AliExpress
  • archives/reference wrappers: Wayback snapshots, archive.md lookup pages, arXiv abstract pages with direct paper links
  • academic paper surfaces: arXiv, PubMed, PMC, bioRxiv, medRxiv, plus legal DOI-linked OA copies surfaced from OpenAlex, Unpaywall, Semantic Scholar, CORE, and Europe PMC when a publisher page is thin or paywalled

When a source stays blocked or under-specified, smart-web prefers partial but honest output over pretending to have verified content.

Install

npx -y smart-web-mcp

When smartfetch or smartcrawl needs Playwright-backed page loading, smart-web checks whether the bundled Chromium revision is available. If the local Playwright browser cache is missing or outdated, smart-web warns at startup and will try npx playwright install chromium automatically on first browser use before surfacing a structured error.

Host setup

Claude Code

claude mcp add smart-web -- npx -y smart-web-mcp

Codex

codex mcp add smart-web -- npx -y smart-web-mcp

Or via ~/.codex/config.toml (or project-scoped .codex/config.toml):

[mcp_servers.smart-web]
command = "npx"
args = ["-y", "smart-web-mcp"]

OpenCode

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "smart-web": {
      "type": "local",
      "command": ["npx", "-y", "smart-web-mcp"],
      "enabled": true
    }
  }
}

Custom settings file

To use a non-default settings path, append --settings-file to the command:

npx -y smart-web-mcp --settings-file /absolute/path/to/smart-web.settings.json

For Claude Code:

claude mcp add smart-web -- npx -y smart-web-mcp --settings-file /absolute/path/to/smart-web.settings.json

For OpenCode, add "args" after "command":

{
  "mcp": {
    "smart-web": {
      "type": "local",
      "command": ["npx", "-y", "smart-web-mcp", "--settings-file", "/absolute/path/to/smart-web.settings.json"],
      "enabled": true
    }
  }
}

Configuration

Settings path

Default: ~/.config/smart-web/settings.json

Print a template:

npx -y smart-web-mcp --print-settings-example

Initialize the default file:

npx -y smart-web-mcp --init-settings

smart-web treats the settings file as the single runtime config surface.

Full settings reference

{
  // "balanced" (default) or "private"
  // "private" disables relay-style providers and public search helpers
  "profile": "balanced",

  "runtime": {
    // Optional override for staging/export temp files
    // Default: platform cache root (e.g. ~/.cache/smart-web/tmp)
    "tempDir": ""
  },

  "search": {
    // API keys — leave empty to skip that provider
    "exaApiKey": "",
    "braveSearchApiKey": "",
    // Self-hosted SearXNG instance URL
    "searxngBaseUrl": "",
    "enableSearxng": true
  },

  "fetch": {
    // Browser overrides — most users leave these empty
    "chromeChannel": "",
    "chromePath": "",
    // Auto-install Playwright Chromium when missing (default: true)
    "autoInstallPlaywright": true,
    // Jina Reader relay — opt-in third-party path for weak article pages
    "enableJinaReader": false,
    "jinaReaderBaseUrl": "https://r.jina.ai/",
    // Academic fallback — legal OA enrichment for paper URLs
    "enableAcademicFallback": true,
    "enableOpenAlex": true,
    "enableEuropePmc": true,
    "enableBiorxivApi": true,
    // Unpaywall — requires a contact email
    "enableUnpaywall": true,
    "unpaywallEmail": "",
    // Semantic Scholar — optional API key for higher-rate access
    "enableSemanticScholar": true,
    "semanticScholarApiKey": "",
    // CORE — optional API key for richer search-backed enrichment
    "enableCoreDiscovery": true,
    "coreApiKey": "",
    // FxTwitter — transparent x.com → fxtwitter redirect
    "enableFxTwitter": true,
    // Undetected-chromedriver — optional Python Selenium fallback for tough anti-bot pages
    "enableUndetectedChromedriver": true,
    "undetectedChromedriverPython": "python3",
    // Site-specific fetches
    "enableRedditJson": true,
    "enableYoutubeTranscript": true,
    // Archive fallback — Wayback and archive.md recovery
    "enableArchiveFallback": true,
    "enableWayback": true,
    "enableArchiveMd": true,
    // Output projection budget for smartfetch responses
    "outputBudget": {
      "maxPostTextChars": 24000,
      "maxPostMarkdownChars": 32000,
      "maxThreadItems": 40,
      "maxCommentItems": 40,
      "maxOutboundLinks": 120,
      "maxAssets": 30,
      "maxBlocks": 60,
      "maxBlockTextChars": 1000,
      "maxItemTextChars": 1200
    },
    // Optional default named browser profile for authenticated smartfetch calls.
    // Names are 1-64 characters: letters, numbers, dots, underscores, or hyphens;
    // they must start with a letter or number and cannot end with a dot.
    "browserProfile": ""
  },

  "network": {
    // Allow localhost/private/reserved fetch targets (default: false)
    // Applies to smartfetch, smartcrawl, and Korean public API helpers.
    "allowPrivateHosts": false
  }
}

Common setups

Default — no config file needed

Out of the box smart-web works with sensible defaults. Create a settings file only when you need to change something.

Minimal: profile only

{
  "profile": "balanced"
}

Private / local-first

{
  "profile": "private",
  "search": {
    "searxngBaseUrl": "http://localhost:8080"
  }
}

In private mode, relay-style provider requests are blocked, Exa and Brave default off unless you explicitly re-enable them, public no-key search fallbacks are disabled, and SearXNG bases must resolve to localhost or private addresses.

Private with Exa allowed back in

{
  "profile": "private",
  "search": {
    "exaApiKey": "...",
    "enableExa": true
  }
}

With Exa and Brave API keys

{
  "profile": "balanced",
  "search": {
    "exaApiKey": "exa-...",
    "braveSearchApiKey": "BSA-..."
  }
}

Disable Playwright auto-install

{
  "fetch": {
    "autoInstallPlaywright": false
  }
}

The server will return an actionable playwright_browsers_missing or playwright_browsers_outdated error instead.

Opt in to Jina Reader for weak generic article pages

{
  "fetch": {
    "enableJinaReader": true
  }
}

This stays opt-in because it is a third-party relay path. When enabled, it only applies to weak generic/article direct fetches and prefers Jina JSON mode when that improves the page. profile: "private" hard-disables relay-style providers.

Relay eligibility is provider-policy based. Generic and document-like providers can opt in, while browser/auth/social/commerce providers stay off unless their provider declares support. pipeline.policy.relay_style_allowed and pipeline.policy.relay_style_used tell hosts whether a relay path was allowed and whether it was used.

Tune smartfetch output budgets

{
  "fetch": {
    "outputBudget": {
      "maxPostTextChars": 16000,
      "maxThreadItems": 25,
      "maxOutboundLinks": 80
    }
  }
}

Extraction runs before budgeting so providers can work from complete page content. The budget applies to returned smartfetch projections and records truncation under budget.fields.

Force archive fallback off

{
  "fetch": {
    "enableArchiveFallback": false
  }
}

Legal OA DOI recovery for paywalled paper pages

{
  "fetch": {
    "enableUnpaywall": true,
    "unpaywallEmail": "[email protected]",
    "enableSemanticScholar": true,
    "enableCoreDiscovery": true
  }
}

Enable undetected-chromedriver from a dedicated virtualenv

{
  "fetch": {
    "enableUndetectedChromedriver": true,
    "undetectedChromedriverPython": "/absolute/path/to/venv/bin/python"
  }
}

This helper is optional. smart-web still works without it, but when installed and reachable it can act as a second browser engine for stubborn pages where Playwright alone is not enough.

Custom temp directory

{
  "runtime": {
    "tempDir": "/absolute/path/to/smart-web-tmp"
  }
}

Advanced overrides

These keys live inside settings.json when you need to force one provider on or off:

Search

  • search.enableExa
  • search.enableBrave
  • search.enableSearxng
  • search.enableDuckDuckGo
  • search.enableBraveHtml

Fetch

  • fetch.enableFxTwitter
  • fetch.enableXOembed
  • fetch.autoInstallPlaywright
  • fetch.enableJinaReader
  • fetch.jinaReaderBaseUrl
  • fetch.enableAcademicFallback
  • fetch.enableOpenAlex
  • fetch.enableEuropePmc
  • fetch.enableBiorxivApi
  • fetch.enableUnpaywall
  • fetch.unpaywallEmail
  • fetch.enableSemanticScholar
  • fetch.semanticScholarApiKey
  • fetch.enableCoreDiscovery
  • fetch.coreApiKey
  • fetch.enableUndetectedChromedriver
  • fetch.undetectedChromedriverPython
  • fetch.enableRedditJson
  • fetch.enableYoutubeTranscript
  • fetch.enableArchiveFallback
  • fetch.enableWayback
  • fetch.enableArchiveMd
  • fetch.outputBudget.maxPostTextChars
  • fetch.outputBudget.maxPostMarkdownChars
  • fetch.outputBudget.maxThreadItems
  • fetch.outputBudget.maxCommentItems
  • fetch.outputBudget.maxOutboundLinks
  • fetch.outputBudget.maxAssets
  • fetch.outputBudget.maxBlocks
  • fetch.outputBudget.maxBlockTextChars
  • fetch.outputBudget.maxItemTextChars

Compatibility

  • network.localOnly: advanced override that forces local-only behavior regardless of profile

Quick verification

Sanity-check the server with a few real calls:

  • smartfetch on a normal article URL
  • smartfetch on a Medium or other member-only article URL — confirm it returns either archive-backed content or an honest reference_only preview
  • smartfetch on a Naver Map, Kakao Map, or naver.me place-share URL
  • smartfetch on an arXiv abstract URL
  • smartfetch on a PubMed, PMC, or bioRxiv/medRxiv paper URL
  • smartfetch on a known product URL
  • smartsearch on a site: query
  • smartcrawl on a docs site or board you actually use

Reporting issues

If smart-web returns reproducible incorrect or misleading output, open a GitHub issue with:

  • exact input and tool arguments
  • observed output
  • expected behavior
  • smart-web-mcp version

Skip obvious transient network, auth, or rate-limit failures unless the classification itself looks wrong.

License

All rights reserved. See LICENSE for terms.

This software is licensed under a proprietary license that permits installation and use as a local MCP server but prohibits modification, redistribution, reverse engineering, or use in competing products.

References

[^1]: Model Context Protocol, "Tools" specification — MCP tools are a retrieval and integration surface, not a requirement to emulate full browser-automation flows inside one tool.

[^2]: Model Context Protocol Blog, "Tool Annotations as Risk Vocabulary: What Hints Can and Can't Do" (2026-03-16) — structured tool output and accurate hints improve host-side routing and escalation decisions.