smart-web-mcp
v0.7.12
Portable MCP server for web search, direct-URL fetch, site crawling, and docs export.
smart-web
Local MCP for agentic web retrieval.
smart-web bundles three tools in one stdio server:
- smartfetch for direct URLs
- smartsearch for discovery
- smartcrawl for same-site multi-page traversal
It is designed for agents and MCP hosts, not for manual browser automation. The job is to return structured retrieval output first, then signal when a browser-task runtime would be a better next step.
- npm: https://www.npmjs.com/package/smart-web-mcp
- MCP Registry: io.github.rich-jojo/smart-web
- GitHub: https://github.com/rich-jojo/smart-web
If smart-web returns reproducible incorrect or misleading output, open a GitHub issue with the exact input, tool arguments, observed output, expected behavior, and version. Skip obvious transient network, auth, or rate-limit failures unless the smart-web classification itself looks wrong.
Highlights
- one local MCP instead of separate search, fetch, and crawl servers
- explicit routing: URL -> smartfetch, query -> smartsearch, known site -> smartcrawl
- structured output with assessment and pipeline fields for host-side decision making
- local-first defaults with privacy-aware search/fetch behavior
- useful normalization for common public surfaces instead of raw HTML dumps
Tool routing
Use the tools like this:
- smartfetch first when the user already gave a URL or shortlink
- smartsearch when the user gave a topic, keywords, or a site: query
- smartcrawl when you already know the site and need multiple relevant pages
| Situation | Tool |
| --- | --- |
| User already gave a URL or shortlink | smartfetch |
| User gave a topic, keywords, or site: query | smartsearch |
| You already know the site and need multiple pages | smartcrawl |
smart-web is a retrieval layer. It is not a general-purpose click/type/form-fill browser agent.[^1]
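The routing table above can be sketched as a small host-side helper. This is only an illustration: `chooseTool`, `RoutingInput`, and the `needsMultiplePages` flag are hypothetical names, not part of smart-web, and the URL check is a deliberately simple regular expression.

```typescript
// Hypothetical host-side helper; none of these names come from smart-web itself.
type ToolName = "smartfetch" | "smartsearch" | "smartcrawl";

interface RoutingInput {
  text: string; // what the user supplied: a URL, a topic, or a site: query
  needsMultiplePages?: boolean; // assumption: the host knows it wants several pages from one site
}

// Simple check for an absolute http(s) URL with no whitespace.
function isHttpUrl(text: string): boolean {
  return /^https?:\/\/\S+$/i.test(text.trim());
}

function chooseTool(input: RoutingInput): ToolName {
  const looksLikeUrl = isHttpUrl(input.text);
  // Known site plus a need for multiple pages: crawl.
  if (looksLikeUrl && input.needsMultiplePages) return "smartcrawl";
  // A single URL or shortlink: fetch it directly.
  if (looksLikeUrl) return "smartfetch";
  // Topics, keywords, and site: queries go to search.
  return "smartsearch";
}
```

A host could call `chooseTool({ text: userInput })` before dispatching an MCP tool call, and fall back to smartsearch whenever the input is not an absolute URL.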
What the tools return
All three tools are shaped for agent consumption.
Common high-signal fields include:
- assessment: confidence, block/auth hints, recommended handoff
- pipeline: resolve/acquire/normalize/assess stages
- errors: structured failure reasons
- post, thread, comments: normalized content surfaces
- assets and download: file-like outputs when present
This lets hosts decide whether deterministic retrieval was good enough or whether they should escalate to a browser-task runtime.[^2]
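As a rough illustration of that host-side decision, the sketch below turns a tool result into a next step. Only the top-level field names `assessment` and `errors` come from this README. Every nested property, the 0.6 threshold, and the `decideNextStep` helper are assumptions made for the example, so check the real output shape before relying on any of them.

```typescript
// Hypothetical shapes: nested properties are assumptions, not smart-web's schema.
interface Assessment {
  confidence: number; // assumed to be a score between 0 and 1
  blocked?: boolean; // assumed block hint
  authRequired?: boolean; // assumed auth hint
  recommendedHandoff?: string; // assumed handoff label, for example "browser"
}

interface ToolResult {
  assessment: Assessment;
  errors?: { code: string; message?: string }[];
}

type NextStep = "use_result" | "escalate_to_browser" | "report_failure";

function decideNextStep(result: ToolResult, minConfidence = 0.6): NextStep {
  const { assessment, errors = [] } = result;
  // An explicit handoff recommendation or a block/auth hint means retrieval alone is not enough.
  if (assessment.recommendedHandoff !== undefined) return "escalate_to_browser";
  if (assessment.blocked || assessment.authRequired) return "escalate_to_browser";
  // Structured errors without a handoff hint are reported rather than retried blindly.
  if (errors.length > 0) return "report_failure";
  // Otherwise accept the result only when confidence clears the host's threshold.
  return assessment.confidence >= minConfidence ? "use_result" : "escalate_to_browser";
}
```

Keeping this logic in the host, instead of inside the retrieval tool, matches the division of labor described above: smart-web reports what it saw, and the host chooses whether to escalate.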
Supported high-signal surfaces
- communities: Reddit, DCInside, LinkedIn public posts, Algumon
- reference/article pages: NamuWiki, Wikipedia, Naver Blog, Tistory, Velog
- media/social: YouTube, X, Threads, Instagram, Telegram
- commerce pages: Amazon, Coupang, Danawa, Aladin, AliExpress
- archives/reference wrappers: Wayback snapshots, archive.md lookup pages, arXiv abstract pages with direct paper links
When a source stays blocked or under-specified, smart-web prefers partial but honest output over pretending to have verified content.
Install
npx -y smart-web-mcp
npm is the canonical package-manager path for this repo and package.
When smartfetch or smartcrawl needs Playwright-backed page loading, smart-web checks whether the bundled Chromium revision is available. If the local Playwright browser cache is missing or outdated, smart-web warns at startup and, on first browser use, tries to run npx playwright install chromium automatically before surfacing a structured error.
Host setup
Claude Code
claude mcp add smart-web -- npx -y smart-web-mcp
The repo also ships a small Claude plugin wrapper for marketplace-style distribution.
OpenCode
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "smart-web": {
      "type": "local",
      "command": ["npx", "-y", "smart-web-mcp"],
      "enabled": true
    }
  }
}
Configuration
Default settings path:
~/.config/smart-web/settings.json
Print a template:
npx -y smart-web-mcp --print-settings-example
Initialize the default file:
npx -y smart-web-mcp --init-settings
Use a non-default path:
{
  "env": {
    "SMART_WEB_SETTINGS_FILE": "/absolute/path/to/smart-web.settings.json"
  }
}
Recommended steady state: keep most runtime configuration in one settings file instead of a long MCP env block.
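The path selection described above (the default file unless SMART_WEB_SETTINGS_FILE points elsewhere) could be modeled roughly as follows. `resolveSettingsPath` is a hypothetical helper written for illustration, and the assumption that a non-empty override always wins is mine, not something this README states.

```typescript
// Sketch only: a hypothetical helper, not smart-web's actual resolution code.
function resolveSettingsPath(
  env: Record<string, string | undefined>,
  homeDir: string,
): string {
  // Assumption: a non-empty SMART_WEB_SETTINGS_FILE takes precedence over the default.
  const override = env["SMART_WEB_SETTINGS_FILE"]?.trim();
  if (override) return override;
  // Default documented above: ~/.config/smart-web/settings.json
  return `${homeDir}/.config/smart-web/settings.json`;
}
```

Passing the environment and home directory in as arguments keeps the helper easy to test without touching the real filesystem.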
Most installations only need:
- profile: "balanced"
- profile: "private"
- search.searxngBaseUrl
- search.exaApiKey
- search.braveSearchApiKey
If you do not want smart-web to auto-install Playwright browsers, set fetch.autoInstallPlaywright to false (or SMARTFETCH_AUTO_INSTALL_PLAYWRIGHT=false) and the server will return an actionable playwright_browsers_missing or playwright_browsers_outdated error instead.
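When auto-install is disabled, a host might translate those two error codes into a manual remediation hint. The error codes and the install command are taken from this README. The `StructuredError` shape and the `playwrightRemediation` helper are assumptions for the sake of the sketch.

```typescript
// Assumed minimal error shape; the real structured error may carry more fields.
interface StructuredError {
  code: string;
}

// Returns the manual install command for Playwright browser errors, or null otherwise.
function playwrightRemediation(error: StructuredError): string | null {
  switch (error.code) {
    case "playwright_browsers_missing":
    case "playwright_browsers_outdated":
      return "npx playwright install chromium";
    default:
      return null;
  }
}
```

A host could show the returned command to the user instead of retrying the tool call, since a retry cannot succeed until the browser is installed.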
Quick verification
Sanity-check the server with a few real calls:
- smartfetch on a normal article URL
- smartfetch on an arXiv abstract URL
- smartfetch on a known product URL
- smartsearch on a site: query
- smartcrawl on a docs site or board you actually use
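For hosts or scripts that speak MCP directly, the checks above correspond to `tools/call` requests over the stdio transport. The sketch below only builds the JSON-RPC messages. The argument names `url` and `query` and the example.com targets are assumptions, so confirm the real input schemas with a `tools/list` request first.

```typescript
// JSON-RPC 2.0 envelope for an MCP tools/call request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Argument names and URLs below are placeholders, not smart-web's documented schema.
const checks: ToolCallRequest[] = [
  buildToolCall(1, "smartfetch", { url: "https://example.com/article" }),
  buildToolCall(2, "smartsearch", { query: "site:example.com changelog" }),
  buildToolCall(3, "smartcrawl", { url: "https://docs.example.com/" }),
];

// One serialized message per line, as a stdio client would write them.
const serializedChecks: string[] = checks.map((check) => JSON.stringify(check));
```

A real session also needs the MCP initialize handshake before any `tools/call` message, which an MCP client library or host normally handles for you.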
Examples
These examples use /absolute/path/to/... placeholders for local checkouts.
Docs
- Getting started
- Configuration
- Crawling
- Docs export
- Providers
- Architecture
- Long-term design
- Contributing
- CHANGELOG
Development
npm install
npm run test:file -- src/mcp.test.ts
npm run check
npm install also installs local git hooks for fast TypeScript checks on commit and the full verification suite on push.
Dev MCP without restarting the host
npm run dev:watch
smart-web-dev
smart-web-dev hot-swaps rebuilt dist/*.js modules on each tool call. New implementations update after rebuilds, but brand-new MCP tools or schema changes still require a client reconnect because MCP registration happens at process start.
By default, staging snapshots live under the platform cache root (for example ~/.cache/smart-web/tmp on Linux). Set SMART_WEB_TMP_DIR if you need an explicit override.
License
MIT
References
[^1]: Model Context Protocol, "Tools" specification. MCP tools are a retrieval and integration surface, not a requirement to emulate full browser-automation flows inside one tool.
[^2]: Model Context Protocol Blog, "Tool Annotations as Risk Vocabulary: What Hints Can and Can't Do" (2026-03-16). Structured tool output and accurate hints improve host-side routing and escalation decisions.
