oremus-web-search
v0.1.6
MCP server that combines SearXNG web search with Trafilatura extraction
An MCP server that exposes:
- web_search: web search via a configurable SearXNG instance (JSON API).
- fetch_and_extract: main-content extraction via a configurable Trafilatura MCP server (Streamable HTTP).
- rotate_vpn: asks Trafilatura to rotate its VPN/proxy egress.
This is designed to be run with npx as an MCP server (stdio transport).
Session resilience
fetch_and_extract automatically re-initializes the upstream Trafilatura MCP session and retries once when it receives common stale-session errors (400 missing/no valid session id or 404 session not found).
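The retry-once behavior described above can be sketched roughly as follows. This is a minimal illustration, not the server's actual code: the `UpstreamError` shape, the message matching, and the `callWithSessionRetry` and `reinitialize` names are all assumptions made for the example.

```typescript
// Hypothetical shape of an upstream failure; the real error type is not shown in this README.
interface UpstreamError {
  status: number;
  message: string;
}

// Type guard so unknown thrown values can be inspected safely.
function isUpstreamError(err: unknown): err is UpstreamError {
  if (typeof err !== "object" || err === null) return false;
  const e = err as Record<string, unknown>;
  return typeof e["status"] === "number" && typeof e["message"] === "string";
}

// Classify the two stale-session cases named above:
// 400 with a missing/invalid session id, or 404 session not found.
// The substring checks are an assumption about the upstream wording.
function isStaleSessionError(err: UpstreamError): boolean {
  const msg = err.message.toLowerCase();
  if (err.status === 400) return msg.includes("session id");
  if (err.status === 404) return msg.includes("session not found");
  return false;
}

// Run `call`; on a stale-session error, re-initialize once and retry once.
async function callWithSessionRetry<T>(
  call: () => Promise<T>,
  reinitialize: () => Promise<void>,
): Promise<T> {
  try {
    return await call();
  } catch (err) {
    if (!isUpstreamError(err) || !isStaleSessionError(err)) {
      throw err; // unrelated failure: do not retry
    }
    await reinitialize();
    return call(); // a second failure propagates to the caller
  }
}
```

Retrying only once keeps a persistently broken upstream from causing a retry loop; any other error is rethrown unchanged.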
Client setup (Codex / Claude / others)
- Codex CLI: see “Use in Codex CLI” below.
- Claude Code: see web-search-mcp/CLAUDE.md:1 or copy web-search-mcp/.mcp.json.example:1 to your project as .mcp.json.
- Copilot instructions: see web-search-mcp/.github/copilot-instructions.md:1.
- Gemini instructions: see web-search-mcp/GEMINI.md:1.
Why this exists
- SearXNG is great for finding URLs.
- Trafilatura is great at extracting clean article text and metadata.
- This server provides a single MCP endpoint that combines both.
Install / Run
Option A (recommended): no-token install via GitHub Release tarball
This avoids GitHub Packages auth requirements and “just works” with npx:
SEARXNG_URL="https://search.oremuslabs.app" \
TRAFILATURA_MCP_URL="https://trafilatura.oremuslabs.app/mcp" \
npx -y https://github.com/Oremus-Labs/web-search-mcp/releases/latest/download/web-search-mcp.tgz
If you want a pinned version, use the versioned asset under the tag, e.g.:
npx -y https://github.com/Oremus-Labs/web-search-mcp/releases/download/v0.1.1/oremus-labs-web-search-mcp-0.1.1.tgz
Option B: npm (no token required)
Once published to the public npm registry, this should work without any auth:
SEARXNG_URL="https://search.oremuslabs.app" \
TRAFILATURA_MCP_URL="https://trafilatura.oremuslabs.app/mcp" \
npx -y [email protected]
Option C: GitHub Packages
GitHub Packages’ npm registry typically requires authentication (read:packages) to install.
Configuration
Required environment variables:
- SEARXNG_URL
  - Base URL for your SearXNG instance.
  - The server calls ${SEARXNG_URL}/search?format=json&....
  - You may also set SEARXNG_URL to the full /search endpoint.
- TRAFILATURA_MCP_URL
  - Full MCP endpoint URL for Trafilatura (must include the MCP path), e.g. http://...:8090/mcp.
Optional environment variables:
- USER_AGENT (default: oremus-web-search)
- TRAFILATURA_BEARER_TOKEN (adds Authorization: Bearer ... when calling Trafilatura MCP)
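To illustrate how SEARXNG_URL can be either a base URL or the full /search endpoint, here is a minimal sketch of building the request URL. The `buildSearchUrl` helper is hypothetical and the server's real normalization may differ; `q`, `pageno`, and `format` are SearXNG's own query parameter names.

```typescript
// Hypothetical helper: accepts either a base URL or a full /search endpoint
// and returns the JSON search URL with the given query parameters appended.
function buildSearchUrl(
  searxngUrl: string,
  params: Record<string, string | number | undefined>,
): string {
  const trimmed = searxngUrl.replace(/\/+$/, ""); // drop trailing slashes
  const endpoint = trimmed.endsWith("/search") ? trimmed : `${trimmed}/search`;
  const pairs = ["format=json"];
  for (const [key, value] of Object.entries(params)) {
    if (value === undefined) continue; // skip optional parameters that were not set
    pairs.push(`${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`);
  }
  return `${endpoint}?${pairs.join("&")}`;
}
```

With this sketch, `https://search.example.org/` and `https://search.example.org/search` both resolve to the same endpoint.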
Tools
web_search
Input (matches the common SearXNG MCP shape):
- query (string, required)
- pageno (number, optional)
- time_range (day|month|year, optional)
- language (string, optional)
- safesearch (0|1|2, optional)
Output:
- A single text block formatted as:
  Title: ...
  Description: ...
  URL: ...
  Relevance Score: ...
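As a rough sketch of the output format above, the following turns SearXNG-style results into that text block. The `SearchResult` interface and `formatResults` function are illustrative names; mapping Description to SearXNG's `content` field, the blank line between results, and the fallback values are assumptions.

```typescript
// Hypothetical result shape; field names follow SearXNG's JSON results.
interface SearchResult {
  title: string;
  content?: string;
  url: string;
  score?: number;
}

// Render each result as four labeled lines, separated by a blank line.
function formatResults(results: SearchResult[]): string {
  return results
    .map((r) =>
      [
        `Title: ${r.title}`,
        `Description: ${r.content ?? ""}`,
        `URL: ${r.url}`,
        `Relevance Score: ${r.score ?? 0}`,
      ].join("\n"),
    )
    .join("\n\n");
}
```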
fetch_and_extract
Input:
- url (string, required)
- include_comments (boolean, optional)
- include_tables (boolean, optional)
- use_proxy (boolean, optional)
- max_chars (number, optional): cap returned text fields
- start_char (number, optional): paging offset used with max_chars
- plain_text_fallback (boolean, optional): if the response is text/plain and extraction is empty, return the raw body as text
- rewrite_github_blob_to_raw (boolean, optional): rewrite GitHub .../blob/... URLs to raw.githubusercontent.com/...
- fetch_timeout_seconds (number, optional): per-attempt HTTP timeout
- max_fetch_bytes (number, optional): cap download size (may truncate HTML)
- max_total_seconds (number, optional): best-effort overall time budget
Output:
- Pass-through of the Trafilatura MCP server tool result (typically a single text block containing JSON).
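Two of the inputs above, rewrite_github_blob_to_raw and the max_chars/start_char pair, can be pictured with the sketch below. Both helpers are hypothetical and only show the general idea; the actual logic lives in the Trafilatura MCP server and may handle more cases (query strings, fragments, other hosts).

```typescript
// Rewrite https://github.com/<owner>/<repo>/blob/<ref>/<path>
// to the equivalent raw.githubusercontent.com URL.
function rewriteGithubBlobToRaw(url: string): string {
  const match = /^https:\/\/github\.com\/([^/]+)\/([^/]+)\/blob\/(.+)$/.exec(url);
  if (!match) return url; // not a blob URL: leave untouched
  const [, owner, repo, rest] = match;
  return `https://raw.githubusercontent.com/${owner}/${repo}/${rest}`;
}

// Return one page of text: at most maxChars characters starting at startChar.
function pageText(text: string, startChar = 0, maxChars?: number): string {
  const start = Math.max(0, startChar);
  if (maxChars === undefined) return text.slice(start);
  return text.slice(start, start + Math.max(0, maxChars));
}
```

A client could page through a long document by calling the tool repeatedly with the same max_chars and a start_char that advances by max_chars each time.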
rotate_vpn
Input:
- none
Output:
- Pass-through of the Trafilatura MCP server tool result.
Notes:
- This tool is intentionally exposed through Trafilatura (in-cluster) so you don't need to expose a public REST endpoint for VPN rotation.
- Rotation is disruptive to in-flight requests; only call it when you’re getting blocked/rate-limited.
Local development
cd web-search-mcp
npm install
npm run build
npm run smoke:session
SEARXNG_URL="http://127.0.0.1:18080" TRAFILATURA_MCP_URL="http://127.0.0.1:18090/mcp" npm run inspector
Kubernetes access (typical)
If your Trafilatura MCP server is only exposed as an in-cluster Service, run it through a port-forward:
kubectl -n searxng port-forward svc/searxng-trafilatura-mcp 18090:8090
Then set:
TRAFILATURA_MCP_URL=http://127.0.0.1:18090/mcp
Use in Codex CLI
Add a server entry to ~/.codex/config.toml:
[mcp_servers.web_search]
command = "npx"
args = ["-y", "https://github.com/Oremus-Labs/web-search-mcp/releases/latest/download/web-search-mcp.tgz"]
env = { "SEARXNG_URL" = "https://search.oremuslabs.app", "TRAFILATURA_MCP_URL" = "https://trafilatura.oremuslabs.app/mcp" }
startup_timeout_sec = 30
tool_timeout_sec = 120
If you published to npm and want the simplest setup:
[mcp_servers.web_search]
command = "npx"
args = ["-y", "[email protected]"]
env = { "SEARXNG_URL" = "https://search.oremuslabs.app", "TRAFILATURA_MCP_URL" = "https://trafilatura.oremuslabs.app/mcp" }
startup_timeout_sec = 30
tool_timeout_sec = 120
Restart Codex CLI after editing.
Use in Claude Code
Add a server entry to your Claude Code MCP config (commonly .mcp.json in your project root, or wherever you keep your Claude configuration):
Option A (Release tarball)
{
  "mcpServers": {
    "web-search": {
      "command": "npx",
      "args": [
        "-y",
        "https://github.com/Oremus-Labs/web-search-mcp/releases/latest/download/web-search-mcp.tgz"
      ],
      "env": {
        "SEARXNG_URL": "https://search.oremuslabs.app",
        "TRAFILATURA_MCP_URL": "https://trafilatura.oremuslabs.app/mcp"
      }
    }
  }
}
Option B (npm)
{
  "mcpServers": {
    "web-search": {
      "command": "npx",
      "args": ["-y", "[email protected]"],
      "env": {
        "SEARXNG_URL": "https://search.oremuslabs.app",
        "TRAFILATURA_MCP_URL": "https://trafilatura.oremuslabs.app/mcp"
      }
    }
  }
}
Notes
- This server uses stdio transport (default) so it works with MCP clients that launch subprocesses.
- Trafilatura is called through its MCP Streamable HTTP endpoint; this repo’s Kubernetes deployment exposes it as svc/searxng-trafilatura-mcp in namespace searxng.
