mcp-researchpowerpack
v3.9.5
The ultimate research MCP toolkit: Reddit mining, web search with CTR aggregation, AI-powered deep research, and intelligent web scraping - all in one modular package
An MCP server that gives Claude, Cursor, Windsurf, and any MCP-compatible AI assistant a complete research toolkit. Google search, Reddit deep-dives, web scraping with AI extraction, and multi-model deep research — all as tools that chain into each other.
Zero config to start. Each API key you add unlocks more capabilities.
Tools
| Tool | What it does | Requires |
|:-----|:-------------|:---------|
| web_search | Parallel Google search across 3–100 keywords with CTR-weighted ranking and consensus detection | SERPER_API_KEY |
| search_reddit | Same search engine filtered to reddit.com — 10–50 queries in parallel | SERPER_API_KEY |
| get_reddit_post | Fetch 2–50 Reddit posts with full comment trees, smart comment budget allocation | REDDIT_CLIENT_ID + REDDIT_CLIENT_SECRET |
| scrape_links | Scrape 1–50 URLs with JS rendering fallback, HTML→Markdown, optional AI extraction | SCRAPEDO_API_KEY |
| deep_research | Send questions to research-capable models (Grok, Gemini) with web search, file attachments | OPENROUTER_API_KEY |
Tools are designed to chain: web_search → scrape_links → search_reddit → get_reddit_post → deep_research for synthesis. Each tool suggests the next logical step in its output.
Quick Start
Claude Desktop / Claude Code
Add to your MCP config (on macOS: ~/Library/Application Support/Claude/claude_desktop_config.json):
{
"mcpServers": {
"research-powerpack": {
"command": "npx",
"args": ["-y", "mcp-research-powerpack"],
"env": {
"SERPER_API_KEY": "your-key-here",
"OPENROUTER_API_KEY": "your-key-here"
}
}
}
}
Cursor
Add to .cursor/mcp.json in your project:
{
"mcpServers": {
"research-powerpack": {
"command": "npx",
"args": ["-y", "mcp-research-powerpack"],
"env": {
"SERPER_API_KEY": "your-key-here"
}
}
}
}
From Source
git clone https://github.com/yigitkonur/mcp-research-powerpack.git
cd mcp-research-powerpack
pnpm install && pnpm build
pnpm start
HTTP Transport
MCP_TRANSPORT=http MCP_PORT=3000 npx mcp-research-powerpack
Exposes a /mcp endpoint (POST/GET/DELETE with session headers) and a /health endpoint.
API Keys
Each key unlocks a capability. Missing keys silently disable their tools — the server never crashes.
| Variable | Enables | Free Tier |
|:---------|:--------|:----------|
| SERPER_API_KEY | web_search, search_reddit | 2,500 searches/mo — serper.dev |
| REDDIT_CLIENT_ID + REDDIT_CLIENT_SECRET | get_reddit_post | Unlimited — reddit.com/prefs/apps (script type) |
| SCRAPEDO_API_KEY | scrape_links | 1,000 credits/mo — scrape.do |
| OPENROUTER_API_KEY | deep_research, LLM extraction | Pay-per-token — openrouter.ai |
| CEREBRAS_API_KEY | Cerebras LLM extraction | — |
| USE_CEREBRAS | Cerebras extraction (set to true to enable) | — (defaults to false) |
Configuration
Optional tuning via environment variables:
| Variable | Default | Description |
|:---------|:--------|:------------|
| RESEARCH_MODEL | x-ai/grok-4-fast | Primary deep research model |
| RESEARCH_FALLBACK_MODEL | google/gemini-2.5-flash | Fallback when primary fails |
| LLM_EXTRACTION_MODEL | openai/gpt-oss-120b:nitro | Model for scrape/reddit AI extraction |
| DEFAULT_REASONING_EFFORT | high | Research depth: low, medium, high |
| DEFAULT_MAX_URLS | 100 | Max search results per research question (10–200) |
| API_TIMEOUT_MS | 1800000 | Request timeout in ms (default: 30 min) |
| MCP_TRANSPORT | stdio | Transport mode: stdio or http |
| MCP_PORT | 3000 | Port for HTTP mode |
| USE_CEREBRAS | false | Set to true to use Cerebras for extraction instead of OpenRouter |
| CEREBRAS_API_KEY | — | API key for Cerebras cloud — cloud.cerebras.ai |
Cerebras Support
When USE_CEREBRAS=true is set and CEREBRAS_API_KEY is provided, the scrape_links tool uses Cerebras (Z.ai GLM 4.7) for AI content extraction instead of OpenRouter. This provides:
- Ultra-fast extraction — Cerebras inference is optimized for speed
- Independent from OpenRouter — extraction works even without OPENROUTER_API_KEY
- Automatic fallback — if Cerebras is not configured, falls back to OpenRouter
# Enable Cerebras for extraction
USE_CEREBRAS=true CEREBRAS_API_KEY=your-key npx mcp-research-powerpack
Network Resilience
All LLM API calls include built-in stability protections:
- Request deadlines — hard timeout prevents calls from hanging indefinitely
- Stall detection — if no response arrives within a threshold, the request is aborted and retried
- Exponential backoff — transient failures (429, 5xx) retry with jitter to avoid thundering herd
- Connection loss recovery — network errors (ECONNRESET, ECONNREFUSED) trigger automatic retry
- Graceful degradation — all tools return structured errors instead of crashing
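The backoff and retry-classification behavior described above can be sketched roughly as follows. This is an illustrative sketch, not the package's actual src/utils/retry.ts; the helper names and constants here are assumptions.

```typescript
// Illustrative exponential backoff with full jitter (names and constants assumed).
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt); // exponential growth, capped
  return Math.random() * exp;                          // full jitter avoids thundering herd
}

// Classify which failures are worth retrying, per the list above.
function isRetryable(status: number | undefined, code?: string): boolean {
  if (code === "ECONNRESET" || code === "ECONNREFUSED") return true; // connection loss
  return status === 429 || (status !== undefined && status >= 500);  // rate limit / 5xx
}
```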
How It Works
Search Ranking
Results from multiple queries are deduplicated by normalized URL and scored using CTR-weighted position values (position 1 = 100.0, position 10 = 12.56). URLs appearing across multiple queries get a consensus marker. Frequency threshold starts at ≥3, falls back to ≥2, then ≥1 to ensure results.
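A minimal sketch of this aggregation, assuming a power-law weight curve back-solved from the two documented anchors (position 1 = 100.0, position 10 = 12.56); the real weight table in src/utils/url-aggregator.ts may differ:

```typescript
type Hit = { url: string; position: number };

// Power-law decay fitted to the two documented anchor weights (assumption).
const ctrWeight = (pos: number) => 100 * Math.pow(pos, -Math.log10(100 / 12.56));

// Normalize URLs so the same page dedupes across queries (illustrative rule).
function normalizeUrl(u: string): string {
  const { host, pathname } = new URL(u);
  return host.replace(/^www\./, "") + pathname.replace(/\/$/, "");
}

function aggregate(queries: Hit[][]): Array<[string, { score: number; freq: number }]> {
  const byUrl = new Map<string, { score: number; freq: number }>();
  for (const hits of queries) {
    for (const h of hits) {
      const key = normalizeUrl(h.url);
      const e = byUrl.get(key) ?? { score: 0, freq: 0 };
      e.score += ctrWeight(h.position);
      e.freq += 1;
      byUrl.set(key, e);
    }
  }
  // Consensus threshold: prefer URLs seen in >=3 queries, relax to >=2, then >=1.
  for (const min of [3, 2, 1]) {
    const kept = [...byUrl.entries()].filter(([, e]) => e.freq >= min);
    if (kept.length > 0) return kept.sort((a, b) => b[1].score - a[1].score);
  }
  return [];
}
```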
Reddit Comment Budget
Global budget of 1,000 comments, max 200 per post. After the first pass, surplus from posts with fewer comments is redistributed to truncated posts in a second fetch pass.
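The two-pass allocation can be sketched like this (function and constant names are hypothetical; the real logic lives in the Reddit handler):

```typescript
const GLOBAL_BUDGET = 1000; // total comments across all posts
const PER_POST_CAP = 200;   // hard ceiling per post

// Given each post's available comment count, return how many to fetch per post.
function allocate(commentCounts: number[]): number[] {
  const n = commentCounts.length;
  const fair = Math.min(PER_POST_CAP, Math.floor(GLOBAL_BUDGET / n));
  // Pass 1: each post takes up to its fair share.
  const alloc = commentCounts.map((c) => Math.min(c, fair));
  // Pass 2: redistribute surplus from small posts to truncated posts.
  let surplus = GLOBAL_BUDGET - alloc.reduce((a, b) => a + b, 0);
  for (let i = 0; i < n && surplus > 0; i++) {
    const want = Math.min(commentCounts[i], PER_POST_CAP) - alloc[i];
    const extra = Math.min(want, surplus);
    alloc[i] += extra;
    surplus -= extra;
  }
  return alloc;
}
```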
Scraping Pipeline
Three-mode fallback per URL: basic → JS rendering → JS + US geo-targeting. Results go through HTML→Markdown conversion (Turndown), then optional AI extraction with a 100K char input cap and 8,000 token output per URL.
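A sketch of the per-URL fallback chain; the mode shape and parameter names are assumptions (the real client in src/clients/scraper.ts is asynchronous and calls the Scrape.do API), and a synchronous fetcher is injected here to keep the sketch minimal:

```typescript
type Mode = { render: boolean; geoCode?: string };

// The three documented modes, tried in order (field names are assumptions).
const MODES: Mode[] = [
  { render: false },               // 1. basic fetch
  { render: true },                // 2. JS rendering fallback
  { render: true, geoCode: "us" }, // 3. JS + US geo-targeting
];

function scrapeWithFallback(
  url: string,
  fetchOnce: (url: string, mode: Mode) => string | null,
): string | null {
  for (const mode of MODES) {
    const html = fetchOnce(url, mode);
    if (html !== null) return html; // first mode that yields content wins
  }
  return null; // all three modes failed for this URL
}
```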
Deep Research
32,000 token budget divided across questions (1 question = 32K, 10 questions = 3.2K each). Gemini models get google_search tool access. Grok/Perplexity get search_parameters with citations. Primary model fails → automatic fallback to secondary model.
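The budget split described above is a straight division, sketched here with assumed names:

```typescript
// Sketch of the documented split: one shared 32K-token budget divided evenly
// across the questions in a single deep_research call.
const TOTAL_TOKEN_BUDGET = 32_000;

const perQuestionTokens = (questionCount: number): number =>
  Math.floor(TOTAL_TOKEN_BUDGET / questionCount);
```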
File Attachments
deep_research can read local files and include them as context. Files over 600 lines are smart-truncated (first 500 + last 100 lines). Line ranges supported. Line numbers preserved in output.
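The truncation rule can be sketched as follows (the function name and omission marker are hypothetical; the documented behavior is first 500 + last 100 lines for files over 600 lines, with line numbers preserved):

```typescript
// Keep the head and tail of long files, prefixing each kept line with its
// original 1-based line number so references stay valid after truncation.
function smartTruncate(lines: string[], head = 500, tail = 100): string[] {
  const numbered = lines.map((l, i) => `${i + 1}: ${l}`);
  if (lines.length <= head + tail) return numbered; // short file: keep everything
  return [
    ...numbered.slice(0, head),
    `… (${lines.length - head - tail} lines omitted) …`,
    ...numbered.slice(-tail),
  ];
}
```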
Concurrency
| Operation | Parallel Limit |
|:----------|:---------------|
| Web search keywords | 8 |
| Reddit search queries | 8 |
| Reddit post fetches per batch | 5 (batches of 10) |
| URL scraping per batch | 10 (batches of 30) |
| LLM extraction | 3 |
| Deep research questions | 3 |
All clients use manual retry with exponential backoff and jitter. The OpenAI SDK's built-in retry is disabled (maxRetries: 0).
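A minimal bounded-parallel map in the spirit of the pMap utility mentioned above (illustrative only; the real implementation lives in src/utils/concurrency.ts):

```typescript
// Run `fn` over `items` with at most `limit` promises in flight, preserving order.
async function pMap<T, R>(
  items: T[],
  fn: (item: T) => Promise<R>,
  limit: number,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  // Spawn `limit` workers that each pull the next index until the queue drains.
  const workers = Array.from({ length: Math.min(limit, items.length) }, async () => {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  });
  await Promise.all(workers);
  return results;
}
```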
Architecture
src/
├── index.ts Entry point — STDIO + HTTP transport, graceful shutdown
├── worker.ts Cloudflare Workers entry (Durable Objects)
├── config/
│ ├── index.ts Env parsing, capability detection, lazy Proxy config
│ ├── loader.ts YAML → Zod → JSON Schema pipeline
│ └── yaml/tools.yaml Single source of truth for tool definitions
├── schemas/ Zod input validation (deep-research, scrape-links, web-search)
├── tools/
│ ├── registry.ts Tool lookup → capability check → validate → execute
│ ├── search.ts web_search handler
│ ├── reddit.ts search_reddit + get_reddit_post handlers
│ ├── scrape.ts scrape_links handler
│ └── research.ts deep_research handler
├── clients/
│ ├── search.ts Google Serper API client
│ ├── reddit.ts Reddit OAuth + comment tree parser
│ ├── scraper.ts Scrape.do client with fallback modes
│ └── research.ts OpenRouter client with model-specific handling
├── services/
│ ├── llm-processor.ts Shared LLM extraction (singleton OpenAI client)
│ ├── markdown-cleaner.ts HTML → Markdown via Turndown
│ └── file-attachment.ts Local file reading with line ranges
└── utils/
├── retry.ts Shared backoff + retry constants
├── concurrency.ts Bounded parallel execution (pMap, pMapSettled)
├── url-aggregator.ts CTR-weighted scoring + consensus detection
├── errors.ts Error classification + structured errors
├── logger.ts MCP logging protocol
└── response.ts Standardized 70/20/10 output formatting
Deploy
Cloudflare Workers
npx wrangler deploy
Uses Durable Objects with SQLite storage. YAML-based tool definitions are replaced with inline definitions, since there's no filesystem in Workers.
npm
Published as mcp-research-powerpack. Binary names: mcp-research-powerpack, research-powerpack-mcp.
Development
pnpm install # Install dependencies
pnpm dev # Run with tsx (live TypeScript)
pnpm build # Compile to dist/
pnpm typecheck # Type-check without emitting
pnpm start # Run compiled output
Testing
pnpm test:web-search # Test web search tool
pnpm test:reddit-search # Test Reddit search
pnpm test:scrape-links # Test scraping
pnpm test:deep-research # Test deep research
pnpm test:all # Run all tests
pnpm test:check # Check environment setup
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Run pnpm typecheck && pnpm build to verify
- Commit (git commit -m 'feat: add amazing feature')
- Push to your branch (git push origin feature/amazing-feature)
- Open a Pull Request
