mcp-researchpowerpack
v7.1.3
Published
HTTP-first MCP research server: start-research plus raw/smart search and scrape tools — built on mcp-use.
Maintainers
Readme
mcp-researchpowerpack
http mcp server for research. five tools, orientation-first, built for agents that run multi-pass research loops.
ships on mcp-use. tool-level provider
and llm orchestration flows through effect ts service layers for typed
concurrency, typed errors, scoped kernel cleanup, and step timeouts. provider
clients remain small promise/sdk adapters behind those layers. no stdio — http
only.
tools
| tool | what it does | needs |
|------|-------------|-------|
| start-research | returns a goal-tailored brief: primary_branch (reddit / web / both), exact first_call_sequence, 25–50 keyword seeds, iteration hints, gaps to watch, stop criteria. call first every session. | LLM_API_KEY + LLM_BASE_URL + LLM_MODEL for the non-degraded brief (optional — server falls back to a static playbook) |
| raw-web-search | parallel search, up to 50 keywords per call. serper is primary; jina search is the fallback when serper is missing, fails, or returns empty. returns the raw ranked markdown list directly — no llm pass. use for broad discovery, audit trails, and reddit permalink probes via explicit site:reddit.com/r/.../comments. | SERPER_API_KEY or JINA_API_KEY |
| smart-web-search | parallel search, up to 50 keywords per call, plus required extract. same provider order as raw. always runs llm classification and returns tiered markdown (HIGHLY_RELEVANT / MAYBE_RELEVANT / OTHER) + grounded synthesis + gaps + refine suggestions. supports scope: "web" \| "reddit" \| "both". | SERPER_API_KEY or JINA_API_KEY + llm env triple |
| raw-scrape-links | fetch urls in parallel, return full markdown directly. reddit post permalinks route through the reddit api with threaded comments. non-reddit urls hit jina reader first, then jina reader through scrape.do proxy mode, then optional kernel browser rendering for web pages. pdf / docx / pptx / xlsx urls go straight through jina reader. | optional REDDIT_CLIENT_ID / REDDIT_CLIENT_SECRET, SCRAPEDO_API_KEY, JINA_API_KEY, KERNEL_API_KEY |
| smart-scrape-links | same fetch stack as raw scrape, then per-url llm extraction with required extract. returns focused evidence packs with ## Source, ## Matches, ## Not found, and ## Follow-up signals. | raw scrape providers + llm env triple |
also exposes /health (simplified for proxies) and health://status (full
json: planner/extractor reachability, consecutive failure counters, uptime,
active sessions).
workflow
call start-research once per session with your goal. the server returns a
brief that names the first tool to fire (reddit-first for sentiment/migration,
web-first for spec/bug/pricing, both when opinion-heavy and you also need
official sources), the keyword seeds to fan out, and stop criteria.
for search fan-out, think bad → better before calling raw-web-search or
smart-web-search. turn broad phrases like <feature> support, <product>
pricing, <library> bug fix, or <tool> reviews into source-aware probes
like:
site:<official-docs-domain> "<feature>" "<platform-or-version>"site:<vendor-domain> "<product>" pricing "enterprise" OR "free tier""<exact error text>" "<library-or-package>" "<version>" site:github.comsite:reddit.com/r/<community>/comments "<tool>" "migration" OR "regression"
pair the server with the run-research
skill for the full agentic playbook:
npx -y skills add -y -g https://github.com/yigitkonur/skills-by-yigitkonur --skill /run-researchquickstart
# from npm
HOST=127.0.0.1 PORT=3000 npx -y mcp-researchpowerpack
# from source
git clone https://github.com/yigitkonur/mcp-researchpowerpack.git
cd mcp-researchpowerpack
pnpm install && pnpm devpoint your client at http://localhost:3000/mcp:
{
"mcpServers": {
"research-powerpack": {
"url": "http://localhost:3000/mcp"
}
}
}or skip the install entirely and hit the hosted deployment at
https://research-mcp.yigitkonur.com/mcp.
config
copy .env.example, set only what you need. blank/whitespace keys are treated
as absent. missing keys don't crash the server — they disable the affected
capability with a clear error at call time.
server
| var | default | |
|-----|---------|---|
| PORT | 3000 | http port |
| HOST | 127.0.0.1 | bind address; cloud runtimes that set PORT auto-switch to 0.0.0.0. public binds require ALLOWED_ORIGINS, MCP_URL, CSP_URLS, or FLY_APP_NAME |
| ALLOWED_ORIGINS | unset | comma-separated origins for host validation / cors; merged with MCP_URL and platform CSP_URLS when present |
| MCP_URL | unset | public mcp url; contributes its origin to host validation and well-known resource urls |
| CSP_URLS | unset | platform-provided comma-separated public origins; also contributes to host validation, including the derived mcp-use --br-main host |
| FLY_APP_NAME | unset | Fly runtime app name; when present, https://<app>.fly.dev is added to host validation for Manufact's deploy verifier |
| NODE_ENV | unset | production also requires ALLOWED_ORIGINS, MCP_URL, CSP_URLS, or FLY_APP_NAME, even on a local bind |
| DEBUG | unset | 1 or 2 to bump mcp-use debug verbosity |
providers
| var | enables |
|-----|---------|
| SERPER_API_KEY | primary raw/smart web search provider |
| SCRAPEDO_API_KEY | scrape.do proxy-mode retry for jina reader (X-Proxy-Url) |
| REDDIT_CLIENT_ID + REDDIT_CLIENT_SECRET | raw/smart scrape for reddit.com permalinks (threaded post + comments) |
| JINA_API_KEY | jina search fallback and authenticated jina reader requests |
| KERNEL_API_KEY | optional kernel browser-render fallback after jina direct + proxy fail |
| KERNEL_PROJECT | optional kernel project scoping header for org-wide api keys |
| LLM_API_KEY + LLM_BASE_URL + LLM_MODEL | goal-tailored brief, smart-web-search, smart-scrape-links |
llm
any openai-compatible endpoint. LLM_API_KEY, LLM_BASE_URL, and LLM_MODEL
are required together. reasoning effort is hardcoded to low.
| var | required? | |
|-----|-----------|---|
| LLM_API_KEY | yes | api key for the endpoint |
| LLM_BASE_URL | yes | base url for the openai-compatible endpoint (e.g. https://server.up.railway.app/v1) |
| LLM_MODEL | yes | primary model (e.g. gpt-5.4-mini) |
| LLM_FALLBACK_MODEL | no | model to use after primary exhausts retries — gets 3 more attempts (e.g. gpt-5.4). also receives oversized inputs that exceed the primary's context window |
concurrency
all optional. provider limits are clamped 1–200; kernel is clamped 1–20. malformed numeric values are rejected instead of partially parsed.
| var | default | controls |
|-----|---------|----------|
| CONCURRENCY_SEARCH | 50 | parallel serper / jina search queries |
| CONCURRENCY_SCRAPER | 50 | parallel scrape.do (proxy mode) requests |
| CONCURRENCY_JINA_READER | 50 | parallel jina reader fetches |
| CONCURRENCY_REDDIT | 50 | parallel reddit api fetches |
| CONCURRENCY_KERNEL | 3 | parallel kernel browser-render fallbacks |
| LLM_CONCURRENCY | 50 | parallel llm extraction / classification calls |
evals
pnpm test:evals writes a json artifact to test-results/eval-runs/<timestamp>.json.
when an openai api key is present, it runs a live responses-api + remote-mcp
eval. without one, it exits successfully in explicit skip mode and records
the skip in the artifact.
useful env vars:
EVAL_MCP_URLEVAL_MODELEVAL_API_KEYorOPENAI_API_KEY
dev
pnpm install
pnpm dev # watch mode, serves :3000/mcp
pnpm typecheck # tsc --noEmit
pnpm test:unit # deterministic unit tests
pnpm test:http # local http / mcp integration test
pnpm test # unit + http integration tests
pnpm build # compile to dist/
pnpm inspect # mcp-use inspectorprepublishOnly runs pnpm typecheck, pnpm test:unit, pnpm test:http,
and pnpm build, matching the required publish gate.
deploy
deploy to manufact cloud via the mcp-use cli (github-backed):
pnpm deploy # runs the package script: mcp-use deploythe canonical hosted endpoint is https://research-mcp.yigitkonur.com/mcp.
the raw manufact server slug is calm-wave-3gtvb.
or self-host anywhere with node 20.19+ / 22.12+:
HOST=0.0.0.0 ALLOWED_ORIGINS=https://app.example.com pnpm start
# or derive production origin protection from the public mcp url:
NODE_ENV=production MCP_URL=https://research.example.com/mcp pnpm startarchitecture
index.ts server startup, cors, health, shutdown
src/
config/ env parsing, capability detection, lazy proxy config
effect/ typed service tags + Live layers; runExternalEffect()
only accepts fully-provided programs at the async
tool boundary
clients/ provider api clients (serper, jina, kernel, reddit,
scrapedo) — wrapped by Live layers in src/effect/
tools/
registry.ts registerAllTools() — wires the five tools
start-research.ts goal-tailored brief + static playbook + planner
circuit-breaker
search.ts raw/smart search handlers (ctr ranking + optional
llm classification)
scrape.ts raw/smart scrape handlers (reddit api, jina reader,
scrape.do proxy retry, optional kernel, optional
llm extraction)
mcp-helpers.ts markdown response builders
services/
llm-processor.ts llm extraction, classification, brief generation —
primary + fallback model, always low reasoning,
oversized inputs route straight to fallback
markdown-cleaner.ts html/markdown cleanup (readability + turndown)
schemas/ zod v4 input validation per tool
utils/ errors, retry, ctr aggregator, response builders,
logger (stderr-only, mcp-safe)key patterns: capability detection at startup, description-led tool routing
(no bootstrap gate), markdown-only mcp tool output for search/scrape,
raw/smart tool split, tiered classified output in smart-web-search, reddit
api routing in scrape tools, jina reader first for non-reddit urls,
scrape.do proxy-mode retry through X-Proxy-Url, optional kernel
browser-render fallback with scoped session cleanup, bounded concurrency via
Effect.forEach, ctr-based url ranking, tools never throw (always return
toolFailure), and structured errors with retry classification.
license
MIT

