mcp-researchpowerpack
v3.9.5
The ultimate research MCP toolkit: Reddit mining, web search with CTR aggregation, AI-powered deep research, and intelligent web scraping - all in one modular package
An MCP server that gives Claude, Cursor, Windsurf, and any MCP-compatible AI assistant a complete research toolkit. Google search, Reddit deep-dives, web scraping with AI extraction, and multi-model deep research — all as tools that chain into each other.
Zero config to start. Each API key you add unlocks more capabilities.
Tools
| Tool | What it does | Requires |
|:-----|:-------------|:---------|
| web_search | Parallel Google search across 3–100 keywords with CTR-weighted ranking and consensus detection | SERPER_API_KEY |
| search_reddit | Same search engine filtered to reddit.com — 10–50 queries in parallel | SERPER_API_KEY |
| get_reddit_post | Fetch 2–50 Reddit posts with full comment trees, smart comment budget allocation | REDDIT_CLIENT_ID + REDDIT_CLIENT_SECRET |
| scrape_links | Scrape 1–50 URLs with JS rendering fallback, HTML→Markdown, optional AI extraction | SCRAPEDO_API_KEY |
| deep_research | Send questions to research-capable models (Grok, Gemini) with web search, file attachments | OPENROUTER_API_KEY |
Tools are designed to chain: web_search → scrape_links → search_reddit → get_reddit_post → deep_research for synthesis. Each tool suggests the next logical step in its output.
Quick Start
Claude Desktop / Claude Code
Add to your MCP config (on macOS: ~/Library/Application Support/Claude/claude_desktop_config.json):
{
"mcpServers": {
"research-powerpack": {
"command": "npx",
"args": ["-y", "mcp-research-powerpack"],
"env": {
"SERPER_API_KEY": "your-key-here",
"OPENROUTER_API_KEY": "your-key-here"
}
}
}
}
Cursor
Add to .cursor/mcp.json in your project:
{
"mcpServers": {
"research-powerpack": {
"command": "npx",
"args": ["-y", "mcp-research-powerpack"],
"env": {
"SERPER_API_KEY": "your-key-here"
}
}
}
}
From Source
git clone https://github.com/yigitkonur/mcp-research-powerpack.git
cd mcp-research-powerpack
pnpm install && pnpm build
pnpm start
HTTP Transport
MCP_TRANSPORT=http MCP_PORT=3000 npx mcp-research-powerpack
Exposes a /mcp endpoint (POST/GET/DELETE with session headers) and a /health endpoint.
API Keys
Each key unlocks a capability. Missing keys silently disable their tools — the server never crashes.
| Variable | Enables | Free Tier |
|:---------|:--------|:----------|
| SERPER_API_KEY | web_search, search_reddit | 2,500 searches/mo — serper.dev |
| REDDIT_CLIENT_ID + REDDIT_CLIENT_SECRET | get_reddit_post | Unlimited — reddit.com/prefs/apps (script type) |
| SCRAPEDO_API_KEY | scrape_links | 1,000 credits/mo — scrape.do |
| OPENROUTER_API_KEY | deep_research, LLM extraction | Pay-per-token — openrouter.ai |
| CEREBRAS_API_KEY | Cerebras LLM extraction | — |
| USE_CEREBRAS | Cerebras extraction (set to true to enable) | — (defaults to false) |
Configuration
Optional tuning via environment variables:
| Variable | Default | Description |
|:---------|:--------|:------------|
| RESEARCH_MODEL | x-ai/grok-4-fast | Primary deep research model |
| RESEARCH_FALLBACK_MODEL | google/gemini-2.5-flash | Fallback when primary fails |
| LLM_EXTRACTION_MODEL | openai/gpt-oss-120b:nitro | Model for scrape/reddit AI extraction |
| DEFAULT_REASONING_EFFORT | high | Research depth: low, medium, high |
| DEFAULT_MAX_URLS | 100 | Max search results per research question (10–200) |
| API_TIMEOUT_MS | 1800000 | Request timeout in ms (default: 30 min) |
| MCP_TRANSPORT | stdio | Transport mode: stdio or http |
| MCP_PORT | 3000 | Port for HTTP mode |
| USE_CEREBRAS | false | Set to true to use Cerebras for extraction instead of OpenRouter |
| CEREBRAS_API_KEY | — | API key for Cerebras cloud — cloud.cerebras.ai |
Cerebras Support
When USE_CEREBRAS=true is set and CEREBRAS_API_KEY is provided, the scrape_links tool uses Cerebras (Z.ai GLM 4.7) for AI content extraction instead of OpenRouter. This provides:
- Ultra-fast extraction — Cerebras inference is optimized for speed
- Independent from OpenRouter — extraction works even without OPENROUTER_API_KEY
- Automatic fallback — if Cerebras is not configured, falls back to OpenRouter
# Enable Cerebras for extraction
USE_CEREBRAS=true CEREBRAS_API_KEY=your-key npx mcp-research-powerpack
Network Resilience
All LLM API calls include built-in stability protections:
- Request deadlines — hard timeout prevents calls from hanging indefinitely
- Stall detection — if no response arrives within a threshold, the request is aborted and retried
- Exponential backoff — transient failures (429, 5xx) retry with jitter to avoid thundering herd
- Connection loss recovery — network errors (ECONNRESET, ECONNREFUSED) trigger automatic retry
- Graceful degradation — all tools return structured errors instead of crashing
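The backoff and retry-classification behavior described above can be sketched roughly as follows. This is an illustrative sketch, not the package's actual src/utils/retry.ts; the helper names and constants here are assumptions.

```typescript
// Illustrative exponential backoff with full jitter (names and constants assumed).
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt); // exponential growth, capped
  return Math.random() * exp;                          // full jitter avoids thundering herd
}

// Classify which failures are worth retrying, per the list above.
function isRetryable(status: number | undefined, code?: string): boolean {
  if (code === "ECONNRESET" || code === "ECONNREFUSED") return true; // connection loss
  return status === 429 || (status !== undefined && status >= 500);  // rate limit / 5xx
}
```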
How It Works
Search Ranking
Results from multiple queries are deduplicated by normalized URL and scored using CTR-weighted position values (position 1 = 100.0, position 10 = 12.56). URLs appearing across multiple queries get a consensus marker. Frequency threshold starts at ≥3, falls back to ≥2, then ≥1 to ensure results.
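A minimal sketch of this aggregation, assuming a power-law weight curve back-solved from the two documented anchors (position 1 = 100.0, position 10 = 12.56); the real weight table in src/utils/url-aggregator.ts may differ:

```typescript
type Hit = { url: string; position: number };

// Power-law decay fitted to the two documented anchor weights (assumption).
const ctrWeight = (pos: number) => 100 * Math.pow(pos, -Math.log10(100 / 12.56));

// Normalize URLs so the same page dedupes across queries (illustrative rule).
function normalizeUrl(u: string): string {
  const { host, pathname } = new URL(u);
  return host.replace(/^www\./, "") + pathname.replace(/\/$/, "");
}

function aggregate(queries: Hit[][]): Array<[string, { score: number; freq: number }]> {
  const byUrl = new Map<string, { score: number; freq: number }>();
  for (const hits of queries) {
    for (const h of hits) {
      const key = normalizeUrl(h.url);
      const e = byUrl.get(key) ?? { score: 0, freq: 0 };
      e.score += ctrWeight(h.position);
      e.freq += 1;
      byUrl.set(key, e);
    }
  }
  // Consensus threshold: prefer URLs seen in >=3 queries, relax to >=2, then >=1.
  for (const min of [3, 2, 1]) {
    const kept = [...byUrl.entries()].filter(([, e]) => e.freq >= min);
    if (kept.length > 0) return kept.sort((a, b) => b[1].score - a[1].score);
  }
  return [];
}
```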
Reddit Comment Budget
Global budget of 1,000 comments, max 200 per post. After the first pass, surplus from posts with fewer comments is redistributed to truncated posts in a second fetch pass.
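The two-pass allocation can be sketched like this (function and constant names are hypothetical; the real logic lives in the Reddit handler):

```typescript
const GLOBAL_BUDGET = 1000; // total comments across all posts
const PER_POST_CAP = 200;   // hard ceiling per post

// Given each post's available comment count, return how many to fetch per post.
function allocate(commentCounts: number[]): number[] {
  const n = commentCounts.length;
  const fair = Math.min(PER_POST_CAP, Math.floor(GLOBAL_BUDGET / n));
  // Pass 1: each post takes up to its fair share.
  const alloc = commentCounts.map((c) => Math.min(c, fair));
  // Pass 2: redistribute surplus from small posts to truncated posts.
  let surplus = GLOBAL_BUDGET - alloc.reduce((a, b) => a + b, 0);
  for (let i = 0; i < n && surplus > 0; i++) {
    const want = Math.min(commentCounts[i], PER_POST_CAP) - alloc[i];
    const extra = Math.min(want, surplus);
    alloc[i] += extra;
    surplus -= extra;
  }
  return alloc;
}
```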
Scraping Pipeline
Three-mode fallback per URL: basic → JS rendering → JS + US geo-targeting. Results go through HTML→Markdown conversion (Turndown), then optional AI extraction with a 100K char input cap and 8,000 token output per URL.
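A sketch of the per-URL fallback chain; the mode shape and parameter names are assumptions (the real client in src/clients/scraper.ts is asynchronous and calls the Scrape.do API), and a synchronous fetcher is injected here to keep the sketch minimal:

```typescript
type Mode = { render: boolean; geoCode?: string };

// The three documented modes, tried in order (field names are assumptions).
const MODES: Mode[] = [
  { render: false },               // 1. basic fetch
  { render: true },                // 2. JS rendering fallback
  { render: true, geoCode: "us" }, // 3. JS + US geo-targeting
];

function scrapeWithFallback(
  url: string,
  fetchOnce: (url: string, mode: Mode) => string | null,
): string | null {
  for (const mode of MODES) {
    const html = fetchOnce(url, mode);
    if (html !== null) return html; // first mode that yields content wins
  }
  return null; // all three modes failed for this URL
}
```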
Deep Research
32,000 token budget divided across questions (1 question = 32K, 10 questions = 3.2K each). Gemini models get google_search tool access. Grok/Perplexity get search_parameters with citations. Primary model fails → automatic fallback to secondary model.
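The budget split described above is a straight division, sketched here with assumed names:

```typescript
// Sketch of the documented split: one shared 32K-token budget divided evenly
// across the questions in a single deep_research call.
const TOTAL_TOKEN_BUDGET = 32_000;

const perQuestionTokens = (questionCount: number): number =>
  Math.floor(TOTAL_TOKEN_BUDGET / questionCount);
```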
File Attachments
deep_research can read local files and include them as context. Files over 600 lines are smart-truncated (first 500 + last 100 lines). Line ranges supported. Line numbers preserved in output.
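The truncation rule can be sketched as follows (the function name and omission marker are hypothetical; the documented behavior is first 500 + last 100 lines for files over 600 lines, with line numbers preserved):

```typescript
// Keep the head and tail of long files, prefixing each kept line with its
// original 1-based line number so references stay valid after truncation.
function smartTruncate(lines: string[], head = 500, tail = 100): string[] {
  const numbered = lines.map((l, i) => `${i + 1}: ${l}`);
  if (lines.length <= head + tail) return numbered; // short file: keep everything
  return [
    ...numbered.slice(0, head),
    `… (${lines.length - head - tail} lines omitted) …`,
    ...numbered.slice(-tail),
  ];
}
```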
Concurrency
| Operation | Parallel Limit |
|:----------|:---------------|
| Web search keywords | 8 |
| Reddit search queries | 8 |
| Reddit post fetches per batch | 5 (batches of 10) |
| URL scraping per batch | 10 (batches of 30) |
| LLM extraction | 3 |
| Deep research questions | 3 |
All clients use manual retry with exponential backoff and jitter. The OpenAI SDK's built-in retry is disabled (maxRetries: 0).
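A minimal bounded-parallel map in the spirit of the pMap utility mentioned above (illustrative only; the real implementation lives in src/utils/concurrency.ts):

```typescript
// Run `fn` over `items` with at most `limit` promises in flight, preserving order.
async function pMap<T, R>(
  items: T[],
  fn: (item: T) => Promise<R>,
  limit: number,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  // Spawn `limit` workers that each pull the next index until the queue drains.
  const workers = Array.from({ length: Math.min(limit, items.length) }, async () => {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  });
  await Promise.all(workers);
  return results;
}
```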
Architecture
src/
├── index.ts Entry point — STDIO + HTTP transport, graceful shutdown
├── worker.ts Cloudflare Workers entry (Durable Objects)
├── config/
│ ├── index.ts Env parsing, capability detection, lazy Proxy config
│ ├── loader.ts YAML → Zod → JSON Schema pipeline
│ └── yaml/tools.yaml Single source of truth for tool definitions
├── schemas/ Zod input validation (deep-research, scrape-links, web-search)
├── tools/
│ ├── registry.ts Tool lookup → capability check → validate → execute
│ ├── search.ts web_search handler
│ ├── reddit.ts search_reddit + get_reddit_post handlers
│ ├── scrape.ts scrape_links handler
│ └── research.ts deep_research handler
├── clients/
│ ├── search.ts Google Serper API client
│ ├── reddit.ts Reddit OAuth + comment tree parser
│ ├── scraper.ts Scrape.do client with fallback modes
│ └── research.ts OpenRouter client with model-specific handling
├── services/
│ ├── llm-processor.ts Shared LLM extraction (singleton OpenAI client)
│ ├── markdown-cleaner.ts HTML → Markdown via Turndown
│ └── file-attachment.ts Local file reading with line ranges
└── utils/
├── retry.ts Shared backoff + retry constants
├── concurrency.ts Bounded parallel execution (pMap, pMapSettled)
├── url-aggregator.ts CTR-weighted scoring + consensus detection
├── errors.ts Error classification + structured errors
├── logger.ts MCP logging protocol
└── response.ts Standardized 70/20/10 output formatting
Deploy
Cloudflare Workers
npx wrangler deploy
Uses Durable Objects with SQLite storage. YAML-based tool definitions are replaced with inline definitions, since there's no filesystem in Workers.
npm
Published as mcp-research-powerpack. Binary names: mcp-research-powerpack, research-powerpack-mcp.
Development
pnpm install # Install dependencies
pnpm dev # Run with tsx (live TypeScript)
pnpm build # Compile to dist/
pnpm typecheck # Type-check without emitting
pnpm start # Run compiled output
Testing
pnpm test:web-search # Test web search tool
pnpm test:reddit-search # Test Reddit search
pnpm test:scrape-links # Test scraping
pnpm test:deep-research # Test deep research
pnpm test:all # Run all tests
pnpm test:check # Check environment setup
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Run pnpm typecheck && pnpm build to verify
- Commit (git commit -m 'feat: add amazing feature')
- Push to your branch (git push origin feature/amazing-feature)
- Open a Pull Request
