GPT Research
🔍 GPT Research is an autonomous AI research agent that conducts comprehensive research on any topic, searches the web for real-time information, and generates detailed reports with proper citations.
Built with TypeScript and optimized for both local development and serverless deployment (Vercel, AWS Lambda, etc.).
✨ Features
- 🔍 Multi-source Research: Integrates multiple search providers:
  - Tavily - AI-optimized search engine
  - Serper - Google Search API (2,500 free searches/month)
  - Google Custom Search - Direct Google integration
  - DuckDuckGo - Privacy-focused search
- 🌐 Smart Web Scraping: Cheerio and Puppeteer for content extraction
- 🤖 Multiple LLM Support: OpenAI, Anthropic, Google AI, Groq, and more
- 🔌 MCP Integration: Model Context Protocol for external tool connections
- 📊 Various Report Types: Research, Detailed, Summary, Resource, Outline
- 🔄 Streaming Support: Real-time updates via Server-Sent Events
- ⚡ Vercel Optimized: Built for serverless deployment
- 💾 Memory Management: Tracks research context and history
- 💰 Cost Tracking: Monitor LLM usage and costs
🚀 Quick Start
Installation
```bash
npm install gpt-research
# or
yarn add gpt-research
# or
pnpm add gpt-research
```

Configuration
Create a .env file in the root directory:
```env
# Required
OPENAI_API_KEY=your-openai-api-key

# Optional search providers (at least one recommended)
TAVILY_API_KEY=your-tavily-api-key   # https://tavily.com (best for AI research)
SERPER_API_KEY=your-serper-api-key   # https://serper.dev (Google search, 2,500 free/month)
GOOGLE_API_KEY=your-google-api-key   # Google Custom Search
GOOGLE_CX=your-google-custom-search-engine-id

# Optional LLM providers
ANTHROPIC_API_KEY=your-anthropic-api-key
GOOGLE_AI_API_KEY=your-google-ai-api-key
GROQ_API_KEY=your-groq-api-key
```

Basic Usage
```javascript
const { GPTResearch } = require('gpt-research');
// or for TypeScript/ES modules:
// import { GPTResearch } from 'gpt-research';

async function main() {
  const researcher = new GPTResearch({
    query: 'What are the latest developments in quantum computing?',
    reportType: 'research_report',
    llmProvider: 'openai',
    apiKeys: {
      openai: process.env.OPENAI_API_KEY,
      tavily: process.env.TAVILY_API_KEY
    }
  });

  // Conduct research
  const result = await researcher.conductResearch();

  console.log(result.report);
  console.log(`Sources used: ${result.sources.length}`);
  console.log(`Cost: $${result.costs.total.toFixed(4)}`);
}

main().catch(console.error);
```

Streaming Research
```javascript
const researcher = new GPTResearch(config);

// Stream research updates in real-time
for await (const update of researcher.streamResearch()) {
  switch (update.type) {
    case 'progress':
      console.log(`[${update.progress}%] ${update.message}`);
      break;
    case 'data':
      if (update.data?.reportChunk) {
        process.stdout.write(update.data.reportChunk);
      }
      break;
    case 'complete':
      console.log('\nResearch complete!');
      break;
  }
}
```

🔧 Configuration Options
```typescript
interface ResearchConfig {
  // Required
  query: string;                 // Research query

  // Report configuration
  reportType?: ReportType;       // Type of report to generate
  reportFormat?: ReportFormat;   // Output format (markdown, pdf, docx)
  tone?: Tone;                   // Writing tone

  // LLM configuration
  llmProvider?: string;          // LLM provider (openai, anthropic, etc.)
  smartLLMModel?: string;        // Model for complex tasks
  fastLLMModel?: string;         // Model for simple tasks
  temperature?: number;          // Generation temperature
  maxTokens?: number;            // Max tokens per generation

  // Search configuration
  defaultRetriever?: string;     // Default search provider
  maxSearchResults?: number;     // Max results per search

  // Scraping configuration
  defaultScraper?: string;       // Default scraper (cheerio, puppeteer)
  scrapingConcurrency?: number;  // Concurrent scraping operations

  // API keys
  apiKeys?: {
    openai?: string;
    tavily?: string;
    serper?: string;
    google?: string;
    anthropic?: string;
    groq?: string;
  };
}
```

📋 Report Types
- ResearchReport: Comprehensive research with citations
- DetailedReport: In-depth analysis with extensive coverage
- QuickSummary: Concise overview of key points
- ResourceReport: Curated list of resources and references
- OutlineReport: Structured outline for further research
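Pass the desired type through the reportType option. A minimal sketch; only 'research_report' is confirmed by the Basic Usage example above, and the other snake_case identifiers are assumptions inferred from the type names:

```javascript
// 'research_report' appears in Basic Usage above; the identifier
// below is assumed from the QuickSummary type name.
const researcher = new GPTResearch({
  query: 'Impact of rising interest rates on startups',
  reportType: 'quick_summary' // assumed identifier for QuickSummary
});
```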
🔍 Search Providers
Available Providers
| Provider | Best For | Free Tier | API Key Required |
|----------|----------|-----------|------------------|
| Tavily | AI-optimized research | 1,000/month | Yes - Get Key |
| Serper | Google search results | 2,500/month | Yes - Get Key |
| Google | Custom search | 100/day | Yes - Setup |
| DuckDuckGo | Privacy-focused | Unlimited | No |
Choosing the Right Provider
- Tavily: Best for AI research, academic papers, technical topics
- Serper: Best for current events, general web search, Google quality
- Google Custom Search: Best for specific domains, controlled results
- DuckDuckGo: Best for privacy-sensitive research, no API needed
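Since DuckDuckGo requires no search API key, it makes a handy zero-configuration fallback. A minimal sketch; the 'duckduckgo' retriever identifier is an assumption inferred from the provider list, and an LLM key is still required:

```javascript
// Keyless web search via DuckDuckGo ('duckduckgo' identifier is assumed);
// only the LLM provider needs an API key.
const researcher = new GPTResearch({
  query: 'Recent advances in battery recycling',
  defaultRetriever: 'duckduckgo',
  apiKeys: { openai: process.env.OPENAI_API_KEY }
});
```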
Using Multiple Providers
```javascript
// Configure multiple providers for redundancy
const researcher = new GPTResearch({
  query: 'Your research topic',
  retrievers: ['tavily', 'serper'], // falls back if one fails
  apiKeys: {
    tavily: process.env.TAVILY_API_KEY,
    serper: process.env.SERPER_API_KEY
  }
});
```

🔌 MCP (Model Context Protocol) Support
GPT Research now supports MCP for connecting to external tools and services!
What is MCP?
MCP (Model Context Protocol) is a standardized protocol for connecting AI systems to external tools and data sources. It enables seamless integration with various services through a unified interface.
MCP Features
- Stdio MCP Servers - Local process spawning for NPX/binary tools (Node.js/Docker/VPS)
- HTTP MCP Servers - RESTful API connections (works everywhere including Vercel)
- WebSocket MCP - Real-time bidirectional communication (works everywhere)
- Tool Discovery - Automatic discovery of available tools from all server types
- Smart Selection - AI-powered tool selection based on research query
- Streaming Updates - Real-time progress tracking via SSE
- Mixed Mode - Combine stdio, HTTP, and WebSocket in the same application
MCP Usage Examples
HTTP/WebSocket MCP (Works everywhere including Vercel)
```javascript
const researcher = new GPTResearch({
  query: "Latest AI developments",
  mcpConfigs: [
    {
      name: "research-tools",
      connectionType: "http",
      connectionUrl: "https://mcp.example.com",
      connectionToken: process.env.MCP_TOKEN
    }
  ],
  useMCP: true
});
```

Stdio MCP (Local tools - Node.js environments)
```javascript
const researcher = new GPTResearch({
  query: "Analyze this codebase",
  mcpConfigs: [
    {
      name: "filesystem",
      connectionType: "stdio",
      command: "npx",
      args: ["@modelcontextprotocol/filesystem-server"],
      env: { READ_ONLY: "false" }
    },
    {
      name: "git",
      connectionType: "stdio",
      command: "git-mcp",
      args: ["--repo", "."]
    }
  ]
});
```

Mixed Mode (Combine all connection types)
```javascript
const researcher = new GPTResearch({
  query: "Research topic",
  mcpConfigs: [
    // Local tools via stdio
    { name: "local-fs", connectionType: "stdio", command: "npx", args: ["fs-mcp"] },
    // Remote API via HTTP
    { name: "api", connectionType: "http", connectionUrl: "https://api.example.com/mcp" },
    // Real-time via WebSocket
    { name: "stream", connectionType: "websocket", connectionUrl: "wss://realtime.example.com" }
  ]
});
```

MCP Deployment Compatibility
| MCP Type | Local/Node.js | Vercel | Docker | VPS/Cloud |
|----------|---------------|--------|--------|-----------|
| HTTP Servers | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| WebSocket | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Stdio | ✅ Full | ❌ Not supported | ✅ Full | ✅ Full |
Stdio MCP Notes:
- Works perfectly in Node.js, Docker, VPS, and self-hosted environments
- Not supported on Vercel, AWS Lambda, or other serverless platforms
- For serverless deployments, use HTTP/WebSocket MCP or deploy a proxy server
Popular Stdio MCP Servers
These MCP servers can be run locally via stdio:
```bash
# File system access
npx @modelcontextprotocol/filesystem-server

# Git repository tools
npx @modelcontextprotocol/git-server

# Database query execution
npm install -g mcp-database
mcp-database

# Custom Python MCP server
python -m mcp.server

# Shell command execution
cargo install mcp-shell
mcp-shell
```

Learn More
- See examples/demo-mcp.js for the HTTP/WebSocket demo
- See examples/demo-mcp-stdio.js for the stdio demo
- Read MCP.md for implementation details
- Check the MCP Specification for protocol docs
🌐 Vercel Deployment
API Routes
Create API routes in your Next.js/Vercel project:
```javascript
// api/research/route.js
import { GPTResearch } from 'gpt-research';

export async function POST(request) {
  const { query, reportType } = await request.json();

  const researcher = new GPTResearch({
    query,
    reportType,
    apiKeys: {
      openai: process.env.OPENAI_API_KEY,
      tavily: process.env.TAVILY_API_KEY
    }
  });

  const result = await researcher.conductResearch();
  return Response.json(result);
}
```

Streaming API
```javascript
// api/research/stream/route.js
import { GPTResearch } from 'gpt-research';

export async function POST(request) {
  const { query } = await request.json();
  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    async start(controller) {
      const researcher = new GPTResearch({ query });
      for await (const update of researcher.streamResearch()) {
        // SSE frames must be encoded to bytes for the Response body
        controller.enqueue(encoder.encode(`data: ${JSON.stringify(update)}\n\n`));
      }
      controller.close();
    }
  });

  return new Response(stream, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive'
    }
  });
}
```
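On the client, the stream can be consumed with fetch and a stream reader. A minimal sketch, assuming the route above is deployed at /api/research/stream; the parsing is simplified and assumes each SSE event arrives in a whole chunk:

```javascript
// Read the SSE stream emitted by the streaming route above.
const response = await fetch('/api/research/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ query: 'Your research topic' })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Each "data: {...}" line carries one research update
  for (const line of decoder.decode(value, { stream: true }).split('\n')) {
    if (line.startsWith('data: ')) {
      const update = JSON.parse(line.slice(6));
      console.log(update.type, update.message ?? '');
    }
  }
}
```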
Environment Variables
Add to your Vercel project settings:

```env
OPENAI_API_KEY=your-key
TAVILY_API_KEY=your-key
SERPER_API_KEY=your-key
```

🧪 Examples
```bash
# Basic example
npm run example

# OpenAI-only example (no web search)
npm run example:simple

# Full research with Tavily web search
npm run example:tavily

# Research using Serper (Google Search API)
npm run example:serper
```

Check the examples/ directory for more detailed usage examples.
🎯 Use Cases
- Market Research: Analyze competitors, trends, and market opportunities
- Academic Research: Gather and synthesize information for papers and studies
- Content Creation: Research topics thoroughly for articles and blog posts
- Technical Documentation: Research technical topics and generate comprehensive guides
- Due Diligence: Conduct thorough research on companies, people, or topics
- News Aggregation: Gather and summarize news from multiple sources
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
📝 License
MIT License - see LICENSE file for details.
📊 Performance Considerations
- Token Limits: Automatically manages context within token limits
- Concurrent Operations: Configurable concurrency for searches and scraping
- Cost Optimization: Uses appropriate models for different tasks
- Caching: Caches scraped content to avoid redundant operations
- Memory Management: Efficient in-memory storage with export/import capabilities
🔐 Security
- API Key Management: Never commit API keys to version control
- Input Validation: All URLs and inputs are validated
- Rate Limiting: Built-in rate limiting for API calls
- Error Handling: Comprehensive error handling and recovery
🎯 Roadmap
- [ ] Add multi-language support
- [ ] Add more LLM providers (Cohere, Together AI)
- [ ] Implement research templates
- [ ] Add PDF and DOCX report export
💡 Tips
- Use Tavily for best results - It's specifically designed for AI research
- Configure multiple search providers - Automatic fallback ensures reliability
- Adjust concurrency based on your limits - Prevent rate limiting (see the sketch after this list)
- Use streaming for long research - Better user experience
- Monitor costs - Track LLM usage to manage expenses
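For example, a rate-limit-friendly configuration using only options documented above; the specific values and the model name are illustrative assumptions, not recommendations:

```javascript
// Conservative settings to stay under provider rate limits;
// the numbers and model name below are illustrative only.
const researcher = new GPTResearch({
  query: 'Your research topic',
  retrievers: ['tavily', 'serper'], // automatic fallback for reliability
  maxSearchResults: 5,              // fewer results per search
  scrapingConcurrency: 2,           // throttle concurrent scrapes
  fastLLMModel: 'gpt-4o-mini',      // cheaper model for simple tasks
  apiKeys: {
    openai: process.env.OPENAI_API_KEY,
    tavily: process.env.TAVILY_API_KEY,
    serper: process.env.SERPER_API_KEY
  }
});
```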
🆘 Troubleshooting
Common Issues
- Build Errors: Make sure you have Node.js 18+ and run npm install
- API Key Errors: Verify that your API keys in .env are correct
- Rate Limiting: Reduce scrapingConcurrency and maxSearchResults
- Memory Issues: For large research runs, increase the Node.js memory limit:

```bash
node --max-old-space-size=4096 your-script.js
```

📧 Support
- Issues: GitHub Issues
⭐ Show Your Support
If you find GPT Research helpful, please consider:
- Giving us a star on GitHub
- Sharing with your network
- Contributing to the project
