# 🤖 DeepResearch Agent with LangGraph🦜🕸️
DeepResearch Agent with LangGraph, using any LLM model, search engine, or RAG retrieval.
> [!NOTE]
> This package is part of the search_with_ai monorepo. The original standalone repository is archived at deepresearch.
The code logic is based on Google's Gemini LangGraph project.
A LangGraph-powered research agent that performs comprehensive web research through dynamic query generation, iterative refinement, and citation-supported responses using any LLM model, search engine, or RAG retrieval.
```bash
# Node.js
npm install deepsearcher
# or: yarn add deepsearcher
```

## Features
- 🧠 Deep Research Agent based on LangGraph
- 🔍 Dynamic search query generation using any LLM model
- 🌐 Integrated web research through web search engines or RAG retrieval
- 🤔 Reflective reasoning that identifies knowledge gaps and refines searches
- 📄 Generate cited answers based on collected sources
## 🚀 Getting Started
```ts
import { DeepResearch, type SearcherFunction } from 'deepsearcher';

// Search engine adapter: you must provide a searcher function,
// which can wrap a web search engine or a RAG retrieval backend.
const searcher: SearcherFunction = async ({ query, id }) => {
  // ... call your search backend here and return its results
};

const instance = new DeepResearch({
  searcher,
  // LLM provider options
  options: {
    type: 'openai', // 'openai' | 'anthropic' | 'gemini' | 'vertexai'
    apiKey: 'YOUR_API_KEY',
    baseURL: 'https://api.openai.com/v1', // optional, for custom endpoints
    systemPrompt: 'You are a helpful research assistant.', // optional, default provided
    temperature: 0.1, // optional, default 0.1, controls randomness (0.0-2.0)
  },
});

// Compile the LangGraph graph
const agent = await instance.compile();

// Enable debug logging (optional)
agent.debug = true;

// Run the agent with the stream() method
const chunks = await agent.stream(
  {
    messages: [{
      role: 'user',
      content: 'How to use LangGraph to build intelligent agents?',
    }],
  },
  {
    streamMode: 'updates',
    // Runtime configuration
    configurable: {
      maxResearchLoops: 3, // default 3
      numberOfInitialQueries: 3, // default 3
      // Required model parameters (the same model can be used for all three)
      queryGeneratorModel: 'gpt-4o-mini',
      reflectionModel: 'gpt-4o-mini',
      answerModel: 'gpt-4o-mini',
    },
  }
);

for await (const chunk of chunks) {
  console.log('chunk', chunk);
}
```
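For reference, here is one way a `searcher` might be implemented. This is a minimal sketch that assumes a hypothetical HTTP search endpoint and a `{ results: [...] }` response shape; the exact value your function must return is defined by the package's `SearcherFunction` type, so check its declaration before adapting this.

```ts
import { type SearcherFunction } from 'deepsearcher';

// Minimal web-search adapter sketch. The endpoint URL and the result
// shape below are hypothetical -- adapt both to your search backend
// and to the return type declared by SearcherFunction.
const searcher: SearcherFunction = async ({ query, id }) => {
  const res = await fetch('https://example.com/api/search', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query }),
  });
  if (!res.ok) throw new Error(`Search request failed: ${res.status}`);
  const data = await res.json();
  // Assumed response shape: { results: Array<{ title, url, content }> }
  return data.results;
};
```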
## Multi-Provider Support

The package supports multiple LLM providers:
### OpenAI

```ts
const instance = new DeepResearch({
  searcher,
  options: {
    type: 'openai',
    apiKey: 'YOUR_OPENAI_API_KEY',
    temperature: 0.1,
  },
});
```

### Anthropic (Claude)
```ts
const instance = new DeepResearch({
  searcher,
  options: {
    type: 'anthropic',
    apiKey: 'YOUR_ANTHROPIC_API_KEY',
    temperature: 0.1,
  },
});
```

### Google Gemini
```ts
const instance = new DeepResearch({
  searcher,
  options: {
    type: 'gemini',
    apiKey: 'YOUR_GOOGLE_API_KEY',
    temperature: 0.1,
  },
});
```

### Google VertexAI
```ts
const instance = new DeepResearch({
  searcher,
  options: {
    type: 'vertexai',
    apiKey: 'YOUR_VERTEXAI_API_KEY',
    temperature: 0.1,
  },
});
```
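Because `options.type` is a plain string union, the provider can also be chosen at runtime, for example from an environment variable. A small sketch (the environment variable names are illustrative, not part of the package):

```ts
// Pick the provider from the environment, defaulting to OpenAI.
const providerType = (process.env.LLM_PROVIDER ?? 'openai') as
  'openai' | 'anthropic' | 'gemini' | 'vertexai';

const instance = new DeepResearch({
  searcher,
  options: {
    type: providerType,
    apiKey: process.env.LLM_API_KEY!,
    temperature: 0.1,
  },
});
```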
## Configuration Options

### DeepResearchOptions
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| type | 'openai' \| 'anthropic' \| 'gemini' \| 'vertexai' | 'openai' | LLM provider to use |
| apiKey | string | - | API key for the LLM provider |
| baseURL | string | - | Custom API endpoint (optional) |
| systemPrompt | string | 'You are a helpful research assistant.' | System prompt for the agent |
| temperature | number | 0.1 | Controls randomness (0.0 = deterministic, 2.0 = very random) |
### Runtime Configuration
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| maxResearchLoops | number | 3 | Maximum number of research iterations |
| numberOfInitialQueries | number | 3 | Number of initial search queries to generate |
| queryGeneratorModel | string | - | Model for generating search queries |
| reflectionModel | string | - | Model for analyzing research gaps |
| answerModel | string | - | Model for generating final answer |
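Because the three model roles are configured independently, one common pattern is to use a cheaper model for the high-volume intermediate steps and a stronger model only for the final synthesis. The model names below are examples only:

```ts
const stream = await agent.stream(
  { messages: [{ role: 'user', content: 'Compare LangGraph with plain LangChain agents.' }] },
  {
    streamMode: 'updates',
    configurable: {
      maxResearchLoops: 2,
      numberOfInitialQueries: 2,
      // Cheaper model for query generation and reflection...
      queryGeneratorModel: 'gpt-4o-mini',
      reflectionModel: 'gpt-4o-mini',
      // ...and a stronger model for the final answer.
      answerModel: 'gpt-4o',
    },
  }
);
```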
## How to stream from a target node
```ts
import { NodeEnum } from 'deepsearcher';

// Use the stream() method with streamMode: 'messages'
const stream = await agent.stream(
  {
    messages: [{
      role: 'user',
      content: 'How to use LangGraph to build intelligent agents?',
    }],
  },
  {
    streamMode: 'messages',
    configurable: {
      maxResearchLoops: 2,
      numberOfInitialQueries: 2,
      queryGeneratorModel: 'gpt-4o-mini',
      reflectionModel: 'gpt-4o-mini',
      answerModel: 'gpt-4o-mini',
    },
  }
);

for await (const chunk of stream) {
  const [message, metadata] = chunk;
  // Stream tokens from the 'finalize_answer' node only
  if (metadata.langgraph_node === NodeEnum.FinalizeAnswer) {
    console.log(message.content);
  }
}
```

Below are the definitions of the nodes and commonly used event names:
```ts
export enum NodeEnum {
  GenerateQuery = 'generate_query',
  Research = 'research',
  Reflection = 'reflection',
  FinalizeAnswer = 'finalize_answer',
}
```

For detailed usage documentation, please refer to: streaming-from-final-node
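`NodeEnum` is also useful for simple progress reporting with `streamMode: 'updates'`. In LangGraph, each update chunk is typically an object keyed by the node that just ran; the sketch below assumes that behavior (the exact payload depends on the graph's state shape):

```ts
import { NodeEnum } from 'deepsearcher';

// Human-readable labels for each node of the research graph.
const labels: Record<string, string> = {
  [NodeEnum.GenerateQuery]: 'Generating search queries...',
  [NodeEnum.Research]: 'Researching...',
  [NodeEnum.Reflection]: 'Analyzing knowledge gaps...',
  [NodeEnum.FinalizeAnswer]: 'Writing the final answer...',
};

// Reuses the `chunks` stream from the Getting Started example
// (streamMode: 'updates').
for await (const chunk of chunks) {
  for (const node of Object.keys(chunk)) {
    if (labels[node]) console.log(labels[node]);
  }
}
```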
## How the Agent Works

1. **Generate Initial Queries**: Based on your input, the agent generates a set of initial search queries using an LLM.
2. **Research**: For each query, it uses the LLM together with your Search API (the `SearcherFunction`) to find relevant knowledge.
3. **Reflection & Knowledge Gap Analysis**: The agent analyzes the search results with an LLM to determine whether the information is sufficient or knowledge gaps remain.
4. **Iterative Refinement**: If gaps are found or the information is insufficient, it generates follow-up queries and repeats the research and reflection steps, up to the configured maximum number of loops.
5. **Finalize Answer**: Once the research is deemed sufficient, the agent uses an LLM to synthesize the gathered information into a coherent answer, including citations from the collected sources.

A simplified sketch of this loop is shown below.
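The following sketch summarizes the control flow with illustrative types and helper signatures; it is not the package's actual internals:

```ts
// Illustrative types and helpers -- not the package's real internals.
type Source = { url: string; content: string };
type Reflection = { isSufficient: boolean; followUpQueries: string[] };

declare function generateQueries(question: string): Promise<string[]>;
declare function search(query: string): Promise<Source[]>;
declare function reflect(question: string, sources: Source[]): Promise<Reflection>;
declare function finalizeAnswer(question: string, sources: Source[]): Promise<string>;

async function deepResearch(question: string, maxLoops = 3): Promise<string> {
  let queries = await generateQueries(question);            // generate_query
  const sources: Source[] = [];

  for (let loop = 0; loop < maxLoops; loop++) {
    const batches = await Promise.all(queries.map(search)); // research
    sources.push(...batches.flat());

    const reflection = await reflect(question, sources);    // reflection
    if (reflection.isSufficient) break;
    queries = reflection.followUpQueries;                   // iterative refinement
  }

  return finalizeAnswer(question, sources);                 // finalize_answer
}
```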
## License
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
