graph-of-thought
v1.0.2
Published
Advanced Graph-of-Thought implementation with BMSSP algorithm for vectorless document indexing and retrieval. Features selective node activation for efficient processing. Works standalone without API keys or external services. Includes backward-compatible
Downloads
43
Maintainers
Readme
Graph-of-Thought (GoT)
Advanced Document Indexing and Retrieval with Graph-Based Intelligence
Graph-of-Thought is an open-source Node.js library for document indexing and retrieval that works completely standalone - no API keys, no external services, no vector databases required. It combines both Tree-of-Thought (hierarchical) and Graph-of-Thought (networked) approaches with automatic mode selection based on document complexity.
The Problem It Solves
Traditional RAG systems face several challenges:
- Vector Database Dependency: Require expensive vector databases and embeddings
- Poor Cross-Reference Handling: Cannot understand relationships between document sections
- Inefficient Processing: Process all content even when only small portions are relevant
- Setup Complexity: Need complex infrastructure with API keys and external services
Graph-of-Thought solves these by providing vectorless, intelligent document indexing that understands document structure and relationships natively.
RAG System Integration
Graph-of-Thought focuses on the retrieval component of RAG systems:
graph LR
A[User Query] --> B[GoT Index]
B --> C[Relevant Context]
C --> D[LLM Generation]
D --> E[Response]
A -->|"How do I..."| A1(( ))
B -->|Document Structure| B1(( ))
C -->|"Installation, Configuration"| C1(( ))
D -->|"Based on the context..."| D1(( ))
E -->|"To install, run npm install"| E1(( ))
style A fill:#2196f3,color:#ffffff
style B fill:#4caf50,color:#ffffff
style C fill:#ff9800,color:#000000
style D fill:#9c27b0,color:#ffffff
style E fill:#f44336,color:#ffffffGoT replaces traditional vector-based retrieval with intelligent graph/tree traversal, providing better context with zero external dependencies.
Key Features
- Dual Architecture: Tree-based (hierarchical) and Graph-based (networked) indexing
- Automatic Mode Selection: Chooses between tree and graph modes based on document complexity
- Graph Intelligence: Advanced BMSSP (Bounded Multi-Source Shortest Path) algorithm for cross-reference understanding
- Zero Dependencies on External Services - Works offline, no API keys needed
- No Vector Database Required - Uses intelligent graph/tree structures instead
- Selective Node Activation - Only activates nodes relevant to queries, not all nodes at once
- Optimized Retrieval - Tree-like activation patterns within graph structure for efficiency
- Simple Integration - Drop into any existing RAG system
- Open Source - MIT licensed, use it however you want
- Optional LLM Enhancement - Add LLM providers for smarter retrieval (optional)
Installation
npm install graph-of-thoughtThat's it. No API keys to configure, no services to set up.
Quick Start (No API Keys!)
Using Graph-of-Thought (Recommended)
import { GraphOfThought } from 'graph-of-thought';
// Create instance - NO configuration needed (automatically selects best approach)
const got = new GraphOfThought();
// Index your document (auto-selects tree or graph mode based on complexity)
const index = await got.index(`
# User Guide
## Installation
Run npm install to get started.
## Configuration
Edit config.json to customize settings.
## Usage
Import the module and call the main function.
## See Also
For advanced features, see the Advanced Features section.
`, 'User Guide');
// Retrieve with cross-reference understanding
const result = await got.retrieve(index, 'How do I configure after installation?');
console.log(result.context);
// Output includes both Configuration and Installation sections due to cross-referencesUsing Tree-of-Thought (Legacy)
import { TreeOfThought } from 'graph-of-thought'; // Also available in same package
// Create instance - NO configuration needed
const tot = new TreeOfThought();
// Index your document (creates hierarchical tree structure)
const tree = await tot.index(`
# User Guide
## Installation
Run npm install to get started.
## Configuration
Edit config.json to customize settings.
## Usage
Import the module and call the main function.
`, 'User Guide');
// Retrieve relevant content
const result = await tot.retrieve(tree, 'How do I install?');
console.log(result.context);
// Output: "### Installation\nRun npm install to get started."Architecture Overview
System Architecture
graph TB
A[Document Input] --> B{Complexity Analysis}
B --> C[Tree Mode]
B --> D[Graph Mode]
C --> E[Hierarchical Indexing]
D --> F[Graph-based Indexing]
E --> G[BMSSP Algorithm]
F --> G
G --> H[Selective Node Activation]
H --> I[Context Retrieval]
I --> J[Formatted Output]
subgraph "Indexing Phase"
B
C
D
E
F
end
subgraph "Query Phase"
G
H
I
J
endCore Components Architecture
graph LR
A[GraphOfThought] --> B[DocumentParser]
A --> C[GraphIndexer]
A --> D[TreeGenerator]
A --> E[BMSSPAlgorithm]
C --> F[Node Creation]
C --> G[Edge Detection]
C --> H[Relationship Analysis]
D --> I[Tree Building]
D --> J[Hierarchical Structuring]
E --> K[Path Finding]
E --> L[Bounded Search]
E --> M[Multi-source Traversal]
F --> N[Graph Index]
G --> N
H --> N
I --> O[Tree Index]
J --> O
K --> P[Retrieved Context]
L --> P
M --> P
N --> Q[Selective Activation]
O --> Q
Q --> PRAG Integration Flow
sequenceDiagram
participant U as User
participant R as RAG System
participant G as GraphOfThought
participant L as LLM
U->>R: Query: "How to install?"
R->>G: retrieve(index, "install")
G->>G: Analyze query relevance
G->>G: Activate relevant nodes only
G->>G: Apply BMSSP for connected paths
G-->>R: Context with cross-references
R->>L: Generate with context
L-->>R: Response
R-->>U: Final answerTree-of-Thought (ToT)
- Creates hierarchical tree structures with parent-child relationships
- Best for simple, hierarchical documents
- Efficient for linear navigation
- Selective activation: only relevant branches are explored during queries
Graph-of-Thought (GoT)
- Creates networked graph structures with multiple relationship types
- Identifies cross-references, semantic relationships, and connections
- Best for complex documents with interconnections
- Uses BMSSP algorithm for multi-source path finding
- Selective activation: only relevant nodes and paths are explored during queries
1. Document Analysis and Indexing Process
flowchart TD
A[Input Document] --> B[Parse Content]
B --> C[Split into Sections]
C --> D[Analyze Complexity]
D --> E{Complexity Score}
E -->|High| F[Graph Mode Selection]
E -->|Low| G[Tree Mode Selection]
F --> H[Create Graph Nodes]
F --> I[Identify Cross-References]
F --> J[Build Semantic Edges]
H --> K[Graph Index]
I --> K
J --> K
G --> L[Create Tree Nodes]
G --> M[Build Hierarchy]
L --> N[Tree Index]
M --> N
K --> O[Index Storage]
N --> O2. Query Processing Flow
flowchart LR
A[User Query] --> B[Keyword Analysis]
B --> C[Identify Relevant Nodes]
C --> D[Selective Activation]
D --> E{Index Type}
E -->|Graph| F[BMSSP Path Finding]
E -->|Tree| G[Tree Traversal]
F --> H[Multi-source Search]
F --> I[Bounded Depth Search]
F --> J[Path Scoring]
H --> K[Connected Paths]
I --> K
J --> K
G --> L[Branch Exploration]
G --> M[Relevance Scoring]
L --> N[Relevant Branches]
M --> N
K --> O[Context Assembly]
N --> O
O --> P[Formatted Output]3. Selective Node Activation Process
flowchart TD
A[Query Received] --> B[Initial Keyword Matching]
B --> C[Find Seed Nodes]
C --> D[Create Activation Queue]
D --> E[Process Queue Items]
E --> F{Check Relevance}
F -->|Relevant| G[Activate Node]
F -->|Irrelevant| H[Skip Node]
G --> I[Explore Neighbors]
H --> J[Next Queue Item]
I --> K[Add to Results]
I --> L[Queue Connected Nodes]
K --> M[Apply Bounds]
L --> M
M --> N{More Nodes?}
N -->|Yes| E
N -->|No| O[Return Results]Document
├── Tree Mode (for simple docs):
│ ├── Installation (nodeId: abc123)
│ │ └── "Run npm install to get started..."
│ ├── Configuration (nodeId: def456)
│ │ └── "Edit config.json to customize..."
│ └── Usage (nodeId: ghi789)
│ └── "Import the module and call..."
└── Graph Mode (for complex docs):
├── Installation (nodeId: abc123)
├── Configuration (nodeId: def456)
├── Usage (nodeId: ghi789)
└── Edges: [abc123 ↔ def456 (cross-ref)], [def456 ↔ ghi789 (semantic)]2. Selective Node Activation During Retrieval
- When you query, the system identifies initial relevant nodes based on keyword matching
- Only relevant nodes are activated and explored (not all nodes simultaneously)
- For GoT: BMSSP algorithm traverses connected paths from initial nodes
- For ToT: Traverses relevant branches of the tree hierarchy
- Results are bounded by depth/max results to maintain performance
3. Context Formation
- Retrieved sections are formatted as context for your RAG pipeline
- GoT includes connected sections based on document relationships
- ToT includes hierarchical sections based on tree structure
Node Activation Process
The system does NOT activate all nodes simultaneously. Instead:
- Indexing Phase: All nodes are created and stored in memory but remain dormant
- Query Phase: Only nodes relevant to the query are activated:
- Initial relevant nodes identified via keyword matching
- For GoT: BMSSP explores connected nodes within bounded depth
- For ToT: Relevant tree branches are traversed
- Irrelevant nodes remain inactive
Selective Activation Optimization
The optimized BMSSP algorithm implements tree-like activation patterns within the graph structure:
graph LR
A[Query] --> B[Relevance Scoring]
B --> C[Dynamic Thresholding]
C --> D[Selective Activation]
D --> E[Tree-like Traversal]
E --> F[Context Assembly]
style D fill:#2196f3,color:#ffffff
style E fill:#4caf50,color:#ffffffKey Benefits:
- Efficiency: Only 20-40% of nodes typically activated per query
- Performance: Significant speed improvements for large documents
- Scalability: Maintains performance as document size grows
- Intelligence: Preserves graph-based relationships when needed
Activation Criteria:
- Node centrality and importance
- Semantic similarity to query
- Keyword matching relevance
- Edge weight thresholds
- Parent node activation scores
Performance Comparison
| Mode | Nodes Activated | Performance | Best For | |------|----------------|-------------|----------| | Standard | All relevant nodes | Baseline | Small documents | | Optimized | 20-40% of nodes | 2-3x faster | Large documents |
Integration with Existing RAG Systems
Both Tree-of-Thought and Graph-of-Thought are designed to enhance your existing RAG system:
import { GraphOfThought } from 'graph-of-thought';
// Your existing RAG pipeline
class MyRAGSystem {
private got = new GraphOfThought(); // Auto-selects best approach
private indexes: Map<string, any> = new Map();
// Index documents using GoT/ToT
async addDocument(content: string, name: string) {
const index = await this.got.index(content, name);
this.indexes.set(name, index);
}
// Use GoT/ToT for retrieval, then pass to your LLM
async query(question: string) {
// Retrieve from all indexed documents
let allContext = '';
for (const index of this.indexes.values()) {
const result = await this.got.retrieve(index, question);
allContext += result.context + '\n\n';
}
// Pass context to your existing LLM call
return this.yourExistingLLMCall(question, allContext);
}
}API Reference
GraphOfThought (Recommended)
const got = new GraphOfThought(options?: {
debug?: boolean; // Enable debug logging
indexer?: {
maxDepth?: number; // Max search depth (default: 3)
maxResults?: number; // Max results to return (default: 10)
enableCrossReferences?: boolean; // Enable relationship detection
precomputeRelationships?: boolean; // Pre-calculate node metrics
minEdgeWeight?: number; // Min relationship strength (default: 0.1)
};
hybridMode?: {
autoSwitchThreshold?: number; // Complexity threshold for graph mode
fallbackToTree?: boolean; // Use tree mode for simple docs
};
cache?: {
enabled?: boolean; // Enable caching (default: false)
ttlMs?: number; // Cache TTL in milliseconds (default: 1 hour)
};
});Optimization Methods
// Toggle between optimized and standard retrieval modes
got.setOptimizationMode(useOptimized: boolean);
// Get current optimization status
const isOptimized = got.getOptimizationMode();Optimization Modes:
true(default): Selective node activation with tree-like patternsfalse: Standard traversal of all relevant paths
The optimized mode provides 2-3x performance improvements for large documents while maintaining full graph intelligence.
TreeOfThought (Legacy)
const tot = new TreeOfThought(options?: {
debug?: boolean; // Enable debug logging
search?: {
maxResults?: number; // Max results to return (default: 10)
};
generator?: {
maxSummaryLength?: number; // Summary length (default: 200)
};
});Shared Methods
| Method | Description |
|--------|-------------|
| index(content, title, description?) | Index content (auto-selects tree/graph mode) |
| indexFile(filePath, options?) | Index a file (PDF*, MD, HTML, TXT) |
| indexMarkdown(content, title) | Index markdown content |
| retrieve(index, query) | Retrieve relevant content |
| search(index, query) | Search without content retrieval |
| collectAllText(index) | Get all text from index |
| getTreeStats(index) | Get tree statistics (for tree mode) |
| getIndexStats(index) | Get graph statistics (for graph mode) |
| getTreeStructure(index) | Get printable structure |
| findNodesByTitle(index, query) | Find nodes by title |
| getNodeById(index, nodeId) | Get specific node |
*PDF parsing requires optional @cyber2024/pdf-parse-fixed package
Return Types
// Retrieval result
interface RetrievalResult {
context: string; // Formatted context for your LLM
contents: RetrievedContent[]; // Raw retrieved items
searchResult: {
thinking: string; // Search reasoning (GoT) or node list (ToT)
nodeList: string[]; // Matched node IDs
searchTimeMs: number; // Search duration
};
totalTimeMs: number; // Total retrieval time
}
// Index structure (tree or graph)
interface TreeIndex {
title: string;
description?: string;
nodes: TreeNode[];
createdAt: string;
version: string;
}
interface GraphIndex {
title: string;
description?: string;
nodes: GraphNode[];
edges: GraphEdge[]; // Relationships between nodes
metadata: {
createdAt: string;
version: string;
nodeCount: number;
edgeCount: number;
indexType: 'graph' | 'tree';
};
}Advanced Usage
Forcing Specific Indexing Modes
import { GraphOfThought } from 'graph-of-thought';
const got = new GraphOfThought();
// Force graph mode for maximum relationship understanding
const graphIndex = await got.index(content, 'Title', undefined, {
forceMode: 'graph'
});
// Force tree mode for simple hierarchical documents
const treeIndex = await got.index(content, 'Title', undefined, {
forceMode: 'tree'
});Document Analysis
// Analyze document characteristics
const analysis = await got.analyzeDocument(content);
console.log('Complexity score:', analysis.complexityScore);
console.log('Cross-reference ratio:', analysis.crossReferenceRatio);
// Get index statistics
const stats = got.getIndexStats(index);
console.log('Graph density:', stats.density);
console.log('Average node degree:', stats.averageDegree);Indexing Files
// Index different file types
const pdfIndex = await got.indexFile('./document.pdf'); // Requires pdf-parse
const mdIndex = await got.indexFile('./readme.md');
const htmlIndex = await got.indexFile('./page.html');
const txtIndex = await got.indexFile('./notes.txt');Custom Configuration
const got = new GraphOfThought({
indexer: {
maxDepth: 4, // Maximum search depth
maxResults: 15, // Results to return
enableCrossReferences: true, // Enable relationship detection
precomputeRelationships: true, // Pre-calculate node metrics
minEdgeWeight: 0.2 // Minimum relationship strength
},
hybridMode: {
autoSwitchThreshold: 0.3, // Complexity threshold for graph mode
fallbackToTree: true // Use tree mode for simple documents
},
debug: true // Enable detailed logging
});Serializing Indexes
// Indexes are plain objects - serialize easily
const index = await got.index(content, 'My Doc');
// Save to file
fs.writeFileSync('index.json', JSON.stringify(index));
// Load later
const loadedIndex = JSON.parse(fs.readFileSync('index.json', 'utf-8'));
const result = await got.retrieve(loadedIndex, 'query');Optional: LLM-Enhanced Search
For smarter retrieval, you can optionally add an LLM provider:
# Install your preferred provider
npm install openai # or @anthropic-ai/sdk or @google/generative-aiimport { GraphOfThought, createOpenAIProvider } from 'graph-of-thought';
const got = new GraphOfThought({
llm: createOpenAIProvider('gpt-4-turbo'), // Requires OPENAI_API_KEY
});
// Now retrieval uses LLM reasoning instead of keyword matching
const result = await got.retrieve(index, 'complex query');But this is completely optional! The library works great without any LLM.
Performance Comparison
| Feature | Traditional RAG | Tree-of-Thought | Graph-of-Thought | |---------|----------------|-----------------|------------------| | Setup Time | Minutes | Seconds | Seconds | | Query Time | 500ms-2s | 2-10ms | 5-25ms | | Cross-references | No | Limited | Full Support | | Semantic Paths | No | No | Yes (BMSSP) | | Complexity Handling | Good | Good | Excellent | | Node Activation | All nodes | Relevant branches only | Relevant nodes/paths only |
Use Cases
- Documentation Search - Index your docs for quick retrieval
- Knowledge Bases - Build searchable knowledge bases with cross-reference understanding
- Chatbot Context - Provide relevant context to chatbots
- Technical Documentation - Handle cross-references between API sections
- Research Papers - Understand relationships between concepts and citations
- Legal Documents - Follow references between clauses and sections
- Academic Texts - Trace concept relationships across chapters
- Content Management - Organize and search content
- Offline RAG - Run RAG systems without internet
Contributing
Contributions welcome! This is an open-source project.
- Fork the repository
- Create your feature branch
- Submit a pull request
License
MIT License - Use it however you want!
Enhanced intelligence. Zero dependencies. Selective node activation. Just install and use.
Built by Gianne Bacay
