🚀 ragforge
Advanced RAG Techniques Made Simple
A lightweight, developer-friendly toolkit that brings cutting-edge query translation techniques to your Retrieval-Augmented Generation (RAG) pipeline with minimal code.
Features • Installation • Quick Start • Documentation • Examples
🎯 What is ragforge?
ragforge is a lightweight npm package that simplifies the implementation of advanced RAG query translation techniques. Stop writing complex prompts from scratch—use production-ready implementations of:
- 🔮 HyDE (Hypothetical Document Embeddings)
- 🧠 Chain of Thought (CoT)
- 🔙 Step-Back Prompting
- 🔄 Reciprocal Rank Fusion (RRF)
- 🌟 Fan-Out Retrieval
The Problem
Building RAG systems with advanced techniques is hard:
- ❌ Complex Implementation: Each technique requires specialized prompt engineering and orchestration
- ❌ Time-Consuming: Hours spent coding, debugging, and testing each method
- ❌ Vendor Lock-In: Tight coupling with specific vector databases or LLM providers
- ❌ Inconsistent Results: No standardized way to compare techniques
The Solution
✅ Simple API: Integrate advanced techniques in just a few lines
✅ Database Agnostic: Works with Qdrant, Pinecone, Weaviate, or any vector database
✅ LLM Agnostic: Compatible with OpenAI, Anthropic, Gemini, or custom models
✅ Production-Ready: Battle-tested implementations with comprehensive examples
✅ Lightweight: Minimal dependencies, maximum flexibility
📖 Table of Contents
- Features
- Installation
- Quick Start
- Core Concepts
- Available Techniques
- API Reference
- Complete Examples
- Configuration
- Testing
- FAQ
- Contributing
- License
✨ Features
| Feature | Description |
|---------|-------------|
| 🔮 HyDE | Hypothetical Document Embeddings - generate hypothetical answers for better retrieval |
| 🧠 Chain of Thought | Step-by-step reasoning to refine complex queries |
| 🔙 Step-Back Prompting | Abstract queries to broader concepts for deeper understanding |
| 🔄 RRF | Reciprocal Rank Fusion - merge multiple search strategies intelligently |
| 🌟 Fan-Out | Generate multiple query variations for comprehensive coverage |
| 🔧 RAG Pipeline | Complete end-to-end RAG workflow with technique orchestration |
| 🔌 Database Agnostic | Works with any vector database (Qdrant, Pinecone, Weaviate, etc.) |
| 🤖 LLM Agnostic | Compatible with OpenAI, Anthropic, or custom LLM providers |
| 📦 Minimal Setup | Works out of the box with sensible defaults |
| 🧪 Well Tested | Comprehensive test coverage for production use |
📦 Installation
npm install ragforge

Requirements:
- Node.js 14.x or higher
- An LLM API key (OpenAI, Anthropic, etc.)
- Vector database (optional but recommended)
🚀 Quick Start
Step 1: Set Up Environment
Create a .env file in your project root:
OPENAI_API_KEY=your_openai_api_key_here

Step 2: Basic Usage (Without Vector Database)
const { hyde, cot, stepback } = require('ragforge');
// Use HyDE for better query understanding
const answer = await hyde("What are the benefits of machine learning?");
console.log(answer);
// Use Chain of Thought for complex reasoning
const result = await cot("How does climate change affect ocean currents?");
console.log(result.reasoning);
// Use Step-Back for broader context
const abstract = await stepback("What is the capital of France?");
console.log(abstract.abstractQuery); // "What are European capitals and their significance?"

Step 3: Advanced Usage (With Vector Database)
const { hyde, setVectorClient } = require('ragforge');
const { QdrantClient } = require('@qdrant/js-client-rest');
const { getEmbedding } = require('ragforge/src/core/openaiClient');
// Initialize your vector database
const qdrant = new QdrantClient({ url: 'http://localhost:6333' });
// Configure ragforge to use your database
setVectorClient({
query: async (vector, topK) => {
const results = await qdrant.search('your_collection_name', {
vector,
limit: topK,
with_payload: true
});
return results.map(r => ({
content: r.payload.content,
score: r.score,
metadata: r.payload
}));
}
});
// Now HyDE will use your vector database for retrieval
const answer = await hyde("Explain neural networks");
console.log(answer);

Step 4: Complete RAG Pipeline
const { ragPipeline, setVectorClient } = require('ragforge');
// After setting up vector client (see Step 3)...
const result = await ragPipeline("What are the advantages of deep learning?", {
techniques: ["hyde", "fanout"],
topK: 5
});
console.log('Answer:', result.answer);
console.log('Retrieved docs:', result.retrieved);
console.log('Translation:', result.translation);

🧩 Core Concepts
1. Query Translation
Traditional RAG embeds the user query directly, which can miss relevant documents. ragforge translates queries into forms that better match your knowledge base:
// Direct embedding (traditional)
"What is ML?" → [0.23, 0.45, ...] → Search
// Query translation (ragforge)
"What is ML?" → "Machine learning is a subset of AI..." → [0.28, 0.52, ...] → Better Search

2. Vector Client Abstraction
ragforge doesn't force you to use a specific database. You provide a simple adapter:
setVectorClient({
query: async (vector, topK) => {
// Your custom logic here
// Return: [{ content, score, metadata }]
}
});

3. Technique Composition
Combine multiple techniques for optimal results:
// Use multiple strategies
const result = await ragPipeline(query, {
techniques: ["hyde", "cot", "fanout"],
topK: 5
});

🎯 Available Techniques
🔮 HyDE (Hypothetical Document Embeddings)
What it does: Generates a hypothetical answer to your query, then uses that answer for semantic search instead of the original query.
Why it works: Answers are semantically closer to documents in your knowledge base than questions are.
const { hyde } = require('ragforge');
const answer = await hyde("What are neural networks?");
console.log(answer);

How it works (a minimal sketch follows at the end of this section):
- ✍️ Generates hypothetical answer: "Neural networks are computational models inspired by..."
- 🔢 Creates embedding from this answer
- 🔍 Searches vector database
- 📝 Synthesizes final answer from retrieved docs
When to use:
- ✅ Domain-specific questions
- ✅ Complex technical queries
- ✅ When query-document semantic gap is large
- ❌ Simple factual lookups
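To make the four steps concrete, here is a minimal sketch of the HyDE flow assembled from the package's exported helpers. This is illustrative glue code, not ragforge's actual internals; search stands in for your configured vector client:

const { callLLM, getEmbedding } = require('ragforge/src/core/openaiClient');

// Illustrative HyDE flow: embed an answer-shaped passage, not the raw question.
async function hydeSketch(query, search) {
  // 1. Generate a hypothetical answer
  const hypothetical = await callLLM(`Write a short passage answering: ${query}`);
  // 2. Embed the hypothetical answer instead of the question
  const vector = await getEmbedding(hypothetical);
  // 3. Retrieve with that embedding (`search` is your vector client adapter)
  const docs = await search(vector, 5);
  // 4. Synthesize a final answer from the retrieved documents
  return callLLM(`Answer "${query}" using:\n${docs.map(d => d.content).join('\n')}`);
}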
🧠 Chain of Thought (CoT)
What it does: Breaks down complex queries into step-by-step reasoning before retrieval.
Why it works: Step-by-step thinking helps identify what information is actually needed.
const { cot } = require('ragforge');
const result = await cot("How does photosynthesis affect climate?");
console.log('Reasoning:', result.reasoning);
console.log('Refined Query:', result.refinedQuery);

Output:
{
reasoning: "Step 1: Understand photosynthesis process...\nStep 2: Consider climate factors...",
refinedQuery: "relationship between photosynthesis carbon dioxide climate regulation"
}

When to use:
- ✅ Multi-step reasoning problems
- ✅ Causal relationship questions
- ✅ Complex analytical queries
- ❌ Simple definition lookups
🔙 Step-Back Prompting
What it does: Reformulates specific questions into broader, principle-based queries.
Why it works: Abstract concepts often provide better foundation for answering specific questions.
const { stepback } = require('ragforge');
const result = await stepback("What is the capital of France?");
console.log('Original:', result.originalQuery);
console.log('Abstract:', result.abstractQuery);
console.log('Reasoning:', result.reasoning);

Output:
{
originalQuery: "What is the capital of France?",
abstractQuery: "What are the concepts of national capitals and their significance?",
reasoning: "Understanding capital cities helps answer specific capital questions."
}

When to use:
- ✅ Questions benefiting from conceptual context
- ✅ Educational queries
- ✅ When you want comprehensive understanding
- ❌ Time-sensitive or very specific queries
🔄 Reciprocal Rank Fusion (RRF)
What it does: Intelligently merges results from multiple retrieval strategies.
Why it works: Different search methods find different relevant documents. RRF combines their strengths.
const { rrf } = require('ragforge');
// Define multiple search strategies
const searchResults = rrf([
['doc1', 'doc2', 'doc3'], // Results from keyword search
['doc2', 'doc4', 'doc1'], // Results from vector search
['doc3', 'doc1', 'doc5'] // Results from hybrid search
], 60); // k constant (default: 60)
console.log(searchResults);
// ['doc1', 'doc2', 'doc3', 'doc4', 'doc5'] - intelligently ranked

How scoring works:

score(doc) = Σ 1 / (k + rank), summed over each ranked list in which doc appears

Where:
- k = constant (default 60)
- rank = the document's position in that list (0-indexed)

(A runnable sketch of this scoring follows the "When to use" list below.)

When to use:
- ✅ Combining keyword + vector search
- ✅ Using multiple embedding models
- ✅ Merging different retrieval strategies
- ✅ Improving recall and precision
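As a concrete illustration of the scoring formula above, here is a minimal, self-contained RRF sketch (not the package's internal code) that reproduces the ranking from the earlier example:

// Merge ranked lists by summing 1 / (k + rank) for each list a document appears in
function rrfMerge(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((doc, rank) => {
      scores.set(doc, (scores.get(doc) || 0) + 1 / (k + rank));
    });
  }
  // Sort documents by descending fused score
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([doc]) => doc);
}

console.log(rrfMerge([
  ['doc1', 'doc2', 'doc3'],
  ['doc2', 'doc4', 'doc1'],
  ['doc3', 'doc1', 'doc5']
]));
// → ['doc1', 'doc2', 'doc3', 'doc4', 'doc5']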
🌟 Fan-Out Retrieval
What it does: Generates multiple query variations and retrieves documents for each, then synthesizes a comprehensive answer.
Why it works: Different phrasings retrieve different relevant documents, giving broader coverage.
const { fanout } = require('ragforge');
const answer = await fanout("What is artificial intelligence?");
console.log(answer);

How it works (a sketch follows at the end of this section):
- 📝 Original: "What is artificial intelligence?"
- 🌿 Expands to:
- "Define artificial intelligence"
- "Explain AI technology and applications"
- "What are the fundamentals of AI?"
- 🔍 Retrieves docs for each variation
- 🎯 Synthesizes comprehensive answer
When to use:
- ✅ Broad research questions
- ✅ Ambiguous queries
- ✅ Comprehensive topic exploration
- ❌ Narrow, specific questions (creates noise)
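The sketch below shows the fan-out pattern end to end, assuming the package's exported callLLM and getEmbedding helpers plus a user-supplied search function. It is illustrative, not ragforge's internal implementation:

const { callLLM, getEmbedding } = require('ragforge/src/core/openaiClient');

// Illustrative fan-out: expand the query, retrieve per variation, then synthesize.
async function fanoutSketch(query, search) {
  // 1. Ask the LLM for three rephrasings, one per line
  const raw = await callLLM(`Rewrite this question three different ways, one per line: ${query}`);
  const variations = [query, ...raw.split('\n').filter(Boolean)];
  // 2. Retrieve documents for every variation
  const docs = [];
  for (const q of variations) {
    const vector = await getEmbedding(q);
    docs.push(...(await search(vector, 5)));
  }
  // 3. Synthesize one comprehensive answer from the pooled documents
  return callLLM(`Answer "${query}" using these documents:\n${docs.map(d => d.content).join('\n')}`);
}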
🔧 RAG Pipeline (Complete Workflow)
What it does: Orchestrates multiple techniques in a complete end-to-end RAG workflow.
Why it works: Combines the strengths of multiple techniques for optimal retrieval and generation.
const { ragPipeline, setVectorClient } = require('ragforge');
// After setting up your vector client...
const result = await ragPipeline(
"What are the latest advances in machine learning?",
{
techniques: ["hyde", "fanout"], // Techniques to use
topK: 5, // Top results to return
rerankK: 20, // Results to consider for reranking
finalModelOptions: { // LLM options for final answer
temperature: 0.3,
max_tokens: 500
}
}
);
console.log('Final Answer:', result.answer);
console.log('Retrieved Documents:', result.retrieved);
console.log('Query Translation:', result.translation);

Output structure:
{
query: "original user query",
translation: {
hyde: "hypothetical document",
fanout: "combined fanout result",
cot: { reasoning: "...", refinedQuery: "..." },
subQueries: ["query1", "query2", ...]
},
retrieved: ["doc1", "doc2", ...],
answer: "synthesized final answer"
}

Pipeline flow (a structural sketch follows the list):
- 🔄 Query Translation (applies selected techniques)
- 🔍 Multi-Query Retrieval (searches for each translated query)
- 🎯 RRF Merging (combines results intelligently)
- ✨ Answer Synthesis (generates final response)
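The structural sketch below shows how these four stages compose. Everything except the exported rrf is a hypothetical stand-in: translate produces sub-queries, search returns a ranked list per sub-query, and synthesize generates the final answer.

const { rrf } = require('ragforge');

// Structural sketch of the pipeline flow (illustrative; the real
// orchestration lives inside ragPipeline).
async function pipelineSketch(query, translate, search, synthesize, topK = 5) {
  // 1. Query translation: techniques yield alternative query strings
  const subQueries = await translate(query);
  // 2. Multi-query retrieval: one ranked list per sub-query
  const rankedLists = await Promise.all(subQueries.map(q => search(q)));
  // 3. RRF merging of the ranked lists
  const topDocs = rrf(rankedLists).slice(0, topK);
  // 4. Answer synthesis over the merged context
  return synthesize(query, topDocs);
}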
📚 API Reference
Core Functions
setVectorClient(client)
Configure your vector database adapter. This must be called before using retrieval-based techniques.
setVectorClient({
query: async (vector, topK) => {
// Implement your database query logic
// vector: Array of numbers (embedding)
// topK: Number of results to return
// Must return: Array of { content, score, metadata }
return results;
}
});

Parameters:
- client.query (Function): Async function that queries your vector database
  - vector (Array): Embedding vector to search with
  - topK (number): Number of results to return
- Returns: Promise<Array<{content: string, score: number, metadata: object}>>
Example with different databases:

Qdrant:
const { QdrantClient } = require('@qdrant/js-client-rest');
const qdrant = new QdrantClient({ url: 'http://localhost:6333' });
setVectorClient({
query: async (vector, topK) => {
const results = await qdrant.search('collection_name', {
vector,
limit: topK,
with_payload: true
});
return results.map(r => ({
content: r.payload.content,
score: r.score,
metadata: r.payload
}));
}
});

Pinecone:

const { PineconeClient } = require('@pinecone-database/pinecone');
const pinecone = new PineconeClient();
await pinecone.init({ apiKey: 'your-api-key' });
const index = pinecone.Index('your-index');
setVectorClient({
query: async (vector, topK) => {
const results = await index.query({
vector,
topK,
includeMetadata: true
});
return results.matches.map(m => ({
content: m.metadata.content,
score: m.score,
metadata: m.metadata
}));
}
});

Weaviate:

const weaviate = require('weaviate-client');
const client = weaviate.client({ scheme: 'http', host: 'localhost:8080' });
setVectorClient({
query: async (vector, topK) => {
const results = await client.graphql
.get()
.withClassName('YourClass')
.withNearVector({ vector })
.withLimit(topK)
.withFields('content _additional { distance }')
.do();
return results.data.Get.YourClass.map(r => ({
content: r.content,
score: 1 - r._additional.distance,
metadata: r
}));
}
});

hyde(query, options)
Hypothetical Document Embeddings technique.
const answer = await hyde(query, options);

Parameters:
- query (string): User's question or query
- options (object, optional): LLM configuration
  - model (string): LLM model to use (default: "gpt-4")
  - temperature (number): Randomness (0-1, default: 0.7)
  - max_tokens (number): Maximum response length (default: 500)
Returns: Promise<string> - Final answer
Example:
const answer = await hyde("What is quantum computing?", {
model: "gpt-3.5-turbo",
temperature: 0.5,
max_tokens: 300
});

cot(query, options)
Chain of Thought reasoning technique.
const result = await cot(query, options);

Parameters:
- query (string): User's question
- options (object, optional): LLM configuration options
Returns: Promise<{reasoning: string, refinedQuery: string}>
- reasoning: Step-by-step thought process
- refinedQuery: Cleaned and refined query
Example:
const result = await cot("How does machine learning work?");
console.log('Reasoning:', result.reasoning);
console.log('Refined:', result.refinedQuery);

stepback(query, options)
Step-Back Prompting technique.
const result = await stepback(query, options);

Parameters:
- query (string): Specific user question
- options (object, optional): LLM configuration
Returns: Promise<{abstractQuery: string, originalQuery: string, reasoning: string}>
- abstractQuery: Broader, principle-based version
- originalQuery: Original specific query
- reasoning: Explanation of the abstraction
Example:
const result = await stepback("What is Python's GIL?");
// abstractQuery: "What are concurrency mechanisms in programming languages?"
// reasoning: "Understanding general concurrency helps explain GIL specifically"

rrf(rankedLists, k)
Reciprocal Rank Fusion for merging search results.
const merged = rrf(rankedLists, k);

Parameters:
- rankedLists (Array<Array>): Array of ranked result lists
- k (number, optional): RRF constant for score calculation (default: 60)
Returns: Array<any> - Merged and ranked results
Example:
const keywordResults = ['doc1', 'doc2', 'doc3'];
const vectorResults = ['doc2', 'doc4', 'doc1'];
const hybridResults = ['doc3', 'doc1', 'doc5'];
const merged = rrf([keywordResults, vectorResults, hybridResults], 60);
console.log(merged); // ['doc1', 'doc2', 'doc3', 'doc4', 'doc5']

fanout(query, options)
Fan-Out multi-query retrieval technique.
const answer = await fanout(query, options);

Parameters:
- query (string): User's question
- options (object, optional): Configuration options
  - All standard LLM options
  - The number of variations is fixed at 3
Returns: Promise<string> - Synthesized answer from all query variations
Example:
const answer = await fanout("What is deep learning?");
// Internally generates 3 variations, retrieves for each, and synthesizes

ragPipeline(query, options)
Complete end-to-end RAG pipeline with technique orchestration.
const result = await ragPipeline(query, options);

Parameters:
- query (string): User's question
- options (object, optional):
  - techniques (Array): Techniques to use (default: ["hyde", "cot", "fanout"]); available: "hyde", "cot", "fanout", "stepback"
  - topK (number): Top results to return (default: 5)
  - rerankK (number): Results to consider for reranking (default: 20)
  - finalModelOptions (object): LLM options for the final synthesis (model, temperature, max_tokens, etc.)
Returns: Promise<{query: string, translation: object, retrieved: Array, answer: string}>
Example:
const result = await ragPipeline(
"Explain transformer architecture",
{
techniques: ["hyde", "fanout"],
topK: 5,
rerankK: 15,
finalModelOptions: {
model: "gpt-4",
temperature: 0.3,
max_tokens: 600
}
}
);
console.log('Answer:', result.answer);
console.log('Sources:', result.retrieved);
console.log('Translation:', result.translation);

Utility Functions
getEmbedding(text)
Generate embeddings using OpenAI's API.
const { getEmbedding } = require('ragforge/src/core/openaiClient');
const vector = await getEmbedding("Your text here");
console.log(vector.length); // 1536 (for text-embedding-ada-002)

callLLM(prompt, options)
Call the LLM with custom prompts.
const { callLLM } = require('ragforge/src/core/openaiClient');
const response = await callLLM("Explain quantum physics", {
model: "gpt-4",
temperature: 0.7,
max_tokens: 300
});

💡 Complete Examples
Example 1: Basic HyDE Usage (No Database)
Perfect for getting started without setting up a vector database.
const { hyde } = require('ragforge');
(async () => {
const answer = await hyde("What are the benefits of exercise?");
console.log(answer);
})();

Output:
[HyDE] Step 1: Generating hypothetical document...
[HyDE] Generated: Regular exercise provides numerous health benefits...
[HyDE] Step 2: Creating embedding...
[HyDE] Embedding created (1536 dimensions)
[HyDE] Step 3: Searching vector DB...
[HyDE] No DB results, returning hypothetical document
[HyDE] Complete!
Regular exercise provides numerous health benefits including improved cardiovascular health, weight management, enhanced mental well-being, stronger bones and muscles, and reduced risk of chronic diseases...

Example 2: HyDE with Qdrant Vector Database
Complete setup with a real vector database for production use.
const { hyde, setVectorClient } = require('ragforge');
const { QdrantClient } = require('@qdrant/js-client-rest');
const { getEmbedding } = require('ragforge/src/core/openaiClient');
// 1. Initialize Qdrant client
const qdrant = new QdrantClient({ url: 'http://localhost:6333' });
const COLLECTION_NAME = 'my_knowledge_base';
// 2. Configure ragforge to use Qdrant
setVectorClient({
query: async (vector, topK) => {
const results = await qdrant.search(COLLECTION_NAME, {
vector,
limit: topK,
with_payload: true
});
return results.map(r => ({
content: r.payload.content,
score: r.score,
metadata: r.payload
}));
}
});
// 3. Use HyDE with database-backed retrieval
(async () => {
const answer = await hyde("What is machine learning?");
console.log('Answer:', answer);
})();

Example 3: Comparing Techniques
Compare different RAG techniques side-by-side.
const { hyde, cot, stepback, fanout } = require('ragforge');
async function compareTechniques(query) {
console.log('Query:', query);
console.log('='.repeat(80));
// HyDE
console.log('\n🔮 HyDE Technique:');
const hydeAnswer = await hyde(query);
console.log(hydeAnswer);
// Chain of Thought
console.log('\n🧠 Chain of Thought:');
const cotResult = await cot(query);
console.log('Reasoning:', cotResult.reasoning);
// Step-Back
console.log('\n🔙 Step-Back Prompting:');
const stepbackResult = await stepback(query);
console.log('Abstract Query:', stepbackResult.abstractQuery);
console.log('Reasoning:', stepbackResult.reasoning);
// Fan-Out
console.log('\n🌟 Fan-Out Retrieval:');
const fanoutAnswer = await fanout(query);
console.log(fanoutAnswer);
}
compareTechniques("How does photosynthesis work?");

Example 4: Full RAG Pipeline
Production-ready pipeline with multiple techniques.
const { ragPipeline, setVectorClient } = require('ragforge');
const { QdrantClient } = require('@qdrant/js-client-rest');
// Setup (same as Example 2)
const qdrant = new QdrantClient({ url: 'http://localhost:6333' });
setVectorClient({
query: async (vector, topK) => {
const results = await qdrant.search('ml_knowledge', {
vector,
limit: topK,
with_payload: true
});
return results.map(r => ({
content: r.payload.content,
score: r.score,
metadata: r.payload
}));
}
});
// Run complete pipeline
(async () => {
const result = await ragPipeline(
"What are the main types of neural networks?",
{
techniques: ["hyde", "fanout"],
topK: 5,
rerankK: 15,
finalModelOptions: {
model: "gpt-4",
temperature: 0.3,
max_tokens: 500
}
}
);
console.log('📄 Final Answer:');
console.log('='.repeat(80));
console.log(result.answer);
console.log('\n📚 Retrieved Documents:');
console.log('='.repeat(80));
result.retrieved.forEach((doc, i) => {
console.log(`\n[${i + 1}] ${doc.substring(0, 150)}...`);
});
console.log('\n🔄 Query Translation:');
console.log('='.repeat(80));
console.log(JSON.stringify(result.translation, null, 2));
})();

Example 5: Reciprocal Rank Fusion
Merge results from multiple search strategies.
const { rrf, setVectorClient } = require('ragforge');
const { QdrantClient } = require('@qdrant/js-client-rest');
const { getEmbedding } = require('ragforge/src/core/openaiClient');
const qdrant = new QdrantClient({ url: 'http://localhost:6333' });
async function hybridSearch(query) {
// Strategy 1: Search with original query
const vector1 = await getEmbedding(query);
const results1 = await qdrant.search('collection', {
vector: vector1,
limit: 10
});
// Strategy 2: Search with expanded query
const expandedQuery = query + " explanation tutorial guide";
const vector2 = await getEmbedding(expandedQuery);
const results2 = await qdrant.search('collection', {
vector: vector2,
limit: 10
});
// Strategy 3: Search with just the key terms (drop short filler words)
const keywords = query.split(' ').filter(w => w.length > 3).join(' ');
const vector3 = await getEmbedding(keywords);
const results3 = await qdrant.search('collection', {
vector: vector3,
limit: 10
});
// Extract IDs for RRF
const list1 = results1.map(r => r.id);
const list2 = results2.map(r => r.id);
const list3 = results3.map(r => r.id);
// Merge using RRF
const mergedIds = rrf([list1, list2, list3], 60);
console.log('Merged Results (RRF):', mergedIds.slice(0, 5));
return mergedIds;
}
hybridSearch("machine learning algorithms");

Example 6: Creating a Knowledge Base

Set up a vector database with sample data.
const { QdrantClient } = require('@qdrant/js-client-rest');
const { getEmbedding } = require('ragforge/src/core/openaiClient');
const qdrant = new QdrantClient({ url: 'http://localhost:6333' });
const COLLECTION_NAME = 'my_docs';
async function setupKnowledgeBase() {
// Create collection
await qdrant.createCollection(COLLECTION_NAME, {
vectors: {
size: 1536,
distance: 'Cosine'
}
});
// Sample documents
const documents = [
{
id: 1,
content: "Machine learning is a subset of artificial intelligence that focuses on building systems that learn from data.",
topic: "ML Basics"
},
{
id: 2,
content: "Neural networks are computing systems inspired by biological neural networks in animal brains.",
topic: "Neural Networks"
},
{
id: 3,
content: "Deep learning uses multiple layers of neural networks to progressively extract higher-level features.",
topic: "Deep Learning"
}
];
// Add documents with embeddings
for (const doc of documents) {
const vector = await getEmbedding(doc.content);
await qdrant.upsert(COLLECTION_NAME, {
points: [{
id: doc.id,
vector,
payload: {
content: doc.content,
topic: doc.topic
}
}]
});
console.log(`✅ Added: ${doc.topic}`);
}
console.log('\n✨ Knowledge base ready!');
}
setupKnowledgeBase();

Example 7: Custom LLM Configuration
Use different models and parameters for different use cases.
const { hyde, cot } = require('ragforge');
// Fast, cheap responses
const quickAnswer = await hyde("What is AI?", {
model: "gpt-3.5-turbo",
temperature: 0.3,
max_tokens: 150
});
// Detailed, creative responses
const detailedAnswer = await hyde("Explain quantum computing", {
model: "gpt-4",
temperature: 0.8,
max_tokens: 800
});
// Focused, deterministic responses
const factualAnswer = await cot("What is 2+2?", {
model: "gpt-3.5-turbo",
temperature: 0,
max_tokens: 50
});

More Examples
Check out the examples/ directory for additional working examples:
- fullRAGExample.js - Complete RAG pipeline demonstration
- compareHydeVsNormal.js - Side-by-side HyDE vs traditional search
- setupMLCollection.js - Create an ML knowledge base in Qdrant
- testHyDE.js - HyDE technique testing
- cotExample.js - Chain of Thought examples
- testWithDB.js - Database integration examples
Run examples:
# Start Qdrant with Docker
docker-compose up -d
# Setup knowledge base
node examples/setupMLCollection.js
# Run examples
node examples/fullRAGExample.js
node examples/compareHydeVsNormal.js

🔧 Configuration
Environment Variables
Create a .env file in your project root:
# Required: OpenAI API Key
OPENAI_API_KEY=sk-your-openai-api-key-here
# Optional: Custom API Configuration
OPENAI_API_BASE=https://api.openai.com/v1 # Custom endpoint
OPENAI_ORGANIZATION=org-your-org-id # Organization ID

Default Settings
ragforge uses sensible defaults that work for most use cases. You can find them in src/config/defaults.js:
{
model: "gpt-4", // LLM model
temperature: 0.7, // Randomness (0-1)
maxTokens: 500, // Max response length
topP: 1, // Nucleus sampling
frequencyPenalty: 0, // Repetition penalty
presencePenalty: 0 // Topic diversity
}

Custom Configuration Per Request
Override defaults for individual requests:
const answer = await hyde("Your query", {
model: "gpt-3.5-turbo", // Use faster/cheaper model
temperature: 0.3, // More focused
max_tokens: 200, // Shorter responses
top_p: 0.9,
frequency_penalty: 0.5,
presence_penalty: 0.3
});

Vector Database Configuration
Qdrant
const { QdrantClient } = require('@qdrant/js-client-rest');
const qdrant = new QdrantClient({
url: 'http://localhost:6333',
apiKey: 'your-api-key' // Optional for Qdrant Cloud
});
setVectorClient({
query: async (vector, topK) => {
const results = await qdrant.search('collection_name', {
vector,
limit: topK,
with_payload: true,
score_threshold: 0.7 // Optional: minimum similarity score
});
return results.map(r => ({
content: r.payload.content,
score: r.score,
metadata: r.payload
}));
}
});

Pinecone
const { PineconeClient } = require('@pinecone-database/pinecone');
const pinecone = new PineconeClient();
await pinecone.init({
apiKey: process.env.PINECONE_API_KEY,
environment: process.env.PINECONE_ENV
});
const index = pinecone.Index('your-index-name');
setVectorClient({
query: async (vector, topK) => {
const results = await index.query({
vector,
topK,
includeMetadata: true,
includeValues: false
});
return results.matches.map(m => ({
content: m.metadata.content,
score: m.score,
metadata: m.metadata
}));
}
});

Docker Setup (Qdrant)
Use the included docker-compose.yml:
version: '3.8'
services:
qdrant:
image: qdrant/qdrant:latest
ports:
- "6333:6333"
- "6334:6334"
volumes:
- ./qdrant_storage:/qdrant/storage

Start Qdrant:
docker-compose up -d

Access dashboard: http://localhost:6333/dashboard
🧪 Testing
ragforge includes comprehensive test coverage for all techniques.
Run Tests
# Run all tests
npm test
# Run tests in watch mode
npm run test:watch
# Run specific test file
npm test hyde.test.js

Test Coverage
✓ HyDE technique
✓ Chain of Thought
✓ Step-Back Prompting
✓ Reciprocal Rank Fusion
✓ Fan-Out Retrieval
✓ RAG Pipeline integration

Writing Your Own Tests
const { hyde } = require('ragforge');
describe('HyDE Tests', () => {
it('should generate hypothetical document', async () => {
const result = await hyde("What is AI?");
expect(result).toBeDefined();
expect(typeof result).toBe('string');
expect(result.length).toBeGreaterThan(0);
});
});

📁 Project Structure
ragforge/
├── src/
│ ├── core/
│ │ ├── openaiClient.js # LLM API integration (OpenAI)
│ │ ├── vectorClient.js # Vector DB abstraction layer
│ │ └── utils.js # Utility functions (logging, sanitization)
│ │
│ ├── techniques/
│ │ ├── hyde.js # HyDE implementation
│ │ ├── cot.js # Chain of Thought
│ │ ├── stepback.js # Step-Back Prompting
│ │ ├── rrf.js # Reciprocal Rank Fusion
│ │ └── fanout.js # Fan-Out Retrieval
│ │
│ ├── pipeline/
│ │ ├── queryTranslator.js # Query transformation orchestrator
│ │ └── ragPipeline.js # End-to-end RAG workflow
│ │
│ └── config/
│ ├── defaults.js # Default LLM parameters
│ └── env.js # Environment configuration
│
├── examples/
│ ├── fullRAGExample.js # Complete pipeline demo
│ ├── compareHydeVsNormal.js # HyDE vs traditional search
│ ├── setupMLCollection.js # Create knowledge base
│ ├── testHyDE.js # HyDE testing
│ ├── cotExample.js # Chain of Thought demo
│ └── testWithDB.js # Database integration
│
├── tests/
│ ├── hyde.test.js # HyDE tests
│ ├── cot.test.js # CoT tests
│ ├── stepback.test.js # Step-Back tests
│ ├── rrf.test.js # RRF tests
│ ├── fanout.test.js # Fan-Out tests
│ └── pipeline.test.js # Pipeline tests
│
├── index.js # Main entry point
├── package.json # Package configuration
├── docker-compose.yml # Qdrant Docker setup
└── README.md # Documentation

❓ FAQ
Q: Do I need a vector database to use ragforge?
A: No! ragforge works without a vector database. Techniques like HyDE will generate hypothetical documents and return them directly. However, for production RAG systems, a vector database is highly recommended for better retrieval.
Q: Can I use ragforge with other LLM providers?
A: Currently, ragforge uses OpenAI's API. However, you can easily extend src/core/openaiClient.js (a hypothetical adapter sketch follows this list) to work with:
- Anthropic Claude
- Google Gemini
- Azure OpenAI
- Local models (Ollama, LM Studio)
- Custom API endpoints
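For instance, a drop-in replacement for callLLM targeting an OpenAI-compatible local endpoint might look like the sketch below. The URL and model name are assumptions for an Ollama-style server, and global fetch requires Node 18+:

// Hypothetical adapter for an OpenAI-compatible local endpoint.
// Endpoint URL and model are assumptions; adjust for your provider.
async function callLocalLLM(prompt, options = {}) {
  const res = await fetch('http://localhost:11434/v1/chat/completions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: options.model || 'llama3',
      messages: [{ role: 'user', content: prompt }],
      temperature: options.temperature ?? 0.7,
      max_tokens: options.max_tokens ?? 500
    })
  });
  const data = await res.json();
  return data.choices[0].message.content;
}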
Q: Which technique should I use?
A: It depends on your use case:
- HyDE: Best for most RAG scenarios, especially domain-specific queries
- CoT: Complex reasoning, multi-step problems
- Step-Back: Conceptual understanding, educational content
- Fan-Out: Broad research, comprehensive coverage
- RRF: Combining multiple search strategies
- Pipeline: Production systems combining multiple techniques
Q: How much does it cost?
A: ragforge is free and open-source. You only pay for:
- OpenAI API usage (embeddings + completions)
- Vector database hosting (if using cloud service)
Typical costs: ~$0.01-0.05 per query depending on configuration.
Q: Can I use ragforge in production?
A: Yes! ragforge is designed for production use:
- ✅ Comprehensive error handling
- ✅ Configurable timeouts and retries
- ✅ Extensive test coverage
- ✅ Logging and debugging support
- ✅ Scalable architecture
Q: How do I handle rate limits?
A: Implement retry logic in your vector client:
// Wrap the search in a named function so it can call itself on retry
const searchWithRetry = async (vector, topK, retries = 3) => {
  try {
    const results = await qdrant.search(...); // your actual search call
    return results;
  } catch (error) {
    if (retries > 0 && error.code === 429) {
      await new Promise(r => setTimeout(r, 1000)); // simple backoff
      return searchWithRetry(vector, topK, retries - 1);
    }
    throw error;
  }
};

setVectorClient({ query: searchWithRetry });

Q: Can I customize the prompts?
A: Yes! You can modify prompts in the technique files (src/techniques/) or create your own implementations inspired by ragforge's architecture.
Q: Does ragforge support streaming responses?
A: Not currently, but it's on the roadmap. Follow the GitHub repo for updates.
Q: How do I contribute?
A: See the Contributing section below!
🤝 Contributing
Contributions are welcome and appreciated! Here's how you can help:
Ways to Contribute
- 🐛 Report bugs - Open an issue on GitHub
- 💡 Suggest features - Share your ideas for improvements
- 📝 Improve documentation - Help make the docs clearer
- 🔧 Submit pull requests - Fix bugs or add features
- ⭐ Star the repo - Show your support!
Development Setup
Fork and clone the repository
git clone https://github.com/YOUR_USERNAME/ragforge.git
cd ragforge

Install dependencies

npm install

Create a .env file

OPENAI_API_KEY=your_api_key_here

Start Qdrant (for testing)

docker-compose up -d

Run tests
npm test
Pull Request Process
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Make your changes
- Add/update tests as needed
- Ensure all tests pass (npm test)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Code Style
- Use meaningful variable names
- Add comments for complex logic
- Follow existing code patterns
- Keep functions focused and small
- Write tests for new features
Roadmap
Planned features for future releases:
- [ ] Streaming response support
- [ ] Additional LLM providers (Anthropic, Gemini, local models)
- [ ] More vector database adapters
- [ ] Query caching
- [ ] Advanced reranking techniques
- [ ] Evaluation metrics
- [ ] Batch processing
- [ ] TypeScript support
🌟 Acknowledgments
ragforge is built on the shoulders of giants:
Technologies:
- OpenAI API - LLM provider
- Qdrant - Vector database
- Node.js - Runtime environment
Community:
- Built with ❤️ for the RAG and AI community
- Inspired by feedback from developers worldwide
📧 Contact & Support
Author
Kamraan Mulani
- GitHub: @Kamraanmulani
- Project: ragforge
Get Help
- 📖 Documentation: You're reading it!
- 🐛 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
Support the Project
If you find ragforge helpful:
- ⭐ Star the repo on GitHub
- 🐦 Share it on social media
- 📝 Write a blog post about your experience
- 🤝 Contribute code or documentation
- 💬 Provide feedback to help improve ragforge
Built with ❤️ for the RAG Community
If this project helped you, please consider giving it a ⭐!
Made with passion by Kamraan Mulani
