RAG Document Analyzer
A powerful TypeScript library for document analysis and processing using Retrieval-Augmented Generation (RAG) techniques. This library provides tools for document loading, text extraction, embedding generation, and semantic search.
Features
- 📄 Document loading and processing (PDF support included)
- 🔍 Semantic search capabilities
- 🤖 Integration with various LLM providers (OpenAI, Ollama, Gemini)
- 🧠 In-memory vector storage
- 🚀 Built with TypeScript for type safety
Installation
```bash
npm install rag-doc-analyzer
# or
yarn add rag-doc-analyzer
```
Prerequisites
- Node.js >= 16.0.0
- An API key for your chosen LLM provider (OpenAI or Gemini; Ollama runs locally and typically needs no key)
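Rather than hardcoding a key, it is safer to read it from the environment. The following sketch is not part of the library; the variable name `OPENAI_API_KEY` and the `requireApiKey` helper are illustrative conventions:

```typescript
// Hypothetical helper: resolve a provider API key from the environment,
// failing fast with a clear message instead of a cryptic auth error later.
function requireApiKey(envVar: string): string {
  const key = process.env[envVar];
  if (!key) {
    throw new Error(`Missing ${envVar}; export it before running.`);
  }
  return key;
}

// Usage: pass the result as the `apiKey` field when initializing.
// const apiKey = requireApiKey('OPENAI_API_KEY');
```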
Quick Start
```typescript
import { DocAnalyzer } from 'rag-doc-analyzer';
import { readFileSync } from 'fs';
import { join } from 'path';

async function main() {
  try {
    // Initialize the analyzer
    const analyzer = await DocAnalyzer.init({
      pdfs: [readFileSync(join(process.cwd(), 'path-to-your-document.pdf'))],
      llm: {
        provider: 'openai', // or 'ollama' or 'gemini'
        apiKey: 'your-api-key-here',
        model: 'gpt-3.5-turbo' // or your preferred model
      },
      embedder: 'openai' // or 'ollama'
    });

    // Ask a question about the document
    const answer = await analyzer.ask('What is the main topic of this document?');
    console.log('Answer:', answer);

    // Or have a conversation
    const messages = [
      { role: 'user' as const, content: 'What are the key points?' },
      // The response will be added to the messages array
    ];
    const response = await analyzer.chat(messages);
    console.log('Chat response:', response);
  } catch (error) {
    console.error('Error:', error);
  }
}

main();
```
Configuration Options
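For multi-turn use, each reply needs to be appended to the messages array so the next call sees the full history. The helper below is a sketch, not part of the library; `chat` is passed in as a parameter standing in for `analyzer.chat`, and the `Message` shape mirrors the one in the Quick Start:

```typescript
// Minimal Message shape matching the Quick Start example.
type Message = { role: 'user' | 'assistant'; content: string };

// Hypothetical helper: ask a question, then record both the question and
// the reply in `messages` so subsequent turns carry the conversation.
async function converse(
  chat: (messages: Message[]) => Promise<string>,
  messages: Message[],
  question: string
): Promise<string> {
  messages.push({ role: 'user', content: question });
  const reply = await chat(messages);
  messages.push({ role: 'assistant', content: reply });
  return reply;
}
```

Called repeatedly with the same `messages` array (e.g. `await converse(analyzer.chat.bind(analyzer), history, 'And the conclusion?')`), this keeps the history growing one user/assistant pair per turn.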
```typescript
interface RagOptions {
  // Array of PDF documents (as Buffer, File, or file path string)
  pdfs: (Buffer | File | string)[];

  // LLM configuration
  llm: {
    // The LLM provider to use
    provider: 'openai' | 'ollama' | 'gemini';
    // API key for the provider (optional for some providers like Ollama)
    apiKey?: string;
    // Model to use (e.g., 'gpt-3.5-turbo', 'llama2', 'gemini-pro')
    model: string;
  };

  // Embedding model to use (optional, defaults to 'openai')
  embedder?: 'openai' | 'ollama';

  // Vector store to use (currently only 'memory' is supported)
  vectorStore?: 'memory';
}
```
API Reference
`DocAnalyzer.init(options: RagOptions): Promise<DocAnalyzer>`
Initialize a new DocAnalyzer instance with the provided options.
`analyzer.ask(question: string): Promise<string>`
Ask a question about the loaded documents.
`analyzer.chat(messages: Message[]): Promise<string>`
Have a conversation about the loaded documents.
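Following the RagOptions interface above, a fully local setup with Ollama might look like the sketch below. The file path and model name are placeholders, and no `apiKey` is set since Ollama runs locally; check against the library before relying on this exact shape:

```typescript
// Illustrative RagOptions value for a local-only pipeline: Ollama handles
// both generation and embeddings, with the in-memory vector store.
const ollamaOptions = {
  pdfs: ['./docs/report.pdf'], // placeholder path (string form accepted per RagOptions)
  llm: { provider: 'ollama' as const, model: 'llama2' },
  embedder: 'ollama' as const,
  vectorStore: 'memory' as const,
};
```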
For detailed documentation and API reference, please visit our documentation website.
Contributing
Contributions are welcome! Please read our contributing guidelines to get started.
License
This project is licensed under the MIT License - see the LICENSE file for details.
