RAG Document Analyzer
A powerful TypeScript library for document analysis and processing using Retrieval-Augmented Generation (RAG) techniques. This library provides tools for document loading, text extraction, embedding generation, and semantic search.
Features
- 📄 Document loading and processing (PDF support included)
- 🔍 Semantic search capabilities
- 🤖 Integration with various LLM providers (OpenAI, Ollama, Gemini)
- 🧠 In-memory vector storage
- 🚀 Built with TypeScript for type safety
Installation
```bash
npm install rag-doc-analyzer
# or
yarn add rag-doc-analyzer
```
Prerequisites
- Node.js >= 16.0.0
- An API key for your chosen LLM provider (OpenAI or Gemini; Ollama runs locally and typically needs no key)
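Rather than hardcoding a key, it is safer to read it from the environment. The following sketch is not part of the library; the variable name `OPENAI_API_KEY` and the `requireApiKey` helper are illustrative conventions:

```typescript
// Hypothetical helper: resolve a provider API key from the environment,
// failing fast with a clear message instead of a cryptic auth error later.
function requireApiKey(envVar: string): string {
  const key = process.env[envVar];
  if (!key) {
    throw new Error(`Missing ${envVar}; export it before running.`);
  }
  return key;
}

// Usage: pass the result as the `apiKey` field when initializing.
// const apiKey = requireApiKey('OPENAI_API_KEY');
```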
Quick Start
```typescript
import { DocAnalyzer } from 'rag-doc-analyzer';
import { readFileSync } from 'fs';
import { join } from 'path';

async function main() {
  try {
    // Initialize the analyzer
    const analyzer = await DocAnalyzer.init({
      pdfs: [readFileSync(join(process.cwd(), 'path-to-your-document.pdf'))],
      llm: {
        provider: 'openai', // or 'ollama' or 'gemini'
        apiKey: 'your-api-key-here',
        model: 'gpt-3.5-turbo' // or your preferred model
      },
      embedder: 'openai' // or 'ollama'
    });

    // Ask a question about the document
    const answer = await analyzer.ask('What is the main topic of this document?');
    console.log('Answer:', answer);

    // Or have a conversation
    const messages = [
      { role: 'user' as const, content: 'What are the key points?' },
      // The response will be added to the messages array
    ];
    const response = await analyzer.chat(messages);
    console.log('Chat response:', response);
  } catch (error) {
    console.error('Error:', error);
  }
}

main();
```
Configuration Options
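For multi-turn use, each reply needs to be appended to the messages array so the next call sees the full history. The helper below is a sketch, not part of the library; `chat` is passed in as a parameter standing in for `analyzer.chat`, and the `Message` shape mirrors the one in the Quick Start:

```typescript
// Minimal Message shape matching the Quick Start example.
type Message = { role: 'user' | 'assistant'; content: string };

// Hypothetical helper: ask a question, then record both the question and
// the reply in `messages` so subsequent turns carry the conversation.
async function converse(
  chat: (messages: Message[]) => Promise<string>,
  messages: Message[],
  question: string
): Promise<string> {
  messages.push({ role: 'user', content: question });
  const reply = await chat(messages);
  messages.push({ role: 'assistant', content: reply });
  return reply;
}
```

Called repeatedly with the same `messages` array (e.g. `await converse(analyzer.chat.bind(analyzer), history, 'And the conclusion?')`), this keeps the history growing one user/assistant pair per turn.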
```typescript
interface RagOptions {
  // Array of PDF documents (as Buffer, File, or file path string)
  pdfs: (Buffer | File | string)[];

  // LLM configuration
  llm: {
    // The LLM provider to use
    provider: 'openai' | 'ollama' | 'gemini';
    // API key for the provider (optional for some providers like Ollama)
    apiKey?: string;
    // Model to use (e.g., 'gpt-3.5-turbo', 'llama2', 'gemini-pro')
    model: string;
  };

  // Embedding model to use (optional, defaults to 'openai')
  embedder?: 'openai' | 'ollama';

  // Vector store to use (currently only 'memory' is supported)
  vectorStore?: 'memory';
}
```
API Reference
`DocAnalyzer.init(options: RagOptions): Promise<DocAnalyzer>`
Initialize a new DocAnalyzer instance with the provided options.
`analyzer.ask(question: string): Promise<string>`
Ask a question about the loaded documents.
`analyzer.chat(messages: Message[]): Promise<string>`
Have a conversation about the loaded documents.
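Following the RagOptions interface above, a fully local setup with Ollama might look like the sketch below. The file path and model name are placeholders, and no `apiKey` is set since Ollama runs locally; check against the library before relying on this exact shape:

```typescript
// Illustrative RagOptions value for a local-only pipeline: Ollama handles
// both generation and embeddings, with the in-memory vector store.
const ollamaOptions = {
  pdfs: ['./docs/report.pdf'], // placeholder path (string form accepted per RagOptions)
  llm: { provider: 'ollama' as const, model: 'llama2' },
  embedder: 'ollama' as const,
  vectorStore: 'memory' as const,
};
```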
For detailed documentation and API reference, please visit our documentation website.
Contributing
Contributions are welcome! Please read our contributing guidelines to get started.
License
This project is licensed under the MIT License - see the LICENSE file for details.
