# LangChain LambdaDB Integration

`@functional-systems/langchain-lambdadb` v0.2.1 — hmm
LangChain integration for LambdaDB vector database
A production-ready TypeScript library that integrates LambdaDB vector database with LangChain.js, providing seamless vector storage and retrieval capabilities for AI applications.
## Features
- 🚀 Easy Integration: Drop-in replacement for other LangChain vector stores
- 🎯 Vector Similarity Search: Support for cosine, euclidean, and dot product similarity metrics
- 🧠 Max Marginal Relevance (MMR): Diverse search results balancing relevance and diversity
- 📊 Batch Operations: Efficient bulk document insertion and processing
- 🔍 Flexible Configuration: Custom field names, similarity metrics, and collection settings
- 🛡️ Type Safety: Full TypeScript support with comprehensive type definitions
- ⚡ High Performance: Leverages LambdaDB's optimized vector search engine with consistent reads
- 🧪 Production Ready: Comprehensive test suite with 43 passing tests (16 unit + 27 integration)
- 🔄 Retry Logic: Built-in exponential backoff for robust error handling
- 📈 Collection Management: Full lifecycle management with state monitoring
- 🗑️ Document Deletion: LangChain `delete()` support with server-side LambdaDB filters (by `ids`, `filter`, or `deleteAll`)
## Installation

```bash
npm install langchain-lambdadb @langchain/core
```

## Quick Start

```typescript
import { LambdaDBVectorStore } from 'langchain-lambdadb';
import { OpenAIEmbeddings } from '@langchain/openai';
import { Document } from '@langchain/core/documents';

// Initialize embeddings
const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY
});

// Configure LambdaDB connection
const config = {
  projectApiKey: process.env.LAMBDADB_API_KEY!,
  serverURL: process.env.LAMBDADB_SERVER_URL, // Optional: custom server
  collectionName: 'my-documents',
  vectorDimensions: 1536, // OpenAI embedding dimensions
  similarityMetric: 'cosine',
  // Optional: configure retry behavior
  retryOptions: {
    maxAttempts: 3,
    initialDelay: 500,
    maxDelay: 5000
  }
};

// Create vector store
const vectorStore = new LambdaDBVectorStore(embeddings, config);

// Create collection if it doesn't exist
await vectorStore.createCollection();

// Add documents
const documents = [
  new Document({
    pageContent: 'LangChain is a framework for developing applications powered by language models.',
    metadata: { source: 'documentation', category: 'framework' }
  }),
  new Document({
    pageContent: 'LambdaDB is a vector database optimized for AI applications.',
    metadata: { source: 'documentation', category: 'database' }
  })
];
await vectorStore.addDocuments(documents);

// Perform similarity search
const results = await vectorStore.similaritySearch('What is LangChain?', 5);
console.log(results);
```

## Configuration Options
### LambdaDBConfig
| Option | Type | Required | Description |
|--------|------|----------|-------------|
| `projectApiKey` | `string` | ✅ | Your LambdaDB project API key |
| `collectionName` | `string` | ✅ | Name of the collection to use |
| `vectorDimensions` | `number` | ✅ | Vector dimensions for embeddings |
| `similarityMetric` | `SimilarityMetric` | ❌ | Similarity metric (default: `'cosine'`) |
| `baseUrl` | `string` | ❌ | API base URL (e.g. `https://api.lambdadb.ai`). Use together with `projectName`. |
| `projectName` | `string` | ❌ | Project name (path under `/projects/`). Use together with `baseUrl`. |
| `serverURL` | `string` | ❌ | Deprecated. Full server URL override. Prefer `baseUrl` + `projectName`. |
| `textField` | `string` | ❌ | Field name for document content (default: `'content'`) |
| `vectorField` | `string` | ❌ | Field name for vectors (default: `'vector'`) |
| `validateCollection` | `boolean` | ❌ | Validate collection before operations (default: `false`) |
| `defaultConsistentRead` | `boolean` | ❌ | Use consistent reads by default (default: `true`) |
| `retryOptions` | `RetryOptions` | ❌ | Retry behavior with exponential backoff |
| `partitionConfig` | `PartitionConfigOption` | ❌ | Optional partition config for collection creation |
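To illustrate, a configuration combining several of the optional fields might look like this (the API key, project, collection, and field names are placeholders; option names are as listed above):

```typescript
// Illustrative LambdaDBConfig object (all values are placeholders)
const fullConfig = {
  projectApiKey: 'your-api-key',
  baseUrl: 'https://api.lambdadb.ai', // preferred over the deprecated serverURL
  projectName: 'my-project',
  collectionName: 'articles',
  vectorDimensions: 1536,
  similarityMetric: 'cosine',
  textField: 'content',               // defaults shown explicitly
  vectorField: 'vector',
  defaultConsistentRead: true,
  retryOptions: { maxAttempts: 3, initialDelay: 500, maxDelay: 5000 }
};
```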
### Similarity Metrics

- `'cosine'` - Cosine similarity (default, recommended for most use cases)
- `'euclidean'` - Euclidean distance
- `'dot_product'` - Dot product similarity
## Usage Examples

### Basic Vector Search
```typescript
import { LambdaDBVectorStore } from 'langchain-lambdadb';
import { OpenAIEmbeddings } from '@langchain/openai';

const vectorStore = new LambdaDBVectorStore(
  new OpenAIEmbeddings(),
  {
    projectApiKey: process.env.LAMBDADB_API_KEY!,
    collectionName: 'documents',
    vectorDimensions: 1536,
  }
);

// Search with custom parameters
const results = await vectorStore.similaritySearchWithScore('query text', 10);
results.forEach(([doc, score]) => {
  console.log(`Score: ${score}, Content: ${doc.pageContent}`);
});
```

### Using with Different Embedding Models
```typescript
import { HuggingFaceTransformersEmbeddings } from '@langchain/community/embeddings/hf_transformers';

// Using Hugging Face embeddings
const embeddings = new HuggingFaceTransformersEmbeddings({
  modelName: 'Xenova/all-MiniLM-L6-v2',
});

const vectorStore = new LambdaDBVectorStore(embeddings, {
  projectApiKey: process.env.LAMBDADB_API_KEY!,
  collectionName: 'hf-documents',
  vectorDimensions: 384, // all-MiniLM-L6-v2 dimensions
  similarityMetric: 'cosine'
});
```

### Creating from Texts and Metadata
```typescript
// Create vector store from texts
const texts = [
  'The quick brown fox jumps over the lazy dog.',
  'Machine learning is a subset of artificial intelligence.',
  'Vector databases enable efficient similarity search.'
];

const metadatas = [
  { category: 'literature' },
  { category: 'technology' },
  { category: 'database' }
];

const vectorStore = await LambdaDBVectorStore.fromTexts(
  texts,
  metadatas,
  embeddings,
  config
);
```

### Max Marginal Relevance (MMR) Search
```typescript
// MMR search for diverse results
const mmrResults = await vectorStore.maxMarginalRelevanceSearch(
  'machine learning frameworks',
  {
    k: 5,       // Number of results to return
    fetchK: 20, // Number of initial candidates to fetch
    lambda: 0.7 // Balance between relevance (1.0) and diversity (0.0)
  }
);
```

### Advanced Filtering
Search supports server-side filters (LambdaDB syntax) or a client-side filter function. Prefer server-side filtering for efficiency.

```typescript
// Server-side: LambdaDB query string (recommended)
const results = await vectorStore.similaritySearchVectorWithScore(
  queryVector,
  5,
  'category:technology'
);

// Server-side: full LambdaDB filter object
const results2 = await vectorStore.similaritySearchVectorWithScore(queryVector, 5, {
  queryString: { query: 'category:technology AND year:2024' },
});

// Client-side: filter function (applied after fetch)
const filterFn = (doc: Document) => doc.metadata?.category === 'technology';
const results3 = await vectorStore.similaritySearchVectorWithScore(queryVector, 5, filterFn);
```

See LambdaDB Query string for filter syntax.
### Deleting Documents

The store implements the LangChain VectorStore `delete()` interface. You must pass explicit parameters; there is no default that deletes everything, to avoid accidental wipes.

By IDs (most efficient when you know the IDs):

```typescript
await vectorStore.delete({ ids: ['id1', 'id2'] });
```

By LambdaDB filter (recommended when filtering by metadata; server-side, one API call):

```typescript
// Query string – converted to a LambdaDB queryString filter
await vectorStore.delete({ filter: 'genre:documentary AND year:2019' });

// Or a full LambdaDB filter object
await vectorStore.delete({
  filter: { queryString: { query: 'genre:documentary AND year:2019' } },
});
```

See LambdaDB Delete data and Query string for filter syntax.

Delete all documents in the collection (explicit):

```typescript
await vectorStore.delete({ deleteAll: true });
```

By client-side filter function (fetches all documents, then deletes by IDs; use only when a LambdaDB filter is not enough):

```typescript
await vectorStore.delete({
  filter: (doc) => doc.metadata.source === 'legacy',
});
```

### RAG (Retrieval-Augmented Generation) Integration
```typescript
import { ChatOpenAI } from '@langchain/openai';
import { ConversationalRetrievalQAChain } from 'langchain/chains';

const llm = new ChatOpenAI();
const retriever = vectorStore.asRetriever({
  searchType: 'similarity',
  searchKwargs: { k: 6 }
});

const chain = ConversationalRetrievalQAChain.fromLLM(llm, retriever);
const response = await chain.call({
  question: 'What is the main topic of the documents?',
  chat_history: []
});
```

## API Reference
### LambdaDBVectorStore Class

#### Constructor

```typescript
new LambdaDBVectorStore(embeddings: EmbeddingsInterface, config: LambdaDBConfig)
```

#### Methods
`addDocuments(documents: Document[]): Promise<string[] | void>`

Adds documents to the vector store with automatic embedding generation. Returns the assigned document IDs.

`addVectors(vectors: number[][], documents: Document[]): Promise<string[] | void>`

Adds pre-computed vectors with their associated documents. Returns the assigned document IDs.
`similaritySearch(query: string, k?: number, filter?: DocumentFilter): Promise<Document[]>`

Performs similarity search with a text query.

`similaritySearchVectorWithScore(query: number[], k: number, filter?: DocumentFilter | LambdaDBFilterObject | string): Promise<[Document, number][]>`

Performs similarity search with a vector query and returns documents with similarity scores. Filter handling: a string or LambdaDB object is applied server-side via `knn.filter`; a function is applied client-side after the fetch.
`maxMarginalRelevanceSearch(query: string, options?: MMRSearchOptions): Promise<Document[]>`

Performs MMR search using vector similarity: fetches candidates with `includeVectors: true`, then balances relevance to the query against diversity among the already-selected documents (cosine similarity).
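As a rough illustration of that selection step (a sketch of the standard MMR algorithm, not the library's actual implementation), MMR greedily picks the candidate maximizing `lambda * sim(query, doc) - (1 - lambda) * max sim(doc, selected)`:

```typescript
// Cosine similarity between two dense vectors
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Greedily select k candidate indices, trading off query relevance
// against redundancy with already-selected candidates
function mmrSelect(query: number[], candidates: number[][], k: number, lambda = 0.7): number[] {
  const selected: number[] = [];
  const remaining = new Set(candidates.map((_, i) => i));
  while (selected.length < k && remaining.size > 0) {
    let best = -1, bestScore = -Infinity;
    for (const i of remaining) {
      const relevance = cosine(query, candidates[i]);
      const redundancy = selected.length
        ? Math.max(...selected.map((j) => cosine(candidates[i], candidates[j])))
        : 0;
      const score = lambda * relevance - (1 - lambda) * redundancy;
      if (score > bestScore) { bestScore = score; best = i; }
    }
    selected.push(best);
    remaining.delete(best);
  }
  return selected;
}
```

With `lambda` near 1 the selection approaches plain nearest-neighbor ranking; lower values penalize near-duplicates among the results.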
`createCollection(options?: Partial<CreateCollectionOptions>): Promise<void>`

Creates a new collection in LambdaDB with proper state monitoring.

`deleteCollection(): Promise<void>`

Deletes the collection from LambdaDB.

`getCollectionInfo(): Promise<CollectionInfo>`

Returns information about the collection, including status and document count.
`delete(_params?: Record<string, any>): Promise<void>` (LangChain VectorStore interface)

Deletes documents. Requires explicit params (no default). Use one of:

- `{ ids: string[] }` – delete by document IDs
- `{ filter: string | LambdaDBFilterObject }` – server-side delete (recommended); a string is used as `queryString.query`
- `{ filter: (doc: Document) => boolean }` – client-side filter (fetches all, then deletes by IDs)
- `{ deleteAll: true }` – delete all documents in the collection
`deleteDocuments(options: DeleteOptions): Promise<void>`

Lower-level delete with the same options as `delete()`: `ids`, `filter` (string, LambdaDB object, or function), or `deleteAll: true`.
#### Static Factory Methods

`fromTexts(texts: string[], metadatas: object[] | object, embeddings: EmbeddingsInterface, config: LambdaDBConfig): Promise<LambdaDBVectorStore>`

Creates a vector store from an array of texts.

`fromDocuments(docs: Document[], embeddings: EmbeddingsInterface, config: LambdaDBConfig): Promise<LambdaDBVectorStore>`

Creates a vector store from an array of documents.
## Environment Variables

You can set your LambdaDB credentials using environment variables:

```bash
export LAMBDADB_API_KEY="your-api-key-here"
export LAMBDADB_SERVER_URL="https://your-instance.lambdadb.ai" # Optional
```

## Error Handling
The library provides comprehensive error handling:
```typescript
try {
  await vectorStore.addDocuments(documents);
} catch (error) {
  if (error.message.includes('LambdaDB Error')) {
    console.error('LambdaDB service error:', error.message);
  } else if (error.message.includes('Vector dimension mismatch')) {
    console.error('Embedding dimension error:', error.message);
  } else {
    console.error('Unexpected error:', error.message);
  }
}
```

## Development
### Running Tests

```bash
# Run all tests
npm test

# Run only unit tests
npm run test:unit

# Run only integration tests (requires LAMBDADB_API_KEY)
npm run test:integration
```

Integration tests: set `LAMBDADB_API_KEY` (and optionally `LAMBDADB_SERVER_URL`) to run the integration tests against a real LambdaDB service.
### Building

```bash
npm run build
```

### Linting

```bash
npm run lint
```

## Contributing
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Implementation Details

### Key Features Implemented

- Eventual Consistency Handling: Uses `consistentRead: true` by default for immediate consistency
- Collection State Management: Waits for the collection to become ACTIVE before operations
- Error Handling: Comprehensive error handling with retry logic and exponential backoff
- Field Name Configuration: Supports custom field names for text and vector data
- Batch Processing: Efficient bulk operations with proper error handling
- MMR: Vector-based MMR with `includeVectors: true` and cosine similarity for relevance/diversity balance
- Client options: Prefer `baseUrl` + `projectName`; `serverURL` supported but deprecated
- Test Coverage: Unit and integration tests covering core functionality and edge cases
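The retry behavior above amounts to capped exponential backoff. A minimal sketch of the delay schedule, assuming the `RetryOptions` fields from the configuration table and a simple doubling rule (the exact growth factor inside the library is an assumption here):

```typescript
// Illustrative capped exponential backoff schedule (not the library's internals)
interface RetryOptions {
  maxAttempts: number;  // total attempts, including the first call
  initialDelay: number; // ms before the first retry
  maxDelay: number;     // upper bound on any single delay
}

// Delay before retry n: initialDelay * 2^(n - 1), capped at maxDelay
function backoffDelays(opts: RetryOptions): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt < opts.maxAttempts; attempt++) {
    delays.push(Math.min(opts.initialDelay * 2 ** (attempt - 1), opts.maxDelay));
  }
  return delays;
}
```

With the Quick Start settings (`maxAttempts: 3`, `initialDelay: 500`, `maxDelay: 5000`) this yields two retries delayed by 500 ms and 1000 ms.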
### LambdaDB Integration Notes

- Uses the KNN query format: `{ knn: { field, queryVector, k } }`
- Prefer `baseUrl` + `projectName`; use `serverURL` (exact name, not `serverUrl`) only when overriding the full URL
- Supports immediate consistency with `consistentRead: true`
- Collection creation polls state until ACTIVE; optional `partitionConfig` supported
- Delete: prefer a server-side filter (`filter` as a string or LambdaDB object) for efficiency; `deleteAll: true` uses the LambdaDB filter `{ queryString: { query: "*:*" } }` (see Delete data)
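Putting the first note together with the filtering behavior described earlier, the KNN portion of a search request can be sketched as a plain object (field name per the `vectorField` default; the vector values and filter are placeholders):

```typescript
// Illustrative KNN query body per the format noted above (values are placeholders)
const knnQuery = {
  knn: {
    field: 'vector',                // vectorField from the config (default 'vector')
    queryVector: [0.1, 0.2, 0.3],   // embedding of the query text
    k: 5,                           // number of nearest neighbors to return
    // Optional server-side filter, as applied via knn.filter:
    // filter: { queryString: { query: 'category:technology' } },
  },
};
```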
## Links
- LambdaDB Documentation
- LangChain.js Documentation
- TypeScript Client GitHub
- Python Integration Reference
## Support
If you encounter any issues or have questions:
- Check the GitHub Issues
- Review the LambdaDB Documentation
- Join the LangChain Discord
