@andrejs1979/document

v1.0.0

Published

6 months ago

MongoDB-compatible document database for NoSQL

andrejs1979

document-database mongodb nosql json schema-free

NoSQL - Document Module

A MongoDB-compatible document database for Cloudflare's edge infrastructure, built on D1, KV, and R2, with vector search integration, automatic indexing, and real-time document streams.

Features

🚀 MongoDB Compatibility

Full MongoDB query language support
CRUD operations (Create, Read, Update, Delete)
Advanced aggregation pipelines
Complex query operators ($and, $or, $in, $regex, etc.)
GridFS-like large document support with R2 integration
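
To make the operator list concrete, here is a minimal in-memory sketch of how a filter using $and, $or, $in, $regex, and $gte might be evaluated against a single document. It is an illustration only and not the module's query engine; `matches`, `matchField`, and the types are hypothetical names, and only primitive literals are supported as equality conditions.

```typescript
type Doc = Record<string, unknown>;
type Filter = Record<string, unknown>;

// Evaluate one field condition: either a primitive literal or an operator object.
function matchField(value: unknown, cond: unknown): boolean {
  if (cond === null || typeof cond !== 'object') {
    // Literal equality; an array field matches if it contains the literal.
    return Array.isArray(value) ? value.includes(cond) : value === cond;
  }
  return Object.entries(cond as Record<string, unknown>).every(([op, arg]) => {
    switch (op) {
      case '$in': {
        const candidates = arg as unknown[];
        const values = Array.isArray(value) ? value : [value];
        return values.some((v) => candidates.includes(v));
      }
      case '$gte':
        return (value as number) >= (arg as number);
      case '$regex':
        return typeof value === 'string' && new RegExp(arg as string).test(value);
      default:
        throw new Error(`Unsupported operator in this sketch: ${op}`);
    }
  });
}

// Top-level keys are ANDed together; $and and $or recurse into sub-filters.
function matches(doc: Doc, filter: Filter): boolean {
  return Object.entries(filter).every(([key, cond]) => {
    if (key === '$and') return (cond as Filter[]).every((f) => matches(doc, f));
    if (key === '$or') return (cond as Filter[]).some((f) => matches(doc, f));
    return matchField(doc[key], cond);
  });
}
```

A real engine would translate such filters into storage-level queries rather than scanning documents, but the matching semantics are the same idea.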

🔍 Hybrid Search

Text search with full-text indexing
Vector similarity search
Semantic search using embeddings
Multi-modal search (text, image, audio)
Personalized recommendations
Similar document discovery

🔗 Relationships

Define relationships between collections
Automatic population of related documents
One-to-one, one-to-many, many-to-many relationships
Cascade operations with referential integrity
Deep population with configurable depth

⚡ Performance

Intelligent auto-indexing based on query patterns
Dynamic field indexing for optimal performance
Query plan optimization and explanation
Caching with TTL support
Connection pooling and batch operations

📊 Bulk Operations

High-performance bulk writes
Streaming inserts for large datasets
Real-time document streams
Parallel processing with configurable concurrency
Error handling and recovery

🏷️ Smart Tagging

Automatic content-based tagging
Hierarchical tag systems
Tag recommendations and suggestions
Bulk tagging operations
Tag analytics and cleanup

📈 Analytics & Monitoring

Real-time performance metrics
Query pattern analysis
Index usage statistics
Document lifecycle tracking
Health monitoring and diagnostics

Quick Start

Installation

import { EdgeDocumentDB } from './src/document';

// Create database instance
const db = await EdgeDocumentDB.create({
  name: 'my_app_db',
  d1Database: env.DB,      // Cloudflare D1 binding
  kvStore: env.KV,         // Cloudflare KV binding
  r2Bucket: env.BUCKET,    // Cloudflare R2 binding
  options: {
    enableAutoIndexing: true,
    enableRelationships: true,
    vectorConfig: {
      enabled: true,
      autoEmbedding: true
    }
  }
});

Basic Operations

// Insert documents
const user = await db.insertOne('users', {
  name: 'John Doe',
  email: 'john.doe@example.com',
  tags: ['developer', 'javascript']
});

// Query documents
const users = await db.find('users', {
  tags: { $in: ['developer'] },
  createdAt: { $gte: new Date('2024-01-01') }
}, {
  sort: { name: 1 },
  limit: 10
});

// Update documents
await db.updateOne('users',
  { email: 'john.doe@example.com' },
  { $set: { lastLogin: new Date() } }
);

// Aggregation pipeline
const stats = await db.aggregate('users', [
  { $match: { active: true } },
  { $group: { _id: '$department', count: { $sum: 1 } } },
  { $sort: { count: -1 } }
]);
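
For readers new to aggregation pipelines, the three stages in that pipeline amount to filter, group-and-count, and sort. The plain TypeScript sketch below mirrors that behavior in memory; `countByField` is a hypothetical helper for illustration and is not part of the module.

```typescript
type Rec = Record<string, unknown>;

// In-memory equivalent of: $match (predicate), $group by `field` with
// count: { $sum: 1 }, then $sort by count descending.
function countByField(
  docs: Rec[],
  predicate: (doc: Rec) => boolean,
  field: string
): { _id: unknown; count: number }[] {
  const counts = new Map<unknown, number>();
  for (const doc of docs) {
    if (!predicate(doc)) continue;
    const key = doc[field] ?? null; // documents missing the field group under null
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Array.from(counts, ([_id, count]) => ({ _id, count }))
    .sort((a, b) => b.count - a.count);
}

const byDepartment = countByField(
  [
    { department: 'eng', active: true },
    { department: 'ops', active: true },
    { department: 'eng', active: true },
    { department: 'eng', active: false },
  ],
  (doc) => doc.active === true,
  'department'
);
```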

Hybrid Search

// Text search
const articles = await db.textSearch('content', 'machine learning', {
  filters: { category: 'tech' },
  limit: 10
});

// Vector search
const similar = await db.vectorSearch('content', embeddings, {
  threshold: 0.7,
  limit: 5
});
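
The `threshold` and `limit` options suggest a similarity cutoff followed by a top-k selection. A minimal sketch of that idea, assuming cosine similarity as the metric; `cosineSimilarity` and `topK` are hypothetical helpers, not the module's API.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

interface Scored<T> {
  item: T;
  score: number;
}

// Keep candidates scoring at or above `threshold`, best first, at most `limit`.
function topK<T>(
  query: number[],
  candidates: { item: T; vector: number[] }[],
  threshold: number,
  limit: number
): Scored<T>[] {
  return candidates
    .map((c) => ({ item: c.item, score: cosineSimilarity(query, c.vector) }))
    .filter((s) => s.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

This brute-force scan is only practical for small candidate sets; an approximate index such as HNSW avoids comparing the query against every stored vector.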

// Semantic search
const semantic = await db.semanticSearch('content', 
  'artificial intelligence neural networks', {
    textWeight: 0.3,
    vectorWeight: 0.7
  }
);

// Hybrid search combining multiple signals
const hybrid = await db.hybridSearch('content', {
  text: 'react development',
  vector: queryEmbedding,
  filter: { publishedAt: { $gte: recentDate } },
  weights: { text: 0.4, vector: 0.6, metadata: 0.0 }
});
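
One plausible reading of the `weights` option is a weighted sum of per-signal scores, each normalized to the range 0 to 1. The sketch below assumes that interpretation; how the module actually fuses scores is not documented here, and `combineScores` is a hypothetical name.

```typescript
interface SignalScores {
  text: number;
  vector: number;
  metadata: number;
}

// Weighted sum of per-signal scores, divided by the total weight so the
// result stays in [0, 1] whenever every input score is in [0, 1].
function combineScores(scores: SignalScores, weights: SignalScores): number {
  const totalWeight = weights.text + weights.vector + weights.metadata;
  if (totalWeight <= 0) throw new Error('At least one weight must be positive');
  const weighted =
    scores.text * weights.text +
    scores.vector * weights.vector +
    scores.metadata * weights.metadata;
  return weighted / totalWeight;
}

// With metadata weighted at 0.0, only the text and vector scores contribute.
const combined = combineScores(
  { text: 0.5, vector: 0.9, metadata: 1.0 },
  { text: 0.4, vector: 0.6, metadata: 0.0 }
);
```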

Relationships

// Define relationships
await db.defineRelationship('posts', 'users', {
  type: 'manyToOne',
  localField: 'authorId',
  foreignField: '_id',
  foreignCollection: 'users'
});

// Query with population
const posts = await db.findWithPopulate('posts', {}, [
  {
    path: 'authorId',
    select: 'name email avatar'
  },
  {
    path: 'comments',
    match: { approved: true },
    options: { sort: { createdAt: -1 }, limit: 5 }
  }
]);
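
Conceptually, population is a join on `localField` and `foreignField`. A minimal in-memory sketch of that join follows; `populate` and its types are hypothetical and say nothing about how the relationship manager is implemented.

```typescript
interface Relationship {
  localField: string;
  foreignField: string;
}
type Row = Record<string, unknown>;

// Replace each document's local key with the matching foreign document, or
// null when no match exists. The lookup map is built once, so the join is
// linear in the combined size of both inputs.
function populate(docs: Row[], foreign: Row[], rel: Relationship): Row[] {
  const byKey = new Map<unknown, Row>();
  for (const f of foreign) byKey.set(f[rel.foreignField], f);
  return docs.map((d) => ({
    ...d,
    [rel.localField]: byKey.get(d[rel.localField]) ?? null,
  }));
}

const populatedPosts = populate(
  [
    { title: 'First post', authorId: 'u1' },
    { title: 'Orphaned post', authorId: 'u9' },
  ],
  [{ _id: 'u1', name: 'Ann' }],
  { localField: 'authorId', foreignField: '_id' }
);
```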

Bulk Operations

// Bulk write operations
const bulkOps = [
  { insertOne: { document: { name: 'User 1' } } },
  { updateOne: { filter: { name: 'User 2' }, update: { $set: { active: true } } } },
  { deleteOne: { filter: { name: 'User 3' } } }
];

const result = await db.bulkWrite('users', bulkOps);

// Streaming inserts
async function* generateData() {
  for (let i = 0; i < 100000; i++) {
    yield { id: i, value: Math.random() };
  }
}

const streamResult = await db.streamInsert('analytics', generateData(), {
  batchSize: 1000,
  onProgress: (inserted, total) => console.log(`${inserted}/${total}`)
});
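
The `batchSize` option implies that the stream is consumed in fixed-size chunks. A minimal sketch of that batching over an async iterable, assuming nothing about the module's internals; `inBatches`, `insertAll`, and the `write` callback are hypothetical.

```typescript
// Group items from an async source into arrays of at most `batchSize`.
async function* inBatches<T>(
  source: AsyncIterable<T>,
  batchSize: number
): AsyncGenerator<T[]> {
  if (batchSize < 1) throw new Error('batchSize must be at least 1');
  let batch: T[] = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length === batchSize) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) yield batch; // flush the final partial batch
}

// Hypothetical consumer: `write` stands in for whatever persists one batch.
async function insertAll<T>(
  source: AsyncIterable<T>,
  batchSize: number,
  write: (batch: T[]) => Promise<void>
): Promise<number> {
  let inserted = 0;
  for await (const batch of inBatches(source, batchSize)) {
    await write(batch);
    inserted += batch.length;
  }
  return inserted;
}
```

Because the source is pulled lazily, only one batch is held in memory at a time, which is the point of streaming a large dataset instead of building one array.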

Indexing

// Create indexes
await db.createIndex('products', {
  key: { category: 1, price: -1 },
  options: { name: 'category_price_idx' }
});

// Text index
await db.createIndex('articles', {
  key: { title: 'text', content: 'text' },
  options: { weights: { title: 10, content: 5 } }
});

// Vector index
await db.createIndex('embeddings', {
  key: { vector: 'vector' },
  options: {
    vectorOptions: {
      dimensions: 1536,
      similarity: 'cosine',
      type: 'hnsw'
    }
  }
});

// Auto-indexing
await db.autoCreateIndexes('products');

// Get recommendations
const recommendations = await db.getIndexRecommendations('products');

Tagging System

// Auto-tag documents
const tags = await db.autoTag('articles', document, {
  tagSources: ['content', 'metadata'],
  customTagger: (doc) => {
    const tags = [];
    if (doc.content?.includes('React')) tags.push('react');
    return tags;
  }
});

// Apply tags
await db.tagDocument('articles', documentId, tags);

// Find by tags
const taggedDocs = await db.findByTags('articles', ['react', 'typescript'], {
  operator: 'and',
  includeHierarchy: true
});
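
The `operator` and `includeHierarchy` options can be read as follows: `and` requires every queried tag, `or` requires at least one, and hierarchy lets a parent tag match its descendants. The sketch below assumes hierarchical tags are slash-delimited paths such as `frontend/react`; that encoding, and the helper names, are assumptions made for illustration.

```typescript
type TagOperator = 'and' | 'or';

// True if `docTag` equals `queryTag` or, with hierarchy enabled, is nested beneath it.
function tagMatches(docTag: string, queryTag: string, includeHierarchy: boolean): boolean {
  if (docTag === queryTag) return true;
  return includeHierarchy && docTag.startsWith(queryTag + '/');
}

function hasTags(
  docTags: string[],
  queryTags: string[],
  operator: TagOperator,
  includeHierarchy: boolean
): boolean {
  const found = (q: string) => docTags.some((t) => tagMatches(t, q, includeHierarchy));
  return operator === 'and' ? queryTags.every(found) : queryTags.some(found);
}
```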

// Tag statistics
const tagStats = await db.getTagStats('articles', {
  sortBy: 'count',
  limit: 20
});

Advanced Features

Vector Integration

The document module seamlessly integrates with NoSQL's vector capabilities:

// Documents with embedded vectors
const document = {
  title: 'AI Research Paper',
  content: 'Latest advances in machine learning...',
  _vector: {
    id: 'doc1',
    data: new Float32Array([0.1, 0.2, 0.3 /* ... */]), // 1536 dimensions
    metadata: { model: 'text-embedding-ada-002' }
  }
};

// Automatic embedding generation
const db = await EdgeDocumentDB.create({
  name: 'ai_db',
  d1Database: env.DB,
  options: {
    vectorConfig: {
      enabled: true,
      autoEmbedding: true,
      embeddingFields: ['content', 'title'],
      defaultModel: 'text-embedding-ada-002'
    }
  }
});

Real-time Streams

// Create real-time document stream
const stream = db.createDocumentStream('events', {
  batchSize: 100,
  flushInterval: 5000,
  compression: true,
  transform: (doc) => ({
    ...doc,
    processedAt: new Date()
  }),
  errorHandler: (error, batch) => {
    console.error('Stream error:', error);
    // Implement retry logic or dead letter queue
  }
});

// Write to stream
await stream.write({
  event: 'user_action',
  userId: 'user123',
  action: 'click',
  timestamp: new Date()
});

// Stop stream
await stream.stop();
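
The `batchSize` and `flushInterval` options describe a common buffering pattern: flush when the buffer is full, and also on a timer so a slow trickle of writes is not held back. A minimal sketch of that pattern, assuming an async `sink` callback that persists one batch; `BufferedWriter` is a hypothetical class, not the stream returned by `createDocumentStream`.

```typescript
class BufferedWriter<T> {
  private buffer: T[] = [];
  private timer: ReturnType<typeof setInterval>;
  private sink: (batch: T[]) => Promise<void>;
  private batchSize: number;

  constructor(sink: (batch: T[]) => Promise<void>, batchSize: number, flushIntervalMs: number) {
    this.sink = sink;
    this.batchSize = batchSize;
    // Periodic flush so buffered documents are not held indefinitely.
    this.timer = setInterval(() => {
      this.flush().catch((err) => console.error('Periodic flush failed:', err));
    }, flushIntervalMs);
  }

  async write(doc: T): Promise<void> {
    this.buffer.push(doc);
    if (this.buffer.length >= this.batchSize) await this.flush();
  }

  async flush(): Promise<void> {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = []; // swap first so writes arriving during the await start a new batch
    await this.sink(batch);
  }

  // Stop the timer and flush whatever is still buffered.
  async stop(): Promise<void> {
    clearInterval(this.timer);
    await this.flush();
  }
}
```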

Performance Monitoring

// Database statistics
const stats = await db.stats();
console.log('Total documents:', stats.totalDocuments);
console.log('Index count:', stats.indexCount);

// Query performance
const queryMetrics = db.queryEngine.getQueryMetrics();
const slowQueries = queryMetrics.filter(m => m.latency > 100);

// Index usage
const indexStats = await db.indexManager.getIndexStats('mydb', 'collection');
console.log('Index recommendations:', indexStats.recommendations);

Configuration Options

interface DocumentDatabaseConfig {
  name: string;
  d1Database: any;          // Cloudflare D1 instance
  kvStore?: any;            // Cloudflare KV store
  r2Bucket?: any;           // Cloudflare R2 bucket
  
  // Performance settings
  maxDocumentSize?: number;       // Default: 16MB
  queryTimeout?: number;          // Default: 30s
  batchSize?: number;             // Default: 100
  
  // Caching
  enableQueryCache?: boolean;     // Default: true
  queryCacheTTL?: number;        // Default: 300s
  cacheSize?: number;            // Default: 100MB
  
  // Indexing
  enableAutoIndexing?: boolean;   // Default: true
  autoIndexThreshold?: number;    // Default: 1000
  maxIndexedFields?: number;      // Default: 20
  
  // Vector integration
  vectorConfig?: {
    enabled?: boolean;             // Default: true
    defaultDimensions?: number;    // Default: 1536
    defaultModel?: string;         // Default: 'text-embedding-ada-002'
    autoEmbedding?: boolean;       // Default: false
    embeddingFields?: string[];    // Default: ['content', 'text']
  };
  
  // Features
  enableValidation?: boolean;     // Default: true
  enableSchemaEvolution?: boolean; // Default: true
  enableChangeStreams?: boolean;  // Default: true
  enableRelationships?: boolean;  // Default: true
  enableQueryLogging?: boolean;   // Default: false
  enablePerformanceMetrics?: boolean; // Default: true
  
  // Bulk operations
  bulkWriteBatchSize?: number;    // Default: 1000
  bulkWriteParallelism?: number;  // Default: 4
}
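
Because `vectorConfig` is nested, a shallow merge of user options over defaults would discard every nested default the caller did not repeat. The sketch below shows one way to merge a subset of the documented defaults safely; the `Settings` types and `resolveSettings` are hypothetical and cover only a few of the fields listed in the interface.

```typescript
interface VectorSettings {
  enabled: boolean;
  defaultDimensions: number;
  autoEmbedding: boolean;
  embeddingFields: string[];
}

interface Settings {
  batchSize: number;
  enableQueryCache: boolean;
  vectorConfig: VectorSettings;
}

type UserSettings = Partial<Omit<Settings, 'vectorConfig'>> & {
  vectorConfig?: Partial<VectorSettings>;
};

// Default values copied from the comments in the configuration interface.
const DEFAULTS: Settings = {
  batchSize: 100,
  enableQueryCache: true,
  vectorConfig: {
    enabled: true,
    defaultDimensions: 1536,
    autoEmbedding: false,
    embeddingFields: ['content', 'text'],
  },
};

// Merge top-level fields, then merge vectorConfig separately so that
// overriding one nested field keeps the other nested defaults.
function resolveSettings(user: UserSettings = {}): Settings {
  return {
    ...DEFAULTS,
    ...user,
    vectorConfig: { ...DEFAULTS.vectorConfig, ...user.vectorConfig },
  };
}

const resolved = resolveSettings({ batchSize: 500, vectorConfig: { autoEmbedding: true } });
```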

Architecture

The document module is built with a modular architecture:

src/document/
├── edge-document-db.ts          # Main database class
├── types.ts                     # Type definitions
├── storage/
│   └── document-storage.ts      # Core storage engine
├── operations/
│   ├── query-engine.ts          # MongoDB query processing
│   ├── hybrid-search.ts         # Hybrid search engine
│   └── bulk-operations.ts       # Bulk and streaming operations
├── indexes/
│   └── index-manager.ts         # Intelligent indexing
├── relationships/
│   └── relationship-manager.ts  # Document relationships
├── metadata/
│   └── tagging-system.ts        # Smart tagging system
└── examples/
    └── basic-usage.ts           # Comprehensive examples

Performance Characteristics

Latency and Throughput Targets

Simple queries: < 10ms p99
Complex aggregations: < 100ms p99
Bulk operations: 10,000+ docs/second
Vector similarity: < 50ms p99
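
The latency targets are stated as p99 values, meaning 99% of requests should complete within the given time. To check measurements against them, a percentile can be computed from recorded latencies. A minimal sketch using the nearest-rank method; `percentile` is a hypothetical helper and the samples are made up.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('No samples');
  if (p <= 0 || p > 100) throw new Error('p must be in (0, 100]');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Made-up latencies in milliseconds: 1, 2, ..., 100.
const latenciesMs = Array.from({ length: 100 }, (_, i) => i + 1);
const p99 = percentile(latenciesMs, 99);
const meetsSimpleQueryTarget = p99 < 10;
```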

Scalability

Documents: Unlimited (distributed across D1 + R2)
Collections: Unlimited
Indexes: 20 dynamic indexes per collection
Concurrent operations: 1000+ per database

Storage Efficiency

Automatic compression for large documents
Intelligent caching with LRU eviction
Vector quantization for storage optimization
Tiered storage (D1 for metadata, R2 for large docs)
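
As an illustration of caching with LRU eviction and TTL expiry, here is a minimal sketch built on the insertion order of a JavaScript `Map`. `LruTtlCache` is a hypothetical class and is not the module's cache; the optional `now` parameter exists only to make expiry easy to exercise.

```typescript
class LruTtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private maxEntries: number;
  private ttlMs: number;

  constructor(maxEntries: number, ttlMs: number) {
    if (maxEntries < 1) throw new Error('maxEntries must be at least 1');
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
  }

  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    // Re-insert to mark as most recently used (Map preserves insertion order).
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.delete(key);
    if (this.entries.size >= this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}
```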

Best Practices

Query Optimization

// Use indexes effectively
await db.find('products', {
  category: 'electronics',  // Indexed field
  price: { $gte: 100 }     // Indexed field
});

// Limit results
await db.find('products', filter, { limit: 20 });

// Use projection to reduce data transfer
await db.find('products', filter, {
  projection: { name: 1, price: 1 }
});

// Explain queries for optimization
const explanation = await db.explain('products', filter);

Memory Management

// Use streaming for large datasets
const stream = db.createDocumentStream('logs', {
  batchSize: 1000,
  flushInterval: 5000
});

// Clear caches periodically
db.hybridSearchEngine.clearSearchCache();
db.relationshipManager.clearPopulateCache();

Error Handling

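// Assumes the error classes are exported from the same entry point as EdgeDocumentDB.
import { DuplicateKeyError, ValidationError } from './src/document';

try {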
  await db.insertOne('users', document);
} catch (error) {
  if (error instanceof DuplicateKeyError) {
    // Handle duplicate key
  } else if (error instanceof ValidationError) {
    // Handle validation error
  } else {
    // Handle other errors
  }
}

Integration with NoSQL

The document module is designed to work seamlessly with other NoSQL modules:

// Unified database instance
import { NoSQLDB } from '../index';

const vectorDB = new NoSQLDB({
  name: 'unified_db',
  d1Database: env.DB,
  kvStore: env.KV,
  r2Bucket: env.BUCKET
});

// Access document operations
const documentDB = vectorDB.documents();
await documentDB.insertOne('content', document);

// Access vector operations
const vectorStore = vectorDB.vectors();
await vectorStore.addVectors(vectors);

// Hybrid operations
const results = await documentDB.hybridSearch('content', {
  text: 'search query',
  vector: queryVector
});

Migration from MongoDB

The document module provides a migration-friendly API:

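// MongoDB driver: operations are methods on a collection handle
// (in these two lines `db` is a MongoDB `Db` instance)
const collection = db.collection('users');
const mongoUsers = await collection.find(filter, options).toArray();

// NoSQL: the collection name is passed as the first argument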
const users = await db.find('users', filter, options);
const user = await db.findOne('users', filter);
const result = await db.insertOne('users', document);
const updateResult = await db.updateMany('users', filter, update);
const deleteResult = await db.deleteOne('users', filter);

// Aggregation pipelines work identically
const pipeline = [
  { $match: { active: true } },
  { $group: { _id: '$department', count: { $sum: 1 } } }
];
const results = await db.aggregate('users', pipeline);

Monitoring and Observability

// Performance metrics
const metrics = db.queryEngine.getQueryMetrics();
const slowQueries = metrics.filter(m => m.latency > 100);

// Index recommendations
const recommendations = await db.getIndexRecommendations('collection');

// Database health
const isHealthy = await db.ping();

// Resource usage
const cacheStats = db.hybridSearchEngine.getSearchCacheStats();
const populateStats = db.relationshipManager.getPopulateCacheStats();

License

Part of NoSQL - distributed under the same license as the main project.