Advanced GitHub DB

A production-ready, MongoDB-like database built directly on GitHub repositories. Store, query, and manage JSON/JSONL documents with atomic transactions, field-level encryption, and multi-format support. Perfect for applications needing simple, version-controlled database storage with the reliability of GitHub's infrastructure.

Features

  • 🗄️ Dual Storage Modes: Document mode (MongoDB-style) and Collection mode (batch operations)
  • 🔄 Multi-Format Support: JSON, JSONL, CSV, and YAML with automatic conversion
  • 🔒 Field-Level Encryption: AES-256 encryption for sensitive fields
  • ⚡ Atomic Transactions: Safe transfers between files with rollback protection
  • 📊 Advanced Querying: In-memory filtering with predicate functions
  • 🚀 Optimized Performance: Smart caching, batching, and rate limit management
  • 💪 Full TypeScript Support: Complete type safety throughout the API
  • 🔍 Automatic Sharding: Repository size monitoring and auto-sharding
  • 📝 GitHub Native: Leverages GitHub's version control and collaboration features

Installation

npm install advanced-github-db

Quick Start

import { GitHubDB } from 'advanced-github-db';

// Initialize the database
const db = new GitHubDB({
  token: process.env.GITHUB_TOKEN!,
  repo: 'your-username/your-repo', // Format: "owner/repo"
  mode: 'document', // or 'collection'
  cache: true,
  batchSize: 50,
  format: 'jsonl' // Default format for new files
});

await db.initialize();

// Create a collection
await db.createCollection('users', { format: 'jsonl' });

// Store documents with encrypted fields
await db.set('users/alice.json', {
  id: 'user_001',
  name: 'Alice',
  email: 'alice@example.com',
  password: 'secret123',
  age: 28
}, { hashFields: ['password'] });

// Query documents
const adults = await db.query('users', doc => doc.age >= 18);

Configuration

GitHubDB Constructor Options

interface GitHubDBConfig {
  // Required
  token: string;           // GitHub Personal Access Token
  repo: string;           // Repository name (format: "owner/repo")
  
  // Optional
  mode?: 'document' | 'collection';  // Default: 'document'
  cache?: boolean;                   // Default: true
  batchSize?: number;                // Default: 50
  format?: 'json' | 'jsonl';         // Default: 'json'
  baseBranch?: string;               // Default: 'main'
  owner?: string;                    // Auto-extracted from repo if not provided
}

Environment Variables

# Required
GITHUB_TOKEN=ghp_your_personal_access_token_here

# Optional
GITHUB_REPO=your-username/your-repo
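
These can be loaded before constructing the client. A minimal sketch, assuming the dotenv package is installed (it is not a dependency of this library):

import 'dotenv/config'; // populates process.env from a local .env file
import { GitHubDB } from 'advanced-github-db';

const db = new GitHubDB({
  token: process.env.GITHUB_TOKEN!,
  repo: process.env.GITHUB_REPO ?? 'your-username/your-repo'
});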

API Reference

Core Database Operations

initialize()

Validates configuration and GitHub access. Must be called before other operations.

await db.initialize();
// Throws error if token is invalid or repo doesn't exist

createCollection(collectionName, options)

Creates a new collection (directory) with a sample file.

await db.createCollection('users', { format: 'jsonl' });
await db.createCollection('logs', { format: 'json' });

// Options
interface CreateCollectionOptions {
  format?: 'json' | 'jsonl'; // Default: uses db config format
}

listCollections()

Returns an array of all collection names in the repository.

const collections = await db.listCollections();
// Returns: ['users', 'logs', 'products']

Document Operations

set(path, data, options)

Creates or updates a document at the specified path.

// Document mode - creates individual files
await db.set('users/alice.json', {
  name: 'Alice',
  email: 'alice@example.com',
  password: 'secret123'
}, { 
  hashFields: ['password'],
  format: 'json' // Override default format
});

// Collection mode - appends to collection file
await db.set('users/all.jsonl', {
  name: 'Bob',
  email: 'bob@example.com'
});

get(path, options)

Retrieves a document from the specified path.

const user = await db.get('users/alice.json', {
  unhashFields: ['password'] // Decrypt encrypted fields
});

// Returns: { name: 'Alice', email: 'alice@example.com', password: 'secret123' }

delete(path)

Deletes a document from the repository.

await db.delete('users/alice.json');

Collection Operations

append(collectionPath, data, options)

Appends a document to a collection file (JSONL format recommended).

await db.append('users/logins.jsonl', {
  userId: 'user_001',
  action: 'login',
  timestamp: new Date().toISOString(),
  ip: '192.168.1.1'
}, { 
  hashFields: ['ip'] // Encrypt sensitive fields
});

getCollection(collectionPath, options)

Retrieves all documents from a collection file.

const users = await db.getCollection('users/all.jsonl', {
  unhashFields: ['password']
});

// Returns: Array of user objects

remove(collectionPath, predicate)

Removes documents from a collection that match the predicate.

// Remove inactive users
await db.remove('users/all.jsonl', user => user.status === 'inactive');

// Remove documents older than 30 days
const thirtyDaysAgo = new Date(Date.now() - 30 * 24 * 60 * 60 * 1000);
await db.remove('logs/events.jsonl', log => new Date(log.timestamp) < thirtyDaysAgo);

Query Operations

query(collectionPath, predicate)

Queries a collection with a filter function.

// Simple equality check
const alice = await db.query('users', user => user.name === 'Alice');

// Complex conditions
const activeAdults = await db.query('users', user => 
  user.age >= 18 && user.status === 'active'
);

// Date range queries
const recentLogs = await db.query('logs', log =>
  new Date(log.timestamp) > new Date('2024-01-01')
);

// Array operations
const premiumUsers = await db.query('users', user =>
  ['premium', 'vip'].includes(user.tier)
);

Advanced Query Examples

// Chaining multiple conditions
const results = await db.query('products', product => 
  product.price > 100 && 
  product.category === 'electronics' &&
  product.stock > 0 &&
  product.tags.includes('featured')
);

// Text search (case insensitive)
const searchResults = await db.query('articles', article =>
  article.title.toLowerCase().includes('javascript') ||
  article.content.toLowerCase().includes('typescript')
);

// Nested object queries
const usersWithAddress = await db.query('users', user =>
  user.address?.city === 'New York' && user.address.zipCode?.startsWith('100')
);

Transaction Operations

transfer(sourcePath, destPath, options)

Atomically transfers documents between files with optional format conversion.

// Basic transfer
await db.transfer('users/new.jsonl', 'users/processed.jsonl', {
  predicate: user => user.processed === true
});

// Transfer with format conversion
await db.transfer('logs/raw.json', 'logs/processed.jsonl', {
  predicate: log => log.severity === 'ERROR',
  conversion: {
    sourceFormat: 'json',
    targetFormat: 'jsonl',
    fieldMapping: {
      'log_id': 'id',
      'message_content': 'message'
    }
  }
});

// Transfer with data transformation
await db.transfer('products/raw.jsonl', 'products/enriched.jsonl', {
  predicate: product => product.price > 0,
  transform: product => ({
    ...product,
    price: Math.round(product.price * 100), // Convert to cents
    currency: 'USD',
    lastUpdated: new Date().toISOString()
  })
});

Transfer Options

interface TransferOptions {
  predicate: (doc: any) => boolean;    // Required: Filter function
  conversion?: ConversionOptions;      // Optional: Format conversion
  transform?: (doc: any) => any;       // Optional: Data transformation
  batchSize?: number;                  // Optional: Batch processing size
}

interface ConversionOptions {
  sourceFormat: 'json' | 'jsonl' | 'csv' | 'yaml';
  targetFormat: 'json' | 'jsonl' | 'csv' | 'yaml';
  fieldMapping?: Record<string, string>;  // Rename fields during conversion
  filter?: (doc: any) => boolean;         // Additional filtering
  transform?: (doc: any) => any;          // Additional transformation
  timestampField?: string;                // Add timestamp field
}
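
The timestampField option is the only conversion option not shown in the examples above. A minimal sketch of how it might be used, assuming (per the interface comment) it stamps each transferred document under the given key:

// Archive all events, tagging each document with a timestamp field
await db.transfer('events/raw.jsonl', 'events/archived.jsonl', {
  predicate: () => true, // transfer everything
  conversion: {
    sourceFormat: 'jsonl',
    targetFormat: 'jsonl',
    timestampField: 'archivedAt'
  }
});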

Utility Operations

autoShardIfSizeExceeded(maxSizeBytes)

Automatically creates new repositories when the current repository approaches GitHub's size limits.

// Shard if repository exceeds 4GB
await db.autoShardIfSizeExceeded(4 * 1024 * 1024 * 1024);

flushCache()

Flushes all pending write operations from the cache queue.

await db.flushCache();

getRateLimitStatus()

Returns current GitHub API rate limit information.

const limits = db.getRateLimitStatus();
console.log(`Remaining requests: ${limits.remaining}`);
console.log(`Reset time: ${new Date(limits.reset * 1000).toISOString()}`);

Field-Level Encryption

Hashing Options

Despite the "hash" naming, hashFields and unhashFields perform reversible AES-256 encryption: hashFields encrypts the listed fields on write, and unhashFields decrypts them on read.

// Write with encryption
await db.set('users/sensitive.json', {
  username: 'alice',
  password: 'secret123',
  apiKey: 'sk_live_123456',
  socialSecurity: '123-45-6789'
}, {
  hashFields: ['password', 'apiKey', 'socialSecurity']
});

// Read with decryption
const user = await db.get('users/sensitive.json', {
  unhashFields: ['password', 'apiKey'] // Only decrypt specific fields
});

Custom Encryption Key

import { HashManager } from 'advanced-github-db';

// Use custom encryption key
const hashManager = new HashManager('your-custom-32-byte-encryption-key');

// The library automatically generates a secure key if not provided

Format Conversion

Supported Formats

  • JSON: Standard JSON format, good for small datasets
  • JSONL: JSON Lines format, perfect for large append-only collections
  • CSV: Comma-separated values, great for spreadsheet integration
  • YAML: Human-readable format, good for configuration files

Conversion Examples

// JSON to JSONL
await db.transfer('data/config.json', 'data/stream.jsonl', {
  predicate: doc => doc.active === true,
  conversion: {
    sourceFormat: 'json',
    targetFormat: 'jsonl'
  }
});

// JSONL to CSV with field mapping
await db.transfer('logs/events.jsonl', 'logs/events.csv', {
  predicate: doc => doc.timestamp > '2024-01-01',
  conversion: {
    sourceFormat: 'jsonl',
    targetFormat: 'csv',
    fieldMapping: {
      'event_id': 'id',
      'event_type': 'type',
      'event_timestamp': 'timestamp'
    }
  }
});

// CSV to JSON with transformation
await db.transfer('import/products.csv', 'database/products.json', {
  predicate: doc => doc.price > 0,
  conversion: {
    sourceFormat: 'csv',
    targetFormat: 'json',
    transform: doc => ({
      ...doc,
      price: parseFloat(doc.price),   // CSV values arrive as strings
      inStock: Number(doc.stock) > 0
    })
  }
});

Storage Architecture

Repository Structure

your-repo/
├── users/
│   ├── alice.json              # Document mode (individual files)
│   ├── bob.json
│   └── all.jsonl               # Collection mode (batch file)
├── logs/
│   ├── 2024-01-access.jsonl
│   └── 2024-01-errors.json
├── products/
│   └── catalog.jsonl
└── .index/                     # Internal indexes (auto-generated)
    └── users_index.json

Storage Modes

Document Mode (MongoDB-style)

  • One file per document
  • Best for: Random access, concurrent updates, large documents
  • File extensions: .json, .jsonl

Collection Mode (Batch style)

  • Multiple documents per file
  • Best for: Append-heavy workloads, batch processing, logs
  • File extensions: .jsonl (recommended), .json
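
To make the contrast concrete, here is how the same data might be written in each mode, using the set and append operations documented above:

// Document mode: one file per record, addressed directly by path
await db.set('users/carol.json', { id: 'user_002', name: 'Carol' });

// Collection mode: many records appended to a single JSONL file
await db.append('logs/2024-02-access.jsonl', {
  userId: 'user_002',
  action: 'page_view',
  timestamp: new Date().toISOString()
});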

Performance Optimization

Caching Strategy

const db = new GitHubDB({
  token: process.env.GITHUB_TOKEN!,
  repo: 'your/repo',
  cache: true,           // Enable read caching
  batchSize: 100         // Batch write operations
});

// Manual cache control
await db.flushCache();   // Force write all pending changes

Rate Limit Management

// Monitor rate limits
const limits = db.getRateLimitStatus();
if (limits.remaining < 100) {
  console.log('Approaching rate limit, slowing down operations');
  await new Promise(resolve => setTimeout(resolve, 1000));
}

Batch Operations

// Process large datasets in batches
const largeDataset: Array<Record<string, unknown>> = []; // thousands of documents to import

for (let i = 0; i < largeDataset.length; i += 100) {
  const batch = largeDataset.slice(i, i + 100);
  
  await Promise.all(
    batch.map(doc => 
      db.append('data/collection.jsonl', doc)
    )
  );
  
  // Respect rate limits
  if (i > 0 && i % 1000 === 0) {
    await new Promise(resolve => setTimeout(resolve, 100));
  }
}

Advanced Usage

Custom Formatters

import { FormatConverter, Formatter } from 'advanced-github-db';

class CustomFormatter implements Formatter {
  parse(content: string): any[] {
    // Custom parsing logic
    return content.split('|').map(item => JSON.parse(item));
  }
  
  serialize(documents: any[]): string {
    return documents.map(doc => JSON.stringify(doc)).join('|');
  }
  
  getFileExtension(): string {
    return 'custom';
  }
}

// Register custom formatter
const converter = new FormatConverter();
converter.registerFormatter('custom', new CustomFormatter());

Error Handling

try {
  await db.initialize();
  await db.set('data/doc.json', { value: 'test' });
} catch (error) {
  if (error.message.includes('rate limit')) {
    // Handle rate limits
    console.log('Rate limit exceeded, retrying later');
  } else if (error.message.includes('not found')) {
    // Handle missing files/repos
    console.log('Repository or file not found');
  } else if (error.message.includes('invalid token')) {
    // Handle authentication issues
    console.log('Invalid GitHub token');
  } else {
    // Generic error handling
    console.error('Operation failed:', error.message);
  }
}

Integration with Web Frameworks

Express.js Example

import express from 'express';
import { GitHubDB } from 'advanced-github-db';

const app = express();
app.use(express.json()); // parse JSON request bodies
const db = new GitHubDB({
  token: process.env.GITHUB_TOKEN!,
  repo: 'your-username/your-api-data'
});

await db.initialize();

app.get('/users/:id', async (req, res) => {
  try {
    const user = await db.get(`users/${req.params.id}.json`, {
      unhashFields: ['email'] // decrypt the email written below; password stays encrypted
    });
    if (!user) {
      return res.status(404).json({ error: 'User not found' });
    }
    res.json(user);
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

app.post('/users', async (req, res) => {
  try {
    await db.set(`users/${req.body.id}.json`, req.body, {
      hashFields: ['password', 'email']
    });
    res.status(201).json({ success: true });
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

app.listen(3000, () => console.log('API listening on port 3000'));

Monitoring and Debugging

Logging Operations

// Add logging to track operations
const originalSet = db.set.bind(db);
db.set = async (path, data, options) => {
  console.log(`Setting document: ${path}`);
  const start = Date.now();
  try {
    const result = await originalSet(path, data, options);
    console.log(`Document set successfully in ${Date.now() - start}ms`);
    return result;
  } catch (error) {
    console.error(`Failed to set document ${path}:`, error);
    throw error;
  }
};

Performance Monitoring

// Track operation timings
const timings = {
  set: [],
  get: [],
  query: []
};

// Wrap operations to collect metrics
const trackTiming = (operation, fn) => async (...args) => {
  const start = Date.now();
  try {
    const result = await fn(...args);
    timings[operation].push(Date.now() - start);
    return result;
  } catch (error) {
    timings[operation].push({ error: error.message, duration: Date.now() - start });
    throw error;
  }
};

db.set = trackTiming('set', db.set.bind(db));
db.get = trackTiming('get', db.get.bind(db));

Best Practices

1. Repository Organization

  • Use descriptive collection names
  • Group related data in the same collection
  • Use subdirectories for complex data hierarchies (see the sketch below)
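
A sketch of this layout, using the createCollection and set calls documented earlier (nested paths are assumed to work, following the subdirectory recommendation above):

// Descriptive collection name, with a year subdirectory for hierarchy
await db.createCollection('billing-invoices', { format: 'json' });
await db.set('billing-invoices/2024/inv-0001.json', {
  customer: 'user_001',
  totalCents: 4200
});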

2. File Format Selection

  • Use JSONL for large, append-only datasets
  • Use JSON for small datasets or configuration files
  • Consider CSV for data that needs spreadsheet integration

3. Security

  • Always encrypt sensitive fields (passwords, API keys, PII)
  • Use fine-grained GitHub tokens with minimal permissions
  • Regularly rotate encryption keys in production

4. Performance

  • Enable caching for read-heavy workloads
  • Use appropriate batch sizes for your rate limits
  • Monitor GitHub API rate limits and adjust accordingly

5. Error Handling

  • Always wrap operations in try-catch blocks
  • Implement retry logic for rate limit errors
  • Validate data before writing to the repository (see the sketch below)
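
A minimal sketch of pre-write validation; validateUser is a hypothetical helper, and a schema library would work just as well:

// Validate before writing so bad data never reaches the repository
function validateUser(doc: { id?: string; email?: string }): void {
  if (!doc.id) throw new Error('Missing required field: id');
  if (doc.email && !doc.email.includes('@')) throw new Error('Invalid email');
}

const candidate = { id: 'user_003', email: 'dave@example.com' };
validateUser(candidate); // throws before any API call is made
await db.set(`users/${candidate.id}.json`, candidate);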

🐛 Troubleshooting

Common Issues

Rate Limit Errors

// Solution: Implement retry logic
const withRetry = async (operation, maxRetries = 3) => {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await operation();
    } catch (error) {
      if (error.message.includes('rate limit') && i < maxRetries - 1) {
        await new Promise(resolve => setTimeout(resolve, 1000 * (i + 1)));
      } else {
        throw error;
      }
    }
  }
};

await withRetry(() => db.set('data/doc.json', { value: 'test' }));

File Not Found Errors

// Solution: Check file existence before operations
const safeGet = async (path) => {
  try {
    return await db.get(path);
  } catch (error) {
    if (error.message.includes('not found')) {
      return null;
    }
    throw error;
  }
};

Authentication Errors

  • Verify that your GitHub token has repo permissions
  • Check the token's expiration date
  • Ensure the repository exists and is accessible (a quick check is sketched below)
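
To verify all three at once, call initialize() up front; per the API reference above, it throws if the token is invalid or the repository doesn't exist:

try {
  await db.initialize();
  console.log('Token and repository verified');
} catch (error) {
  console.error('GitHub access check failed:', (error as Error).message);
}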

License

MIT License - see LICENSE file for details.

🤝 Contributing

Contributions welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.
