GitDB - Generic Database Module
A powerful, flexible database module for storing data in various formats (JSON, CSV, YAML, XML) with support for both local file storage and GitHub/GitLab integration. Features ACID compliance, encryption, backups, and advanced filtering capabilities.
Features
- Multiple Formats: JSON, CSV, YAML, XML
- Storage Options: Local file system or GitHub/GitLab
- ACID Compliance: Atomic operations with locking and rollback
- Encryption: Optional AES-256 encryption for data security
- Backups: Automatic backup creation with version history
- Advanced Filtering: Functional programming methods and SQL-like queries
- CRUD Operations: Create, Read, Update, Delete with bulk operations
- Data Integrity: Checksums and validation
- Format Conversion: Convert between different data formats
- Statistics: Database analytics and performance metrics
- Version History: Complete commit history tracking (GitHub storage)
- Record History: Track changes to individual records over time
- Field History: Monitor changes to specific fields in records
- History Search: Search and filter commit history by various criteria
- Revert Capability: Rollback to any previous commit
Installation
npm install gitdb
Optional Dependencies
For additional format support, install these packages:
# For YAML support
npm install js-yaml
# For CSV support
npm install csv-parser csv-stringify
# For XML support
npm install xml2js
# For GitHub storage
npm install @octokit/rest
Quick Start
const GitDB = require('gitdb');
// Local JSON storage
const db = new GitDB({
storageType: 'local',
format: 'json',
repoPath: './data'
});
// GitHub storage
const githubDb = new GitDB({
storageType: 'github',
format: 'json',
owner: 'git-as-a-db',
repo: 'your-repo',
token: 'your-github-token',
path: 'data/database.json'
});
// Basic operations
await db.create('users', { name: 'John', email: 'john@example.com' });
const users = await db.read('users');
const user = await db.readById('users', 'user-id');
await db.update('users', 'user-id', { name: 'Jane' });
await db.delete('users', 'user-id');
Configuration
Storage Options
Local Storage
const db = new GitDB({
storageType: 'local',
format: 'json',
repoPath: './data', // Data directory
dataFile: 'database.json', // Data file name
backupDir: './backups', // Backup directory
lockFile: '.db.lock', // Lock file name
autoBackup: true, // Enable automatic backups
encryptionKey: 'your-key' // Optional encryption
});
GitHub Storage
const db = new GitDB({
storageType: 'github',
format: 'json',
owner: 'git-as-a-db', // GitHub username
repo: 'your-repo', // Repository name
branch: 'main', // Branch name
path: 'data/database.json', // File path in repo
token: 'your-github-token', // GitHub personal access token
autoBackup: true
});
Format Options
JSON (Default)
const db = new GitDB({ format: 'json' });
CSV
const db = new GitDB({
format: 'csv',
csvOptions: {
headers: true,
delimiter: ','
}
});
YAML
const db = new GitDB({ format: 'yaml' });
XML
const db = new GitDB({
format: 'xml',
xmlOptions: {
rootName: 'database',
itemName: 'item',
collectionName: 'collection'
}
});
Data Processing & Performance
How GitDB Processes Data
GitDB uses an in-memory processing approach where the entire data file is loaded into memory for all operations. This design choice has specific implications for performance and scalability.
Data Flow
// 1. READ - Loads entire file into memory
async read(collection, query = {}) {
const data = await this.readDataFile(); // ← Loads ALL data
let items = data[collection]; // ← Gets entire collection
// 2. FILTER - Processes everything in memory
if (Object.keys(query).length > 0) {
items = items.filter(item => { // ← Filters in memory
return Object.entries(query).every(([key, value]) => {
if (typeof value === 'function') {
return value(item[key]);
}
return item[key] === value;
});
});
}
return items;
}
Performance Characteristics
✅ Advantages
- Fast for small datasets (< 10,000 items): ~1-5ms read operations
- Simple and reliable: No complex indexing or query optimization
- Full data access: Can perform any operation on any field
- Consistent performance: Predictable behavior regardless of query complexity
- No external dependencies: Works with any supported file format
- Real-time data: Always works with the latest version of the data
⚠️ Limitations
- Memory intensive: Loads entire dataset into memory
- Slower for large datasets (> 100,000 items): Performance degrades linearly
- Network overhead: Downloads entire file from GitHub for each operation
- No indexing: linear search through all items for filtering (see the lookup sketch after this list)
- Scaling limitations: Not suitable for very large datasets (> 1M items)
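For repeated lookups on one field, you can compensate for the lack of indexing by building a throwaway in-memory index after a single read. A minimal sketch using the documented read() method; the byEmail map and the sample address are illustrative:
const users = await db.read('users');
// Build a one-off index keyed by email; subsequent lookups are O(1)
const byEmail = new Map(users.map(user => [user.email, user]));
const alice = byEmail.get('alice@example.com');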
📊 Performance Benchmarks
| Dataset Size | Read Time | Filter Time | Memory Usage |
|--------------|-----------|-------------|--------------|
| 1,000 items | ~1ms | ~2ms | ~2MB |
| 10,000 items | ~5ms | ~10ms | ~20MB |
| 100,000 items | ~50ms | ~100ms | ~200MB |
| 1,000,000 items | ~500ms | ~1000ms | ~2GB |
Benchmarks measured with the JSON format and local storage on modern hardware.
Storage Type Performance
Local Storage
- Fastest: Direct file system access
- No network latency: All operations are local
- Limited by disk I/O: File size affects read/write speed
- Best for: High-frequency operations, large datasets
GitHub Storage
- Slower: Network API calls required
- Network dependent: Performance varies with connection
- Rate limited: the GitHub API enforces rate limits (a retry sketch follows this list)
- Best for: Version control, collaboration, backup
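When a GitHub-backed operation fails because of rate limiting or a transient network error, retrying with exponential backoff is usually enough. A minimal sketch built on the documented read() method; readWithRetry, the attempt count, and the delays are illustrative, not part of the GitDB API:
const readWithRetry = async (db, collection, query = {}, attempts = 3) => {
  for (let i = 0; i < attempts; i++) {
    try {
      return await db.read(collection, query);
    } catch (error) {
      if (i === attempts - 1) throw error;
      // Back off exponentially: 1s, 2s, 4s, ...
      await new Promise(resolve => setTimeout(resolve, 1000 * 2 ** i));
    }
  }
};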
Format Performance
| Format | Read Speed | Write Speed | File Size | Use Case |
|--------|------------|-------------|-----------|----------|
| JSON | Fastest | Fastest | Medium | General purpose |
| CSV | Fast | Fast | Smallest | Large datasets, analysis |
| YAML | Medium | Medium | Large | Configuration, readability |
| XML | Slowest | Slowest | Largest | Legacy systems |
Optimization Strategies
For Small to Medium Datasets (< 50,000 items)
1. Use Appropriate Format
// For large datasets, use CSV
const db = new GitDB({
format: 'csv',
csvOptions: { headers: true }
});
// For configuration, use YAML
const configDb = new GitDB({
format: 'yaml'
});
2. Optimize Queries
// Good: Select only needed fields
const users = await db.select('users', {
fields: ['name', 'email'],
where: { active: true }
});
// Good: Use specific filters
const engineers = await db.filter('users', user =>
user.department === 'Engineering' && user.active
);
// Avoid reading an entire collection for a simple lookup;
// use readById instead:
const user = await db.readById('users', 'user-id');
3. Batch Operations
// Efficient: Use bulk operations
const operations = [
{ type: 'create', collection: 'users', item: { name: 'Alice' } },
{ type: 'create', collection: 'users', item: { name: 'Bob' } },
{ type: 'update', collection: 'users', id: 'user-1', updates: { active: false } }
];
await db.bulkOperation(operations);
For Large Datasets (> 100,000 items)
1. Consider Alternative Solutions
- SQLite: For structured data with complex queries
- MongoDB: For document-based data
- PostgreSQL: For relational data with ACID requirements
2. Implement Caching
class GitDBWithCache extends GitDB {
constructor(options) {
super(options);
this.cache = new Map();
this.cacheTimeout = options.cacheTimeout || 300000; // 5 minutes
}
async read(collection, query = {}) {
const cacheKey = `${collection}-${JSON.stringify(query)}`;
if (this.cache.has(cacheKey)) {
const cached = this.cache.get(cacheKey);
if (Date.now() - cached.timestamp < this.cacheTimeout) {
return cached.data;
}
}
const data = await super.read(collection, query);
this.cache.set(cacheKey, { data, timestamp: Date.now() });
return data;
}
}
3. Data Sharding
// Split large collections across multiple files
const crypto = require('crypto');
const userDb1 = new GitDB({ dataFile: 'users-1.json' });
const userDb2 = new GitDB({ dataFile: 'users-2.json' });
// Route queries based on user ID
const getUserDb = (userId) => {
const hash = crypto.createHash('md5').update(userId).digest('hex');
return parseInt(hash, 16) % 2 === 0 ? userDb1 : userDb2;
};
For GitHub Storage
1. Minimize API Calls
// Cache frequently accessed data
const cache = new Map();
const getCachedData = async (key) => {
if (cache.has(key)) return cache.get(key);
const data = await db.readDataFile();
cache.set(key, data);
return data;
};
2. Use Compression
const db = new GitDB({
storageType: 'github',
format: 'json',
compression: true // Enable compression for large files
});
3. Branch-Based Sharding
// Use different branches for different data types
const userDb = new GitDB({
storageType: 'github',
branch: 'users',
path: 'data/users.json'
});
const productDb = new GitDB({
storageType: 'github',
branch: 'products',
path: 'data/products.json'
});
When to Use GitDB
✅ Perfect For
- Small to medium datasets (< 50,000 items)
- Configuration management with version control
- Prototyping and development environments
- Simple applications with basic CRUD operations
- Collaborative projects requiring Git-based version control
- Backup and archival systems
- Static site data management
❌ Not Suitable For
- Very large datasets (> 1,000,000 items)
- High-frequency transactions (> 1000 operations/second)
- Complex relational queries requiring joins across many tables
- Real-time applications requiring sub-millisecond response times
- Multi-user concurrent write scenarios
- Applications requiring advanced indexing or query optimization
🔄 Migration Paths
From GitDB to Traditional Databases
// Export data for migration
const fs = require('fs').promises;
const data = await db.readDataFile();
await fs.writeFile('migration.json', JSON.stringify(data, null, 2));
// Import into SQLite
const sqlite = require('sqlite3');
const sqliteDb = new sqlite.Database('migrated.db'); // renamed to avoid shadowing the GitDB instance
// ... migration logic
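// A hedged sketch of what the elided SQLite step might look like, using
// sqlite3's documented serialize()/run() API; the table schema and column
// names here are hypothetical, not prescribed by GitDB.
sqliteDb.serialize(() => {
  sqliteDb.run('CREATE TABLE IF NOT EXISTS users (id TEXT PRIMARY KEY, name TEXT, email TEXT)');
  for (const user of data.users || []) {
    sqliteDb.run('INSERT INTO users (id, name, email) VALUES (?, ?, ?)',
      [user.id, user.name, user.email]);
  }
});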
// Import into MongoDB
const { MongoClient } = require('mongodb');
const client = new MongoClient('mongodb://localhost:27017');
// ... migration logic
API Reference
Basic CRUD Operations
Create
// Create a single item
const item = await db.create('collection', {
name: 'John Doe',
email: 'john@example.com'
});
// Bulk create
const operations = [
{ type: 'create', collection: 'users', item: { name: 'Alice' } },
{ type: 'create', collection: 'users', item: { name: 'Bob' } }
];
const results = await db.bulkOperation(operations);
Read
// Read all items in a collection
const items = await db.read('users');
// Read with simple filtering
const activeUsers = await db.read('users', { active: true });
// Read by ID
const user = await db.readById('users', 'user-id');
Update
// Update an item
const updated = await db.update('users', 'user-id', {
name: 'Jane Doe',
updatedAt: new Date().toISOString()
});
Delete
// Delete an item
const deleted = await db.delete('users', 'user-id');
// Delete entire collection
const deletedCollection = await db.deleteCollection('users');
Advanced Filtering Methods
Functional Programming Methods
// MAP - Transform items
const upperNames = await db.map('users', user => ({
...user,
name: user.name.toUpperCase()
}));
// REDUCE - Aggregate data
const totalSalary = await db.reduce('users', (sum, user) => sum + user.salary, 0);
// FILTER - Filter items
const activeUsers = await db.filter('users', user => user.active);
// FIND - Find first matching item
const firstEngineer = await db.find('users', user => user.department === 'Engineering');
// FIND ALL - Find all matching items
const engineers = await db.findAll('users', user => user.department === 'Engineering');
// SORT - Sort items
const sortedUsers = await db.sort('users', 'age', 'desc');
// LIMIT - Limit results
const topUsers = await db.limit('users', 10, 0);
// GROUP BY - Group items
const groupedByDept = await db.groupBy('users', 'department');
Aggregation Methods
// COUNT
const totalUsers = await db.count('users');
const activeCount = await db.count('users', user => user.active);
// SUM
const totalSalary = await db.sum('users', 'salary');
// AVERAGE
const avgAge = await db.average('users', 'age');
// MIN/MAX
const minAge = await db.min('users', 'age');
const maxSalary = await db.max('users', 'salary');
// DISTINCT
const departments = await db.distinct('users', 'department');
SQL-Like Query Methods
SELECT with Conditions
// Basic SELECT
const users = await db.select('users', {
fields: ['name', 'email', 'department']
});
// SELECT with WHERE
const engineers = await db.select('users', {
where: { department: 'Engineering' },
fields: ['name', 'salary']
});
// SELECT with complex WHERE
const highPaidEngineers = await db.select('users', {
where: {
department: 'Engineering',
salary: { operator: 'gt', value: 70000 }
},
orderBy: ['salary', 'desc'],
limit: 10
});
WHERE with Operators
// Comparison operators
const users = await db.where('users', {
age: { operator: 'gte', value: 25 },
salary: { operator: 'lt', value: 80000 }
});
// Available operators: eq, ne, gt, gte, lt, lte, in, nin, like, regex
const results = await db.where('users', {
name: { operator: 'like', value: 'john' },
department: { operator: 'in', value: ['Engineering', 'Marketing'] }
});
JOIN Operations
// INNER JOIN
const userOrders = await db.join('users', 'orders', 'id', 'userId', 'inner');
// LEFT JOIN
const allUserOrders = await db.join('users', 'orders', 'id', 'userId', 'left');
// RIGHT JOIN
const allOrderUsers = await db.join('users', 'orders', 'id', 'userId', 'right');
Aggregation Pipeline
// Complex aggregation
const deptStats = await db.aggregate('users', [
{
type: 'match',
conditions: { active: true }
},
{
type: 'group',
groupBy: ['department'],
aggregations: [
{ name: 'count', type: 'count', field: 'id' },
{ name: 'avgSalary', type: 'avg', field: 'salary' },
{ name: 'minAge', type: 'min', field: 'age' },
{ name: 'maxAge', type: 'max', field: 'age' }
]
},
{
type: 'sort',
sort: { avgSalary: -1 }
},
{
type: 'limit',
limit: 5
}
]);
Advanced Features
Bulk Operations
const operations = [
{ type: 'create', collection: 'users', item: { name: 'Alice' } },
{ type: 'update', collection: 'users', id: 'user-1', updates: { active: false } },
{ type: 'delete', collection: 'users', id: 'user-2' }
];
const results = await db.bulkOperation(operations, 'Bulk update');
Data Conversion
// Convert between formats
const csvDb = await db.convertFormat('csv', {
csvOptions: { delimiter: ';' }
});
// Export data
await db.export('./backup.json');
// Import data
await db.import('./data.json', 'Import from backup');
Statistics and Analytics
// Get database statistics
const stats = await db.getStats();
console.log(stats);
// Get commit history (GitHub storage)
const commits = await db.getCommitHistory(10);
// Get specific version
const oldData = await db.getVersion('commit-sha');
Format Examples
JSON Format
{
  "users": [
    {
      "id": "1",
      "name": "John Doe",
      "email": "john@example.com",
      "createdAt": "2024-01-15T10:30:00.000Z",
      "updatedAt": "2024-01-15T10:30:00.000Z"
    }
  ]
}
CSV Format
id,name,email,createdAt,updatedAt
1,John Doe,john@example.com,2024-01-15T10:30:00.000Z,2024-01-15T10:30:00.000Z
2,Jane Smith,jane@example.com,2024-01-15T11:00:00.000Z,2024-01-15T11:00:00.000Z
YAML Format
users:
  - id: "1"
    name: "John Doe"
    email: "john@example.com"
    createdAt: "2024-01-15T10:30:00.000Z"
    updatedAt: "2024-01-15T10:30:00.000Z"
XML Format
<database>
  <collections name="users">
    <item>
      <id>1</id>
      <name>John Doe</name>
      <email>john@example.com</email>
      <createdAt>2024-01-15T10:30:00.000Z</createdAt>
      <updatedAt>2024-01-15T10:30:00.000Z</updatedAt>
    </item>
  </collections>
</database>
Frontend Integration
Express.js Backend
const express = require('express');
const GitDB = require('gitdb');
const app = express();
const db = new GitDB({ /* config */ });
app.use(express.json());
// GET all users
app.get('/api/users', async (req, res) => {
try {
const users = await db.read('users');
res.json(users);
} catch (error) {
res.status(500).json({ error: error.message });
}
});
// POST new user
app.post('/api/users', async (req, res) => {
try {
const user = await db.create('users', req.body);
res.json(user);
} catch (error) {
res.status(500).json({ error: error.message });
}
});
// Advanced filtering endpoint
app.get('/api/users/filter', async (req, res) => {
try {
const { where, orderBy, limit, fields } = req.query;
const users = await db.select('users', {
where: where ? JSON.parse(where) : undefined,
orderBy: orderBy ? JSON.parse(orderBy) : undefined,
limit: limit ? parseInt(limit) : undefined,
fields: fields ? fields.split(',') : undefined
});
res.json(users);
} catch (error) {
res.status(500).json({ error: error.message });
}
});
Frontend JavaScript
// Fetch users with filtering
const fetchUsers = async (filters = {}) => {
const params = new URLSearchParams();
if (filters.where) params.append('where', JSON.stringify(filters.where));
if (filters.orderBy) params.append('orderBy', JSON.stringify(filters.orderBy));
if (filters.limit) params.append('limit', filters.limit);
if (filters.fields) params.append('fields', filters.fields.join(','));
const response = await fetch(`/api/users/filter?${params}`);
return response.json();
};
// Example usage
const highPaidEngineers = await fetchUsers({
where: {
department: 'Engineering',
salary: { operator: 'gt', value: 70000 }
},
orderBy: ['salary', 'desc'],
limit: 10,
fields: ['name', 'salary', 'department']
});
History Tracking (GitHub Storage Only)
GitDB provides comprehensive version history tracking when using GitHub storage. This allows you to track every change to your data, including who made the change, when it was made, and what exactly changed.
Prerequisites
History tracking is only available with GitHub storage. Make sure you have:
- A GitHub Personal Access Token with the repo scope
- Data stored in a GitHub repository
Basic History Operations
Get Commit History
// Get the last 50 commits
const commits = await db.getCommitHistory(50);
commits.forEach(commit => {
console.log(`${commit.message} by ${commit.author.name} on ${commit.date}`);
});
Get Record History
// Track changes to a specific record
const recordHistory = await db.getRecordHistory('users', 'user-123', 20);
recordHistory.forEach(entry => {
console.log(`${entry.action}: ${entry.message} by ${entry.author.name}`);
console.log(`Record:`, entry.record);
});
Get Field History
// Track changes to a specific field
const salaryHistory = await db.getFieldHistory('users', 'user-123', 'salary', 10);
salaryHistory.forEach(entry => {
console.log(`Salary: $${entry.fieldValue} (${entry.action}) - ${entry.message}`);
});
Advanced History Features
Search History
// Search by author
const authorCommits = await db.searchHistory({
author: 'john@example.com'
});
// Search by date range
const dateRangeCommits = await db.searchHistory({
since: '2024-01-01T00:00:00Z',
until: '2024-12-31T23:59:59Z'
});
// Search by commit message
const messageCommits = await db.searchHistory({
message: 'update user'
});
Get History Statistics
const stats = await db.getHistoryStats();
console.log(`Total commits: ${stats.totalCommits}`);
console.log(`Authors: ${stats.authors.join(', ')}`);
console.log(`Average commits per day: ${stats.averageCommitsPerDay}`);
console.log(`First commit: ${stats.firstCommit.message}`);
console.log(`Last commit: ${stats.lastCommit.message}`);
Get Collection Timeline
// Track how a collection has grown over time
const timeline = await db.getCollectionTimeline('users', 50);
timeline.forEach(entry => {
console.log(`${entry.recordCount} users - ${entry.message} (${entry.author.name})`);
});
Compare Versions
// Get the difference between two commits
const diff = await db.getDiff('abc123', 'def456');
console.log(`Commits ahead: ${diff.ahead_by}`);
console.log(`Commits behind: ${diff.behind_by}`);
console.log(`Files changed: ${diff.files.length}`);
diff.files.forEach(file => {
console.log(`${file.filename}: ${file.status} (+${file.additions} -${file.deletions})`);
});
Get File at Specific Commit
// Get the data file content at a specific commit
const fileData = await db.getFileAtCommit('abc123');
console.log(`File size: ${fileData.size} bytes`);
const data = JSON.parse(fileData.content);
console.log('Data at that commit:', data);
Get Commit Changes
// Get all changes made in a specific commit
const changes = await db.getCommitChanges('abc123');
console.log('Added:', changes.added);
console.log('Modified:', changes.modified);
console.log('Deleted:', changes.deleted);
Revert Operations
Revert to Previous Commit
// Revert the entire database to a previous state
const result = await db.revertToCommit('abc123', 'Revert to stable version');
console.log(`Reverted to commit: ${result.revertedTo}`);
console.log(`Message: ${result.message}`);
History API Endpoints
You can easily create REST API endpoints for history tracking:
const express = require('express');
const GitDB = require('gitdb');
const app = express();
app.use(express.json()); // needed to parse the JSON body of the revert endpoint
const db = new GitDB({ /* GitHub storage config as shown above */ });
// Get commit history
app.get('/api/history/commits', async (req, res) => {
const limit = parseInt(req.query.limit) || 50;
const commits = await db.getCommitHistory(limit);
res.json(commits);
});
// Get record history
app.get('/api/history/record/:collection/:id', async (req, res) => {
const { collection, id } = req.params;
const limit = parseInt(req.query.limit) || 50;
const history = await db.getRecordHistory(collection, id, limit);
res.json(history);
});
// Get field history
app.get('/api/history/field/:collection/:id/:field', async (req, res) => {
const { collection, id, field } = req.params;
const limit = parseInt(req.query.limit) || 50;
const history = await db.getFieldHistory(collection, id, field, limit);
res.json(history);
});
// Search history
app.get('/api/history/search', async (req, res) => {
const { author, since, until, message } = req.query;
const results = await db.searchHistory({ author, since, until, message });
res.json(results);
});
// Revert to commit
app.post('/api/history/revert/:sha', async (req, res) => {
const { sha } = req.params;
const { message } = req.body;
const result = await db.revertToCommit(sha, message);
res.json(result);
});
History Tracking Use Cases
Audit Trails
// Track all changes to sensitive data
const auditTrail = await db.getRecordHistory('employees', 'emp-123', 100);
const salaryChanges = auditTrail.filter(entry =>
entry.action === 'updated' && entry.record.salary !== entry.previousRecord?.salary
);
Data Recovery
// Find when data was accidentally deleted
const recordHistory = await db.getRecordHistory('users', 'user-456', 50);
const deletionCommit = recordHistory.find(entry => entry.action === 'deleted');
if (deletionCommit) {
// Revert to the commit before deletion
await db.revertToCommit(deletionCommit.commit, 'Recover deleted user');
}
Change Analysis
// Analyze how a field has changed over time
const fieldHistory = await db.getFieldHistory('products', 'prod-789', 'price', 50);
const priceChanges = fieldHistory.filter(entry => entry.action === 'updated');
console.log('Price change history:');
priceChanges.forEach(change => {
console.log(`${change.date}: $${change.previousValue} → $${change.fieldValue}`);
});
Performance Considerations
- API Rate Limits: GitHub API has rate limits (5,000 requests/hour for authenticated users)
- Large Histories: For repositories with many commits, consider limiting the number of commits retrieved
- Caching: Consider caching frequently accessed history data (see the sketch after this list)
- Batch Operations: Use bulk operations to reduce the number of commits
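For example, a small time-based cache around the documented getCommitHistory() method avoids refetching the same history on every request. A minimal sketch; the TTL and variable names are illustrative:
let commitCache = null;
let commitCacheTime = 0;
const COMMIT_CACHE_TTL = 60000; // refetch at most once a minute

const getCachedCommits = async (limit = 50) => {
  if (commitCache && Date.now() - commitCacheTime < COMMIT_CACHE_TTL) {
    return commitCache;
  }
  commitCache = await db.getCommitHistory(limit);
  commitCacheTime = Date.now();
  return commitCache;
};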
Security Features
Encryption
const db = new GitDB({
encryptionKey: 'your-secure-key',
// ... other options
});
Data Validation
- Automatic checksum calculation and verification (sketched below)
- Data integrity checks on read/write operations
- Format validation for supported data types
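GitDB performs these checks internally; for intuition, here is a minimal sketch of content checksumming with Node's built-in crypto module. The verifyIntegrity helper is illustrative, not a GitDB API:
const crypto = require('crypto');

// Hash the serialized data; any change to the content changes the digest
const checksum = (data) =>
  crypto.createHash('sha256').update(JSON.stringify(data)).digest('hex');

const verifyIntegrity = (data, storedChecksum) =>
  checksum(data) === storedChecksum;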
Access Control
- GitHub token-based authentication for remote storage (example below)
- File system permissions for local storage
- Locking mechanism to prevent concurrent writes
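Tokens are best read from the environment rather than hardcoded. A minimal sketch using the documented configuration options; GITHUB_TOKEN is a conventional variable name, not something GitDB requires:
const db = new GitDB({
  storageType: 'github',
  owner: 'git-as-a-db',
  repo: 'your-repo',
  token: process.env.GITHUB_TOKEN // keep credentials out of source control
});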
Migration Guide
From Local to GitHub Storage
// Create local database
const localDb = new GitDB({ storageType: 'local' });
// Create GitHub database
const githubDb = new GitDB({
storageType: 'github',
owner: 'git-as-a-db',
repo: 'your-repo',
token: 'your-token'
});
// Export from local and import to GitHub
const data = await localDb.readDataFile();
await githubDb.writeDataFile(data, 'Migrate from local storage');
Format Conversion
// Convert JSON to CSV
const csvDb = await db.convertFormat('csv');
// Convert CSV to YAML
const yamlDb = await csvDb.convertFormat('yaml');
Testing
Unit Tests
const GitDB = require('gitdb');
describe('GitDB', () => {
let db;
beforeEach(() => {
db = new GitDB({ storageType: 'local', format: 'json' });
});
test('should create and read items', async () => {
const item = await db.create('test', { name: 'Test Item' });
expect(item.name).toBe('Test Item');
const items = await db.read('test');
expect(items).toHaveLength(1);
});
test('should filter items correctly', async () => {
await db.create('users', { name: 'Alice', age: 25 });
await db.create('users', { name: 'Bob', age: 30 });
const youngUsers = await db.filter('users', user => user.age < 30);
expect(youngUsers).toHaveLength(1);
expect(youngUsers[0].name).toBe('Alice');
});
});
Integration Tests
test('GitHub storage integration', async () => {
const db = new GitDB({
storageType: 'github',
owner: 'git-as-a-db',
repo: 'test-repo',
token: process.env.GITHUB_TOKEN
});
const item = await db.create('test', { data: 'test' });
expect(item).toBeDefined();
const items = await db.read('test');
expect(items).toContainEqual(expect.objectContaining({ data: 'test' }));
});Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request
License
MIT License - see LICENSE file for details.
Support
For issues and questions:
- Create an issue on GitHub
- Check the documentation
- Review the examples in the repository
GitDB - Your data, your format, your storage, your way! 🚀
