automated-database-seed
v1.0.0
**Generate realistic, schema-aware test data for PostgreSQL databases in seconds, not hours.**
🚀 Smart Test Data Fabricator
Stop wasting time manually scrubbing production data or dealing with broken foreign keys from simple faker scripts. Smart Test Data Fabricator automatically understands your database schema and generates consistent, realistic test data that just works.
# Generate realistic test data instantly
fabricate-data --url postgresql://localhost/testdb --mode generate
# Safely scrub production data
fabricate-data --url postgresql://prod/backup --mode scrub --output postgresql://test/db
✨ Why Smart Test Data Fabricator?
The Problem
- Manual data scrubbing takes hours and risks PII leaks
- Simple faker libraries create data that violates foreign keys
- Copying production is dangerous and time-consuming
- Writing custom scripts for each schema is tedious
The Solution
- 🧠 Schema-aware: Automatically understands tables, relationships, and constraints
- 🔗 Referential integrity: All foreign keys point to valid records
- 🎯 Realistic data: Uses smart patterns to generate believable content
- 🛡️ PII protection: Safely anonymizes sensitive data while preserving structure
- ⚡ Fast: Generates millions of records efficiently
- 🔧 Easy: Works with simple commands or detailed configurations
🚀 Quick Start
Installation
pip install smart-test-data-fabricator
Generate Your First Dataset
# Connect to your empty database and generate realistic data
fabricate-data --url postgresql://user:pass@localhost/testdb --mode generate
# See what it would do without making changes
fabricate-data --url postgresql://localhost/testdb --mode generate --dry-run
That's it! The tool will:
- 🔍 Analyze your database schema
- 📊 Determine table dependencies
- 🎲 Generate realistic, consistent data
- ✅ Maintain all foreign key relationships
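The dependency step above can be pictured as a topological sort over foreign-key edges, so that parent tables are filled before the tables that reference them. The following is a minimal sketch under stated assumptions, not the tool's actual implementation: the `fk_parents` mapping and the `insertion_order` helper are hypothetical, and `graphlib` requires Python 3.9 or newer.

```python
from graphlib import TopologicalSorter

def insertion_order(fk_parents):
    """Return table names so that every table comes after the tables it references.

    fk_parents maps a table name to the set of tables its foreign keys point to.
    """
    # TopologicalSorter takes {node: predecessors}; predecessors are emitted first.
    return list(TopologicalSorter(fk_parents).static_order())

# Hypothetical schema: orders reference users; order_items reference orders and products.
fk_parents = {
    "users": set(),
    "products": set(),
    "orders": {"users"},
    "order_items": {"orders", "products"},
}

order = insertion_order(fk_parents)
print(order)
```

Any order that places `users` before `orders` and both `orders` and `products` before `order_items` is valid, so the exact output can vary between independent tables.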
💡 Common Use Cases
🏗️ Local Development
Perfect for seeding your development database with realistic data:
# Generate a small dataset for development
fabricate-data --url postgresql://localhost/myapp_dev --template small_demo
🧪 Testing & QA
Create specific scenarios for testing edge cases:
# test_scenario.yml
tables:
  users: 100
  orders:
    count: "each users 0-15" # Some users have no orders, others have many
    fields:
      status: [pending:20%, completed:70%, cancelled:10%]
  products: 50
fabricate-data --url postgresql://localhost/test --config test_scenario.yml
🏭 CI/CD Pipeline Integration
Automatically seed test databases in your deployment pipeline:
# .github/workflows/test.yml
- name: Seed Test Database
  run: |
    fabricate-data \
      --url ${{ secrets.TEST_DATABASE_URL }} \
      --mode generate \
      --template integration_test \
      --quiet
🔒 Production Data Scrubbing
Safely anonymize production data for development use:
# Scrub sensitive data while preserving relationships
fabricate-data \
  --url postgresql://prod-backup/db \
  --mode scrub \
  --output postgresql://staging/db \
  --config scrub_rules.yml
📚 Configuration Examples
Simple Generation
# quick_demo.yml
mode: generate
tables:
  users: 50
  posts: 200
  comments: 500
Advanced Scenarios
# ecommerce_scenario.yml
mode: generate
settings:
  seed: 12345 # Reproducible data
tables:
  users:
    count: 1000
    fields:
      email_verified: 80% true
      plan: [free:70%, pro:25%, enterprise:5%]
  products:
    count: 200
    fields:
      category: [electronics:30%, clothing:25%, books:20%, home:25%]
  orders:
    count: "each users 0-10" # Realistic distribution
    fields:
      status: [pending:5%, shipped:20%, delivered:70%, returned:5%]
  order_items:
    count: "each orders 1-5"
PII Scrubbing Rules
# scrub_config.yml
mode: scrub
auto_detect_pii: true
custom_rules:
  users.email: fake_email
  users.phone: fake_phone
  users.ssn: mask_with_x
  profiles.bio: lorem_paragraph
consistency:
  preserve_relationships: true
  maintain_distributions: true
🛠️ CLI Reference
Basic Commands
# Generate mode - create synthetic data
fabricate-data --url <database_url> --mode generate [options]
# Scrub mode - anonymize existing data
fabricate-data --url <source_url> --mode scrub --output <target_url> [options]
Essential Options
| Option | Description | Example |
|--------|-------------|---------|
| --config FILE | Use configuration file | --config scenario.yml |
| --template NAME | Use built-in template | --template small_demo |
| --dry-run | Show what would be done | --dry-run |
| --quiet | Minimal output for scripts | --quiet |
| --verbose | Detailed logging | --verbose |
| --seed NUMBER | Reproducible generation | --seed 12345 |
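To illustrate why a seed gives reproducible output, the sketch below combines a seeded generator with weighted lists such as `plan: [free:70%, pro:25%, enterprise:5%]` from the configuration examples. It is an assumption about how such a spec could be interpreted, not the tool's actual parser; `parse_weights` and `sample` are hypothetical helpers.

```python
import random

def parse_weights(entries):
    """Split entries such as 'free:70%' into parallel lists of values and weights."""
    values, weights = [], []
    for entry in entries:
        value, _, percent = entry.rpartition(":")
        values.append(value)
        weights.append(float(percent.rstrip("%")))
    return values, weights

def sample(entries, count, seed):
    """Draw `count` values; the same seed always yields the same sequence."""
    rng = random.Random(seed)  # private generator, independent of global random state
    values, weights = parse_weights(entries)
    return rng.choices(values, weights=weights, k=count)

plans = ["free:70%", "pro:25%", "enterprise:5%"]
first = sample(plans, 10, seed=12345)
second = sample(plans, 10, seed=12345)
print(first == second)  # True: identical seed, identical data
```

Using a private `random.Random` instance rather than the module-level functions keeps the result stable even if other code also draws random numbers.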
Built-in Templates
- small_demo - Perfect for development (100s of records)
- integration_test - Medium dataset for testing (1000s of records)
- performance_test - Large dataset for load testing (100K+ records)
- saas_app - Typical SaaS application schema
- ecommerce - E-commerce platform schema
🎯 Advanced Features
Smart Data Generation
The tool automatically generates realistic data based on column names and types:
- email columns → [email protected]
- phone columns → (555) 123-4567
- first_name + last_name → Consistent fake names
- created_at → Realistic timestamps
- price columns → Reasonable monetary values
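Name-based generation of this kind can be sketched as a lookup from column-name keywords to generator functions. The rules, value ranges, and the `generator_for` helper below are hypothetical and use only the standard library; the real tool's detection rules may differ.

```python
import random

# Hypothetical name-based rules: (keyword to look for, generator taking an rng).
RULES = [
    ("email", lambda rng: f"user{rng.randrange(10_000)}@example.com"),
    ("phone", lambda rng: f"(555) {rng.randrange(100, 1000)}-{rng.randrange(10_000):04d}"),
    ("price", lambda rng: round(rng.uniform(1, 500), 2)),
]

def generator_for(column_name):
    """Pick the first rule whose keyword appears in the column name, else None."""
    lowered = column_name.lower()
    for keyword, generate in RULES:
        if keyword in lowered:
            return generate
    return None

rng = random.Random(0)
row = {name: generator_for(name)(rng) for name in ("contact_email", "phone", "unit_price")}
print(row)
```

Columns that match no rule return `None` here, which leaves room for a fallback based on the column's declared type.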
Referential Integrity
Automatically handles complex relationships:
- ✅ Foreign keys always point to valid records
- ✅ Handles self-referencing tables (categories, employees)
- ✅ Manages circular dependencies intelligently
- ✅ Supports composite keys and unique constraints
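For a self-referencing table, one common approach is to generate rows in order so that each parent reference points to a row that already exists. The sketch below assumes a hypothetical `employees` table with `id` and `manager_id` columns; it illustrates the idea and is not confirmed to be the tool's method.

```python
import random

def generate_employees(count, seed=0):
    """Build rows whose manager_id is None or the id of an earlier row."""
    rng = random.Random(seed)
    rows = []
    for emp_id in range(1, count + 1):
        # The first row has no one to report to; later rows pick any existing id.
        manager_id = rng.choice([r["id"] for r in rows]) if rows else None
        rows.append({"id": emp_id, "manager_id": manager_id})
    return rows

employees = generate_employees(5)
valid_ids = {row["id"] for row in employees}
print(all(row["manager_id"] is None or row["manager_id"] in valid_ids for row in employees))
```

Because every `manager_id` refers to a smaller `id`, the rows can be inserted in generation order without deferring the foreign key check.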
Performance Optimization
Efficient for large datasets:
- 🚀 Bulk inserts using PostgreSQL COPY protocol
- 🧵 Parallel processing for independent tables
- 💾 Memory-efficient streaming for large datasets
- 📊 Progress reporting for long-running operations
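Batched COPY loading can be sketched as follows. This assumes the `psycopg2` driver, whose cursors provide `copy_expert`; the `csv_batches` and `copy_rows` helpers are hypothetical, and nothing here is confirmed to match the tool's internals. Rows are consumed lazily, so only one batch is held in memory at a time.

```python
import csv
import io
from itertools import islice

def csv_batches(rows, batch_size):
    """Yield in-memory CSV buffers holding at most `batch_size` rows each."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        buffer = io.StringIO()
        csv.writer(buffer, lineterminator="\n").writerows(batch)
        buffer.seek(0)
        yield buffer

def copy_rows(connection, table, columns, rows, batch_size=1000):
    """Stream rows into `table` with COPY; `connection` is a psycopg2 connection."""
    # Table and column names are interpolated directly, so pass trusted identifiers only.
    statement = f"COPY {table} ({', '.join(columns)}) FROM STDIN WITH (FORMAT csv)"
    with connection.cursor() as cursor:
        for buffer in csv_batches(rows, batch_size):
            cursor.copy_expert(statement, buffer)
    connection.commit()

# Batching can be checked without a database:
batches = list(csv_batches(((i, f"user{i}") for i in range(5)), batch_size=2))
print([b.getvalue() for b in batches])
```

A call such as `copy_rows(conn, "users", ["id", "name"], row_iterator)` would then load any iterable of tuples, with the batch size trading memory use against the number of COPY round trips.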
🔧 Development Setup
Prerequisites
- Python 3.8+
- PostgreSQL 9.6+
Local Development
# Clone the repository
git clone https://github.com/your-org/smart-test-data-fabricator
cd smart-test-data-fabricator
# Install dependencies
pip install -r requirements.txt
# Run tests
pytest tests/
# Install in development mode
pip install -e .
Docker Usage
# Using Docker Compose
docker-compose up -d postgres
export DATABASE_URL="postgresql://test:test@localhost:5432/testdb"
fabricate-data --url $DATABASE_URL --mode generate
🐛 Troubleshooting
Common Issues
Connection refused
# Check your database URL and credentials
fabricate-data --url postgresql://user:pass@host:port/db --dry-run
Foreign key violations
# The tool should prevent this, but if it happens:
fabricate-data --url <database_url> --mode generate --validate-schema
Out of memory with large datasets
# Use streaming mode for large datasets
fabricate-data --url <database_url> --mode generate --batch-size 1000 --stream
PII not detected during scrubbing
# Use custom rules for specific columns
fabricate-data --url <source_url> --mode scrub --output <target_url> --config custom_pii_rules.yml
Getting Help
- 📖 Check our documentation
- 🐛 Report bugs on GitHub Issues
- 💬 Join our Discord community
- 📧 Email us at [email protected]
🤝 Contributing
We love contributions! Here's how to help:
- 🍴 Fork the repository
- 🌿 Create a feature branch (git checkout -b feature/amazing-feature)
- ✅ Test your changes (pytest tests/)
- 📝 Commit your changes (git commit -am 'Add amazing feature')
- 🚀 Push to the branch (git push origin feature/amazing-feature)
- 🔄 Open a Pull Request
Development Guidelines
- Write tests for new features
- Follow PEP 8 style guidelines
- Update documentation for user-facing changes
- Add type hints for better code quality
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Built with ❤️ for developers who deserve better test data tools
- Inspired by the pain of manual data scrubbing and broken faker scripts
- Thanks to the PostgreSQL community for excellent introspection capabilities
- Special thanks to the Faker library maintainers
Ready to generate some realistic test data?
pip install smart-test-data-fabricator
fabricate-data --url postgresql://localhost/myapp --mode generate
Made with ❤️ by developers, for developers
