automated-database-seed

v1.0.0

🚀 Smart Test Data Fabricator

Generate realistic, schema-aware test data for PostgreSQL databases in seconds, not hours.

Stop wasting time manually scrubbing production data or dealing with broken foreign keys from simple faker scripts. Smart Test Data Fabricator automatically understands your database schema and generates consistent, realistic test data that just works.

# Generate realistic test data instantly
fabricate-data --url postgresql://localhost/testdb --mode generate

# Safely scrub production data 
fabricate-data --url postgresql://prod/backup --mode scrub --output postgresql://test/db

✨ Why Smart Test Data Fabricator?

The Problem

  • Manual data scrubbing takes hours and risks PII leaks
  • Simple faker libraries create data that violates foreign keys
  • Copying production is dangerous and time-consuming
  • Writing custom scripts for each schema is tedious

The Solution

  • 🧠 Schema-aware: Automatically understands tables, relationships, and constraints
  • 🔗 Referential integrity: All foreign keys point to valid records
  • 🎯 Realistic data: Uses smart patterns to generate believable content
  • 🛡️ PII protection: Safely anonymizes sensitive data while preserving structure
  • ⚡ Fast: Generates millions of records efficiently
  • 🔧 Easy: Works with simple commands or detailed configurations

🚀 Quick Start

Installation

pip install smart-test-data-fabricator

Generate Your First Dataset

# Connect to your empty database and generate realistic data
fabricate-data --url postgresql://user:pass@localhost/testdb --mode generate

# See what it would do without making changes
fabricate-data --url postgresql://localhost/testdb --mode generate --dry-run

That's it! The tool will:

  1. 🔍 Analyze your database schema
  2. 📊 Determine table dependencies
  3. 🎲 Generate realistic, consistent data
  4. ✅ Maintain all foreign key relationships
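Step 2 above (determining table dependencies) can be illustrated with a small topological sort. This is only a sketch of the general technique, not the tool's actual code: the function name `insertion_order` and the input shape (a dict mapping each table to the set of tables it references) are assumptions made for the example.

```python
from collections import deque
from typing import Dict, List, Set


def insertion_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Order tables so each one comes after the tables it references.

    Hypothetical helper: `depends_on` maps a table name to the tables
    its foreign keys point at. Self-references are ignored here.
    """
    remaining = {t: {d for d in deps if d != t} for t, deps in depends_on.items()}
    # Make sure referenced tables that were not listed explicitly still appear.
    for deps in list(remaining.values()):
        for dep in deps:
            remaining.setdefault(dep, set())

    ready = deque(sorted(t for t, deps in remaining.items() if not deps))
    order: List[str] = []
    while ready:
        table = ready.popleft()
        order.append(table)
        # Every table waiting on `table` has one fewer unmet dependency.
        for other in sorted(remaining):
            deps = remaining[other]
            if table in deps:
                deps.remove(table)
                if not deps:
                    ready.append(other)

    if len(order) != len(remaining):
        raise ValueError("circular foreign-key dependency between tables")
    return order


schema = {
    "users": set(),
    "products": set(),
    "orders": {"users"},
    "order_items": {"orders", "products"},
}
order = insertion_order(schema)
```

Inserting in this order means a parent row always exists before any child row that references it. Circular dependencies need a different strategy (for example deferring one foreign key), which this sketch only detects.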

💡 Common Use Cases

🏗️ Local Development

Perfect for seeding your development database with realistic data:

# Generate a small dataset for development
fabricate-data --url postgresql://localhost/myapp_dev --template small_demo

🧪 Testing & QA

Create specific scenarios for testing edge cases:

# test_scenario.yml
tables:
  users: 100
  orders: 
    count: "each users 0-15"  # Some users have no orders, others have many
    fields:
      status: [pending:20%, completed:70%, cancelled:10%]
  products: 50

fabricate-data --url postgresql://localhost/test --config test_scenario.yml
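To make the `status` distribution above concrete, here is one way such a weighted list could be interpreted. It is a sketch under assumptions: the helper names `parse_weighted` and `sample` are hypothetical, and the entries are assumed to arrive as strings of the form `value:NN%`.

```python
import random
from typing import List, Optional, Sequence, Tuple


def parse_weighted(spec: Sequence[str]) -> Tuple[List[str], List[float]]:
    """Split entries like 'pending:20%' into values and percentage weights."""
    values: List[str] = []
    weights: List[float] = []
    for entry in spec:
        # Split on the last colon so values may themselves contain colons.
        value, _, percent = entry.rpartition(":")
        values.append(value.strip())
        weights.append(float(percent.strip().rstrip("%")))
    if abs(sum(weights) - 100.0) > 1e-9:
        raise ValueError("percentages must add up to 100")
    return values, weights


def sample(spec: Sequence[str], n: int, seed: Optional[int] = None) -> List[str]:
    """Draw n values according to the weights; a seed makes the draw repeatable."""
    values, weights = parse_weighted(spec)
    rng = random.Random(seed)
    return rng.choices(values, weights=weights, k=n)


status_spec = ["pending:20%", "completed:70%", "cancelled:10%"]
statuses = sample(status_spec, n=1000, seed=12345)
```

Over many rows the share of each status approaches the configured percentage, while individual runs differ unless a seed is fixed.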

🏭 CI/CD Pipeline Integration

Automatically seed test databases in your deployment pipeline:

# .github/workflows/test.yml
- name: Seed Test Database
  run: |
    fabricate-data \
      --url "${{ secrets.TEST_DATABASE_URL }}" \
      --mode generate \
      --template integration_test \
      --quiet

🔒 Production Data Scrubbing

Safely anonymize production data for development use:

# Scrub sensitive data while preserving relationships
fabricate-data \
  --url postgresql://prod-backup/db \
  --mode scrub \
  --output postgresql://staging/db \
  --config scrub_rules.yml

📚 Configuration Examples

Simple Generation

# quick_demo.yml
mode: generate
tables:
  users: 50
  posts: 200
  comments: 500

Advanced Scenarios

# ecommerce_scenario.yml
mode: generate
settings:
  seed: 12345  # Reproducible data
  
tables:
  users: 
    count: 1000
    fields:
      email_verified: 80% true
      plan: [free:70%, pro:25%, enterprise:5%]
      
  products:
    count: 200
    fields:
      category: [electronics:30%, clothing:25%, books:20%, home:25%]
      
  orders:
    count: "each users 0-10"  # Realistic distribution
    fields:
      status: [pending:5%, shipped:20%, delivered:70%, returned:5%]
      
  order_items:
    count: "each orders 1-5"
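The `count: "each <table> <min>-<max>"` form together with `seed` can be read as "for every parent row, create between min and max child rows, reproducibly". The sketch below illustrates that reading. The names `parse_each` and `child_rows` are hypothetical, and the uniform distribution is an assumption; the README does not say how the tool picks a count within the range.

```python
import random
import re
from typing import List, Sequence, Tuple

_EACH = re.compile(r"^each\s+(\w+)\s+(\d+)-(\d+)$")


def parse_each(spec: str) -> Tuple[str, int, int]:
    """Parse 'each users 0-10' into ('users', 0, 10)."""
    match = _EACH.match(spec.strip())
    if match is None:
        raise ValueError(f"unsupported count spec: {spec!r}")
    parent, low, high = match.group(1), int(match.group(2)), int(match.group(3))
    if low > high:
        raise ValueError("range minimum exceeds maximum")
    return parent, low, high


def child_rows(spec: str, parent_ids: Sequence[int], seed: int) -> List[Tuple[int, int]]:
    """Return (child_id, parent_id) pairs; the same seed gives the same rows."""
    _, low, high = parse_each(spec)
    rng = random.Random(seed)  # private generator, so global state is untouched
    rows: List[Tuple[int, int]] = []
    for parent_id in parent_ids:
        # Assumption: counts are drawn uniformly from the inclusive range.
        for _ in range(rng.randint(low, high)):
            rows.append((len(rows) + 1, parent_id))
    return rows


user_ids = list(range(1, 21))
orders = child_rows("each users 0-10", user_ids, seed=12345)
```

Because the generator is seeded, two runs with `seed=12345` produce identical rows, which is what makes test failures repeatable.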

PII Scrubbing Rules

# scrub_config.yml
mode: scrub
auto_detect_pii: true

custom_rules:
  users.email: fake_email
  users.phone: fake_phone  
  users.ssn: mask_with_x
  profiles.bio: lorem_paragraph
  
consistency:
  preserve_relationships: true
  maintain_distributions: true
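The rule names above suggest per-column transforms. A minimal sketch of two of them follows, assuming that `mask_with_x` keeps separators and that `fake_email` is deterministic, so the same source address always maps to the same fake one and joins on that column keep working. These behaviours, the salt and the `RULES` mapping are assumptions for illustration, not the tool's documented semantics.

```python
import hashlib
from typing import Any, Callable, Dict


def mask_with_x(value: str) -> str:
    """Replace every letter and digit with 'X' but keep separators."""
    return "".join("X" if ch.isalnum() else ch for ch in value)


def fake_email(value: str, salt: str = "demo-salt") -> str:
    """Map an address to a stable fake one on a reserved example domain."""
    digest = hashlib.sha256((salt + value.lower()).encode("utf-8")).hexdigest()[:12]
    return f"user_{digest}@example.com"


# Hypothetical in-memory equivalent of the custom_rules section.
RULES: Dict[str, Callable[[str], str]] = {
    "users.email": fake_email,
    "users.ssn": mask_with_x,
}


def scrub_row(table: str, row: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the matching rule to each column; other columns pass through."""
    scrubbed = {}
    for column, value in row.items():
        rule = RULES.get(f"{table}.{column}")
        scrubbed[column] = rule(value) if rule and value is not None else value
    return scrubbed


clean = scrub_row("users", {"id": 7, "email": "Jane@Corp.test", "ssn": "123-45-6789"})
```

Primary keys and other unlisted columns are left untouched, so foreign keys in other tables still resolve after scrubbing.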

🛠️ CLI Reference

Basic Commands

# Generate mode - create synthetic data
fabricate-data --url <database_url> --mode generate [options]

# Scrub mode - anonymize existing data  
fabricate-data --url <source_url> --mode scrub --output <target_url> [options]

Essential Options

| Option | Description | Example |
|--------|-------------|---------|
| --config FILE | Use configuration file | --config scenario.yml |
| --template NAME | Use built-in template | --template small_demo |
| --dry-run | Show what would be done | --dry-run |
| --quiet | Minimal output for scripts | --quiet |
| --verbose | Detailed logging | --verbose |
| --seed NUMBER | Reproducible generation | --seed 12345 |

Built-in Templates

  • small_demo - Perfect for development (100s of records)
  • integration_test - Medium dataset for testing (1000s of records)
  • performance_test - Large dataset for load testing (100K+ records)
  • saas_app - Typical SaaS application schema
  • ecommerce - E-commerce platform schema

🎯 Advanced Features

Smart Data Generation

The tool automatically generates realistic data based on column names and types:

  • email columns → Realistic-looking email addresses
  • phone columns → (555) 123-4567
  • first_name + last_name → Consistent fake names
  • created_at → Realistic timestamps
  • price columns → Reasonable monetary values
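A name-based heuristic like the one described above could look roughly as follows. This is a sketch, not the tool's implementation: the function `value_for`, the matching rules and the value ranges are all assumptions chosen for the example.

```python
import random
from datetime import datetime, timedelta
from typing import Any


def value_for(column: str, rng: random.Random) -> Any:
    """Pick a plausible value from the column name alone (hypothetical rules)."""
    name = column.lower()
    if "email" in name:
        return f"user{rng.randint(1, 99999)}@example.com"
    if "phone" in name:
        return f"(555) {rng.randint(100, 999)}-{rng.randint(0, 9999):04d}"
    if name.endswith("_at"):
        # Some moment within the year starting 2024-01-01.
        return datetime(2024, 1, 1) + timedelta(seconds=rng.randint(0, 365 * 24 * 3600))
    if "price" in name:
        return round(rng.uniform(1, 500), 2)
    return None  # unknown column: the caller would fall back to the column type


rng = random.Random(12345)
row = {col: value_for(col, rng) for col in ["email", "phone", "created_at", "price", "notes"]}
```

A real generator would combine this with the column's declared type and constraints; the name is only a hint.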

Referential Integrity

Automatically handles complex relationships:

  • ✅ Foreign keys always point to valid records
  • ✅ Handles self-referencing tables (categories, employees)
  • ✅ Manages circular dependencies intelligently
  • ✅ Supports composite keys and unique constraints
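For self-referencing tables, one simple way to keep every foreign key valid is to let a row point only at rows created before it. The sketch below shows that idea; the function name, the `parent_id` column and the `root_ratio` parameter are assumptions for illustration.

```python
import random
from typing import Dict, List, Optional


def self_referencing_rows(n: int, seed: int, root_ratio: float = 0.2) -> List[Dict[str, Optional[int]]]:
    """Build n rows whose parent_id is NULL or the id of an earlier row."""
    rng = random.Random(seed)
    rows: List[Dict[str, Optional[int]]] = []
    for row_id in range(1, n + 1):
        if row_id == 1 or rng.random() < root_ratio:
            parent = None  # a root, e.g. a top-level category
        else:
            parent = rng.randint(1, row_id - 1)  # always an existing row
        rows.append({"id": row_id, "parent_id": parent})
    return rows


categories = self_referencing_rows(50, seed=12345)
```

Since a parent always has a smaller id than its child, the rows can be inserted in id order and the hierarchy cannot contain cycles.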

Performance Optimization

Efficient for large datasets:

  • 🚀 Bulk inserts using PostgreSQL COPY protocol
  • 🧵 Parallel processing for independent tables
  • 💾 Memory-efficient streaming for large datasets
  • 📊 Progress reporting for long-running operations
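Bulk loading via COPY and memory-efficient streaming fit together naturally: rows are written into small CSV buffers and each buffer is sent to the server before the next is built. The sketch below covers only the batching side and runs without a database. The helper name `csv_batches` is hypothetical, and the closing comment shows how a buffer might be handed to psycopg2's `copy_expert`, assuming that driver is used.

```python
import csv
import io
from typing import Iterable, Iterator, Sequence


def csv_batches(rows: Iterable[Sequence], batch_size: int) -> Iterator[io.StringIO]:
    """Yield CSV buffers of at most batch_size rows, rewound for reading."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    pending = 0
    for row in rows:
        writer.writerow(row)
        pending += 1
        if pending == batch_size:
            buffer.seek(0)
            yield buffer
            # Start a fresh buffer so memory use stays bounded by one batch.
            buffer = io.StringIO()
            writer = csv.writer(buffer)
            pending = 0
    if pending:
        buffer.seek(0)
        yield buffer


generated = ((i, f"u{i}") for i in range(5))  # rows can come from a generator
batches = list(csv_batches(generated, batch_size=2))

# With psycopg2 (an assumption about the driver), each buffer could be loaded as:
#   cur.copy_expert("COPY users (id, name) FROM STDIN WITH (FORMAT csv)", buf)
```

In real use the batches would be consumed one at a time rather than collected into a list, so only a single batch is held in memory.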

🔧 Development Setup

Prerequisites

  • Python 3.8+
  • PostgreSQL 9.6+

Local Development

# Clone the repository
git clone https://github.com/your-org/smart-test-data-fabricator
cd smart-test-data-fabricator

# Install dependencies
pip install -r requirements.txt

# Install in development mode
pip install -e .

# Run tests
pytest tests/

Docker Usage

# Using Docker Compose
docker-compose up -d postgres
export DATABASE_URL="postgresql://test:test@localhost:5432/testdb"
fabricate-data --url "$DATABASE_URL" --mode generate

🐛 Troubleshooting

Common Issues

Connection refused

# Check your database URL and credentials
fabricate-data --url postgresql://user:pass@host:port/db --dry-run

Foreign key violations

# The tool should prevent this, but if it happens:
fabricate-data --url <database_url> --mode generate --validate-schema

Out of memory with large datasets

# Use streaming mode for large datasets
fabricate-data --url <database_url> --mode generate --batch-size 1000 --stream

PII not detected during scrubbing

# Use custom rules for specific columns
fabricate-data --url <source_url> --mode scrub --output <target_url> --config custom_pii_rules.yml

🤝 Contributing

We love contributions! Here's how to help:

  1. 🍴 Fork the repository
  2. 🌿 Create a feature branch (git checkout -b feature/amazing-feature)
  3. ✅ Test your changes (pytest tests/)
  4. 📝 Commit your changes (git commit -am 'Add amazing feature')
  5. 🚀 Push to the branch (git push origin feature/amazing-feature)
  6. 🔄 Open a Pull Request

Development Guidelines

  • Write tests for new features
  • Follow PEP 8 style guidelines
  • Update documentation for user-facing changes
  • Add type hints for better code quality

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Built with ❤️ for developers who deserve better test data tools
  • Inspired by the pain of manual data scrubbing and broken faker scripts
  • Thanks to the PostgreSQL community for excellent introspection capabilities
  • Special thanks to the Faker library maintainers

Ready to generate some realistic test data?

pip install smart-test-data-fabricator
fabricate-data --url postgresql://localhost/myapp --mode generate

Made with ❤️ by developers, for developers