@otg-dev/seedforge
v2.7.1
Generate realistic seed data from your database schema
seedforge
Stop maintaining brittle seed scripts. seedforge reads your schema and generates valid relational test data automatically.
The zero-config npx seed tool for developers who need realistic, relational test data from their actual schema. Point it at any PostgreSQL, MySQL, or SQLite database — or hand it your Prisma / Drizzle / TypeORM / JPA schema — and it produces deterministic, FK-correct seed data in seconds.
npx @otg-dev/seedforge --db $DATABASE_URL --count 100 --seed 42
That's the whole interface. No config file, no decorators, no schema definitions. seedforge introspects what you have and fills it in.
seedforge --db "$DATABASE_URL" --count 5 --seed 42 --quiet --yes
Why
- Schema-aware. Reads tables, FKs, enums, CHECK constraints from the source of truth — not a copy of it.
- Zero-config. Works on day one against any supported database or ORM. No setup beyond pointing at your schema.
- Deterministic. Pass --seed 42 and get byte-identical output across machines and CI runs. Faker can't promise this; hand-written fixtures rot.
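To illustrate why a fixed seed gives repeatable output, here is a minimal sketch of a seeded generator. It uses mulberry32, a small public-domain PRNG, as a stand-in; this README does not say which generator seedforge uses internally, and the names `mulberry32` and `pickMany` are hypothetical.

```typescript
// Hypothetical sketch: mulberry32 stands in for seedforge's internal PRNG,
// which is not documented here.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    // Scale the 32-bit result into [0, 1).
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Draw `n` values from `items` using only the seeded generator,
// so the result depends on nothing but the inputs.
function pickMany<T>(items: readonly T[], n: number, seed: number): T[] {
  const rand = mulberry32(seed);
  return Array.from({ length: n }, () => items[Math.floor(rand() * items.length)]);
}

const names = ["Marcus", "Fatima", "Lena", "Diego", "Aisha"];
const runA = pickMany(names, 5, 42);
const runB = pickMany(names, 5, 42);
console.log(JSON.stringify(runA) === JSON.stringify(runB)); // true: same seed, same sequence
```

Because every random choice flows through one seeded function, two runs with the same seed and the same inputs produce the same sequence regardless of machine or wall-clock time.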
Quick Demo
$ seedforge --db postgres://localhost/myapp --count 10
seedforge v2.1.0
Introspected 8 tables, 2 enums, 12 foreign keys
Insert order: countries -> users -> categories -> products -> orders -> order_items -> reviews -> audit_log
Generated 80 rows across 8 tables
Seeding complete:
Tables: 8
Rows: 80
Time: 142ms
Mode: DIRECT
Query the result:
SELECT first_name, last_name, email, created_at FROM users LIMIT 5;
first_name | last_name | email | created_at
------------+-----------+----------------------------+---------------------
Marcus | Chen | [email protected] | 2025-09-14 08:23:01
Fatima | Okafor | [email protected] | 2025-11-02 14:45:33
Lena | Johansson | [email protected] | 2025-12-01 09:12:47
Diego | Reyes | [email protected] | 2026-01-18 16:30:22
Aisha | Patel | [email protected] | 2026-02-25 11:05:59
Not random gibberish. Real-looking data with correct relationships.
Install
npm install -g @otg-dev/seedforge
Or run without installing:
npx @otg-dev/seedforge --db postgres://localhost/mydb
For programmatic use in your project:
npm install @otg-dev/seedforge
Quick Start
CLI
# Seed a PostgreSQL database (50 rows per table by default)
seedforge --db postgres://localhost/mydb
# MySQL
seedforge --db mysql://root:pass@localhost/mydb
# SQLite
seedforge --db sqlite:///path/to/database.db
# 100 rows per table, deterministic output
seedforge --db postgres://localhost/mydb --count 100 --seed 42
# Export to a SQL file
seedforge --db postgres://localhost/mydb --output seed.sql
# Preview without touching the database
seedforge --db postgres://localhost/mydb --dry-run
# Seed from a Prisma schema (no running database needed for SQL file output)
seedforge --prisma ./prisma/schema.prisma --output seed.sql
# Seed from Drizzle, TypeORM, or JPA schemas
seedforge --drizzle ./src/db/schema.ts --output seed.sql
seedforge --typeorm ./src/entities/ --output seed.sql
seedforge --jpa ./src/main/java/com/example/entities/ --output seed.sql
# Fast mode: COPY-based insertion for PostgreSQL (10x+ faster)
seedforge --db postgres://localhost/mydb --count 10000 --fast
# Use a config file for fine-grained control
seedforge --config .seedforge.yml
When to use --db vs. a parser flag
seedforge supports two workflows. Pick based on what you need:
| Path | Use when | Tradeoff |
|---|---|---|
| --db <url> (live DB) | You want insertion fidelity — sequences, defaults, triggers, extensions all behave like production. | Requires a running database and credentials. |
| --prisma / --drizzle / --typeorm / --jpa | You want portability and reproducibility — generate seed SQL once, replay forever, no live DB needed. CI artifacts, PR preview, offline dev. | Function-based defaults (now(), gen_random_uuid()) and triggers won't fire until you apply. |
For other ORMs (GORM, SQLAlchemy, Django, ActiveRecord, EF Core, Sequelize, …): connect to a live DB via --db after running your migrations. Native parsers for these ecosystems are not on the roadmap; community plugins are the planned path.
Auto-discovery
Run seedforge with no flags from your project root and it walks the cwd looking for schema sources, prints what it found, and asks you to pick:
$ seedforge
seedforge: no schema source specified.
Detected schema sources in this project:
• Live DB: postgres://***@localhost:5432/myapp_dev — from .env
• Prisma: prisma/schema.prisma
Run one of:
seedforge --db "postgres://localhost:5432/myapp_dev"
seedforge --prisma prisma/schema.prisma
If DATABASE_URL is set in env, that wins automatically. Otherwise you choose: seedforge never silently picks between sources of different fidelity. Disable with --no-auto.
Programmatic API
import { seed, createSeeder, withSeed } from '@otg-dev/seedforge'
// One-liner: seed everything and get results
const result = await seed('postgres://localhost/testdb', {
  count: 100,
  seed: 42,
})
console.log(`Generated ${result.rowCount} rows in ${result.duration}ms`)
// Stateful seeder: persistent connection, per-table control
const seeder = await createSeeder('postgres://localhost/testdb', {
  seed: 42,
  transaction: true,
})
const users = await seeder.seed('users', 10)
const posts = await seeder.seed('posts', 50)
await seeder.teardown() // rolls back everything
// Test helper: automatic transaction + cleanup
const { seed: seedTable, teardown } = withSeed('postgres://localhost/testdb', {
  seed: 42,
  transaction: true,
})
const rows = await seedTable('users', 5)
// ... run assertions ...
await teardown() // rolls back all seeded data
See USAGE.md for the full programmatic API reference with examples.
Features
Schema Sources
- Multi-database -- PostgreSQL (v12+), MySQL (v5.7+), SQLite3 via live introspection
- Prisma -- parse .prisma schema files directly
- Drizzle -- parse Drizzle ORM schema files (TypeScript/JavaScript)
- TypeORM -- parse TypeORM entity classes with decorators
- JPA/Hibernate -- parse Java entity classes with annotations
- Plugins -- extend with custom schema parsers and data generators
Data Generation
- 190+ column patterns -- detects column semantics from names (email, phone, price, address, first_name, etc.) across 10 domains and generates matching Faker.js data
- Multi-tier matching -- exact name -> suffix -> prefix -> regex -> stem-based semantic matching with confidence levels
- Constraint-safe -- respects NOT NULL, UNIQUE, CHECK constraints, enum types, and generated columns
- JSON/JSONB support -- generates valid JSON objects for JSON columns
- UUID primary keys -- generates valid UUIDs for UUID PK columns
- Composite keys -- handles composite primary keys and composite foreign keys
- Safe emails -- all generated emails use RFC 2606 reserved domains (@example.com)
- Deterministic output -- pass --seed to get identical data every time, for reproducible CI/CD
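The multi-tier matching described above can be sketched as a chain of lookups that stops at the first hit. The rule tables below are hypothetical and tiny (the actual 190+ patterns are not listed in this README), the stem-based tier is left out, and `matchColumn` is an illustrative name rather than a seedforge export.

```typescript
type Confidence = "high" | "medium" | "low";
interface Match { generator: string; tier: string; confidence: Confidence }

// Hypothetical rule tables, ordered from most to least specific.
const EXACT = new Map<string, string>([
  ["email", "internet.email"],
  ["first_name", "person.firstName"],
]);
const SUFFIX: Array<[string, string]> = [["_email", "internet.email"], ["_at", "date.recent"]];
const PREFIX: Array<[string, string]> = [["is_", "datatype.boolean"], ["has_", "datatype.boolean"]];
const REGEX: Array<[RegExp, string]> = [[/phone|mobile/, "phone.number"]];

// Try each tier in order; earlier tiers report higher confidence.
function matchColumn(column: string): Match | null {
  const name = column.toLowerCase();
  const exact = EXACT.get(name);
  if (exact) return { generator: exact, tier: "exact", confidence: "high" };
  for (const [suffix, generator] of SUFFIX) {
    if (name.endsWith(suffix)) return { generator, tier: "suffix", confidence: "medium" };
  }
  for (const [prefix, generator] of PREFIX) {
    if (name.startsWith(prefix)) return { generator, tier: "prefix", confidence: "medium" };
  }
  for (const [pattern, generator] of REGEX) {
    if (pattern.test(name)) return { generator, tier: "regex", confidence: "low" };
  }
  return null; // caller falls back to a type-based default
}

console.log(matchColumn("billing_email")?.tier); // "suffix"
console.log(matchColumn("is_active")?.generator); // "datatype.boolean"
```

Ordering the tiers from exact to fuzzy means a specific rule always wins over a broad one, and the confidence label lets a caller decide when to fall back to a plain type-based value.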
Relationships
- Foreign key resolution -- topologically sorts tables and inserts in dependency order; supports one-to-one, one-to-many, and many-to-many (via join tables) out of the box
- Circular references -- detects cycles and resolves them with deferred UPDATEs
- Self-referencing tables -- generates tree structures with NULL roots, then fills parent references
- Composite foreign keys -- threads multi-column FK references correctly
- Cardinality control -- configure min..max child rows per parent with uniform, zipf, or normal distributions
- Existing data aware -- augments existing data without unique constraint violations; resets sequences after seeding
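As a rough illustration of cardinality control, the sketch below parses a min..max range and samples a child count with a Zipf-like weighting, so most parents get few children and a few get many. The function names, the exponent `s`, and the choice to rank counts from smallest to largest are assumptions made for this example, not seedforge's documented behavior.

```typescript
interface Range { min: number; max: number }

// Parse a cardinality spec such as "1..5".
function parseCardinality(spec: string): Range {
  const m = /^(\d+)\.\.(\d+)$/.exec(spec.trim());
  if (!m) throw new Error(`invalid cardinality: ${spec}`);
  const min = Number(m[1]);
  const max = Number(m[2]);
  if (min > max) throw new Error(`min exceeds max in: ${spec}`);
  return { min, max };
}

// Zipf-like sampling: the k-th smallest count gets weight 1 / k^s.
// `rand` must return a number in [0, 1); pass a seeded generator for repeatability.
function sampleZipf(range: Range, rand: () => number, s = 1): number {
  const size = range.max - range.min + 1;
  const weights = Array.from({ length: size }, (_, i) => 1 / Math.pow(i + 1, s));
  const total = weights.reduce((a, b) => a + b, 0);
  let r = rand() * total;
  for (let i = 0; i < size; i++) {
    r -= weights[i];
    if (r < 0) return range.min + i;
  }
  return range.max; // guard against floating-point leftovers
}

const range = parseCardinality("1..5");
console.log(sampleZipf(range, () => 0.5)); // 2
```

Passing the random source in as a parameter keeps the sampler deterministic whenever the caller supplies a seeded generator.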
Constraints
Seedforge is constraint-aware, not just FK-aware. It honors:
- NOT NULL -- required columns always get a value
- UNIQUE (single + composite) -- retries until unique within the generated batch
- CHECK constraints -- extracts literal sets like role IN ('admin', 'editor') and picks from them automatically
- Enums -- native PostgreSQL/MySQL enum types and inferred enums from CHECK
- VARCHAR(n) / CHAR(n) -- truncates generated strings to the declared length
- Numeric precision / scale -- fits DECIMAL(p,s) columns correctly
- Defaults -- leaves columns with DEFAULT expressions alone when they're optional
- Generated / identity columns -- skipped; the DB fills them in
- FK actions (ON DELETE, ON UPDATE) -- parsed and visible via --inspect
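Two of the rules above are easy to sketch in isolation: pulling the literal set out of a CHECK expression and truncating to a declared length. This is a simplified illustration that handles only the plain `col IN ('a', 'b')` form shown in the list; databases may store CHECK expressions in other normalized forms, and `extractInValues` and `fitLength` are hypothetical helper names.

```typescript
// Extract the allowed values from a CHECK expression of the form
// `col IN ('a', 'b')`. Returns null for anything else.
function extractInValues(check: string): { column: string; values: string[] } | null {
  const m = /^\s*\(?\s*"?(\w+)"?\s+IN\s*\(([^)]*)\)\s*\)?\s*$/i.exec(check);
  if (!m) return null;
  const values: string[] = [];
  // Match single-quoted SQL literals; a doubled quote ('') is an escaped quote.
  const literal = /'((?:[^']|'')*)'/g;
  let lit: RegExpExecArray | null;
  while ((lit = literal.exec(m[2])) !== null) {
    values.push(lit[1].replace(/''/g, "'"));
  }
  return values.length > 0 ? { column: m[1], values } : null;
}

// Truncate a generated string so it fits VARCHAR(n) / CHAR(n).
function fitLength(value: string, maxLength: number): string {
  return value.length > maxLength ? value.slice(0, maxLength) : value;
}

console.log(extractInValues("role IN ('admin', 'editor')"));
// { column: 'role', values: [ 'admin', 'editor' ] }
console.log(fitLength("Johansson", 5)); // "Johan"
```

Once the literal set is known, the column can be treated like an enum: the generator picks from the list instead of producing free text that the database would reject.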
Output & Performance
- Multiple output modes -- insert directly, write a .sql file, or dry-run to preview
- Fast mode -- PostgreSQL COPY-based bulk insertion (10x+ faster for large datasets)
- Auto-batch tuning -- optimal batch sizes calculated per table
- Streaming generation -- memory-efficient generation for very large datasets
Safety & Configuration
- Production safety -- warns before seeding databases that look like production (RDS, Cloud SQL, Neon, Supabase, etc.) and requires --yes to override
- Config file support -- fine-tune per-table row counts, column overrides, value weights, cardinality, and table exclusions via .seedforge.yml
- Environment variables -- interpolate ${DATABASE_URL} in config files
- Programmatic API -- four-tier API from one-liner to test helper, with transaction support
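A production-safety check of the kind described above could look roughly like this. The hostname suffixes are assumptions chosen for illustration (seedforge's real detection rules are not documented here), and `looksLikeProduction` and `assertSafeToSeed` are hypothetical names.

```typescript
// Hypothetical list of managed-database hostname suffixes.
const MANAGED_HOST_SUFFIXES = [".rds.amazonaws.com", ".neon.tech", ".supabase.co"];

// Heuristic only: true when the URL's host ends with a known managed-service suffix.
// Unparsable URLs and hostless URLs (such as sqlite:///...) return false.
function looksLikeProduction(connectionUrl: string): boolean {
  let host: string;
  try {
    host = new URL(connectionUrl).hostname.toLowerCase();
  } catch {
    return false;
  }
  return MANAGED_HOST_SUFFIXES.some((suffix) => host.endsWith(suffix));
}

// Refuse to continue unless the caller explicitly confirmed (the --yes flag).
function assertSafeToSeed(connectionUrl: string, yes: boolean): void {
  if (looksLikeProduction(connectionUrl) && !yes) {
    throw new Error("target looks like a production database; pass --yes to continue");
  }
}

console.log(looksLikeProduction("postgres://localhost/myapp")); // false
console.log(looksLikeProduction("postgres://u:p@db.abc123.us-east-1.rds.amazonaws.com:5432/app")); // true
```

A hostname heuristic like this can only warn, not prove anything, which is why the check is paired with an explicit confirmation rather than a hard block.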
Configuration
Create a .seedforge.yml in your project root for full control:
connection:
  url: ${DATABASE_URL}
  schema: public
count: 100
seed: 42
tables:
  users:
    count: 200
    columns:
      email:
        generator: faker.internet.email
        unique: true
      role:
        values: [admin, user, moderator]
        weights: [0.05, 0.9, 0.05]
      bio:
        nullable: 0.3
    relationships:
      orders:
        cardinality: "1..5"
        distribution: zipf
  products:
    columns:
      price:
        generator: faker.commerce.price
exclude:
  - schema_migrations
  - _prisma_migrations
  - pg_*
Environment variables are interpolated with ${VAR} syntax. Table exclusion supports glob patterns.
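The two config behaviors just mentioned can be sketched in a few lines. This is an illustrative approximation: it supports only the ${VAR} form and the `*` and `?` glob wildcards, it treats an unset variable as an error (an assumption), and `interpolate`, `globToRegExp`, and `isExcluded` are hypothetical names.

```typescript
// Replace ${VAR} placeholders with values from `env`.
// Assumption: an unset variable is an error rather than an empty string.
function interpolate(text: string, env: Record<string, string | undefined>): string {
  return text.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match, name: string) => {
    const value = env[name];
    if (value === undefined) throw new Error(`environment variable ${name} is not set`);
    return value;
  });
}

// Convert a simple glob to an anchored regular expression.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .replace(/[.+^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*/g, ".*") // * matches any run of characters
    .replace(/\?/g, "."); // ? matches exactly one character
  return new RegExp(`^${escaped}$`);
}

// True when the table name matches any exclusion pattern.
function isExcluded(table: string, patterns: string[]): boolean {
  return patterns.some((pattern) => globToRegExp(pattern).test(table));
}

const exclude = ["schema_migrations", "_prisma_migrations", "pg_*"];
console.log(isExcluded("pg_stat_statements", exclude)); // true
console.log(isExcluded("users", exclude)); // false
```

In a Node process, `interpolate(configText, process.env)` would resolve the placeholders before the YAML is parsed.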
CLI Reference
Usage: seedforge [options]
Generate realistic seed data from your database schema
Options:
  --db <url>             Database connection string (postgres://, mysql://, sqlite://)
  --prisma <path>        Path to Prisma schema file (.prisma)
  --drizzle <path>       Path to Drizzle schema file or directory
  --typeorm <path>       Path to TypeORM entity directory
  --jpa <path>           Path to JPA entity directory
  --plugin <paths...>    Paths to plugin modules
  --config <path>        Path to .seedforge.yml config file
  --count <number>       Rows per table (default: 50)
  --seed <number>        PRNG seed for deterministic output
  --dry-run              Preview without executing
  --output <file>        SQL file export path
  --schema <name>        Database schema to introspect (default: "public")
  --only <tables...>     Generate data only for these tables (FK ancestors auto-included)
  --strict-only          Error if --only requires FK ancestors not in the list
  --exclude <tables...>  Tables to skip (glob patterns supported)
  --inspect              Describe detected schema (columns, constraints, FKs, insert order) and exit
  --fast                 Use COPY-based insertion for PostgreSQL (faster)
  --verbose              Verbose logging
  --quiet                Suppress non-essential output
  --debug                Debug logging
  --json                 Machine-readable JSON output
  --yes                  Skip confirmation prompts
  -V, --version          Output the version number
  -h, --help             Display help for command
Examples:
$ seedforge --db postgres://localhost/mydb
$ seedforge --db mysql://root:pass@localhost/mydb
$ seedforge --db sqlite:///path/to/db.sqlite
$ seedforge --db postgres://localhost/mydb --count 100 --seed 42
$ seedforge --db postgres://localhost/mydb --output seed.sql --dry-run
$ seedforge --prisma ./prisma/schema.prisma --output seed.sql
$ seedforge --drizzle ./src/db/schema.ts --db postgres://localhost/mydb
$ seedforge --config .seedforge.yml
$ seedforge --db postgres://localhost/mydb --inspect
$ seedforge --db postgres://localhost/mydb --only users posts --count 500
$ seedforge --prisma ./schema.prisma --only Order --strict-only
Partial seeding (--only)
By default seedforge generates rows for every table in the schema. Pass --only
to restrict generation to a subset:
# Generate 500 rows each for users + posts only.
seedforge --db $DATABASE_URL --only users posts --count 500
Any tables referenced via FK from the selected set are auto-included so the
result stays referentially valid. For example, --only comments on a
comments → posts → users graph will also seed posts and users.
Pass --strict-only if you want an explicit error instead of auto-inclusion —
useful in CI when you want to guarantee the exact list of tables that will be
touched:
seedforge --db $DATABASE_URL --only comments --strict-only
# ✖ SF5041: --only requires auto-included FK ancestors: posts, users
--only composes with --exclude (glob patterns) and with --count /
--seed / --output / --dry-run / --fast as you'd expect.
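The auto-inclusion rule amounts to taking the transitive closure of FK references from the selected tables. Here is a minimal sketch under that reading; the `FkGraph` shape and the `resolveOnly` function are hypothetical, and only the error text mirrors the message shown above.

```typescript
// Map of table -> tables it references through foreign keys.
type FkGraph = Record<string, string[]>;

// Return the requested tables plus every FK ancestor they depend on.
// With strictOnly set, throw instead of silently adding ancestors.
function resolveOnly(graph: FkGraph, only: string[], strictOnly = false): string[] {
  const selected = new Set<string>(only);
  const stack = only.slice();
  while (stack.length > 0) {
    const table = stack.pop()!;
    for (const parent of graph[table] ?? []) {
      if (!selected.has(parent)) {
        selected.add(parent);
        stack.push(parent);
      }
    }
  }
  const added = Array.from(selected).filter((t) => !only.includes(t)).sort();
  if (strictOnly && added.length > 0) {
    throw new Error(`--only requires auto-included FK ancestors: ${added.join(", ")}`);
  }
  return Array.from(selected);
}

const graph: FkGraph = { comments: ["posts"], posts: ["users"], users: [] };
console.log(resolveOnly(graph, ["comments"]).sort()); // [ 'comments', 'posts', 'users' ]
```

Walking the graph from child to parent is enough here because a row can only be inserted once everything it points at exists; tables that merely point at the selection are never pulled in.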
Schema inspection (--inspect)
Pass --inspect to get a structured description of what seedforge sees in
your schema — columns, primary keys, foreign keys, unique + check constraints,
enums, and the topological insert order — without generating any rows:
seedforge --prisma ./schema.prisma --inspect
Combine with --json --quiet to get a machine-readable InspectReport for
scripting or CI audits:
seedforge --db $DATABASE_URL --inspect --json --quiet > schema.json
This is the same data structure the in-browser playground shows in its "detected schema" panel, so you can verify locally what the CLI will do before you run it against a real database.
How It Works
- Introspect -- connects to your database and reads the catalog (pg_catalog, information_schema, or sqlite_master), or parses your ORM schema files (Prisma, Drizzle, TypeORM, JPA)
- Resolve dependencies -- builds a directed graph of FK relationships, detects cycles, and produces a topological insertion order with deferred UPDATEs for cycles
- Map columns -- analyzes 190+ column name patterns across 10 domains to select appropriate Faker.js generators (e.g., first_name -> faker.person.firstName())
- Generate -- produces rows table-by-table in dependency order, threading FK references, enforcing uniqueness, respecting constraints, and applying cardinality distributions
- Output -- inserts directly into the database (with sequence reset), writes a SQL file, or prints a dry-run summary
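The dependency-resolution step can be sketched as a layered topological sort that defers a nullable foreign key whenever it hits a cycle. This is a simplified illustration, not seedforge's implementation: the `ForeignKey` and `Plan` shapes and the `planInserts` function are hypothetical, and the choice of which FK to defer is the first nullable one found.

```typescript
interface ForeignKey { table: string; references: string; nullable: boolean }
interface Plan { order: string[]; deferred: ForeignKey[] }

// Produce an insert order where every table comes after the tables it references.
// FKs listed in `deferred` are inserted as NULL and filled by a later UPDATE.
function planInserts(tables: string[], fks: ForeignKey[]): Plan {
  // Self-references never block ordering; they are always filled afterwards.
  const deferred = fks.filter((fk) => fk.table === fk.references);
  let active = fks.filter((fk) => fk.table !== fk.references);
  const order: string[] = [];
  const remaining = new Set<string>(tables);

  while (remaining.size > 0) {
    // A table is ready once none of its active FKs point at an unplaced table.
    const ready = Array.from(remaining).filter((t) =>
      active.every((fk) => fk.table !== t || !remaining.has(fk.references)),
    );
    if (ready.length > 0) {
      for (const t of ready) {
        order.push(t);
        remaining.delete(t);
      }
      continue;
    }
    // Nothing is ready, so the remaining tables form a cycle: defer one nullable FK.
    const breaker = active.find(
      (fk) => fk.nullable && remaining.has(fk.table) && remaining.has(fk.references),
    );
    if (!breaker) throw new Error("cycle with no nullable foreign key to defer");
    deferred.push(breaker);
    active = active.filter((fk) => fk !== breaker);
  }
  return { order, deferred };
}

const plan = planInserts(
  ["orders", "users", "countries"],
  [
    { table: "users", references: "countries", nullable: false },
    { table: "orders", references: "users", nullable: false },
  ],
);
console.log(plan.order); // [ 'countries', 'users', 'orders' ]
```

Deferring only nullable FKs matters because the first insert has to leave that column empty; a cycle made entirely of NOT NULL references cannot be seeded this way, so the sketch reports it as an error.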
Supported Databases
| Database | Introspection | Direct Insert | COPY (fast) | File Export |
|------------|:---:|:---:|:---:|:---:|
| PostgreSQL | yes | yes | yes | yes |
| MySQL | yes | yes | -- | yes |
| SQLite | yes | yes | -- | yes |
Supported Schema Parsers
| Parser | Source Type | Live DB Required |
|---------|------------|:---:|
| Prisma | .prisma files | No (for file output) |
| Drizzle | TypeScript/JavaScript schema files | No (for file output) |
| TypeORM | TypeScript entity classes with decorators | No (for file output) |
| JPA | Java entity classes with annotations | No (for file output) |
Supported Column Types
TEXT, VARCHAR, CHAR, INTEGER, BIGINT, SMALLINT, SERIAL, BIGSERIAL, REAL, DOUBLE, DECIMAL, NUMERIC, BOOLEAN, DATE, TIME, TIMESTAMP, TIMESTAMPTZ, INTERVAL, UUID, JSON, JSONB, BYTEA, INET, CIDR, MACADDR, ARRAY, ENUM, POINT, and more. Unknown types fall back to safe defaults.
Column Pattern Domains
seedforge recognizes column semantics across 10 domains:
| Domain | Example Columns | Example Output |
|--------|----------------|----------------|
| Person | first_name, last_name, age | Marcus, Chen, 34 |
| Contact | email, phone | [email protected], +1-555-0123 |
| Internet | url, domain, ip_address | https://example.com, 192.168.1.1 |
| Location | city, country, zipcode | San Francisco, US, 94102 |
| Finance | price, amount, currency | 29.99, 1500.00, USD |
| Commerce | product_name, category, sku | Wireless Headphones, Electronics |
| Text | description, body, comment | Lorem ipsum dolor sit amet... |
| Temporal | created_at, updated_at, birthdate | 2025-09-14T08:23:01Z |
| Boolean | is_active, has_verified, enabled | true, false |
| Identifiers | uuid, slug, code | 550e8400-e29b-41d4-a716-446655440000 |
Plugin System
Extend seedforge with custom schema parsers and data generators:
// my-parser-plugin.ts
import type { SchemaParserPlugin } from '@otg-dev/seedforge'
const plugin: SchemaParserPlugin = {
  name: 'my-parser',
  version: '1.0.0',
  filePatterns: ['*.myorm'],
  detect: async (projectRoot) => { /* ... */ },
  parse: async (projectRoot) => { /* ... */ },
}
export default plugin
seedforge --plugin ./my-parser-plugin.ts --output seed.sql
How seedforge compares
| | Schema-aware | Zero-config | Deterministic | FK graph | Cross-DB |
|---|:---:|:---:|:---:|:---:|:---:|
| seedforge | yes | yes | yes (--seed) | yes | PG / MySQL / SQLite |
| @faker-js/faker | no — field-level only | no — you write the schema | no | no — single field at a time | n/a |
| Mockaroo | manual UI/API setup | no | no | partial | export to SQL |
| Hand-rolled SQL seed scripts | drifts with migrations | no | yes (until schema changes) | manual | per-DB hand-written |
seedforge sits between Faker (great primitive, no schema awareness) and platforms like Tonic / Gretel (powerful but enterprise-scale). The wedge: zero-config relational seeding from real schemas, deterministic by default.
Contributing
See CONTRIBUTING.md for setup instructions and guidelines.
