@seed-kit/cli
v1.5.1
SeedKit
Generate realistic, constraint-safe seed data for any database.
SeedKit connects to your PostgreSQL, MySQL, or SQLite database, reads the schema, and generates seed data that respects foreign keys, unique constraints, check constraints, and enum types, all without copying production data.
Install
npm install -g @seed-kit/cli
Or run without installing:
npx @seed-kit/cli generate --db postgres://localhost/myapp --rows 1000
Quick Start
# Generate 1000 rows per table as SQL
seedkit generate --db postgres://localhost/myapp --rows 1000 --output seed.sql
# Insert directly into database
seedkit generate --db postgres://localhost/myapp --rows 1000
# JSON or CSV output
seedkit generate --db postgres://localhost/myapp --rows 100 --output data.json
# Deterministic output with seed
seedkit generate --db postgres://localhost/myapp --rows 100 --seed 42 --output seed.sql
Database Connection
SeedKit automatically finds your database URL by checking (in order):
1. --db CLI flag
2. DATABASE_URL environment variable
3. .env file in the current directory
4. seedkit.toml config file
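To make the precedence concrete, here is a minimal shell sketch of that lookup. `resolve_db_url` is a hypothetical helper, not part of SeedKit, and it assumes `.env` holds a plain `DATABASE_URL=...` line and `seedkit.toml` holds a single `url = "..."` line; SeedKit's own parsing is likely more thorough.

```shell
# Hypothetical helper, not part of SeedKit: resolves a database URL using the
# documented order (--db flag, DATABASE_URL, ./.env, ./seedkit.toml).
# Assumes simple "KEY=value" lines in .env and one `url = "..."` line in seedkit.toml.
resolve_db_url() {
  # 1. An explicit --db value (passed as $1) always wins.
  if [ -n "${1:-}" ]; then
    printf '%s\n' "$1"
    return 0
  fi

  # 2. The DATABASE_URL environment variable.
  if [ -n "${DATABASE_URL:-}" ]; then
    printf '%s\n' "$DATABASE_URL"
    return 0
  fi

  # 3. The first DATABASE_URL=... line in ./.env, with quote characters removed.
  if [ -f .env ]; then
    dotenv_url=$(sed -n 's/^DATABASE_URL=//p' .env | head -n 1 | tr -d "\"'")
    if [ -n "$dotenv_url" ]; then
      printf '%s\n' "$dotenv_url"
      return 0
    fi
  fi

  # 4. The first url = "..." line in ./seedkit.toml (naive: ignores which TOML table it is in).
  if [ -f seedkit.toml ]; then
    toml_url=$(sed -n 's/^[[:space:]]*url[[:space:]]*=[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p' seedkit.toml | head -n 1)
    if [ -n "$toml_url" ]; then
      printf '%s\n' "$toml_url"
      return 0
    fi
  fi

  echo "no database URL found" >&2
  return 1
}
```

Passing the --db value as the first argument reproduces the rule that an explicit flag always wins; an empty argument falls through to the environment and then to the files.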
Supported URL formats:
# PostgreSQL
seedkit generate --db postgres://user:pass@localhost:5432/mydb
# MySQL
seedkit generate --db mysql://user:pass@localhost:3306/mydb
# SQLite
seedkit generate --db sqlite://path/to/db.sqlite
AI-Enhanced Classification
SeedKit can use an LLM to improve column classification beyond the built-in 50+ regex rules. This helps with ambiguous column names that the rule engine classifies as Unknown.
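As a rough illustration of why name-based rules leave some columns as Unknown, the sketch below implements a tiny first-match-wins rule table in shell. The patterns and category names are hypothetical stand-ins, not SeedKit's built-in rules.

```shell
# Toy rule engine (hypothetical rules and categories, not SeedKit's built-in set).
# Each rule is an extended regex tested against the lowercased column name;
# the first matching rule wins, and a name that matches nothing is "Unknown".
classify_column() {
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  while read -r pattern category; do
    if printf '%s\n' "$name" | grep -Eq "$pattern"; then
      printf '%s\n' "$category"
      return 0
    fi
  done <<'RULES'
^e_?mail(_address)?$ Email
(^|_)phone(_number)?$ Phone
^(first_name|fname)$ FirstName
^(last_name|lname|surname)$ LastName
_at$ Timestamp
(^|_)(zip|postal)_code$ PostalCode
RULES
  printf 'Unknown\n'
}
```

A column such as zxq_flag matches no rule and falls through to Unknown, which is the kind of ambiguous name the --ai option is meant to help with.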
# Set one of these environment variables:
export ANTHROPIC_API_KEY=sk-ant-... # Uses Claude Sonnet by default
export OPENAI_API_KEY=sk-... # Uses GPT-4o by default
# Run with --ai flag
seedkit generate --db postgres://localhost/myapp --rows 1000 --ai --output seed.sql
# Override the model
seedkit generate --db postgres://localhost/myapp --rows 1000 --ai --model claude-opus-4-20250514
The AI classification is cached locally, so subsequent runs against the same schema don't re-query the LLM. Results are also stored in the lock file for team reproducibility.
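One way such a cache can work is to key stored results on a hash of the schema, so an unchanged schema maps to the same key and skips the LLM. The sketch below assumes a local `.classification-cache` directory and replaces the LLM call with a stub (`fake_llm_classify`); both are hypothetical and say nothing about where or how SeedKit actually stores its cache or lock file.

```shell
# Sketch of a schema-keyed cache (hypothetical: the cache directory, file layout,
# and fake_llm_classify stub are illustrations, not SeedKit internals).
LLM_CALLS=0

# Print the SHA-256 hex digest of stdin (GNU coreutils sha256sum or macOS shasum).
hash_stdin() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum | cut -d ' ' -f 1
  else
    shasum -a 256 | cut -d ' ' -f 1
  fi
}

# Stand-in for the real LLM request; it only counts how often it is reached.
fake_llm_classify() {
  LLM_CALLS=$((LLM_CALLS + 1))
  printf 'classification for: %s\n' "$1"
}

classify_cached() {
  cache_dir=./.classification-cache
  key=$(printf '%s' "$1" | hash_stdin)
  cache_file="$cache_dir/$key"
  if [ ! -f "$cache_file" ]; then
    # Cache miss: query once and store the answer under the schema hash.
    mkdir -p "$cache_dir"
    fake_llm_classify "$1" > "$cache_file"
  fi
  # Hit or freshly filled: serve the stored result.
  cat "$cache_file"
}
```

Calling classify_cached twice with the same schema text reaches the stub once; any change to the schema, such as an added column, yields a new hash and therefore a fresh classification.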
Smart Sampling
Extract statistical distributions from a production database to generate data that mirrors real patterns:
# Sample distributions (read-only, PII auto-masked)
seedkit sample --db postgres://readonly-replica:5432/myapp
# Generate using sampled distributions
seedkit generate --db postgres://localhost/myapp --rows 1000 --subset seedkit.distributions.json
All Commands
| Command | Description |
|---|---|
| seedkit generate | Generate seed data (SQL, JSON, CSV, or direct insert) |
| seedkit sample | Extract production distributions with PII masking |
| seedkit introspect | Analyze schema and show classification results |
| seedkit preview | Preview sample rows without full generation |
| seedkit check | Detect schema drift against lock file (CI-friendly) |
| seedkit graph | Visualize table dependencies (Mermaid or Graphviz) |
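Foreign keys impose an order on inserts: a parent table must be seeded before any table that references it. The standard tsort utility computes such an order from "parent child" pairs, which is a quick way to reason about the dependencies seedkit graph draws. The schema below is a made-up example, and this is not a claim about how SeedKit orders tables internally; SeedKit reads the real relationships from your database.

```shell
# Hypothetical schema: each "parent child" pair means child has a foreign key to parent.
fk_edges() {
  printf '%s\n' \
    'users orders' \
    'orders order_items' \
    'products order_items' \
    'users reviews' \
    'products reviews'
}

# tsort prints one table per line in an order where every parent precedes its children.
fk_edges | tsort
```

If the pairs contain a cycle (for example, two tables that reference each other), tsort reports the loop, since no order can satisfy every edge.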
Configuration
Create a seedkit.toml in your project root:
[database]
url = "postgres://localhost/myapp"
[generate]
rows = 500
seed = 42
[tables.users]
rows = 1000
[tables.orders]
rows = 5000
# Custom value lists with optional weights
[columns."products.color"]
values = ["red", "blue", "green", "black", "white"]
weights = [0.25, 0.20, 0.20, 0.20, 0.15]
Supported Platforms
| Platform | Architecture |
|---|---|
| Linux | x64, ARM64 |
| macOS | Intel, Apple Silicon |
| Windows | x64 |
Documentation
Full documentation, architecture details, and benchmarks: github.com/kclaka/seedkit
License
MIT
