@roostorg/db-migrator
v1.0.6
Published
Migrating (and eventually seeding) our dbs on deploy
Downloads
28
Readme
Concepts/Terminology
The CLI classifies scripts as "seeds" or a "migrations". The fundamental difference is that a migration runs across every environment -- including production -- whereas a seed only runs in one environment. Therefore, schema changes should always be migrations. "Reference data" that are the same in every environment -- essentially, application constants that just happen to be stored in the database -- are also appropriate to add as migrations. But fixtures data for tests or other specific use cases is a seed.
Critically, seed files are timestamped just like migrations, and they're run interspersed with the migrations. This approach is unconventional -- the normal approach is to have seed scripts run after all migrations have been applied -- but it makes sense given that we're writing the seeds as SQL or JS scripts that don't have access to our Sequelize model's db mappings.
Under the conventional approach, the seed scripts must be written against the latest schema (since they run after all migrations). This isn't a problem when the seed scripts are written using entity classes that the main application also uses to talk to the database, because the ORM mapping for those classes will also have updated when the migration is deployed. I.e., if we wrote our seed scripts in typescript using our sequelize models classes, then updating the model definitions to keep our classes working for our app would also keep our seed scripts working, and we might even get type errors (from Typescript) if, e.g., we removed a field from the model class that a seed script depended on.
However, when seed scripts don't go though the Sequelize model classes, and instead include direct references to SQL column names etc (as ours do), then using the conventional approach of running seeds at the end means that making a schema change in a new migration can break the existing seed scripts. This is bad. It creates extra work that must be done after each new migration to keep the seed scripts working with the latest schema. More importantly, though, we might not catch for quite a while that a migration has broken one of our seed scripts, since we'd get no compile errors and since, even on deploy, only the seed scripts for the environment being deployed get run. E.g., it'd be a pain to find out three weeks after writing a migration, when we go to re-deploy demo, that we'd actually broken its seed script; then we'd have to go back, remember what we changed, and fix it.
For now, we don't want to introduce the complexity of exporting/depending on our Sequelize model classes in our seed scripts, so it makes sense to ditch the conventional approach of running seeds all at the end.
By instead interspersing the running of the seed scripts w/ the migrations' schema changes, each script knows exactly what the schema will look like at the time it runs, and can't be broken by future schema changes.
That said, long term, it would be nice to write the seed scripts using Sequelize models and run them at the end, because that would allow us to have fewer, more-intentionally-divided seed scripts, which makes it a bit easier to see exactly what seed data we're setting up (without actually checking the db). It would also be more performant.
