[OPEN-ILS-DEV] Database upgrades (recent approach)

Tue Sep 22 12:45:28 EDT 2009

On Tue, 2009-09-22 at 08:52 -0400, Mike Rylander wrote:
> On Tue, Sep 22, 2009 at 12:56 AM, Dan Scott <dan at coffeecode.net> wrote:
> > On Mon, 2009-09-21 at 16:44 -0400, Bill Erickson wrote:
> >>
> >>
> >> On Mon, Sep 21, 2009 at 3:23 PM, Dan Scott <dan at coffeecode.net> wrote:
> >>         It's great to see the atomic database migration scripts
> >>         (better name
> >>         suggestions appreciated) getting committed recently! Is there
> >>         a design
> >>         document that lays out how these are expected to be used & how
> >>         other
> >>         developers should create these?
> >>
> >>
> >> Thanks for the poke Dan, I've been meaning to discuss and request
> >> feedback on this.  The problem of managing schema changes has gotten
> >> out of hand when working on a development server.  The goal was to
> >> start moving in the direction of atomic changes then work out the
> >> rough edges.   Here's a rough brain dump of where things stand
> >> currently:
> >>
> >> Each branch has it's own upgrade directory in Open-ILS/src/sql/Pg/, if
> >> needed.
> >>
> >> Every schema change or addition of seed data to a branch results in a
> >> new file in that directory.
> >>
> >> Clumping multiple changes into a single file is OK if they are tightly
> >> linked (e.g. seed data for 2 related org unit settings).
> >>
> >> Each new file uses the next unique numeric prefix.  0001, 0002, etc.
> >>
> >> Following the numbers is ".schema" or ".data", indicating a schema
> >> change or new seed data.
> >
> > Hmm. How about the case that I outlined previously about adding an
> > enhanced subject index, in which we would generate new rows for every
> > existing bib record? It's munging existing data and creating new data,
> > but it's not seed data. I think that we could put that in the data
> > category and name it a .data file, but we wouldn't put it into the
> > 950.sql seed data file.
> 
> 
> That (the INSERT INTO .. SELECT) should go in the .data file right
> next to the insert of the definition.  And, yes, it would be skipped
> in any baseline file.

Okay. So data is not just for seed data, but also for munged data.
Thanks for the confirmation. High-fives all around!

> >
> > Just a picky detail.
> >
> >> Anything after that is just descriptive.
> >>
> >> If it's necessary to back-port a DB change, just merge the .sql file
> >> back into the branch's upgrade directory.
> >>
> >> Each new file tells the DB that it has been inserted with "INSERT INTO
> >> config.upgrade_log (version) VALUES ('$ENTRY');"  The value for $ENTRY
> >> needs discussion.  I belive most of the existing scripts put the
> >> actual file name in there, while 002.schema.config.sql (which defines
> >> the baseline for a newly created DB) is currently just storing the
> >> numeric prefix.
> >>
> >> The value in 002.schema.config.sql will also need to be bumped up with
> >> each additional script.
> >>
> >> For schema changes, the db upgrade scripts capture the ALTER TABLE,
> >> etc. statements.  Changes to the table definitions still need to be
> >> added to the base schema file for new installations.
> >>
> >> Similarly, seed data should still go into 950.data.seed-values.sql as
> >> well.  At least, that's the current thought.  I could see an argument
> >> for changing that.
> >
> > I kind of like one big 950.data.seed-values.sql file, as it makes it a
> > bit easier to see everything in one place. That said, there's something
> > that seems a bit wrong about duplicating data... chances of typos
> > creeping into one version or the other, for example.
> >
> 
> Seeing everything in one place, and not trying to depend on these
> upgrade scripts for a fresh install, which IMO is just asking for
> trouble.  There will be lots of upgrade-time bits, both schema and
> data and munging, the should not (and probably cannot) happen at
> install-time.

Agreed.

> I'm very strongly in favor of keeping the baseline seed data file (and
> also in favor of splitting it into "required" and "demo" files, and
> "demo" data would almost never go into an upgrade script ... but this
> is a topic for another email).

When you write that other email, you might want to even consider the
possibility of several different demo files, representing different
starting points for stock scenarios (large public library consortium vs.
academic consortium vs. small special library). But that could be way
down the road.

> >>         I also think we have a number of atomic scripts to create
> >>         retroactively (?)
> >>
> >>
> >> It may be simpler to just account for missing changes in the first
> >> batch of upgrade scripts.  IOW, going back and adding all the changes
> >> could be a big job and not worth the effort.  I'll defer judgement on
> >> that, though.
> >
> > Right, I agree that it could be a big job. So you're suggesting rolling
> > those changes into a traditional big upgrade SQL script for this kick at
> > the can? If so, I would support that.
> >
> 
> I say no to any pre-0001 upgrade scripts, if that's what you're
> discussing -- it's not clear to me.  Assuming that, with the
> config.upgrade_log versioning we have a point in time when the new age
> starts, and we just say, "install the db from /at least/ revision X"
> and move on.  There are some db changes from Scott that need to be
> upgrade-ified since the first script was created, but I don't think we
> should invest any time in a bridge to this brave new world.

Yes, to be clear, I was talking about Scott's post-rel_1_6_0 schema
changes that predated upgrade script 0001. So by "upgrade-ified", do you
mean that these just need to be created as a set of the next
00xy-numbered upgrade scripts (0010, 0011, etc), versus a single 0000
script that holds all of those changes and which would ensure they are
applied in chronological order? I haven't checked the existing 9 upgrade
scripts to see if they depend on any changes that Scott committed
earlier, so perhaps it's not an issue in any case.