[OPEN-ILS-DEV] libjs

Sat Feb 17 20:11:31 EST 2007

On 2/15/07, Eric Lesage <lesagee at iro.umontreal.ca> wrote:
> On Tue, 13 Feb 2007, Mike Rylander wrote:

[snip]

> > Is that generally what you were asking about?
>
> Yes, thank you for the information. What I was looking at was the question
> of availability. What I gather is that when there is a system upgrade, a
> given staff client is tolerant enough to refetch the IDL and work with it
> even if the tables have changed a bit, so not everything has to be done at
> once (well, up to the limits defined in the version scheme you defined).

Right.  Now, I wouldn't advise anyone to upgrade any server-side
software during normal business hours if it can be helped, but we
have done so in order to fix large-ish issues.

Best practices aside, the only time we need to take Evergreen down
(from an outside perspective) would be to restart something that all
boxes in the cluster must use.  We've built the cluster in such a way
that we can take a chunk (we call them bricks, and there are four in
production today, each consisting of five servers) of it out of
production, upgrade it, and then put it back into rotation.  Then
lather, rinse, repeat with the rest of the bricks.
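
As a rough illustration of that loop (a sketch only, not our actual
tooling; the function name, the brick names, and the "keep at least
one other brick serving" check are all assumptions made up for the
example):

```python
from typing import Callable, List, Tuple


def rolling_upgrade(bricks: List[str],
                    upgrade: Callable[[str], None]) -> List[Tuple[str, int]]:
    """Upgrade one brick at a time while the others keep serving.

    Returns (brick, bricks_still_serving) for each step.
    """
    in_rotation = set(bricks)
    log = []
    for brick in bricks:
        # Hypothetical safety check: never pull the last serving brick.
        if len(in_rotation) < 2:
            raise RuntimeError("no other brick left to serve traffic")
        in_rotation.remove(brick)      # take the brick out of production
        log.append((brick, len(in_rotation)))
        upgrade(brick)                 # upgrade it while it is idle
        in_rotation.add(brick)         # put it back into rotation
    return log


if __name__ == "__main__":
    done = []
    steps = rolling_upgrade(["brick1", "brick2", "brick3", "brick4"],
                            done.append)
    print(steps)
```

With four bricks, three are always left serving while the fourth is
being upgraded.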

This might seem like a bad idea because the upgrade to the whole
cluster isn't atomic, but in practice it's actually quite safe.  The
practical safety comes from the fact that keep-alive settings in
Apache will pin a client to one particular brick for the duration of
the building of any one interface, and each brick is atomically
updated from the perspective of the outside world.  We could make this
even more certain by using a source-based hash in LVS to distribute
the load, but we haven't seen the need, and round-robin (being dumber)
keeps things smoother.
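
To make the difference between the two distribution strategies
concrete, here is a toy sketch.  It is not how LVS implements either
scheduler; the brick names, the client address, and the use of SHA-256
are assumptions chosen only for illustration:

```python
import hashlib
from itertools import cycle
from typing import Iterator, List

BRICKS = ["brick1", "brick2", "brick3", "brick4"]  # hypothetical names


def source_hash_pick(client_ip: str, bricks: List[str]) -> str:
    """Source-based hash: the same client address always maps to the
    same brick, as long as the brick list does not change."""
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    return bricks[int.from_bytes(digest[:4], "big") % len(bricks)]


def round_robin(bricks: List[str]) -> Iterator[str]:
    """Round-robin: each new connection goes to the next brick in turn,
    regardless of which client it comes from."""
    return cycle(bricks)


if __name__ == "__main__":
    # Same client, same brick, every time.
    print(source_hash_pick("10.0.0.1", BRICKS))
    print(source_hash_pick("10.0.0.1", BRICKS))
    # New connections are simply dealt out in order.
    rr = round_robin(BRICKS)
    print([next(rr) for _ in range(5)])
```

The hash pins a client to a brick across connections; round-robin
does not, which is why we lean on keep-alive to hold a client on one
brick while an interface is being built.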

We built things this way so that we would have the option of avoiding
downtime for almost all upgrades, whether they were required during
peak hours or we had the luxury of doing them after hours.  Even
during non-peak hours you'll have some traffic, and that
traffic will tend to be from your power users -- the people you want
to keep the happiest.  These rolling upgrades make updates transparent
to the user, so peak time or not, there's no downtime as far as they
are concerned.  (With the qualification you correctly raised about
very major upgrades...)

In PINES' case, we will be scheduling some downtime (probably 15
minutes, of which we plan to use just a few seconds) in order to
change the load distribution between our replicated DBs and to bring
another replica into rotation.  Of the 40 or more upgrades we've
pushed out, this will be maybe the 4th requiring us to put up the
"we'll be back in a minute" page, and I'm sure that on at least one
of the previous occasions it wasn't actually required.

So, in short ... yes. :)

>
> Regards,
>
> --
> Eric Lesage
>

-- 
Mike Rylander
mrylander at gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org