[OPEN-ILS-DEV] Clustering Mechanics

Mike Rylander mrylander at gmail.com
Fri Jun 1 14:36:08 EDT 2007


On 6/1/07, Uhlman, Brandon EDUC:EX <Brandon.Uhlman at gov.bc.ca> wrote:
> Hi, everyone.
>
> British Columbia is well into the planning stages of a pilot Evergreen
> implementation for some of our libraries. Right now, our goal is to
> enter production with the first participating sites this fall. After a
> meeting yesterday, one of the participants asked me about load balancing
> and the brick structure (that PINES is using) that I couldn't answer
> with complete confidence.
>
> He asked: "...it seems there are two levels of load balancing. First,
> LVS is used to load balance between identical units of bricks. Second,
> within a brick, balancing is based on application. I wonder if a single
> level of load/function balancing could be used to reduce the number of
> servers and achieve similar levels of performance and redundancy."
>
> My understanding, sketchy at best, is that the Jabber/OpenSRF router
> machines on each of the bricks still have to do some (relatively)
> processor-intensive work, making their max workload CPU-bound.
>

You would generally run Apache on the brick leads, so you've got that
load (CPU and memory) too.

> Therefore, using only OpenSRF/Jabber routing as sort of an ad hoc load
> balancer between an arbitrary number of app servers - think of it as a
> single brick, containing all the application machines - is limited by
> how much processor work the routing box needs to do. I think this played
> itself out in Georgia's own implementation as well, when deciding how big
> to make the individual bricks.

I don't remember it that way... :)  (I proposed the "bricks" early on
before we'd written OpenSRF.  It's a common clustering mechanism.)

Actually, the main reason for having multiple "bricks" is that we can
upgrade the entire system transparently by pulling one brick out at a
time.  We need to be able to pull the app servers out of rotation in a
graceful way (tell LVS not to send more work this way) so that any
active backend processes have a chance to finish and respond.  Once
every active request has been completed we can upgrade and restart at
will.  We then put the brick back into rotation and move on to the
next.
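Here's a rough sketch of that drain-and-upgrade cycle, assuming the LVS
director is managed with ipvsadm; the VIP, the brick addresses, and the
upgrade_brick() step below are placeholders for illustration, not anything
taken from PINES' actual setup:

    #!/usr/bin/env python
    # Illustrative rolling upgrade of bricks behind an LVS director.
    # Assumptions: ipvsadm manages the virtual service, the addresses are
    # placeholders, and upgrade_brick() stands in for your real upgrade
    # procedure.
    import subprocess
    import time

    VIP = "10.0.0.1:80"                        # virtual service (placeholder)
    BRICKS = ["10.0.1.10:80", "10.0.1.11:80"]  # brick leads (placeholders)

    def set_weight(brick, weight):
        # Weight 0 tells LVS to stop handing this real server new
        # connections; in-flight connections are left alone.
        subprocess.check_call(
            ["ipvsadm", "-e", "-t", VIP, "-r", brick, "-w", str(weight)])

    def active_connections(brick):
        # Read the ActiveConn column for this real server from
        # `ipvsadm -L -n` output.
        out = subprocess.check_output(["ipvsadm", "-L", "-n"]).decode()
        for line in out.splitlines():
            if brick in line:
                return int(line.split()[-2])
        return 0

    def upgrade_brick(brick):
        # Placeholder: stop Apache/OpenSRF, install, restart, etc.
        print("upgrading", brick)

    for brick in BRICKS:
        set_weight(brick, 0)                  # pull the brick out of rotation
        while active_connections(brick) > 0:  # let active requests finish
            time.sleep(10)
        upgrade_brick(brick)
        set_weight(brick, 1)                  # put it back into rotation

In practice you'd also want to confirm that any long-running OpenSRF
backend processes on the brick have gone idle before restarting, since
that work isn't visible to LVS.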

As far as sizing the bricks goes, you basically want to be able to handle
the entire load without breaking a sweat while you're one brick down.
So, say you have 15 app servers.  If they are normally heavily loaded
you would want to make, say, five 3-box bricks, so that taking any one
brick out of rotation only removes a fifth of your capacity.
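As a back-of-the-envelope check on that sizing rule (the numbers below
are made up for illustration):

    import math

    servers_for_full_load = 12  # app servers needed to carry peak load
    servers_per_brick = 3       # how big you decide to make each brick

    # You want N-1 bricks to still cover the full load, so provision
    # one spare brick beyond what the load itself requires.
    bricks = math.ceil(servers_for_full_load / servers_per_brick) + 1

    print(bricks, "bricks x", servers_per_brick, "servers =",
          bricks * servers_per_brick, "app servers total")
    # 5 bricks x 3 servers = 15 app servers total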

>
> Another option, if I understand the process correctly, would be to run
> an LVS-type load balancer that distributes incoming requests directly to
> an arbitrary app server, based on whatever LVS algorithm is being used,
> and that app server deals with all the requests thrown at it, bypassing
> this Jabber/OpenSRF router idea. This feels like a Stephen Colbert
> situation to me - I *feel* like this is a bad solution in my gut, but I
> can't quite put my finger on why. :-)
>

No, that won't work.  :)

> If anyone who has looked at the codebase or considered the clustering
> mechanics more than I has any thoughts on this, that'd be super.
>
> Cheers,
>
> Brandon
>
> ================================
> Brandon W. Uhlman
> Systems Consultant
> Public Library Services Branch
> British Columbia Ministry of Education
>
> 605 Robson Street,  5th Floor
> Vancouver, British Columbia
> V6B 5J3
>
> Phone: (604) 660-2972
> Toll-free: (800) 663-2165
>
> "Arranging their knowledge by category
> just made it easier to absorb. Dewey,
> you fool! Your decimal system has
> played right into our hands!"
>   - Giant Space Brain, "Futurama"
>
>


-- 
Mike Rylander

