[OPEN-ILS-DEV] login problem with Evergreen and LVS -- SOLVED

Mike Rylander mrylander at gmail.com
Tue May 22 17:01:23 EDT 2007


On 5/22/07, dkyle <dkyle at grpl.org> wrote:
> Nope, both EG app servers are running their own memcache - we'll have to
> change that.
>
> How do you guys deal with memcache for a bunch of EG app servers?

For the first six months or so we had about 20 servers listed in all
the opensrf.xml files (4 of them at the time: 4 clusters of 5
machines, sitting on top of 3 DBs).  Under normal circumstances this
works fine and causes no noticeable overhead.
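
For reference, the pool is just the list of cache servers in each
opensrf.xml.  A minimal sketch (the hostnames are placeholders; the
path is the one cited below, and 11211 is memcached's standard port):

  <opensrf>
    <default>
      <cache>
        <global>
          <servers>
            <!-- every EG app server should point at this same list -->
            <server>cache1.example.org:11211</server>
            <server>cache2.example.org:11211</server>
          </servers>
        </global>
      </cache>
    </default>
  </opensrf>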

Once in a while a bug would pop up that would take a single machine
down.  That's no problem at all for the application proper; OpenSRF
just ignores the misbehaving machine and distributes the load to the
rest (yay opensrf).  However, it was a problem for memcache: 1/20th
of the session keys (and the search cache, and some other stuff) would
go away, and in some cases (depending on the timing and length of the
outage for that one server, etc.) the entire memcache cluster would
start acting wonky.
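
To make the arithmetic concrete, here's a rough sketch (Python, not
anything from the Evergreen code; the pool and key names are made up)
of why roughly 1/N of the keys live on any one server in an N-server
pool:

  import hashlib

  # Map a key to a server the way a simple modulo-hashing memcache
  # client would.  Real clients vary; this is only illustrative.
  def server_for(key, servers):
      digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
      return servers[digest % len(servers)]

  servers = ["cache%02d:11211" % n for n in range(20)]  # fake 20-node pool
  keys = ["session_key_%d" % n for n in range(10000)]   # fake session keys

  dead = servers[7]  # pretend one machine goes down
  lost = sum(1 for k in keys if server_for(k, servers) == dead)
  print("%.1f%% of keys were on the dead server" % (100.0 * lost / len(keys)))
  # prints roughly 5.0%, i.e. 1/20th of the cache gone at once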

So, we've since moved memcache to a dedicated set of machines that only
provide caching services.  Incidentally, we have also since squashed
all the bugs that we'd seen cause such issues (so fear not, at least
until the next bug is found), but segregating the caching services
makes sense on its own, IMO.
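
And for anyone who lands on this thread with the LVS symptom Doug
describes below: one way to do what he did (a single virtual service
for the VIP regardless of port, so a client sticks to one real
server) is a persistent port-0 service.  A sketch with ipvsadm, where
the VIP, real server IPs, and timeout are placeholders:

  # One virtual service covering every port on the VIP.  Port 0 is
  # only valid together with -p (persistence), which is what keeps a
  # client's http and https connections on the same real server.
  ipvsadm -A -t 10.0.0.100:0 -s rr -p 300
  ipvsadm -a -t 10.0.0.100:0 -r 10.0.0.11 -g
  ipvsadm -a -t 10.0.0.100:0 -r 10.0.0.12 -g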

--miker

>
> Mike Rylander wrote:
> > On 5/22/07, dkyle <dkyle at grpl.org> wrote:
> >> After some more poking around, I'm guessing that the client receives a
> >> session key from the initial http request, and any subsequent https
> >> requests to a server that was not aware of that key (the other real
> >> server in my case) would fail?
> >
> > The most likely reason is that the auth service is not using the same
> > memcache pool on both of the EG app servers.  If this is the case,
> > you'll likely get errors about not being logged in or sessions timing
> > out or the like from within the staff client.
> >
> > Can you confirm that both EG servers are pointing to the same set of
> > memcache servers (opensrf.xml =>
> > /opensrf/default/cache/global/servers/server)?
> >
> > --miker
> >
> >>
> >> We had two different virtual services set up, one for http and one
> >> for https.  I changed the LVS to use one virtual service for the
> >> virtual IP regardless of port, and it works.
> >>
> >> dkyle wrote:
> >> > We are testing a small Evergreen cluster.  We set up two servers
> >> > running Postgres and pgpool, two "real servers" running Evergreen, and
> >> > one LVS server.  Login with the Evergreen client would sometimes fail
> >> > with a lengthy message about no authentication seed being found.
> >> > Some packet capturing revealed that the error occurred whenever the
> >> > LVS director changed real servers between the client's initial http
> >> > request and the subsequent https request.  This seemed a little
> >> > strange, since these are 2 separate TCP streams.
> >> >
> >> > Is this what would be expected? If so, how have others dealt with the
> >> > issue?
> >> > What is going on with the client login process during the initial http
> >> > request that requires the https request to hit the same real server?
> >> > I did look through the packet decode for that, but I don't understand
> >> > the internal workings well enough.
> >> >
> >> > Doug.
> >>
> >>
> >
> >
>
>


-- 
Mike Rylander

