[OPEN-ILS-DEV] login problem with Evergreen and LVS -- SOLVED

dkyle dkyle at grpl.org
Tue May 22 17:49:43 EDT 2007


Thanks for the info Mike.  We really hadn't thought that much about 
memcache yet, but segregating sounds like the thing to do.
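For anyone following along, the stanza Mike points at (opensrf.xml => /opensrf/default/cache/global/servers/server) would look roughly like the sketch below on every app server, with all of them listing the same dedicated memcache pool.  The hostnames and port here are made up for illustration:

```xml
<!-- opensrf.xml sketch: every EG app server lists the same dedicated
     memcache pool (hostnames below are hypothetical). -->
<cache>
  <global>
    <servers>
      <server>memcache1.example.org:11211</server>
      <server>memcache2.example.org:11211</server>
    </servers>
  </global>
</cache>
```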

Mike Rylander wrote:
> On 5/22/07, dkyle <dkyle at grpl.org> wrote:
>> Nope, both EG app servers are running their own memcache - we'll have to
>> change that.
>>
>> How do you guys deal with memcache for a bunch of EG app servers?
>
> For the first six months or so we had about 20 servers listed in all
> the opensrf.xml files (4 of them, at the time -- 4 clusters of 5
> machines, sitting on top of 3 DBs).  Under normal circumstances this
> works fine, and causes no noticeable overhead.
>
> Once in a while a bug would pop up that would take a single machine
> down.  That's no problem at all for the application proper; OpenSRF
> just ignores the misbehaving machine and distributes the load to the
> rest (yay opensrf).  However, it was a problem for memcache.  1/20th
> of the session keys (and the search cache, and some other stuff) would
> go away, and in some cases (depending on the timing and length of the
> outage for that one server, etc) the entire memcache cluster would
> start acting wonky.
>
> So, we've since moved memcache to a dedicated set of machines that only
> provide caching services.  Incidentally, we have also since squashed
> all the bugs that we'd seen cause such issues (so fear not, at least
> until the next bug is found), but segregating the caching services
> makes sense on its own, IMO.
>
> --miker
>
>>
>> Mike Rylander wrote:
>> > On 5/22/07, dkyle <dkyle at grpl.org> wrote:
>> >> After some more poking around, I'm guessing that the client receives a
>> >> session key from the initial http request, and any subsequent https
>> >> requests to a server that was not aware of that key (the other real
>> >> server in my case) would fail?
>> >
>> > The most likely reason is that the auth service is not using the same
>> > memcache pool on both of the EG app servers.  If this is the case,
>> > you'll likely get errors about not being logged in or sessions timing
>> > out or the like from within the staff client.
>> >
>> > Can you confirm that both EG servers are pointing to the same set of
>> > memcache servers (opensrf.xml =>
>> > /opensrf/default/cache/global/servers/server)?
>> >
>> > --miker
>> >
>> >>
>> >> We had two different virtual services set up, http and https.  I changed
>> >> the LVS to use one virtual service for the virtual IP regardless of
>> >> port, and it works.
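A sketch of that single-service LVS setup, using firewall marks so both ports are scheduled as one persistent virtual service (the VIP, real-server addresses, and timeout below are made up for illustration):

```shell
# Mark http and https traffic for the VIP with the same fwmark,
# so LVS treats them as one virtual service.
iptables -t mangle -A PREROUTING -d 10.0.0.10 -p tcp \
    -m multiport --dports 80,443 -j MARK --set-mark 1

# One fwmark-based virtual service with persistence (-p) keeps a
# client's http and https connections on the same real server.
ipvsadm -A -f 1 -s rr -p 300
ipvsadm -a -f 1 -r 10.0.0.21:0 -g
ipvsadm -a -f 1 -r 10.0.0.22:0 -g
```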
>> >>
>> >> dkyle wrote:
>> >> > We are testing a small Evergreen cluster.  We set up two servers
>> >> > running Postgres and pgpool, two "real servers" running Evergreen, and
>> >> > one LVS server.  Login with the Evergreen client would sometimes fail
>> >> > with a lengthy message regarding no authentication seed being found.
>> >> > Some packet capturing revealed that the error occurred whenever the
>> >> > LVS director changed real servers between the client's initial http
>> >> > request and the subsequent https request.  This seemed a little
>> >> > strange since this involves 2 separate TCP streams.
>> >> >
>> >> > Is this what would be expected? If so, how have others dealt with the
>> >> > issue?
>> >> > What is going on with the client login process during the initial http
>> >> > request that requires the https request to hit the same real server?
>> >> > I did look through the packet decode for that, but I don't understand
>> >> > the internal workings enough.
>> >> >
>> >> > Doug.
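The failure mode described above boils down to per-server caches: an auth seed written during the http request is only visible to the server that handled it.  A toy sketch of the idea, not Evergreen code (class and method names are hypothetical):

```python
# Toy model: an app server with its own local cache loses the auth seed
# when LVS switches real servers; a shared cache pool does not.

class AppServer:
    def __init__(self, cache):
        self.cache = cache  # stand-in for a memcache pool

    def http_login_init(self, session):
        self.cache[session] = "auth-seed"  # seed stored during http request

    def https_login_complete(self, session):
        return session in self.cache       # fails if the seed isn't visible

# Separate caches per server (the broken setup):
a, b = AppServer({}), AppServer({})
a.http_login_init("sess1")
print(b.https_login_complete("sess1"))   # LVS switched servers: no seed

# One shared cache (a common memcache pool):
shared = {}
c, d = AppServer(shared), AppServer(shared)
c.http_login_init("sess2")
print(d.https_login_complete("sess2"))   # seed visible from either server
```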
>> >>
>> >>
>> >
>> >
>>
>>
>
>



More information about the Open-ils-dev mailing list