[OPEN-ILS-DEV] Batch program dying on cstore timeouts

Mon Jan 3 20:29:41 EST 2011

On Mon, Jan 3, 2011 at 8:07 PM, John Craig
<jc-mailinglist at alphagconsulting.com> wrote:
> Hi Folks,
>
> We've got a batch program in Java (we had to make a few mods to the Java
> interface that we'll have patches for shortly). The program does bib
> overlay/merging. Essentially, it just reads a MARC file and then makes
> REQUEST calls to OpenSRF to do the XML update and bib merges.
>
> The trouble with it is, it does a couple of things that we do not like so
> much:
>
> 1) It dies periodically (arrgh) because the OpenSRF/Open-ILS
> indexing/metabib update stuff can get ahead of the database and so the
> cstore calls can time out when the busy DB doesn't respond rapidly enough;
> and
> 2) It can really hammer the DB. Although the program could run while the
> system is in use (since it touches only one record at a time), as it
> stands it tends to hog all the cycles on the DB host.
>
> Now, we've investigated some options for fixing this:
>
> Option 1): Add a configurable wait (the program just pauses for n
> milliseconds every m records).
> It takes some tuning to get this to the point where it doesn't overload the
> DB, so each run is a kind of trial-and-error situation. But it has the
> advantage of being dead simple.
>
> Option 2): Retry when there's a cstore error.
> This results in a somewhat unexpected situation in which, after a certain
> number of errors, the Evergreen login becomes invalid, so you can't even
> restart the program using the same login, which is inconvenient. (This
> lock-out may be on a timer, but we've just resorted to restarting all
> processes to clear it.) The existence of such a feature makes some sense,
> but it wasn't a result we'd anticipated. So far, we haven't found where
> this lock-out could
> be adjusted (and we don't want to have to install a custom version of
> OpenSRF or Open-ILS to prevent the errors leading to the login lock-out).
> And, it's not much help with the hammering-the-DB issue. So, this isn't
> sounding like a terribly good option.
>
> So, now to the question: should we use some other technique to ensure that
> we're not spawning cstore tasks to handle each record, but are using a
> single process synchronously? And, if that's the way to go, would that be a
> stateful connection? (Not currently supported by the Java API, it would
> seem, but we could sure add it.) Or, perhaps more elegantly, is there a way
> to allow some number of asynchronous processes to do the DB updates, but
> somehow limit it to a given number?
>

There is, though it's not Java but Perl:
http://svn.open-ils.org/trac/OpenSRF/browser/trunk/src/perl/lib/OpenSRF/MultiSession.pm

That provides a simple parallelization mechanism (think: MapReduce)
that will only allow the configured number of requests to actually
occur at the same time, queuing up the rest.  The trunk fine generator
(http://svn.open-ils.org/trac/ILS/browser/trunk/Open-ILS/src/support-scripts/fine_generator.pl)
is a super-simple client, and the trunk hold processor
(http://svn.open-ils.org/trac/ILS/browser/trunk/Open-ILS/src/support-scripts/hold_targeter.pl)
is a somewhat more sophisticated client that uses an advanced option
(session_hash_function) to make sure that inter-related requests are
processed in the correct order, which is important for hold processing
but not for fine generation.
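
If you would rather stay in Java, the same idea can be approximated on the
client side. What follows is only a rough sketch of the technique, not a port
of MultiSession.pm and not part of the OpenSRF Java API: BoundedLanes and
laneFor are names made up for illustration. It assumes each unit of work can
be wrapped in a Runnable. A fixed number of single-threaded lanes caps how
many requests run at once, the rest wait in a queue, and routing by a key's
hash keeps related requests in submission order (roughly the role
session_hash_function plays for the hold targeter).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not an OpenSRF class): at most laneCount tasks run at once. */
final class BoundedLanes implements AutoCloseable {
    private final List<ExecutorService> lanes = new ArrayList<>();

    BoundedLanes(int laneCount) {
        if (laneCount < 1) {
            throw new IllegalArgumentException("laneCount must be >= 1");
        }
        for (int i = 0; i < laneCount; i++) {
            // One thread per lane: tasks queued on a lane run one at a time,
            // in the order they were submitted.
            lanes.add(Executors.newSingleThreadExecutor());
        }
    }

    /** The same key always maps to the same lane, so related work stays ordered. */
    int laneFor(Object key) {
        return Math.floorMod(key.hashCode(), lanes.size());
    }

    /** Queue a task on the lane chosen by the key's hash. */
    Future<?> submit(Object key, Runnable task) {
        return lanes.get(laneFor(key)).submit(task);
    }

    /** Stop accepting work and wait for everything already queued to finish. */
    @Override
    public void close() throws InterruptedException {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        for (ExecutorService lane : lanes) {
            lane.awaitTermination(1, TimeUnit.HOURS);
        }
    }
}
```

A batch loader would create one BoundedLanes with a small lane count, submit
one task per record keyed by (say) the bib ID, and close it at the end of the
file. The lane count then becomes the knob that decides how hard the DB gets
hit, in place of tuning sleep intervals by trial and error.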

This is available in all released versions of OpenSRF, not just trunk
-- it's been there forever.

Of course, it may just be a matter of using a different, existing API
call so that you are getting foreground ingest calls instead of
background ones that return a COMPLETE message to the client early.
Without seeing the code, I couldn't say one way or the other...

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  miker at esilibrary.com
 | web:  http://www.esilibrary.com