[OPEN-ILS-GENERAL] RE: Evergreen & Software Performance Analysis

Mike Rylander mrylander at gmail.com
Thu Oct 3 11:20:03 EDT 2013


On Wed, Sep 25, 2013 at 5:51 PM, Scott Myers
<sMyers at catalystitservices.com> wrote:
> Mike,
>
> The multithreaded reingest project was shared during the hackathon at the last Evergreen conference.
>

Thanks, Scott.  I'd like to have responded on this sooner, but I was
sick for a few days, and then it was "dig out of email overload" for
the last couple.

> Here is a link to what we ended up running for moving KCLS from 2.1 to 2.2.
>
> https://github.com/CatalystIT/multithread_2_2_update
>
> The files to pay attention to are data_update_driver.pl and update_driver.pl; both have POD files attached with quite a few comments on how they work.
>
> To clear up what that means: we created driver files that divide large amounts of data into smaller chunks and run those chunks over multiple connections for CPU-bound updates. A good example is the 2.1->2.2 upgrade, which changed how data is stored in the metabib field entry tables. That update was very CPU bound, and running it with 32 simultaneous connections cut the estimated completion time from 5 days to 4 hours.
>
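The chunk-and-fan-out idea Scott describes can be sketched roughly as follows. This is a minimal illustration in Python, not the Catalyst Perl driver: chunk_ranges, run_chunks, and the work callback are hypothetical names, and in a real run work would open its own database connection and issue the update for its range of record IDs.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(first_id, last_id, chunk_size):
    """Yield inclusive (start, end) ID ranges covering first_id..last_id."""
    start = first_id
    while start <= last_id:
        end = min(start + chunk_size - 1, last_id)
        yield (start, end)
        start = end + 1


def run_chunks(ranges, work, connections):
    """Run work(start, end) for each range on up to `connections` workers.

    Results come back in the same order as `ranges`.
    """
    with ThreadPoolExecutor(max_workers=connections) as pool:
        return list(pool.map(lambda r: work(*r), ranges))


if __name__ == "__main__":
    # Stand-in for the real update (an assumption for illustration):
    # just report how many IDs each chunk covers.
    def count_ids(start, end):
        return end - start + 1

    ranges = list(chunk_ranges(1, 1000, 300))
    print(ranges)
    print(run_chunks(ranges, count_ids, connections=4))
```

Client-side threads are enough here because the CPU-bound work happens in the database backends, one per connection. As a sanity check on the numbers quoted above: 5 days is 120 hours, and 120 / 32 = 3.75 hours, so the reported 4 hours is close to a linear speedup across 32 connections.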

So, if I'm following the code correctly, the idea is to generate a
huge SQL script that contains an update statement for each non-deleted
bib record, and use this tool to split that script into several that
can run in parallel.  That's a good goal, and this helpfully codifies
the advice that's generally given for migrations and upgrade
reingests, though I personally usually just use and recommend the Unix
split command and a set of psql sessions inside screen.  If I'm not
following the code as you intend, please let me know.
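
For readers who have not done this before, the split-the-script approach might look something like the sketch below. It is only an illustration under stated assumptions: split_round_robin and psql_command are hypothetical helper names, the UPDATE text is a placeholder (the thread does not give the exact statement), and the psql command lines are built but not executed.

```python
import shlex


def split_round_robin(statements, parts):
    """Deal statements into `parts` lists, one at a time (like `split -n r/N`)."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    buckets = [[] for _ in range(parts)]
    for i, stmt in enumerate(statements):
        buckets[i % parts].append(stmt)
    return buckets


def psql_command(database, script_path):
    """Build the psql invocation for one chunk file (returned, not executed)."""
    return "psql -d {} -f {}".format(shlex.quote(database), shlex.quote(script_path))


if __name__ == "__main__":
    # Placeholder statements, one per bib record ID (assumed for illustration).
    statements = [
        "UPDATE biblio.record_entry SET id = id WHERE id = {};".format(i)
        for i in range(1, 8)
    ]
    for n, bucket in enumerate(split_round_robin(statements, 3)):
        path = "reingest_chunk_{:02d}.sql".format(n)
        with open(path, "w") as fh:
            fh.write("\n".join(bucket) + "\n")
        # Each of these would be run in its own screen window or shell.
        print(psql_command("evergreen", path))
```

Dealing statements out round-robin keeps the chunks close to equal in size, which matters when each session is expected to finish at about the same time.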

However, there's a caveat to using this technique, generally, for 2.3
and beyond.  Because of the browse indexing (and more specifically,
the unique requirement on browse entries) done now, parallelizing
becomes a bit harder.  It's still possible, mind you, but you need to
take certain (new) steps before reingesting, and then perform a final
post-reingest run to handle the browse data.  Just a heads-up that
you may run into forced serialization if you use your script, or the
split+psql method, for future post-upgrade reingests.

-- 
Mike Rylander
 | Director of Research and Development
 | Equinox Software, Inc. / Your Library's Guide to Open Source
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  miker at esilibrary.com
 | web:  http://www.esilibrary.com

> Let me know if you have questions on how this can be setup or run.
>
> Thanks
>
> Scott Myers
>
> -----Original Message-----
> From: open-ils-general-bounces at list.georgialibraries.org [mailto:open-ils-general-bounces at list.georgialibraries.org] On Behalf Of Mike Rylander
> Sent: Wednesday, September 25, 2013 1:41 PM
> To: Evergreen Discussion Group
> Subject: Re: [OPEN-ILS-GENERAL] Evergreen & Software Performance Analysis
>
> Scott,
>
> I echo Rogan's down-thread thanks for following up here.
>
> I'm curious where the multi-threaded reingest project is shared.  I can't find anything like that searching any of the Evergreen mailing lists or Launchpad for terms like "ingest" and "multi".
> Perhaps I'm just missing it.  Some interest was expressed in the community IRC channel, but also some confusion as to what exactly that means.
>
> TIA,
>
> --
> Mike Rylander
>  | Director of Research and Development
>  | Equinox Software, Inc. / Your Library's Guide to Open Source
>  | phone:  1-877-OPEN-ILS (673-6457)
>  | email:  miker at esilibrary.com
>  | web:  http://www.esilibrary.com
>
>
> On Wed, Sep 25, 2013 at 3:50 PM, Scott Myers
> <sMyers at catalystitservices.com> wrote:
>> Hi Rogan,
>>
>>
>>
>> The db work Command Prompt has done for KCLS is mostly configuration:
>> work_mem, max_connections, etc. They have been fine-tuning all those
>> settings to get the best performance. These settings wouldn't help other
>> people, as they depend on each library's load. Another change made
>> by Command Prompt was to remove slony replication and move to pgpool. If
>> anyone needs help doing the same with their database I would highly
>> recommend Command Prompt.
>>
>>
>>
>> As for work done by Catalyst, all work that is directly applicable and
>> beneficial to the community has been added. Kyle Tomita
>> https://launchpad.net/~tomitakyle and Fred Parks
>> https://launchpad.net/~fparks have been the most active community members
>> from our team with Kyle being the 9th on the top contributors list as of
>> 9/24/13.
>>
>>
>>
>> Catalyst also shared a multithreaded bib reingest that greatly reduces the
>> time needed to do a full reingest. We also plan to share the way that
>> Catalyst deploys code to KCLS without downtime.
>>
>>
>>
>> Catalyst considers itself part of the community and is actively working to
>> add more value. We have developed a strong relationship with KCLS and enjoy
>> working with them greatly and our relationship has allowed us to gain a
>> strong understanding of Evergreen. We've got some interesting work that we
>> are going to be doing in the near future for KCLS, and as we have in the
>> past, that which is beneficial to the community will be shared.
>>
>>
>>
>> If you would like detail on any of these items now, feel free to reach out
>> to me. You have my cell phone number.
>>
>>
>>
>> Thanks
>>
>>
>>
>> Scott Myers
>>
>>
>>
>>
>>
>> From: open-ils-general-bounces at list.georgialibraries.org
>> [mailto:open-ils-general-bounces at list.georgialibraries.org] On Behalf Of
>> Rogan Hamby
>> Sent: Tuesday, September 24, 2013 7:10 AM
>> To: Joshua D. Drake
>> Cc: Evergreen Discussion Group
>> Subject: Re: [OPEN-ILS-GENERAL] Evergreen & Software Performance Analysis
>>
>>
>>
>> Picking back up an old thread...
>>
>>
>>
>> I was hoping at some point to hear more about the db work Command Prompt has
>> done for KCLS and perhaps see some work in git. I was sad to see that in the
>> new LJ article that Jed Moffitt said that at this point KCLS has forked
>> Evergreen so I suppose the work Catalyst and Command Prompt have done isn't
>> relevant to the rest of the Evergreen community.  I suppose that also means
>> that any experience gained in working on the KCLS system isn't
>> transferrable.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Aug 22, 2013 at 11:05 AM, Rogan Hamby <rogan.hamby at yclibrary.net>
>> wrote:
>>
>> Hi Joshua,
>>
>>
>>
>> I don't know if you had a chance to see my message below so I'll copy you in
>> directly as well and maybe touch base again after labor day.  With the
>> Evergreen community having a rich collection of input from various
>> contributors (many like yourself paid to do individual development by
>> community members) all participating in the open source spirit and putting
>> their code out there, allowing others to build on top of it or modify it or
>> package it into master it would be exciting to see this work since you've
>> indicated it's had a big impact for your customers.
>>
>>
>>
>> I did a quick MarkMail search since I sometimes lose emails to spam filters
>> and noticed that back in Feb you mentioned that your Evergreen customer has
>> been KCLS.  I know that at the conference they talked about setting up a
>> public repo that would be available right after the conference.  Maybe they
>> can chime in on an update on that?
>>
>>
>>
>>
>>
>> On Fri, Aug 9, 2013 at 11:52 AM, Rogan Hamby <rogan.hamby at yclibrary.net>
>> wrote:
>>
>> Hi Josh,
>>
>>
>>
>> Can you share with folks some more specifics?
>>
>>
>>
>> For example:
>>
>>
>>
>> In regards to optimizing the conf file can you share what kind of
>> optimizations and the benchmarks?  E.g. with X records we see Y performance
>> in activity Z.
>>
>>
>>
>> A lot of other changes obviously touch on changes to code and/or schema
>> changes.  Are these going to be released on a public repo or fed back into
>> master?
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Aug 8, 2013 at 2:01 PM, Joshua D. Drake <jd at commandprompt.com>
>> wrote:
>>
>>
>> On 08/07/2013 10:12 AM, Rogan Hamby wrote:
>>
>> I'm guessing maybe Joshua doesn't keep track of the list serv but is
>> there someone else from Command Prompt or whomever they did the
>> development work for that could chime in?  When he says they've made
>> improvements do those include GPLed code?
>>
>>
>>
>> Sorry folks, I do watch this list but not as much as the postgresql lists.
>> We have also been very busy. Here are some of the basic things we have done:
>>
>> 1. Optimized the postgresql.conf, it is amazing how much you can get from
>> some minor tweaks after some performance analysis.
>>
>> 2. Converted some of the procedures to C, for example translate_isbn1013
>>
>> 3. Modified the holds process to use a look up table.
>>
>> 4. Changed the holds process so that holds don't exist indefinitely but are
>> migrated out for reporting, so they no longer affect performance of the
>> active table.
>>
>> 5. Partitioning of larger tables
>>
>> 6. Upgraded versions of PostgreSQL to more modern versions (this can also
>> result in noticeable gains in performance).
>>
>> 7. Lots of query tuning, adding indexes where appropriate, increasing
>> maintenance on particular tables to reduce bloat more aggressively etc...
>>
>> As well as various other things (stabilizing the system so there aren't weird
>> overloads, unexpected Apache load events, etc.). It certainly has been a
>> rather wild ride over the last 9 months as we get further and further into
>> the adventure that is the Evergreen software.
>>
>> Sincerely,
>>
>> Joshua D. Drake
>>
>>
>>
>>
>> --
>> Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
>>
>>
>> PostgreSQL Support, Training, Professional Services and Development
>>
>> High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
>> For my dreams of your image that blossoms
>>    a rose in the deeps of my heart. - W.B. Yeats
>>
>>
>>
>>
>>
>> --
>>
>>
>>
>> Rogan Hamby, MLS, CCNP, MIA
>>
>> Managers Headquarters Library and Reference Services,
>>
>> York County Library System
>>
>>
>>
>> "You can never get a cup of tea large enough or a book long enough to suit
>> me."
>> -- C.S. Lewis

