[OPEN-ILS-DEV] marc_export and Perl Version >= 5.20

Mike Rylander mrylander at gmail.com
Wed Apr 19 10:36:35 EDT 2017


We could use a cursor to keep the data on the server side and fetch it
a record at a time.  Here's an attempt:

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/miker/marc_export_by_cursor
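
For anyone who would rather not read the branch, the general shape of the
idea is roughly the sketch below. This is not the code on the branch, only
an illustration under assumptions: the connection details and the cursor
name bib_csr are made up, and the query against biblio.record_entry is a
stand-in for whatever marc_export actually builds. It fetches 1000 rows per
round trip; fetching 1 at a time would work the same way.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Hypothetical connection details; substitute your own.
my $dbh = DBI->connect(
    'dbi:Pg:dbname=evergreen;host=localhost',
    'evergreen', 'password',
    { RaiseError => 1, AutoCommit => 1 }
);

binmode(STDOUT, ':encoding(UTF-8)');

# A cursor declared without WITH HOLD only exists inside a transaction.
$dbh->begin_work;

# Assumed query: stands in for the one marc_export would generate.
$dbh->do(q{
    DECLARE bib_csr CURSOR FOR
    SELECT id, marc FROM biblio.record_entry WHERE NOT deleted
});

while (1) {
    # Only this batch is pulled into client memory; the rest of the
    # result set stays on the server.
    my $sth = $dbh->prepare('FETCH 1000 FROM bib_csr');
    $sth->execute;
    last if $sth->rows == 0;

    while (my ($id, $marc) = $sth->fetchrow_array) {
        print $marc, "\n";    # real code would convert and write the record
    }
}

$dbh->do('CLOSE bib_csr');
$dbh->commit;
$dbh->disconnect;
```

Client memory should then scale with the batch size rather than with the
size of the whole result set.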

Thoughts?

--
Mike Rylander
 | President
 | Equinox Open Library Initiative
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  miker at equinoxinitiative.org
 | web:  http://equinoxinitiative.org


On Wed, Apr 19, 2017 at 10:01 AM, Jason Stephenson <jason at sigio.com> wrote:
> Even more investigation reveals that I was jumping to conclusions.
>
> Turns out that on a Debian 7 Wheezy machine with Perl 5.14, a simple
> script to dump the MARC for all of our 2.7 million, non-deleted bib
> records uses almost 12GB of RAM. Looks like this was always an issue,
> and I was just running the scripts on hardware with more RAM.
>
> I should still build a VM with a more recent Debian or Ubuntu release
> with enough RAM for comparison.
>
> Sorry for all of the noise.
>
> On 04/05/2017 10:30 AM, Jason Stephenson wrote:
>> More investigation points the finger at Perl DBI and/or DBD::Pg. They
>> are now apparently caching all of the results before returning anything.
>> My reading of the documentation seems to imply that this was always the
>> case, however I was able to dump 2.7 million records with Perl 5.14 on a
>> server with 8GB of RAM without running out of memory. With Perl 5.18+
>> this appears to no longer be possible, YMMV.
>>
>> It looks like a comprehensive fix could be found in teaching DBD::Pg to
>> use row caching:
>>
>> https://rt.cpan.org/Public/Bug/Display.html?id=93266
>>
>>
>>
>> On 03/10/2017 11:11 AM, Jason Stephenson wrote:
>>> Hi, all.
>>>
>>> NOTE: This is not https://bugs.launchpad.net/evergreen/+bug/1671845. It
>>> may be related or have a similar cause, but the experience/symptoms are
>>> completely different.
>>>
>>> At this point, consider this a heads-up, as well as a problem
>>> description that I don't yet think I have enough information to file as
>>> a bug report. It is also a request for anyone who wants to double check
>>> this report and to help with debugging.
>>>
>>> I've noticed some bizarre behavior with DBI, MARC::Record, and writing
>>> to a file with Perl version 5.20 and 5.22. (These versions of Perl ship
>>> with Debian 8 Jessie and Ubuntu 16.04 Xenial Xerus, respectively.)
>>>
>>> I have a script (https://github.com/Dyrcona/boopsie) that I use to make
>>> a weekly extract of records to send to Boopsie, Inc. on behalf of our
>>> member libraries that use their app.
>>>
>>> What I have seen is that when run with the aforementioned versions of
>>> Perl, the program consumes all of the RAM on the server and gets killed
>>> by OOM killer. No output ever reaches the file. This suggests to me that
>>> the problem occurs in the main loop with extracting and converting the
>>> MARCXML from the database, though it could be the Perl output buffer run
>>> amok.
>>>
>>> The main loop of my program is similar to, though less complicated than,
>>> that of marc_export. I tried marc_export to see if it would have the
>>> same problem. When extracting my whole database, it does:
>>>
>>> marc_export -a -e UTF-8 > all.mrc
>>>
>>> It also crashes if fed the output of an equivalent psql query to extract
>>> all of the record ids, or if a file of all record ids is piped into
>>> marc_export. It makes no difference if the output format is USMARC or
>>> MARCXML.
>>>
>>> I can split this up into batches of 50,000 or so records (quite possibly
>>> more) and all is well. I figured this out by dumping records for a
>>> branch with around 51,000 items and that worked. My whole database has
>>> just over 2.7 million, non-deleted bib records.
>>>
>>> This worked on Perl version 5.14 on Debian 7 Wheezy and on Ubuntu 14.04
>>> Trusty Tahr.
>>>
>>> I hope to run marc_export with the Perl debugger to figure out the exact
>>> cause. Until this is fixed, I'm using a workaround in my scripts of
>>> dumping MARCXML batches and converting them to USMARC and putting them
>>> into 1 file with yaz-marcdump. This seems to work in light of the LP bug
>>> mentioned in the NOTE.
>>>
>>> Any and all information, contradictory or otherwise, from those using
>>> Debian 8 or Ubuntu 16.04 is most welcome.
>>>
>>> Jason
>>>

