[OPEN-ILS-DEV] marc_export and Perl Version >= 5.20
Jason Stephenson
jason at sigio.com
Wed Apr 5 10:30:33 EDT 2017
More investigation points the finger at Perl DBI and/or DBD::Pg. They
are now apparently caching all of the results before returning anything.
My reading of the documentation suggests that this was always the
case; however, I was able to dump 2.7 million records with Perl 5.14 on a
server with 8GB of RAM without running out of memory. With Perl 5.18+
this no longer appears to be possible. YMMV.
It looks like a comprehensive fix could be found in teaching DBD::Pg to
use row caching:
https://rt.cpan.org/Public/Bug/Display.html?id=93266
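
Until DBD::Pg gains something like that, batching stays the practical
option. Below is a minimal sketch of the batching workaround described in
the quoted message, not the script actually in use. Everything named here
is an assumption for illustration: export_in_batches and EXPORT_CMD are
made-up names, the ids file is a hypothetical file with one record id per
line, and the export command is assumed to read ids on standard input
(some marc_export versions may need an extra option for that). It also
assumes USMARC output, where per-batch files can simply be appended to
one another, so it skips the MARCXML-to-USMARC conversion with
yaz-marcdump mentioned below.

```shell
#!/bin/sh
# Sketch: export records in fixed-size batches so that no single run of the
# exporter has to hold the whole result set in memory.

export_in_batches() {
    ids_file=$1                # hypothetical input: one record id per line
    out_file=$2                # combined output file
    batch_size=${3:-50000}     # about the size reported to work in the thread

    tmpdir=$(mktemp -d) || return 1

    # split(1) writes batch.aa, batch.ab, ... which sort in input order.
    split -l "$batch_size" "$ids_file" "$tmpdir/batch." || return 1

    : > "$out_file"            # start with an empty output file
    for batch in "$tmpdir"/batch.*; do
        [ -e "$batch" ] || continue    # no batches at all: glob did not match
        # EXPORT_CMD is left unquoted on purpose so it can carry options.
        # Assumption: the command reads record ids from standard input.
        ${EXPORT_CMD:-marc_export -e UTF-8} < "$batch" >> "$out_file" || return 1
    done

    rm -rf "$tmpdir"
}
```

A call would look something like `export_in_batches ids.txt all.mrc 50000`.
Each batch is a separate run of the exporter, so memory use is bounded by
the batch size rather than by the size of the whole database.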
On 03/10/2017 11:11 AM, Jason Stephenson wrote:
> Hi, all.
>
> NOTE: This is not https://bugs.launchpad.net/evergreen/+bug/1671845. It
> may be related or have a similar cause, but the experience/symptoms are
> completely different.
>
> At this point, consider this a heads-up, as well as a problem
> description that I don't yet think I have enough information to file as
> a bug report. It is also a request for anyone who wants to double check
> this report and to help with debugging.
>
> I've noticed some bizarre behavior with DBI, MARC::Record, and writing
> to a file with Perl version 5.20 and 5.22. (These versions of Perl ship
> with Debian 8 Jessie and Ubuntu 16.04 Xenial Xerus, respectively.)
>
> I have a script (https://github.com/Dyrcona/boopsie) that I use to make
> a weekly extract of records to send to Boopsie, Inc. on behalf of our
> member libraries that use their app.
>
> What I have seen is that when run with the aforementioned versions of
> Perl, the program consumes all of the RAM on the server and gets killed
> by the OOM killer. No output ever reaches the file. This suggests to me
> that the problem occurs in the main loop while extracting and converting
> the MARCXML from the database, though it could be the Perl output buffer
> running amok.
>
> The main loop of my program is similar to, though less complicated than,
> that of marc_export. I tried marc_export to see if it would have the
> same problem. When extracting my whole database, it does:
>
> marc_export -a -e UTF-8 > all.mrc
>
> It also crashes if fed the output of an equivalent psql query to extract
> all of the record ids, or if a file of all record ids is piped into
> marc_export. It makes no difference if the output format is USMARC or
> MARCXML.
>
> I can split this up into batches of 50,000 or so records (quite possibly
> more) and all is well. I figured this out by dumping records for a
> branch with around 51,000 items, and that worked. My whole database has
> just over 2.7 million non-deleted bib records.
>
> This worked on Perl version 5.14 on Debian 7 Wheezy and on Ubuntu 14.04
> Trusty Tahr.
>
> I hope to run marc_export with the Perl debugger to figure out the exact
> cause. Until this is fixed, I'm using a workaround in my scripts of
> dumping MARCXML batches, converting them to USMARC, and putting them
> into one file with yaz-marcdump. This seems to work in light of the LP
> bug mentioned in the NOTE.
>
> Any and all information, contradictory or otherwise, from those using
> Debian 8 or Ubuntu 16.04 is most welcome.
>
> Jason
>