[OPEN-ILS-DEV] Unexpectedly dying of authority_control_fields script when parallel processing

Dan Scott dan at coffeecode.net
Fri May 27 09:07:47 EDT 2011


Hi Chris:

On Thu, May 26, 2011 at 4:55 AM, Chris Roosendaal
<christian.roosendaal at gmail.com> wrote:
> Dear all,
>
> I would like to bring up this subject again because we will try to run the
> authority_control_fields.pl script on our database another time this week.
> We divided the whole process in eight batches, and since every batch takes
> between 8 and 10 hours to complete, we would definitely like a parallel
> approach. Unfortunately this approach did not succeed earlier as can be read
> in this email thread.
>
> I would like to know: do the problems we encounter have to do with our
> Evergreen setup? We have a machine to run the scripts on, which is a
> relatively slow dual core machine, connected to a separate fast machine with
> eight cores. Could the scripts run into trouble because the CPU is too slow
> to run 8 authority_control_fields.pl processes in parallel? Or maybe a lack
> of memory?

The bulk of the processing needs to be done by the database server, so
you're probably not running into a restriction on your dual core
machine. If you were running into processing capacity problems on your
database server, I would expect the PostgreSQL logs to hold some
useful information (you might want to look at those if you haven't
already).

However, there is a slow memory leak in the script. If you run it
over hundreds of thousands of records, it can use up gigabytes of
memory on the client machine. Running 8 batches of 130K records
simultaneously could end up eating a large amount of memory as the
scripts get near the end of the batch, so it's definitely worthwhile
keeping an eye on the memory consumption of your scripts and, if
necessary, splitting the batches up so that each batch runs against fewer
records. You can still run 8 copies of the script at one time; the
benefit of running it against a smaller set of records is that when
each batch finishes, the memory is freed and the leak starts at 0
bytes on the next run of the script.
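
As a rough sketch of what "keeping an eye on the memory consumption"
could look like on a Linux client machine: the helper names (rss_kb,
watch_rss) and the limit and interval values are made up for
illustration and are not part of Evergreen.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Evergreen.

# Print the resident set size (in kilobytes) of process $1.
# Reads /proc on Linux and falls back to ps elsewhere.
rss_kb() {
    if [ -r "/proc/$1/status" ]; then
        awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
    else
        ps -o rss= -p "$1" | tr -d ' '
    fi
}

# Poll process $1 every $3 seconds (default 60) and warn on stderr
# whenever its memory use exceeds $2 kilobytes. Stops once the
# process no longer reports a resident set size.
watch_rss() {
    pid=$1
    limit_kb=$2
    interval=${3:-60}
    while rss=$(rss_kb "$pid"); [ -n "$rss" ]; do
        if [ "$rss" -gt "$limit_kb" ]; then
            echo "PID $pid is using $rss KB (limit $limit_kb KB)" >&2
        fi
        sleep "$interval"
    done
}
```

For example, "watch_rss 12345 2000000 300" would check PID 12345 every
five minutes and warn once it grows past roughly 2 GB.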

So, if you were running 8 batches of 250,000 records each and hitting
memory capacity issues, you could run 40 batches of 50,000 records
each. That would be 8 processes in parallel, with 5 batches per
process in serial, like:

parallel_process_1$ authority_control_fields.pl --start 0 --end 50000 ;
    authority_control_fields.pl --start 50001 --end 100000 ; ...
    --start 200001 --end 250000
...
parallel_process_8$ authority_control_fields.pl --start 1750001 --end 1800000 ;
    ... --start 1950001 --end 2000000

(Obviously, include all of your options, logging, etc...) Make sense?
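
If typing out 40 ranges by hand is error-prone, something along these
lines might generate them. This is only a sketch: the function names,
the ACF_CMD variable, and the log file names are invented for
illustration, and it assumes the inclusive --start/--end ranges used
in the example above.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of Evergreen.

# Print one "start end" pair per batch for parallel process $1 (1-based),
# given $2 records per process and $3 records per batch.
batch_ranges() {
    proc=$1
    per_proc=$2
    batch=$3
    end=$(( (proc - 1) * per_proc ))
    last=$(( proc * per_proc ))
    while [ "$end" -lt "$last" ]; do
        # The very first batch starts at 0; every other batch starts
        # one past the end of the previous batch.
        if [ "$end" -eq 0 ]; then start=0; else start=$(( end + 1 )); fi
        end=$(( end + batch ))
        if [ "$end" -gt "$last" ]; then end=$last; fi
        echo "$start $end"
    done
}

# Run the batches of one process in serial, so that each run of the
# script exits (and frees its memory) before the next one starts.
# ACF_CMD is an assumed variable holding the command plus your options.
run_process() {
    batch_ranges "$1" "$2" "$3" | while read -r start end; do
        ${ACF_CMD:-authority_control_fields.pl} \
            --start "$start" --end "$end" < /dev/null
    done
}

# Start $1 processes in parallel, each with its own log file, then
# wait for all of them to finish.
run_all() {
    p=1
    while [ "$p" -le "$1" ]; do
        run_process "$p" "$2" "$3" > "acf_process_$p.log" 2>&1 &
        p=$(( p + 1 ))
    done
    wait
}
```

With these definitions, "run_all 8 250000 50000" would correspond to
the 8 x 5 layout described above.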

