[OPEN-ILS-GENERAL] How to have multiple CPUs used, importing a million bib records in EG 2.0?

Dan Scott dan at coffeecode.net
Tue May 3 14:14:21 EDT 2011


On Tue, May 3, 2011 at 1:39 PM, Repke de Vries <repke at xs4all.nl> wrote:
> Hi Dan
>
> eight piped processes are running fine right now - as per your suggestion.
>
> However: on tying together with Unix pipe:
>
>> So there's no delay when you pipe the
>> commands; the import into the database begins immediately.
>
>
> the evidence is somewhat circumstantial, but our observation is that importing does *not* begin immediately and pg_loader seems to be the culprit: it does *not* pass on to psql but keeps piling up until marc2bre has finished, and only then starts feeding the database through psql.

Heh, looking at the source for pg_loader.pl (I should have done that
before instead of relying on my memory), it's not just circumstantial:
it does indeed wait until marc2bre has finished chewing through the
MARC records before generating any output. My mistake, and my apologies
for that.
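To illustrate why a buffering middle stage delays the load, here is a toy model (a sketch only, not the real tools): `producer` stands in for marc2bre.pl, the two filters stand in for two possible behaviours of pg_loader.pl, and `consumer` stands in for psql. All names are hypothetical, and nothing here calls Evergreen code. It simply records the order in which records are produced and loaded.

```python
from typing import Callable, Iterable, Iterator, List


def producer(n: int, log: List[str]) -> Iterator[int]:
    # Stand-in for marc2bre.pl: emit records one at a time and log each one.
    for i in range(n):
        log.append(f"produce {i}")
        yield i


def streaming_filter(records: Iterable[int]) -> Iterator[int]:
    # Pass each record downstream as soon as it arrives.
    for r in records:
        yield r


def buffering_filter(records: Iterable[int]) -> Iterator[int]:
    # Read the entire input before emitting anything, which is the
    # behaviour described above for pg_loader.pl.
    batch = list(records)
    yield from batch


def consumer(records: Iterable[int], log: List[str]) -> None:
    # Stand-in for psql: log each record as it is "loaded".
    for r in records:
        log.append(f"load {r}")


def run(filter_stage: Callable[[Iterable[int]], Iterator[int]], n: int = 3) -> List[str]:
    log: List[str] = []
    consumer(filter_stage(producer(n, log)), log)
    return log
```

With `run(streaming_filter)` the produce/load entries interleave, so loading starts right away. With `run(buffering_filter)` all three "produce" entries come before the first "load", which matches the observation that nothing reaches the database until marc2bre is done.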

> Given CPU activity on the database server, only after that does the stream of COPY statements start getting the data imported into the db.  //Times eight because of running in parallel.  For the moment we don't fine-tune any of the flags you mention.//

Well, the "times eight" is the good part. You can run the 8
marc2bre.pl | pg_loader.pl pipelines in parallel, so you're saving
significant time on the conversion step; and because the resulting SQL
can then be processed by 8 cores in parallel, you should save
significant time on the load step as well.
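One way to drive the 8 parallel pipelines from a single script is sketched below. It assumes the MARC file has already been split into chunk files. The `template` string is a hypothetical placeholder: this sketch does not know the real marc2bre.pl, pg_loader.pl or psql options, so you would fill those in for your own install. The helper names are also illustrative only.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def build_commands(chunks: Sequence[str], template: str) -> List[str]:
    # `template` is a hypothetical placeholder, e.g.
    # "marc2bre.pl ... {chunk} | pg_loader.pl ... | psql ...";
    # each chunk file name is shell-quoted before substitution.
    return [template.format(chunk=shlex.quote(c)) for c in chunks]


def run_pipeline(command: str) -> int:
    # shell=True lets /bin/sh connect the "|" stages. The returned code
    # is the exit status of the LAST command in the pipeline only.
    return subprocess.run(command, shell=True).returncode


def run_parallel(commands: Sequence[str], workers: int = 8) -> List[int]:
    # The threads only wait on child processes; each pipeline runs as its
    # own set of OS processes, which the scheduler can place on different
    # cores. Results come back in the same order as `commands`.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_pipeline, commands))
```

Usage would look like `run_parallel(build_commands(chunk_files, template), workers=8)`. A non-zero entry in the returned list flags a chunk to re-check. Because only the last stage's status is reported, a failure in marc2bre.pl itself could go unnoticed unless you also inspect its output or logs.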

