[OPEN-ILS-GENERAL] How to have multiple CPU's used, importing a million bib records in EG 2.0?
Dan Scott
dan at coffeecode.net
Tue May 3 14:14:21 EDT 2011
On Tue, May 3, 2011 at 1:39 PM, Repke de Vries <repke at xs4all.nl> wrote:
> Hi Dan
>
> eight piped processes are running fine right now - as per your suggestion.
>
> However: on tying together with Unix pipe:
>
>> So there's no delay when you pipe the
>> commands; the import into the database begins immediately.
>
>
> the evidence is somewhat circumstantial but our observation is that importing does *not* begin immediately and pg_loader seems to be the culprit: it does *not* pass on to psql but keeps piling up 'till marc2bre has finished and only then starts feeding the database through psql.
Heh, looking at the source for pg_loader.pl (I should have done that
before instead of relying on my memory), it's not just circumstantial:
it does indeed wait until marc2bre.pl has finished chewing through the
MARC records before generating any output. My mistake, and my apologies
for that.
> Given CPU activity on the database server, only after that the stream of COPY statements starts getting the data imported in the db. //Times eight because of running in parallel. For the moment we don't fine tune any of the flags you mention.//
Well, the "times eight" is the good part. You can run the 8
marc2bre.pl | pg_loader.pl processes in parallel, which saves
significant time on the conversion side, and because the resulting SQL
can then be processed by 8 cores in parallel, you should save
significant time on the database side as well.
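
To make the "times eight" idea concrete, here is a minimal sketch in
Python of starting one pipeline per input chunk and letting them all run
at once. It is not part of the Evergreen tools. The names start_pipeline,
run_parallel and demo are hypothetical, and the two stages are stand-in
commands, not marc2bre.pl, pg_loader.pl or psql. The first stand-in
reads all of its input before writing anything, which mimics the
pg_loader.pl behaviour described above.

```python
import os
import subprocess
import sys
import tempfile


def start_pipeline(stages, infile):
    """Start `stage1 < infile | stage2 | ...` without waiting for it."""
    procs = []
    upstream = open(infile, "rb")
    for cmd in stages:
        proc = subprocess.Popen(cmd, stdin=upstream, stdout=subprocess.PIPE)
        upstream.close()        # the child holds its own copy of the handle
        upstream = proc.stdout  # feed this stage's output to the next stage
        procs.append(proc)
    return procs


def run_parallel(stages, infiles):
    """Run one pipeline per input file concurrently; return each output."""
    # Every pipeline is started before any of them is waited on, so the
    # operating system can schedule them on separate cores.
    pipelines = [start_pipeline(stages, path) for path in infiles]
    outputs = []
    for procs in pipelines:
        out, _ = procs[-1].communicate()  # read the final stage until EOF
        for proc in procs[:-1]:
            proc.wait()
        if any(proc.returncode != 0 for proc in procs):
            raise RuntimeError("a pipeline stage exited with an error")
        outputs.append(out)
    return outputs


def demo():
    """Run four stand-in pipelines over four small temporary chunks."""
    # Stand-in for a stage that buffers everything before emitting output.
    buffer_all = [sys.executable, "-c",
                  "import sys; data = sys.stdin.read(); "
                  "sys.stdout.write(data.upper())"]
    # Stand-in for a stage that streams line by line.
    stream = [sys.executable, "-c",
              "import sys\nfor line in sys.stdin: sys.stdout.write(line)"]
    with tempfile.TemporaryDirectory() as tmp:
        chunks = []
        for i in range(4):
            path = os.path.join(tmp, f"chunk{i}.txt")
            with open(path, "w") as fh:
                fh.write(f"record {i}\n")
            chunks.append(path)
        return run_parallel([buffer_all, stream], chunks)
```

Calling demo() returns one output per chunk. For a real import you
would replace the stand-in stages with your own marc2bre.pl,
pg_loader.pl and psql invocations and split the MARC file into one
chunk per process; the flags for those are site-specific and are
deliberately not shown here.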