[OPEN-ILS-GENERAL] How to have multiple CPUs used when importing a million bib records in EG 2.0?

Repke de Vries repke at xs4all.nl
Mon May 2 15:40:30 EDT 2011


Hi,

Though we worked out our combination of calling marc2bre, then pg_loader, then psql as successive, separate steps, we are dead in the water because PostgreSQL alone (the last step) takes four hours per 100K records: meaning 40 hours for all of our one million bib records.
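
For reference, written out, those three steps boil down to roughly the following (file names are made up, and I have left our actual calling parameters out):

    # step 1: MARC records -> "bre" objects; one process, one CPU
    marc2bre [our calling parameters] bibs.mrc > bibs.bre

    # step 2: bre -> SQL; again a single process
    pg_loader [our calling parameters] < bibs.bre > bibs.sql

    # step 3: the four-hours-per-100K part: one psql session,
    # i.e. one PostgreSQL backend, i.e. one CPU
    psql -U evergreen -d evergreen -f bibs.sql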

Also, this approach uses only one CPU, while we have 4, or even 10 x 4: for the moment our CentOS setup sits on top of ten 4-CPU machines. Memory has plenty of GB.

** What can we do better? ** We need to get the 40+ hours down.

The million bib records are available as one big chunk and as 8 smaller chunks. 

I studied parallel_pg_loader as an alternative, assuming that "parallel" means it spews out SQL files in such a way that PostgreSQL starts parallel processes and thus uses all those extra CPUs automatically.

But a) I can't find out how to call parallel_pg_loader so that PostgreSQL works on my 8 smaller chunks simultaneously, and b) it seems to be designed to solve a different problem: small working memory (and we have plenty).
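
The closest thing I can picture is to generate one SQL file per chunk and then open eight concurrent psql sessions, since each session gets its own PostgreSQL backend process, and separate backends can run on separate CPUs. A sketch, with hypothetical file names, and assuming the record IDs in the eight chunks don't collide:

    # chunk1.sql .. chunk8.sql: one pg_loader output file per chunk
    for n in 1 2 3 4 5 6 7 8; do
        # each psql session is its own backend process, so eight
        # sessions can occupy up to eight CPUs
        psql -U evergreen -d evergreen -f "chunk$n.sql" > "chunk$n.log" 2>&1 &
    done
    wait   # until all eight loads have finished

Is that how parallel_pg_loader is meant to be used, or am I missing its point entirely?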

Would connecting the steps with Unix pipes and feeding it the big chunk of one million records do it? So:

    marc2bre [our calling parameters] [input = the one million bib records] | pg_loader [our calling parameters] | psql

We have had that "Unix pipes" advice a couple of times, but it seems counter-intuitive: isn't the net result still one large stream going into a single psql session, and therefore PostgreSQL using one CPU instead of many?
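
Or is the way out to combine the two ideas: one piped chain per chunk, run in the background, so that nothing large lands on disk and each chunk still gets its own psql backend? Something like this sketch (chunk file names hypothetical, parameters elided as before):

    # one piped chain per chunk, all eight running in the background
    for f in chunk?.mrc; do
        # inside the ( ): three concurrent processes streaming
        # through pipes, with no big intermediate file on disk
        ( marc2bre [our calling parameters] "$f" \
            | pg_loader [our calling parameters] \
            | psql -U evergreen -d evergreen ) > "$f.log" 2>&1 &
    done
    wait   # all eight chains, i.e. up to eight backends, at once

If that holds, pipes and parallelism would not be in conflict: the pipe removes the intermediate file, and the eight background chains remove the single-CPU bottleneck. Can anyone confirm?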

Those with experience: please assist! Colleagues of mine shopped this practical problem around at last week's EG conference (yeah!), but there was so much else going on too. Excellent conference, I heard.

Thanks, IISH - Amsterdam, Repke 
