[OPEN-ILS-GENERAL] How to have multiple CPUs used, importing a million bib records in EG 2.0?
Repke de Vries
repke at xs4all.nl
Mon May 2 15:40:30 EDT 2011
Hi,
Though we have worked out our combination of calling marc2bre, then pg_loader, then psql as successive, separate steps, we are dead in the water because PostgreSQL alone (the last step) takes four hours for 100K records, meaning 40 hours for all of our one million bib records.
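For clarity, roughly what we run now (the bracketed placeholders stand for our actual calling parameters, the exact flags depend on the Evergreen installation, and the file names here are hypothetical):

    # Step 1: MARC records -> bib record entry objects
    perl marc2bre.pl [our calling parameters] bibs.mrc > bibs.bre
    # Step 2: bre objects -> one large SQL file
    perl pg_loader.pl [our calling parameters] < bibs.bre > bibs.sql
    # Step 3: load the SQL in a single psql session (the 4-hour step)
    psql -U evergreen -d evergreen -f bibs.sql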
This approach also appears to use only one CPU, while we have four per machine; for the moment our CentOS setup sits on top of ten 4-CPU machines, so 40 CPUs in total. Memory is plentiful, many GB.
** What can we do better? ** We need to get the 40+ hours down.
The million bib records are available as one big chunk and as 8 smaller chunks.
I studied parallel_pg_loader as an alternative, assuming that "parallel" means it spews out SQL files in such a way that PostgreSQL starts parallel processes and thus uses all those extra CPUs automatically.
But a) I can't find out how to call parallel_pg_loader so that PostgreSQL works on my 8 smaller chunks simultaneously, and b) it seems to be designed to solve a different problem: limited working memory (and we have plenty).
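What I have in mind is something like the following sketch: eight psql sessions loading the eight chunks at the same time, one backend per CPU (the chunk file names are hypothetical):

    # Load 8 pre-generated SQL chunks in 8 parallel psql sessions;
    # each PostgreSQL backend can then run on its own CPU.
    for n in 1 2 3 4 5 6 7 8; do
        psql -U evergreen -d evergreen -f chunk_$n.sql &
    done
    wait   # block until all 8 loads have finished

Is that how it is meant to be done, or does parallel_pg_loader arrange this some other way?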
Would connecting the steps with UNIX pipes and feeding in the big chunk of one million records do it?
So: marc2bre [our calling parameters] [input = the one million bib records] | pg_loader [our calling parameters] | psql
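Spelled out, that piped variant would look roughly like this (same placeholder parameters as above; all_bibs.mrc is a hypothetical name for the one-million-record chunk):

    # All three steps connected by pipes: no intermediate files,
    # but still a single psql session at the end.
    perl marc2bre.pl [our calling parameters] all_bibs.mrc \
      | perl pg_loader.pl [our calling parameters] \
      | psql -U evergreen -d evergreen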
We have had that "UNIX pipes" advice a couple of times, but it seems counter-intuitive: isn't the net result still one large stream going into a single PostgreSQL session, and therefore one CPU used instead of multiple CPUs?
Those with experience: please assist! Colleagues of mine shopped this practical problem around at last week's EG conference (yeah!), but there was so much else going on. An excellent conference, I heard.
Thanks, IISH - Amsterdam, Repke