[OPEN-ILS-DEV] Local/Collation problems during import

Jason Stephenson jstephenson at mvlc.org
Thu Jul 21 12:01:16 EDT 2011


Quoting Chris Roosendaal <christian.roosendaal at gmail.com>:

[Snip]

> First of all we've started import on the testing server and production
> server at the same time to compare the performance.
> The postgresql configuration and the source MARC XML data are the same
> in both cases (it's important, I guess).

You'll want to run pgtune on both servers to get maximum performance.  
The production server should NOT be configured the same as the testing  
server, or it will not perform at full capacity.

[Snip]

The times you reported are not unusual for a serial load on your  
testing server. Given that your production server isn't optimized, I  
wouldn't expect it to perform much better than testing. However,  
taking 3 days longer just doesn't seem right, unless it really is  
running on a base configuration.

[Snip]

> Can it be true that import performance can be decreased to 2-3 times
> with LOCALE settings, as tsearch2 needs to find the character with
> diacritics and replace it by the same character without?

Yes. You really want to use C collation and C locale for best  
performance. The createdb statement in the README is your best bet.
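
As a rough illustration only (this is not copied from the README), here is a small Python sketch that assembles a createdb command line forcing C collation and C ctype. The database name "evergreen", the UTF8 encoding, and the helper name createdb_command are assumptions for the example. template0 is used because PostgreSQL only allows a different locale when creating from template0.

```python
import shlex


def createdb_command(dbname, encoding="UTF8", template="template0"):
    """Build a createdb argument list that forces C collation and ctype.

    Hypothetical helper: dbname and encoding are illustrative defaults,
    not values taken from the Evergreen README.
    """
    return [
        "createdb",
        "--template", template,   # template0 allows a non-default locale
        "--lc-collate", "C",      # byte-order sorting, no locale rules
        "--lc-ctype", "C",        # C character classification
        "--encoding", encoding,
        dbname,
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it by hand.
    print(shlex.join(createdb_command("evergreen")))
```

Check the README for the exact statement that matches your Evergreen version before running anything like this.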


Something else you might want to do is set the following  
config.internal_flag entries to true while loading records:

ingest.metarecord_mapping.skip_on_insert
ingest.disable_authority_linking
ingest.assume_inserts_only

Remember to set them back to FALSE after the load. You'll also need to  
run the quick_metarecord SQL script after the load.
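
A minimal sketch of toggling those flags, written as a Python helper that only builds the SQL text. It assumes config.internal_flag has a text column called name and a boolean column called enabled; verify that against your schema before using it. The helper name flag_update_sql is made up for this example.

```python
# The three flags named above.
LOAD_FLAGS = (
    "ingest.metarecord_mapping.skip_on_insert",
    "ingest.disable_authority_linking",
    "ingest.assume_inserts_only",
)


def flag_update_sql(enabled, flags=LOAD_FLAGS):
    """Return one UPDATE statement that switches the given flags on or off.

    Assumption: the table has 'name' and 'enabled' columns.
    """
    names = ", ".join("'{}'".format(name) for name in flags)
    state = "TRUE" if enabled else "FALSE"
    return (
        "UPDATE config.internal_flag SET enabled = {} "
        "WHERE name IN ({});".format(state, names)
    )


if __name__ == "__main__":
    print(flag_update_sql(True))    # run before the load
    print(flag_update_sql(False))   # run after the load
```

Generating both statements from the same list makes it harder to forget a flag when switching them back.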

If you can split your files up further into batches of about 10,000  
records each and load them in parallel, with the number of loaders  
equal to the number of cores on the database server minus one, you  
will likely load them more quickly.
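
The batching idea can be sketched roughly as follows. This is not the loader I used. The load_batch callable is a placeholder you would supply yourself, and the function names are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_batches(records, batch_size=10000):
    """Cut a list of records into consecutive batches of batch_size."""
    return [records[i:i + batch_size]
            for i in range(0, len(records), batch_size)]


def loader_count(db_cores):
    """One loader per database core, minus one, but never fewer than one."""
    return max(1, db_cores - 1)


def load_in_parallel(batches, load_batch, db_cores):
    """Run load_batch on every batch using loader_count(db_cores) threads.

    load_batch is a hypothetical callable that loads one batch and
    returns whatever result you want to collect.
    """
    with ThreadPoolExecutor(max_workers=loader_count(db_cores)) as pool:
        return list(pool.map(load_batch, batches))


if __name__ == "__main__":
    records = list(range(25000))          # stand-in for real bib records
    batches = split_batches(records)
    # len() stands in for a real loader and just reports the batch size.
    print(load_in_parallel(batches, len, db_cores=8))
```

For an 8-core database server this gives 7 loaders. Leaving one core free keeps some headroom on the database server for everything else it has to do.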

I was able to load 900,000 bibs in 7 hours on a server with 8 cores  
and 24GB of RAM using batches of 10,000 records, 7 load threads, C  
collation and the above settings. NB: This was NOT using the  
parallel_pg_loader script from extras. It was using a custom bib  
loader that would need a lot of work to be made generically useful to  
others.

HtH,
Jason Stephenson
Merrimack Valley Library Consortium



