[OPEN-ILS-DEV] Local/Collation problems during import
Jason Stephenson
jstephenson at mvlc.org
Thu Jul 21 12:01:16 EDT 2011
Quoting Chris Roosendaal <christian.roosendaal at gmail.com>:
[Snip]
> First of all we've started import on the testing server and production
> server at the same time to compare the performance.
> The postgresql configuration and the source MARC XML data are the same
> in both cases (it's important, I guess).
You'll want to run pgtune on both servers to get maximum performance.
The production server should NOT be configured the same as the testing
server, or it will not perform at full capacity.
[Snip]
The times you reported are not unusual for a serial load on your
testing server. Given that your production server isn't optimized, I
wouldn't expect it to perform much better than testing. However,
taking 3 days longer just doesn't seem right, unless it really is
running on a base configuration.
[Snip]
> Can it be true that import performance can be decreased to 2-3 times
> with LOCALE settings, as tsearch2 needs to find the character with
> diacritics and replace it by the same character without?
Yes. You really want to use C collation and C locale for best
performance. The createdb statement in the README is your best bet.
Something else you might want to do is set the following
config.internal_flag entries to true while loading records:
ingest.metarecord_mapping.skip_on_insert
ingest.disable_authority_linking
ingest.assume_inserts_only
Remember to set them back to FALSE after the load. You'll also need to
run the quick_metarecord SQL script after the load.
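A minimal sketch of toggling those flags, written as a small Python
helper that only prints the SQL so it can be reviewed before it is run
through psql. The flag names come from the list above. The "enabled"
column name is an assumption about the config.internal_flag table, so
check it against your schema first.

```python
# Flag names are taken from the message above. The "enabled" column is
# an ASSUMPTION about config.internal_flag; verify it in your database.
LOAD_FLAGS = [
    "ingest.metarecord_mapping.skip_on_insert",
    "ingest.disable_authority_linking",
    "ingest.assume_inserts_only",
]


def flag_statements(enabled):
    """Return one UPDATE statement per flag, switching it on or off."""
    value = "TRUE" if enabled else "FALSE"
    return [
        f"UPDATE config.internal_flag SET enabled = {value} "
        f"WHERE name = '{name}';"
        for name in LOAD_FLAGS
    ]


if __name__ == "__main__":
    print("-- run before the load")
    print("\n".join(flag_statements(True)))
    print("-- run after the load")
    print("\n".join(flag_statements(False)))
```

The output is plain SQL text, so nothing touches the database until you
paste it into psql yourself.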
If you can split your files up further into batches of about 10,000
records each and load them in parallel, with the number of loaders
equal to the number of cores on the database server minus 1, you will
likely load them more quickly.
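The batching and loader count described above can be sketched roughly
as follows. This is only an illustration: load_batch is a hypothetical
placeholder for whatever actually loads a batch into the database, and
the core count has to be the database server's, passed in by hand.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batches(records, size=10_000):
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def loader_count(db_server_cores):
    """Number of parallel loaders: database server cores minus one, at least 1."""
    return max(1, db_server_cores - 1)


def load_batch(batch):
    # HYPOTHETICAL placeholder: a real loader would write the batch to
    # the database here. This one only reports how many records it saw.
    return len(batch)


def load_all(records, db_server_cores, size=10_000):
    """Load all batches in parallel and return the total records handled."""
    workers = loader_count(db_server_cores)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(load_batch, batches(records, size)))
```

For example, load_all(range(25_000), db_server_cores=8) would run 7
workers over three batches of 10,000, 10,000 and 5,000 records.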
I was able to load 900,000 bibs in 7 hours on a server with 8 cores
and 24GB of RAM using batches of 10,000 records, 7 load threads, C
collation and the above settings. NB: This was NOT using the
parallel_pg_loader script from extras. It was using a custom bib
loader that would need a lot of work to be made generically useful to
others.
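For a rough sense of scale, the figures above work out as follows. This
is plain arithmetic on the numbers quoted in this message, not a
benchmark, and it assumes the load ran evenly over the whole 7 hours.

```python
import math


def load_summary(records=900_000, hours=7, batch_size=10_000, threads=7):
    """Derive throughput figures from the load described above."""
    n_batches = math.ceil(records / batch_size)
    return {
        "batches": n_batches,
        "batches_per_thread": math.ceil(n_batches / threads),
        "records_per_hour": round(records / hours),
        "records_per_second": round(records / (hours * 3600), 1),
    }


if __name__ == "__main__":
    for key, value in load_summary().items():
        print(f"{key}: {value}")
```

That is 90 batches, at most 13 per thread, and roughly 36 records per
second overall.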
HtH,
Jason Stephenson
Merrimack Valley Library Consortium