[OPEN-ILS-DEV] Large Bibliographic Imports
Brandon W. Uhlman
brandon.uhlman at bclibrary.ca
Wed Aug 6 16:30:40 EDT 2008
I have about 960 000 bibliographic records I need to import into an
Evergreen system. The database server is dual quad-core Xeons with
24GB of RAM.
So far, I've split the bibliographic records into 8 batches of
~120K records each and done the marc2bre/direct_ingest/parallel_pg_loader
dance, but one of those files has been chugging along in psql now for
more than 16 hours. How long should I expect these files to take?
Would a larger number of smaller files load more quickly, in terms of
total time for the same full record set?
I notice that the insert into metabib.full_rec seems to be taking by
far the longest. It does have more records than any of the other
pieces to import, but the time taken still seems disproportionate.
I notice that metabib.full_rec has this trigger --
zzz_update_materialized_simple_record_tgr AFTER INSERT OR DELETE OR
UPDATE ON metabib.full_rec FOR EACH ROW EXECUTE PROCEDURE
reporter.simple_rec_sync().
Is the COPY calling this trigger for every row I copy in? If so, can I
remove the trigger to defer this update, and do it en masse afterward?
Would that be quicker?
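
To make the idea concrete, here is a minimal sketch of the sequence I have in mind, written as a small Python helper that only builds the SQL statements and does not run them. It assumes the trigger can be switched off with PostgreSQL's ALTER TABLE ... DISABLE TRIGGER instead of being dropped outright, and the batch file name is made up for illustration. The catch-up step that would rebuild the reporter data afterward is the part I don't know how to do, so it appears only as a comment.

```python
def deferred_trigger_load(table, trigger, load_command):
    """Return, in order, the statements for a load with one trigger disabled.

    Nothing is executed here; the list is meant to be pasted into psql
    (or written to a file) by someone who owns the table.
    """
    return [
        # Stop the per-row trigger from firing during the bulk load.
        f"ALTER TABLE {table} DISABLE TRIGGER {trigger};",
        # The bulk load itself (for example a psql \i of one batch file).
        load_command,
        # Re-arm the trigger so later single-record edits stay in sync.
        # Whatever the trigger would have done for the loaded rows still
        # has to be redone en masse after this point (not shown).
        f"ALTER TABLE {table} ENABLE TRIGGER {trigger};",
    ]


statements = deferred_trigger_load(
    "metabib.full_rec",
    "zzz_update_materialized_simple_record_tgr",
    "\\i batch_1.sql",  # hypothetical file name for one of the 8 batches
)
print("\n".join(statements))
```

Disabling rather than dropping would keep the trigger definition in place, so nothing has to be recreated by hand once the load is finished.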
Just looking for any tips I can use to increase the loading speed of
huge-ish datasets.
Cheers,
Brandon
======================================
Brandon W. Uhlman, Systems Consultant
Public Library Services Branch
Ministry of Education
Government of British Columbia
850-605 Robson Street
Vancouver, BC V6B 5J3
Phone: (604) 660-2972
E-mail: brandon.uhlman at gov.bc.ca
brandon.uhlman at bclibrary.ca