[OPEN-ILS-DEV] Large Bibliographic Imports

Brandon W. Uhlman brandon.uhlman at bclibrary.ca
Wed Aug 6 17:54:21 EDT 2008


Thanks, Dan (and also Mike). Great tip!

I think documenting this is a good idea, for sure. Is there any  
reason we wouldn't also want to include it in the default SQL  
generated by pg_loader/parallel_pg_loader?

If we're concerned about it being run automatically without the  
data being checked first, we could include it as a comment in  
pg_loader_output.sql, just as we currently do with the COMMIT, as a  
visual reminder.
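
Something like this at the top of the generated file, maybe. (The  
shape below is a sketch from memory, not the loader's actual output;  
the trigger name is the one quoted in the message below, and a fuller  
drop-and-rebuild sketch follows the quoted message.)

    BEGIN;

    -- Loading a big batch? Consider dropping this trigger first and
    -- recreating it after the load. Uncomment at your own risk:
    -- DROP TRIGGER zzz_update_materialized_simple_record_tgr
    --     ON metabib.full_rec;

    COPY metabib.full_rec (...) FROM STDIN;  -- column list elided
    ...
    \.

    -- COMMIT;  -- left commented out, as now, so the data can be
                -- checked before committing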

~B

Quoting Dan Scott <denials at gmail.com>:

> Hey Brandon:
>
> The full text indexes are absolutely the key - check out this thread
> from July 2nd:   
> http://list.georgialibraries.org/pipermail/open-ils-dev/2008-July/003265.html
> - I think it addresses your questions for the most part.
>
> And yeah, as Mike notes, we really should document that in the
> appropriate section of the wiki. Especially as I'm about to embark on
> a refresh of our several million records :0
>
> Dan
>
> 2008/8/6 Brandon W. Uhlman <brandon.uhlman at bclibrary.ca>:
>> I have about 960 000 bibliographic records I need to import into an
>> Evergreen system. The database server is dual quad-core Xeons with 24GB of
>> RAM.
>>
>> Currently, I've split the bibliographic records into 8 batches of ~120K
>> records each and did the marc2bre/direct_ingest/parallel_pg_loader dance, but
>> one of those files has been chugging along in psql for more than 16
>> hours now. How long should I expect these files to take? Would more,
>> smaller files load more quickly in total for the same full record set?
>>
>> I notice that the insert into metabib.full_rec seems to be taking by far the
>> longest. It does have more rows to import than any of the other pieces,
>> but the time taken still seems disproportionate.
>>
>> I notice that metabib.full_rec has this trigger:
>>
>>    zzz_update_materialized_simple_record_tgr
>>        AFTER INSERT OR DELETE OR UPDATE ON metabib.full_rec
>>        FOR EACH ROW EXECUTE PROCEDURE reporter.simple_rec_sync()
>>
>> Is the COPY firing this trigger for every row it copies in? If so, can
>> I remove the trigger to defer this update and do it en masse
>> afterward? Would that be quicker?
>>
>> Just looking for any tips I can use to increase the loading speed of
>> huge-ish datasets.
>>
>> Cheers,
>>
>> Brandon
>>
>> ======================================
>> Brandon W. Uhlman, Systems Consultant
>> Public Library Services Branch
>> Ministry of Education
>> Government of British Columbia
>> 850-605 Robson Street
>> Vancouver, BC  V6B 5J3
>>
>> Phone: (604) 660-2972
>> E-mail: brandon.uhlman at gov.bc.ca
>>        brandon.uhlman at bclibrary.ca
>>
>>
>
>
>
> --
> Dan Scott
> Laurentian University
>
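
For the archives, here's my reading of the combined advice as one  
sketch. The index names and types are illustrative only (list the  
real ones with \di metabib.* before dropping anything), and  
reporter.old_super_simple_record is my guess at the view behind  
reporter.materialized_simple_record -- verify both against your  
schema before running any of this.

    -- 1. Before the load: drop the full-text indexes (names illustrative)
    DROP INDEX metabib.metabib_title_field_entry_index_vector_idx;
    DROP INDEX metabib.metabib_author_field_entry_index_vector_idx;
    -- ...and so on for the subject/keyword/series entry tables...

    -- 2. Drop the row-level trigger so COPY doesn't fire it per row
    DROP TRIGGER zzz_update_materialized_simple_record_tgr
        ON metabib.full_rec;

    -- 3. Load the pg_loader/parallel_pg_loader output here

    -- 4. Rebuild the materialized table en masse
    DELETE FROM reporter.materialized_simple_record;
    INSERT INTO reporter.materialized_simple_record
        SELECT DISTINCT ON (id) * FROM reporter.old_super_simple_record;

    -- 5. Recreate the trigger exactly as it was defined
    CREATE TRIGGER zzz_update_materialized_simple_record_tgr
        AFTER INSERT OR DELETE OR UPDATE ON metabib.full_rec
        FOR EACH ROW EXECUTE PROCEDURE reporter.simple_rec_sync();

    -- 6. Recreate the full-text indexes (again, names/types illustrative)
    CREATE INDEX metabib_title_field_entry_index_vector_idx
        ON metabib.title_field_entry USING GIST (index_vector);
    -- ...etc...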



======================================
Brandon W. Uhlman, Systems Consultant
Public Library Services Branch
Ministry of Education
Government of British Columbia
605 Robson Street, 5th Floor
Vancouver, BC  V6B 5J3

Phone: (604) 660-2972
E-mail: brandon.uhlman at gov.bc.ca
         brandon.uhlman at bclibrary.ca


