[OPEN-ILS-DEV] Large Bibliographic Imports

Dan Scott <denials at gmail.com>
Thu Aug 7 10:33:40 EDT 2008


Excellent suggestion, Brandon! It has now been implemented in both
parallel_pg_loader and pg_loader.
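
To be concrete: the hints go into the generated SQL as comments, in the
same spirit as the commented-out COMMIT. Roughly this shape (a sketch
only - the index name below is a placeholder, not the exact text the
scripts emit):

  -- You may want to drop the full-text indexes on metabib.full_rec
  -- before running the COPY below, and recreate them afterward
  -- (placeholder name):
  --   DROP INDEX metabib.metabib_full_rec_index_vector_idx;
  COPY metabib.full_rec FROM STDIN;
  ...
  \.
  -- Recreate the indexes dropped above, then:
  -- COMMIT;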

2008/8/6 Brandon W. Uhlman <brandon.uhlman at bclibrary.ca>:
> Thanks, Dan (and also Mike). Great tip!
>
> I think documenting this is a good idea, for sure. Is there any reason
> we wouldn't also want to include it in the default SQL generated by
> pg_loader/parallel_pg_loader?
>
> If we're concerned about it being run automatically without anyone
> checking the data, we could include it as a comment in
> pg_loader_output.sql, just as we currently do with the COMMIT, as a
> visual reminder.
>
> ~B
>
> Quoting Dan Scott <denials at gmail.com>:
>
>> Hey Brandon:
>>
>> The full-text indexes are absolutely the key - check out this thread
>> from July 2nd:
>>  http://list.georgialibraries.org/pipermail/open-ils-dev/2008-July/003265.html
>> - I think it addresses most of your questions.
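>>
>> The short version, as a sketch only (the thread has the real details):
>> drop the big full-text indexes before the COPY and rebuild them once
>> the data is in place, e.g.:
>>
>>   -- example only; list the real index names with \di metabib.*
>>   DROP INDEX metabib.metabib_full_rec_index_vector_idx;
>>
>>   -- ... run the parallel_pg_loader output here ...
>>
>>   -- recreate afterward (GIST vs. GIN should match what you dropped)
>>   CREATE INDEX metabib_full_rec_index_vector_idx
>>       ON metabib.full_rec USING GIST (index_vector);
>>   ANALYZE metabib.full_rec;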
>>
>> And yeah, as Mike notes, we really should document that in the
>> appropriate section of the wiki - especially as I'm about to embark
>> on a refresh of our several million records :0
>>
>> Dan
>>
>> 2008/8/6 Brandon W. Uhlman <brandon.uhlman at bclibrary.ca>:
>>>
>>> I have about 960,000 bibliographic records I need to import into an
>>> Evergreen system. The database server has dual quad-core Xeons and
>>> 24GB of RAM.
>>>
>>> Currently, I've split the bibliographic records into 8 batches of
>>> ~120K records each and did the marc2bre/direct_ingest/parallel_pg_loader
>>> dance, but one of those files has been chugging along in psql for more
>>> than 16 hours now. How long should I expect these files to take? Would
>>> more, smaller files load more quickly in terms of total time for the
>>> same full record set?
>>>
>>> I notice that the insert into metabib.full_rec seems to be taking by
>>> far the longest. It does have more records than any of the other
>>> pieces to import, but the time taken still seems disproportionate.
>>>
>>> I notice that metabib.full_rec has this trigger:
>>>
>>>   zzz_update_materialized_simple_record_tgr AFTER INSERT OR DELETE OR
>>>   UPDATE ON metabib.full_rec FOR EACH ROW EXECUTE PROCEDURE
>>>   reporter.simple_rec_sync()
>>>
>>> Is the COPY firing this trigger for every row I copy in? If so, can I
>>> remove the trigger to defer this update and do it en masse afterward?
>>> Would that be quicker?
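>>>
>>> For instance, something like this (sketch only - the trigger and
>>> table names are taken from above, but the repopulation query is a
>>> guess on my part; the right source view would need checking):
>>>
>>>   -- before the big COPY: stop per-row maintenance of the
>>>   -- materialized table
>>>   ALTER TABLE metabib.full_rec
>>>       DISABLE TRIGGER zzz_update_materialized_simple_record_tgr;
>>>
>>>   -- ... load the data ...
>>>
>>>   -- afterward: turn the trigger back on and rebuild the
>>>   -- materialized table in one pass
>>>   ALTER TABLE metabib.full_rec
>>>       ENABLE TRIGGER zzz_update_materialized_simple_record_tgr;
>>>   TRUNCATE reporter.materialized_simple_record;
>>>   INSERT INTO reporter.materialized_simple_record
>>>       SELECT * FROM reporter.old_super_simple_record;  -- source view
>>>                                                        -- name assumed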
>>>
>>> Just looking for any tips I can use to increase the loading speed of
>>> huge-ish datasets.
>>>
>>> Cheers,
>>>
>>> Brandon
>>>
>>> ======================================
>>> Brandon W. Uhlman, Systems Consultant
>>> Public Library Services Branch
>>> Ministry of Education
>>> Government of British Columbia
>>> 850-605 Robson Street
>>> Vancouver, BC  V6B 5J3
>>>
>>> Phone: (604) 660-2972
>>> E-mail: brandon.uhlman at gov.bc.ca
>>>       brandon.uhlman at bclibrary.ca
>>>
>>>
>>
>>
>>
>> --
>> Dan Scott
>> Laurentian University
>>
>
>
>
> ======================================
> Brandon W. Uhlman, Systems Consultant
> Public Library Services Branch
> Ministry of Education
> Government of British Columbia
> 605 Robson Street, 5th Floor
> Vancouver, BC  V6B 5J3
>
> Phone: (604) 660-2972
> E-mail: brandon.uhlman at gov.bc.ca
>        brandon.uhlman at bclibrary.ca
>
>



-- 
Dan Scott
Laurentian University

