[OPEN-ILS-DEV] script to break up big marc load to avoid cataloging conflict

Melissa Belvadi mbelvadi at upei.ca
Wed Apr 7 12:23:28 EDT 2010


Hi,

We're faced with reloading a set of about 55,000 ebook records on a
probably quarterly basis. I've got the source-record side figured out
(adding the new records and removing the old ones), but I'm trying to
avoid the ritual of asking cataloging and reserves to stop adding bib
records for multi-hour stretches while this loads (because of the bib
record id conflict problem), and we can't run it overnight because of
conflicts with other EG things that run overnight.

My first question is whether Vandelay, like the command-line perl/sql
based method for batch loading, also requires the bib-add freeze. If it
doesn't require the freeze, and Vandelay can handle a set this large,
then please let me know, as the rest of this message becomes moot.

So I'm thinking of writing a series of perl/bash scripts to break the
ebook MARC file into multiple files, with just one record per file, and
then run the entire command-line sequence of batch load steps (beginning
each time with identifying the current max bib id) for each file
separately. That would reduce the odds of a conflict with other staff to
near nil, and if a conflict did happen, we could quickly re-load the one
rejected record, since it's already by itself in a single file. A rough
sketch of the splitting step is below.
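
Here's a minimal sketch of that splitting step, assuming the source file
is binary USMARC and that the MARC::Batch / MARC::Record modules from
CPAN are installed; the file names and usage line are just placeholders:

    #!/usr/bin/perl
    # Sketch only: split one big binary MARC file into
    # one-record-per-file pieces. Filenames are made up.
    use strict;
    use warnings;
    use MARC::Batch;

    my $infile = shift or die "usage: $0 ebooks.mrc\n";
    my $batch  = MARC::Batch->new('USMARC', $infile);

    my $count = 0;
    while (my $record = $batch->next) {
        $count++;
        my $outfile = sprintf('rec_%06d.mrc', $count);
        open my $out, '>', $outfile or die "can't write $outfile: $!";
        binmode $out;
        print {$out} $record->as_usmarc();
        close $out;
    }
    print "wrote $count single-record files\n";

Each of those single-record files would then go through the usual
max-bib-id check and command-line load steps on its own.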

When I described this to Grant Johnson here at UPEI, he expressed
concern that so many open/close operations (on files and SQL sessions)
in such quick succession could run into OS problems like memory leaks.

I'm also not sure what limits there are on the EG server as to how many
files you can have in a single directory (e.g. I'd be creating 55,000
individual .mrc files), or whether I'd need to generate subdirectories
at the time I broke the main MARC file up (see the variant sketched
below). I'd really like a solution not just for this particular set but
one I can use with any other large sets that come along in the future,
although I doubt we'd ever have anything that exceeded about 200K
records at a time.
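
If per-directory file counts turn out to be an issue, the same loop
could bucket its output into subdirectories of, say, 1,000 files each.
This variant reuses $batch and $count from the sketch above, and the
directory naming and bucket size are just assumptions:

    # Variant of the loop above: bucket output into subdirectories of
    # 1,000 files each so no single directory holds all 55,000 pieces.
    # Directory/file naming here is illustrative only.
    use File::Path qw(make_path);

    my $per_dir = 1000;
    while (my $record = $batch->next) {
        $count++;
        my $subdir = sprintf('batch_%03d', int(($count - 1) / $per_dir));
        make_path($subdir) unless -d $subdir;
        my $outfile = sprintf('%s/rec_%06d.mrc', $subdir, $count);
        open my $out, '>', $outfile or die "can't write $outfile: $!";
        binmode $out;
        print {$out} $record->as_usmarc();
        close $out;
    }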

So what do you all think? Is there a better way to handle this
situation? Will I crash the server trying my one-at-a-time script idea?
Or has someone else in fact already done this?

Thanks!

Melissa




---
Melissa Belvadi
Emerging Technologies & Metadata Librarian
University of Prince Edward Island
mbelvadi at upei.ca
902-566-0581 



