[OPEN-ILS-DEV] More on marc2bre.pl

Andrews, Mark J. MarkAndrews at creighton.edu
Tue Sep 30 12:31:05 EDT 2008


Picking up on Dan Wells's note about revising marc2bre, I've been playing
with an older VMware image of Evergreen (v1.2.1.3 or something like
that).  Why?  'Cause this image was built with 20 GB of (virtual) disk
space, plenty to handle a copy of Creighton's bibs (669,000) and
associated items (1.1 million).  There are other, newer VMware images,
but they don't have as much space.

Lots of space gives me room to run scripts against, say, a 1 GB input
file and get ginormous (1, 2, 4, 5, 7 or more GB) output files out the
other end.

My problem at the moment is that the loader file (I'm guessing the file
name, but you'll know what I mean: pg_loader_bre.ql or something like
that) contains duplicate "bre" records.  The target table in PostgreSQL
has at least one column declared unique ("no dups"), so the import
fails.  Dan Scott suggested I grep around the duplicate records.  That's
always an option, but I reasoned it'd be quicker to create a clean
export file from the source system, and then process that clean file on
the target side.
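
Before re-exporting, it may help to see which values actually collide.
Here is a minimal Python sketch, assuming (hypothetically) that each
line of the loader file holds one record and that the unique value can
be pulled out with a regular expression.  The "key=" pattern and the
sample lines are made up for illustration; they are not the real
marc2bre output format.

```python
import re
from collections import Counter


def find_duplicate_keys(lines, key_pattern):
    """Return {key: count} for every key that appears on more than one line.

    key_pattern must contain one capture group around the key value.
    """
    regex = re.compile(key_pattern)
    counts = Counter()
    for line in lines:
        match = regex.search(line)
        if match:
            counts[match.group(1)] += 1
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical sample data, standing in for lines of the loader file.
sample = [
    "id=1 key=A100 title=First",
    "id=2 key=B200 title=Second",
    "id=3 key=A100 title=First again",
]

duplicates = find_duplicate_keys(sample, r"key=(\S+)")
print(duplicates)
```

For a real file, passing the open file handle as `lines` would read it
one line at a time, so only the keys and their counts sit in memory.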

I found a way to tell the export program on the source side to put an
integer into the tag and subfield of my choice.  This integer value
simply numbers the bib records in the output file from first to last.
That way I have a guaranteed (sic?) unique ID number in the source file.
However, I discovered on import that some other field is also declared
unique, so PostgreSQL does what it does and stops the import when it
finds a duplicate key.  Hmmm, what to do?

I suggest the processing script (somehow) identify duplicate records,
write them to an exception file, and skip to the next record.  This is
potentially difficult because the import scripts, several *.sql files,
contain records related to the "bre" records.  So when a duplicate
"bre" record is skipped, any related records in the other files would
have to be skipped along with it.  I wonder how to do this?
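
To make the idea concrete, here is a rough Python sketch of the
skip-and-cascade logic.  It is not how marc2bre or the other import
scripts work today.  The record layout (dicts with "id", "key" and
"bib" fields) and both function names are hypothetical; real records
would need their own way of pulling out the unique value and the
parent bib ID.

```python
def split_duplicates(records, key_of):
    """Split records into (kept, rejected); the first record per key wins."""
    seen = set()
    kept, rejected = [], []
    for record in records:
        key = key_of(record)
        if key in seen:
            rejected.append(record)  # candidate for the exception file
        else:
            seen.add(key)
            kept.append(record)
    return kept, rejected


def drop_related(related, parent_id_of, rejected_ids):
    """Keep only related rows whose parent record was not rejected."""
    return [row for row in related if parent_id_of(row) not in rejected_ids]


# Hypothetical sample data: three bibs (two share a key) and their items.
bibs = [
    {"id": 1, "key": "A100"},
    {"id": 2, "key": "B200"},
    {"id": 3, "key": "A100"},
]
items = [
    {"bib": 1, "barcode": "0001"},
    {"bib": 3, "barcode": "0002"},
    {"bib": 2, "barcode": "0003"},
]

kept_bibs, rejected_bibs = split_duplicates(bibs, lambda r: r["key"])
rejected_ids = {r["id"] for r in rejected_bibs}
kept_items = drop_related(items, lambda r: r["bib"], rejected_ids)

print([b["id"] for b in kept_bibs])
print([b["id"] for b in rejected_bibs])
print([i["barcode"] for i in kept_items])
```

Note the policy choice baked into this sketch: items attached to a
rejected bib are dropped rather than re-pointed at the surviving bib.
Whether that is the right call depends on why the duplicates exist in
the first place.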

Mark

Mark Andrews, MLS, Systems Librarian
Academic & e-Learning Technologies, Division of Information Technology
Creighton University, Omaha, NE  68178
402-280-3065 - markand at creighton.edu

 
