[open-ils-commits] r20013 - branches/rel_2_1/Open-ILS/src/extras/import (dbs)

svn at svn.open-ils.org
Thu Apr 7 00:47:57 EDT 2011


Author: dbs
Date: 2011-04-07 00:47:54 -0400 (Thu, 07 Apr 2011)
New Revision: 20013

Modified:
   branches/rel_2_1/Open-ILS/src/extras/import/marc2sre.pl.in
Log:
Enable marc2sre.pl to run reasonably fast with a large set of bibs

Our previous iteration of marc2sre.pl used an ILIKE stanza
beginning with a wildcard to match system control numbers
without having to specify the institution's MARC code.
This worked, but was painfully slow in large bib sets as
the database needed to use a bitmap index scan to find matches.

By adding a --prefix flag, the user can specify the institutional
MARC code for the set of records and we can use an exact match
against metabib.full_rec.value, which is far faster.
This approach does not work, of course, if a given set of
bibliographic records uses multiple institutional MARC codes.
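
As a rough illustration of the matching described above, the following
standalone sketch builds the exact-match search criteria from an identifier
and an optional prefix. It does not load any Evergreen modules:
simple_normalize() is a hypothetical, much-simplified stand-in for
naco_normalize(), and build_search() is an illustrative helper name rather
than anything in marc2sre.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for OpenILS::Utils::Normalize::naco_normalize();
# the real function does considerably more than lowercase and trim.
sub simple_normalize {
    my $value = shift;
    $value = lc $value;
    $value =~ s/^\s+|\s+$//g;
    return $value;
}

# Build exact-match search criteria for the identifier. Illustrative helper,
# not part of marc2sre.pl.
sub build_search {
    my ($bibfield, $identifier, $prefix) = @_;

    my $value = simple_normalize($identifier);

    # With a prefix, the value can be compared for equality instead of
    # being matched with a leading-wildcard ILIKE pattern.
    $value = "$prefix $value" if defined $prefix && length $prefix;

    return { tag => $bibfield, value => simple_normalize($value) };
}

my $search = build_search('035', ' 12345678 ', 'OCoLC');
print "$search->{tag}: $search->{value}\n";   # prints "035: ocolc 12345678"
```

Without a prefix, the same helper simply returns the normalized identifier,
which will only match if the stored value carries no institutional code.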


Modified: branches/rel_2_1/Open-ILS/src/extras/import/marc2sre.pl.in
===================================================================
--- branches/rel_2_1/Open-ILS/src/extras/import/marc2sre.pl.in	2011-04-07 04:47:00 UTC (rev 20012)
+++ branches/rel_2_1/Open-ILS/src/extras/import/marc2sre.pl.in	2011-04-07 04:47:54 UTC (rev 20013)
@@ -8,6 +8,7 @@
 use OpenILS::Application::AppUtils;
 use OpenILS::Event;
 use OpenILS::Utils::Fieldmapper;
+use OpenILS::Utils::Normalize qw/naco_normalize/;
 use OpenSRF::Utils::JSON;
 use Unicode::Normalize;
 
@@ -21,7 +22,7 @@
 MARC::Charset->ignore_errors(1);
 
 # Command line options, with applicable defaults
-my ($idsubfield, $bibfield, $bibsubfield, @files, $libmap, $quiet, $help);
+my ($idsubfield, $prefix, $bibfield, $bibsubfield, @files, $libmap, $quiet, $help);
 my $idfield = '004';
 my $count = 1;
 my $user = 'admin';
@@ -31,6 +32,7 @@
 my $parse_options = GetOptions(
     'idfield=s' => \$idfield,
     'idsubfield=s' => \$idsubfield,
+    'prefix=s'=> \$prefix,
     'bibfield=s'=> \$bibfield,
     'bibsubfield=s'=> \$bibsubfield,
     'startid=i'=> \$count,
@@ -192,16 +194,20 @@
     return ($result, $evt);
 }
 
-# Get the biblio.record_entry.id value for the given identifier; note that this
-# approach uses a wildcard to match anything that precedes the identifier value
+# Get the biblio.record_entry.id value for the given identifier
 sub map_id_to_bib {
     my $record = shift;
 
     my ($result, $evt);
 
+    $record = naco_normalize($record);
+    if ($prefix) {
+        $record = "$prefix $record";
+    }
+
     my %search = (
         tag => $bibfield, 
-        value => { ilike => '%' . $record }
+        value => naco_normalize($record)
     );
 
     if ($bibsubfield) {
@@ -256,6 +262,12 @@
 bibliographic record is found. This option is ignored unless it is accompanied
 by the B<--idfield> option.  Defaults to null.
 
+=item * B<-p> I<prefix> B<--prefix>=I<prefix>
+
+Specifies the MARC code for the organization that should be prefixed to the
+bibliographic record identifier. This option is ignored unless it is accompanied
+by the B<--bibfield> option. Defaults to null.
+
 =item * B<--bibfield> I<MARC-field>
 
 Specifies the field in the bibliographic record that holds the identifier
@@ -301,12 +313,28 @@
 
 =head1 EXAMPLES
 
-    marc2sre.pl --idfield 004 --bibfield 035 --bibsubfield a --user cat1 serial_holding.xml
+    marc2sre.pl --user admin --marctype XML --libmap library.map --file serial_holding.xml 
 
+Processes MFHD records in the B<serial_holding.xml> file as a MARC21XML file,
+using the default 004 control field for the source of the bibliographic record
+ID and converting the ID to a plain integer for matching directly against the
+B<biblio.record_entry.id> column. The file B<library.map> contains the mappings
+of library names to integers, and the "admin" user will own the processed MFHD
+records.
+
+    marc2sre.pl --idfield 004 --prefix ocolc --bibfield 035 --bibsubfield a --user cat1 serial_holding.mrc
+
+B<WARNING>: The B<--bibfield> / B<--bibsubfield> options require one database
+lookup per MFHD record and will greatly slow down your import. Avoid if at all
+possible.
+
 Processes MFHD records in the B<serial_holding.xml> file. The script pulls the
 bibliographic record identifier from the 004 control field of the MFHD record
 and searches for a matching value in the bibliographic record in data field
-035, subfield a. The "cat1" user will own the processed MFHD records.
+035, subfield a.  The prefix "ocolc" will be prepended to the bibliographic
+record identifier to provide exact matchings against the
+B<metabib.full_rec.value> column.  The "cat1" user will own the processed MFHD
+records.
 
 =head1 AUTHOR
 


