[OPEN-ILS-DEV] Bugs in marc2bre.pl?

Mike Rylander mrylander at gmail.com
Fri Jun 27 16:44:36 EDT 2008


On Fri, Jun 27, 2008 at 4:12 PM, Dan Wells <dbw2 at calvin.edu> wrote:
> Hello all,
>
> I have been playing around with record loads of various shapes and sizes over the last week or two, and have come to the conclusion that marc2bre.pl is a bit discombobulated in its current state.  It boils down to a consistent state of confusion within the code between the record id (i.e. database id) and the record title control number.  I believe these should generally not be the same thing, and I would say about half of the code agrees with me :)  In particular, the idfield argument seems to be the source of most of my problems.  I believe it was originally meant to be a way to specify tcns, not database record ids, but since tcns are often alphanumeric, the regular expression which strips out any non-digits flies in the face of this.  The end result is that there is a bunch of code, particularly in the preprocess subroutine, that is supposed to check and intelligently set the tcn but which never gets run under normal circumstances (short of an odd dontuse_file setting).  From what I can tell, there is therefore no good way to get a file out the other end with sane tcns (unless yours happen to be all digits).
>

I'm not at a computer where I can look now, and I'm not sure which
branch you're looking at, but here's what is intended:

idfield is, in fact, meant to specify the field (subfield a) from
which to extract the database id.  More on that later.

If there is no available tcn value (as defined in preprocess()) then
the record id will be used.  There could very well have been
short-circuit logic introduced into the trunk of svn that causes the
idfield value to be used, but that is not the intention.  I'll look at
it when I get back to my computer.
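
To make the intended fallback concrete, here is a minimal sketch. It is
Python rather than the script's Perl, it is not the actual preprocess()
code, and the function name choose_tcn and its arguments are made up for
illustration. It only shows the rule "use a TCN if one is available,
otherwise use the record id":

```python
def choose_tcn(candidate_tcns, record_id):
    """Return the first usable candidate TCN, else fall back to the record id.

    candidate_tcns: values pulled from the record that could serve as a TCN
                    (hypothetical input; the real script decides what counts).
    record_id:      the database id assigned to the record.
    """
    for value in candidate_tcns:
        # Skip missing or whitespace-only candidates.
        if value and value.strip():
            return value.strip()
    # No TCN available in the record: use the record id instead.
    return str(record_id)
```

Note that alphanumeric candidates pass through untouched; only the
no-TCN case ends up with the numeric record id.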

The purpose of the dontuse parameter (which could certainly use a
better name) is to inform marc2bre of existing TCN values already in
use in the database, for instance when you are loading new records
into an existing implementation.  That lets it look for alternate TCNs
when there is a collision.
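
Roughly, the collision handling works like the sketch below. Again this
is Python for illustration only, not the Perl in marc2bre. The name
unique_tcn and the "-1", "-2" suffix scheme are assumptions invented for
the example; the real script has its own way of picking alternates.

```python
def unique_tcn(tcn, in_use):
    """Return tcn, or an alternate if tcn is already taken.

    in_use is the set of TCN values already present in the database
    (what the dontuse file is meant to supply). The chosen value is
    added to in_use so later records in the same load do not collide
    with it either.
    """
    candidate = tcn
    n = 1
    while candidate in in_use:
        # Hypothetical alternate scheme: append a numeric suffix.
        candidate = f"{tcn}-{n}"
        n += 1
    in_use.add(candidate)
    return candidate
```

Seeding in_use from the existing database is what makes this safe when
loading into a live system; with an empty set it only guards against
duplicates inside the file being loaded.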

> I have created a new version which hopefully untangles most of this.  I left in the idfield setting for setting the record database id (though I am not sure how useful this actually is) and added tcnfield and tcnsubfield settings which honor common tcn formats and use the preprocess code properly in case of duplicates.  It is currently being tested, but before I post any version of it, I am wondering if I am completely nuts about all of this.
>

You're not nuts, and being able to specify a tcn field (and subfield)
is a great addition!

As for the usefulness of idfield, the point there is to maintain (with
a potential offset supplied by the adjustid parameter) the identifier
that a legacy system uses to address the record, where applicable, at
migration time.  Most ILSs (Evergreen included) use an internal
identifier, because TCN is too human-supplied to be trustworthy as a
unique identifier (and forcing a user to change the TCN to make it
unique seems ... bad).  Item, hold and other records will usually use
this internal identifier to point at a bib record, and moving the old
id (space-shifted by adjustid) is much easier than trying to stitch
things back together using some other means ... impossible in some
cases, in fact.
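
As an illustration of the id shifting, here is a small sketch in Python
(not the script's Perl, and not its actual code). The function name
database_id and the sample values are hypothetical; the digit stripping
mirrors the regular expression Dan describes, and the offset plays the
role of adjustid:

```python
import re


def database_id(idfield_value, adjustid=0):
    """Derive a database id from the legacy identifier in idfield.

    Non-digits are stripped from the legacy value, then the adjustid
    offset is added so migrated ids land in their own range.
    """
    digits = re.sub(r"\D", "", idfield_value)
    if not digits:
        raise ValueError(f"no digits in legacy id: {idfield_value!r}")
    return int(digits) + adjustid


# A legacy bib id and an item that points at it get the same shift,
# so the item still points at the right bib after migration.
OFFSET = 1000000
bib_id = database_id("b1234", OFFSET)
item_bib_pointer = database_id("b1234", OFFSET)
```

Because every pointer to a bib is shifted by the same constant, no
lookup table is needed to reconnect items and holds to their bibs.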

In any case, please do post your new version (or you can send it to me
directly if you'd prefer) and I'll go over it as soon as I can.  Any
improvement and cleanup of marc2bre is a good thing, as it's a critical
component.

Thanks Dan!

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone: 1-877-OPEN-ILS (673-6457)
 | email: miker at esilibrary.com
 | web: http://www.esilibrary.com

