[OPEN-ILS-DEV] Keeping bib / auth / MFHD / record identifiers in sync

Dan Scott dan at coffeecode.net
Thu Nov 19 06:46:26 EST 2009


One of the topics that came up during a discussion with a potential
adopter of Evergreen was the apparent lack of concern about keeping the
001 / 004 fields in the MARC records in the databases in sync with the
internal system ID (or TCN value, in the case of bibs).

For many of us, this hasn't been an issue because it doesn't affect our
users.  However, for this institution, it's a pretty serious issue
because they are a supplier of authority records. We also determined
that there are cases in Conifer where libraries had practices that
relied on being able to identify records by their TCN or record ID
(there was an earlier discussion about TCN values and the 001 at
http://groups.google.ca/group/conifer-discuss/browse_thread/thread/d0d6528e5a92f781/5d10bb4f93323c6f?#5d10bb4f93323c6f)

(Aside for those contemplating their own migrations: Most of the TCN
conflicts in Conifer arose because we merged two sets of bib records
that were numbered sequentially starting at 1; and while I adjusted the
record IDs for one set of records by adding 1000000 to their IDs, I
didn't adjust the 001 correspondingly... and hilarity ensued. Learn from
my mistakes!)
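
To illustrate that aside, here is a toy sketch in Python. It is not the
migration code that was actually run: the dict layout, the "f001" key and
the function names are invented for this example, and it assumes each
record's 001 simply held its original sequential ID. It shows why shifting
only the record ID leaves duplicate 001 values behind, while deriving the
001 from the new ID does not.

```python
from collections import Counter

ID_OFFSET = 1_000_000  # offset applied to the second set's record IDs


def shift_ids_only(records, offset=ID_OFFSET):
    """The mistake: move the record ID but leave the 001 alone."""
    return [{"id": r["id"] + offset, "f001": r["f001"]} for r in records]


def shift_ids_and_001(records, offset=ID_OFFSET):
    """The fix: derive the 001 from the new record ID."""
    return [
        {"id": r["id"] + offset, "f001": str(r["id"] + offset)}
        for r in records
    ]


def duplicate_001s(records):
    """Return the sorted 001 values that appear on more than one record."""
    counts = Counter(r["f001"] for r in records)
    return sorted(value for value, n in counts.items() if n > 1)


# Two toy record sets, both numbered from 1 (assumption: 001 == original ID).
set_a = [{"id": i, "f001": str(i)} for i in range(1, 4)]
set_b = [{"id": i, "f001": str(i)} for i in range(1, 4)]

print(duplicate_001s(set_a + shift_ids_only(set_b)))     # ['1', '2', '3']
print(duplicate_001s(set_a + shift_ids_and_001(set_b)))  # []
```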

My seat-of-the-pants suggestion was that we could add a database
INSERT/UPDATE trigger to ensure that the MARC record was always kept in
sync with its assigned record ID and/or TCN value. I also opined that,
in our case at least, we would be happy to always have the TCN value set
to match the record ID (it would, at least on the face of it, avoid
"conflicting TCN" issues).

Of course, suggesting something as a potential solution and actually
implementing that solution are different things. Putting one toe in the
water, I created and tested a trigger for keeping the MFHD records' 001
in sync with their record ID:

http://evergreen-ils.org/dokuwiki/doku.php?id=scratchpad:random_magic_spells#sync_the_001_field_of_your_serials_records_to_the_linked_bibliographic_record_id

Based on a handful of tests with our data, it worked! Yay.

So now, a couple of questions and thoughts for the more
experienced/world-weary on this topic:

1a) Is there general interest in having such a thing (setting the 001
for authority records, bib records, and MFHD records to the system
record ID as stored in the database) as an option in Evergreen? My
suspicion is a strong "yes".

1b) Should this be a default behaviour, or an optional piece of database
schema that sites would need to apply separately? My suspicion is that
caution would lean towards making it optional. Perhaps 1a and 1b are
really questions for open-ils-general...

1c) If optional, should it be packaged in the Open-ILS/src/sql/Pg/
directory (and a corresponding option added to eg_db_config.pl), or
should it be a less integrated ILS-Contrib thing?

2) My test implementation used regexp_replace() to replace the contents
of the 001 field with the record ID. That's a bit brittle, though; for
example, a record may not have a 001. Would it make more sense to have
the trigger call a pl/perl function that uses MARC::Record and
MARC::File::XML to manipulate the record? It would seem to be a more
robust approach, but I worry a tiny bit about the performance impact of
a pl/perl approach. Not having done any benchmarking, though, perhaps
this isn't a real concern.
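
To make the brittleness concrete, here is a rough sketch of the two
approaches. It is written in Python with only the standard library rather
than pl/perl with MARC::Record, purely so it is self-contained; the
function names and sample records are made up for illustration, it assumes
the XML root is a single MARCXML <record> element, and it says nothing
about the performance question.

```python
import re
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("", MARC_NS)  # serialize without an ns0: prefix


def set_001_regex(marcxml: str, record_id: int) -> str:
    """Regex approach: returns the input unchanged if there is no 001.

    It also depends on the exact attribute quoting and spacing shown here.
    """
    return re.sub(
        r'(<controlfield tag="001">)[^<]*(</controlfield>)',
        lambda m: f"{m.group(1)}{record_id}{m.group(2)}",
        marcxml,
        count=1,
    )


def set_001_parsed(marcxml: str, record_id: int) -> str:
    """Parser approach: update the 001, or create it if it is missing."""
    record = ET.fromstring(marcxml)
    tag = f"{{{MARC_NS}}}controlfield"
    field = next((f for f in record.iter(tag) if f.get("tag") == "001"), None)
    if field is None:
        field = ET.Element(tag, {"tag": "001"})
        # Put the new 001 right after the leader if present, else first.
        leader = record.find(f"{{{MARC_NS}}}leader")
        position = list(record).index(leader) + 1 if leader is not None else 0
        record.insert(position, field)
    field.text = str(record_id)
    return ET.tostring(record, encoding="unicode")


WITH_001 = (
    f'<record xmlns="{MARC_NS}"><leader>00000nam a22000007a 4500</leader>'
    '<controlfield tag="001">42</controlfield></record>'
)
WITHOUT_001 = (
    f'<record xmlns="{MARC_NS}"><leader>00000nam a22000007a 4500</leader>'
    '<controlfield tag="005">20091119064626.0</controlfield></record>'
)

# The regex quietly leaves a 001-less record untouched...
print(set_001_regex(WITHOUT_001, 1000042) == WITHOUT_001)
# ...while the parser-based version adds the missing field.
print(set_001_parsed(WITHOUT_001, 1000042))
```

The same create-if-missing logic is what a MARC::Record-based function
would give us more or less for free, which is the main argument for paying
whatever the pl/perl overhead turns out to be.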

3) Are there any obvious technical flaws in the record-ID-as-TCN-value
approach? The TCN doesn't seem to be used much internally in Evergreen,
other than as a limited means of trying to prevent duplicate bib
records. I could see this as being a separate option for the database
schema, again, as sites that are entirely dependent on OCLC for their
TCN values probably don't want the record-ID-as-TCN-value approach.

4) If I do implement this, I assume a corresponding nicety would be to
make marc2?re.pl modify incoming records in the same way, so that the
triggers could be dropped for large imports.

5) Are there alternate implementation approaches to consider? 

* It could be added to Ingest.pm instead as part of the ingest methods,
which would keep all of the pertinent code in one place, and possibly
allow us to modify the behaviour based on actor.org_unit_settings
(although records aren't owned by any given org unit). Direct
modifications to the database wouldn't automatically result in the
corresponding changes to the records, though, and I have a gut feeling
that these sorts of options are better implemented across the entire
database rather than at a library-by-library level.

* There might be a more natural place to do this as part of in-database
ingest; I'm not sure how far Mike is planning on taking in-database
ingest in the near future. I could always implement this as a set of
optional triggers and then it could get rolled into a future in-database
implementation.

It would probably take less time to implement all of this than to write
up this email, but I'm interested in your feedback and thoughts. I'm
feeling rather sleep-deprived so I need to depend on the Evergreen
collective brain rather than trusting that I've been able to foresee all
of the possible consequences of adopting this approach :)

Dan




