[OPEN-ILS-DEV] Authority control enhancements in the works

Mike Rylander mrylander at gmail.com
Mon Jun 28 00:02:22 EDT 2010


On Sun, Jun 27, 2010 at 5:14 PM, Dan Scott <dan at coffeecode.net> wrote:
> On 25 June 2010 20:08, Mike Rylander <mrylander at gmail.com> wrote:
>> On Fri, Jun 25, 2010 at 12:07 AM, Dan Scott <dan at coffeecode.net> wrote:
>>> Hello:
>>>
>>> In early July, I plan to spend two weeks working with the team at the
>>> International Institute of Social History (IISH) - you may have seen
>>> Repke and Marjolein on various Evergreen mailing lists. One of our goals
>>> is to come out of our time together with some additional functionality,
>>> particularly in the area of authority control (yay!). Our other,
>>> arguably primary, goal is to share as much knowledge as I can with the
>>> team at IISH and help cultivate more development talent locally at IISH
>>> and, by extension, in the general Evergreen community (double yay!).
>>>
>>> For the proposed authority enhancements that we plan to work towards,
>>> I've created a Launchpad Blueprint at
>>> https://blueprints.launchpad.net/evergreen/+spec/respect-my-authorities ;
>>> you'll find the meatier details of the proposed enhancements at
>>> http://evergreen-ils.org/dokuwiki/doku.php?id=dev:proposal:authorities
>>>
>>> We welcome your thoughts, suggestions, warnings, and if you have
>>> full-fledged examples including sample authority and bib records to
>>> illustrate your concerns or ideas, those would be fabulous.
>>>
>>>
>>
>> First, thanks both to Dan and the IISH team!
>>
>> I've updated the wiki page (
>> http://open-ils.org/dokuwiki/doku.php?id=dev:proposal:authorities )
>> with some information about implementation that I've been working on
>> over the last couple days.  This is all backend infrastructure, and
>> should not affect the overall implementation, but thoughts and
>> concerns are welcome.
>>
>
> Hi Mike:
>
> Thanks very much for the updates to the proposal and for the
> implementation work you've already committed! You've got to leave us
> something to do, eh? :)
>
> However, I find part of the following update a bit confusing:
>
> """
> Further implementation thought – we can use the ON UPDATE OR INSERT OR
> DELETE trigger, which now exists in trunk for optional update
> propagation (see below), to overwrite the 035$a with the id, preceded
> by a value stored in an OU setting (or defaulting to, say, “EVRGRN”)
> as the agency code, surrounded by parens. IMO  (miker) this should be
> unconditional, as the 035 is enough and it would be best to leave the
> 001 alone. This would also allow us to simply drop the arn_value and
> arn_source columns from authority.record_entry, which would be good
> all around.
> """
>
> I don't know why you think "it would be best to leave the 001 alone".
> We've started to discuss this in the past, but never finished the
> discussion... maybe we can hash it out this time? I'll do my best to
> represent my position.
>
> As I understand it, when a record is imported or created by a given
> institution, it shifts the existing 001 into a 035 (if that 035
> doesn't already exist) and replaces the 001 with its own value. The
> 035 is a repeatable field
> (http://www.loc.gov/marc/authority/ad035.html), whereas the 001 is
> non-repeatable (http://www.loc.gov/marc/authority/ad001.html).
>
> So it makes more sense, to me, to update the 001 with the purely
> numeric ID - as otherwise, there may be a number of authority record
> 035s for a given controlled bibliographic field to point to, but we
> could be guaranteed to be able to create links that point to the
> correct record if we point at the 001-synced-with-record-ID. We still
> need to create the $0 subfield values with the agency source
> identifier + authority record system control number for the controlled
> field, but with all of the authority records stored in a single system
> we would have to add another layer of abstraction (identify a given
> authority record by one of a possible number of 035s - just one more
> map table, I know, but I don't see what that layer of abstraction buys
> us other than more complexity). From
> http://www.loc.gov/marc/bibliographic/ecbdcntf.html:

Most systems I've seen authority data from do not change the 001 on
import.  However, most I've seen also usually just pull in data from
LoC (or similar large sources), which generally have a mixed
alpha-numeric control number scheme (thus avoiding numeric
collisions).  At least for publics, there's a good reason for not
stomping the 001 -- they can dump the records and send them off to
MARChive or the like for batch upgrade based on the original source
(and source identifier) and overlay them based on the 001.

I missed the repeatability of the 035, or perhaps just tried to block
it out.  The fact that the standard says "match the $0 against, er,
one of some unbounded set of 035a values per authority record" seems
to me to be just silly.

I contend that what we need, regardless of what MARC says (and
regardless of what else should happen to a record at create/import
time), in order to make this work properly, is a field that we can
control completely (IOW, isn't used for some other well-established
process like batch upgrade), that is non-repeatable and (well, as a
subpoint to the first) that we can guarantee is unique in a given
instance.  We have a precedent for something like that -- the 901c in
the bib record.

So, if the 001 or 035 can fit those criteria, then I'm fine with
either ... but it doesn't sound like either can.
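
To make those criteria concrete, here is a minimal sketch in Python. It
is not Evergreen code: the function name, the (tag, value) record layout,
and the sample values are all made up for illustration. It only checks
the two mechanical criteria (non-repeatable within a record, unique
across the instance); whether we "control completely" the field is a
policy question that code can't answer.

```python
from collections import Counter

def usable_as_link_target(records, tag):
    """True if `tag` occurs exactly once per record and no value repeats."""
    seen = Counter()
    for fields in records:
        values = [value for t, value in fields if t == tag]
        if len(values) != 1:      # missing, or repeated within one record
            return False
        seen[values[0]] += 1
    return all(count == 1 for count in seen.values())

# Hypothetical records: each is a list of (tag, value) pairs.
records = [
    [("001", "A100"), ("035", "(SRC1)A100"), ("901c", "1")],
    [("001", "A100"), ("035", "(SRC2)A100"), ("035", "(SRC3)77"), ("901c", "2")],
]

print(usable_as_link_target(records, "001"))   # same 001 kept from two sources
print(usable_as_link_target(records, "035"))   # repeated inside one record
print(usable_as_link_target(records, "901c"))  # assigned by the local system
```

In this toy data the untouched 001 collides across records and the 035
repeats within one, so only the locally assigned field passes.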

>
> """
> $0 - Authority record control number
> System control number of the related authority record preceded by the
> MARC code, enclosed in parentheses, for the agency to which the
> control number applies. See Organization Code Sources for a listing of
> sources used in MARC 21 records.
>
> 100     1#$aBach, Johann Sebastian.$4aut$0(DE-101c)31000889
> """
>
> Similarly, when dealing with the linking from MFHD 004 to the
> associated bibliographic records, we'll want to link to the bib's 001,
> as the intro to MFHD specifies "Control Number", not "System Control
> Number" (http://www.loc.gov/marc/holdings/hdintro.html):
>

Of course this assumes that the 001s are unique.  Evergreen doesn't
stomp the 001 for incoming records (but does try to use it as the
external TCN) because in practice many, if not most, institutions (in
the US, anyway) want to maintain the original 001 for things like OCLC
holdings updates and MARChive batch upgrades, where the original
source id is important.  This is the main reason we shove the 901 into
bib records at export time -- maintaining the original 001 is more
important to external system integration than is following the MARC
standard.  I'd rather that weren't the case, but pragmatism and all...

> """
> Separate holdings records - A separate holdings record is linked to
> the related MARC bibliographic record by field 004 (Control Number for
> Related Bibliographic Record).
> """
>
> Well, to muddy things somewhat, the docs for MFHD's 004 do switch
> maddeningly between "bibliographic record control number" and "system
> control number" at http://www.loc.gov/marc/holdings/hd004.html, but it
> seems clear to me - although I might be nuts - that the intention is
> for the non-repeatable 001 to be a unique identifier within a given
> system that can be used for linking within that system - and that only
> when those records are imported by another system does the 001 get
> shifted to the new field.
>

But that also assumes that the 001 is unique and in practice (not just
in Evergreen, but here too because of the data we must ingest and
external processes we must support) it's not.  Again ... 901. :(

> If it's not meant to be used as the unique identifier within a given
> system, what is the 001 possibly useful for?

External integration with our OCLC overlords... (snarky, yes, but only
half joking).

> Having the 001 synced
> with the record ID (whether authority, bibliographic, or holding
> record) within the system makes our lives all a heck of a lot easier,
> I think,

I agree with this statement in principle, however ...

> and I don't understand what the downside would be. (This is
> where somebody unsheathes their Stick +5 of Cluefulness and enlightens
> me!)
>

I can't speak with the authority of a +5 cluebat, but I can say that
we can't go stomping the 001 in bibs.  The solution (granted,
restricted to bibs so far) has been to impose our authority (heh) over
the 901 and use that for shoving a unique identifier into the record.
We've only done this at export time so far.

Therefore ... a modest proposal?

 * Forcibly maintain the 901c of /all/ MARC types, via triggers.
 * Forcibly maintain an 035a on authority records which includes the
agency code (ou setting with default) and the id (aka, 901c).  This
lets us maintain a semblance of MARC-level spec compliance.
 * When linking authority records via $0, use the 901c to generate the
$0 -- but we have a matching 035a in the authority, so all looks well
at the MARC level.
 * When linking MFHD records via 004, copy the 001 from the bib, but
we have a field on serial.record_entry that holds the internal id (aka
bib 901c) of the bib.

So, we maintain the veneer of MARC-level linking, but acknowledge that
it's not possible in practice and use internal ids for the real work.
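
The authority half of the proposal above could be sketched roughly as
follows. This is plain Python standing in for what would really be
database triggers, and everything in it is an assumption made for
illustration: the function names, the (tag, subfield, value) layout, the
sample values, and in particular the rule that only an 035a carrying our
own agency prefix gets replaced while 035s from other sources are kept.
"EVRGRN" is the default agency code floated on the wiki page.

```python
DEFAULT_AGENCY = "EVRGRN"  # default suggested in the proposal; really an OU setting

def control_number(record_id, agency=DEFAULT_AGENCY):
    """Agency code in parens followed by the id, e.g. (EVRGRN)42."""
    return "({}){}".format(agency, record_id)

def maintain_authority(fields, record_id, agency=DEFAULT_AGENCY):
    """Return fields with the 901c and our own 035a forced to match record_id.

    The 001 and any 035a from other agencies are left alone (assumption).
    """
    ours = "({})".format(agency)
    kept = [
        (tag, code, value) for tag, code, value in fields
        if not (tag == "901" and code == "c")
        and not (tag == "035" and code == "a" and value.startswith(ours))
    ]
    kept.append(("035", "a", control_number(record_id, agency)))
    kept.append(("901", "c", str(record_id)))
    return kept

def subfield_0_for(authority_fields, agency=DEFAULT_AGENCY):
    """Build the $0 for a controlled bib field from the authority's 901c."""
    record_id = next(v for t, c, v in authority_fields if t == "901" and c == "c")
    return control_number(record_id, agency)

# Hypothetical incoming authority record with a stale 901c.
incoming = [("001", "", "ABC123"), ("035", "a", "(SRC1)ABC123"), ("901", "c", "999")]
stored = maintain_authority(incoming, 42)
print(stored)
print(subfield_0_for(stored))
```

The $0 is generated from the internal id, yet it still matches an 035a
on the authority record, which is the "veneer" part; the original 001
survives untouched for batch upgrades and the like.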

Eh?

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  miker at esilibrary.com
 | web:  http://www.esilibrary.com

