[OPEN-ILS-DEV] Serials Schema Proposal - Further De-emphasis of MARC as Record Format

Mike Rylander mrylander at gmail.com
Thu May 27 15:36:23 EDT 2010


On Thu, May 27, 2010 at 2:27 PM, Dan Wells <dbw2 at calvin.edu> wrote:
> Hello Mike,
>
> Thanks for the detailed reply.  It helps to know we are still not on the same page :)
>
> First, I want to be clear about what the serial.caption_and_pattern table actually contains (in my mind).  The confusion seems to have been brought about by my hastily naming a column 'marc' in my original email despite the fact that it doesn't contain any actual MARC, just most of the data which the field would contain.  I am renaming it 'pattern_code' for clarity.  Here is a pseudo-db representation of an example:
>
> row-of-serial.caption_and_pattern [
>  id: 1
>  type: basic
>  active: t
>  enum_1: 'v.'
>  enum_2: 'no.'
>  enum_3: 'pt.'
>  enum_4: null
>  enum_5: null
>  enum_6: null
>  chron_1: '(year)'
>  chron_2: '(month)'
>  chron_3: '(day)'
>  chron_4: null
>  chron_5: null
>  pattern_code: "['2','0','a','v.','b','no.','u','12','v','r','c','pt.','u','3','i','(year)','j','(month)','k','(day)','w','j']"
> ]
>
> I have some good ideas of how to create an editor for such rows, and it won't be a straight MARC editor.  We could also trim down some minor redundancy in 'pattern_code', but I don't think it's worth the trouble on either end.

OK, so just a terminology mismatch so far. That's pretty much what I
was thinking too (though we'd want to use JSON in the pattern_code
column, IMO, which has different quoting semantics from your example
... but that's immaterial to the example).
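
To make the JSON point concrete, here is a minimal sketch of reading
such a value. It assumes the first two elements are the field's two
indicators and the rest are subfield code/value pairs; that layout is
inferred from the example row, not a settled format, and
parse_pattern_code is a hypothetical helper name.

```python
import json

# The example row's pattern_code rewritten with JSON (double) quoting.
pattern_code = (
    '["2","0","a","v.","b","no.","u","12","v","r","c","pt.","u","3",'
    '"i","(year)","j","(month)","k","(day)","w","j"]'
)


def parse_pattern_code(raw):
    """Split a JSON pattern_code into indicators and (code, value) pairs.

    Assumption: two leading indicators, then alternating code/value items.
    """
    items = json.loads(raw)
    if len(items) < 2 or len(items) % 2 != 0:
        raise ValueError("expected two indicators plus code/value pairs")
    indicators = (items[0], items[1])
    # Keep a list of pairs, not a dict: codes such as 'u' can repeat.
    pairs = list(zip(items[2::2], items[3::2]))
    return indicators, pairs


indicators, subfields = parse_pattern_code(pattern_code)
```

A list of pairs is used because order matters and the 'u' code appears
twice in the example, so a plain dict would silently drop data.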

>
> We all understand that reliance on MARC formatted data in general is frail, but in my experience working through this, here are some more specific reasons to get away from it here:
>
> 1. Once a caption/pattern is created, it should be *immutable*.  Changing it would redefine the meaning of any attached holding statements, and we don't want that.  If it actually changes, we always need to keep the old and create a new one.

Agreed.  That's what I was saying about having a trigger pull the
caption/pattern/enum data out, when changed, and create a /new/ active
row in serial.caption_and_pattern.  So, I think we're still on the
same page.

> 2. (I should have answered this earlier) While pretty rarely seen (in my experience), it is possible for a serial to have two or more active patterns of the same type.  Most common will be serials which receive several different but regular supplements, such as perhaps a 'buyer's guide' every December and maybe a 'trends' issue every June.  Even for the 'basic' type, one example I have seen is of a serial titled something like 'Oceanography' which might have issues subtitled 'Animal Life' in odd months and those subtitled 'Plant Life' in even months, and these might be held or bound separately.  Each unit type gets an active caption/pattern.  The entire MFHD standard (even the pattern portion) is designed around describing a snapshot of 'what we have' (the pattern exists for compression and expansion, not really prediction) rather than 'what will come', so it simply doesn't consider this problem.
> 3. MFHD is reliant upon 'link ids' to make sense of itself internally.  Duplicate link ids or deleting a field with a linked id will cause obvious trouble.

OK ... would this mean that a single MFHD would have two patterns, or
there would be two MFHD records?  If the former, then I can see the
duplicate/deleted link id problem, but IMO that's also easy to guard
against by refusing to store an MFHD that is not self-consistent (and
we can check for that in obvious ways).
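
As a rough illustration of what "self-consistent" could mean here, the
sketch below checks only the two failure modes mentioned above:
duplicate link ids, and holdings that point at a link id no pattern
carries. The inputs are simplified stand-ins for real MFHD fields, and
check_links is a hypothetical name, not existing Evergreen code.

```python
def check_links(pattern_links, holding_links):
    """Return a list of problems; an empty list means self-consistent.

    pattern_links: link ids carried by the caption/pattern fields
    holding_links: link ids that the holdings fields refer to
    (Both are simplified stand-ins for parsed MFHD data.)
    """
    problems = []
    seen = set()
    for link in pattern_links:
        if link in seen:
            problems.append(f"duplicate link id {link}")
        seen.add(link)
    for link in holding_links:
        if link not in seen:
            problems.append(f"holding refers to missing link id {link}")
    return problems


# A store routine could refuse the record whenever this list is non-empty.
ok = check_links([1, 2], [1, 2, 2])
bad = check_links([1, 1], [3])
```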

As for having two active patterns, it sounds like we just need to
1) include the link value in the serial.caption_and_pattern row
(recall, we can enforce correctness on MFHD at insert/update time)
2) use that link value when looking for the pattern to deactivate
3) assume a link value of 1 (or 0, if 0-based counting, dunno) when
there's only one.
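
A minimal in-memory sketch of those three steps follows. It assumes a
1-based default link (the 0 vs. 1 question is still open), ignores the
per-serial scoping a real table would need, and uses hypothetical names
(PatternStore, replace_pattern) in place of the trigger logic. Old rows
are never edited beyond clearing their active flag, which keeps the
"immutable pattern" rule intact.

```python
from dataclasses import dataclass
from itertools import count

DEFAULT_LINK = 1  # assumption: 1-based counting when only one pattern exists


@dataclass
class CaptionAndPattern:
    id: int
    type: str
    link: int
    pattern_code: str
    active: bool = True


class PatternStore:
    """Toy stand-in for serial.caption_and_pattern (not the real schema)."""

    def __init__(self):
        self.rows = []
        self._ids = count(1)

    def replace_pattern(self, type_, pattern_code, link=None):
        # Step 3: fall back to the default link when none is given.
        link = DEFAULT_LINK if link is None else link
        # Step 2: deactivate only the active row with the same type and link.
        for row in self.rows:
            if row.active and row.type == type_ and row.link == link:
                row.active = False
        # Step 1: the new row records its link value; old rows are kept.
        new = CaptionAndPattern(next(self._ids), type_, link, pattern_code)
        self.rows.append(new)
        return new


store = PatternStore()
first = store.replace_pattern("basic", "pattern A")
other = store.replace_pattern("basic", "pattern B", link=2)
second = store.replace_pattern("basic", "pattern C")
```

After these calls, the first row is inactive while the link-2 row and
the newest link-1 row are both active, which is the "two active
patterns of the same type" case.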

But just so I'm sure I'm understanding your plan ...

>
> Based on these observations (and probably more I am not thinking of), I think keeping an ongoing link to a fully editable MFHD record for titles under 'serial control' is inviting disaster unnecessarily.  We could keep pointing to serial.record_entry for the other (non-MARC) DB fields, but I think we gain a lot of clarity, safety, and convenience if we deprecate it, allowing it to continue as an independent legacy/stop-gap solution.  This will become more important as the serial.serial table develops in ways not yet foreseen.
>

Well, OK, I can understand that.  Now, let's look at it from another direction.

If we say "the MARC stored in SRE is completely non-authoritative",
I'm cool with that.  But we still need something that sits at the top
of the "tree" and links through to a bib record.  Ignoring the fact
that SRE contains MARC, it also fills those two roles.  And, there's
the benefit that we /could/ generate patterns in batch from that data.
 But think of that as a one-time conversion, not the "normal" process.
 Also, the legacy/stop-gap is restricted to a single column instead of
a whole table.

If we make the MARC field nullable (so that we don't require a real
MFHD to do our work) would you be OK with using SRE as the top of the
tree and link to BRE, and using your pattern editor to populate SCAP?

> I hope this helps clarify my intentions with these new tables.  The overall goal is to be rid of storing MARC internally for serials under 'control', and certainly avoiding any direct editing of data at the MARC level.  Is this reasonable?  I think so!
>

It does help.  I like the goal of no MARC, or at least non-authoritative MARC.

And I'm sorry if I sounded snippy in my previous email; it wasn't intended.

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  miker at esilibrary.com
 | web:  http://www.esilibrary.com

