[OPEN-ILS-DEV] Re: Serials Schema Proposal - Further De-emphasis of MARC as Record Format

Mike Rylander mrylander at gmail.com
Thu May 27 09:26:39 EDT 2010


On Wed, May 26, 2010 at 6:12 PM, Dan Wells <dbw2 at calvin.edu> wrote:
> Hello Scott,
>
> First, sorry for the conflation of suggested changes in my original email.  It is really two separate proposals:
>
> 1) add the serial.caption_and_pattern table, as outlined
> 2) create a new table (serial.serial (aka serial.base)) in the schema to function in place of serial.record_entry going forward (retaining serial.record_entry for legacy use only).
>
> The logic for (1) was fairly well stated (I think).  The logic for (2) is:
>   a. moving the caption/pattern fields out of the 'marc' column (aka MFHD record) in serial.record_entry doesn't leave enough of value in the 'marc' column, so the column should be dropped or at least nullable
>   b. it doesn't make much sense to have a null 'marc' value in a table named 'record_entry' (implying a MARC record entry, e.g. biblio.record_entry), so the table should be renamed (e.g. serial.serial)
>   c. if we are both repurposing and renaming the table, it makes sense to keep serial.record_entry around untouched for legacy use (i.e. the libraries who have already loaded MFHD records using the current basic functionality).  serial.serial will be largely the same, but with NO MARC AT ALL :)
>

Why deprecate one MARC storage table to add another with the exact
same semantics?

Since we have full (well, full MFHD) records in serial.record_entry, I
don't see why we can't simply derive the data we will store in
serial.caption_and_pattern directly from the MFHD already in
serial.record_entry.  We leave SRE as is, with a non-null marc field,
let catalogers have their way with the data in there for legacy
display purposes (or for those records that they do not want
predicted/projected/controlled by the new serials code, or both) and
only pay attention to the fields that become rows in
serial.caption_and_pattern.

I'll run through an example ... tell me if it's reasonable:

0) pretend that serial.caption_and_pattern (SCAP) has /no/ marc field,
but instead looks like:

CREATE TABLE serial.caption_and_pattern (
  id          SERIAL  PRIMARY KEY,
  mfhd        INT     NOT NULL REFERENCES serial.record_entry (id) DEFERRABLE INITIALLY DEFERRED,
  type        TEXT    NOT NULL CHECK (type IN ('basic','supplement','index')),
  active      BOOL    NOT NULL DEFAULT FALSE,
  pat_string  TEXT    NOT NULL,
  enum_1      TEXT    DEFAULT NULL,
  enum_2      TEXT    DEFAULT NULL,
  ...
  cap_1       TEXT    DEFAULT NULL,
  cap_2       TEXT    DEFAULT NULL,
  ...
);

1) at migration, tons of MFHD records (with, at least, textual
holdings data, so /something/ will display) are pushed into
serial.record_entry (SRE).  One of them is the fictional Journal of
Herbology (JOH), which we will follow.

2) a process (maybe a script, maybe a human, doesn't matter) creates a
subscription for JOH (along with distributions, streams, etc).  Then
the system pulls out the pat/enum/cap data from the SRE row for JOH,
and builds a row in SCAP that points at the SRE record used.  This can
be done using a pl/perl stored procedure that uses MARC::Record and
friends.  If there is no SCAP-related data in the record, or not
enough, then we stop at this point.  However, if there is SCAP data,
the system goes on to create issuances and predict items for the
duration of the subscription.  "Receiving" the predicted items is left
as an exercise for the reader, but I think an automatic process could
be built after v1.

3) profit!

4) if the SRE is updated such that the data recorded in the SCAP
changes (simple trigger, can use the same proc as above to extract and
compare) then we create a new row in SCAP, mark the old as inactive,
the new as active, and re-predict items not yet received. (Or not,
perhaps, because we don't know exactly when the new pattern data
should be made active, the cataloger should be able to say "re-predict
everything from X on".)
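
To make steps 2 and 4 a bit more concrete, here is a rough sketch of the
extract-and-compare logic.  Loud assumptions: it is Python rather than
the pl/perl + MARC::Record proc I'd actually expect us to write; the 85X
field is faked as a list of (subfield code, value) pairs instead of
being parsed from the SRE marc column; the table lives in an in-memory
SQLite database (so no "serial." schema prefix); and the mapping of
853/854/855 to the three types, and of $a/$b to cap_1/cap_2, is my guess
for illustration only.  The function names are made up.  Re-prediction
of items is not shown.

```python
import sqlite3

# Assumption: MFHD 853/854/855 correspond to the three SCAP types.
TAG_TO_TYPE = {"853": "basic", "854": "supplement", "855": "index"}


def make_db():
    """Create a cut-down, SQLite-flavoured stand-in for the SCAP table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE caption_and_pattern (
               id         INTEGER PRIMARY KEY,
               mfhd       INTEGER NOT NULL,
               type       TEXT    NOT NULL
                          CHECK (type IN ('basic','supplement','index')),
               active     INTEGER NOT NULL DEFAULT 0,
               pat_string TEXT    NOT NULL,
               cap_1      TEXT,
               cap_2      TEXT)"""
    )
    return db


def extract_scap(tag, subfields):
    """Turn one 85X field into SCAP column values, or None if unusable."""
    if tag not in TAG_TO_TYPE or not subfields:
        return None  # no SCAP-related data: the caller stops here
    by_code = dict(subfields)
    return {
        "type": TAG_TO_TYPE[tag],
        # Flat, comparable form of the whole field.
        "pat_string": "".join(f"${code}{value}" for code, value in subfields),
        "cap_1": by_code.get("a"),  # assumed column mapping
        "cap_2": by_code.get("b"),  # assumed column mapping
    }


def sync_scap(db, mfhd_id, tag, subfields):
    """Write a new active row if the extracted data changed.

    Returns True when a new row was written, False otherwise.
    """
    new = extract_scap(tag, subfields)
    if new is None:
        return False
    current = db.execute(
        "SELECT pat_string FROM caption_and_pattern "
        "WHERE mfhd = ? AND type = ? AND active = 1",
        (mfhd_id, new["type"]),
    ).fetchone()
    if current is not None and current[0] == new["pat_string"]:
        return False  # nothing changed, keep the existing active row
    with db:  # deactivate the old row and add the new one atomically
        db.execute(
            "UPDATE caption_and_pattern SET active = 0 "
            "WHERE mfhd = ? AND type = ?",
            (mfhd_id, new["type"]),
        )
        db.execute(
            "INSERT INTO caption_and_pattern "
            "(mfhd, type, active, pat_string, cap_1, cap_2) "
            "VALUES (?, ?, 1, ?, ?, ?)",
            (mfhd_id, new["type"], new["pat_string"],
             new["cap_1"], new["cap_2"]),
        )
    return True
```

Calling sync_scap() once at subscription time covers step 2; calling it
again from whatever fires on an SRE update covers step 4, and a True
return is the cue to offer the cataloger a "re-predict from X on"
choice.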


This would allow the MFHD that catalogers know and understand to remain
central for the purposes that the serials code cares about, allow
automated tracking to be optional (just don't create a subscription),
retain the ability to display cataloger-supplied holdings fields in
SRE, and phase in the new stuff as practical on a local basis.

But more important, IMO, it reduces duplication of data and avoids
an essentially orphaned table (SRE).

Thoughts?

--miker

> So...
>
>> As I understand it, the marc in serial.caption_and_pattern would *not*
>> be a copy of the marc in serial.record_entry, but a subset of it, or
>> somehow derived from a subset of it.  Is that right?
>
> Yes, correct.  Just a stringified version of the directly related 85X field would be kept here.
>
>> Your proposed serial.caption_and_pattern table contains columns named
>> enum_1, enum_2, and apparently (judging from the ellipsis) a series of
>> enum_[0-9]* columns.  On its face, that doesn't look very normalized to
>> me.  Is there a firm, well behaved limit on the number of enum columns?
>> Is that a reflection of how MFHD records work (of which I am supremely
>> ignorant)?  Or would it make sense to add a child table to hold the
>> enums?
>
> Yes, there is a firm limit, sorry for being lazy.  The table will have 6 enum fields and 5 chron fields.  That is the extent of the standard and certainly a reasonable one.
>
>> The name "caption_and_pattern" bothers me a bit too -- not because the
>> name itself matters much, but because it suggests, or induces, a bit of
>> muddlement.  Does a row in this table contain a caption *and* a pattern?
>> Or maybe one or the other?  Or maybe both?  Or neither?  How do we store
>> a caption differently from a pattern -- in different enums?
>
> The caption parts are distinct and knowable.  The pattern parts are much more fluid.  I think it is reasonable to model the caption parts directly, but the pattern parts will be stored in blob form (i.e. the 'marc' column).  As for keeping them in one table, there are three related reasons.  One, they are in the same field in the MFHD standard (maybe not the best reason, but a convenient one).  Two, a pattern only makes sense in the context of the caption (e.g. we have 4 "No." per "V."), so it makes sense to edit and store them together.  Three, due to reason two, captions and patterns will always exist in a one-to-one relationship; storing them together makes that clear.
>
>> "Serial.serial" does have a certain alliterative appeal -- like "Sirhan
>> Sirhan", or "Boutros Boutros-Ghali."
>
> This might have been a joke, but I'll gladly give 'serial.serial' a +1.
>
>> We could also consider creating a whole new "ser" schema, with a
>> "ser.serial" table.  Anybody using the old "serial" schema could keep
>> it around without interference until they're ready to blow it away, or
>> forever if they want.
>
> The *only* table being used in the current serial schema is record_entry.  Keeping our new tables in 'serial' and letting 'record_entry' stick around as legacy shouldn't cause too much confusion, IMO.
>
> Ultimately, as Mike suggested in IRC, this is really not a critical change.  I am currently coding with this setup in mind, but reverting/revising later won't be the end of the world by any means.
>
> Thanks again for all the help,
> Dan
>
> --
> *********************************************************************************
> Daniel Wells, Library Programmer Analyst dbw2 at calvin.edu
> Hekman Library at Calvin College
> 616.526.7133
>
>
>>>> On 5/26/2010 at 11:18 AM, Scott McKellar <mck9 at swbell.net> wrote:
>> --- On Mon, 5/24/10, Dan Wells <dbw2 at calvin.edu> wrote:
>>
>>> >>> On 5/24/2010 at 12:25 PM, Scott McKellar <mck9 at swbell.net>
>>
>> <snip>
>>
>>> 1-3.  Sorry for not being more clear, but I think
>>> serial.base (or whatever it is called) will be a direct
>>> replacement for serial.record_entry everywhere it is used in
>>> the new schema.  So for starters it will have all the
>>> fields in record_entry minus 'marc'.  We might consider
>>> going without the 'edit' related fields as well, since there
>>> won't be much to edit there anymore.
>>
>> I believe Mike Rylander has proposed leaving serial.record_entry in
>> place, to serve the same role as your serial.base.  The new
>> serial.caption_and_pattern table would then be a child of
>> serial.record_entry.  We might want to make the marc column nullable
>> in serial.record_entry.
>>
>> Maybe a different table name is in order, e.g. record_entry_detail.
>>
>> <snip>
>>
>>> > 5. Can we come up with a better name than "serial.base"?  It's too
>>> > vague.  It could represent sodium hydroxide, the std::basic_string
>>> > class, or Wright-Patterson Air Force Base.  Maybe "serial.periodical"?
>>> >
>>>
>>> 5.  I agree that serial.base seems too generic.
>>> Honestly it should probably be called 'serial.serial', as
>>> some would argue that the term 'periodical' doesn't
>>> technically include newspapers (and probably a few other
>>> minor things).  Again, however, I am willing to be more
>>> pragmatic than technical if people are opposed to a
>>> 'serial.serial' table.
>>
>>
>> Scott McKellar
>



-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  miker at esilibrary.com
 | web:  http://www.esilibrary.com

