[OPEN-ILS-DEV] Serials Schema Proposal - Further De-emphasis of MARC as Record Format

Dan Wells dbw2 at calvin.edu
Mon May 24 11:03:12 EDT 2010


Hello,

I will try to be briefer than usual, as I really want to get some initial reactions before I go too far in thinking this through.  Basically, when I began working on serials, I kept with the idea that an MFHD record would be central to both predictions and holdings.  Because MFHD is such an opaque and complex format, this approach adds some fairly serious complexity and overhead to serials management (predicting, receiving, claiming, discarding, and displaying).  What we really want is to maintain full compatibility with the MFHD standard without the overhead of maintaining a full MFHD record.  The simplest way to do this is to keep MFHD field data in some stringified form (where appropriate), but to associate this data directly with the element it represents.  This is precisely what we have done with the issuance table, which has fields for storing the enumeration/chronology statement (863-5) that the issuance represents.  We have also parted out the textual holdings (866-8) to be directly associated with the appropriate holding library via the distribution table (and friends), and location and item information are handled by the item and unit tables.

At this point we really need to ask, what's left?  The answer: not a whole lot.  Chief among the survivors are the caption and pattern statements (853-5).  I am proposing we get those out as well.  We will continue using the MFHD data, but it will be stored where it is most relevant, not as a central record (which we could always regenerate for export purposes).  We will need at least one new table, something like:

CREATE TABLE serial.caption_and_pattern (
    id		SERIAL	PRIMARY KEY,
    base		INT	NOT NULL REFERENCES serial.base (id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED,
    type		TEXT	NOT NULL CHECK (type IN ('basic','supplement','index')),
    active		BOOL	NOT NULL DEFAULT FALSE,
    marc		TEXT	NOT NULL,
    enum_1	TEXT	DEFAULT NULL,
    enum_2	TEXT	DEFAULT NULL,
...
);

The first five columns are essential, but I can see benefits from separate columns for each level of enumeration and chronology captions as well.  I also propose (as evidenced by the 'base' column) that we create a new root table called 'serial.base' (or something similar).  What do we gain by doing these things?

1) In the most recent schema, holdings data is linked to its matching caption/pattern via subfield 8s in various fields 'hidden' within the marc column of serial.record_entry.  While it is therefore well-defined, it is also invisible to the DB layer.  A separate caption_and_pattern table will allow the holdings data to reference its caption/pattern data directly.
2) If we proceed (as planned) to create columns in both caption_and_pattern and serial.issuance to hold the non-repeatable enumeration and chronology fields, we can reliably query for and display a single issuance or group of issuances without consulting the MARC at all.
3) We can easily augment places where the MFHD standard is weak.  One very obvious place is its failure to identify whether a pattern should be considered active or not, a problem this table easily rectifies.
4) The only 'serial' table currently in production use is serial.record_entry.  Rather than altering or repurposing it, we can allow it to hang on as-is for those libraries (like us, Laurentian, and perhaps others) which have invested in it (we have around 1,650 of these records in use).  Doing so will allow us to phase in the new system on our own terms.
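
To make points 1 through 3 a bit more concrete, here is a rough PostgreSQL sketch meant for a scratch database.  Only serial.caption_and_pattern follows the definition given above (minus the elided columns).  The single-column serial.base, the trimmed-down serial.issuance with its caption_and_pattern, enum_1, and enum_2 columns, the display format, and all sample values are assumptions made purely for illustration; they are not the actual or proposed schema.

```sql
-- Sketch for a scratch database.  Everything marked "hypothetical" is an
-- assumption for illustration, not part of the proposal.
CREATE SCHEMA serial;

-- Hypothetical stand-in for the proposed root table.
CREATE TABLE serial.base (
    id  SERIAL  PRIMARY KEY
);

-- As proposed above, minus the elided columns.
CREATE TABLE serial.caption_and_pattern (
    id      SERIAL  PRIMARY KEY,
    base    INT     NOT NULL REFERENCES serial.base (id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED,
    type    TEXT    NOT NULL CHECK (type IN ('basic','supplement','index')),
    active  BOOL    NOT NULL DEFAULT FALSE,
    marc    TEXT    NOT NULL,
    enum_1  TEXT    DEFAULT NULL,
    enum_2  TEXT    DEFAULT NULL
);

-- Hypothetical, trimmed-down issuance table.  The foreign key takes the
-- place of the subfield 8 link currently buried in the MARC (point 1).
CREATE TABLE serial.issuance (
    id                   SERIAL  PRIMARY KEY,
    caption_and_pattern  INT     NOT NULL REFERENCES serial.caption_and_pattern (id) DEFERRABLE INITIALLY DEFERRED,
    enum_1               TEXT    DEFAULT NULL,
    enum_2               TEXT    DEFAULT NULL
);

-- Sample data: one base, one active basic pattern, one issuance.
INSERT INTO serial.base DEFAULT VALUES;

INSERT INTO serial.caption_and_pattern (base, type, active, marc, enum_1, enum_2)
SELECT id, 'basic', TRUE, 'placeholder for the stringified 853 data', 'v.', 'no.'
FROM serial.base;

INSERT INTO serial.issuance (caption_and_pattern, enum_1, enum_2)
SELECT id, '12', '3'
FROM serial.caption_and_pattern;

-- Point 2: build a display label from the parsed-out columns alone,
-- never touching the marc column.
-- Point 3: restrict to patterns flagged as active.
SELECT i.id,
       cap.enum_1 || i.enum_1 || ':' || cap.enum_2 || i.enum_2 AS label
FROM serial.issuance i
JOIN serial.caption_and_pattern cap ON cap.id = i.caption_and_pattern
WHERE cap.active
  AND cap.type = 'basic';
-- With the sample rows above, label is 'v.12:no.3'.
```

The point of the sketch is only that the join and the filter are ordinary SQL once the link and the active flag live in real columns; how captions and values should actually be combined for display is a separate question.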

Well, this is getting longish, so I'm going to stop there and hopefully get the first reactions I am seeking.  Thank you very much for your time and thoughts.

Sincerely,
Dan Wells


-- 
*********************************************************************************
Daniel Wells, Library Programmer Analyst dbw2 at calvin.edu
Hekman Library at Calvin College
616.526.7133



