[OPEN-ILS-DEV] Monograph Parts

Mike Rylander mrylander at gmail.com
Fri Feb 18 11:54:19 EST 2011


I think we're probably closer in practice than our philosophical
viewpoints would suggest -- namely, that you're further down the line
from a theoretical standpoint than I am. [Read on...]

On Thu, Feb 17, 2011 at 6:54 PM, Dan Wells <dbw2 at calvin.edu> wrote:
> Hello Mike,
>
> Thank you for the detailed response.  I'll *try* to keep this reply more
> brief, and see if I can highlight a few things which still concern me (and
> places where I was not clear).
>
>>> It makes good sense, but I think we could ultimately benefit by putting
>>> less emphasis on a bib record point of view and tilting things a bit
>>> more towards the item point of view.
>>
>> I don't see why items (particularly, barcoded physical items) should
>> be the focus.  On the other hand, records are the core of a
>> bibliographic system -- everything else (items and their attributes,
>> access restrictions for electronic resources, libraries-as-places,
>> etc) is a set of filter axes to apply atop the record when searching
>> for or manipulating it.  The record is the nucleus, and everything
>> else enhances/vivifies/subdivides it.
>>
>
> I do not agree with this, at least not entirely.  Bibliographic records are
> very important, but that is in large part due to the current reality and how we
> got here.  I think we can agree that libraries exist to organize and provide
> access to content (via 'items', whether physical or digital).

Sure, but you have to find and describe the item -- the barcode
doesn't do much for you in that regard.

>  A monolithic
> record is a convenient descriptive tool, but not the only one, and in the
> future may not be the best one.  Slightly loosening the link between items and
> records may be just one small step forward.

Monolithic bib records are also what we have today.  How we break
those down (FRBR-ish entities, KEV, (dare I say it) hstore columns,
something else) will bear heavily on how we attach other data to them.
But setting aside design concerns, there's still the immediate
problem of existing code -- we can't sever that link in the db without
opening the door to massive application-level breakage.
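
To make that dependency concrete, here's a simplified sketch of the
traversal shape that code all over the system assumes (the real
queries vary, but the record -> call number -> copy spine is
constant):

    -- Everything hangs off asset.call_number.record; null it out
    -- and every query shaped like this loses its path to the copies.
    SELECT cp.barcode, cn.label
      FROM biblio.record_entry bre
           JOIN asset.call_number cn ON (cn.record = bre.id)
           JOIN asset.copy cp ON (cp.call_number = cn.id)
     WHERE bre.id = 12345; -- hypothetical record id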

>
>>>  From the item perspective these proposals are modeling
>>> the same thing, a mapping of items to contents, and the fewer ways we
>>> have to do that, the better (as long as we cover all the cases).  With
>>> a simpler mapping table of item to record(-part), we easily traverse in
>>> either direction, and we have ultimate flexibility.
>>
>> But I contend that we don't traverse in either direction for a given
>> relationship, nor do we have a need for ultimate flexibility at the
>> cost of complexity.  (I'm not referring to schema complexity here, but
>> code complexity -- the need for inferences that will certainly come
>> from commingling aggregation and dis-integration.)
>>
>> The direction of traversal is critical, and should be ensconced in the
>> schema.  Not only does this make the code for each function simpler
>> (we don't have to infer a relationship, it's dictated by the fact that
>> we're using part or multi-home) but it models what libraries actually
>> do: barcode parts of a work (volumes, disks, etc); or, collect
>> manifestations of many records into a big binder (or e-reader) with
>> one barcode on the outside.
>>
>>>  So, if I have a bib record on my screen, and
>>> I ask the question, "which items' contents does this record represent?",
>>> we can simply go record->part(s)->item(s).
>>
>> IMO, the question one would ask is, "what and where are the things
>> (nominally, barcoded physical items) that contain what I describe?"
>> ISTM that it's very important to know, and perhaps even critical for
>> efficient workflow, to list separately subsets of what a record
>> describes (parts) and "bound" items that contain the described work
>> along with others.  With a unified map there's no mechanism other than
>> a magic value (or a human hoping to interpret another human's label
>> correctly) to distinguish these concepts.  With parts and multi-home
>> separate, it's obviously a natural property.
>>
>
> ***EDIT***
> I might (finally!) understand your perspective, see paragraph near the end
> ***/EDIT***
> I think this is the point where I am missing something.  Why distinguish the
> concepts?  I think all we need to model is the concept of "contains" (copy
> contains part).  If we are dealing with a record/part, we can list the copies
> which contain it, and if we are dealing with a copy, we can list what it
> contains.  What is the source of the ambiguity?  We dis-integrate first (where
> needed), then aggregate the parts.
>
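
For what it's worth, I do see the appeal: with your tables (sketched
further down in this message) both questions collapse into the same
join against the map.  Roughly, using your proposed names:

    -- "Which copies contain (some part of) this record?"
    SELECT m.target_copy
      FROM biblio.part p
           JOIN asset.copy_contents_map m ON (m.part = p.id)
     WHERE p.record = 12345; -- hypothetical record id

    -- "Which records describe the contents of this copy?"
    SELECT DISTINCT p.record
      FROM asset.copy_contents_map m
           JOIN biblio.part p ON (p.id = m.part)
     WHERE m.target_copy = 67890; -- hypothetical copy id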

But that's not how the code works right now, which is what we must
build on today.  Just taking stock of all the parts of the system that
use the constraints of the schema to simplify the logic is a massive
undertaking, and designing a suitable solution to that is even bigger.
The latter need not happen (from the repo's perspective) at the same
time as the former, but the former /must/ come first.  I'm not at all
afraid of improving the schema, but the attendant code changes must be
planned (or, in the least, problems identified) before the schema can
shift under the app.

I can certainly appreciate (and, in a perfect world, agree with) the
perspective of items containing what records describe, but it's not
something we can do in the DB alone.

>>> On the other hand, if I have an item, and
>>> I ask the question "what are the contents of this item?", we can go
>>> item->part(s)->record(s).  Naturally we can traverse related records (via
>>> items) and related items (via records/parts) as well.
>>
>> This is directly supported by multi-homed items, with the exception
>> that you do need to look at the call number to get the primary record.
>>  I don't see a practical drawback to this, since that's what the code
>> already does, and will still have to do as long as the record field
>> exists on asset.call_number (null-ability or elimination of which is
>> mentioned below).
>>
>>> This also eliminates the
>>> primacy of call numbers when managing items, which I see as a benefit.
>>>
>>
>> There are three problems I see here:
>>
>>   * Call numbers will always have a first-class billing, regardless of
>> how they're implemented, since they represent something physical. Two
>> things, actually: the location in a range of other items (shelf order
>> and position), and a tag pasted to the spine of the item.
>>   * I can't see any obvious benefit to eliminating the
>> record<->call_number link (mentioned directly below, and intimated
>> here)
>>   * The mounds and mounds of code that assume and depend on the
>> existence of the record->call number->copy hierarchy that will
>> instantly break
>>
>
> The immediate benefit to breaking this link is that an item (and by
> association its call number) can now fully exist in the context of any record
> which describes it, even if only in part.  We could transition by using code
> which builds the current hierarchy dynamically (that is, go
> record->copy->call_number, then attach the call number to the current record
> context).  So if item 12345 with Call Number ABC123 is linked through the
> contents map to both Record A and Record B, when viewing A we see:
>
> Record A
> --ABC123
> ----12345
>
> and of course the same with B:
>
> Record B
> --ABC123
> ----12345
>
> The item row might somehow indicate its 'special'-ness (which is going to be
> needed in some way regardless), but would be otherwise transparent.  It is also
> not strictly necessary to null the call_number.record_id value, as we can just
> as easily overwrite it temporarily as needed, and it could be a useful
> fallback.
>

I understand the long-term benefit, and in principle I agree with it,
but it's not feasible short term, and to mangle a dictum from the
world of quantum physics, anything we allow to happen in the database
[like, unlinking records from call numbers] eventually will ... and in
that case will break things spectacularly.

[Aside: I feel like I'm harping on this now ... one point to take away
is that I don't necessarily disagree with the general direction you're
pointing (details notwithstanding), but that it's not safe or
practical right now.]

>>> Or stated more simply, I feel our foundational assumptions in relating
>>> items to records should be:
>>> 1) Records describe contents
>>> 2) Items contain contents
>>> 3) Item content boundaries can overlap record content boundaries in
>>> various ways
>>>
>>
>> I see this as an oversimplification from the conceptual point of view
>> -- it fails to recognize that the arity of the relationship (which I
>> call direction) is important and different.  IOW, record<<->item
>> serves a completely different function from record<->>item, and
>> forcing them both through a record<<->>item relational model does both
>> a disservice.
>>
>
> I can't agree with "completely" different, and if you view the record/part as
> a sort of really expressive tag of some kind, I feel like they are not so
> different at all.
>

I can't push my brain far enough sideways to look at it like that,
being steeped in the dependent code ... and even without that I still
think the arity of the relationship is important enough (and demanded
by (what I see as) proper schema design) that it must be taken into
account at the db level.  On this point I think we'll have to agree to
disagree. :)

>>> All that said, I know from experience to trust your judgement (most of
>>> the time ;).  For my own future benefit, do you have cases already in
>>> mind where this flexibility would end up causing 'split-brain' logic?
>>> (Or maybe I have a split brain...)
>>>
>>
>> Split-brain is probably a misnomer ... we have to commingle the logic
>> for aggregation and dis-integration (disaggregation?) wherever we use
>> either.
>>
>> From a practical point of view, here are some more random-ish thoughts
>> that don't seem to fit directly into this response elsewhere ... ;)
>>
>> When going from records to items (via the Monograph Parts
>> infrastructure as described), we need to be able to label the
>> subdivision that the part represents in relation to the record as a
>> whole -- we need to be able to say "barcode X contains only Volume 1
>> of the content described by record A".  This is not something we need
>> to do for binding in the general case (note, however, that you can
>> indeed use both at the same time -- multi-home and parts -- to get the
>> effect of "volume 1 of record A is bound with some other things").
>
> Correct me if I am wrong, but you can only do this if that "some other thing"
> is not another part.  So if, for instance, I have a set of books, each with a
> different record, and each including a 'CD supplement', I cannot create a copy
> which is a binder containing all the CD supplements.  Or, if I have a
> multi-volume work in two languages, I cannot bind the English and French V.1s
> (etc.) together.  Or if I buy a few e-book Bibles, I cannot put all the Old
> Testaments on reader 1 and all the New Testaments on reader 2.  These limits
> are a direct result of one-part-per-copy, and multi-home doesn't change that,
> does it?
>

You're correct.  The second use case is a great one (the first is too,
but I think less likely in practice). Not allowing more than one part
per copy has always been an explicit restriction, and one I've been
willing to lift given the right argument.  I think you've done that,
and I've pushed that change into the topic branch (see below).  As it
stands now, you can use a part once per copy (which seems to me to be
a valid restriction, as parts are per-record) at the DB level, and the
UI will provide choices that pertain to only one record at a time,
which effectively restricts you to one part per copy per record.  I
plan to add a check constraint to make sure that this is enforced, but
haven't yet.
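
(Because a plain CHECK constraint can't see the part's record, what I
have in mind is more likely a constraint trigger.  A sketch only, not
the committed code, using the table names from the branch:

    -- Reject a second part from the same record on the same copy.
    CREATE OR REPLACE FUNCTION asset.check_one_part_per_copy_per_record()
    RETURNS TRIGGER AS $$
    BEGIN
        PERFORM 1
          FROM asset.copy_part_map m
               JOIN biblio.monograph_part p ON (p.id = m.part)
         WHERE m.target_copy = NEW.target_copy
           AND m.id <> NEW.id -- the new row is visible in an AFTER trigger
           AND p.record = (SELECT record FROM biblio.monograph_part
                            WHERE id = NEW.part);
        IF FOUND THEN
            RAISE EXCEPTION 'Copy % already has a part from that record',
                NEW.target_copy;
        END IF;
        RETURN NULL; -- return value is ignored for AFTER triggers
    END;
    $$ LANGUAGE plpgsql;

    CREATE CONSTRAINT TRIGGER one_part_per_copy_per_record
        AFTER INSERT OR UPDATE ON asset.copy_part_map
        FOR EACH ROW
        EXECUTE PROCEDURE asset.check_one_part_per_copy_per_record();

The exact names may shift before it lands.)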

>> Also, the only purpose of the record-to-item path is to dis-integrate
>> the record into constituent, separately barcoded items, so there is
>> only one relationship type.
>>
>> However, going in the other direction, from items to records (via
>> Multi-homed Items as described) we do not need a label -- what we need
>> instead is a /reason/ for the relationship.  Bound-with, e-reader,
>> etc.  IOW, there are multiple potential causes for the relationship
>> being created.
>>
>
> With the possible exception of bilingual, it seems to me that the records
> themselves have no special relationship, but rather that the relationship only
> exists at the item level.  As such, we don't actually need a reason.  These
> labels can usefully describe the character of an item, so it makes sense to
> include them as a copy attribute if one does not wish to make a new item type.
>

A bound series would have such a relationship, I'd think, and that's
just from a pure bibliographic point of view.  Supporting the invention
of new reasons for local purposes means we can't know whether the
relationship is important or not, so we should assume it is.  Would you
agree with that premise?
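
(To make the shape concrete, the multi-home side of the branch looks
roughly like this -- a sketch, and the names may still shift -- with
the /reason/ carried on the map rather than a label on a part:

    CREATE TABLE biblio.peer_type (
        id   SERIAL PRIMARY KEY,
        name TEXT NOT NULL UNIQUE -- i18n; e.g. 'Bound volume', 'Set'
    );

    CREATE TABLE biblio.peer_bib_copy_map (
        id          SERIAL PRIMARY KEY,
        peer_type   INT NOT NULL REFERENCES biblio.peer_type (id),
        target_copy BIGINT NOT NULL, -- points to asset.copy
        peer_record BIGINT NOT NULL REFERENCES biblio.record_entry (id)
    );

That asymmetry -- a reason here, a label on parts -- is the arity
distinction I keep coming back to.)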

>> Not surfacing these differences explicitly (in my case, by using
>> separate, though admittedly superficially similar mechanisms) is
>> inviting trouble down the road, IMO.
>>
>> Now (fastforwarding to your schema outline below), IIUC, what you're
>> attempting to do with the copy_type table is to have a magic value of
>> "Multi-part" inform us that the direction is from record to item, and
>> all others are the other direction.  From a bibliographic point of
>> view this is incorrect -- it's not the copy that is Multi-part, it is
>> the record.  From a normalization point of view, this is not modeling
>> reality IMO, and because it uses magic rows in a table it's brittle
>> against DML.
>>
>
> That was not my intention.  The copy_type does not need to be set at all,
> other than for convenience of labeling as I noted above.  "Multi-part" is not
> intended as magic, just a generic way to say "this item shows up on more than
> one record, but the reason why can't be neatly expressed in a label" (and maybe
> not the best choice of term at that, especially since I used the word 'part'
> ('multi-record'?)).  Probably should have left it out!
>
>>> Also, I think this quote from Elaine deserves a bit more attention:
>>>
>>>>> I'm particularly interested in how this would function in a consortium
>>>>> like PINES where different libraries might process a multipart set
>>>>> differently. For example, one library might process and circulate a 3
>>>>> part DVD set as one item, where another might put each in a separate
>>>>> container with a separate barcode.
>>>
>>> If we want the complete-set copy from Library A to conclusively fulfill
>>> a P-level hold from Library B, we will want to allow multiple parts per
>>> copy.  Or am I missing something?
>>>
>>
>> You're not ... I interpreted what she was saying differently (that
>> different libraries would be /able/ to split records along different
>> lines), and I see what you're saying.  We could allow a copy to belong
>> to multiple parts (it's a trivial change to the schema), but it would
>> be the responsibility of the cataloger with the item in hand to make
>> sure that the copy is in the appropriate parts -- not hard, except
>> that some parts may not exist yet. ;)  (And, of course, this
>> existential problem exists no matter the scheme*.)
>>
>
> I was not expecting that libraries in the same system could divvy up the
> record differently, but rather that the parts would be set globally at a
> reasonable common denominator and then assembled locally as needs dictated.  I
> am certainly fine with allowing local divvying to happen, but by not even
> allowing multiple parts per copy, we are effectively forcing an immediate
> choice between local part-bundling practice and accurate resource sharing.
>

Well, neither of our schemas does that, actually.  Mine restricts you
in that it requires you to name your grouping based on reality ("disk
1-3" vs "disk 1", "disk 2", "disk 3") and yours doesn't scale well in
the UI (imagine: "vol 1", "vol 2", ... "vol 546"), nor does it reflect P-type
holds that in reality target a /group/ of (conceptual, not physical)
bibliographic subcomponents.

Once we see the impact of the simpler case we can look at inventing a
grouping mechanism if warranted.  As an initial implementation, on
balance, I prefer (shockingly) my scheme, since it is simpler in the
common case because it more closely models physical reality.
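
To illustrate, under my scheme the "one library splits, another
circulates the set whole" case is just more part rows whose labels
name the physical reality.  A sketch (hypothetical record id, table
name from the branch):

    INSERT INTO biblio.monograph_part (record, label, label_sortkey) VALUES
        (12345, 'Disk 1', 'disk 0001'),
        (12345, 'Disk 2', 'disk 0002'),
        (12345, 'Disk 3', 'disk 0003'),
        (12345, 'Disks 1-3', 'disk 0001-0003'); -- the whole-set grouping

A whole-set copy maps to the "Disks 1-3" part; split copies map to the
individual disk parts.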

>> Converting from one part per copy to multiple is simple at the
>> database level, and would be nearly trivial in higher level code, but
>> until we have use in the field I think it's a solution without a
>> problem, because of the cataloging overhead of trying to keep every
>> copy current across all parts as parts are added to a bib when each
>> library adds their own subdivision scheme for the bib.  For that
>> reason I left it out explicitly.  (*It also invites the desire for a
>> "collection of parts" concept that is a much bigger, and more
>> importantly, controversial project.  That too, though, is not barred
>> from the future with the design as it stands.)
>>
>>> Finally, for those it may help, here is a quick version of a simple
>>> item-record schema.  The part concerning copy_type is optional, but I
>>> wanted it to show a more complete replacement for the proposed tables:
>>>
>>> CREATE TABLE biblio.part (
>>>        id SERIAL PRIMARY KEY,
>>>        record BIGINT NOT NULL REFERENCES biblio.record_entry (id),
>>>        label TEXT NOT NULL,
>>>        label_sortkey TEXT NOT NULL,
>>>        CONSTRAINT record_label_unique UNIQUE (record,label)
>>> );
>>>
>>> CREATE TABLE asset.copy_contents_map (
>>>        id SERIAL PRIMARY KEY,
>>>        --record BIGINT NOT NULL REFERENCES biblio.record_entry (id),
>>>        --optional path to partless items, or we force records to have
>>>        --at least one part
>>>        part INT NOT NULL REFERENCES biblio.part (id) ON DELETE CASCADE,
>>>        target_copy BIGINT NOT NULL -- points to asset.copy
>>> );
>>>
>>> CREATE TABLE asset.copy_type (
>>>        id SERIAL PRIMARY KEY,
>>>        name TEXT NOT NULL UNIQUE -- i18n
>>> );
>>>
>>> INSERT INTO asset.copy_type (name) VALUES
>>>        ('Bound Volume'),
>>>        ('Bilingual'),
>>>        ('Back-to-back'),
>>>        ('Set'),
>>>        ('Multi-part');  --generic type
>>>
>>> -- ALSO:
>>> -- asset.copy grows a nullable reference to asset.copy_type
>>> -- asset.call_number.record is nullable (should be null for new-style
>>> -- copies)
>>>
>>
>> Given the codebase, that will be a large and separate project, if ever
>> undertaken, and is not something we can look at now if we want
>> anything discussed here to happen in a near-term release.  I won't
>> discount it out of hand for all time, just for this time. ;)
>>
>
> While it took me a (long) while to realize it, I think the source of our
> disagreement may be what I will call the "is-ness" factor.  Does a bib record
> tell us what an item *contains*, or does it tell us what an item *is*?  Well,
> traditionally it tries to do both, and it has always been a problem.  I am
> unwittingly assuming that describing contents matters more and more, and
> describing containers matters less and less.  Doing so makes it difficult to
> truly represent a content-less container record (like an e-book reader record),
> but if we no longer need such things (because the item already appears wherever
> the contents are described), maybe it is not such a loss.
>

This is the philosophical distance (not strictly difference) I
mentioned before.  In today's world, the description matters more, but
yes, I see that may change down the road ... I'm looking at the
implications on the existing codebase, the demand for features sooner
rather than later, and plotting what I see as the shortest path there.

> I understand that my perspective is not always (ever?) the most realistic.  My
> aim is only to try to encourage a little more pain now if it even *might* save
> us from greater pain in the future.  Since I know you are a speedy and tireless
> worker, it may be best at this point to just wait and see the code, which will
> probably illuminate for me some of the issues I don't yet see.
>

You can grab both the bib_parts and multi_home branches from
http://git.esilibrary.com/?p=evergreen-equinox.git;a=summary to see
the code right now.  If you diff against the master branch you can see
the changes as they stand today, and I believe (with the part per copy
per record change) many of the concerns you raise are addressed.

Thanks again, Dan!

> Dan
>
>> --miker
>>
>>> Dan
>>>
>>>
>>>
>>>> --miker
>>>>
>>>>> Dan
>>>>>
>>>>> --
>>>>> *****************************************************************************
>>>>> Daniel Wells, Library Programmer Analyst dbw2 at calvin.edu
>>>>> Hekman Library at Calvin College
>>>>> 616.526.7133
>>>>>
>>>>>
>>>>>>>> On 2/15/2011 at 3:09 PM, Mike Rylander <mrylander at gmail.com> wrote:
>>>>>> I'll be starting work on an implementation of Monograph Parts (think:
>>>>>> DIsks 1-3; Volume A-D; etc), well, right now, in a git branch that
>>>>>> I'll push to
> http://git.esilibrary.com/?p=evergreen-equinox.git;a=summary
>>>
>>>>>> but I wanted to get the basic plan out there for comment.  So,
>>>>>> attached you'll find a PDF outlining the plan.  Comments and feedback
>>>>>> are welcome, but time for significant changes is slim.
>>>>>>
>>>>>> This is not intended to cover every single possible use of the concept
>>>>>> of Monograph Parts, but I believe it is a straight-forward design that
>>>>>> offers a very good code-to-feature ratio and should be readily used by
>>>>>> existing sites after upgrading to a version containing the code.



-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  miker at esilibrary.com
 | web:  http://www.esilibrary.com

