[OPEN-ILS-DEV] MFHD Patch - Third Try

Dan Wells dbw2 at calvin.edu
Sat Nov 21 11:02:53 EST 2009


Hello Dan,

Thanks for the thorough review.  See more comments inline.

> 
> Thanks Dan. On the whole, I'm pretty happy with the direction you're
> taking things, based on my understanding of MFHD (I'm certainly no
> expert!). I do have a few questions / concerns / comments, though, that
> should be easy for you to respond to.
> 
> In general:
> 
>   * It would be awesome (but, with the exception of David's MFHD code,
> largely unprecedented, so don't feel bad!) to have tests that go
> hand-in-hand with the code that you're introducing, so we can see the
> expected input & behaviour. But I live in a dream world...
>

I will be happy to add tests in the next patch.
 
> MFHD.pm:
>   * active_captions() subroutine
>     * In your comments, you mention that the standard is hazy about how
> multiple active patterns should be handled. I've looked in
> http://www.loc.gov/marc/holdings/hd853855.html and
> http://www.loc.gov/marc/holdings/echdcntf.html and don't see the term
> "active" used in either of those places, so perhaps there's a
> terminology mismatch, or I'm looking in the wrong place.

This is simply a case where the standard is not well thought out when it comes to predictions.  In 2001 it was proposed that $o be added to the 853 field to allow for describing holdings like this:

853 10 $8 1 $a v. $b no. $i (year) $j (month) $x 10 $z dclatn $o Marine Biology
853 10 $8 2 $a v. $b no. $i (year) $j (month) $x 10 $z dclatn $o Marine Geology and Physiography
863 41  $8 1.1 $a A1 $b 1 $i 1982 $j 10 
863 41  $8 2.1 $a B1 $b 1 $i 1982 $j 10

Here we have a single publication which publishes in distinct alternating parts.  At the time of the proposal, however, it was noted that "use of subfield $o in 853/863 is confusing because usually repetition of the field implies a change in pattern rather than indication of part."  In other words, in almost all cases, the presence of an 853 with '$8 2' will supercede an 853 with '$8 1', not continue to run alongside it.  Unfortunately, this proposal was adopted five years later in 2006 without addressing this concern.  I suspect we will need to extend the standard in some small way to accurately support predictions when we have multiple simultaneous 'active' patterns.  See:
http://www.loc.gov/marc/marbi/2001/2001-dp07.html and
http://www.loc.gov/marc/marbi/2006/2006-05.html 
I should also note that this should ultimately affect a very small number of serials.

> 
>   * generate_predictions() subroutine
>     * It would be useful to have more comments describing the approach
> you're taking to generate the predictions
>     * I don't understand why perl2JSON is being invoked on one element
> of the returned array. OpenSRF normally handles the conversion to/from
> JSON for us, so I don't think you added it for OpenSRF service support. 
>     * The new dependency on OpenSRF::Utils::JSON also complicates reuse
> of the the MFHD modules. Until now, they could be picked up and reused
> by / collaborated on with another project (hello Koha!) because they
> haven't had any dependencies on the rest of the OpenSRF or Evergreen
> libraries.
> 

The generate_predictions() subroutine is at best in the alpha stage.  It is a first attempt to use the MFHD code to generate all the pieces needed to populate the issuance table as described in the wiki page you linked to.  I am using perl2JSON to flatten the subfields for easier storage in the database in a way which is simple to rebuild accurately.  There are certainly many ways to accomplish this same thing, but I thought I would start by taking advantage of a reliable, available method.

> MFHDParser.pm:
>   * No concerns
> 
> MFHD/Caption.pm:
>   * next() subroutine:
>     * We enter into a for() loop over a..m, and then, if the holding
> that was passed in is compressed and open-ended, we return from the
> subroutine. It looks like we could test for compression + open-endedness
> and return immediately before we even enter the for() loop, no?
>

You are absolutely right, that entire if/else structure should be moved out the loop, as there is no need to reset the index every time either.
 
> MFHD/Holdings.pm:
> 
>   * notes() subroutine:
>     * There's a lot going on in here that doesn't appear to be in use
> yet but will be built on subsequently. I'm not sure how you're planning
> on calling it, but perhaps passing the notes list by reference, and
> making it the first argument, rather than passing the list by value as
> the second argument, would avoid some of the shift / test / unshift
> gyrations? Hmm, well, it sort of looks like it tries to handle being
> called with the notes passed as a reference, too, as there are also
> tests for whether the first element of @notes is itself a reference. Ah,
> it also acts as a getter, I see. Would it be cleaner as separate
> get_notes() / set_notes() subroutines? In any case, some comments would
> be helpful here.
>     * Also, a typo: 'specifiying'
> 

More code comments are always good, but I would have to disagree about separate get_notes() and set_notes() methods.  A single method which does both seems like a more common and convenient object interface style, even if it means the code is a bit messier underneath.  As for accepting either a list or an array reference, it was simply an attempt to be more robust, but not at all necessary.

>   * format() subroutine:
>     * I'm not too happy having the English "Note: " hardcoded here. In
> format_textual_holdings in MFHDParser.pm, I delimited the holdings and
> the note via a single dash ('$holdings - $note)', but I suppose there is
> a possibility of the single dash introducing some confusion with
> compressed holdings. Maybe a double-dash as a delimiter?
> 

I wasn't too happy with it either!  I tried to find a clear rule about note formatting/punctuation and came up empty.  Double-dash seems like a good option, or maybe ' : ' or ' ; '?  (I am not looking at a statement at the moment to see what works)  Or maybe nothing at all?  This remains an open question.

>   * renaming match() to matches() subroutine
>     * In my opinion, this wasn't really a necessary change.
> 

In this case, from what I could tell, match() was not being used anywhere, and not changing it seemed like a missed opportunity.  The word 'match' when standing alone implies an imperative action verb with no implied truth value, while 'matches' is a relationship comparison verb which normally results in a true or false distinction.  Necessary?  Absolutely not!  Helpful?  I think so, or at least I thought so :)

>   * chron_to_date() subroutine:
>     * Would be nice to have a comment describing what the 3 elements of
> the @chron_start and @chron_end lists represent.
> 
>   * _uncombine() subroutine:
>     * Typo of the sub name in carp(): "uncombine"
> 

You are right in both cases!

> Well, I've kind of run out of steam at this point. I did want to send
> some feedback your way, based on reading the patches (I haven't had time
> yet to actually run the code, although I would like to compare the
> formatting of the holdings for our serials to check for significant
> differences). I haven't seen any showstoppers, and I think the more
> concerns that I've expressed could be addressed in subsequent patches if
> this patch were committed to trunk as-is.
> 
> I'm not sure what the requirements are for DCOs when a patch has been
> resubmitted. I'll ask for one anyway, just in case.
> 
> Thanks very much for this, Dan - both for the code and the patience that
> you've demonstrated waiting for a review of your patches. And for
> helping spur a better perltidy format. And for writing up your serials
> thoughts at
> http://evergreen-ils.org/dokuwiki/doku.php?id=acq:serials:basic_predicting_a 
> nd_receiving (I'll probably have some comments about that next week, if 
> you're interested).

Thanks for all the careful reviewing and comments.  I will certainly address the mentioned shortcomings in a future patch, but would prefer to wait until after this patch is in trunk, if only for the benefit of keeping clear what is being changed at each iteration and what has already been reviewed.

Thanks again,
Dan

================================================
Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

Signed-off-by: Dan Wells <dbw2 at calvin.edu>


More information about the Open-ils-dev mailing list