[OPEN-ILS-DEV] Business::ISBN Patch for public.translate_isbn1013()
Dan Scott
dan at coffeecode.net
Mon Aug 2 11:52:58 EDT 2010
Addendum to my patch: change "COMMENT ON FUNCTION
public.translate_isbn1013 IS $$" to "COMMENT ON FUNCTION
public.translate_isbn1013(TEXT) IS $$". I forgot the function
signature.
On 2 August 2010 11:24, Dan Scott <dan at coffeecode.net> wrote:
> On 2 August 2010 09:01, Jason Stephenson <jstephenson at mvlc.org> wrote:
>> Hi, all.
>>
>> Attached is a patch that replaces the body of public.translate_isbn1013() in
>> 002.schema.config.sql with a PLPERLU function that uses Business::ISBN, as
>> discussed on IRC on Friday.
>
> Thanks Jason - this is a nice step forward!
>
>> I have tested it with two batches totalling 19,999 bib records that
>> previously failed with the old version of the function. This new version
>> does not throw an error when an input ISBN is invalid; it simply skips
>> translating the ISBN to the other format and returns the input
>> unchanged.
>
> Could you provide a small set of those test records where the input
> ISBN fails? This would be useful for comparing expected
> before-and-after behaviour, and also for having a small test bed for
> ISBN indexing (we could augment that with ISBNs that have hyphens,
> trailing metadata, etc). That way, we can trace the behaviour through
> ingest and the config.metabib_field /
> config.metabib_field_index_norm_map / config.indexing_normalizer
> indexing chain to ensure that we're getting what we want in the end.
>
>> Some notes about the implementation:
>
> Some quick notes on Perl in general - you'll also want to add the "use
> strict;" and "use warnings;" pragmas to pretty much any Perl that you
> write; it's really helpful for avoiding syntax problems like eq vs. ==
> and similar gotchas.
>
>> 1. It tries to mimic the output of the previous function implementation
>> exactly, and does so in all cases except where the input ISBN contains
>> dashes. The old implementation simply returned the input in that case;
>> the new implementation parses the ISBN and returns the input with the
>> translated ISBN tacked on the end. (From the code, I believe that this was
>> the intent of the original implementation. Besides, that situation is
>> unlikely to arise, since ISBNs have dashes stripped when stored in MARC.)
>
> I can (sadly) attest that there are many sources of MARC records that
> include hyphens in the ISBNs.
>
> Note that in the reporter.simple_record view on which
> reporter.materialized_simple_record is based, we're not stripping
> hyphens from the ISBNs, so having hyphens in the original ISBN could
> cause missed searches for the ISBN quick search.
>
> In my ideal world, Evergreen would strip out the hyphens in the
> original MARC record as well, but the normalized ISBN that
> Business::ISBN returns (unsurprisingly) does not include the trailing
> metadata that sometimes follows ISBNs. But this is all about indexing,
> so we don't need that for this exercise.
>
>> 2. It works with ISBN13s that cannot be converted to ISBN10. In that case,
>> it simply returns the input because the translation failed.
>>
>> 3. Finally, it obviously adds a requirement for the Business::ISBN Perl
>> module to Evergreen (at least the database portion).
>>
>> You will also find a DCO attached.
>
> Great! I've attached a revised patch that works just a little differently:
>
> 0. It normalizes the input ISBN so that it has no hyphens on output.
> This way we can ensure that the ISBNs in our full-text index are
> always stripped of hyphens.
>
> 1. It returns the original (normalized) ISBN if there was a checksum
> error, along with the ISBN with the fixed checksum, to support cases
> where someone might have the bad ISBN in hand and is trying to search
> for it.
>
> So in the case of a 13-digit hyphenated ISBN with a bad checksum
> ("978-1444710518"), the revised patch should index the following ISBN
> values:
>
> 9781444710518 9781444710519 1444710516
>
> I also added comments so that others will know what we were aiming to
> do at some undefined point in the future, and added copyright
> statements acknowledging your original work and my own modifications
> to it.
>
> Let me know what you think; I believe we're headed in the right direction.
>
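For anyone following the checksum arithmetic, here is a minimal Python sketch of the revised behaviour described above. It is not the patch itself (which is PL/PerlU and delegates parsing to Business::ISBN); the Python function names, the output ordering, and passing unrecognisable input through unchanged are assumptions made for illustration.

```python
import re


def isbn10_check_digit(first9: str) -> str:
    # Weights 10 down to 2; the check digit makes the weighted sum divisible by 11.
    total = sum(int(d) * w for d, w in zip(first9, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def isbn13_check_digit(first12: str) -> str:
    # Alternating weights 1, 3, 1, 3, ...; the check digit makes the sum divisible by 10.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def translate_isbn1013(raw: str) -> str:
    """Hypothetical Python port of the indexing behaviour, for illustration only."""
    # 0. Normalize: strip hyphens and whitespace, upper-case a trailing 'x'.
    isbn = re.sub(r"[-\s]", "", raw).upper()
    values = [isbn]
    if re.fullmatch(r"\d{9}[\dX]", isbn):
        fixed = isbn[:9] + isbn10_check_digit(isbn[:9])
        core = "978" + isbn[:9]
        other = core + isbn13_check_digit(core)
    elif re.fullmatch(r"\d{13}", isbn) and isbn[:3] in ("978", "979"):
        fixed = isbn[:12] + isbn13_check_digit(isbn[:12])
        # Only 978-prefixed ISBN-13s have an ISBN-10 equivalent.
        other = None
        if isbn.startswith("978"):
            other = isbn[3:12] + isbn10_check_digit(isbn[3:12])
    else:
        # Assumption: input not recognisable as an ISBN is returned unchanged.
        return raw
    if fixed != isbn:
        # 1. Also index the checksum-corrected form of a bad ISBN.
        values.append(fixed)
    if other:
        values.append(other)
    return " ".join(values)
```

With the hyphenated, bad-checksum example from the message, `translate_isbn1013("978-1444710518")` yields `9781444710518 9781444710519 1444710516`, matching the values listed above. A valid 979-prefixed ISBN-13 comes back alone, since it has no ISBN-10 form. Trailing metadata such as "(pbk.)" is not handled by this sketch.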
--
Dan Scott
Laurentian University