[OPEN-ILS-DEV] Business::ISBN Patch for public.translateisbn1013()

Dan Scott dan at coffeecode.net
Mon Aug 2 11:24:20 EDT 2010


On 2 August 2010 09:01, Jason Stephenson <jstephenson at mvlc.org> wrote:
> Hi, all.
>
> Attached is a patch that replaces the body of public.translateisbn1013() in
> 002.schema.config.sql with a PLPERLU function that uses Business::ISBN as
> was discussed in IRC on Friday.

Thanks Jason - this is a nice step forward!

> I have tested it with two batches with a total of 19,999 bib records that
> failed previously with the old version of the function. This new version
> does not throw an error when an input ISBN is invalid, it simply does not
> try to translate the ISBN to the other format and returns the input as the
> output.

Could you provide a small set of those test records where the input
ISBN fails? This would be useful for comparing expected
before-and-after behaviour, and also for having a small test bed for
ISBN indexing (we could augment that with ISBNs that have hyphens,
trailing metadata, etc). That way, we can trace the behaviour through
ingest and the config.metabib_field /
config.metabib_field_index_norm_map / config.indexing_normalizer
indexing chain to ensure that we're getting what we want in the end.

> Some notes about the implementation:

A quick note on Perl in general - you'll also want to add the "use
strict;" and "use warnings;" pragmas to pretty much any Perl that you
write; they're really helpful for catching problems like eq vs. ==
and similar gotchas.

> 1. It tries to mimic the output of the previous function implementation
> exactly. It will in all cases except where the input ISBN contains dashes.
> The old implementation would have simply output the input in that case. The
> new implementation parses the ISBN and returns the input with the translated
> ISBN tacked on to the end. (From the code, I believe that this was the
> intent of the original implementation. Plus, it is not likely to encounter
> that situation since ISBNs have dashes stripped when stored in MARC.)

I can (sadly) attest that there are many sources of MARC records that
include hyphens in the ISBNs.

Note that in the reporter.simple_record view on which
reporter.materialized_simple_record is based, we're not stripping
hyphens from the ISBNs, so having hyphens in the original ISBN could
cause missed searches for the ISBN quick search.

In my ideal world, Evergreen would strip out the hyphens in the
original MARC record as well, but the normalized ISBN that
Business::ISBN returns (unsurprisingly) does not include the trailing
metadata that sometimes follows ISBNs. But this is all about indexing,
so we don't need that for this exercise.
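
To make the "hyphens and trailing metadata" cases concrete, here is a
rough sketch in Python (not the PL/Perl used in the patch, and not
Evergreen code). The helper name normalize_isbn and the sample inputs
are made up for illustration; the only assumption is that the ISBN
comes first in the field and any metadata follows it.

```python
import re

# Hypothetical pattern: a leading run of digits and hyphens that ends
# in a digit or an X (the ISBN-10 check character).
ISBN_TOKEN = re.compile(r"^\s*([0-9][0-9\-]*[0-9Xx])")


def normalize_isbn(raw):
    """Return the leading ISBN without hyphens, or None if none is found.

    Only the shape is checked here (10 or 13 characters); the check
    digit is not validated.
    """
    match = ISBN_TOKEN.match(raw)
    if not match:
        return None
    isbn = match.group(1).replace("-", "").upper()
    return isbn if len(isbn) in (10, 13) else None


print(normalize_isbn("978-1-4447-1051-9 (pbk.)"))
print(normalize_isbn("1444710516 : $12.99"))
print(normalize_isbn("pbk."))
```

A real normalizer would presumably lean on Business::ISBN for the
parsing; this only shows the kind of input a test bed should cover.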

> 2. It works with ISBN13s that cannot be converted to ISBN10. In that case,
> it simply returns the input because the translation failed.
>
> 3. Finally, it obviously adds a requirement for the Business::ISBN Perl
> module to Evergreen (at least the database portion).
>
> You will also find a DCO attached.

Great! I've attached a revised patch that works just a little differently:

0. It normalizes the input ISBN so that it has no hyphens on output.
This way we can ensure that the ISBNs in the full-text index are
always stripped of hyphens.

1. It returns the original (normalized) ISBN if there was a checksum
error, along with the ISBN with the fixed checksum, to support cases
where someone might have the bad ISBN in hand and is trying to search
for it.

So in the case of a 13-digit hyphenated ISBN with a bad checksum
("978-1444710518"), the revised patch should index the following ISBN
values:

9781444710518 9781444710519 1444710516
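
For anyone who wants to check those three values by hand, the
following is a minimal sketch of the standard ISBN check-digit
arithmetic in Python. It is not the attached patch (which uses
Business::ISBN in PL/Perl); the function names are hypothetical, and
the output order (input, corrected, translated) is assumed from the
example above.

```python
import re


def isbn13_check_digit(first12):
    """ISBN-13 check digit: weights alternate 1, 3 over the first 12 digits."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_check_digit(first9):
    """ISBN-10 check digit: weights 10 down to 2, modulo 11, with 10 written as X."""
    total = sum(int(d) * (10 - i) for i, d in enumerate(first9))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def index_values(raw):
    """Return the space-separated values to index for one input ISBN."""
    isbn = raw.replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{9}[0-9X]|[0-9]{13}", isbn):
        return raw  # not ISBN-shaped: hand the input back unchanged
    values = [isbn]
    if len(isbn) == 13:
        fixed = isbn[:12] + isbn13_check_digit(isbn[:12])
        if fixed != isbn:
            values.append(fixed)
        if fixed.startswith("978"):  # only 978-prefixed ISBN-13s have an ISBN-10 form
            core = fixed[3:12]
            values.append(core + isbn10_check_digit(core))
    else:
        fixed = isbn[:9] + isbn10_check_digit(isbn[:9])
        if fixed != isbn:
            values.append(fixed)
        core = "978" + fixed[:9]
        values.append(core + isbn13_check_digit(core))
    return " ".join(values)


print(index_values("978-1444710518"))
```

For the first twelve digits 978144471051 the weighted sum is 91, so
the correct ISBN-13 check digit is 9 rather than 8; for the nine-digit
core 144471051 the weighted sum is 170, which is 5 modulo 11, giving an
ISBN-10 check digit of 6.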

I also added comments so that others, at some undefined point in the
future, will know what we were aiming to do, and added copyright
statements acknowledging your original work and my own modifications
to it.

Let me know what you think; I believe we're headed in the right direction.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: translateISBN.patch
Type: text/x-diff
Size: 2532 bytes
Desc: not available
Url : http://libmail.georgialibraries.org/pipermail/open-ils-dev/attachments/20100802/a930b4da/attachment.patch 