[OPEN-ILS-DEV] Business::ISBN Patch for public.translateisbn1013()
Dan Scott
dan at coffeecode.net
Mon Aug 2 23:46:53 EDT 2010
On 2 August 2010 17:08, Jason Stephenson <jstephenson at mvlc.org> wrote:
> Quoting Dan Scott <dan at coffeecode.net>:
>
>>
>> Could you provide a small set of those test records where the input
>> ISBN fails?
>
> Here are 10 in a MARC21slim collection.
Thanks Jason.
I realized that the translate_isbn1013 function was intended to take a
single string containing all of the ISBN values in a given record and
process those to produce the indexed ISBNs, so the implementation
changed a bit more. Luckily, Business::ISBN is pretty good at gleaning
ISBNs from bad data like the records you provided. Here, in a
nutshell, are the results of loading your "bad ISBN" records (now
available in Open-ILS/tests/datasets/badisbns.xml for future testing)
with the version of the function that I committed earlier tonight. The
"value" column contains the original value(s) from the 020a fields in
the source record, and the "index_vector" column shows the indexed
values:
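For readers less familiar with the approach, here is a rough Python sketch of the "one string, many ISBNs" idea: scan the combined 020a value for ISBN-shaped substrings before any validation happens. It is a hypothetical illustration only; the regular expression and the extract_candidates() name are assumptions, not the committed PL/Perl function and not how Business::ISBN actually parses its input.

```python
import re
from typing import List

# Hypothetical pattern (not Business::ISBN's parser): try a 13-character
# candidate (12 digits + digit/X) first, then a 10-character one
# (9 digits + digit/X). Trailing junk such as "pbk." or "bbd" is ignored.
CANDIDATE = re.compile(r"[0-9]{12}[0-9Xx]|[0-9]{9}[0-9Xx]")


def extract_candidates(field_value: str) -> List[str]:
    """Return ISBN-shaped substrings from a combined 020a value, in order."""
    return [match.upper() for match in CANDIDATE.findall(field_value)]


if __name__ == "__main__":
    for raw in ["(0373197527pbk.) :",
                "0553101315175 0553204300 0553268473 (pbk.) :",
                "4980858214bbd :"]:
        print(repr(raw), "->", extract_candidates(raw))
```

Unlike the committed function, this sketch simply drops values such as 'HL00361126 :' that contain no ISBN-shaped run; judging by the results below, Evergreen indexes those as-is.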
evergreen=# SELECT * FROM metabib.identifier_field_entry WHERE field = 18 ORDER BY source;
 id | source | field | value                                                       | index_vector
----+--------+-------+-------------------------------------------------------------+----------------------------------------------------
 14 |      5 |    18 | HL00361126 :                                                | 'hl00361126':1
 16 |      6 |    18 | 089542262X295                                               | '089542262x295':1
 18 |      7 |    18 | 0553101315175 0553204300 0553268473 (pbk.) :                | '0553101315172':2 '0553101315175':1
 20 |      8 |    18 | 4980858214bbd :                                             | '4980858214':1 '498085821x':2 '9784980858219':3
 22 |      9 |    18 | 0440487331pbk.                                              | '0440487331':1 '9780440487333':2
 24 |     10 |    18 | JL03022802 set :                                            | 'jl03022802':1
 26 |     11 |    18 | 978184533213X (hbk.) 184533213X (hbk.) 9781845332136 (hbk.) | '184533213x':3 '9781845332136':2 '978184533213x':1
 28 |     12 |    18 | (0373197527pbk.) :                                          | '0373197527':1 '9780373197521':2
 30 |     13 |    18 | 978788430094X                                               | '788430094x':3 '9787884300945':2 '978788430094x':1
 32 |     14 |    18 | 8901452100503                                               | '8901452100503':1,2
(14 rows)
I would say that's not bad at all: not only did it extract ISBNs from
some messed-up fields like '(0373197527pbk.) :', but it was also able
to fix bad checksums (for example, 978788430094X became 9787884300945)
and provide the corresponding ISBN-10 and ISBN-13 values.
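The checksum repair and ISBN-10/ISBN-13 conversion visible in the results follow from the standard check-digit rules (ISBN-10: weights 10 down to 2, mod 11, with 10 written as X; ISBN-13: alternating weights 1 and 3, mod 10). The Python sketch below illustrates those rules; the function names are hypothetical, and it is a self-contained approximation rather than Business::ISBN's API or the committed PL/Perl code.

```python
from typing import Optional


def isbn10_check_digit(first9: str) -> str:
    """ISBN-10 check digit: weights 10..2, mod 11; a result of 10 is 'X'."""
    total = sum((10 - i) * int(d) for i, d in enumerate(first9))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def isbn13_check_digit(first12: str) -> str:
    """ISBN-13 (EAN-13) check digit: alternating weights 1 and 3, mod 10."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def fix_checksum(isbn: str) -> str:
    """Recompute the final check character of a 10- or 13-character ISBN."""
    isbn = isbn.upper()
    body = isbn[:-1]
    if len(isbn) not in (10, 13):
        raise ValueError(f"not 10 or 13 characters: {isbn!r}")
    if not all(c in "0123456789" for c in body):
        raise ValueError(f"non-digit before the check position: {isbn!r}")
    if len(isbn) == 10:
        return body + isbn10_check_digit(body)
    return body + isbn13_check_digit(body)


def to_isbn13(isbn10: str) -> str:
    """Prefix 978 to the first nine digits and compute a new check digit."""
    body = "978" + isbn10[:9]
    return body + isbn13_check_digit(body)


def to_isbn10(isbn13: str) -> Optional[str]:
    """Only 978-prefixed ISBN-13s have an ISBN-10 equivalent."""
    if not isbn13.startswith("978"):
        return None
    body = isbn13[3:12]
    return body + isbn10_check_digit(body)


if __name__ == "__main__":
    fixed = fix_checksum("978788430094X")
    print(fixed, to_isbn10(fixed))  # 9787884300945 788430094X
```

Run against the rows above, it yields 9787884300945 and 788430094X for 978788430094X, and 498085821X and 9784980858219 for 4980858214, matching the index_vector column. Because it does not check for a 978/979 prefix before repairing a 13-digit value, it also turns 0553101315175 into 0553101315172, the same value the index shows for source 7.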
Thanks for all of your attention on this, Jason; it has helped clean
up this area of Evergreen 2.0 nicely, and helped me sharpen my
understanding of the new and improved indexing process that Mike
brought to the table.