[OPEN-ILS-DEV] Bib that blows up the import

Dan Scott denials at gmail.com
Thu Jul 3 15:44:40 EDT 2008


2008/7/3 Dan Scott <denials at gmail.com>:
> 2008/7/3 Frances Dean McNamara <fdmcnama at uchicago.edu>:
>> We are using yaz to convert (we already have a setup using that for our AquaBrowser weekly dumps of the db, so they used that).  So this happened when it was running with the xml parameter on a yaz file; then I reproduced the problem with a straight marc file using the perl.
>>
>> I'll ask Dale to look at your yaz command line as opposed to the one we have been using.  Thanks.
>>
>> I guess what we have discovered is that we may have to spend some time on a custom conversion bib program if we went with this as all sorts of interesting issues may show up in such a big file.  Turns out the process would skip that record and go on but I don't think it writes an error which we would need.
>>
>> That was LC cataloging, so apparently sometimes they do add a 500 with no subfield code.  The problem looks like it happens when the subfield delimiter and code are missing AND the text starts with a quotation mark.  We won't try to fix it right now, just note it as an issue.
>>
>
> Ah, it's actually very helpful to provide the exact toolset /
> processing chain you're using when looking for help debugging a
> problem. I retract any aspersions that may have been cast on
> MARC::Record / MARC::File::XML!
>
> And embarrassingly for me, if you look at the XML record I sent, it
> has <subfield code="&quot;">August 1993"</subfield> for the offending
> subfield rather than <subfield code="a">August 1993"</subfield>. So
> yaz 2.1.56 doesn't resolve that problem. I wouldn't be surprised if a
> newer version resolves that, though.

Well, I just tried 3.0.34 (released just a few weeks ago) and it shows
exactly the same problem. And reading the MARC21 specs for subfield
codes, it's not a bug: the " symbol is one of the characters reserved
for local definition as a data element identifier:

http://www.loc.gov/marc/specifications/specrecstruc.html#varifields

So yes, you'll have to either preprocess your MARC21 data, or
post-process the MARC21XML data, if you really want a subfield 'a'
where you currently have a subfield '"'. I've used the latter approach
for this problem in the past; it's pretty straightforward to parse
through the XML and globally change:

<subfield code="&quot;">

to:

<subfield code="a">"
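
For what it's worth, here is a minimal sketch of that post-processing
step in Python. It assumes the converter writes the attribute exactly as
code="&quot;" (which is what the record I sent shows); the function
name fix_quote_subfields is just made up for illustration, and the
whole thing is a plain string substitution rather than a real XML parse:

```python
# Hypothetical helper: plain-text substitution over MARC21XML output.
# Assumes the offending attribute is serialized exactly as code="&quot;".
OLD = '<subfield code="&quot;">'
NEW = '<subfield code="a">"'  # code 'a', plus the quotation mark put back in the text


def fix_quote_subfields(marcxml: str) -> str:
    """Turn every subfield coded '"' into a subfield 'a' whose text starts with '"'."""
    return marcxml.replace(OLD, NEW)


if __name__ == "__main__":
    sample = '<subfield code="&quot;">August 1993"</subfield>'
    print(fix_quote_subfields(sample))
```

Note that this changes every subfield coded '"', so if any of your
records use that code deliberately as a local data element identifier,
you would want to restrict the substitution to the fields you care about.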

-- 
Dan Scott
Laurentian University

