[OPEN-ILS-DEV] Issues with direct_ingest.pl
Warren Layton
warren.layton at gmail.com
Mon Nov 17 21:46:39 EST 2008
I recently imported nearly 300,000 Unicorn records into Evergreen,
many with accented characters (we have a lot of French and bilingual
English-French material).
However, 17 records were rejected by direct_ingest.pl. A handful of
these rejects are our fault, but I think most of the others are
hitting some sort of corner case.
What I suspect is happening is that direct_ingest.pl rejects records
that have an accented character between square brackets ("[" and "]")
in a field. For example, a record with the following 260 subfield will
be rejected:
<subfield code=\"b\">[Bibliothèque nationale du Canada],</subfield>
However, if I remove _either_ the square brackets _or_ the combining
grave accent ("̀", U+0300),
the record will be successfully processed. My first guess was that the
square brackets ("[" and "]") would need to be escaped for JSON but
since all of these tags are in a string, the only characters that
should need escaping are double quotes (") and backslashes.
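To illustrate the escaping point, here is a minimal Python sketch. It uses Python's standard json module, not the Perl JSON code that direct_ingest.pl actually runs, so it only shows what the JSON format itself requires. The decomposed spelling of the accent ("e" followed by U+0300) is an assumption based on the character quoted above.

```python
import json
import unicodedata

# The 260$b value from the rejected record. Writing the accent as
# "e" + U+0300 (combining grave accent) is an assumption about the data.
subfield = '<subfield code="b">[Bibliothe\u0300que nationale du Canada],</subfield>'

# Encode as a JSON string (Python's default is ensure_ascii=True).
encoded = json.dumps(subfield)

# Double quotes are backslash-escaped; square brackets pass through as-is.
quotes_escaped = '\\"b\\"' in encoded
brackets_untouched = "[" in encoded and "]" in encoded

# The encoded string decodes back to the original value unchanged.
round_trips = json.loads(encoded) == subfield

# NFC normalization folds "e" + U+0300 into the single character U+00E8.
composed = unicodedata.normalize("NFC", subfield)
```

If the brackets were a JSON-level problem, the round trip here would fail. Since it does not, the combination of brackets and a combining character is more likely tripping something specific to the ingest code.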
I should also note that other records with accented characters (e.g.,
"è") and/or square brackets were processed just fine, but I
think those with accented characters _between_ square brackets, as
above, are all in the reject pile (not 100% sure, but I'll be
verifying this ASAP).
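One way to check this hypothesis against the reject pile is sketched below in Python. The function name is hypothetical, and "accented" is approximated as "any non-ASCII character", which also catches combining marks such as U+0300.

```python
import re

# Matches the text between one pair of square brackets (no nesting).
BRACKETED = re.compile(r"\[([^\[\]]*)\]")

def has_accent_in_brackets(text):
    """Return True if any [...] span in text contains a non-ASCII character."""
    for match in BRACKETED.finditer(text):
        if any(ord(ch) > 127 for ch in match.group(1)):
            return True
    return False
```

Running this over each rejected record and over a sample of accepted ones would show whether the pattern really separates the two groups.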
I'm attaching a single problem record that's already been passed
through marc2bre.pl in case anyone wants to try reproducing the
problem. Field 260, subfield b contains the problem. I can provide
others if necessary.
I'll continue debugging but I would appreciate any insights!
Cheers,
Warren
-------------- next part --------------
A non-text attachment was scrubbed...
Name: eg_sample.bre.gz
Type: application/x-gzip
Size: 834 bytes
Desc: not available
Url : http://libmail.georgialibraries.org/pipermail/open-ils-dev/attachments/20081117/846e83b9/attachment-0001.bin