[OPEN-ILS-DEV] direct_ingest.pl, biblio_fingerprint.js and Unicode chars

Warren Layton warren.layton at gmail.com
Fri Nov 27 15:33:33 EST 2009


I'm trying to import a number of bib records with "special" characters
in the MARC fields. I've gotten as far as running direct_ingest.pl, but
I'm noticing that biblio_fingerprint.js chokes on a few of them.

Looking a bit closer, I noticed that biblio_fingerprint.js chops
character codes down to the two least significant hex digits. For example,
biblio_fingerprint.js turns "Č" (U+010C) into "&#x0c" ("form feed"),
which causes direct_ingest.pl to skip the record and output the
following error:

  "Couldn't process record: invalid character encountered while
parsing JSON string"
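
The truncation described above can be reproduced with a minimal
JavaScript sketch. The helper names below are hypothetical and are not
the actual code in biblio_fingerprint.js; they only assume that the
character code is being cut to its low byte somewhere before the JSON
is parsed.

```javascript
// Hypothetical helpers for illustration only; NOT taken from
// biblio_fingerprint.js.

// Suspected behaviour: only the low byte of the character code is kept,
// so U+010C (0x010C) is emitted as 0x0C, the form feed control character.
function escapeLowByteOnly(ch) {
  const code = ch.charCodeAt(0) & 0xff;
  return "&#x" + code.toString(16).padStart(2, "0") + ";";
}

// Expected behaviour: the full code point is kept.
function escapeFullCodePoint(ch) {
  const code = ch.codePointAt(0);
  return "&#x" + code.toString(16).toUpperCase().padStart(4, "0") + ";";
}

// Strict JSON parsers reject raw control characters (U+0000 to U+001F)
// inside a string, which would explain the "invalid character" error.
function parsesAsJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(escapeLowByteOnly("\u010C"));   // truncated entity
console.log(escapeFullCodePoint("\u010C")); // full entity
console.log(parsesAsJson('"\u000c"'));      // raw form feed in a JSON string
console.log(parsesAsJson('"\u010C"'));      // the original character
```

If the real cause is a similar mask or a two-digit format width, keeping
the full code point when escaping should avoid producing the control
character.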

Attached is a sample record that causes this problem for me (the
tarball includes both the original MARCXML and the BRE file generated
from it by marc2bre.pl). Any help would be appreciated! I can open a
bug on Launchpad, too, if needed.

Cheers,
 Warren
-------------- next part --------------
A non-text attachment was scrubbed...
Name: iii_prob_record.tar.gz
Type: application/x-gzip
Size: 1231 bytes
Desc: not available
Url : http://libmail.georgialibraries.org/pipermail/open-ils-dev/attachments/20091127/2c715bf5/attachment.bin 

