[OPEN-ILS-DEV] Problem with utf8 and MARC Edit

Alan Rykhus alan.rykhus at mnsu.edu
Wed Jun 9 09:57:21 EDT 2010


Hello,

When we try to save a record in MARC Edit, we get the following
error:


Network or server failure.  Please check your Internet connection to
balsam.mnpals.net and choose Retry Network.  If you need to enter
Offline Mode, choose Ignore Errors in this and subsequent dialogs.  If
you believe this error is due to a bug in Evergreen and not network
problems, please contact your help desk or friendly Evergreen
administrators, and give them this information:
method=open-ils.cat.biblio.record.xml.update
params=["3cb05162d451a7aa5640490ffde742ca",99254,"<record
xsi:schemaLocation=\"http://www.loc.gov/MARC21/slim
http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd\" xmlns:xsi=
\"http://www.w3.org/2001/XMLSchema-instance\" xmlns=
\"http://www.loc.gov/MARC21/slim\">\n  <leader 
.
.
.
<subfield code=\"c\">99254</subfield>\n  </datafield>\n</record>"]
THROWN:
{"payload":[],"debug":"osrfMethodException :  *** Call to
[open-ils.cat.biblio.record.

I've traced the problem down to the function 'sub entityize' in
Application::AppUtils.

In this function there is a call to:

     $string = decode_utf8($string);

The problem seems to be that the record (string) has already been
decoded into perl's internal utf8 format. If you check the string with:

     is_utf8($string)

a true value is returned. Should this call to decode_utf8() be
wrapped? For example:

        if (! is_utf8($string)) {
            $string = decode_utf8($string);
        }

As I understand it, the purpose of decode_utf8 is to convert a string of
UTF-8 octets into the internal utf8 format used by perl and to turn the
utf8 flag on. If the flag is already on, as reported by the is_utf8
call, it does not make sense to decode the string a second time.
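
To illustrate the guard outside of Evergreen, here is a minimal sketch.
The helper name decode_if_needed is hypothetical (it is not what
AppUtils calls anything), and the sample strings are made up; the only
real calls are decode_utf8, encode_utf8 and is_utf8 from Encode.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8 is_utf8);

# Hypothetical helper: decode only when the utf8 flag is off.
sub decode_if_needed {
    my ($string) = @_;
    return $string unless defined $string;
    if (!is_utf8($string)) {
        $string = decode_utf8($string);
    }
    return $string;
}

# A character string; U+263A forces perl to store it with the flag on.
my $chars = "caf\x{e9} \x{263a}";

# The same text as raw UTF-8 octets; the flag is off here.
my $octets = encode_utf8($chars);

# Already-decoded input passes through untouched,
# raw octets are decoded exactly once.
my $from_chars  = decode_if_needed($chars);
my $from_octets = decode_if_needed($octets);
```

Both results should compare equal to the original character string, so
the caller no longer has to know which form it was handed.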

In addition, according to the perl documentation:

is_utf8(STRING [, CHECK]) 

[INTERNAL] Tests whether the UTF8 flag is turned on in the STRING. If
CHECK is true, also checks the data in STRING for being well-formed
UTF-8. Returns true if successful, false otherwise.


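So calling is_utf8 with a true CHECK argument, e.g. is_utf8($string, 1),
makes sure we have a well-formed string when the utf8 flag is indeed on.
Without CHECK, as in the snippet above, only the flag itself is tested.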
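
A small sketch of the difference between the two forms of the call. The
malformed string is built artificially with Encode::_utf8_on, which is
an internal function I am using only to force the flag onto a lone
0xE9 byte; real data would not normally be produced this way.

```perl
use strict;
use warnings;
use Encode qw(is_utf8);

# Well-formed character string, flag on because of U+263A.
my $good = "caf\x{e9} \x{263a}";

# A single 0xE9 byte is not valid UTF-8 on its own; force the flag on
# anyway to simulate a string whose flag and data disagree.
my $bad = "\xE9";
Encode::_utf8_on($bad);

# Flag-only test: both strings report true.
my $good_flag = is_utf8($good) ? 1 : 0;
my $bad_flag  = is_utf8($bad)  ? 1 : 0;

# With CHECK true the data is validated as well.
my $good_checked = is_utf8($good, 1) ? 1 : 0;
my $bad_checked  = is_utf8($bad, 1)  ? 1 : 0;
```

So if the guard is meant to catch bad data as well as avoid a double
decode, the two-argument form is the one to use.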

Gosh, I hope this makes sense (because it fixes the problem we're seeing).
-- al


-- 
Alan Rykhus
PALS, A Program of the Minnesota State Colleges and Universities 
(507)389-1975
alan.rykhus at mnsu.edu
"It's hard to lead a cavalry charge if you think you look funny on a
horse" ~ Adlai Stevenson


