[OPEN-ILS-DEV] Re: Problem with utf8 and MARC Edit

Dan Scott dan at coffeecode.net
Wed Jun 9 11:50:43 EDT 2010


On Wed, 2010-06-09 at 08:57 -0500, Alan Rykhus wrote:
> Hello,
> 
> We're having a problem in MARC Edit: when we try to save a record,
> we get the following:
> 
> 
> Network or server failure.  Please check your Internet connection to
> balsam.mnpals.net and choose Retry Network.  If you need to enter
> Offline Mode, choose Ignore Errors in this and subsequent dialogs.  If
> you believe this error is due to a bug in Evergreen and not network
> problems, please contact your help desk or friendly Evergreen
> administrators, and give them this information:
> method=open-ils.cat.biblio.record.xml.update
> params=["3cb05162d451a7aa5640490ffde742ca",99254,"<record
> xsi:schemaLocation=\"http://www.loc.gov/MARC21/slim
> http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd\" xmlns:xsi=
> \"http://www.w3.org/2001/XMLSchema-instance\" xmlns=
> \"http://www.loc.gov/MARC21/slim\">\n  <leader 
> .
> .
> .
> <subfield code=\"c\">99254</subfield>\n  </datafield>\n</record>"]
> THROWN:
> {"payload":[],"debug":"osrfMethodException :  *** Call to
> [open-ils.cat.biblio.record.
> 
> I've traced the problem down to the function 'sub entityize' in
> Application::AppUtils.
> 
> In this function there is a call to:
> 
>      $string = decode_utf8($string);
> 
> The problem seems to be that the record(string) is already in utf8. If
> you check the string with:
> 
>      is_utf8($string)
> 
> a true response will be returned. Should this call to decode_utf8() be
> wrapped? For example:
> 
>         if (! is_utf8($string)) {
>             $string = decode_utf8($string);
>         }
> 
> It seems that the purpose of decode_utf8 is to put the string into
> Perl's internal utf8 format and to turn the utf8 flag on. If the flag
> is already on, as determined by the is_utf8 call, it does not make
> sense to decode_utf8 a string that is already utf8.
> 
> In addition, according to the Perl documentation:
> 
> is_utf8(STRING [, CHECK]) 
> 
> [INTERNAL] Tests whether the UTF8 flag is turned on in the STRING. If
> CHECK is true, also checks the data in STRING for being well-formed
> UTF-8. Returns true if successful, false otherwise.
> 
> 
> So the is_utf8 call makes sure we have a well-formed string when the
> utf8 flag is indeed on.
> 
> Gosh, I hope this makes sense (because it fixes the problem we're seeing)
> -- al
> 
> 

Hi Alan:

You forgot to mention which version of Evergreen you are running and
what Linux distribution you're running on.

Also, on decode_utf8() vs. is_utf8(), Perl best practices
(http://juerd.nl/site.plp/perluniadvice) suggest that you stay the hell
away from is_utf8(). decode_utf8() is supposed to detect if the incoming
string is already UTF-8, and if it is, pass it back untouched.

If you have the buggy version of Encode.pm, as Dan Wells pointed out,
decode_utf8() is probably giving the string a bad touch and causing your
problems.
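
For what it's worth, here is a minimal sketch of the general pattern that
advice points toward: decode the raw octets once, where they enter the
program, and encode once on the way out, without consulting is_utf8() at
all. The helper name to_characters() is made up for illustration and is
not anything in AppUtils; whether entityize() can be restructured this
way is an assumption I have not checked.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of Evergreen): decode raw UTF-8 octets
# exactly once, at the point where they enter the program.
sub to_characters {
    my ($octets) = @_;
    # FB_CROAK makes malformed input die instead of being silently
    # patched up. With a CHECK value set, decode() may modify its
    # argument in place, so it only ever sees our lexical copy.
    return decode('UTF-8', $octets, Encode::FB_CROAK);
}

my $octets = "caf\xC3\xA9";            # 5 octets: "cafe" with e-acute
my $chars  = to_characters($octets);   # 4 characters

printf "%d octets, %d characters\n", length($octets), length($chars);

# Encode once on the way back out.
my $out = encode('UTF-8', $chars);
print "round trip ok\n" if $out eq $octets;
```

Run as-is, this should report 5 octets and 4 characters and confirm the
round trip. The point is only that the code keeps track of which
variables hold octets and which hold characters, so nothing ever has to
guess from the flag.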

Dan


