[open-ils-commits] r15320 - in trunk/Open-ILS/src/perlmods/OpenILS: Application WWW (dbs)

svn at svn.open-ils.org
Wed Jan 13 23:54:51 EST 2010


Author: dbs
Date: 2010-01-13 23:54:48 -0500 (Wed, 13 Jan 2010)
New Revision: 15320

Modified:
   trunk/Open-ILS/src/perlmods/OpenILS/Application/AppUtils.pm
   trunk/Open-ILS/src/perlmods/OpenILS/WWW/SuperCat.pm
Log:
Move the decode_utf8() call for various feeds to entityize()

decode_utf8() is special in that it returns a string unchanged if its
'utf8' flag is already set, so it is safe to call multiple times on the
same string.
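As a rough illustration of the decode-once idea, here is a Python analogue. It is not the Perl code from this commit. Python has no 'utf8' flag, so the bytes/str distinction stands in for it. `ensure_text` and `entityize` are hypothetical stand-ins for decode_utf8() and `$U->entityize()`. The NFC default for forms other than 'D' and the `&#xHEX;` entity format are assumptions, because the diff cuts off the else branch and the escaping code.

```python
import unicodedata


def ensure_text(data):
    """Decode UTF-8 bytes to text; leave text untouched.

    Hypothetical analogue of a decode step that is safe to call repeatedly:
    once the value is text (in Perl, once the 'utf8' flag is set),
    further calls are no-ops.
    """
    if isinstance(data, bytes):
        return data.decode("utf-8")
    return data


def entityize(data, form=""):
    """Hypothetical sketch of entityize(): decode, normalize, escape non-ASCII."""
    text = ensure_text(data)  # decode inside entityize, as this commit does
    # Assumption: NFD for form 'D', NFC otherwise.
    text = unicodedata.normalize("NFD" if form == "D" else "NFC", text)
    # Each non-ASCII code point becomes a hex character reference.
    return "".join(ch if ord(ch) < 0x80 else "&#x%X;" % ord(ch) for ch in text)


raw = "Montréal".encode("utf-8")
print(entityize(raw))               # Montr&#xE9;al
print(entityize(ensure_text(raw)))  # same result: the extra decode is a no-op
```

If the decode step were skipped and each byte were treated as its own character, the same input would come out as `Montr&#xC3;&#xA9;al`. That is the failure the new comment in the AppUtils.pm hunk guards against.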

As it turns out, SRU (and in turn Z39.50) is suffering from double-encoded
search terms, so we have to forcefully decode the terms a second time with
the decode('utf8', ...) variation, which does not respect the 'utf8' flag.

This will enable Z39.50 and SRU queries to actually return results for
terms like 'Montréal' and 'Québec'. Eventually we need to figure out
where in the SRU/CGI stack the strings are being incorrectly encoded in
the first place, but for now a much-improved Z39.50 server seems acceptable.
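The double-encoding can be reproduced with a small Python sketch. This is an illustration only; the commit does not identify the real SRU/CGI code path. The sketch assumes that UTF-8 bytes were misread as Latin-1 and then encoded to UTF-8 a second time. A single decode leaves mojibake such as 'MontrÃ©al'. The outer decode('utf8', ...) in the Perl fix treats a character string whose code points are all below 256 as octets. Here that step corresponds to re-encoding as Latin-1 and decoding as UTF-8. `double_encode` and `double_decode` are hypothetical helper names.

```python
def double_encode(text):
    """Simulate the bug: UTF-8 bytes misread as Latin-1, then encoded again."""
    return text.encode("utf-8").decode("latin-1").encode("utf-8")


def double_decode(raw):
    """Undo two layers of UTF-8 encoding."""
    once = raw.decode("utf-8")  # still mojibake, e.g. 'MontrÃ©al'
    # Every code point in `once` is below 256, so Latin-1 maps it back
    # to the original UTF-8 bytes one-to-one before the second decode.
    return once.encode("latin-1").decode("utf-8")


for term in ("Montréal", "Québec"):
    garbled = double_encode(term)
    print(garbled.decode("utf-8"), "->", double_decode(garbled))
```

The forced second decode only works if the input really is double-encoded. In this sketch, `double_decode("Montréal".encode("utf-8"))` raises `UnicodeDecodeError`. This is one reason the log treats the fix as a stopgap until the source of the extra encoding is found.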


Modified: trunk/Open-ILS/src/perlmods/OpenILS/Application/AppUtils.pm
===================================================================
--- trunk/Open-ILS/src/perlmods/OpenILS/Application/AppUtils.pm	2010-01-14 04:22:13 UTC (rev 15319)
+++ trunk/Open-ILS/src/perlmods/OpenILS/Application/AppUtils.pm	2010-01-14 04:54:48 UTC (rev 15320)
@@ -14,6 +14,7 @@
 use Unicode::Normalize;
 use OpenSRF::Utils::SettingsClient;
 use UUID::Tiny;
+use Encode;
 
 # ---------------------------------------------------------------------------
 # Pile of utilty methods used accross applications.
@@ -1485,6 +1486,10 @@
     my($self, $string, $form) = @_;
 	$form ||= "";
 
+	# If we're going to convert non-ASCII characters to XML entities,
+	# we had better be dealing with a UTF8 string to begin with
+	$string = decode_utf8($string);
+
 	if ($form eq 'D') {
 		$string = NFD($string);
 	} else {

Modified: trunk/Open-ILS/src/perlmods/OpenILS/WWW/SuperCat.pm
===================================================================
--- trunk/Open-ILS/src/perlmods/OpenILS/WWW/SuperCat.pm	2010-01-14 04:22:13 UTC (rev 15319)
+++ trunk/Open-ILS/src/perlmods/OpenILS/WWW/SuperCat.pm	2010-01-14 04:54:48 UTC (rev 15320)
@@ -874,7 +874,7 @@
 
 
 	print "Content-type: ". $feed->type ."; charset=utf-8\n\n";
-	print $U->entityize(decode_utf8($feed->toString)) . "\n";
+	print $U->entityize($feed->toString) . "\n";
 
 	return Apache2::Const::OK;
 }
@@ -951,7 +951,7 @@
 
 
 	print "Content-type: ". $feed->type ."; charset=utf-8\n\n";
-	print $U->entityize(decode_utf8($feed->toString)) . "\n";
+	print $U->entityize($feed->toString) . "\n";
 
 	return Apache2::Const::OK;
 }
@@ -1676,9 +1676,14 @@
 	my ($shortname, $holdings) = $url =~ m#/?([^/]*)(/holdings)?#;
 
 	if ( $resp->type eq 'searchRetrieve' ) {
-		my $cql_query = decode_utf8($req->query);
-		my $search_string = decode_utf8($req->cql->toEvergreen);
 
+		# These terms are arriving to us double-encoded, so until we
+		# figure out where in the CGI/SRU chain that's happening, we
+		# have to forcefully decode them a second time with
+		# the outer decode('utf8', $string) call
+		my $cql_query = decode('utf8', decode_utf8($req->query));
+		my $search_string = decode('utf8', decode_utf8($req->cql->toEvergreen));
+
 		# Ensure the search string overrides the default site
 		if ($shortname and $search_string !~ m#site:#) {
 			$search_string .= " site:$shortname";


