[OPEN-ILS-DEV] PATCH: osrf_json.c (rewrite of jsonObjectToJSON[Raw])
Bill Erickson
erickson at esilibrary.com
Tue Nov 18 17:32:00 EST 2008
On Tue, 18 Nov 2008 16:01:51 -0500, Scott McKellar <mck9 at swbell.net> wrote:
> --- On Tue, 11/18/08, Bill Erickson <erickson at esilibrary.com> wrote:
>
>> From: Bill Erickson <erickson at esilibrary.com>
> <snip>
>
>> I've run into one issue with these last 2 patches, in
>> particular, there
>> seems to be a subtle difference in uescape() and
>> buffer_append_uescape().
>> The majority of the JSON parsing succeeds, but parsing
>> certain
>> bibliographic data (MARC records) always results in NULL
>> when attempting
>> to encode the containing object (biblio_record entry
>> objects). If I
>> replace the call to buffer_append_uescape() with a call to
>> uescape() in
>> osrf_json_object.c, then all is well.
>>
>> I would send you some sample data to test, but I've yet
>> to recreate the
>> issue outside of a running system. For now, I'm adding
>> some logging to
>> see if I can track it down. I just wanted to pass this on
>> in case any
>> lightbulbs went off. I'll let you if I find
>> something...
>
> I'm not sure what you mean by "always results in NULL". What is NULL?
> The return from the original uescape()? That would indicate invalid
> UTF-8 characters in the input string.
>
> Maybe you mean that the buffer_append_uescape() seems to work, but the
> resulting JSON string later fails to parse correctly.
>
> In my test harness I translated things both ways and tested with strcmp()
> to verify that I got the same results. However I didn't test with
> non-ASCII characters, so that's likely to be the weakest link. It may
> be that I'm mangling the UTF-8.
>
> I'm more inclined to think that you have invalid UTF-8 characters in some
> of your MARC records. You wouldn't know about it because uescape()
> silently discards an entire input string if any of it is invalid. The
> new function translates as far as it can before it hits the invalid
> characters. I *think* that the result would still be valid JSON, because
> we only translate the values of strings, which get enclosed by double
> quotes anyway. However the semantics of that string might be invalid
> at some higher level.
>
> For diagnostic purposes, I suggest either or both of the following:
>
> 1. Check the return code from buffer_append_uescape(), and log a
> message if it's non-zero.
>
> 2. Tweak buffer_append_uescape() so that, if it detects invalid UTF-8,
> it restores the growing_buffer to its original length. In effect, this
> change would make buffer_append_uescape() behave like uescape() in the
> face of invalid UTF-8. I can send you a patch for this if you like, but
> I don't know if you would want it based on the original code or on the
> already patched version.
>
> In the long run we should probably think about how best to respond to
> invalid UTF-8. For now we just want to identify the problem and fix it.
>
> Meanwhile you can get most of the speedup by reverting to the original
> uescape function but otherwise keeping the rewrite of jsonObjectToJSON().
>
> Scott McKellar
> http://home.swbell.net/mck9/ct/
>
OK, I tracked down 2 problems, the first makes sense, the second, not so
much.
The first problem was the result of attempting to escape quotes (") and
backslashes (\) in the final if/else block of buffer_append_uescape().
Since " and \ are both >= ' ', they were not getting escaped.
The second was caused by NULL jsonObject's being passed from
OSRF_LIST_GET_INDEX (osrf_json_object:337) into add_json_to_buffer().
osrfList allows NULL entries, so it seems the old could should have been
dying here, but for some reason only the new code suffers.
Slightly modified patch committed:
http://svn.open-ils.org/trac/OpenSRF/changeset/1498
Thanks again, Scott. Eyes on on the latest patch would be appreciated.
-b
--
Bill Erickson
| VP, Software Development & Integration
| Equinox Software, Inc. / The Evergreen Experts
| phone: 877-OPEN-ILS (673-6457)
| email: erickson at esilibrary.com
| web: http://esilibrary.com
More information about the Open-ils-dev
mailing list