[OPEN-ILS-DEV] PATCH: osrf_json.c (rewrite of jsonObjectToJSON[Raw])

Scott McKellar mck9 at swbell.net
Tue Nov 18 16:01:51 EST 2008


--- On Tue, 11/18/08, Bill Erickson <erickson at esilibrary.com> wrote:

> From: Bill Erickson <erickson at esilibrary.com>
<snip>

> I've run into one issue with these last 2 patches, in
> particular, there  
> seems to be a subtle difference in uescape() and
> buffer_append_uescape().   
> The majority of the JSON parsing succeeds, but parsing
> certain  
> bibliographic data (MARC records) always results in NULL
> when attempting  
> to encode the containing object (biblio_record entry
> objects).    If I  
> replace the call to buffer_append_uescape() with a call to
> uescape() in  
> osrf_json_object.c, then all is well.
> 
> I would send you some sample data to test, but I've yet
> to recreate the  
> issue outside of a running system.  For now, I'm adding
> some logging to  
> see if I can track it down.  I just wanted to pass this on
> in case any  
> lightbulbs went off.  I'll let you if I find
> something...

I'm not sure what you mean by "always results in NULL".  What is NULL?
The return from the original uescape()?  That would indicate invalid
UTF-8 characters in the input string.

Maybe you mean that the buffer_append_uescape() seems to work, but the
resulting JSON string later fails to parse correctly.

In my test harness I translated things both ways and tested with strcmp()
to verify that I got the same results.  However I didn't test with
non-ASCII characters, so that's likely to be the weakest link.  It may
be that I'm mangling the UTF-8.

I'm more inclined to think that you have invalid UTF-8 characters in some
of your MARC records.  You wouldn't know about it because uescape()
silently discards an entire input string if any of it is invalid.  The
new function translates as far as it can before it hits the invalid
characters.  I *think* that the result would still be valid JSON, because
we only translate the values of strings, which get enclosed by double
quotes anyway.  However the semantics of that string might be invalid
at some higher level.

For diagnostic purposes, I suggest either or both of the following:

1. Check the return code from buffer_append_uescape(), and log a 
message if it's non-zero.

2. Tweak buffer_append_uescape() so that, if it detects invalid UTF-8,
it restores the growing_buffer to its original length.  In effect, this
change would make buffer_append_uescape() behave like uescape() in the
face of invalid UTF-8.  I can send you a patch for this if you like, but
I don't know if you would want it based on the original code or on the 
already patched version.

In the long run we should probably think about how best to respond to
invalid UTF-8.  For now we just want to identify the problem and fix it.

Meanwhile you can get most of the speedup by reverting to the original
uescape function but otherwise keeping the rewrite of jsonObjectToJSON().

Scott McKellar
http://home.swbell.net/mck9/ct/




More information about the Open-ils-dev mailing list