[OPEN-ILS-DEV] PATCH: osrf_json_object.c (storing numbers as character strings)

Scott McKellar mck9 at swbell.net
Wed Dec 19 21:15:43 EST 2007


The attached patches supersede the patches attached to the parent 
post.  The comments in my previous post are still valid, with the
following exceptions:

1. jsonObjectSetNumberString() now returns -1 if the string passed
as a parameter is not valid, and 0 otherwise.

2. I renamed the validation routine to jsonIsNumeric.

3. More importantly, the validation routine enforces a narrower
concept of what constitutes a numeric string.  I followed the
JSON grammar found at:

    http://www.json.org/

This grammar is more strict than strtod().  Most obviously, it
doesn't accept hexadecimal numbers, NANs, or infinities.  More
subtly:

a) A number may have a leading minus sign but not a leading plus sign.

b) There must be at least one digit to the left of the decimal.

c) If the first digit is a zero, it must be the only digit to the
left of the decimal.

d) If there is an explicit decimal point, it must be followed by
at least one digit.

Thus, for example, the following strings are not treated as numeric,
although strtod() will happily translate them into doubles:

    "040"    (leading zero is not only dgit to left of decimal)  
    "00.3"   (multiple leading zeros)
    "200."   (explicit decimal but no fractional digits)
    "+4e-7"  (leading plus sign)
    ".92"    (no digits to left of decimal)

(On the other hand the JSON rules permit numbers that strtod() will
choke on, like 1.23e+675867564839989645,)

We may want to relax these rules, but I wanted to start with the
puritanically most correct and rigorous validation.

Much of the JSON we parse is generated internally, from configuration
files or whatever.  We can ensure that the JSON we generate follows
the rules, though I have not begun to do so.  If we receive JSON
from other sources, we may have to either scrub the numbers into a
canonical format or relax the validation rules.

Scott McKellar
http://home.swbell.net/mck9/ct/

Developer's Certificate of Origin 1.1 By making a contribution to
this project, I certify that:

(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license indicated
in the file; or

(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source license
and I have the right under that license to submit that work with
modifications, whether created in whole or in part by me, under the
same open source license (unless I am permitted to submit under a
different license), as indicated in the file; or

(c) The contribution was provided directly to me by some other person
who certified (a), (b) or (c) and I have not modified it; and

(d) In the case of each of (a), (b), or (c), I understand and agree
that this project and the contribution are public and that a record
of the contribution (including all personal information I submit
with it, including my sign-off) is maintained indefinitely and may
be redistributed consistent with this project or the open source
license indicated in the file.

--- Scott McKellar <mck9 at swbell.net> wrote:

> These patches are a first attempt at storing numerical values as
> character strings, rather than doubles, in jsonObjects.
> 
> However they not ready for prime time, so please don't apply them to 
> the main trunk.  First, I have done only the most rudimentary testing
> at this point.  Secondly, the validation of character strings as 
> numeric doesn't really work properly yet; it's just a stopgap until I
> 
> can finish the real function.
> 
> Note that I have NOT done anything yet with the parser, so there 
> should be no visible change in behavior, unless of course I have 
> introduced bugs.
> 
> The purpose of these patches is to serve as a basis for 
> experimentation and discussion.
> 
> -----------------
> 
> First let me mention something which is unrelated to the storage of
> numbers but is obscured by those changes.
> 
> The original JSON_INIT_CLEAR macro freed the string of a JSON_STRING
> only if the object was changing into a non-string.  If it was still
> going to be a JSON_STRING, we still freed the string, but only after
> calling the macro.  I changed the macro to free the string of a 
> JSON_STRING unconditionally, and removed the other free.
> 
> Now, of course, I also free the string if the object is a
> JSON_NUMBER.
> 
> ------------------
> 
> 1. I store a numeric string using the same character pointer that we 
> use for JSON_STRING strings.  The alternative would be to use one
> string for numerics and a separate pointer for non-numerics, but I
> don't see any compelling reason to do it that way.
> 
> 2. I adopted the convention that a NULL pointer is equivalent to
> zero.
> 
> 3. In jsonNewNumberObject() I use a new doubleToString function to
> translate the parameter (a double) into a character string.  Yes, I
> know about the DOUBLE_TO_STRING macro, but I don't like it.  The
> macro always makes two calls to snprintf(), once to find out how big
> a buffer to allocate and once for real.  doubleToString only calls it
> once, except in the rare case where the we need more than the
> preallocated buffer of 64 bytes.
> 
> 4. In jsonObjectFree() I added a line to free the string for a
> JSON_NUMBER.
> 
> 5. add_json_to_buffer() got simpler.  Since the number is already
> in a character string, I don't have to format it into a character
> string before appending it to the growing_buffer.
> 
> 6. Originally jsonObjectGetString() returned NULL for anything but a
> JSON_STRING.  I extended it to return non-NULL for a JSON_NUMBER as
> well.  This change in semantics may or may not be welcome.  I don't
> know if any existing code relies on getting a NULL for a non-string
> object.  If so, we can easily revert to the original semantics.
> 
> 7. jsonObjectGetNumber now uses strtod() to translate the string
> into a double.
> 
> 8. I created a new jsonObjectSetNumberString function to supplement
> the existing jsonObjectSetNumber function.  Instead of accepting a
> double as the numeric value, it accepts a string, which it expects
> to encode a number.
> 
> At this point we need to decide what to do if the character string
> we receive is not numeric.  I have adopted the simplest possible
> policy: If the string is not numeric, I store a NULL pointer, which
> is treated elsewhere as the equivalent of a zero.
> 
> Other possibilities: return an error code, issue error messages, 
> coredump, or whatever suits your fancy.  We can discuss the options.
> 
> 9. I plan to implement a function similar to the existing
> jsonNewNumberObject(), accepting the numerical value as a string 
> instead of a double.  The same issue arises: what do we do with a
> non-numeric string?  Whatever we decide to do, we should do it
> consistently.
> 
> 10. jsonObjectSetNumber uses doubleToString() to format its
> double argument into a character string.
> 
> 11. In jsonObjectClone, I use a two-step procedure to clone a
> JSON_NUMBER.  I first create a JSON_STRING using the numeric string
> from the object to be cloned; then I change the object type to 
> JSON_NUMBER.  It should be possible to do it with a single step,
> but I haven't written the appropriate function yet (see item 9
> above).
> 
> 12. jsonObjectToSimpleString() got simpler, because I don't have to
> translate from a double to a string.
> 
> 13. The new isStringNumeric function is a bit of a cheat at this
> point.  I simply call strtod() and let it do the dirty work.
> 
> This is a temporary stopgap.  It is not a correct solution because
> strtod() accepts things as numeric that we don't want to treat as
> numeric: hexadecimal, NANs, and infinities.  I am working on a better
> solution that will march through the string looking for a valid
> character sequence, but it's not done yet.
> 
> -----------------
> 
> I expect to send at least one more new improved version of 
> osrf_json_object.c, if only to fix isStringNumeric().  I'm not sure
> whether I should send it as a patch based on the current trunk
> version
> or based on the patch I'm sending now.  Or maybe I should send the
> whole thing rather than a patch.
> 
> Scott McKellar
> http://home.swbell.net/mck9/ct/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: osrf_json_h_9.patch
Type: text/x-patch
Size: 499 bytes
Desc: 2161725874-osrf_json_h_9.patch
Url : http://list.georgialibraries.org/pipermail/open-ils-dev/attachments/20071219/18b7a693/osrf_json_h_9.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: osrf_json_object_c_6.patch
Type: text/x-patch
Size: 8337 bytes
Desc: 2539560665-osrf_json_object_c_6.patch
Url : http://list.georgialibraries.org/pipermail/open-ils-dev/attachments/20071219/18b7a693/osrf_json_object_c_6.bin


More information about the Open-ils-dev mailing list