[OPEN-ILS-DEV] C nits: json_parser.c (Part 1)

Scott McKellar mck9 at swbell.net
Sat Apr 21 13:54:17 EDT 2007


For certain kinds of input, the parser in json_parser.c will leak
memory, and possibly misbehave in other respects.

In practice, I expect that the inputs are normally created by other 
programs.  As long as those programs produce sensible output, these
issues won't arise.  However the code remains vulnerable to buggy
programs, if not to deliberate mischief.

The code in this module parses a dialect of the JSON format, extended
so as to include comments.  The comments may be either C-style
( /*...*/ ) or C++-style ( //... ).  The C++-style comments are not an
issue, and I shall not discuss them further.  The C-style comments are
interesting, however, because they can carry hints that are
semantically meaningful.

After some preliminaries we call _json_parse_string(), which is the
top level of a recursive descent parser.  This function skips over
any leading white space and then processes any comments before
descending into the guts of the JSON.

Problems arise if there are multiple hint comments at this point.
At a minimum, we will leak memory.

When we process each comment, we assume that it may carry a classname
hint.  The json_eat_comment function copies the hint to a dynamically 
allocated buffer and assigns it to the classname variable.  If there
is a second hint comment, we overwrite the pointer provided from the
first one, without first freeing the memory to which it points.  If
there is a third hint comment, we leak the memory from the second 
one, and so forth.

As a result, only the last comment counts, and we lose any memory
allocated for the preceding ones.

A hint comment should be of the form /*--E hint--*/ or /*--S hint--*/.
Oddly enough, it makes no difference whether it's an E hint or an S
hint -- the parser treats them identically.

If the comment is well formed, the json_eat_comment function will
parse it correctly, so far as I can tell.  If the comment is not
wll-formed, howeer, the parser will not detect any problem.  It will 
extract a hint one way or another.

If the comment is not well formed, then the resulting hint will
probably be an empty string.  In that case we ignore it -- we don't
overwrite the pointer in the calling function, and no harm is done.

However it is possible to construct a non-hint comment that the
parser will treat as a valid hint.  For example:

     /* This comment is an ill-considered, ill-formed Error */

We march through the characters one by one, ignoring everything but
the hyphens.  When we see the second hyphen, we start looking for
'E' or 'S'.  When we see the first letter of "Error" we start 
collecting characters into our buffer.  The resulting hint is 
"rror ".

Other possible perversities are left as an exercise for the reader.

Even a well-formed hint can get slightly mangled.  For example:

    /*--S well-formed hint --*/

...yields the hint "wellformed hint ", because we don't collect
the hyphen embedded within the hint.  In practice I suspect that
hints don't include hyphens anyway, but the parser shouldn't be
expected to know that.

---------

In summary: _json_parse_string() should implement some explicit
policy about multiple hint comments.  Probably the second hint 
comment should kick out as an error.

Also, json_eat_comment() should do tighter validation of the hint
syntax.  This validation will probably be easier if the finite
state machine uses a single state variable rather than a collection
of booleans.

Warning: tighter validation may unmask bugs in other programs that
produce ill-formed hints.

Scott McKellar
http://home.swbell.net/mck9/aargh/



More information about the Open-ils-dev mailing list