[OPEN-ILS-DEV] Optimizing uescape()
Bill Erickson
erickson at esilibrary.com
Mon Nov 17 15:10:31 EST 2008
On Fri, 14 Nov 2008 14:19:54 -0500, Scott McKellar <mck9 at swbell.net> wrote:
> After taking a bit of a sabbatical to work on the election, I returned to
> the Evergreen project a few days ago. I was rummaging through utils.c
> just to get back into the groove, and lit upon uescape().
Welcome back :)
<snip>
> Both replacement candidates also eliminate the original full_escape
> parameter, which is a boolean. Internally we branch on it in order to
> decide whether to encode control characters or leave them unchanged.
>
> In all cases where we call uescape(), we pass a value of 1 for
> full_escape,
> meaning that we encode control characters. It's a useless parameter
> because in practice it makes no difference.
>
> I don't like boolean parameters because I always have trouble remembering
> whether true means *this* and false means *that*, or the other way
> around.
> If we ever need to provide the "false" behavior we should provide a
> separate function for it, with a name that indicates how it differs.
I can certainly get on board with that.
<snip>
> -------------
>
> There are some issues to resolve concerning the treatment of errors.
>
> If uescape() or full_uescape() encounters invalid Unicode, it returns
> NULL. The calling code can detect the error by checking for NULL, but
> in fact it doesn't. Without noticing or reporting the error, it just
> adds
> nothing to the growing_buffer that it's building, even if some of the
> unescaped string was valid.
>
> If that's a sensible way to respond to invalid Unicode, then I can make
> buffer_append_uescape() behave the same way.
>
> If we want to be able to detect invalid Unicode, then
> buffer_append_uescape() will have to change. Instead of returning a
> pointer to the growing_buffer, it could return a status code. In that
> case it would have to insist on getting a non-NULL pointer to the
> growing_buffer. That may be a wiser policy anyway.
I like the approach of returning a status (and requiring a non-NULL
growing_buffer pointer). In particular, with JSON parsing, it would be
nice to know invalid data was encountered so it can be reported in a
meaningful way.
>
> -------------
>
> My recommendation is to go with some form of buffer_append_uescape().
> It incorporates all of the optimizations in full_uescape(), plus it
> eliminates two mallocs and two frees, plus it doesn't have to do a
> strlen() to guess at a buffer size.
You have my vote on buffer_append_uescape() with full_uescape()
optimizations. That's clearly geared toward how we use the code.
>
> We could also keep the original uescape() around, just in case we ever
> need it again. Otherwise we can reuse the same name for a function with
> a different signature.
I think we should leave the original uescape() in, just to ease the
process of coupling different versions Evergreen and OpenSRF.
-b
--
Bill Erickson
| VP, Software Development & Integration
| Equinox Software, Inc. / The Evergreen Experts
| phone: 877-OPEN-ILS (673-6457)
| email: erickson at esilibrary.com
| web: http://esilibrary.com
More information about the Open-ils-dev
mailing list