[OPEN-ILS-DEV] Optimizing uescape()

Bill Erickson erickson at esilibrary.com
Mon Nov 17 15:10:31 EST 2008


On Fri, 14 Nov 2008 14:19:54 -0500, Scott McKellar <mck9 at swbell.net> wrote:

> After taking a bit of a sabbatical to work on the election, I returned to
> the Evergreen project a few days ago.  I was rummaging through utils.c
> just to get back into the groove, and lit upon uescape().

Welcome back :)

<snip>


> Both replacement candidates also eliminate the original full_escape
> parameter, which is a boolean.  Internally we branch on it in order to
> decide whether to encode control characters or leave them unchanged.
>
> In all cases where we call uescape(), we pass a value of 1 for  
> full_escape,
> meaning that we encode control characters.  It's a useless parameter
> because in practice it makes no difference.
>
> I don't like boolean parameters because I always have trouble remembering
> whether true means *this* and false means *that*, or the other way  
> around.
> If we ever need to provide the "false" behavior we should provide a
> separate function for it, with a name that indicates how it differs.

I can certainly get on board with that.

<snip>


> -------------
>
> There are some issues to resolve concerning the treatment of errors.
>
> If uescape() or full_uescape() encounters invalid Unicode, it returns
> NULL.  The calling code can detect the error by checking for NULL, but
> in fact it doesn't.  Without noticing or reporting the error, it just  
> adds
> nothing to the growing_buffer that it's building, even if some of the  
> unescaped string was valid.
>
> If that's a sensible way to respond to invalid Unicode, then I can make
> buffer_append_uescape() behave the same way.
>
> If we want to be able to detect invalid Unicode, then
> buffer_append_uescape() will have to change.  Instead of returning a
> pointer to the growing_buffer, it could return a status code.  In that
> case it would have to insist on getting a non-NULL pointer to the
> growing_buffer.  That may be a wiser policy anyway.


I like the approach of returning a status (and requiring a non-NULL  
growing_buffer pointer).  In particular, with JSON parsing, it would be  
nice to know invalid data was encountered so it can be reported in a  
meaningful way.


>
> -------------
>
> My recommendation is to go with some form of buffer_append_uescape().
> It incorporates all of the optimizations in full_uescape(), plus it
> eliminates two mallocs and two frees, plus it doesn't have to do a
> strlen() to guess at a buffer size.

You have my vote on buffer_append_uescape() with full_uescape()  
optimizations.  That's clearly geared toward how we use the code.

>
> We could also keep the original uescape() around, just in case we ever
> need it again.  Otherwise we can reuse the same name for a function with
> a different signature.

I think we should leave the original uescape() in, just to ease the  
process of coupling different versions Evergreen and OpenSRF.

-b


-- 
Bill Erickson
| VP, Software Development & Integration
| Equinox Software, Inc. / The Evergreen Experts
| phone: 877-OPEN-ILS (673-6457)
| email: erickson at esilibrary.com
| web: http://esilibrary.com


More information about the Open-ils-dev mailing list