[OPEN-ILS-DEV] Dead functions

Scott McKellar mck9 at swbell.net
Tue Dec 16 14:01:31 EST 2008


--- On Mon, 12/15/08, Mike Rylander <mrylander at gmail.com> wrote:

<snip -- my whining about jsonParseFile() and its friends>

> > Two questions:
> >
> > 1. Can we get rid of this whole gaggle of unused functions?
> >
> > 2. If we want to keep some form of jsonParseFile(), would it make
> >    sense to use mmap() instead of loading the file into a
> >    growing_buffer?  I suspect that a memory-mapped file would be
> >    faster to read, but I haven't tried to benchmark it.
>
> YES YES A THOUSAND TIMES YES!
>
> eh hem... Pardon me.  I was stuck in some dark and twisty passages,
> all alike just earlier today.
>
> Thinking about it a bit more rationally, I'd like to see a mmap-based
> function.  I can see a good use for this as the basis of a dump/reload
> utility.  I wonder if it's time to just take an axe to the legacy code
> in trunk, though.  We've got 1.0 out there if there's a need for a
> legacy-to-new bridge.

When you say YES * 1000, I'm not sure if you mean yes let's get rid of
these functions or yes let's use mmap().  In case you meant the latter,
please see the attached.

The attachment is a benchmark program comparing the existing
jsonParseFile() with a new version based on mmap().  My experience so
far is that there's no clear difference in performance.  I suspect,
however, that the mmap() version will be faster for large files.
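
For reference, the heart of the mmap() version looks roughly like
this.  It's a sketch, not the attached tjload.c verbatim, and
jsonParseStringLen() is a hypothetical length-aware entry point: the
stock parser wants a NUL-terminated string, which an mmap()ed region
doesn't guarantee.

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <opensrf/osrf_json.h>   /* for jsonObject; header name from memory */

    /* Hypothetical: a parser entry point that takes an explicit length
       instead of relying on a trailing NUL. */
    jsonObject* jsonParseStringLen( const char* buf, size_t len );

    jsonObject* jsonParseFile_mmap( const char* filename ) {
        jsonObject* obj = NULL;

        int fd = open( filename, O_RDONLY );
        if( fd == -1 )
            return NULL;

        struct stat sb;
        if( fstat( fd, &sb ) == 0 && sb.st_size > 0 ) {
            char* buf = mmap( NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0 );
            if( buf != MAP_FAILED ) {
                obj = jsonParseStringLen( buf, sb.st_size );
                munmap( buf, sb.st_size );
            }
        }

        close( fd );
        return obj;
    }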

Benchmarking is a tricky business, so I have to go into some detail.

The program needs three files of JSON text, with identical contents,
named json_sample_1, json_sample_2, and json_sample_3.  My own sample
files were each 202 bytes long.

First I call the old jsonParseFile and throw away the resulting
jsonObject.  The reason is that we cache and reuse jsonObjects to
reduce the churn on memory.  Without such a first pass, the first
trial would suffer from a spurious disadvantage because the second
trial would benefit from the caching.
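
In code, the warm-up is nothing more than a parse-and-discard before
the timed trials -- something like:

    /* Warm-up pass: parse once and throw the result away, so the
       jsonObject cache is primed before either timed trial runs.
       The file name here is just illustrative. */
    jsonObjectFree( jsonParseFile( "json_sample_1" ) );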

I use three separate but identical files to eliminate the effects of
IO caching by the operating system.  To really eliminate this IO caching
I have to run the program after a reboot (unless there's some other way
that I don't know about).  I tried that once, and it made no obvious
difference.

Because of this IO caching, I didn't apply the usual trick of doing my
trials in long-running loops in order to even out the numbers.  As a
result my timings are a bit jumpy, ranging roughly between 110 microsec
and 160 microsec per trial.
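
The measurement itself is just a pair of gettimeofday() calls around a
single trial, along these lines (time_one_trial() is an illustrative
wrapper, not anything from the tree):

    #include <sys/time.h>
    #include <opensrf/osrf_json.h>   /* jsonObjectFree(); header name from memory */

    /* Time one trial in microseconds; parse_fn stands in for whichever
       version -- old or mmap-based -- is under test. */
    long time_one_trial( jsonObject* (*parse_fn)( const char* ),
            const char* filename ) {
        struct timeval start, end;
        gettimeofday( &start, NULL );
        jsonObject* obj = parse_fn( filename );
        gettimeofday( &end, NULL );
        jsonObjectFree( obj );
        return ( end.tv_sec - start.tv_sec ) * 1000000L
             + ( end.tv_usec - start.tv_usec );
    }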

For the trial of the old version I call a local copy of jsonParseFile(),
to eliminate the possibility that a local function might be faster
or slower than a function in some other module.

Almost always, the new version runs a bit slower than the old version,
by several tens of microsec.  However this difference appears to be
spurious, or at least mostly so.  It results from the fact that I try
the old version first and the new version second.  When I reverse the
sequence, the new version runs a little faster than the old, by a similar
amount.  Even if I call the old version twice in a row, the second one
usually runs a little slower.

I can't account for this sequence-dependent difference in timing.  More
commonly the second time I do something it goes faster, because of
caching at some level.  I don't know why it would go slower.

The bottom line is that, in my benchmark, there's no demonstrable
difference in timing between the old version and the new.

Note however that my test file is small enough to fit easily into
a single IO buffer, and into the first allocation of a growing_buffer.
The results may be different for large files.  Growing a
growing_buffer means repeatedly copying the contents as you expand the
buffer.  If each expansion doubles the capacity, the copies form a
geometric series, so the total work is amortized O(n); with fixed-size
increments it would be O(n-squared).  Loading through mmap() is also
O(n), but it avoids the copying entirely, so it should still come out
ahead for sufficiently large n.
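
As a sanity check on that arithmetic, here is a tiny standalone
program that counts the bytes copied while appending n bytes to a
buffer that doubles whenever it fills -- which is, I believe, the
growth policy growing_buffer actually uses:

    #include <stdio.h>

    int main( void ) {
        long n = 1L << 20;            /* one megabyte of appends */
        long size = 1, len = 0, copied = 0;
        long i;
        for( i = 0; i < n; ++i ) {
            if( len == size ) {
                copied += len;        /* realloc copies the current contents */
                size *= 2;
            }
            ++len;
        }
        printf( "appended %ld bytes, copied %ld (%.2f copies per byte)\n",
            n, copied, (double) copied / n );
        return 0;
    }

It reports about 1.00 copies per appended byte, exactly as the
geometric series predicts.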

I can't readily generate a large enough JSON file to test my theory.
Maybe somebody else can.  However it may be that any difference in
the loading is swamped by the noise of parsing the text and building
the jsonObject.

If you want to build a dump/reload utility it will probably need to
work with large files.  Note however that both the old and the new
methods require the entire file to fit into memory, and that may be
a problem for sufficiently large files.  It might be necessary to
change the parser so that it reads one character at a time, or maybe
a chunk at a time, rather than requiring a complete preloaded string.
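
For what it's worth, such a chunked loader might look something like
the sketch below.  The incremental-parser API (jsonParser,
jsonParserFeed, jsonParserFinish) is entirely hypothetical -- nothing
like it exists in the tree today:

    #include <stdio.h>
    #include <stddef.h>
    #include <opensrf/osrf_json.h>   /* for jsonObject; header name from memory */

    /* Hypothetical incremental-parser API. */
    typedef struct jsonParser jsonParser;
    jsonParser* jsonNewParser( void );
    void jsonParserFeed( jsonParser* p, const char* buf, size_t len );
    jsonObject* jsonParserFinish( jsonParser* p );

    #define CHUNK_SIZE 8192

    jsonObject* jsonParseFileChunked( const char* filename ) {
        FILE* fp = fopen( filename, "r" );
        if( !fp )
            return NULL;

        jsonParser* parser = jsonNewParser();
        char chunk[ CHUNK_SIZE ];
        size_t n;

        /* Feed the parser one bufferful at a time; only CHUNK_SIZE
           bytes are ever resident, no matter how big the file is. */
        while(( n = fread( chunk, 1, sizeof chunk, fp )) > 0 )
            jsonParserFeed( parser, chunk, n );

        fclose( fp );
        return jsonParserFinish( parser );
    }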

Scott McKellar
http://home.swbell.net/mck9/ct/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tjload.c
Type: text/x-csrc
Size: 3800 bytes
Desc: not available
Url : http://libmail.georgialibraries.org/pipermail/open-ils-dev/attachments/20081216/408e3ab1/attachment.c 

