[OPEN-ILS-DEV] PATCH: osrf_hash[ch] (new implementation)

Scott McKellar mck9 at swbell.net
Sun Apr 6 20:16:27 EDT 2008


These patches provide a new and more efficient implementation of
osrfHash, using a hash table for random lookups and a doubly linked
list for iterations.

It is more efficient for two main reasons.  First, for iterations
it doesn't maintain a separate copy of the keys, that then have to go 
through the hashing algorithm.  Second, when updating an existing
item, it updates it in place rather than deleting it and creating 
a new one.

---------

These changes should be mostly transparent to the the existing uses
of osrfHash.  However there are some differences having to do with
sequencing.

When updating an existing item, the old implementation would first
delete it, and then add it back with different data.  As a result,
the item would move to the end of the array used for iterations.
The new implementation updates it in place, leaving it in the same
position for later iterations.

The old implementation of osrfHashKeys() found the items by
traversing the hash table.  The resulting sequence was scrambled
with respect to the sequence of insertion.  The new implementation
finds the items by traversing the linked list.  The resulting
sequence reflects the sequence of insertion.

I don't think any existing code makes any assumptions about the
sequence of iterations, or the sequence of the osrfStringArray
returned by osrfHashKeys().  If I'm mistaken, I can restore either
or both of the original behaviors with a little further tweaking
(at some cost in performance).  However I would expect that if the
sequence makes any difference at all, it is desirable to retain
the sequence of insertion.  The main alternative would be to
sequence by key, but if we want to do that we shouldn't be using
a hash table.

-----------

Another change concerns the behavior of an osrfHashIterator.  What 
happens when you delete the item to which an iterator is pointing?

In either implementation, the deleted item is no longer available,
so any attempt to access it through the iterator will fail.  That's
not surprising.  The difference is in what happens when you try
to advance the iterator.

Let's say we have four items, keyed by Alpha, Bravo, Charlie, and
Delta.  Our iterator points at Bravo.  We delete Bravo and then
advance the iterator.

With the old implementation we'll skip over Charlie and get Delta,
because Charlie now occupies Bravo's old slot in the key array.
With the new implementation we'll get Charlie, just as one might
reasonably expect.

As a result, the new implementation makes it very easy to loop
through the osrfHash and delete selected items.  Under the old
implementation such an operation is possible but more awkward.
So far as I can tell, we don't currently have any code that uses
such a loop, so this change should have no immediate impact.

Similar considerations apply to a loop that iterates through an
osrfHash and updates some or all of the items.  The old implementation
would confuse such a loop by continually rearranging the sequence
along the way.

There is, however, a potential impact on the performance of hashed
searches.  Because a deleted item is deleted only logically rather
than physically, subsequent searches may have skip over it.  This
penalty will be very minor, and is not likely to amount to much
except under extreme circumstances -- lots of hashed searches where
there have been lots of deletions.

--------------

When the hash table has fewer than six items, the new osrfHashGet()
searches the linked list rather than using the hashing algorithm,
in the hope that it will be a bit faster that way.  I haven't tried
to justify this choice by any careful benchmarking, because the
results would be data-dependent.  The longer the key, the longer
the hashing will take.  But a linear search with strcmp() will be
relatively insensitive to key length, depending on how many 
characters strcmp() has to look at before it finds a mismatch.

---------------

I include a test driver program that exercises most of the osrfHash
functionality.  It works with either the old implementation or the
new one, but gives different results accordingly as described above.
In particular, the test driver runs an iterator loop to delete
selected items.  This loop doesn't work correctly with the old
implementation.

I have tried to be fairly thorough in my testing, but for a rewrite
like this it's hard for me to be sure that I've covered all the
possibilities.  I trust that Bill and Mike will do their own testing
before committing.

--------------

I prepared these patches using a fresh download of the svn trunk --
a download that does not reflect the patches I sent yesterday, which
have not been applied.  So these patches include a mixture of
changes.

If you prefer, I can wait till you apply yesterday's patches and
create new patches then.  Or I can send you entire files instead of
patches.  Whatever is least troublesome for you.

Scott McKellar
http://home.swbell.net/mck9/ct/

Developer's Certificate of Origin 1.1 By making a contribution to
this project, I certify that:

(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license indicated
in the file; or

(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source license
and I have the right under that license to submit that work with
modifications, whether created in whole or in part by me, under the
same open source license (unless I am permitted to submit under a
different license), as indicated in the file; or

(c) The contribution was provided directly to me by some other person
who certified (a), (b) or (c) and I have not modified it; and

(d) In the case of each of (a), (b), or (c), I understand and agree
that this project and the contribution are public and that a record
of the contribution (including all personal information I submit
with it, including my sign-off) is maintained indefinitely and may
be redistributed consistent with this project or the open source
license indicated in the file.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: osrf_hash_h_5.patch
Type: text/x-patch
Size: 2041 bytes
Desc: 465638889-osrf_hash_h_5.patch
Url : http://list.georgialibraries.org/pipermail/open-ils-dev/attachments/20080406/d31c934c/osrf_hash_h_5-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: osrf_hash_c_5.patch
Type: text/x-patch
Size: 15062 bytes
Desc: 397513661-osrf_hash_c_5.patch
Url : http://list.georgialibraries.org/pipermail/open-ils-dev/attachments/20080406/d31c934c/osrf_hash_c_5-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: testhash.c
Type: text/x-csrc
Size: 6510 bytes
Desc: 2553492959-testhash.c
Url : http://list.georgialibraries.org/pipermail/open-ils-dev/attachments/20080406/d31c934c/testhash-0001.bin


More information about the Open-ils-dev mailing list