SPAM: Re: [OPEN-ILS-DEV] osrfHash: alternative implementation

Fri Mar 28 14:10:32 EDT 2008

--- Mike Rylander <mrylander at gmail.com> wrote:

> On Mon, Mar 24, 2008 at 8:53 AM, Bill Erickson
> <erickson at esilibrary.com> wrote:
> > Scott McKellar wrote:

<snip>

> I can't get too worked up over hash-key ordering ... associative
> arrays don't have a guaranteed  order in other implementation that I
> know of.

True enough.  However an osrfHash is not purely an associative array.
It's a hybrid.

Consider jsonObjectToJSON(), which translates a jsonObject into text
in JSON format.  Internally, when it outputs a jsonHash, it allocates
an osrfHashIterator and loops through the osrfHash, formatting each
of the sub-objects recursively.  The iterator returns the sub-objects 
in the same sequence that they were originally added.

As a result: if you load JSON text into a jsonObject, and then spit
it out again as text, you can expect to get exactly the same JSON
that you originally read, apart from possible differences in white
space between tokens.

If the iterator didn't preserve the original sequence, then the output
would be unpredictably scrambled relative to the input.  Maybe that
wouldn't matter -- but maybe it would, depending on the application.

I would argue that sequence preservation is a desirable feature, 
even if it isn't essential, and especially if you can get it at no 
extra cost.  Both the existing implementation and my proposed design
preserve the sequence automatically.

<snip>

> >  My feeling is that the ability to change a hash during iteration
> is a
> >  bad thing and retaining sort order on the hash keys is not
> necessary.
> >  If that simplifies your test implementation, great.  

> I've been mulling this over for a couple days, and I've got to
> disagree with Bill.  IFF mid-iteration changes (removal, in
> particular, of a key) can be handled transparently and safely, and
> that guarantee can be carved into the requirements for the API, then
> I
> think it is a decidedly /good/ thing.

Removal is the scenario I had in mind as well.

For an artificial example: suppose your task is to remove all the
entries whose keys start with a 'W'.  It would be simplest and most
convenient to examine each item in turn and delete it if it starts
with a 'W'.  However if deletion invalidates the iterator, things
would get very awkward.  You couldn't delete something until you 
had advanced the iterator past it.  Doable, but clumsy.

Currently, in order to remove something from a hash, you have to
provide the key.  Then osrfHashRemove expands the key, hashes it,
and searches the hash table to find the right item.

That's a lot of wasted effort if you already have the right item
in hand.  I plan to provide an additional removal function that takes
an osrfHashIterator* as its argument.  Since it wouldn't have to go
through the process of finding the item from scratch, it could do
the deletion very efficiently -- as long as it doesn't have to
delete the object physically.  At least in my current design, the
iterator doesn't know where the item is in the hash table.  It
would have to do a hashed search to find it, and we're back where
we started.

When we use this function to delete an item, we need to decide what
to do with the iterator afterwards.  If we leave the now-deleted
item in place, having deleted it logically but not physically, we 
can leave the iterator pointing to it.  The next time we advance
or reset the iterator, the dead item becomes inaccessible.

The other options are: repoint it to the next node, or to the 
previous node, or reset it, or wag our fingers and say "Don't you 
dare use this iterator again."  I think that any of these options is
more likely to invite bugs in the client code than leaving the 
iterator pointing to a logically deleted node.  We can't do that
if we delete the item physically.

Scott McKellar
http://home.swbell.net/mck9/ct/