[OPEN-ILS-GENERAL] Call numbers in Evergreen

Dan Wells dbw2 at calvin.edu
Thu Jul 29 10:11:56 EDT 2010


At some point we will likely be generating 'keys' for sorting our call numbers.  For anyone reading who is not familiar with the idea, you simply add extra characters to pad the call number in the right places so that it sorts properly using a simple and speedy ASCII sort.  For example, in LC:

PS35.A54 A6 V.1

could become:

PS000035.A54 A6 V.000001

In this case I am padding values which must sort numerically out to six digits (an arbitrary width for this example; I am not sure what the largest width we would need is) while keeping everything else the same.  Now any time we need sorted call numbers, we sort on the keys but show the real thing.
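
As a rough illustration of the padding idea, here is a minimal Python sketch.  The function name lc_sort_key and its two regular expressions are my own assumptions for this one example, not anything in Evergreen: it pads only the class number right after the leading letters and any "V." volume number, and leaves the cutters alone.  Real LC call numbers have many more cases than this handles.

```python
import re

def lc_sort_key(call_number, width=6):
    """Hypothetical helper: zero-pad the parts that must sort numerically."""
    # Pad the class number, i.e. the digits right after the leading letters.
    key = re.sub(
        r"^([A-Z]+)(\d+)",
        lambda m: m.group(1) + m.group(2).zfill(width),
        call_number,
    )
    # Pad volume numbers written as "V.<digits>".
    key = re.sub(
        r"\bV\.(\d+)",
        lambda m: "V." + m.group(1).zfill(width),
        key,
    )
    return key

# Sort on the keys, but show the real call numbers.
call_numbers = ["PS100.A1", "PS35.A54 A6 V.10", "PS35.A54 A6 V.2"]
in_shelf_order = sorted(call_numbers, key=lc_sort_key)
```

A plain sort of that list would put PS100 before PS35 and V.10 before V.2; sorting on the padded keys puts them in shelf order.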

So, what about adding another scheme to the mix, like SUDOC?  That might key as:

C 1.2:C 3

becoming:

$C........... 000000000001.000000000002############$C........... 000000000003

(I could be way off there, but bear with me.)  SUDOCs are a pain because letter groups and number groups are arbitrary in length and position, letters sort *before* numbers, and colons are somehow special.

Anyway, assuming the keys work, we now have proper sorting within a group.  But what about between the two groups?  Well, we have a few options.

First, we can just sort the keys as-is, ignoring their wildly different formats.  Since sorting between schemes is indeterminate anyway, that seems fair enough, and will at least be predictable.

Second, we could predetermine a single-character prefix for each scheme.  If we want all LC numbers to sort before all SUDOC numbers, we could simply prefix all the LC keys with 'a' and all the SUDOC keys with 'b'.  Since these will be static, they will not affect sorting within the class, but only between classes.

Third, we could try to be a little more crafty and have the classes at least interfile by first letter.  To do this, we could add an otherwise meaningless '$' to the start of every LC call number key to match the '$' we added to the SUDOC key (which itself was added to force 'letter-groups' before 'number-groups').  This keeps 'B's with 'B's, 'C's with 'C's, and so on.  If there are cases where it doesn't work out, at least we tried, and sorting within each group is still sound.  (Also, I am not certain if SUDOCs can even start with a number, or if they always start with a letter.  If they always start with a letter, we could instead leave out the initial '$' in the SUDOC key.)
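
To make the second and third options concrete, here is a small Python sketch.  The helper names, the 'a'/'b' prefix table, and the BF121.A1 entry are made up for illustration; the other two keys are the example keys from earlier in this message, taken as given.

```python
# Hypothetical prefixes: any characters work as long as they sort in the wanted order.
SCHEME_PREFIX = {"lc": "a", "sudoc": "b"}

def grouped_key(scheme, key):
    """Second option: a fixed prefix per scheme keeps each scheme together."""
    return SCHEME_PREFIX[scheme] + key

def interfiled_key(scheme, key):
    """Third option: give LC keys the leading '$' the SUDOC keys already have."""
    return "$" + key if scheme == "lc" else key

# (scheme, sort key, call number as displayed)
items = [
    ("sudoc",
     "$C........... 000000000001.000000000002############$C........... 000000000003",
     "C 1.2:C 3"),
    ("lc", "PS000035.A54 A6 V.000001", "PS35.A54 A6 V.1"),
    ("lc", "BF000121.A1", "BF121.A1"),  # made-up LC entry for the example
]

grouped = [label for _, _, label in
           sorted(items, key=lambda it: grouped_key(it[0], it[1]))]
interfiled = [label for _, _, label in
              sorted(items, key=lambda it: interfiled_key(it[0], it[1]))]

print(grouped)     # all LC first, then SUDOC
print(interfiled)  # 'B's, then 'C's, then 'P's, regardless of scheme
```

In both cases the order within a scheme is untouched, because every key in that scheme gets the same prefix.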

Just my two cents,
Dan


>>> On 7/28/2010 at 10:14 AM, "Kathy Lussier" <klussier at masslnc.org> wrote:

>> 
>> Figuring out how to create sort keys that sort "correctly" across mixed
>> sets of classification schemes, though - for some reason that seems
>> hard. Could just be too late for my brain to work properly.
> 
> [KL] When I was reviewing the previous discussion about LC sorting, Jason
> Etheridge had suggested the possibility of nesting sort algorithms -
> http://markmail.org/message/xxbfp63g3yzeqxqa. A possible configuration could
> be:
> 
> Sort #1 -> LCCN 
> Sort #2 -> DDS
> Sort #3 -> Default / ASCIIbetically
> 
> I think this is what we have in mind for sorting across schemes.
> 


