[OPEN-ILS-GENERAL] dewey call number normalization

Hardy, Elaine ehardy at georgialibraries.org
Fri Apr 25 15:50:15 EDT 2014


Dewey cutters shouldn't have decimal points preceding them.

 

Many public libraries use just the first few letters of the author's name
for the cutter and often don't use a year designation when they build a DDC
call number. However, just as in LCC, you can generate alphanumeric cutters
using the first letter of the author's name followed by numbers that
represent the rest. To do so, catalogers use either the Cutter Four-Figure
Table or the Cutter-Sanborn Four-Figure Table. (You can find cutter
generators online; OCLC has a good one.) Couple that with the pub year in
the DDC number and you can create a unique call number, as you do in LCC.

 

I suggest removing those unneeded decimals preceding the cutter before you
address the other sort issue Ben identified.
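
A minimal sketch of that cleanup, written in Python purely for illustration
(it is not Evergreen code). The function name strip_cutter_decimal is
hypothetical, and the sketch assumes the stray period sits between the Dewey
number and a cutter that starts with a letter:

```python
import re

def strip_cutter_decimal(label: str) -> str:
    """Drop a period that directly precedes a letter cutter (hypothetical helper)."""
    # Match: a digit (lookbehind), optional whitespace, a period, then a
    # letter (lookahead). The whitespace-plus-period becomes one space, so
    # the Dewey decimal point in "720.1" is left untouched.
    return re.sub(r"(?<=\d)\s*\.(?=[A-Za-z])", " ", label)

for label in ["720 .H47 1980", "720.1 .H74 1979", "720 H74 1979"]:
    print(label, "->", strip_cutter_decimal(label))
```

Labels that already lack the extra period pass through unchanged.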

 

Elaine

 

J. Elaine Hardy

PINES & Collaborative Projects Manager

Georgia Public Library Service

1800 Century Place, Ste 150

Atlanta, Ga. 30345-4304

 

404.235-7128

404.235-7201, fax

ehardy at georgialibraries.org

www.georgialibraries.org

www.georgialibraries.org/pines

 

 

-----Original Message-----
From: open-ils-general-bounces at list.georgialibraries.org
[mailto:open-ils-general-bounces at list.georgialibraries.org] On Behalf Of
Dan Wells
Sent: Friday, April 25, 2014 1:36 PM
To: Evergreen Discussion Group
Subject: Re: [OPEN-ILS-GENERAL] dewey call number normalization

 

Okay, looking more carefully at the sortkey Ben pointed out, you really do
have two different problems affecting your sort.  Sorry for focusing on
the smaller one!

 

Everything I advocated earlier still stands, but in the meantime, we do
need to fix the misplaced padding in the 'no decimal but we have a year'
case.
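
To make the padding problem concrete, here is a rough Python sketch of where
the zero padding arguably belongs. It is not the actual
asset.label_normalizer_dewey logic: the function name padded_sortkey, the
15-character width (taken from the sortkeys quoted below), and the
underscore-joined layout are all assumptions made for illustration.

```python
import re

def padded_sortkey(label: str, width: int = 15) -> str:
    """Hypothetical sort key: always pad the decimal part, even when it is empty."""
    m = re.match(r"\s*(\d+)(?:\.(\d+))?\s*(.*)", label.upper())
    if m is None:
        return label.upper()  # not Dewey-shaped; leave it alone
    whole, frac, rest = m.group(1), m.group(2) or "", m.group(3)
    # Pad the class number's decimals (possibly none), never the trailing year.
    return "_".join([whole, frac.ljust(width, "0")] + rest.split())

print(padded_sortkey("720 H74 1979"))
print(padded_sortkey("720.1 H74 1979"))
```

With the padding attached to the class number, the key for "720 H74 1979"
compares lower than the key for "720.1 H74 1979", which is the order a
browse of the 720s should show.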

 

Dan

 

 

Daniel Wells

Library Programmer/Analyst

Hekman Library, Calvin College

616.526.7133

 

-----Original Message-----

From: open-ils-general-bounces at list.georgialibraries.org
[mailto:open-ils-general-bounces at list.georgialibraries.org] On Behalf Of
Dan Wells

Sent: Friday, April 25, 2014 10:15 AM

To: Evergreen Discussion Group

Subject: Re: [OPEN-ILS-GENERAL] dewey call number normalization

 

Hello Paul,

 

You've pretty much nailed your problem as being the extra decimals before
your Cutters.  While that's normal for an LC call number, I've looked
around and found nothing to make me believe that's common or accepted
practice for DDC.

 

That said, as far as I can tell, the "standard" format for Dewey is:

 

[Dewey Decimal Number] [Whatever else you want to make it unique]

 

In my experience, the second part is *usually* the first few letters of
the author's last name, or a cutter-ized version of the same.  Can anyone
point to an authoritative source on how to build the non-DDC part of the
call number?  It would be a great help if we could at least reference
something and say "this is what our normalizer supports."
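
As a sketch of that two-part shape, the split could look like the following
Python. The function name split_dewey and the assumption that the class
number is three digits with an optional decimal extension are illustrative
only, not an agreed format:

```python
import re
from typing import Optional, Tuple

def split_dewey(label: str) -> Optional[Tuple[str, str]]:
    """Split a label into (Dewey number, everything else); None if not Dewey-shaped."""
    # Three digits, an optional ".digits" extension, then whatever follows.
    m = re.match(r"\s*(\d{3}(?:\.\d+)?)\s*(.*?)\s*$", label)
    return (m.group(1), m.group(2)) if m else None

for label in ["720.1 H74 1979", "720 .H47", "720 a", "j 720 H47"]:
    print(repr(label), "->", split_dewey(label))
```

Anything the pattern rejects, such as a label with a leading prefix, comes
back as None, so a normalizer could fall back to generic handling for it.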

 

Naturally, if we can cook up a normalizer that works 100% with our
agreed-upon form (whatever that might be), yet is also flexible enough to
accommodate variances, we absolutely should do that.  I also think the
code you included here is on the right track for being more flexible.
Still, our first step must be to establish a canonical supported format
before we consider any code to handle exceptions.

 

Thanks,

Dan

 

 

Daniel Wells

Library Programmer/Analyst

Hekman Library, Calvin College

616.526.7133

 

-----Original Message-----

From: open-ils-general-bounces at list.georgialibraries.org
[mailto:open-ils-general-bounces at list.georgialibraries.org] On Behalf Of
Paul Hoffman

Sent: Friday, April 25, 2014 4:45 AM

To: Evergreen Discussion Group

Subject: Re: [OPEN-ILS-GENERAL] dewey call number normalization

 

On Thu, Apr 24, 2014 at 05:15:35PM -0400, Ben Shum wrote:

> This will be a slightly more technical answer that may require some 

> direct database access to ascertain more details.

> 

> You mention that the call numbers are identified as DDC.  So that's 

> label_class of 2, I believe.  We are using Dewey (DDC) for all of our 

> materials by default as well in our consortium.

> 

> I'd be curious to know what the label_sortkey values were for those 

> call numbers you mention.  That field is what actually drives the 

> sorting values for a given set.

 

Here's what our DB shows (Adam and I work together):

 

SELECT   label_class, label, label_sortkey
FROM     asset.call_number
WHERE    label_sortkey LIKE '720%'
ORDER BY label_sortkey;

 

 label_class |      label      |         label_sortkey
-------------+-----------------+-------------------------------
           1 | 720 H47 1979    | 720 H47 1979
           2 | 720 a           | 720_000000000000000_A
           2 | 720 .H47        | 720_000000000000000__H47
           2 | 720.1 H74 1979  | 720_100000000000000_H74_1979
           2 | 720.1 .H47 1980 | 720_100000000000000__H47_1980
           2 | 720.1 .H74 1979 | 720_100000000000000__H74_1979
           2 | 720 H74 1979    | 720_H74_197900000000000
           2 | 720 .H47 1980   | 720__H47_198000000000000
           2 | 720 .H47 1980   | 720__H47_198000000000000
           2 | 720 .H74 1979   | 720__H74_197900000000000
(10 rows)

 

So the problem appears to be caused by the periods that sometimes occur
before the Cutter number.  I don't know if that's kosher or not, but I can
see that it occurs plenty in our (Voyager) catalog.
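
For illustration, the order of the DDC rows above can be reproduced with a
plain character-by-character comparison. This Python sketch assumes the
database compares label_sortkey byte-wise (as a C-style collation would);
the sortkeys are copied from the query output, with the duplicate row and
the label_class 1 row left out.

```python
rows = [
    ("720 a",           "720_000000000000000_A"),
    ("720 .H47",        "720_000000000000000__H47"),
    ("720.1 H74 1979",  "720_100000000000000_H74_1979"),
    ("720.1 .H47 1980", "720_100000000000000__H47_1980"),
    ("720.1 .H74 1979", "720_100000000000000__H74_1979"),
    ("720 H74 1979",    "720_H74_197900000000000"),
    ("720 .H47 1980",   "720__H47_198000000000000"),
    ("720 .H74 1979",   "720__H74_197900000000000"),
]
keys = [key for _, key in rows]

# Plain string comparison reproduces the browse order exactly.
print(sorted(keys) == keys)

# Digits sort before uppercase letters, and the underscore sorts after both.
print(ord("1"), ord("H"), ord("_"))  # 49 72 95
```

Two things follow from those character codes. A key that continues "720_1"
sorts before one that continues "720_H", so 720.1 lands ahead of a plain 720
that has a year. And the doubled underscore produced by a period before the
Cutter pushes those labels after the ones without it.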

 

Looking at the function asset.label_normalizer_dewey it seems to me that
it can be done much more simply and efficiently if you leverage the fact
that space (ASCII 32) and tilde (ASCII 126) come before and after
(respectively) anything else meaningful that might be found in a call
number.  Except periods, which complicate things.  Anyhow, here's a first
stab at it:

 

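use strict;
use warnings;

sub ddcnorm {
    local $_ = uc shift;
    # Strip leading or trailing space and any slashes or apostrophes
    s/^\s+|\s+$|[\/']//g;
    # Insert a space at digit/non-digit boundaries
    s/(?<=[0-9])(?=[^0-9])|(?<=[^0-9])(?=[0-9])/ /g;
    # Replace some punctuation with a space
    tr/-/ /;  # XXX What else?
    # Strip extra junk -- XXX make this work on non-ASCII call numbers
    tr/A-Za-z0-9. //cd;
    # A period with a space on both sides (which is what the boundary step
    # leaves around a Dewey decimal point) becomes a tilde
    s/ \. /~/g;
    # Any other period next to a space (e.g. before a Cutter) becomes a space
    s/ \.|\. / /g;
    # Squeeze runs of spaces into a single space
    tr/ //s;
    return $_;
}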
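
For anyone who wants to experiment outside Perl, here is a line-for-line
Python port of the routine above. It is only a sketch that mirrors the same
steps under the same assumptions (ASCII-only call numbers); it is not
Evergreen's normalizer.

```python
import re

def ddcnorm(label: str) -> str:
    s = label.upper()
    # Strip leading/trailing whitespace and any slashes or apostrophes
    s = re.sub(r"^\s+|\s+$|[/']", "", s)
    # Insert a space at every digit/non-digit boundary
    s = re.sub(r"(?<=[0-9])(?=[^0-9])|(?<=[^0-9])(?=[0-9])", " ", s)
    # Hyphens become spaces
    s = s.replace("-", " ")
    # Drop everything except ASCII letters, digits, periods and spaces
    s = re.sub(r"[^A-Za-z0-9. ]", "", s)
    # A period with a space on each side (the Dewey decimal point) -> tilde
    s = s.replace(" . ", "~")
    # Any other period next to a space (e.g. before a Cutter) -> space
    s = re.sub(r" \.|\. ", " ", s)
    # Squeeze runs of spaces into one
    return re.sub(r" +", " ", s)

for label in ["720 a", "720 .H47 1980", "720.1 .H47 1980"]:
    print(label, "->", ddcnorm(label))
```

Note that a label with and without the period before the Cutter ends up with
the same key, which is the point of the exercise.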

 

When I run our Deweys in the 720s through this, I get what seems to be the
right order:

 

           2 | 720 a           | 720 A
           2 | 720 .H47        | 720 H 47
           2 | 720 .H47 1980   | 720 H 47 1980
           2 | 720 .H47 1980   | 720 H 47 1980
           2 | 720 .H74 1979   | 720 H 74 1979
           2 | 720 H74 1979    | 720 H 74 1979
           2 | 720.1 .H47 1980 | 720~1 H 47 1980
           2 | 720.1 .H74 1979 | 720~1 H 74 1979
           2 | 720.1 H74 1979  | 720~1 H 74 1979
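
The ordering works because of where space and tilde sit in ASCII. The Python
sketch below checks that directly, using the keys from the listing above in a
deliberately scrambled order (the duplicate row is left out) and assuming a
plain byte-wise comparison:

```python
rows = [
    ("720.1 H74 1979",  "720~1 H 74 1979"),
    ("720 .H47",        "720 H 47"),
    ("720 a",           "720 A"),
    ("720.1 .H47 1980", "720~1 H 47 1980"),
    ("720 H74 1979",    "720 H 74 1979"),
    ("720 .H47 1980",   "720 H 47 1980"),
    ("720.1 .H74 1979", "720~1 H 74 1979"),
    ("720 .H74 1979",   "720 H 74 1979"),
]

# Space sorts before digits and letters; tilde sorts after them.
print(ord(" "), ord("0"), ord("A"), ord("~"))  # 32 48 65 126

# Sort the labels by their normalized keys.
ordered = [label for label, key in sorted(rows, key=lambda row: row[1])]
for label in ordered:
    print(label)
```

Every plain 720 label comes out ahead of every 720.1 label, regardless of
whether a period precedes the Cutter.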

 

If there's any interest, I'll run our entire set of Deweys through it and
see if I can make sense of the results.  Hmm... should prefixes like "j"
or "C" (juvenile or Canadian) be ignored?

 

Paul.

 

> On Thu, Apr 24, 2014 at 3:32 PM, Adam Shire <adam at flo.org> wrote:

> 

> > Hi Everyone,

> >

> > We are testing in Evergreen 2.5.2

> >

> > I'm noticing what I think looks like incorrect behavior when using 

> > the call number browse feature.

> >

> > Doing a call number browse search for 720 results in the following 

> > call number sort order:

> >

> > 720 H47 1979

> > 720 .H47

> > 720.1 H74 1979

> > 720.1 .H47 1980

> > 720.1 .H74 1979

> > 720 H74 1979

> > 720 .H74 1979

> >

> >

> > It looks like the decimal point might be throwing things off. I 

> > think that should be taken care of in a normalizer, but maybe there 

> > is a reason not to. I think the 720.1's should come at the end of 

> > this list, regardless of the decimal point before the cutter.

> >

> > All of the call numbers are identified as DDC.

> >

> > You can probably replicate this here:

> > http://emerson.eg.flo.org/eg/opac/cnbrowse?cn=715&locg=2

> >

> >

> > I didn't see any bug reports that seemed to address this specific 

> > issue, so I'm wondering if there could be something else causing this

> > behavior.

> >

> > thanks,

> > Adam

> >

> > --

> >

> > Adam Shire

> > Member Services Librarian

> > Fenway Libraries Online <http://flo.org>

> > 617-442-2384

> >

> 

> 

> 

> --

> Benjamin Shum

> Evergreen Systems Manager

> Bibliomation, Inc.

> 24 Wooster Ave.

> Waterbury, CT 06708

> 203-577-4070, ext. 113

 

--

Paul Hoffman <paul at flo.org>

Systems Librarian

Fenway Libraries Online

c/o Wentworth Institute of Technology

550 Huntington Ave.

Boston, MA 02115

(617) 442-2384 (FLO main number)
