[OPEN-ILS-DEV] Deduping, 856's and located URIs

Galen Charlton gmc at esilibrary.com
Fri Dec 7 11:51:55 EST 2012


Hi,

On Thu, Dec 6, 2012 at 1:12 PM, Justin Hopkins
<justin at mobiusconsortium.org> wrote:
>
> There are other things that come to mind but this is what's really holding
> us up. We're starting to get a whole lot of duplicates in Missouri
> Evergreen so we need to dedupe but since we do have quite a few electronic
> resources with $9's we can't afford to clobber them.
>
> I'd appreciate any suggestions.
>

I hope this doesn't come across as flippant, but the sclends_dedupe.sql
script is GPL2+ software and can be modified like any other, either by
yourself or somebody else.  In fact, anybody considering doing a
large-scale deduplication with it *should* plan on taking a close look at
it.  In particular, the record matching and record quality criteria are not
set in stone; a consortium could easily come up with a different ranking
based on the quality of their members' cataloging, their tolerance for
dealing with mismatches, and simply differences of opinion about
cataloging.  For example, if you want to include the 245$h in the
normalized title, edit the norm_title() function.  If a particular
library is acknowledged to have consistently high-quality cataloging,
adjust the get_quality() routine to give their records a bump.
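
To make those two kinds of adjustment concrete, here is a rough sketch in
Python.  It is not the code from sclends_dedupe.sql, which is SQL; the
function names are borrowed from the script only for readability, and the
record layout, the include_medium flag, the trusted_owners set, and the
bonus value are all assumptions made up for illustration.

```python
import re

def norm_title(subfields, include_medium=False):
    """Build a normalized title from 245 subfields.

    subfields is a list of (code, value) pairs (an assumed layout).
    With include_medium=True the 245$h is folded into the match key,
    so a book and a sound recording no longer normalize to the same title.
    """
    wanted = {"a", "b"}
    if include_medium:
        wanted.add("h")
    text = " ".join(value for code, value in subfields if code in wanted)
    # Lowercase, replace punctuation with spaces, then collapse whitespace.
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())

def get_quality(record, trusted_owners=frozenset(), bonus=10):
    """Toy quality score: field count, plus a bump for trusted libraries.

    record is a dict with "owner" and "fields" keys (an assumed layout).
    """
    score = len(record["fields"])
    if record["owner"] in trusted_owners:
        score += bonus
    return score

title = [("a", "Moby Dick"), ("h", "[sound recording] /"), ("c", "Herman Melville.")]
print(norm_title(title))                       # moby dick
print(norm_title(title, include_medium=True))  # moby dick sound recording
```

The point is only that both the match key and the ranking are ordinary
functions, so a local policy change is usually a small, contained edit.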

In order to do field-level merging as with your 856 field example, one
avenue to explore is using the vandelay.merge_record_xml() function that
comes with Evergreen.
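
As a sketch of what field-level merging means for the 856 case, the Python
below copies any 856 field carrying a $9 from a duplicate record onto the
lead record before the duplicate is discarded.  It does not call
vandelay.merge_record_xml() and does not reflect how that function is
invoked; the merge_located_uris() name and the (tag, subfields) record
layout are assumptions made up for illustration.

```python
def merge_located_uris(lead_fields, dup_fields):
    """Return the lead record's fields plus any located URIs from the duplicate.

    Each field is a (tag, [(code, value), ...]) pair (an assumed layout).
    A "located URI" here means an 856 field with at least one $9 subfield.
    """
    def is_located_uri(field):
        tag, subfields = field
        return tag == "856" and any(code == "9" for code, _ in subfields)

    merged = list(lead_fields)
    for field in dup_fields:
        # Skip fields already present so repeated merges stay idempotent.
        if is_located_uri(field) and field not in merged:
            merged.append(field)
    return merged

lead = [("245", [("a", "Moby Dick")]),
        ("856", [("u", "http://example.org/a"), ("9", "LIB-A")])]
dup = [("245", [("a", "Moby Dick")]),
       ("856", [("u", "http://example.org/b"), ("9", "LIB-B")]),
       ("856", [("u", "http://example.org/c")])]

merged = merge_located_uris(lead, dup)
print(len(merged))  # 3
```

Only the 856 with a $9 is carried over; the duplicate's 245 and its 856
without a $9 are left behind with the discarded record.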

The code in the migration-tools repository as a whole is GPL2+, and patches
are welcome.

Regards,

Galen
-- 
Galen Charlton
Director of Implementation
Equinox Software, Inc. / The Open Source Experts
email:  gmc at esilibrary.com
direct: +1 770-709-5581
cell:   +1 404-984-4366
skype:  gmcharlt
web:    http://www.esilibrary.com/
Supporting Koha and Evergreen: http://koha-community.org &
http://evergreen-ils.org