[OPEN-ILS-DEV] Deduping, 856's and located URIs

Steve Wills swills at beyond-print.com
Fri Dec 7 13:53:33 EST 2012


Thanks for that, Galen. 

Balsam has just started a DeDup project as well, and I certainly plan to ingest every bit of standing code I can find prior to writing something new. In our case, one of the criteria that needs to be taken into consideration is bre.owner, since the consortium has agreed that there is likely a quality hierarchy in the mix. We also have our fair share of ebooks using the $9 subfield, and I am looking forward to having multiple libraries serving eResources that reference a common bib record.

Like Justin, I'll take all the suggestions I can get going forward.

Steve Wills
Balsam Consortium
"Which 9xx field stores 'simple differences of opinion' again?"



-----Original Message-----
From: Galen Charlton [mailto:gmc at esilibrary.com]
Sent: Friday, December 7, 2012 11:51 AM
To: 'Evergreen Development Discussion List'
Subject: Re: [OPEN-ILS-DEV] Deduping, 856's and located URIs

Hi,
On Thu, Dec 6, 2012 at 1:12 PM, Justin Hopkins <justin at mobiusconsortium.org> wrote:

There are other things that come to mind but this is what's really holding us up. We're starting to get a whole lot of duplicates in Missouri Evergreen so we need to dedupe but since we do have quite a few electronic resources with $9's we can't afford to clobber them.

I'd appreciate any suggestions.



I hope this doesn't come across as flippant, but the sclends_dedupe.sql script is GPL2+ software and can be modified like any other, either by yourself or somebody else. In fact, anybody considering doing a large-scale deduplication with it *should* plan on taking a close look at it. In particular, the record matching and record quality criteria are not set in stone; a consortium could easily come up with a different ranking based on the quality of their members' cataloging, their tolerance for dealing with mismatches, and simple differences of opinion about cataloging. For example, if you want to include the 245$h in the normalized title, edit the norm_title() function. If a particular library is acknowledged to have consistently high-quality cataloging, adjust the get_quality() routine to give their records a bump.
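
To make the two knobs above concrete, here is a rough sketch in Python rather than the script's own SQL. It is not the logic in sclends_dedupe.sql: the function names only echo norm_title() and get_quality(), and the record layout, the OWNER_BONUS table, the LIB_A shortname, and the field-count scoring are all assumptions made up for illustration.

```python
import re
import unicodedata

# Hypothetical per-library bump, keyed by owning library shortname.
# A consortium would fill this in from its own agreed quality hierarchy.
OWNER_BONUS = {"LIB_A": 20}


def norm_title(subfields, include_medium=False):
    """Build a match key from 245 subfields (dict of code -> text).

    With include_medium=True the 245$h is folded into the key, so a
    print record and an electronic one no longer normalize the same.
    """
    parts = [subfields.get("a", ""), subfields.get("b", "")]
    if include_medium:
        parts.append(subfields.get("h", ""))
    text = " ".join(parts)
    # Strip diacritics, then punctuation, then collapse whitespace.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def get_quality(record):
    """Score a record: one point per field, plus any owner bump."""
    score = len(record.get("fields", []))
    score += OWNER_BONUS.get(record.get("owner"), 0)
    return score


def pick_lead(records):
    """Choose the record that the other duplicates would merge into."""
    return max(records, key=get_quality)
```

With something like this, two records sharing a normalized title would be grouped, and pick_lead() would favor the one from the library with the bump even when it carries slightly fewer fields.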


In order to do field-level merging as with your 856 field example, one avenue to explore is using the vandelay.merge_record_xml() function that comes with Evergreen.
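
As an illustration of what field-level merging means for the 856 case, the sketch below carries located URIs from a duplicate onto the lead record instead of discarding them. It does not call or imitate vandelay.merge_record_xml(); the record representation (a list of dicts with a tag and a list of subfield pairs) and the function name are assumptions for the example only.

```python
def merge_located_uris(lead, duplicate):
    """Return the lead's fields plus any 856 fields with a $9 that
    appear only on the duplicate. Neither input list is modified.

    Each field is a dict: {"tag": "856", "subfields": [("u", ...), ("9", ...)]}
    """
    # Track fields already present so exact repeats are not copied twice.
    existing = {(f["tag"], tuple(f["subfields"])) for f in lead}
    merged = list(lead)
    for field in duplicate:
        is_located_uri = field["tag"] == "856" and any(
            code == "9" for code, _ in field["subfields"]
        )
        key = (field["tag"], tuple(field["subfields"]))
        if is_located_uri and key not in existing:
            merged.append(field)
            existing.add(key)
    return merged
```

The point of the design is that the lead record wins for every other field, while each library's 856 with its own $9 survives the merge.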


The code in the migration-tools repository as a whole is GPL2+, and patches are welcome.


Regards,

Galen
-- 
Galen Charlton
Director of Implementation
Equinox Software, Inc. / The Open Source Experts
email: gmc at esilibrary.com
direct: +1 770-709-5581
cell: +1 404-984-4366
skype: gmcharlt
web: http://www.esilibrary.com/
Supporting Koha and Evergreen: http://koha-community.org & http://evergreen-ils.org


