[OPEN-ILS-GENERAL] Programmatic Merging of Bibliographic Records

Justin Hopkins justin at mobiusconsortium.org
Tue Apr 26 15:57:15 EDT 2016


Seems like a lively party, so I'll join in.

I think every matchpoint is unreliable on its own. Ultimately we used many different
factors to arrive at a sort of "match score". This seemed to be a good
approach. Obviously, we can't plan for all the possible variations and
inconsistencies in MARC, but we can do a pretty good job of
quantifying the similarities. Our work on this was, I think, a good
start. I'd love for someone with more experience in this sort of
analysis to come along and really dig deep into this process.
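For anyone wondering what a "match score" built from many factors might look like, here is a minimal Python sketch. It is not Blake's script. It assumes each record has already been pulled out of MARC into a plain dict, and the field names, comparison functions, weights, and the thresholds that route a pair to a "needs humans" pile are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical per-factor weights (illustrative only, not tuned values).
WEIGHTS = {"isbn": 0.35, "title": 0.30, "author": 0.15,
           "pub_year": 0.10, "publisher": 0.10}


def normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the word sets of two strings."""
    ta, tb = set(normalize(a).split()), set(normalize(b).split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def exact(a: str, b: str) -> float:
    """1.0 if both normalized values are non-empty and equal."""
    na, nb = normalize(a), normalize(b)
    return 1.0 if na and na == nb else 0.0


def isbn_match(a: str, b: str) -> float:
    """Compare ISBNs after stripping hyphens, spaces, and qualifiers."""
    ca = re.sub(r"[^0-9X]", "", a.upper())
    cb = re.sub(r"[^0-9X]", "", b.upper())
    return 1.0 if ca and ca == cb else 0.0


COMPARATORS = {"isbn": isbn_match, "title": token_jaccard,
               "author": token_jaccard, "pub_year": exact,
               "publisher": token_jaccard}


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted average of per-field similarities over the fields both
    records have, so a missing field neither helps nor hurts the score."""
    total = 0.0
    weight_used = 0.0
    for fld, weight in WEIGHTS.items():
        va, vb = rec_a.get(fld), rec_b.get(fld)
        if not va or not vb:
            continue  # field absent from one record: skip it entirely
        total += weight * COMPARATORS[fld](va, vb)
        weight_used += weight
    return total / weight_used if weight_used else 0.0


def classify(score: float) -> str:
    """Hypothetical thresholds splitting pairs into three piles."""
    if score >= 0.9:
        return "merge candidate"
    if score >= 0.6:
        return "needs humans"
    return "distinct"


if __name__ == "__main__":
    a = {"title": "The Hobbit, or, There and Back Again",
         "author": "Tolkien, J. R. R.", "pub_year": "1937",
         "isbn": "978-0-618-00221-3"}
    b = {"title": "The hobbit : or there and back again",
         "author": "Tolkien, J.R.R.", "pub_year": "1937"}
    c = {"title": "The Hobbit", "author": "Tolkien, J. R. R.",
         "pub_year": "1966"}
    print(match_score(a, b), classify(match_score(a, b)))  # 1.0 merge candidate
    print(round(match_score(a, c), 3), classify(match_score(a, c)))  # 0.429 distinct
```

Skipping fields that one record lacks (rather than scoring them as zero) is one design choice among several; it keeps sparse brief records from being penalized, at the cost of letting a pair match on very little evidence, which is one reason a middle "needs humans" band is useful.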

We've made tremendous progress with the output of Blake's scripting
and the work of a single part-time cataloger going over the "needs
humans" spreadsheets. I'd urge anyone and everyone who is having a
problem with duplicate records to take a look at this. We'd love your
feedback. At some point, we'd like to pursue a more official solution
with the ultimate goal of something like what Jason just proposed: a
staff client interface for discovery, comparison, and resolution of
duplicate records.

Justin

On Tue, Apr 26, 2016 at 2:37 PM, Jason Etheridge <jason at esilibrary.com> wrote:
>> We liked your fingerprinting idea. We expanded it a bit:
>
> Awesome.
>
> There was another idea we had (and implemented) back when I worked for
> PINES, though I don't know how worthwhile it is these days:
>
> A dedupe interface that can allow/expedite user processing of proposed
> merges from algorithms similar to the ones mentioned.
>
> Imagine the merge record function with EG's record buckets, but you
> can choose whether to merge or skip a given grouping, and then cycle
> automatically to the next grouping.
>
> And imagine you being able to divvy up and parallelize the work
> amongst multiple catalogers.
>
> I liked it.
>
> --
> Jason Etheridge
> | Community and Migration Manager
> | Equinox Software - Open Your Library
> | 1-877-OPEN-ILS (673-6457)
> | jason at esilibrary.com
> | http://www.esilibrary.com



-- 
Regards,
Justin

