[OPEN-ILS-GENERAL] Relevance ranking

Fri Apr 6 11:28:33 EDT 2012

On Thu, Feb 9, 2012 at 11:27 AM, Mike Rylander <mrylander at gmail.com> wrote:
>
> On Feb 9, 2012 10:57 AM, "Kathy Lussier" <klussier at masslnc.org> wrote:

<snip>

>> Early in the MassLNC project, we had discussed the idea of incorporating
>> popularity metrics into the relevance ranking (e.g. records get a higher
>> boost if they have more copies, high circ counts, etc.). I was reminded of
>> this idea recently when reading this blog post -
>>
>> http://www.evilreads.com/blog/why-barnes-and-noble-is-doomed-in-one-screenshot.html
>> - about the shortcomings of Barnes and Noble's search algorithm. If
>> Evergreen were able to boost a record in the search results based on
>> popularity, it would not only improve our specific issue with the brief
>> records, but would also help out the patron who is searching "Lincoln" to
>> find that book she just heard about on tv.
>>
>> Obviously, this would take development, but is it realistically feasible
>> to incorporate popularity metrics in the ranking without negatively
>> impacting search speeds?
>>
>
> If it's treated as a nightly job that creates relative scoring adjustment
> per record, say, then it is entirely doable.
>
>> Thanks for your thoughts on this!

To follow up on this, I guess a very coarse approach to Mike's nightly
job idea would be to create a new table - biblio.record_relevancy -
with two columns: record BIGINT, score INT.

"score" would be determined by whatever factors a given site wanted to
use, and could be updated on a regular basis. The score would boost
the relevancy of the record in search results.
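
As a rough illustration of how such a score might feed into ranking,
here is a minimal Python sketch. The multiplicative formula, the
"weight" knob, and the function name are all assumptions made for the
example, not how Evergreen ranks anything today; "scores" stands in
for the proposed record/score table.

```python
def rerank(results, scores, weight=0.01):
    """Return (record_id, boosted_rank) pairs, highest rank first.

    results: list of (record_id, base_rank) pairs from the search.
    scores:  dict of record_id -> integer score (stand-in for the table).
    weight:  hypothetical knob: how much one score point is worth.
    """
    boosted = [
        # Records with no score row keep their base rank unchanged.
        (record_id, base_rank * (1 + weight * scores.get(record_id, 0)))
        for record_id, base_rank in results
    ]
    return sorted(boosted, key=lambda pair: pair[1], reverse=True)
```

With this shape, a record with no score is never penalized; it just
doesn't get a lift.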

The simplest score would be a stock calculation based on something
like how many circs in the past X interval, how many current holds are
on the record, and how many copies a given record has. That algorithm
would probably best be built as a database function that just gets
invoked on a nightly basis. (Lots of room for amusing results if, say,
a site has cataloged all of their serials as individual copies of the
same record, or has an encyclopedia with umpteen volumes, but with real
serials and bib parts these days that will hopefully become less of an
issue!)
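
To make the stock calculation concrete, here is a minimal sketch in
Python (a real version would presumably be a database function, as
suggested above). The weights, the cap on the copy count, and the
maximum score are invented for illustration; the copy cap is just one
possible way to blunt the umpteen-copies problem.

```python
def stock_score(recent_circs, current_holds, copies,
                circ_weight=1, hold_weight=2, copy_weight=1,
                copy_cap=10, max_score=100):
    """Combine three popularity counts into one integer score.

    All weights and caps are hypothetical defaults that a site would tune.
    """
    # Cap the copy contribution so a record with hundreds of copies
    # (e.g. serials cataloged as copies) doesn't swamp the other factors.
    capped_copies = min(copies, copy_cap)
    raw = (circ_weight * recent_circs
           + hold_weight * current_holds
           + copy_weight * capped_copies)
    # Keep the result within a fixed range so scores stay comparable.
    return min(raw, max_score)
```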

A site could also use a script to read a set of ISBNs from one or more
best-seller lists or local promotional lists and bump the scores for
matching records accordingly. On further thought, swap "ISBN" for
"identifier", because ISSN, UPC, etc. play their part for different
formats. (Perhaps an opportunity to build on the existing
record-matching logic?)
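
A bare-bones sketch of such a script, in Python. Everything here is
hypothetical: "identifier_index" stands in for whatever lookup the
record-matching logic would provide, and the normalization is
deliberately naive (it does not, for instance, treat an ISBN-10 and
its ISBN-13 equivalent as the same identifier).

```python
def normalize(identifier):
    # Strip hyphens and spaces; uppercase because an ISBN-10 or ISSN
    # check digit can be "X".
    return identifier.replace("-", "").replace(" ", "").upper()


def bump_scores(scores, identifier_index, promoted_identifiers, bump=10):
    """Add `bump` to the score of every record matching a promoted identifier.

    scores:               dict of record_id -> score (updated in place).
    identifier_index:     dict of normalized identifier -> set of record ids
                          (hypothetical stand-in for record matching).
    promoted_identifiers: identifiers read from a best-seller or promo list.
    Returns the set of record ids that were bumped.
    """
    bumped = set()
    for raw in promoted_identifiers:
        for record_id in identifier_index.get(normalize(raw), ()):
            # Bump each record at most once per run, even if several
            # identifiers on the list point at it.
            if record_id not in bumped:
                scores[record_id] = scores.get(record_id, 0) + bump
                bumped.add(record_id)
    return bumped
```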

On further further thought, maybe biblio.record_relevancy would also
need a third column to identify which part of the org unit hierarchy
should be associated with a given relevancy boost. For example, if our
library is running a promotion for Sudbury's bicentennial, that
shouldn't affect relevancy for Windsor's libraries :)
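
Here is one way the org unit scoping could work, sketched in Python.
The rule used (a boost applies when its org unit is the search
location or one of that location's ancestors) is my assumption, as are
the names; "parent_of" is a stand-in for the org unit tree, and the
org unit labels used with it are placeholders.

```python
def ancestors_and_self(org_unit, parent_of):
    """Walk up the org tree from org_unit to the root, inclusive."""
    chain = []
    while org_unit is not None:
        chain.append(org_unit)
        org_unit = parent_of.get(org_unit)
    return chain


def effective_score(record_id, search_org, boosts, parent_of):
    """Sum the boosts for a record that apply at the search location.

    boosts:    iterable of (record_id, org_unit, score) rows, i.e. the
               proposed table with its third column.
    parent_of: dict of org_unit -> parent org_unit (None at the root).
    """
    applicable = set(ancestors_and_self(search_org, parent_of))
    return sum(score for rec, org, score in boosts
               if rec == record_id and org in applicable)
```

With boosts of (1, "CONS", 5) and (1, "SUDBURY", 20), where both
Sudbury and Windsor sit under "CONS", a search scoped to Sudbury sees
a score of 25 for record 1, while a search scoped to Windsor sees
only 5.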

A few well-known dangers:
  * Getting a set of search results that are all currently unavailable.
  * The "deadly scent trail" that rewards the mainstream and limits
    serendipitous discovery of lesser-used resources.

However, there's a lot to be said for getting the search result you
meant, even if it's currently unavailable.

As an aside, it would also be useful to build in a "others who
borrowed this also borrowed this..." recommendation engine to broaden
the discovery path slightly. I participated in a hackfest to build a
basic recommendation engine at a provincial scale a few months back,
but it would be pretty straightforward to build a service like this
into Evergreen.
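
For the curious, the core of a "also borrowed" service can be as
simple as counting co-occurrences. This is a toy Python sketch, not
the hackfest code: it assumes loan history is available as
(patron, record) pairs, and the function names are made up.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_co_borrow_counts(loans):
    """Count how often each pair of records was borrowed by the same patron.

    loans: iterable of (patron_id, record_id) pairs.
    Returns a dict of record_id -> Counter of co-borrowed record ids.
    """
    by_patron = defaultdict(set)
    for patron_id, record_id in loans:
        by_patron[patron_id].add(record_id)  # repeat loans count once

    co_counts = defaultdict(Counter)
    for records in by_patron.values():
        for a, b in combinations(sorted(records), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return co_counts


def also_borrowed(co_counts, record_id, limit=5):
    """Return up to `limit` record ids most often co-borrowed with record_id."""
    counts = co_counts.get(record_id, Counter())
    return [rec for rec, _ in counts.most_common(limit)]
```

Like the popularity score, the counts could be rebuilt by a nightly
job rather than computed at search time.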