[OPEN-ILS-GENERAL] Relevance ranking

Kathy Lussier klussier at masslnc.org
Thu Feb 9 10:56:47 EST 2012


Thanks Mike!
 
>>What does removing #CD_uniqueWords do for you?

Thank you! This tweak seemed to make a big difference. My search for "dogs"
no longer retrieves a bunch of records for puppets in my first page of
results. 

For the archives, in case anyone else experiences this problem, I adjusted the
default CD modifiers in the opensrf.xml file so that the setting now just reads:

<default_CD_modifiers>#CD_meanHarmonic</default_CD_modifiers>
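
For anyone else tuning this setting, here is a rough sketch of what the
modifiers do. It assumes that each #CD_* name sets the like-named bit of
PostgreSQL's ts_rank_cd() normalization argument; that name-to-bit mapping is
an assumption, not something checked against the Evergreen source. The code
decodes the setting into that integer, then applies a simplified model of only
the document-length and unique-word divisions. With #CD_uniqueWords on, a
record with few unique words (a brief record) outranks a full record that has
the same raw score. The word counts are made up for illustration.

```python
# Bit values of PostgreSQL's ts_rank_cd() "normalization" argument.
# ASSUMPTION: each Evergreen #CD_* modifier sets the like-named bit.
CD_MODIFIER_BITS = {
    "CD_logDocumentLength": 1,  # divide by 1 + log(document length)
    "CD_documentLength": 2,     # divide by document length
    "CD_meanHarmonic": 4,       # divide by mean harmonic distance between extents
    "CD_uniqueWords": 8,        # divide by number of unique words
    "CD_logUniqueWords": 16,    # divide by 1 + log(number of unique words)
    "CD_selfPlusOne": 32,       # divide by rank + 1
}


def modifiers_to_bitmask(config_value: str) -> int:
    """Turn a <default_CD_modifiers> value into one normalization integer."""
    mask = 0
    for token in config_value.split():  # split() also handles newlines
        mask |= CD_MODIFIER_BITS[token.lstrip("#")]
    return mask


def length_normalized(raw_rank: float, mask: int,
                      doc_length: int, unique_words: int) -> float:
    """Simplified model: apply only the length (2) and unique-word (8) divisions."""
    rank = raw_rank
    if mask & 2 and doc_length > 0:
        rank /= doc_length
    if mask & 8 and unique_words > 0:
        rank /= unique_words
    return rank


if __name__ == "__main__":
    before = modifiers_to_bitmask("#CD_meanHarmonic #CD_uniqueWords")  # 12
    after = modifiers_to_bitmask("#CD_meanHarmonic")                   # 4
    # Same raw cover-density score for both records; hypothetical word counts.
    brief = dict(doc_length=15, unique_words=12)
    full = dict(doc_length=400, unique_words=180)
    for mask in (before, after):
        print(mask, length_normalized(0.1, mask, **brief),
              length_normalized(0.1, mask, **full))
```

With mask 12 the brief record scores fifteen times higher than the full one;
with mask 4 this simplified model scores them equally.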


>>So, the problem (which may be obvious) is that shorter records (as
>>brief records would be) provide a better match in terms of cover
>>density, all else being equal.  There's nothing in there today to mark
>>a record as brief per se (that is, "fast add" or the like) but that is
>>one use for the bib source field.  There are probably many solutions to
>>this problem, but I think an effective, though relatively large,
>>hammer would be to sort any records that have a source above any
>>records that don't.  Then, adjust ranking based on the quality field
>>from the source -- perhaps with an over-under threshold value that
>>provides a stark dividing line between "good" and "bad" sources.
>>
>>This would all be development, of course, though not particularly
>>intensive.
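
The "large hammer" Mike describes could be sketched as a three-part sort key:
records with a bib source first, then sources at or above a quality threshold
ahead of those below it, then the normal relevance rank. The threshold value,
the Hit record type, and the choice to place low-quality sources between good
sources and unsourced records are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical over/under line between "good" and "bad" bib sources.
QUALITY_THRESHOLD = 50


@dataclass
class Hit:
    record_id: int
    rank: float                           # text-relevance score from the database
    source_quality: Optional[int] = None  # None means no bib source is set


def sort_key(hit: Hit):
    has_source = hit.source_quality is not None
    good_source = has_source and hit.source_quality >= QUALITY_THRESHOLD
    # sorted() is ascending, so False sorts before True and -rank puts
    # higher ranks first: good sources, then other sources, then no source.
    return (not has_source, not good_source, -hit.rank)


def order_hits(hits: List[Hit]) -> List[Hit]:
    return sorted(hits, key=sort_key)


if __name__ == "__main__":
    hits = [Hit(1, 0.9), Hit(2, 0.2, 10), Hit(3, 0.5, 90), Hit(4, 0.7, 90)]
    print([h.record_id for h in order_hits(hits)])  # [4, 3, 2, 1]
```

Record 1 has the best text score but no source, so it drops to the bottom,
which is exactly the "stark dividing line" effect.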


It would be interesting to look at other approaches since I'm guessing the
tweak I made to push brief records lower in the list might negatively impact
relevance among other records. However, this suggested change fixes just one
specific issue, and I would love to consider changes that improve the
overall relevance of search results. For example, if I remember correctly,
we previously could tweak settings in the search.relevance_ranking so that
words appearing in a title or subject could be given more weight than those
appearing elsewhere in a record. Reintroducing that functionality (without
negatively impacting search performance) would be a big step forward in
returning more relevant results.
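
As a rough illustration of field weighting, the sketch below scales each
field's match score by a per-field weight before summing. The field names and
weight values are hypothetical, not Evergreen's old settings. PostgreSQL's own
setweight() labels and the weights array accepted by ts_rank() and
ts_rank_cd() offer a database-side version of the same idea.

```python
from typing import Dict

# Hypothetical boosts; any field not listed counts at DEFAULT_WEIGHT.
FIELD_WEIGHTS = {"title": 4.0, "subject": 2.0}
DEFAULT_WEIGHT = 1.0


def weighted_score(field_matches: Dict[str, float]) -> float:
    """Sum per-field match scores, scaling each by its field's weight."""
    return sum(FIELD_WEIGHTS.get(field, DEFAULT_WEIGHT) * score
               for field, score in field_matches.items())


if __name__ == "__main__":
    title_hit = weighted_score({"title": 1.0})  # "dogs" in the title
    note_hit = weighted_score({"note": 1.0})    # "dogs" only in a note
    print(title_hit, note_hit)
```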

Early in the MassLNC project, we had discussed the idea of incorporating
popularity metrics into the relevance ranking (e.g. records get a higher
boost if they have more copies, high circ counts, etc.). I was reminded of
this idea recently when reading this blog post -
http://www.evilreads.com/blog/why-barnes-and-noble-is-doomed-in-one-screenshot.html
- about the shortcomings of Barnes and Noble's search algorithm. If
Evergreen were able to boost a record in the search results based on
popularity, it would not only improve our specific issue with the brief
records, but would also help out the patron who is searching "Lincoln" to
find that book she just heard about on TV. 

Obviously, this would take development, but is it realistically feasible to
incorporate popularity metrics in the ranking without negatively impacting
search speeds? 
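
One possible shape for a popularity boost is to multiply the text rank by a
factor that grows with the logarithm of copy and circulation counts, so that
popularity nudges results without swamping text relevance. The coefficients
and the formula are assumptions for discussion, not an existing Evergreen
feature. On the speed question: if the two counts were precomputed per record
(for example in a nightly job, also an assumption), the query-time cost would
be a lookup and a multiplication.

```python
import math

# Hypothetical tuning knobs for how much popularity matters.
COPY_WEIGHT = 0.1
CIRC_WEIGHT = 0.05


def popularity_boost(copies: int, circs: int) -> float:
    """1.0 for an unheld, uncirculated record; grows slowly (log1p) with use."""
    return 1.0 + COPY_WEIGHT * math.log1p(copies) + CIRC_WEIGHT * math.log1p(circs)


def boosted_rank(text_rank: float, copies: int, circs: int) -> float:
    return text_rank * popularity_boost(copies, circs)


if __name__ == "__main__":
    # A brief record with a slightly better text score vs. a popular full record.
    print(boosted_rank(0.6, copies=1, circs=0))
    print(boosted_rank(0.5, copies=40, circs=1200))
```

In this made-up "Lincoln" case the widely held, heavily circulated record
overtakes the brief record even though its raw text score is lower.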

Thanks for your thoughts on this!

Kathy


>>-----Original Message-----
>>From: open-ils-general-bounces at list.georgialibraries.org
>>[mailto:open-ils-general-bounces at list.georgialibraries.org] On Behalf Of
>>Mike Rylander
>>Sent: Monday, January 30, 2012 3:20 PM
>>To: Evergreen Discussion Group
>>Subject: Re: [OPEN-ILS-GENERAL] Relevance ranking
>>
>>On Wed, Jan 25, 2012 at 11:47 AM, Kathy Lussier <klussier at masslnc.org>
>>wrote:
>>> Hi all,
>>>
>>> Can anyone provide some insight as to how we might be able to make some
>>> tweaks to the relevance ranking using the new rank_cd() method? We have
>>> noticed an issue in one of our systems with 2.2 alpha1 where brief bib
>>> records tend to appear first in a list of search results. After reading
>>> through the documentation in the opensrf.xml file (BTW - I love the level of
>>> documentation that is provided here!), I removed #CD_documentLength from the
>>> list of default CD modifiers so that it now reads:
>>>
>>> <default_CD_modifiers>#CD_meanHarmonic
>>> #CD_uniqueWords</default_CD_modifiers>
>>>
>>
>>What does removing #CD_uniqueWords do for you?
>>
>>> However, those brief bib records still keep floating to the top of search
>>> results. I understand why you might want to reduce the relevancy for longer
>>> documents when you are doing full-text searches, but I'm not sure the same
>>> reasoning should be applied when searching metadata in MARC records.
>>>
>>> Has anyone else encountered similar problems with brief bib records? Has
>>> anyone made successful tweaks to the relevance ranking that they would be
>>> willing to share with the list?
>>
>>So, the problem (which may be obvious) is that shorter records (as
>>brief records would be) provide a better match in terms of cover
>>density, all else being equal.  There's nothing in there today to mark
>>a record as brief per se (that is, "fast add" or the like) but that is
>>one use for the bib source field.  There are probably many solutions to
>>this problem, but I think an effective, though relatively large,
>>hammer would be to sort any records that have a source above any
>>records that don't.  Then, adjust ranking based on the quality field
>>from the source -- perhaps with an over-under threshold value that
>>provides a stark dividing line between "good" and "bad" sources.
>>
>>This would all be development, of course, though not particularly
>>intensive.
>>
>>Aside: I just experimented with a subtle approach of adjusting the
>>rank by the 1-(1/logN) of the quality of the bib source, if set ... it
>>looks promising as a fine tuning tool, but not useful in this context.
>>
>>--
>>Mike Rylander
>> | Director of Research and Development
>> | Equinox Software, Inc. / Your Library's Guide to Open Source
>> | phone:  1-877-OPEN-ILS (673-6457)
>> | email:  miker at esilibrary.com
>> | web:  http://www.esilibrary.com
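
Mike's aside about adjusting the rank by 1-(1/logN) of the bib source quality
might look roughly like this. Several details are assumptions: "adjusting"
is read as multiplying, log is taken as the natural log, and because the
factor is undefined or non-positive for N <= e, the sketch clamps it to 0
there. Records with no source keep their rank unchanged ("if set").

```python
import math
from typing import Optional


def quality_factor(quality: Optional[float]) -> float:
    """1 - 1/ln(quality) for a bib source quality, or 1.0 when none is set."""
    if quality is None:
        return 1.0  # no bib source: leave the rank alone
    ln_q = math.log(quality) if quality > 0 else 0.0
    if ln_q <= 1.0:
        # ASSUMPTION: the formula is undefined or non-positive here, so clamp to 0.
        return 0.0
    return 1.0 - 1.0 / ln_q


def adjust_rank(rank: float, quality: Optional[float]) -> float:
    return rank * quality_factor(quality)


if __name__ == "__main__":
    for q in (None, 10, 90):
        print(q, round(adjust_rank(0.8, q), 3))
```

Because the factor rises slowly toward 1 as quality grows, it separates
sources only gently, which fits Mike's note that it works as a fine-tuning
tool rather than a fix for the brief-record problem.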


