[OPEN-ILS-GENERAL] Improving relevance ranking in Evergreen

Kathy Lussier klussier at masslnc.org
Wed Mar 7 14:57:41 EST 2012


Hi Mike,

>>To be clear, weighting hits that come from different index definitions
>>has always been possible.  2.2 will have a staff client interface to
>>make it easier, but the capability has been there all along.

Is this staff client interface already available in master? If so, can you
give me a little more information on how this is done?

Thanks!
Kathy
 
 

>>-----Original Message-----
>>From: open-ils-general-bounces at list.georgialibraries.org
>>[mailto:open-ils-general-bounces at list.georgialibraries.org] On Behalf
>>Of Mike Rylander
>>Sent: Wednesday, March 07, 2012 10:11 AM
>>To: Evergreen Discussion Group
>>Subject: Re: [OPEN-ILS-GENERAL] Improving relevance ranking in Evergreen
>>
>>On Wed, Mar 7, 2012 at 8:35 AM, Hardy, Elaine
>><ehardy at georgialibraries.org> wrote:
>>> Kathy,
>>>
>>> While the relevance display is much improved in 2.x, it would be
>>> good to have greater relevance given, in a keyword search, to title
>>> (specifically the 245) and then subject fields. I also see where
>>> having a popularity ranking might be beneficial.
>>>
>>> I just had to explain to a board member of one of our libraries why
>>> his search for John Sandford turned up children's titles first. So
>>> having MARC field 100s ranked higher than 700 in author searches
>>> would be beneficial as well.
>>>
>>
>>To be clear, weighting hits that come from different index definitions
>>has always been possible.  2.2 will have a staff client interface to
>>make it easier, but the capability has been there all along.
>>
>>Weighting different parts of one indexed term -- say, weighting the
>>title embedded in the keyword blob higher than the subjects embedded
>>in the same blob -- would require the above-mentioned "make use of
>>tsearch class weighting".  But one can approximate that today by
>>duplicating the index definitions from, say, title, author and subject
>>classes within the keyword class.
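
A minimal sketch of the approximation described above, assuming the title and subject index definitions are duplicated into the keyword class alongside the catch-all keyword blob. This is not Evergreen code: the field names, the weights, and the keyword_score helper are all hypothetical, and the scoring is a plain weighted term count rather than anything tsearch does.

```python
# Hypothetical keyword-class index definitions: (field name, weight).
# The weights are illustrative only, not Evergreen defaults.
KEYWORD_INDEX_DEFS = [
    ("title", 3.0),    # title definition duplicated into the keyword class
    ("subject", 2.0),  # subject definition duplicated into the keyword class
    ("blob", 1.0),     # the catch-all keyword index
]

def keyword_score(record, query):
    """Sum weight * (number of query terms found) over each index definition."""
    terms = query.lower().split()
    score = 0.0
    for field, weight in KEYWORD_INDEX_DEFS:
        words = record.get(field, "").lower().split()
        score += weight * sum(1 for term in terms if term in words)
    return score

# Toy records; every record matches "twilight" somewhere in the blob.
records = [
    {"id": 3, "title": "Evening walks", "subject": "hiking",
     "blob": "evening walks hiking twilight"},
    {"id": 2, "title": "Astronomy at dusk", "subject": "twilight",
     "blob": "astronomy at dusk twilight"},
    {"id": 1, "title": "Twilight", "subject": "vampires fiction",
     "blob": "twilight vampires fiction"},
]

ranked = sorted(records, key=lambda r: keyword_score(r, "twilight"), reverse=True)
ranked_ids = [r["id"] for r in ranked]
```

Under these assumed weights, a title hit outranks a subject hit, which in turn outranks a hit found only in the blob.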
>>
>>--
>>Mike Rylander
>> | Director of Research and Development
>> | Equinox Software, Inc. / Your Library's Guide to Open Source
>> | phone:  1-877-OPEN-ILS (673-6457)
>> | email:  miker at esilibrary.com
>> | web:  http://www.esilibrary.com
>>
>>
>>> I can't comment on any of the coding possibilities other than to say
>>> whichever way doesn't negatively impact search return time is
>>> preferable.
>>>
>>> Elaine
>>>
>>>
>>> J. Elaine Hardy
>>> PINES Bibliographic Projects and Metadata Manager
>>> Georgia Public Library Service,
>>> A Unit of the University System of Georgia
>>> 1800 Century Place, Suite 150
>>> Atlanta, Ga. 30345-4304
>>> 404.235-7128
>>> 404.235-7201, fax
>>>
>>> ehardy at georgialibraries.org
>>> www.georgialibraries.org
>>> http://www.georgialibraries.org/pines/
>>>
>>>
>>> -----Original Message-----
>>> From: open-ils-general-bounces at list.georgialibraries.org
>>> [mailto:open-ils-general-bounces at list.georgialibraries.org] On Behalf
>>> Of Kathy Lussier
>>> Sent: Tuesday, March 06, 2012 4:43 PM
>>> To: 'Evergreen Discussion Group'
>>> Subject: [OPEN-ILS-GENERAL] Improving relevance ranking in Evergreen
>>>
>>> Hi all,
>>>
>>> I mentioned this during an e-mail discussion on the list last month,
>>> but I just wanted to hear from others in the Evergreen community about
>>> whether there is a desire to improve the relevance ranking for search
>>> results in Evergreen. Currently, we can tweak relevancy in the
>>> opensrf.xml, and it can look at things like the document length, word
>>> proximity, and unique word count. We've found that we had to remove
>>> the modifiers for document length and unique word count to prevent a
>>> problem where brief bib records were ranked way too high in our search
>>> results.
>>>
>>> In our local discussions, we've thought the following enhancements
>>> could improve the ranking of search results:
>>>
>>> * Giving greater weight to a record if the search terms appear in the
>>> title or subject (ideally, we would like these fields to be
>>> configurable). This is something that is tweakable in
>>> search.relevance_ranking, but my understanding is that the use of
>>> these tweaks results in a major reduction in search performance.
>>>
>>> * Using some type of popularity metric to boost relevancy for popular
>>> titles. I'm not sure what this metric should be (number of copies
>>> attached to record? Total circs in last x months? Total current
>>> circs?), but we believe some type of popularity measure would be
>>> particularly helpful in a public library where searches will often be
>>> for titles that are popular. For example, a search for "twilight" will
>>> most likely be for the Stephenie Meyer novel and not this
>>> http://books.google.com/books/about/Twilight.html?id=zEhkpXCyGzIC.
>>> Mike Rylander had indicated in a previous e-mail
>>> (http://markmail.org/message/h6u5r3sy4nr36wsl) that we might be able
>>> to handle this through an overnight cron job without a negative impact
>>> on search speeds.
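
One way the overnight popularity idea could look, as a rough sketch only. It assumes the metric is a per-record circulation total and that the boost is a simple multiplier on text relevance; neither is settled in this thread. The popularity_boosts and rerank helpers, the log damping, and the max_boost cap are all hypothetical choices, not anything Evergreen provides.

```python
import math

def popularity_boosts(circ_counts, max_boost=2.0):
    """Map record id -> multiplier in [1.0, max_boost], damped with a log."""
    top = max(circ_counts.values(), default=0)
    if top == 0:
        return {rec_id: 1.0 for rec_id in circ_counts}
    return {
        rec_id: 1.0 + (max_boost - 1.0) * math.log1p(n) / math.log1p(top)
        for rec_id, n in circ_counts.items()
    }

def rerank(hits, boosts):
    """hits: list of (record id, text relevance). Returns ids, best first."""
    ordered = sorted(hits, key=lambda h: h[1] * boosts.get(h[0], 1.0),
                     reverse=True)
    return [rec_id for rec_id, _ in ordered]

# Nightly step (for example, run from cron): recompute boosts from circ totals.
boosts = popularity_boosts({"novel": 500, "monograph": 0})

# Query-time step: one dictionary lookup and one multiplication per hit.
order = rerank([("monograph", 1.0), ("novel", 0.8)], boosts)
```

Because the boosts are computed ahead of time, the only work added at search time is the lookup and multiplication, which is the property the cron job approach is after.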
>>>
>>> Do others think these two enhancements would improve the search
>>> results in Evergreen? Do you think there are other things we could do
>>> to improve relevancy? My main concern would be that any changes might
>>> slow down search speeds, and I would want to make sure that we could
>>> do something to retrieve better search results without a slowdown.
>>>
>>> Also, I was wondering if this type of project might be a good
>>> candidate for a Google Summer of Code project.
>>>
>>> I look forward to hearing your feedback!
>>>
>>> Kathy
>>>
>>> -------------------------------------------------------------
>>> Kathy Lussier
>>> Project Coordinator
>>> Massachusetts Library Network Cooperative
>>> (508) 756-0172
>>> (508) 755-3721 (fax)
>>> klussier at masslnc.org
>>> IM: kmlussier (AOL & Yahoo)
>>> Twitter: http://www.twitter.com/kmlussier
>>>
>>>
>>>
>>>


